  1. Apr 2023
    1. Author Response

      Reviewer #1 (Public Review):

      Tunneling nanotubes, unlike exosomes, directly connect remote cells and have been shown to allow the transfer of material between cells, including cellular organelles and RNAs. However, whether sorting mechanisms exist that allow the specific transfer of RNA subspecies, especially mRNAs, has not been shown, and the transcriptional consequences of RNA transfer have not yet been addressed.

      Using cocultures (or mixed or single cultures as controls) of the human MCF7 breast cancer cell line and immortalized mouse embryo fibroblasts (MEFs), followed by separation of human and mouse cells by cell sorting, the authors performed deep sequencing of the human mRNAs detected in mouse cells. An accurate analysis of the transferred material shows that all donor cell mRNAs transfer in a manner that correlates with their expression level, with less than 1% of total mRNA being transferred to acceptor cells.

      These results show that the process of RNA transfer is nonselective and that the consequences on the cells receiving the RNAs should depend on the phenotype of the sending cells.

      Although we did not address this last point in the original paper, we concur with this statement, since we presented evidence to this effect in our previous publication (Haimovich et al., 2017), which we discussed in the original Discussion section (lines 498-508 in the original manuscript; lines 529-539 in the revised manuscript). We have now amended the Introduction (line 91 of the modified manuscript) to reflect this idea.

      These results are complemented by the last part of the manuscript where the authors convincingly show that the coculture of the two cell lines results in significant transcriptomic changes in acceptor MEF cells that could become CAF-like cells.

      Reviewer #2 (Public Review):

      In this manuscript, the authors characterize the extent of RNA transfer between cells in culture, with an emphasis on trying to identify RNAs that are transferred through tunneling nanotubes (TNTs). They use an in vitro human-mouse cell co-culture model, consisting of mouse embryonic fibroblasts and human MCF7 breast cancer cells. They take advantage of the CD326 cell surface molecule, which is specifically expressed on MCF7 cells, to separate the two cell populations using magnetic beads conjugated to anti-CD326 antibodies, followed by deep sequencing to identify human RNAs present in mouse cells. They identify many 'transferred' RNAs. Further analysis of sequencing data together with experiments using synthetic reporters indicate that RNA transfer is non-selective, that the amount of transfer strongly correlates with the level of expression in donor cells, and does not appear to require specific RNA motifs. The authors also note that co-culture with MCF7 cells leads to significant changes in the MEF transcriptome.

      The experiments are overall carefully designed, and the data are clearly and quite carefully presented to point out limitations in interpretation and to distinguish speculations from experimental conclusions.

      We thank the reviewer for this comment.

      It should however be kept in mind that it is unclear to what extent these limitations influence the conclusions reached. For example, the identification of transferred RNAs relies on the purity of the isolated cell populations and, while the authors provide some supporting evidence for this, potential caveats nevertheless remain. For instance, the isolated MEF samples used for analysis appear to lack single MCF7 cells, but still contain components, labeled as 'double stained' and 'unstained' cells, which are uncharacterized. The authors present some arguments as to why these would not contribute to 'transferred' reads, but given the low level of detectable transferred RNAs, and the unclear origin of these components, whether they influence the results could be debatable.

      It is unlikely that these populations contributed to the human mRNA signals in the MEFs, since the percentage of these populations was substantially higher in the “Mix” samples than in the “Co-culture” samples. We now added the following text (lines 174-181 in the revised manuscript) which clarifies this point: “In addition, we found small sub-populations of double-stained and unstained cells within the purified populations that we suspect are mostly MEFs (see Methods). These sub-populations were greater in the Mix-derived MEFs vs. the Co-culture-derived MEFs (i.e. 0.08% and 0.03% double-stained, and 2.8% and 2.67% unstained in Mix samples vs. 0% and 0.03% double-stained, and 1% and 1.9% unstained in the Co-culture samples). As a consequence, if these double-stained and unstained cells had contributed to the background of human reads in the MEFs, we would’ve expected to have many more human reads in the Mix-derived MEFs.” However, this was not the case, rather we observed a 6.6-fold increase in human RNA presence in the Co-culture-derived MEFs (versus that in the Mix-derived MEFs) after subtraction of the single culture background. In addition, we note that the level of detectable human RNAs in the MEFs is not low, rather it is the percentage of human RNA that undergoes transfer that is low.
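
      As an illustration only, the background-subtraction reasoning in the response above can be sketched in a few lines of Python. This is not the authors' analysis pipeline: the function name and the input values are hypothetical, and the numbers are arbitrary placeholders rather than data from the manuscript. The sketch assumes that the single-culture signal is subtracted from both the Co-culture and the Mix signal before their ratio is taken.

```python
def background_subtracted_fold(coculture, mix, single_background):
    """Fold increase of Co-culture over Mix after subtracting the
    single-culture background from both (hypothetical helper)."""
    coculture_net = coculture - single_background
    mix_net = mix - single_background
    if mix_net <= 0:
        # The ratio is undefined if Mix does not exceed the background.
        raise ValueError("Mix signal does not exceed background")
    return coculture_net / mix_net


# Arbitrary placeholder values (NOT data from the manuscript),
# e.g. human reads per million total reads in purified MEF samples.
fold = background_subtracted_fold(coculture=50.0, mix=20.0, single_background=10.0)
print(fold)
```

      If contaminating double-stained or unstained cells were driving the signal, the Mix value would be expected to exceed the Co-culture value and the ratio would fall below 1, which is the opposite of what the response reports.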

      Furthermore, the small number of replicates (2 replicates for the genome-wide studies and 1 replicate for most of the subsequent experiments) minimizes the confidence in the conclusions.

      We apologize for not stating clearly that the smFISH, RT-qPCR, and quadrapod experiments were all performed in 2 replicates. This information has now been added to the figure legends.

      In this context, it is also notable that the profile of transferred RNAs between the two replicates of co-cultured samples appears quite different by PCA analysis. It is thus conceivable that there might be specificity in the RNA 'transferome', influenced by unknown experimental variables, which is, however, masked when averaging those samples in subsequent analyses.

      We have replied to Reviewer #1 on this issue. PCA analysis (Figure 2B) of the heat map data (Figure 2A) reveals the similarity between the different samples, whereby 78% of the variability in the data is revealed by PC1 and 6.7% by PC2. Given that PC2 measures only 6.7% of the variation in the data, it likely results from small differences in the individual co-culture samples (such differences are often observed within replicas of RNA-seq experiments) and not via major differences in the measured transferomes. This indicates that the co-culture samples were overall quite similar as can also be observed from the heat map shown in Figure 2A, as differentiated from the controls (e.g. Mix, Single culture). Thus, we do not believe that further replicas will greatly change the results showing the abundant presence of human RNAs in the mouse cells after subtraction of the Mix background. We included additional sentences in the text and figure legend to clarify this point (lines 208-212 in the revised manuscript).
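
      For readers less familiar with PCA, the percentages quoted above (78% for PC1, 6.7% for PC2) are explained-variance ratios: each component's eigenvalue divided by the sum of all eigenvalues. The following minimal Python sketch shows that calculation. The function name is hypothetical and the eigenvalues are arbitrary placeholders, not values derived from Figure 2.

```python
def explained_variance_ratio(eigenvalues):
    """Fraction of total variance captured by each principal component,
    given the eigenvalues of the covariance matrix."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [value / total for value in eigenvalues]


# Arbitrary placeholder eigenvalues (NOT taken from the manuscript).
ratios = explained_variance_ratio([8.0, 1.0, 0.5, 0.5])
for index, ratio in enumerate(ratios, start=1):
    print(f"PC{index}: {ratio:.1%}")
```

      A separation of replicates along a component with a small ratio therefore reflects only a small share of the total variance, which is the argument the response makes for PC2.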

      While the manuscript emphasizes the role of TNTs in RNA transfer, the actual involvement of TNTs relies solely on the observation that potential TNTs form between co-cultured cells. Other means of transfer, such as through engulfment or phagocytosis of cell fragments, could still possibly contribute.

      While it is possible that transfer might occur through other means, our earlier paper (Haimovich et al., 2017) showed that engulfed apoptotic bodies rarely contribute to mRNA transfer, even upon near-100% of donor cell death. Moreover, RNAs in apoptotic bodies found in acceptor cells can be clearly identified by smFISH, as the RNAs are tightly clumped together. Likewise, our quadrapod experiments (Figure 6-figure supplement 1) might have revealed RNA transfer if engulfment of cell fragments had occurred.

      Furthermore, the dependence of mRNA transfer on direct cell-to-cell contact is demonstrated for 5 RNAs and extrapolated to transcriptome-wide RNA transfer, an assumption which might, or might not, be valid.

      We concur that we extrapolate from the few validated examples and have now added the following text (lines 604-611 in the revised manuscript): “We validated several examples of transferred mRNAs that transfer via a contact-dependent mechanism, likely TNTs (Figure 6 and Figure 6-figure supplements 1 and 3), and extrapolate from them to the entire transcriptome. Although it is possible that some or many mRNAs transfer by means other than TNTs, we think it unlikely, since the results on TNT-mediated cell-to-cell transfer in both this and our previous publication (Haimovich, 2017), as well as by others (Ortin-Martinez et al., 2021; Su and Igyarto, 2019), tested a variety of mRNAs from different families and which localize to various sub-cellular localizations. This indicates that the pathway we have uncovered is more general than the few examples presented here.” In addition, we now cite in the Discussion (lines 611-621 in the revised manuscript) a new pre-print recently posted to bioRxiv that shows similar results of mRNA transfer in a human-mouse cell co-culture model.

      Finally, the results on gene expression changes induced by co-culture (Figures 7, 8) are of unclear relevance. As the authors point out, it is uncertain whether RNA transfer or other paracrine or adhesion-mediated signaling events, underlie these changes. It is therefore not easy to see how these results relate to the rest of the presented work. Furthermore, while the authors expand on the potential significance of changes observed in genes related to cancer-associated fibroblasts or to immunity-related genes, these remain speculative and untested.

      We concur that the part of the paper regarding the consequences of co-culture (upon the endogenous transcriptome) does not clarify the specific contribution of the “transferome” to the phenomenon. Future co-culture studies measuring transcriptome-wide transfer using the quadrapod co-culture system versus cell-cell contact co-culture could be performed. Yet, to make the distinction between TNT-dependent and -independent effects when cells are in contact will require further mechanistic knowledge of TNT-mediated mRNA transfer, which is beyond the scope of this paper. Nevertheless, we believe that the data on the endogenous gene expression in co-culture is important and could be useful to the cancer research community outside the context of the transferome information.

      Overall, the manuscript presents evidence indicating that RNA is transferred non-selectively in co-cultured cells, under specific conditions and between the cell types tested. The impact of the work is reduced by the lack of mechanistic understanding underlying this transfer and the uncertainty of whether this phenomenon has any subsequent physiological relevance.

      Our global analysis of TNT-mediated transfer (the transferome) is only a second step towards understanding this important process, the first step being its recent identification. Obviously, we would be happy to gain more mechanistic insight and knowledge of physiological relevance. We are currently working on several projects to try to answer some of these questions, but as one can understand, these are technically challenging and have not yet come to fruition.

    1. Author Response

      Reviewer #1 (Public Review):

      The human genetic variant Dantu increases the surface tension of red blood cells making it hard for malaria parasites to invade. This was shown beautifully by Kariuki et al in 2020 (doi.org/10.1038/s41586-020-2726-6) by analysing blood from children using in vitro assays with cultured malaria parasites. Now Kariuki et al show that parasite growth is indeed restricted in vivo by infecting Dantu adults under controlled conditions with cryopreserved Plasmodium falciparum sporozoites and analysing parasite growth by qPCR. The authors compare parasite growth, peak parasitaemia and if / when treatment was sought for malaria symptoms between non-Dantu (111) and Dantu heterozygous (27) and homozygous (3) participants. Dantu either completely prevented malaria parasite detection in the blood (for 21 days) or slowed down parasite growth considerably.

      The authors present compelling in vivo evidence that Dantu conveys protection by preventing malaria parasites from establishing a blood-stage infection. Because the effect on parasite growth is crystal clear, the link to uncomplicated malaria follows: fewer or no parasites lead to fewer participants experiencing malaria symptoms and seeking treatment. It should however be noted that the paper does not show that Dantu reduces symptomatology at identical parasite densities to non-Dantu. Its protective effect seems to be purely parasitological.

      Given that all volunteers were exposed to malaria prior to being experimentally infected (in various transmission settings ranging from low to high) the authors state that they adjusted for factors like schizont antibody concentration in their multi-variate analysis. More details on the assumptions and which dependent / independent variables were included would benefit interpretation. It would be also good to see if Dantu individuals were spread homogeneously across all transmission settings - if e.g. they all had history of intense malaria exposure and thus strong pre-existing anti-malaria immunity this might account in part for reduced parasite growth when compared to non-Dantu from lower transmission settings. Being able to de-convolute the effect of pre-existing immunity from Dantu would strengthen the paper.

      Thank you for the positive feedback and summary of the key findings. We absolutely agree that breaking down the impact of Dantu genotype by transmission would have been very interesting, but the sample numbers for some of the genotypic groups were simply too small to make stratification by area of residence meaningful. Instead, to address the core issue of whether prior immunity is a complicating factor in our analysis, we used measurements of antibodies to whole schizont extract as a proxy indicator of transmission setting or “malaria exposure” in our multivariate analyses. There was no difference in anti-schizont antibody levels across Dantu genotype groups – these data are now included in Figure 3 – figure supplement 1, as requested. This suggests that differences in pre-existing anti-malaria immunity between Dantu and non-Dantu cannot explain the differences seen in our current study. Regarding the comment about assumptions and variables in the multivariate analysis, we have added more details as requested, as outlined in further detail in subsequent points below.

      The authors also present data on other red cell polymorphisms known to modulate malaria infection and improve outcome: G6PD, blood group O, alpha thalassaemia and ATP2B4. However, no statistically significant differences between non-carriers and hetero/homozygous individuals were observed. This is probably because these mutations exert their effect not directly on parasite growth but modulate disease symptoms when parasite burden is high, which cannot be investigated in controlled human malaria infection settings, as ethical considerations mandate treatment of all volunteers at parasite densities >500 parasites/µl or at any parasitaemia with symptoms. Controlled infections need to be complemented with other methods to understand the protective impact of genetic polymorphisms.

      We thank the reviewer for this helpful observation with which we completely agree. To acknowledge this issue, we have added some consideration of this point to the Discussion section of the revised manuscript, within the sub-section that discusses protective mechanisms of other red cell polymorphisms on page 14.

    1. Author Response

      Reviewer #2 (Public Review):

      Despite high bone mineral density, increased fracture risk has been associated with T2D in humans. In this study, the authors established a model that could mimic some aspects of T2D in mice and then study bone turnover and metabolism in detail.

      Strengths

      This is an exciting study, the methods are detailed and well done, and the results are presented coherently and support the conclusions.

      Previous work from Dr. Long's group over the last decade has established a requirement for glycolysis in osteoblast differentiation. They showed the requirement for glycolysis not only for the anabolic action of PTH but also as an effector downstream of Wnt signaling. Using the T2D mouse model they have generated, they test whether manipulating glycolysis and oxidative phosphorylation can rescue some of the detrimental effects on bone in this model. They use several novel approaches, including glucose-labeling studies, which are relatively underutilized and provide some insights into the defective TCA cycle. They also utilize BMSCs that have been sorted for single-cell sequencing studies to identify specific populations modified by T2D. Unfortunately, the results are modest and need some clarification on what these populations add to the story.

      We appreciate the positive comments. Although T2D had only a modest effect on the relative pool size of each cell population, the changes in metabolic pathways (glycolysis and oxphos) in several clusters were notable and provided support for the central notion that T2D altered cellular metabolism in osteoblast-lineage and other bone marrow cells.

      The authors use two approaches: a drug (Metformin) and a number of mouse genetic models to over-express genes involved in the glycolytic pathway using Dox inducible models. The results with overexpressing HIF1 and PFKFB3 show a potential rescue of bone defects with T2D, and Glut1 overexpression does not rescue T2D-induced bone loss.

      Concerns

      The authors have generated several overexpression models to manipulate the glycolytic pathway to rescue T2D-induced bone loss. The use of DOX in drinking water has been shown to affect mitochondrial metabolism. Did the authors control for these effects? Since both groups of mice received DOX in drinking water, there is an internal control.

      The experiments were controlled for any potential effects of DOX per se as all animals were subjected to the same DOX regimen.

      Only one of the rescue experiments had a control with the chow diet. Some studies have shown a high-fat diet to be protective against bone loss in T1D models.

      We have now added the chow diet control for the Hif1a rescue experiment as well (Fig. 7).

      The use of metformin to correct metabolic dysfunction and, thereby, bone mass is an exciting result. Did the authors test whether the rescue of this phenotype was in any way due to reduced ROS levels? The decrease in OxPhos seen in the Seahorse experiments suggests there could be mitochondrial dysfunction, which is often associated with ROS generation.

      We appreciate the reviewer’s insight here. We have not examined ROS levels but agree that changes in ROS levels could potentially contribute to the bone phenotype in diabetes.

      All of the experiments used male mice (because of STZ use and the ease of T2D establishment in males). It would be better if this were made clear in the title.

      The title has been revised to specify male mice.

      Does the T2D model presented really represent what is observed in humans? Some experiments to test the other factors implicated in T2D and whether those are modulated in the rescue experiments might help address this.

      Our T2D model exhibited all typical features of T2D patients, including obesity, glucose intolerance and insulin resistance. We have shown that metformin modestly improved glucose tolerance and insulin sensitivity in the T2D mice (Fig. 6C, E). We have not examined whether those global metabolic features were modulated in the genetic rescue experiments, which targeted only osteoblasts.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper establishes a strong case for the post-translational modification of C/EBPalpha to play a strong role in its effects, in this case, to promote macrophage differentiation in collaboration with PU.1. The cellular system being used for most of the experiments here takes advantage of the dual roles of PU.1 in B cells, which normally do not express C/EBP family factors, and in myeloid cells, which normally do express C/EBP family factors. The authors and others have previously shown that PU.1 and C/EBPalpha are very powerful collaborators, both needed to establish a macrophage identity. Thus, the title of the paper provocatively implies that the C/EBP modification that keeps it from being methylated on Arg35 works by increasing the re-distribution of PU.1 from B cells to myeloid gene sites in combination with C/EBP. Indeed, the authors show proximity ligation data to show that PU.1-C/EBPalpha juxtaposition is more frequent in the nucleus if C/EBPalpha cannot be Arg-methylated. The paper also shows careful and thorough characterization of the B to myeloid lineage conversion gene expression changes and the mapping of the Arg residues in C/EBPalpha that are most important to keep demethylation. Similarly, the paper provides strong evidence that it is Carm1, and not another protein arginine methyltransferase, that is responsible for the regulatory modification. This is a valuable and well-characterized demonstration of a mechanism that should be considered more generally as a regulator of transcription factor action.

      The mechanism proposed by the authors is that C/EBPalpha relocates PU.1 to macrophage sites and that C/EBPalpha R35A binds and relocates PU.1 more efficiently than wildtype, and this seems likely and appealing. However, it is not as strongly supported by data within the paper itself as the other points in the paper are. There is a puzzling gap in the data: no direct evidence is shown that C/EBPalpha is really relocating PU.1 from B cell to macrophage regulatory elements at all. Despite the figure titles (Fig. 4 and Fig. S4), there is no ChIP-seq data to show PU.1 binding sites before and after interaction with either wildtype or R35A mutant C/EBPalpha, just accessibility data. There is also a question of whether such a redistribution would occur fast enough to account for the impressive speed of the R35A mutant's other effects. These questions seem fairly straightforward to address. If relevant data could be added, it would greatly increase the impact and generality of the paper. The paper could be published with this claim converted to a suggestion, based on the current data, or it could be published in a higher-impact form if additional data could be provided to demonstrate the relocation more directly. The authors would be more expert about the logistics of the experiment, but it seems that a direct ChIP-seq-based comparison should be feasible and powerful for the argument of the paper.

      We have now included PU.1 and C/EBPa ChIP-seq experiments, using C/EBPaWT- and C/EBPaR35A-induced cells, replacing the virtual ChIP-seq experiments. Integrating the data obtained with our dynamic ATAC-seq data, the new findings largely support the previously proposed PU.1 redistribution (‘theft’) model. To make the data easier to understand, we now first show the PU.1 and C/EBPa binding to distinct B cell- and macrophage-restricted GREs contained in a single genomic fragment (new Fig. 5). The findings nicely visualize how PU.1 becomes redistributed from B-GREs to M-GREs, in a C/EBPa mutant-accelerated manner. We were also happy to see that a genome-wide analysis of the data again shows the accelerated redistribution of PU.1 by C/EBPaR35A (new Fig. 6). Finally, the comparison of the ChIP-seq and ATAC-seq data also added more mechanistic detail, such as by revealing that chromatin remodeling of lineage-restricted GREs can be uncoupled from the regulation of associated genes.

      Finally, the effect of the mutation is assumed to be only on the interface for interaction between C/EBPalpha and PU.1 (or other co-factors). However, C/EBPalpha is such a short-lived protein that any modification that slightly increased its half-life could increase its potency. It seems important to present some quantitative protein staining evidence to clarify whether the steady-state level of C/EBPalpha in C/EBPalpha R35A-expressing cells is really unchanged from C/EBPalpha wild-type-expressing cells.

      We agree that this is an important issue and have therefore now performed a cycloheximide experiment with 3T3 cells expressing inducible forms of the two proteins. The data in Figure S4C show that C/EBPaR35A exhibits a stability similar to that of the wild-type protein and is expressed at 20-30% lower levels under steady-state conditions in uninduced cells. They also show that C/EBPa is surprisingly stable. These new findings are in line with the comparison of the two proteins by Western blots of mutant and wild-type transfected 293T cells and of infected B cells, which also show similar levels of the two proteins (Fig. 7C and D). Therefore, the finding that expression of C/EBPaR35A is similar to or slightly lower than that of the wild type argues against the possibility that an elevated expression level of the mutant could explain the effects observed.

      Finally, although not requested by the reviewer, we have now addressed the possibility that the effect of the alanine replacement of R35 is mostly due to a change from a charged to a non-charged hydrophobic residue. This is not the case, as a replacement of arginine 35 by the charged amino acid lysine still leads to an accelerated BMT induction (Figure S7).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper is based on the premise that ketamine exerts antidepressant effects that are rapid by increasing glutamatergic transmission. However, the authors note that how this effect occurs is unclear because ketamine antagonizes the NMDA receptor, a glutamatergic receptor. Others have suggested a compensatory change in the glutamatergic transmission and the authors suggest how this might occur. The authors should clarify if prior studies suggested a mechanism different from theirs and if so, which might be correct.

      There are also other mechanisms, such as the block of NMDA receptors on interneurons and the disinhibition of principal cells. It is important to clarify whether this has already been addressed in the literature. The authors should also state whether their cultures are primarily glutamatergic neurons or whether they include interneurons and glia.

      The authors show calcineurin is reduced after ketamine exposure and this increases AMPA receptor GluA1 phosphorylation. They also show that Calcium permeable AMPA receptors (CP-AMPARs) increase.

      They also suggest that the CP-AMPARs and other changes lead to enhanced synaptic plasticity, which could lead to antidepressant effects.

      Although a lot of work is done in cultured hippocampal neurons, 14 days in vitro, they show effects in vivo that are consistent with the data from cultures. For example, ketamine increases GluA1 phosphorylation. Also, blocking CP-AMPARs in vivo reduces anxiety/depressive behaviors in tests such as the open field and tail suspension tests.

      Overall the study appears to be done well and the presentation, writing, and references are good. There are important concerns regarding statistics, behavior, and pharmacology and several minor concerns.

      Major concerns

      1) Statistics.

      What was the stat test if the control was always 1? Often the control group is 1.00 with no SD but in other tests, the control group is 1.000 with an SD.

      In the previous submission, we neglected to include this information. Immunoblotting data have variable raw values; hence, the control group was used to normalize each group and was compared to the experimental groups. Thus, the control value for immunoblotting was always 1.000 without SD. Similarly, for imaging data, the average peak amplitude in control cells was used to normalize the peak amplitude in each cell and was compared to the experimental groups' average; thus, the control group is 1.000 with SD. The Franklin A. Graybill Statistical Laboratory at Colorado State University has been consulted for statistical analysis in the current study, including sample size determination, randomization, experiment conception and design, data analysis, and interpretation. Grouped results of single comparisons were tested for normality with the Shapiro-Wilk or Kolmogorov-Smirnov normality test and analyzed using the unpaired two-tailed Student’s t-test when the data were normally distributed. Differences between multiple groups with normalized data were assessed by the nonparametric Kruskal-Wallis test followed by Dunn’s test.
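
      The difference between a control of 1.000 without SD and a control of 1.000 with SD follows from which denominator is used for normalization. The Python sketch below illustrates the two schemes as we read them from the response. It is an assumption-laden illustration, not the authors' code: the function names are hypothetical and all values are arbitrary placeholders.

```python
from statistics import mean, stdev


def normalize_per_experiment(control, treated):
    """Immunoblot-style: each blot's control band is the denominator for
    that blot, so every control value becomes exactly 1 (no spread)."""
    norm_control = [c / c for c in control]
    norm_treated = [t / c for c, t in zip(control, treated)]
    return norm_control, norm_treated


def normalize_to_control_mean(control_cells, treated_cells):
    """Imaging-style: every cell is divided by the mean of the control
    cells, so the control mean is 1 but individual cells still vary."""
    reference = mean(control_cells)
    norm_control = [c / reference for c in control_cells]
    norm_treated = [t / reference for t in treated_cells]
    return norm_control, norm_treated


# Arbitrary placeholder values (NOT data from the manuscript).
blot_control, blot_treated = normalize_per_experiment(
    control=[2.0, 4.0, 5.0], treated=[3.0, 8.0, 7.5]
)
cell_control, cell_treated = normalize_to_control_mean(
    control_cells=[2.0, 4.0, 6.0], treated_cells=[6.0, 8.0, 10.0]
)

print(mean(blot_control), stdev(blot_control))  # control has no spread
print(mean(cell_control), stdev(cell_control))  # control mean is 1, spread remains
```

      One consequence of the per-experiment scheme is that the control group carries no variance, which is relevant to the reviewer's question about which statistical test is appropriate for such data.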

      2) Behavior.

      It is not clear that the open field and tail suspension tests measure antidepressant actions. Why were more standard tests such as forced swim or sucrose preference, novelty-suppressed feeding, etc not used?

      We agree with the Reviewer’s concern. However, the open field test and the tail suspension test have long been used to determine anxiety-like and depression-like behaviors, respectively, in rodents (Seibenhener and Wooten, 2015; Ueno et al., 2022). Specifically, the open field test has been widely used to measure the effects of ketamine on anxiety-like behavior in rodents (Guarraci et al., 2018; Pitsikas et al., 2019; Shin et al., 2019; Akillioglu and Karadepe, 2021; Yang et al., 2022; Acevedo et al., 2023). The tail suspension test has also been used to examine the effects of ketamine on depression-like behavior in animals (Fukumoto et al., 2017; Yang et al., 2018; Ouyang et al., 2021; Rawat et al., 2022; Viktorov et al., 2022). Studies suggest that the forced swim test and the tail suspension test are based on the same principle: measurement of immobility duration while rodents are exposed to an inescapable situation (Castagne et al., 2011). Importantly, it has been suggested that the tail suspension test is more sensitive to antidepressant agents than the forced swim test because the animal will remain immobile longer in the tail suspension test than in the forced swim test (Cryan et al., 2005). For this reason, we chose to use the tail suspension test instead of the forced swim test. This information has now been included in the revised manuscript. Additionally, because ketamine produces antidepressant effects within one hour after administration in humans (Berman et al., 2000; Zarate et al., 2006; Liebrenz et al., 2009), our study aims to understand the mechanism underlying ketamine's rapid (less than an hour) antidepressant effects. Given that the sucrose preference test and the novelty-suppressed feeding test require multiple days, they would not be suitable for achieving our goals.

      3) Pharmacology.

      The conclusions rest on the specificity of drugs.

      Is 5 uM FK506 specific?

      20 μM 1-naphthyl acetyl spermine (NASPM)?

      10 mg/kg IEM-1460?

      We neglected to add the rationale for the drug concentrations in the previous submission. Previous research, including our own, has employed FK506 at a variety of different concentrations to inhibit neuronal calcineurin activity (1 - 50 μM) (Hsieh et al., 2006; Schwartz et al., 2009; Kim and Ziff, 2014). Specifically, we have shown that 5 μM FK506 treatment for 12 hours significantly reduces neuronal calcineurin activity to increase GluA1 phosphorylation, which induces the expression of CP-AMPARs to elevate AMPAR-mediated synaptic activity (Kim and Ziff, 2014). Moreover, previous studies, including our own, have used NASPM at a variety of different concentrations to inhibit CP-AMPARs (3 - 250 μM) (Tsubokawa et al., 1995; Koike et al., 1997; Noh et al., 2005; Nilsen and England, 2007; Hou et al., 2008; Kim and Ziff, 2014). In fact, we have shown that 20 μM NASPM significantly reduces CP-AMPAR-mediated synaptic and Ca2+ activity (Kim and Ziff, 2014; Kim et al., 2015b). Finally, multiple reports demonstrate that 10 mg/kg IEM-1460 significantly reduces in vivo CP-AMPAR activity (Wiltgen et al., 2010; Szczurowska and Mares, 2015; Adotevi et al., 2020). This information has now been included in the revised manuscript.

      Reviewer #3 (Public Review):

      Ketamine has been shown to be effective at producing a rapid-antidepressant effect at low doses, but the underlying molecular mechanism of this effect is still not clear. Previous studies have suggested that the effect of low-dose ketamine may occur by promoting neuronal plasticity in the hippocampus. However, this goes against the findings that ketamine acts as a noncompetitive NMDA receptor antagonist, which should prevent NMDAR-dependent plasticity. Furthermore, a therapeutic dose of ketamine has been shown to increase neuronal Ca2+ signaling, which again does not conform to its antagonistic action on NMDA receptors. In this paper, the authors provide evidence that therapeutic low-dose ketamine increases the expression of Ca2+-permeable AMPA receptors (CP-AMPARs) by increasing phosphorylation of GluA1 subunit of AMPARs and surface expression of GluA1-containing CP-AMPARs. They further provide evidence that this is likely mediated by a decrease in calcineurin activity and that blocking CP-AMPARs prevent the antidepressant effect of ketamine in mice. One interesting finding of this study is that the authors see heightened sensitivity of ketamine in female mice, both at the level of behavioral readout and for molecular correlates. This finding is interesting in light of the different pharmacokinetics of ketamine reported in females and that ketamine metabolites can bind estrogen receptors.

      Based on their data and previous findings, the authors outline a plausible molecular signaling mechanism for the antidepressant effect of ketamine. Specifically, the authors propose that reduced neuronal activity, which could be triggered by ketamine-induced NMDAR antagonism, causes homeostatic plasticity to upregulate GluA1-containing CP-AMPARs. Their data would support this idea, as phosphorylation of GluA1 as well as increased surface expression and functional incorporation of CP-AMPARs at synapses have been shown before in models of homeostatic plasticity.

      1) Overall, the study is well-done and the data presented support the main conclusions. One main question is whether the current finding provides a conceptual advancement in our understanding of the molecular signaling involved in ketamine's antidepressant effects.

      We thank the reviewer for this critique. In fact, research suggests multiple potential mechanisms of ketamine-induced neural plasticity. The main mechanism by which ketamine produces its therapeutic benefits on mood recovery is the enhancement of neural plasticity in the hippocampus (Miller et al., 2016; Aleksandrova et al., 2020; Kavalali and Monteggia, 2020; Grieco et al., 2022). However, ketamine is a noncompetitive NMDAR antagonist that inhibits excitatory synaptic transmission (Anis et al., 1983). A hypothesis to explain these paradoxical effects is that ketamine acts via direct inhibition of NMDARs localized on inhibitory interneurons, leading to disinhibition of excitatory neurons and a resultant rapid increase in glutamatergic synaptic activity that activates the Ca2+ signaling pathway (Deyama and Duman, 2020; Gerhard et al., 2020). This stimulates the brain-derived neurotrophic factor (BDNF) signaling pathway, which subsequently increases the translation and synthesis of synaptic proteins to enhance AMPAR-mediated synaptic plasticity (Deyama and Duman, 2020). Another potential explanation is that ketamine inhibits NMDARs on excitatory neurons, which induces a cell-autonomous form of homeostatic synaptic plasticity resulting in increased excitatory synaptic drive onto these neurons (Miller et al., 2016; Kavalali and Monteggia, 2020). Homeostatic synaptic plasticity is a negative-feedback response employed to compensate for functional disturbances in neurons and is expressed via the regulation of AMPAR trafficking and synaptic expression (Wang et al., 2012). According to this hypothesis, ketamine disrupts basal activation of NMDARs on excitatory neurons, which engages a mechanism of homeostatic synaptic plasticity that results in a rapid compensatory increase in synaptic AMPAR expression in these neurons in a protein synthesis-dependent manner (Kavalali and Monteggia, 2023). Additionally, there is an NMDAR inhibition-independent mechanism mediated by hydroxynorketamine (HNK), the ketamine metabolite that lacks NMDAR inhibition properties (Carrier and Kabbaj, 2013; Franceschelli et al., 2015; Zanos et al., 2016). The current study offers a new neurobiological basis for ketamine’s actions that depends on the NMDAR inhibition-mediated elevation of GluA1-containing AMPAR trafficking, which is likely independent of the previously described mechanisms, including the BDNF-induced protein synthesis-dependent pathway (Deyama and Duman, 2020) and the NMDAR inhibition-independent pathway (Carrier and Kabbaj, 2013; Franceschelli et al., 2015; Zanos et al., 2016). Nonetheless, there are still many important questions surrounding the molecular mechanisms of ketamine's actions. This new information has now been included in the revised manuscript.

      2) There are previous studies that showed an increase in CP-AMPARs in the nucleus accumbens and an increase in the expression of GluA1 in the hippocampus with low-dose ketamine. In addition, ketamine's antidepressant effect has been shown to require GluA1 phosphorylation. The main contribution of this paper might be that it provides the potential molecular signaling within the same preparation (i.e. hippocampal neurons) and provides a causal link of CP-AMPARs in mediating the behaviorally measured antidepressant effect of ketamine.

      The study showing that ketamine induces the insertion of CP-AMPARs in the nucleus accumbens did not examine whether this change resulted in antidepressant behaviors (Skiteva et al., 2021). Therefore, it is difficult to conclude that the ketamine-induced expression of CP-AMPARs in the nucleus accumbens plays a role in behaviors. Moreover, as described above, a recent study shows that the hippocampus is selectively targeted by ketamine (Davoudian et al., 2023). We thus chose the hippocampus as our experimental model to test our hypothesis. However, we are unable to rule out the potential role of nucleus accumbens in ketamine’s antidepressant actions.

      3) Another question is whether the behavioral effect of ketamine is due to molecular changes in the hippocampus as outlined in this paper. A more targeted inhibition of CP-AMPAR function could resolve this issue. With the systemic application of CP-AMPAR antagonist as done in this study, it would be hard to know the role of CP-AMPAR upregulation in the hippocampus in mediating ketamine's effect. Especially, considering that low-dose ketamine has been shown to upregulate CP-AMPARs in the nucleus accumbens. While it would have been nice to know the site of action, this does not alter the conclusion that CP-AMPARs are involved in mediating the antidepressant effect of ketamine on behavioral readouts.

      We agree with this point. We have thus removed “the hippocampus” from the title and have made equivalent revisions elsewhere in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have used computational models and protein design to enhance antibody binding, which should have broad applications pending a few additional controls. The authors' new method could have a broad and immediate impact on a variety of diagnostic procedures that use antibodies as sensitivity is often an issue in these kinds of experiments and the sensitivity enhancement achieved in the two test cases is substantial. Affinity maturation is a viable approach, but it is laborious and expensive. If the catenation method is generalizable, it will open up opportunities for antibody optimization for cases where affinity maturation is either not feasible or otherwise impractical. Less clear is how this method might enhance therapeutic potency. Issues that arise when using therapeutic antibodies are often multifactorial and vary depending on the target and disease state. Many issues that occur with antibody-based therapies will not be rectified with affinity enhancement.

      We agree with the limitation.

      Reviewer #2 (Public Review):

      The paper presents an interesting design approach to having homodimeric IgGs with higher binding affinity to the antigens on a surface by fusing a weakly homodimerizing protein (a catenator) to the C-terminus of IgG. Considering the homodimeric IgGs with likely enhanced antigen binding ability and their stabilization with a reversible catenation when bound to the surface is an interesting idea. With agent-based modeling - the simulations based on Markov Chain Monte Carlo (MCMC) sampling - and proof of concept experiments, it has been possible to show the enhanced antigen binding ability of the homodimer Igs for many folds, where the weakly homodimerizing ability of the catenator is indicated to have a central role, enabling proximity effect driven catenation on the antigen bound surfaces. While the results render the enhanced binding affinity of the catenated homodimeric IgGs, the study would benefit from a more elaborated interpretation and discussions of the results.

      The following discussion is now stated in the revision (pages 19-20, in the revision); “While we demonstrated that dual catenator-fused heterodimeric IgGs can enhance binding avidity, the oligomer formation or potential intramolecular homodimerization of the catenator necessitates the development of a more robust catenator for application to conventional homodimeric IgGs. Specifically, the ideal catenator should geometrically disallow intramolecular homodimerization, exhibit fast association kinetics, and be able to withstand the standard low pH purification step. On the other hand, our demonstration indicates that this approach can be applied to bispecific antibodies employing a heterodimeric Fc.”

      One interesting basis for the discussion may be how the fusion of the catenator is likely to affect the intrinsic binding behavior and/or the global structure of IgGs (monomeric and homodimeric (catenated)) per se, beyond its proximity-driven contribution. Would it lead to a structure with more restricted mobility in the unbound states, so as to decrease the entropic cost for the binding and thus increase the binding avidity/affinity (in addition to external proximity-driven association)? In other words, what would be the role of entropy in the free energy of binding, given that the enthalpic contributions remain the same? Possible effects of the length of the catenator should also in part be related to the entropy. For example, if a longer and more flexible catenator is considered, what would the resulting observation experimentally and computationally be?

      The binding site occupancy depends on [catAb]/KD. Figure 4-figure supplement 2 shows the binding site occupancy and (KD)eff as a function of (KD)catenator. In this simulation, [catAb] was fixed (10^-9 M) while KD was varied (from 10^-8 to 10^-6). In the figure legend and in the main text, we now explicitly state that KD was varied from 10^-8 to 10^-6 (page 30, in the revision). To address this comment, we set KD = 10 nM (as used for simulation in Figures 3 and 4), and varied [catAb] from 0.1 to 10 nM. The binding site occupancy and (KD)eff as a function of [catAb] are plotted for three different set values of (KD)catenator (1 μM, 10 μM and 100 μM). The new figures are now presented as Figure 4-figure supplement 3. This simulation shows that the enhancement of (KD)eff by increasing the concentration of catAb is much less dramatic than that by increasing the affinity for catenator homodimerization at [catAb] > 10 nM.
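
      As a point of reference for the statement that occupancy depends on [catAb]/KD, the sketch below evaluates the textbook single-site (Langmuir) binding isotherm over the concentration range quoted above. It is an illustration under simplifying assumptions, not the agent-based simulation used in the manuscript: it ignores catenation entirely, and the function name `occupancy` is hypothetical.

```python
def occupancy(ab_conc_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of occupied sites for simple 1:1 binding.

    theta = [Ab] / ([Ab] + KD) = x / (1 + x), with x = [Ab] / KD,
    so the result depends only on the ratio [Ab] / KD.
    This is the textbook isotherm, not the manuscript's catenation model.
    """
    if ab_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return ab_conc_molar / (ab_conc_molar + kd_molar)


KD = 10e-9  # 10 nM, the value quoted in the response above

# Sweep the antibody concentration over the quoted range (0.1 to 10 nM).
for conc_nM in (0.1, 1.0, 10.0):
    theta = occupancy(conc_nM * 1e-9, KD)
    print(f"[Ab] = {conc_nM:5.1f} nM  ->  occupancy = {theta:.3f}")
```

      In this simplified picture, half of the sites are occupied when the antibody concentration equals KD; any additional gain from catenation would have to come from the effective KD, which this sketch does not model.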

      On the other side, simple simulation approaches have a high value with a level of abstraction while still keeping the physical and biological relevance. In the simulations, i.e. in the sampling of various states, three main terms/rules to govern the behavior are implemented. One is a term favoring an increase in the ability to bind (preventing to unbinding) to the surface upon the catenation of IgGs. This may need to be substantiated for the simulations not imposing a preassumed ability to increase the binding (or decrease the unbinding) ability upon the catenation.

      We agree with the reviewer that the third rule favors the binding ability of catenated IgGs, because it assumes that catenated antibodies are not allowed to dissociate from the binding site. While this assumption is not exactly correct, we think that it is valid, considering the behavior of a multivalent ligand. When the IgG portion dissociates completely from the binding site, it is still anchored by the catenation arm, and thus it will rebind the same binding site immediately. This postulation agrees with the quantitative analysis showing that a multivalent ligand exhibits an orders-of-magnitude increase in binding likelihood when the ligand size is comparable to the stretch length of a conjugating linker [Liese, S. & Netz, R. R., ACS Nano, 12, 4140 (2018)].

      The weakly homodimerizing state of the catenator appears as one of the important aspects of the proposed design strategy. Would it also be possible that the experimental observations may readily also imply the higher binding ability of the catenator-fused IgG without the homodimerization on the surface (due to the reduced entropic cost for the binding)? The presentation of the evidence of the homodimerization of the catenator and the catenated IgGs on the surface would strengthen the findings and discussions.

      To fully address this comment, we would need to consider the detailed molecular behavior of the IgG part, the catenator and the linker, probably using molecular dynamics simulation, which we think is outside the scope of the current work. We would like to describe qualitatively what we think about the issues raised. Fused to the C-terminus of Fc, the catenator will not affect the complementarity-determining region (CDR) of Fab, which is located on the opposite side from the C-terminus of Fc. This notion is supported by the observation that the SDF-1α-fused antibodies exhibited association kinetics similar to those of the mother antibodies (Figure 5).

      Regarding the mobility of the structure, we presume that the fused catenator would not interact with the antibody portion and thus it would not affect the intrinsic structural mobility of the antibody.

      Since the catenator is fused to the C-terminus of Fc by a flexible linker, the homodimerization of catenator would decrease the entropy upon catenation. However, the enthalpic contribution would overcome the entropic loss, and result in negative free energy of the catenator homodimerization.
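
      The thermodynamic argument above can be written out with the standard Gibbs relation. The subscript "cat" (for catenator homodimerization) is a label introduced here for illustration; no numerical values are implied.

```latex
\[
\Delta G_{\mathrm{cat}} = \Delta H_{\mathrm{cat}} - T\,\Delta S_{\mathrm{cat}},
\qquad
\Delta S_{\mathrm{cat}} < 0 \;\Rightarrow\; -T\,\Delta S_{\mathrm{cat}} > 0 .
\]
\[
\Delta G_{\mathrm{cat}} < 0
\;\Longleftrightarrow\;
\Delta H_{\mathrm{cat}} < T\,\Delta S_{\mathrm{cat}} < 0
\;\Longleftrightarrow\;
\lvert \Delta H_{\mathrm{cat}} \rvert > T\,\lvert \Delta S_{\mathrm{cat}} \rvert .
\]
```

      That is, homodimerization is favorable only if the enthalpy gained on dimerization exceeds the entropic penalty of restraining the flexible linker.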

      Figure 2-figure supplement 1 (in the revision) shows the simulation for five different values of the reach length (R), which is the sum of the linker length and half of the catenator length. The simulation results show that the likelihood of catenation decreases as the linker length increases over the distance (d) between the two adjacent catAb-2Ag complexes, while it is maximum when the reach length equals d. Since the catenator length is fixed, increasing the linker length (such that R > d) will lower the catenation effect.

      Reviewer #3 (Public Review):

      The authors proposed an antibody catenation strategy by fusing a homodimeric protein (catenator) to the C-terminus of IgG heavy chain and hypothesized that the catenated IgGs would enhance their overall antigen-binding strength (avidity) compared to individual IgGs. The thermodynamic simulations supported the hypothesis and indicated that the fold enhancement in antibody-antigen binding depended on the density of the antigen. The authors tested a catenator candidate, stromal cell-derived factor 1α (SDF-1α), on two purposely weakened antibodies, Trastuzumab(N30A/H91A), a weakened variant of the clinically used anti-HER2 antibody Trastuzumab, and glCV30, the germline version of a neutralizing antibody CV30 against SARS-CoV-2. Measured by a binding assay, the catenator-fused antibodies enhanced the two weak antibody-antigen binding by hundreds and thousands of folds, largely through slowing down the dissociation of the antibody-antigen interaction. Thus, the experimental data supported the catenation strategy and provided proof-of-concept for the enhanced overall antibody-antigen binding strength. Depending on specific applications, an enhanced antibody-antigen binding strength may improve an antibody's diagnostic sensitivity or therapeutic efficacy, thus holding clinical potential.

      Thanks for the favorable comments.

    1. Author Response

      Reviewer #1 (Public Review):

      The introduction does not clearly set up the background for the key questions that the manuscript addresses. One of the key parts of the manuscript is to attempt to determine whether locomotory behaviour evolves because of direct or indirect selection of the traits. However, the authors don't provide an argument for why a salty environment would select for locomotory traits. Indeed, in the discussion, the authors point out that it is likely an unmeasured trait (body size) correlated with locomotory traits that are under selection. They present arguments for why this might be the case and point to un-included data that show body size significantly genetically covaries with all of the traits studied. Since the authors appear to have these data, and one of their key questions is comparing direct vs. indirect responses to selection, it would be more powerful to include the body size data and estimate selection on all traits together.

      We now include body size in all of our phenotypic and genetic analyses. We also include estimates of selection gradients from the ancestral selection differentials and the G-matrix. We detail in the Introduction the biological significance of locomotion traits and their potential relationship with body size, in low and high salt environments. The experimental results show that divergence in locomotion traits (Figure 6) correlates with adaptation (Figure 5), because of direct and indirect selection (Figure 9).

      Phenotypic plasticity was estimated from a series of univariate models, with estimates arranged in a vector. As the authors point out in the manuscript, traits that are not included in a model but covary with traits that are can largely bias estimates of the traits that are included. For this reason, it would make sense to estimate phenotypic plasticity using a multivariate model, as has been done for G matrices.

      We analyze the ancestral phenotypic plasticity and the phenotypic divergence during evolution using a multivariate approach (MANOVA). This approach simplifies the text because, from the eigen decomposition of the SSCP matrices, we can estimate canonical traits of ancestral phenotypic plasticity (pmax; see Table 1 with notation definitions) and phenotypic divergence in the new target high salt environment (dmax). We continue to do the univariate analysis as it allows us to estimate BLUPs for each inbred line (used for visual representation), as well as the significance of phenotypic divergence of each replicate population relative to the ancestral population (delta_q). Both multivariate and univariate approaches led to similar results (shown as supplementary figures).

      The estimation and interpretation of G matrices are a critical part of the manuscript. The authors state that broad sense estimates of G are a good proxy for additive genetic variation in this system, but in the Discussion they also state that overdominance was likely important during evolution to the salt environment, leading to some lack of clarity on whether dominance is important or not.

      We are sorry for the lack of clarity. We have eliminated the discussion on overdominance as it was peripheral to our results. Broad-sense genetic variances should be a good proxy for additive genetic variances when there is no inbreeding depression and no directional dominance or dominance epistasis; cf. Lynch and Walsh 1998. We previously showed that there is no inbreeding depression for the trait we use as surrogate for relative fitness (self-fertility) and also that there is no directional dominance for locomotion behavior traits. We now explain our use of broad-sense genetic (co)variances as a proxy for additive genetic (co)variances in the Introduction and Methods.

      It is also unclear how uncertainty in estimated G matrices was assessed. Showing that G differs from noise is critical to the majority of the results presented. The authors cite Morrissey and Bonnet (2019) as providing the method for generating the null distribution of G, however, this paper does not appear to propose or describe a method to do this.

      Thanks for this comment. Morrissey and Bonnet (J Heredity, 2019) was incorrectly cited and the explanation for finding the expected noise distributions was misleading. In brief, we produced a set of 1000 G-matrices each computed after shuffling the line ID and the block ID from the phenotypic dataset. This was done to produce random expectations of the genetic variances as the MCMC estimates are positive-definite. We computed the posterior mode for each of these 1000 G-matrices to obtain a null distribution (shown in orange). To infer significance, we compared the posterior mode of the empirical estimate with the 95% CI of the posterior mode distribution obtained from the randomized G-matrices. When determining which eigenvectors explain standing genetic variation we also used the distribution of posterior modes of the randomized G-matrices. However, as pointed out by Sztepanacz and Blows (Genetics, 2017), the eigenvalues of the eigenvectors do not follow a uniform distribution, as would be expected by chance. Because of this we asked the question of whether the amount of variance in the eigenvectors of the empirical G-matrix (gmax, g2, etc.) was expected, by projecting the random G-matrices onto these eigenvectors. This is a null that is conditional on the observed data. We show these results in Figure 2 - supplement figure 3. Both approaches are similar, particularly for the first 2 eigenvectors. There is now a paragraph in the Discussion about finding potential consequences for adaptation of traits with little genetic variance.
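
      The shuffling logic described above can be illustrated with a much simpler estimator than the one used in the manuscript. The sketch below is not the authors' MCMCglmm pipeline: it uses simulated data for a single trait, a balanced design, and a method-of-moments among-line variance in place of posterior modes, and it shuffles only line labels (no block effect). All names and parameter values are hypothetical.

```python
import numpy as np


def among_line_variance(values, line_ids):
    """Method-of-moments among-line variance (balanced one-way design)."""
    groups = [values[line_ids == line] for line in np.unique(line_ids)]
    n_reps = len(groups[0])  # assumes every line has the same number of replicates
    line_means = np.array([g.mean() for g in groups])
    ms_between = n_reps * line_means.var(ddof=1)
    ms_within = np.mean([g.var(ddof=1) for g in groups])
    return (ms_between - ms_within) / n_reps


def shuffled_null(values, line_ids, n_shuffles=1000, seed=0):
    """Null distribution obtained by re-estimating after shuffling line labels."""
    rng = np.random.default_rng(seed)
    return np.array([
        among_line_variance(values, rng.permutation(line_ids))
        for _ in range(n_shuffles)
    ])


# Simulated example: 50 lines, 4 replicates each, true among-line variance of 1.
rng = np.random.default_rng(1)
n_lines, n_reps = 50, 4
line_ids = np.repeat(np.arange(n_lines), n_reps)
line_effects = rng.normal(0.0, 1.0, n_lines)
values = line_effects[line_ids] + rng.normal(0.0, 1.0, n_lines * n_reps)

observed = among_line_variance(values, line_ids)
null = shuffled_null(values, line_ids)
lower, upper = np.quantile(null, [0.025, 0.975])
print(f"observed = {observed:.3f}, null 95% interval = ({lower:.3f}, {upper:.3f})")
```

      An observed estimate falling above the upper bound of the shuffled interval is the analogue of the comparison between the empirical posterior mode and the 95% CI of the randomized G-matrices described above.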

      Although the figure captions state that they are showing estimates of genetic variances, it appears to be heritability (bounded between 0 and 1). Whether the authors are studying heritability or genetic variance is an important difference, particularly in the context of a changing environment and phenotypic plasticity, where environmental variation is important and expected to change. For example, the result that G is smaller in evolved populations could simply be due to there being larger environmental variance in the salt environment (as you would expect). This is unrelated to an evolutionary response.

      There might have been some confusion because transition rates are positive and not normally distributed. To achieve normality they were log transformed. We have not reported estimates of heritability, all estimates presented are of genetic variances, unscaled. The only exception is body size where the raw data was multiplied by 50 in order to have a similar phenotypic scale as the transition rates when estimating genetic (co)variances, not heritability. We agree that the evolution of environmental stochastic variance is interesting but not immediately relevant to the questions we address.

      It seems that comparisons to the ancestral population were done for A160, not the founding population for each evolved line at G0. It is not clear whether the founder effects of each replicate are important and if this is the most appropriate comparison (the Discussion suggests that founder effects are important).

      We now describe the derivation of the experimental populations in more detail in the Methods and in an introductory section of the Results. The population acronyms might have been misleading. A6140 is a population that was domesticated to the lab conditions for 140 generations (replicate #6 of the domestication process). We report the evolution of 3 GA populations, which were all derived from A6140 with minimal sampling problems for the estimated effective population sizes (10^4 individuals were sampled from A6140 for each GA population, for an Ne of 1000 during domestication; Chelo and Teotónio, Evolution 2013). Therefore, GA populations after 50 generations of evolution are appropriately compared with their (unique) ancestor population. We no longer discuss potential founder effects.

      Overall, there is much interesting data collected and analysed in this manuscript, addressing a valuable question. However, it is not obvious whether the estimates of G matrices are different from noise, and heritability may not be the most appropriate scale to ask questions about phenotypic plasticity and evolution in a novel stressful environment that may affect levels of environmental variation.

      Please see previous replies. Our ancestral G-matrix estimates indicate that at least 3 eigentraits are different from random expectation in both environments (Figure 2, supplement figure 3), and that, in high salt, evolved populations continue to have more genetic variance than expected at 3-5 eigentraits (Figure 7, supplement 2). We are conservative in these estimates as, depending on the null, we could consider more eigentraits. In the previous version of the manuscript we concluded that only 2 ancestral eigentraits were orthogonal due to an error in the code (we did not divide the null expectations by 2). But even presuming that only one eigentrait (gmax) has genetic variance in the ancestral population, we previously reported that mutational variance is not in the same trait (see Mallard et al., G3, 2023; and mmax in Table 3), and further that the trait under selection is neither gmax nor mmax (compare in Table 3 the selection gradients with gmax or mmax). At a minimum there are 3 genetically or environmentally independent traits. As noted in previous replies, we estimate and present genetic variances throughout. We do not present estimates of environmental variances and feel that doing so would make the manuscript overly complicated.

      Reviewer #2 (Public Review):

      Response to selection: It was not clear to me that it was appropriate to interpret locomotor behavior as having evolved in response to the salinity environment. Specifically, where is the evidence that any change in trait means is a (direct or indirect) response to selection imposed by increased salinity rather than the neutral drift of a trait due to the reduction in population size caused by the salinity? Strong evidence of adaptive evolution would be provided by all 3 replicates significantly diverging from the ancestor in the same direction. Model 2 seems to aim to test the null hypothesis that the three replicates diverged from one another via a random effects model - but with only three replicates, there is very low power, and variance is likely to be estimated as zero. I'm not sure what is shown in Tables 3 & 4, or how these results relate to models 2 & 3, so my interpretation of the information may be incorrect. Nonetheless, and noting that the errors around estimates are not presented, there seems to be considerable heterogeneity in size and direction of divergence between replicates for most of the traits. Is this study really dissecting responses to directional selection, or is it dissecting drift?

      We have modified the statistical modelling of the phenotypic data. Model 2 is no longer presented. We provide a MANOVA multivariate analysis equivalent to model 2 (with replicate populations as fixed effects) but now including both environments, together with the univariate models. MANOVA results show that all traits are significantly different across populations (i.e., at least two populations differ from one another). The fitted estimates from the MANOVA are not reported with errors in R but it is obvious that not all traits evolved in each replicate GA population (Figure 6). We therefore tested the difference between each of the evolved populations and the ancestral population using a univariate approach (Figure 6, supplemental source data table 2). In this univariate analysis, block was modeled as having random effects (which we could not model with MANOVA). In the high salt environment, the replicates GA 1, 2 and 4 differed significantly for 4, 6 and 4 transition rates (out of 6), respectively. The traits are all evolving in the same direction, even when the trait difference between evolved and ancestral populations is not significant. We provide compelling evidence of parallel evolution and thus selection (see review about how to infer selection in evolution experiments in Teotónio et al. Genetics 2017). We tried to be exhaustive in our statistical reporting but would happily provide additional details if requested.

      What are the traits, and what is the confidence in G? My outsider's interpretation of these results is that defining 6 transition states is a way of getting at a single behavioral trait, and I was not convinced that these data were suitable for addressing questions about multivariate evolution. Genetic parameters were estimated using MCMCglmm, which imposes boundaries on estimates. The authors state that they followed Morrissey and Bonnet 2019, but I was unable to infer what this means with respect to accounting for the contribution of sampling error to covariances (or how they accounted for the positive variance constraint). Because I was unsure how sampling error was being assessed for G, I was not confident about the interpretation of statistical support for individual parameters, or for eigenvalues of G. Following this forward, if the measured characteristics constitute a single trait, with an entirely shared genetic basis, then the results of strong alignment of everything with gmax makes complete sense - there is a single trait, that is heritable and plastic, and for which the mean evolved.

      Our initial draft was misleading and we now provide a more detailed description (see also replies #5 and #12 above). We computed 1000 randomized G matrices to account for the constraints imposed by the MCMCglmm algorithm. This should account for the bias inherent in variance estimation and in the eigen decomposition we did, given our sample sizes. You will find that all 6 transition rates show genetic variance (Figure 2, supplement figure 2) and that up to three eigentraits have more genetic variance than the randomized G-matrices (Figure 2, supplement figure 3).

      The 6 transition rates are the mathematical description of changing movement states in 1-dimensional space (under memoryless assumptions). A priori we do not know how many relevant traits there are, if they are genetically or environmentally independent. To help the reader, we provide a Table 3 with the trait loadings for the several canonical traits of phenotypic plasticity, divergence and selection. The first canonical trait of standing genetic variation, gmax, is indeed aligned with phenotypic divergence (dmax; Figure 8, panels A and B) and with the axis of genetic variance reduction during evolution (emax; Figure 8, panels C and D), but not with ancestral plasticity (pmax; Figure 3) or mutational variance (mmax, from Mallard et al. G3 2023). pmax, for example, is aligned with g3, the third eigenvector of the ancestral G matrix. Note, however, that we do not have any power to detect the influence of g2 or g3 on phenotypic divergence or genetic divergence (Figure 8), though they together explain about 15% of the genetic variance. This is because performing such a test would require an alignment of the deviations in divergence not explained by gmax with g2 or g3. We now mention this issue in the Discussion. Overall, however, there are clearly several behavioral traits.
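
      To make "alignment" between canonical traits concrete, the sketch below computes the angle between the leading eigenvector of a covariance matrix and another trait-space direction, and the variance the matrix holds along a chosen direction (the quadratic form e'Ge). The 2x2 matrix and the divergence vector are toy values chosen for illustration, not estimates from the manuscript, and the angle is only one common alignment measure; the metric actually used by the authors is not specified in this response.

```python
import numpy as np


def leading_eigenvector(cov):
    """Eigenvector of a symmetric matrix with the largest eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    return eigenvectors[:, np.argmax(eigenvalues)]


def angle_degrees(u, v):
    """Angle between two axes in trait space (0 to 90 degrees).

    The absolute value is taken because an eigenvector's sign is arbitrary.
    """
    cosine = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosine, 0.0, 1.0)))


def variance_along(cov, direction):
    """Variance held by `cov` along `direction` (quadratic form e' G e)."""
    unit = direction / np.linalg.norm(direction)
    return unit @ cov @ unit


# Toy two-trait example (hypothetical numbers).
G = np.array([[2.0, 0.0],
              [0.0, 0.5]])
divergence = np.array([1.0, 1.0])

g_max = leading_eigenvector(G)
print("angle between g_max and divergence:", angle_degrees(g_max, divergence))
print("variance along second trait axis:", variance_along(G, np.array([0.0, 1.0])))
```

      A small angle indicates that divergence lies close to the main axis of standing genetic variation; the same quadratic form can be applied to randomized matrices to ask whether the variance along a given eigenvector exceeds the null expectation.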

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Zheng et al. examined the disease-causing mechanisms of two missense mutations within the homeodomain (HD) of CRX protein. Both mutations were found in humans and can produce severe dominant retinopathy. The authors investigated the two CRX HD mutants via in vitro DNA-binding assay (Spec-seq), in vivo chromatin-binding assay (ChIP-seq), in vivo expression assay of downstream target genes (RNA-seq), and retinal histological and functional assays. They concluded that p.E80A increased the transactivation activity of CRX and resulted in precocious photoreceptor differentiation, whereas p.K88N significantly changed the binding specificity of CRX and led to defects in photoreceptor differentiation and maintenance. The authors performed a significant amount of analyses. The claims are sufficiently supported by the data. The results not only uncovered the underlying disease-causing mechanisms, but also can significantly improve our understanding of the interaction between HD-TF and DNA during development.

      Thank you for summarizing the key findings and strengths of our manuscript.

      Minor concerns:

      1) The E80A, K88N and R90W (previously reported by the same group) mutations are located very close to each other in the homeodomain (Figure 1A), but had distinct effects on the activity of CRX. Has the structure of the homeodomain (of CRX) been resolved? If so, could the authors discuss this phenomenon (mutations close to each other but have distinct effects) based on the HD-DNA structure?

      In paragraphs 2, 4, 5 of the discussion section, we have added explanations on how each mutation could affect CRX HD-DNA interactions differently based on published structural studies. And we further explain how these biochemical changes relate to the molecular perturbations and cellular phenotypes seen in vivo.

      In addition, has this phenomenon been observed in other homeodomain TFs?

      Disease-associated missense mutations at residues HD50 (K88) and HD52 (R90) have also been reported in other HD TFs implicated in CNS development (see discussion paragraph 7). Notably, different substitutions at the CRX E80 residue have been reported in multiple CoRD cases, suggesting its essential role in HD-DNA-mediated regulation during retinal development. These new points are now included in the discussion section.

      2) The authors should briefly summarize the effects/disease-causing-mechanisms of all the reported CRX mutations in the discussion part. The readers can then have a better overview of the topic.

      We have added a concise summary of the previously proposed CRX mutation classification scheme, all characterized Crx mutant mouse models, and their pathogenic mechanisms. Please see paragraph 9 in the discussion section.

      3) CRX can also function as a pioneer factor (reported by the same group). Would these HD mutations distinctively affect chromatin accessibility (which then leads to ectopic binding on the genome)?

      Prior evidence has demonstrated that regulatory regions for many photoreceptor genes failed to stay accessible upon loss of CRX in the Crx-/- model (PMID: 30068366). It is unclear with the existing data whether CRX could initiate the chromatin remodeling (true pioneering function) of these regions, or it simply maintains the accessibility once these regions became accessible. Future studies comparing epigenomic landscape changes in mutant Crx KI models at various ages can be informative, particularly for the CRX K88N ectopic binding events. Determining how the CRX K88N mutant protein alters chromatin landscape important for photoreceptor fate and/or differentiation during development would shed light on the nature of these ectopic binding events.

      4) The discussion part can be shortened and simplified.

      We have re-written the discussion section to make it concise and to incorporate discussions on mutant CRX HD structures. Please see the revised manuscript.

      Reviewer #2 (Public Review):

      Zheng et al., investigated the molecular and functional mechanisms of two homeodomain missense mutations causing human retinal photoreceptor degeneration diseases in photoreceptor development regulated by the CRX transcription factor. They analyzed the E80A mutation associated with dominant cone-rod dystrophy (CRD) and the K88N mutation associated with dominant Leber Congenital Amaurosis (LCA). The authors found that E80A CRX binds to the same target DNA sites as WT CRX, but the binding specificity of K88N CRX is altered from that of WT in an in vitro assay. They generated Crx(E80A) and Crx(K88N) KI mice and performed ChIP assay and observed that K88N CRX binds to novel genomic regions from the WT-binding sites, while E80A binds to the WT sites. In addition, using the KI mice, they found that E80A and K88N differently affect the expression of Crx target genes. This study is well executed with proper and solid methodologies, and the manuscript is clearly written. This study gives us the insights how single missense CRX mutations lead to different types of human retinal photoreceptor degeneration diseases.

      We greatly appreciate the reviewer’s summary and positive comments.

      While the study has strengths in principle, it has a couple of weaknesses. One is that this study does not clearly show how well E80A KI mice function as a pathological model of dominant CRD, in which cones are mainly affected first. More data investigating how cones are affected, from histological, molecular, and physiological analyses, would be helpful and useful. For example, in the Discussion, the authors describe the results on E80A association with the S-cone opsin promoter as "data not shown". These data must be presented for the readers. In addition, more molecular insights as to how E80A affects cones will strengthen this study.

      The mouse retina is rod dominant and contains only a small number of cones (3% of all photoreceptors) that are born prenatally. This poses technical challenges to appropriately assess cone-specific changes during disease initiation/progression. We are in the process of developing cellular/molecular tools to investigate how cones are affected in the Crx E80A KI model, but this is beyond the scope of the current study.

      At the same time, we have added a supplemental panel showing that, based on P0 retinal immunostaining of the early cone marker RXRγ, cones were initially born and fate-specified in CrxE80A retinas (see Figure S7A). Since the E80A protein also hyper-activated the S-cone opsin promoter-luciferase (Sop-luc) reporter in HEK293 cells (see Figure S7B), we predict that CRX E80A affects cone photoreceptor differentiation in a similar manner as rod photoreceptors. Furthermore, the cone transcriptional program might be more prone to perturbations by abnormal CRX activities. These possibilities require future investigations. For this manuscript, we have included all these points in the discussion section.

      Another point is that it will be very valuable if the authors could show how E80A and K88N differently affect the 3D structure of the CRX homeodomain. Even a simulation model would be valuable.

      Please see our answer to Point 1 of Reviewer #1. In short, we have added in the discussion section our explanations on how each mutation could affect CRX HD-DNA interactions differently based on structural studies. We further explain how these biochemical changes relate to the molecular perturbations and cellular phenotypes seen in vivo. Additionally, since TF-DNA interactions are diverse and dynamic across binding sites with different sequence features and genomic environments, future studies that systematically and quantitatively evaluate CRX transcriptional activity at different regulatory sequences would be important.

    1. Author Response:

      We thank the reviewers for their insightful comments and will resubmit a revised version where we address most of the issues raised. At this time, our immediate responses are as follows.

      1. We have data to confirm the presence of the merodiploid strain by PCR but did not show the data in the original version for brevity. We will show that data in the revised version.

      2. We also have, of course, a no-ATC control in our CRISPRi experiments and will also show that data in the resubmission.

      3. As a loading control for the SecA2 strains, we will show the PknG blots that we already have (PknG is a protein secreted by SecA2; PMID: 29709019).

      4. In the nanoluc assays, the construct we made that was fused to CFP10 was generated so that there was a long linker between the C-terminus of CFP10 and nanoluc. We also have other controls in that experiment to show that the CFP10-nanoluc protein was secreted in the ΔRD10 strain and not in the ΔSecA2 strain. We will attempt to show fusion protein secretion using CFP10 antibodies in the revised version of the manuscript.

      5. We will perform experiments with the inhibitor using the merodiploid strain and in partial knockdown strains to confirm that the inhibitor does indeed specifically act on Rv1636.

      6. We will modify the discussion to talk more about the role and processes of cAMP synthesis and degradation in the revised version of the paper. Further, the manuscript will be checked for spelling and grammatical errors before resubmission, and the arrangement of data modified as suggested by the reviewers.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes the differences in the plasma proteome and metabolome in healthy Tanzanian and healthy Dutch adults. The inflammatory plasma proteome was measured using the Olink 92 Inflammation panel, while the plasma metabolome was analyzed using a mass spectrometry-based untargeted approach. The plasma metabolome was measured only in the Tanzanian cohort. This study aimed to link the pro-inflammatory proteome of Tanzanian and Dutch healthy individuals with environmental factors and dietary lifestyles.

      The correlation between the plasma proteome and food-derived metabolome profiles can shed light on the development of non-communicable diseases. This observation stresses the importance of dietary transition and lifestyle changes in expressing inflammation-related molecules. Moreover, this study describes the inflammatory proteome profile in healthy Tanzanian individuals covering a cohort with limited studies. The molecular differences in circulating biomolecules between healthy individuals living in East Africa and individuals living in Western Europe and the correlations with intrinsic and environmental features are novel.

      This study lacks a robust and solid validation of some of the differentially regulated circulating proteins and correlations between food-derived metabolites and proteins in a selected cohort. The discovery-driven approach in this manuscript highlights potential findings that need to be supported by a validation phase. According to this reviewer, the lack of such validation impacts the robustness of the results and the hypotheses generated. Due to that, the manuscript should incorporate validation experiments.

      We acknowledge that our study was limited by the lack of a validation phase. To address this issue, we have undertaken additional analyses to validate our key findings related to the proteins associated with mTOR and Wnt/β-catenin pathways. These analyses involved data from a proof-of-concept intervention study conducted at the same site. Our response below provides more information on these validations.

    1. Author Response

      Reviewer #1 (Public Review):

      For PRLR, the question being asked is whether and how the intracellular domain (ICD) interacts with the cellular membrane or how the disordered ICD can relay and transmit information. The authors show that PI(4,5)P2 in the membrane localizes around the transmembrane domain (TMD) due to charge interactions and facilitates binding of the ICD to the membrane, even in the absence of the TMD. Furthermore, the ICD and PI(4,5)P2 form a co-structure with JAK2 which locks a disordered part of the ICD into an extended conformation, allowing for signal relay and, through multiple complex conformations, may enable switching signalling on and off.

      Strengths:

      • NMR paired with MD is a powerful way to probe an interaction especially when peaks disappear and become difficult to probe by NMR.

      • Using NMR and MD to formulate hypotheses which are then tested by cell studies is quite informative. The combination of MD, NMR, and cell biology is a strength.

      • The authors are diligent in testing MD simulations on systems with and without PIP2.

      • The use of Pep1 and Pep2 to differentiate the KxK region that interacts with PIP2 is helpful.

      • The four utilized mutants help illustrate the co-dependence of the respective regions in the formation of the co-structure.

      Weaknesses:

      • In Figure 2G, there is a big change in CSP between 280 and 290, which the authors do not comment about.

      The region referred to contributes to binding but is on the edge of the main binding site and where the local affinities are weaker. Therefore, the exchange rate is high and allows for following the chemical shift changes. In support of this, we see an almost inverse correlation between the CSPs and the changes in intensities. For the main binding site, the exchange rate between bound and free states is slower because the affinity is stronger. Therefore, we cannot follow the chemical shifts to extract the CSPs to the bound state, as the peaks disappear. We have commented on this in the main text (p.8) as follows:

      “In the region from D285-E292 we observed an almost inverse correlation between the CSPs and the intensities. This suggests that in contrast to the preceding region, a faster local exchange rate allows us to follow the resonances from the bound state in this region, giving rise to the large CSPs.”

      • The data in Figure 2 are summarized as indicating the formation of extended structure in the ICD upon binding. It is not clear to me what data show an extended structure.

      The information on the extended structure comes from the analyses of the peptide Pep1 titrated with C8-PI(4,5)P2. The CD signature that develops in the bound state has a minimum ellipticity at 218 nm, which is a strong indicator of extended structure. We find this information adequately described in the main text (p.8), but have emphasized this further as follows:

      “In contrast, for Pep1, large spectral changes were seen, which were unrelated to helix formation. Subtracting the spectra in the presence and absence of C8-PI(4,5)P2 revealed a negative ellipticity minimum at 218 nm, a strong indication of β-strands, showing that when bound to C8-PI(4,5)P2, a distinct extended (strand-like structure) signature was seen (Figure 2G).”

      • No modelling or experiments were done with PIP3 despite conclusions and models which rely on the phosphorylation of PIP2 to PIP3. At the very least, these would be useful as negative controls.

      We have in a previous work addressed the affinity for phosphoinositides using lipid dot blots where we observed a preference for certain species, including PI(4,5)P2 (Haxholm et al., BJ, 2015). In that study, we also observed that there was no affinity for PI(3,4,5)P3, but we may not have highlighted this sufficiently in the introduction. This may have caused some confusion in understanding our choices. We have now more explicitly described these data, both in the introduction (p.4), in the result section (p. 8) and later in the discussion (p. 21). We thank the reviewer for bringing this up.

      • Only R2 experiments were done when the authors mention investigating dynamics. R1 and -HetNOE dynamics would be useful for creating a complete picture.

      Our aim with recording the R2 values was not to map the detailed dynamics of the disordered regions, but to explain the changes in the peak intensities we see for the variants when adding C8-PI(4,5)P2. In this case, the R2 values supported our suggestion of internal contacts and, although we agree with the reviewer that R1s and HetNOEs would be important and relevant for a more in-depth and complete analysis of the dynamics, we find that in this case, the R2 values suffice.

      • Some of the exciting results are under-emphasized including Fig 3H and 3I.

      A new version of Figure 3 has been generated to consider the reviewers’ comments and suggestions. This figure has been restructured to further emphasize some of the major conclusions obtained from the simulations. We have moved the former Figure 3 A, B, C and D to the supplemental information to increase this focus.

      Reviewer #2 (Public Review):

      The authors combine NMR experiments, cell experiments, and molecular simulations to address the question of how lipid interactions of the prolactin receptor contribute to signalling. They assess the interactions of the disordered cytoplasmic tail of the receptor with phosphoinositides among others by chemical shift perturbations from NMR for different PIP2-containing membranes, by coarse-grained simulations, as well as site-directed mutagenesis and subsequent cell signalling experiments to monitor the activation of the mutants. A major result is that PIP2 interactions are functionally important, which so far has not been known for this receptor. Their results are likely relevant for other non-receptor tyrosine kinases.

      The hypothesis that the protein complex is regulated by IDR-membrane interactions is very novel. A major strength is the close connection of and feedback between state-of-the-art experiments and simulations.

      We thank the reviewer for the positive comments on our work and on its novelty and importance.

      This is where I see weaknesses:

      1) The motivation of focusing on LID1 is limited.

      We have now provided our rationale for selectively focusing on LID1 in the PRLR. The selection was made to address the conundrum of how structural disorder in the juxtamembrane regions would be able to transmit the knowledge of extracellular hormone binding to the bound JAK2. This constitutes the first step of signaling on the intracellular side. Given the distance to the other two LIDs (LID2 and LID3) and their disconnection from the TMD by long disordered regions, these were disregarded, and we focused on LID1 in this work. We have emphasized this choice in the introduction and in more detail in the result section (p. 5-6).

      2) The data and analysis for the JAK2-PRLR complex appear somewhat superficial, and a connection between conformational states to their functional relevance is lacking. In fact, the majority of the simulation part of the paper is about suggesting different states of the PRLR-JAK2 complex but the states and their hypothesized functional relevance are not further taken up, e.g. by experiments, and yet presented as major results, e.g. in the abstract.

      In the original manuscript we already provided a detailed analysis of the different states, highlighting accessible residues and lipid-interacting residues and comparing these across the states. From our experiments, including those performed in cellular assays, we cannot with certainty link the two major states to active and/or inactive states. We therefore have neither the intention nor the support from the data to claim this. However, what we do put forward as a major result is the presence of more than one major state, as also stated in the abstract and in the conclusion of the result section as follows:

      “Another key observation is the existence of different states in which different regions of both JAK2 FERM-SH2 domain and LID1 of PRLR are exposed to the solvent or hidden below the bilayer.”

      In the discussion we do speculate as to which state may be the active and/or inactive dimer/monomer but make no firm claims. We have now made the major finding of multiple states clearer in the text, and further compare the two major states, the Y and the Flat state, to the recent cryo-EM structures of JAK1 bound to IFNAR1, which lend some support to our speculations. The abstract now reads:

      “We find that the co-structure exists in different states which we speculate could be relevant for turning signalling on and off.”

      Discerning the functional relevance of these states, if possible, will require experiments in cells as well, which would by themselves constitute a new study. We have to the best of our ability clarified that the functional relevance of the states has not been elucidated by the current work.

      3) The connection between simulations and mutational study is not very direct. An open question is if the mutants can distinguish between the effects of PRLR-PIP2 interaction or PRLR-JAK2 interaction, even though this conclusion is still drawn from the data.

      We have now explained in much more detail by which arguments the different mutations were selected (see also answer above), which property of the co-structure they are most likely to engage in and affect, and we have emphasized that the separation of function by mutation may be complicated by the intimate structure formation among the three components of the co-structure. The conclusion has therefore also been softened.

      4) The conclusions drawn from the mutagenesis study (lines 547-555) are not directly supported by data. Only a partial correlation between PRLR membrane localisation and STAT5 activation is no reason to attribute the unexplained part of the STAT5 activation to PRLR-JAK2 interactions without further studies.

      We have now explained in much more detail by which arguments the different mutations were selected (see also answer above), which property of the co-structure they are most likely to affect and emphasized that the separation of function by mutation may be complicated by the intimate structure formation among the three components of the co-structure. The conclusion has therefore also been softened.

      5) PIP2 is identified as an important regulator, with very solid support from the presented data. PIP3 is part of the model but not discussed before or as part of the results. The analysis could be similarly applied, or the data may be directly relevant, to understanding whether PIP3 plays a similar role, as interactions are likely primarily electrostatically driven.

      We have in a previous work addressed the affinity for phosphoinositides using lipid dot blots where we observed a preference for certain species, including PI(4,5)P2 (Haxholm et al., BJ, 2015). In that study, we also observed that there was no affinity for PI(3,4,5)P3, but we agree that we did not highlight this sufficiently in the introduction. This may have caused some confusion in understanding our choices. We have now more explicitly described these data, both in the introduction (p.4), in the result section (p. 8) and later in the discussion (p. 21). We thank the reviewer for bringing this up.

      Reviewer #3 (Public Review):

      Araya-Secchi and coauthors present a very interesting study on the role of PIP2 lipids in the potential modulation of prolactin receptor signaling. The study is well-conducted and employs an integrated approach that combines NMR spectroscopy, modeling (primarily coarse-grain MD simulations), and cell biology. This combination of methods is crucial for gaining a deeper understanding of cell receptors, from their biophysical properties to their cellular functions.

      The modelling work is mainly based on both coarse grain forcefield versions Martini2.2 and Martini3. These two versions of the forcefield may produce different results. Therefore, depending on the system being modeled, the results presented here should be considered in light of the limitations inherent to each version of the forcefield.

      We thank the reviewer for the positive appraisal of our work and the approach we employed. It is true that one must be aware of the limitations of the tools and models employed in this type of work. We agree that perhaps we were not too explicit about limitations of our methods in the presentation of the results. However, we have addressed and discussed such limitations in the revised version of the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This study demonstrates that Chinmo promotes larval development as part of the metamorphic gene network (MGN), in part by regulating Br-C expression in some tissues (exemplified in the wing disc) and in a Br-C independent manner in other tissues such as the salivary gland. I have included below the following comments on the submitted version of this manuscript:

      1) The authors have shown experimentally that Chinmo regulates Br-C expression in the wing disc but not the larval salivary gland. Based on this, they posit that Chinmo promotes larval development in a Br-C-dependent manner in imaginal tissues and a Br-C-independent manner in other larval tissues. This generalization of Chinmo's role in development would be more compelling if the relationship between Chinmo and Br-C were explored in other examples of imaginal/larval tissues.

      We agree with the referee that confirmation of our observations in other tissues might help to generalize Chinmo’s role. To this end, we have analyzed the role of chinmo in an additional larval tissue, the tracheal system, and an additional imaginal tissue, the eye disc. Consistent with the results reported in the manuscript, we found that the mode of action of Chinmo is conserved, as depletion of Br-C in the eye disc is able to rescue the lack of chinmo, whereas in the tracheal system it is not. We included this new information in the main text and in new SFigures 1 and 3.

      2) Chinmo, Br-C, and E93 have all been shown to be EcR-regulated in larval tissues, including the brain and wing disc (as in Zhou et al. 2006, Dev Cell; Narbonne-Reveau and Maurange 2019, PLOS Biology; Uyeharu et al. 2017, ). It would be interesting (and I believe relevant to this study) to know whether the roles of these factors in their respective developmental stages are EcR-dependent and whether their regulation by EcR (or lack thereof) depends on whether the tissue is larval or imaginal.

      Although the relevance of EcR for the regulation of the genes that make up the metamorphic gene network has already been established, whether these genes respond differently to EcR-mediated signalling in larval and imaginal tissues has not yet been properly addressed. Identifying such a tissue-specific output of EcR signalling would be very interesting. However, we think that this is beyond the scope of this report, as the main aim of this study was to determine the main role of the temporal genes during development and their repressive interactions.

      3) In the chinmo qPCR analysis shown in Fig1A, whether animals were sex-matched or controlled was not indicated. Since Chinmo has a published role in regulating sexual identity (Ma et al. 2014, Dev Cell; Grmai et al. 2018, PLOS Genetics), and since growth/body size is known to be a sexually dimorphic trait (Rideout et al. 2015, PLOS Genetics), it seems important to establish whether the requirement of Chinmo for larval development and/or growth differs between the sexes. I recommend either 1) controlling for sex by repeating qPCRs in Fig 1A in either males or females, or 2) reporting male/female chinmo levels at each stage side-by-side.

      As the referee pointed out, chinmo has been related to sexual identity, raising the possibility that chinmo affects growth differently in males and females during development. However, several observations argue against this possibility. First, the role of chinmo in sexual identity has only been reported in the adult testis, specifically in cyst stem cells. In fact, specific mutations of chinmo that only affect the expression of chinmo in the testis do not affect testis formation but its maturation, suggesting a role of chinmo in sex determination specifically in the testis cyst stem cells (Ma et al. 2014, Dev Cell; Grmai et al. 2018, PLOS Genetics). Second, a sex-dependent growth rate has been described during larval development (Rideout et al. 2015, PLOS Genetics; Sawala A. and Gould AP, PLoS Biol, 2017). However, the main difference in growth rate between males and females is found in L3 larvae (Sawala A. and Gould AP, PLoS Biol, 2017), when the expression of chinmo strongly declines in both males and females, indicating that the impact of chinmo on sexual dimorphism during larval development is likely to be limited at best.

      Thus, considering that, based on our results, chinmo exerts its main role in larval tissue growth during the L1 and L2 stages, and that body growth is practically identical in males and females during these stages (Sawala A. and Gould AP, PLoS Biol, 2017), we can assume that chinmo is unlikely to contribute to sexual body size dimorphism.

      Nevertheless, we would like to clarify that we have performed the measurements of chinmo expression always in females, when sex identification was possible, namely in L3 larvae. L1 and L2 larvae qPCRs were not sex-discriminated as sex identification was not possible in our conditions.

      4) In Fig2E, the authors show that salivary gland secretion (sgs) genes are repressed in salivary glands lacking chinmo. Sgs genes are expressed during late larval stages as the animal prepares to pupate. Thus, based on the proposed model where Chinmo promotes larval development and represses the larval-to-pupal transition, one might expect that larval salivary glands lacking chinmo would express higher than normal levels of sgs genes. This expectation directly opposes the observed result - it would be helpful to speculate on this in the interpretation of results.

      This is an interesting observation. As Sgs genes are regulated by Br-C (Duan et al. Cell Reports 2020), precocious expression of this transcription factor in chinmo-depleted animals might be expected to result in an early activation of those genes. Interestingly, we were not able to detect any Sgs gene expression in chinmo-depleted salivary glands. We think that this is because, in the absence of chinmo, this organ does not properly develop and mature, and is therefore unable to express Sgs genes. In support of this, the double knockdown of Br-C and chinmo shows the same dramatically low levels of those genes. Altogether, these results strongly suggest that SGs lacking chinmo expression are unable to grow and synthesise Sgs proteins, even in the premature presence of Br-C. We discussed this point in the main text of the edited Ms. Please also see the response to referee 2.

      Reviewer #2 (Public Review):

      The evolution and control of the three-part life history of holometabolous insects have been controversial issues for over a century. While the functioning of broad as a master gene controlling the pupal stage and of E93 as a master gene for the adult stage has been known for about a decade or more, chinmo has only recently been proposed as being the master gene responsible for maintaining the larval stage (Truman & Riddiford, 2022). While the former paper focused on the embryonic and early larval function of Chinmo, this paper explores its metamorphic effects and defines the roles of Broad and E93 in the phenotypes produced by manipulations of Chinmo expression.

      Overall, the paper is well presented but in places, readers would be helped if the authors were more explicit about the logic and details of their manipulations. There are a couple of conceptual issues that the authors should address.

      The role of Broad in larval tissues:

      One intriguing issue relates to the relationship of Chinmo to Broad and E93 in larval versus imaginal tissues prior to metamorphosis. The knock-down of chinmo in imaginal discs results in severe suppression of growth and the lack of metamorphic patterning genes such as cut and wingless. Normal growth and patterning are reestablished though, if broad is also knocked-down, supporting the notion that the effects of the lack of Chinmo are mediated through the premature expression of Broad.

      In the salivary glands, by contrast, chinmo knock-down suppresses growth, and this growth suppression is not reversed by simultaneous broad knockdown. They properly conclude that the role of Chinmo in supporting the growth of larval tissues does not involve Broad, but their data on the expression of salivary gland proteins suggest that Broad still plays some role in Chinmo function in salivary glands. Fig. 5E shows the levels of various salivary glue proteins in the glands of Chinmo knock-down larvae. The levels are reduced, as expected by the lack of salivary gland growth, but a significant finding is that they are there at all! The Costantino et al. (2008) paper shows that these genes are only induced in the mid-L3. Ecdysone, acting through Broad isoforms, is necessary for their appearance and these SGS genes can be induced in the L1 and L2 stages by ectopic expression of some Broad isoforms. Their low levels in Fig 5, would be due to the small size of the gland, but the gland's premature expression of Broad likely causes their induction. In larval cells, then, Chinmo may feed into two parallel pathways, one that does not involve broad and regulates growth and the other, utilizing Broad, regulating premetamorphic changes.

      It would be useful to look at early larval salivary gland proteins such as ng-1 to -3 that are expressed in salivary glands before the critical weight. Also, it would be interesting if the appearance of the SGS proteins after chinmo knock-down (Fig 5E) is abolished by simultaneous knock-down of broad.

      This is an interesting observation. We think the confusion derived from the way we presented the expression levels of those genes. Our results showed that depletion of chinmo in the SGs dramatically impairs the induction of Sgs gene expression, even in the premature presence of Br-C, which has been shown to be responsible for Sgs expression (Duan et al. Cell Reports 2020). In fact, the levels of Sgs in both chinmoRNAi and chinmoRNAi/Br-CRNAi SGs were virtually undetectable, suggesting that chinmo in the SG is required not only for Br-C repression but also for proper development of the gland. We base this on the fact that the very low levels of Sgs expression in chinmo-depleted SGs are still detected in the double knockdown chinmoRNAi/Br-CRNAi. The dramatically reduced expression of the early larval SG genes ng1-3 in chinmoRNAi and in the double knockdown chinmoRNAi/Br-CRNAi supports this statement. Altogether, these results suggest that Br-C is necessary but not sufficient for the expression of those specific SG genes. We have changed the plots in Figures 2 and 3 to clarify this point and added the expression levels of ng1-3.

      Role of Chinmo and Broad in Hemimetabolous insects:

      In the conclusion of their comparative studies on the cockroach (line 342), the authors state that Broad exerts no role in the development of hemimetabolous insects. However, this conclusion is not consistent with the literature. The first study of broad knockdown in a hemimetabolous insect was in the milkweed bug Oncopeltus fasciatus by Erezyilmaz et al. (2006). Surprisingly to Erezyilmaz et al., broad knock-down in early-stage nymphs did not cause premature metamorphosis. However, Broad expression was essential for tissues of the wing pads and dorsal thorax to undergo morphogenetic growth (rather than simple isomorphic growth), and for stage-specific changes in coloration through the nymphal series (but not for the nymph to adult color change). A similar function for Broad on wing growth during the later nymphal stages was later shown in Blattella (Fernandez-Nicolas et al., 2022; Huang et al., 2013). The wing- and genital pads represent "imaginal" tissues in the nymph and the need for Broad in these tissues is the same as that seen in imaginal discs as the latter shift from isomorphic growth to morphogenesis at the critical weight checkpoint in the L3. This would suggest that important roles for Broad and E93 are already established in the hemimetabolous insects with E93 controlling the shift from immature (nymphal) to adult phenotypes and Broad controlling the premetamorphic growth of imaginal tissues in early-stage nymphs. Chinmo might then be needed to keep both in check.

      We are sorry for not having dealt with these observations in the submitted manuscript. We have taken them into consideration in the new version to discuss the role of Br-C in the transition from hemimetabolous to holometabolous development.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors study single and pairs of MDCK cells adherent to an H-shaped geometry on a flat surface. In this pattern, the cells form strong peripheral stress fibers. To a lesser extent, these cells also exhibit stress fibers in the cell interior, which otherwise has a rather homogenous actin distribution. Using a combination of traction force microscopy, from which they infer the stress distribution by monolayer stress microscopy, and "contour analysis" the authors quantify the 'bulk' and the 'surface' stress in these cells. This analysis shows that single cells are mechanically polarized whereas pairs are not.

      The authors then go on to optogenetically activate the actomyosin contractility of either one half of a single cell or one cell of a pair. Combining their stress measurements in these situations and using a finite element mechanical model, the authors convincingly show that the mechanical response in the non-activated part is active. By varying the aspect ratio of the adhesion patterns, they also find that the efficacy of active stress propagation depends on the mechanical and structural polarity of the cell. Furthermore, they provide evidence that their results on cell pairs generalize to tissues.

      Strengths:

      This study uses a nice combination of physical tools to address an important question in tissue mechanics. The data is compelling and fully supports the authors' conclusions.

      Weaknesses:

      There are no major weaknesses.

      In summary, although the fact that mechanical stress propagation in tissues is an active process might not come as a surprise, the study makes substantial contributions to a quantitative understanding of this process. As such it is of fundamental significance in the field. It will be interesting to explore the consequences of this mechanism for mechanical stress propagation in the context of developmental processes. It will also be of great interest to study how this local process can be accounted for in large-scale theories.

      We thank reviewer #1 for this very positive assessment. We agree that in the future, our results should be used on the theory side to upscale them to tissue level. One way to do this would be the discontinuous Galerkin method, but it will take time to work this out. We also note that we would have loved to experimentally study intermediate cases between two and many cells, but it turned out to be very difficult to position a few cells on a micropattern and to repeat the force propagation analysis that we present here for two cells and for small tissues. In fact, it might be more rewarding to use optogenetics early in a developmental process with clearly defined cell positioning. In the revised manuscript, we have now added a comment on the challenge of working with three or four cells using the micropatterning approach, which is why we turned to small monolayers.

    1. Author Response

      Reviewer #3 (Public Review):

      In this study, the authors probe the molecular changes that occur in a neural circuit for learned behavior that depends on sensory input to maintain stereotypy. Songbirds, such as the Bengalese finches used here, are premier systems in which to ask these questions because they produce a highly stereotyped song that emerges after sensory learning becomes integrated into the function of a sensorimotor neural circuit responsible for singing. By deafening a group of birds (who show a shift in their song structure) and comparing them to hearing birds, the authors seek clues as to how plasticity in motor output may emerge from genomic changes that alter the function of cells within the various components of the neural circuit.

      There are multiple strengths of the paper:

      1) The results may have broad implications because the type of sensorimotor neural circuit (cortico-basal ganglia-thalamic-cortical loop) used for singing is generally necessary for learned behaviors.

      2) The methods and analyses are generally rigorous, including the parsing of song elements, and the type of detailed RNA sequencing and analysis that demonstrates the power of a genomic view of neural plasticity as it relates to behavioral plasticity.

      3) Because the authors assayed the pallial (cortical) areas, as well as the basal ganglia component, of the sensorimotor circuit they were able to creatively compare how different facets of the network contributed to a) unmodified brain properties, b) properties perturbed after the loss of the auditory input that is required to stabilize song structure. As a result, they have added to the known molecular profiles for each of these brain areas, the accounting of how they may be specialized in comparison to the surrounding non-song brain, and what changes occur after deafening. Utilizing some existing single-cell sequencing data, the authors present for the first time some insight into what cell types may be showing the most robust changes, and therefore which may be driving the shift in song structure. The analysis further pushes in new ways to suggest how the molecular properties of a given brain area may relate to those of directly-connected areas. Together, these findings provide valuable clues as to the specific cell types and signaling properties that may be central to the production of stabilized, learned behavior.

      4) One of the cortical brain areas, LMAN, was lesioned in a subset of the hearing subjects because it projects to the area that showed the greatest molecular difference between deafened and hearing birds (RA). The idea was to compare how this affected molecular properties with properties after the loss of auditory input; because RA is the output motor area for the song, its properties may be most directly tied to song structure. Using unilateral lesions was a strong choice of experimental design that allowed for rigorous analysis of this idea, and was interpretable because birds do not have a direct inter-hemispheric callosum.

      The foundation of the paper is solid, though the results shown raise several questions that are not fully addressed, and limit some of the power of the implications.

      The biggest questions arise from the finding that RA shows the largest number of molecular changes after deafening. The analysis and interpretations do not fully incorporate what we know of this circuit, at least from another well-studied songbird, the zebra finch, from which the authors derive other types of information. For example, it is not yet clear if RA is most changed because it is most directly involved in song output or because it receives projections from two areas within the sensorimotor circuit (LMAN and HVC). How do we consider the fact that by adulthood, LMAN and HVC cells project onto the same RA neurons? Are those the cell types being identified here? Would HVC lesions be expected to have the same effect as LMAN lesions? Are the cell types showing the greatest change those that are most involved in song output (e.g. are they projecting to nXIIts)? How do these results relate to the findings of changes in RA after HVC and LMAN lesions reported decades ago? How do these findings compare to an earlier study that also performed sequencing on areas from the sensorimotor circuit in deafened juveniles? Further, RA also receives information from the auditory processing regions of the brain, via the immediate structure RA-cup. It is not yet explicitly addressed how some effects may be from the loss of this more direct access to auditory information, rather than from information and projections originating within the sensorimotor circuit, and reinforces the question of whether or not the number of inputs to a particular brain area is a driving factor in the general pattern of changed RNAs after perturbation.

      We thank the reviewer for their assessment and for their excellent suggestions on how to improve the impact of our work. The reviewer raises several important points, which we have expanded on in the Discussion of the revised manuscript, and will address here:

      First, there is the general consideration of how the structure of inputs to RA influences the interpretation of our results. There is the question about whether we can consider RA expression alterations as due to its direct projections to song motoneurons (‘output’) or the convergence of two important song nuclei, HVC and LMAN, onto RA (‘input’). This is a difficult question to untangle. We could interpret ‘output’ only effects as local perturbations that do not depend on song circuit afferent activity, such as hormonal fluctuations associated with the loss of hearing. ‘Input’ effects would occur through changes in afferent activity, such as those that elicit plasticity associated with song destabilization or more general alterations to the amount of afferent neural activity (a point addressed in the revised manuscript, lines 842-848). By focusing on a measure of song destabilization in our differential expression analysis, we are specifically seeking to identify gene expression responses that are associated with changes to behavioral output. Yet these behavioral changes are certainly driven by alterations in upstream regions or the manner in which they converge onto RA. The reviewer also notes inputs from RA-cup as a potential avenue through which the loss of auditory information could more directly influence expression in RA. It is certainly possible that the loss of auditory information itself could influence gene expression in different components of the song system, a point we note in the revised Discussion (lines 848-853). We also note there that future experiments leveraging different plasticity induction techniques (TS cut, delayed auditory feedback) will be important to resolve the influence of this input.

      Our lesion experiments aimed to characterize how input from LMAN influences expression in RA, due to LMAN’s important role in mediating song plasticity. We would expect HVC lesions to elicit different expression responses because of its distinct mode of transmission onto RA projection neurons (primarily AMPAR in contrast to primarily NMDAR for LMAN), the distinct activity patterns of HVC and LMAN, and likely distinct neuromodulatory signaling from the two afferents (e.g. LMAN acts as a source of BDNF). We discuss how HVC lesions would be useful to further disentangle the influence of afferent input on RA gene expression in the Discussion of the revised manuscript (lines 926-946). In the revised manuscript, we also cite previous work that examined the influence of HVC and LMAN on RA neural activity, morphology, and cell survival (lines 928-932).

      As to the cell types in RA that show expression changes following deafening, we show in Figure 5 that both glutamatergic projection neurons (‘RA_Glut’), i.e. the neurons that project to subcerebral structures such as nXIIts, as well as GABAergic interneurons (‘GABA’) show substantial expression alterations. In the Discussion, we highlight the functional roles of several genes that have enriched expression in each class (lines 864-873 and 887-893).

      In the revised manuscript, we have added a paragraph in the Discussion (lines 854-862) that references results from Mori, C. & Wada, K. Audition-independent vocal crystallization associated with intrinsic developmental gene expression dynamics. J. Neurosci. 35, 878–889 (2015). This work examined the influence of early deafening on gene expression in the song motor pathway and identified a strong developmental and audition-independent expression response. It identified an important separation between developmentally-driven and experience-dependent molecular responses in the song system. We note that the aims were distinct from the present study, which sought to identify gene expression responses to deafening-induced song plasticity.

      Importantly, since the LMAN lesions did not create significant changes in the song structure, it is difficult to know how to interpret the meaning of these molecular changes in RA, alone and in combination with the comparison to the RA profiles from deafened birds. Of importance is the question of whether or which molecular profiles are thus signatures of behavioral plasticity or not.

      The reviewer raises an important set of followup experiments that assess the extent to which the transcriptional state of the song system tracks with song plasticity state. Coupling LMAN lesions with deafening, a manipulation that prevents song degradation, would be a strong approach to identify genes whose expression is closely tied to song destabilization, a possibility that we now discuss (lines 936-946).

    1. Author Response

      Reviewer #1 (Public Review):

      1) There are two main 'weaknesses'. The first is the limited power that comes from measuring the phenotype of only 387 strains. Whether this is because of the expense/difficulty of the InToxSa assay is not discussed, leaving open the question of how much this assay could be scaled up in the future.

      A previous study investigating the toxicity of S. aureus culture supernatants assessed 217 clinical strains (https://doi.org/10.1371/journal.pbio.1002229). That study had sufficient power to uncover important genetic determinants of S. aureus virulence. Here, we significantly increased the throughput to 387 clinical strains, combined with a sophisticated cell toxicity assay that measures the kinetics of cytotoxicity caused by intracellular S. aureus. We have investigated the S. aureus genetic associations using this rich dataset (each of the 387 strains was assessed in 3 to 15 replicates, accruing 655,005 measurements corresponding to kinetic cytotoxicity assessments of intracellular S. aureus). This dataset enabled the accurate identification of genomic signatures that modulate cytotoxicity; we then validated these signatures by reconstructing the mutations, thus demonstrating the power of our approach. Upscaling this method (4-fold, with adequate technical adjustments) should be possible with the adoption of a 384-well plate format instead of a 96-well plate. We will continue to investigate additional clinical isolates and explore the use of 384-well plates, but the analysis we present of data from the 96-well format is already a substantial advance for the field.

      Across this study, and as presented in the current manuscript, the maximum throughput of the InToxSa assay was 7 × 96-well plates per week, corresponding to 98 distinct clinical strains testable per week (encompassing 6 individual replicates, each tested across 2 different days/plates). Following the reviewer's suggestion, we have added this information to the discussion (Lines: 406-409).
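      The throughput figures quoted above can be checked with a short arithmetic sketch. The plate, strain and replicate counts are taken from the response; the interpretation of the wells left over on each plate (for example controls or blanks) is an assumption on our part and is not stated in the response, and the 384-well projection simply assumes that strain throughput scales linearly with well count.

```python
# Back-of-the-envelope check of the weekly InToxSa throughput quoted above.
PLATES_PER_WEEK = 7          # from the response
WELLS_PER_96_PLATE = 96
STRAINS_PER_WEEK = 98        # from the response
REPLICATES_PER_STRAIN = 6    # from the response

total_wells = PLATES_PER_WEEK * WELLS_PER_96_PLATE
sample_wells = STRAINS_PER_WEEK * REPLICATES_PER_STRAIN

# Wells not occupied by strain replicates. ASSUMPTION: these hold controls
# or blanks; the response does not say how they are used.
remaining_wells = total_wells - sample_wells
remaining_wells_per_plate = remaining_wells // PLATES_PER_WEEK

# ASSUMPTION: moving to 384-well plates scales strain throughput linearly,
# i.e. by the 4-fold factor mentioned in the response.
scale_factor = 384 // WELLS_PER_96_PLATE
projected_strains_per_week = STRAINS_PER_WEEK * scale_factor

print(f"total wells per week:       {total_wells}")
print(f"wells used by replicates:   {sample_wells}")
print(f"remaining wells per plate:  {remaining_wells_per_plate}")
print(f"projected strains per week: {projected_strains_per_week}")
```

      Under these assumptions the quoted numbers are internally consistent: the strain replicates fill most of each plate, and a 4-fold upscaling would put the projected weekly throughput in the range of the whole 387-strain collection.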

      2) The second is that the main output of the assay is actually reduced intracellular toxicity (PI uptake AUC), which is inferred to be strongly linked to increased intracellular persistence. The linkage between the phenotypes comes primarily from microscopic studies on a limited number of strains. It may be true of all cases, but the possibility exists that for some of the strains, reduced cytotoxicity may be associated with intracellular elimination, which would presumably be a negative outcome for systemic infection.

      Whilst the reviewer’s comment is pertinent, we note that the low cytotoxicity of the least cytotoxic S. aureus isolates identified by the InToxSa assay did not result from bacterial clearance, intracellular bacterial growth defects or escape from the cellular niche. We assessed intracellular bacterial loads at 3h and 24h (post-bacterial uptake) in experimental conditions using cell-impermeant antibiotics (which kill extracellular bacteria and prevent over-infection of non-infected bystander HeLa cells), as shown in Figures 5F and 5H and in Figure 5 Supplementary figure 5. These data highlight an inverse correlation between cytotoxicity and intracellular persistence.

      Reviewer #2 (Public Review):

      1) …Thus, my concerns are focused on further understanding the practical utility of the approach and whether or not the HeLa cell model recapitulates what happens in professional phagocytes.

      HeLa cells have proven a useful cellular model in infection and pathogen biology to assess the ability of bacterial pathogens to invade, persist and replicate within host cells. Several studies have convincingly used HeLa cells to assess S. aureus phenotypes at the bacteria-host cell interface, as exemplified by the following recent research (DOIs: 10.1128/mBio.02250-20, 10.1371/journal.ppat.1009874, 10.1038/s41598-019-51894-3, and 10.1128/mSphere.00374-18). We also acknowledge the limitations of cell line models in the discussion (Lines 494-510).

      2) …it is not clear to me that this system has the statistical power to find novel, biologically relevant rare mutations without first being very mindful in selecting strains that are extremely genetically similar.

      As described, this is a S. aureus bacteraemia study, wherein the strains composing the collection are, by definition, closely related. We articulated this in the manuscript: “We used InToxSa to identify S. aureus pathoadaptive mutations, enriched in bacterial populations that are associated with human disease (e.g., upon transit from colonising to invasive).” “We hypothesised that these mutations would support an intracellular persistence for S. aureus.” We see no foreseeable reason that would prevent this type of study from being replicated elsewhere.

      3) It is also not clear to me that the toxicity assay captures the important features of the intracellular persistence that occurs in vivo within professional phagocytic cells.

      Response: Indeed, it is possible that InToxSa using HeLa cells may not capture the features of intracellular S. aureus persistence within professional phagocytes. However, our data show that it remains possible to uncover genomic features related to intracellular cytotoxicity and persistence, both traits relevant to S. aureus-host cell biology. The cells forming physical barriers, such as epithelial and endothelial cells, play major roles in staphylococcal pathobiology. Whilst HeLa cells are a model cell line, their tractability makes them ideal for high-throughput studies tested over longer infection times.

    1. Author Response

      Reviewer #1 (Public Review):

      Mermithid nematodes are ecologically important parasitoids of arthropods, annelids and mollusks today. Their fossil record in amber reaches back into the Early Cretaceous, some 135 million years ago. Luo et al. more than triple this record by presenting, with ample illustrations, exceptionally well preserved new specimens from the beginning of the Late Cretaceous (99 Ma ago) of Myanmar. Their most important finding is that mermithids parasitized a number of insect clades in the Cretaceous that they are not known to infect today or in Cenozoic amber; further, the proportion of holometabolous insects among the hosts is found to be lower in the Cretaceous than in the Cenozoic. The strengths of the paper lie in the specimens, the illustrations of the specimens, and the documentation of when, where and how the specimens were acquired. Certain nomenclatural aspects of the paper require improvement. A potential weakness of the paper could be collection bias: it is not tested whether the collections used to show the shift toward holometabolous hosts from the mid-Cretaceous to the Cenozoic are representative of the fossil record as it is preserved and accessible today.

      Thank you very much for pointing out these issues. We have added a new Figure 10 and Table 1 to our paper. Indeed, collection bias is present in almost all amber biotas. However, we believe we have robust reasons to argue that the shift to holometabolous hosts does exist. Although Kachin amber has only been studied extensively in the last two decades (compared with centuries of study of Baltic amber or Dominican amber), it has become by far the most intensively studied amber biota since its Cretaceous age was appreciated, now comprising an exceptional 700 families (Ross, 2023). Also, the fossil record of holometabolous insects in Kachin amber is clearly much better than that of heterometabolous insects (1296 spp. vs 465 spp. respectively). But as shown in our paper, the nematodes we found in Kachin amber are mainly associated with heterometabolous insects. Therefore, even if collection bias exists, for example in the form of some unreported nematode-Holometabola associations, we believe our conclusion about the shift is robust. We have also added some explanation to our paper.

      Reviewer #2 (Public Review):

      This manuscript reports on mermithid nematode fossils from amber which dates from the Cretaceous period. The specimens described in the manuscript consist of insects and associated nematodes which have been trapped in amber and fossilised. The nematodes have been identified as belonging to the Mermithidae family, a family of nematode worms that infect insects. The findings of this manuscript provide an insight into the evolutionary history of nematodes and parasitism. Despite the ubiquity of both nematodes and parasites in extant ecosystems, fossil records of both are very rare. This is because nematodes and many parasites are soft bodied, and many are located inside their hosts' bodies, thus they rarely become fossilised. Thus, most of what is known about the evolutionary history of nematodes and the evolution of parasitism is based on what could be inferred from extant examples.

      The specimens described in this manuscript provide a valuable contribution to our understanding of parasitism in the geological past. These amber specimens are a snapshot of parasite-host interactions - interactions which are commonly found in nature but are rarely captured in fossils. The identification of the specimens as mermithid nematodes is based on sound scientific reasoning. The worms' morphology and position in relation to the insects are consistent with what has been observed with extant mermithid nematodes.

      Additionally, one of the values of such parasite fossils is that they provide us with insight into parasite-host combinations or interactions which may have existed throughout the geological past, but no longer exist today or cannot be inferred from extant taxa. It helps fill in major gaps in our understanding of parasitism. This was the case with the amber fossil that contained a bristletail with its nematode parasite.

      We are very grateful for the positive and encouraging comments.

      Reviewer #3 (Public Review):

      The authors provide a timely description of new mermithid nematodes from Cretaceous amber and use it to argue for an important shift in insect host exploitation. The descriptions are state-of-the-art and will become valid once the appropriate ZooBank numbers are used after publication. The authors also compiled crucial and detailed new information on host exploitation in amber nematodes in the supplementary material. These data are also depicted in pie diagrams and seem at first glance to support their interpretation of a shift in host exploitation in fossil amber deposits, provided they are analysed appropriately and statistically, but such an analysis and its depiction should be part of the main manuscript to do the compilation and interpretation justice. For the sake of reproducibility and the field, such a fundamental statistical analysis, as well as a statistical comparison with modern hosts, would make this broad-sweeping claim of a major host shift since the Cretaceous, and of the importance of amber deposits containing such nematode-insect interactions, (even) more robust and fundamental.

      Thanks. We recognized this drawback and have now calculated 95% confidence intervals using the Agresti-Coull method, via the “binom.confint” function from the binom R package (https://cran.r-project.org/package=binom) in R 4.2.2. We also added a new Figure 10 and Table 1 to our paper. However, because we compiled the “occurrences” of invertebrate–nematode associations from these amber localities, a direct comparison with modern mermithids is not possible. For example, the parasite Cretacimermis chironomae occurs five times in Kachin amber, whereas an extant dipteran-parasitizing mermithid species can occur many times in just a single pond. Nevertheless, it is evident that mermithids and all invertebrate-parasitizing nematodes prefer to infect holometabolous insects rather than other invertebrates (Poinar, 1975; Poinar, personal observation). We have also added some explanation to our paper.
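      For readers unfamiliar with the Agresti-Coull interval mentioned above, the following is a minimal Python sketch of the standard textbook formula. It is not the authors' R code; the function name `agresti_coull` and the example counts are hypothetical and chosen only for illustration, and clipping the bounds to [0, 1] is a convenience choice in this sketch rather than a documented property of any particular package.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull(successes: int, trials: int, conf_level: float = 0.95):
    """Return (lower, upper) Agresti-Coull bounds for a binomial proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Adjusted sample size and adjusted proportion (adds z^2 pseudo-trials,
    # half of them counted as successes).
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to the valid range of a proportion (a choice made in this sketch).
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical counts, for illustration only: 5 of 20 recorded
# nematode-host associations involving holometabolous hosts.
lower, upper = agresti_coull(5, 20)
print(f"observed proportion: {5 / 20:.2f}")
print(f"95% Agresti-Coull CI: ({lower:.3f}, {upper:.3f})")
```

      Because the interval is centred on the adjusted proportion rather than the raw one, it behaves better than the simple Wald interval for the small occurrence counts typical of amber compilations.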

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Li et al characterize sex differences in the impact of macrophage RELMa in protection against diet-induced obesity [DIO]. This is a key area of interest as obesity studies in mice have generally focused exclusively on male animals, as they tend to gain more weight, faster than female mice. The authors use a combination of flow cytometry, adoptive transfer, and single-cell transcriptomics to characterize the mechanism of action for female-specific DIO protection. They identify a potential role for eosinophils in mediating female DIO protection downstream of RELMa production by macrophage. They also use the transcriptomic characterization of the stromal vascular fraction of the adipose tissue to evaluate molecular and cellular drivers of this sex-specific DIO protection.

      Although the authors provide solid evidence for many claims in the manuscript, there is generally not enough information about the studies' methods (especially on the computational/data analysis aspects) for a careful evaluation of the result's robustness at this stage.

      We have significantly expanded the methodology, especially of the scRNAseq, and deposited the script and raw data in public repositories. We also validated our methods and can confirm that the analysis presented is robust. This resubmission contains new Fig 7 and new supplementary material with this methodology and validation.

      Reviewer #2 (Public Review):

      In the study by Li et al., the authors hypothesize that RELMa, a macrophage-derived protein, plays a sex-dimorphic role as a protective factor in obesity in females vs males. The authors perform largely in vivo studies utilizing male and female WT and RELMa KO mice on a high-fat diet and perform an in-depth analysis of immune cell composition, gene expression, and single-cell RNA Sequencing. The authors find that WT females are protected from obesity and inflammation vs males, and this protection is lost in female RELMa KO mice. Further analysis by the authors including flow cytometry of the visceral fat SVF in female WT mice showed reduced macrophage infiltration, higher levels of eosinophils, and Th2 cytokine expression compared to WT male mice and female KO mice. The authors show that protection from obesity and inflammation in female RELMa KO mice can be rescued with an injection of eosinophils and recombinant RELMa. Lastly, the authors use single-cell RNA-Sequencing to further analyze SVF cells in WT and KO male and female mice on a high-fat diet.

      Overall, we find that the study represents an important finding in the immunometabolism field showing that RELMa is a key myeloid-derived factor that helps influence the macrophage-eosinophil function in female mice and protects from diet-induced obesity and inflammation in a sexually dimorphic manner. Overall, the study provides strong and convincing data supporting the authors' hypothesis and conclusion.

      We thank the reviewer for their positive review of our manuscript and their helpful feedback which we address below.

      Reviewer #3 (Public Review):

      Li, Ruggiero-Ruff et al. examine the role of RELMα, an anti-inflammatory macrophage signature gene, in mediating sex differences in high-fat diet (HFD)-induced obesity in young mice. Specifically, the authors hypothesize that RELMα protects females against HFD-induced obesity. Comparisons between RELMα-knockout (KO) and wildtype (WT) mice of both sexes revealed sex- and RELMα-specific differences in weight gain, immune cell populations, and inflammatory signaling in response to HFD. RELMα-deficiency in females led to increased weight gain, expansion of pro-inflammatory macrophage populations, and eosinophil loss in response to HFD. Female RELMα-deficiency could be rescued by RELMα treatment or eosinophil transfer. Single-cell RNA-sequencing (scRNA-seq) of adipose stromal vascular fraction (SVF) revealed sex- and RELMα-dependent differences under HFD conditions and identified potential "pro-obesity" and "anti-obesity" genes in a cell-type-specific manner. Using trajectory analysis, the authors suggest dysregulation of macrophage-to-monocyte transition in RELMα-deficient mice.

      The conclusions of this paper are mostly well supported by the data, but some aspects of the statistical and single-cell analyses will need to be corrected, clarified, and extended to enhance the report.

      We thank Dr. Ocanas for their positive comments and for the helpful feedback to improve our study. We have addressed all the comments and significantly revised the manuscript.

      Strengths:

      The authors use several orthogonal approaches (i.e., flow cytometry, immunohistochemistry, scRNA-Seq) and models to support their hypotheses.

      The authors demonstrate that phenotypes observed in HFD-fed females with RELMα-deficiency (i.e., weight gain, loss of eosinophils, a gain of M1 macrophages) can be rescued by RELMα treatment or eosinophil transfer.

      The authors recognized the complexity of macrophage activation that is beyond the 'M1/M2' paradigm and informed readers in the introduction as to why this paradigm was used in this study. During the scRNA-seq analyses, the authors further sub-cluster macrophages to include more granularity.

      Weaknesses:

      1) There are several instances in the text where the authors claim that there is a significant difference between the two groups, but the statistics for these comparisons are not shown in the figure.

      Because we are dealing with three variables (genotype, diet, and sex) and many differences, we thought it too complicated to show every significant difference on the graphs. Instead, we sometimes mentioned these in the text with a p value, or did not mention them at all if the difference was obvious or not meaningful (for example, we weren’t interested in comparing a WT male on a Ctr diet with a RELMalpha KO female on a HFD for the purpose of our hypothesis). We have now ensured clarity in the text and in the figures, and addressed the specific point-by-point comments from the reviewer. We have also carefully re-evaluated the text to ensure that any significant differences we discuss are shown in the figure.

      2) It is unfortunate that eosinophils could not be identified in the single-cell analysis since this population of cells was shown to be important in rescuing the RELMα-deficiency in HFD-fed females. The authors should note in the discussion how future scRNA-Seq experiments could overcome this limitation (i.e., enriching immune cells prior to scRNA-Seq).

      We were indeed disappointed that we were not able to obtain eosinophil single cell seq, but realize that this is a reported issue in the field. We have expanded our discussion of this and cited a paper that performs eosinophil single cell sequencing (published at the time our manuscript was being submitted): “At the same time as our ongoing analysis, the first publication of eosinophil single cell RNA-seq was published, using a flow cytometry based approach rather than 10x, including RNAse inhibitor in the sorting buffer, and performing prior eosinophil enrichment (PMID: 36509106). Based on guidance from 10x, we employed targeted approaches to identify eosinophil clusters according to eosinophil markers (e.g. Siglecf, Prg2, Ccr3, Il5r), and relaxed the scRNA-Seq cutoff analysis to include more cells and intronic content, but still could not find eosinophils. We conclude that eosinophils may be absent due to the enzyme digestion required for SVF isolation and processing for single cell sequencing, which could lead to specific eosinophil population loss due to low RNA content, RNases or cell viability issues. Future experiments would be needed to optimize eosinophil single cell sequencing, based on the recent publication of eosinophil single cell sequencing.”

      3a) There are several issues with the scRNA-Seq analysis and interpretation. More details on the steps taken in the single-cell analyses should be included in the methods section.

      We agree with the reviewer that more details on the steps taken in the single cell data processing and bioinformatics need to be included in the methods section. We have added more information on the methodology used for these approaches, separated into sections within the data processing section of the Materials and Methods, and provided the code for our data processing in a public Github repository: https://github.com/rrugg002/Sexual-dimorphism-in-obesity-is-governed-by-RELM-regulation-of-adipose-macrophages-and-eosinophils.

      b) With regards to the 'pseudobulk' analyses presented in Figs. 5-6, several of the differentially expressed genes identified in Fig. 6 are hemoglobin genes (i.e., Hba, Hbb genes). It is not uncommon to filter these genes out of single-cell analysis since their presence usually indicates red blood cell (RBC) contamination (PMID: 31942070, PMID: 35672358). We would recommend assessing RBC contamination as well as removing Fig. 6 from the manuscript and focusing on cell-type-specific analyses. Re-analysis will likely have an impact on the overall conclusions of the study.

      Prior to our first submission, we consulted with 10x support scientists and the UCR bioinformatics core director to ensure that our analysis included the appropriate filtering. We have now added details in the Methods. The PMIDs provided above are from studies that looked at hippocampus development (where they didn’t perfuse so there may be blood contamination) or whole blood (where there would be significant red blood cell contamination). In contrast, we perfused our mice and treated the single cell suspension with RBC lysis buffer, as detailed in Methods. Also, we have now extended our scSeq analysis to compare hemoglobin RNA to red blood cell specific markers including Gypa/CD235a. While hemoglobin is distributed throughout the myeloid population in the female KO mice, Gypa/CD235a, whose presence would indicate RBC contamination, is not expressed at all (see new Fig 7B). Additionally, we provide hemoglobin protein ELISA and IF staining to support our finding that macrophages from KO mice express hemoglobin protein. Last, two publications support hemoglobin expression by nonerythroid sources, including macrophages (PMID: 10359765; PMID: 25431740). While we are confident based on the above that our data are not due to RBC contamination, we cannot exclude the possibility that, although unlikely, macrophages phagocytose RBCs and specifically preserve hemoglobin RNA and protein. We discuss this possibility in the text. In conclusion, based on the justification above and the new data, we are confident that our findings and overall conclusions are robust.

      To assess potential RBC contamination, in addition to Gypa, we looked at the top genes expressed by murine erythrocytes (PMID: 24637361). Please see the feature plots below, which show little to no expression and a very different distribution from the hemoglobin genes (see new Fig 7a):

      Also, we had a small cluster of potential RBCs (only 75 cells) that we filtered out of the downstream DEG analysis, which yielded the same results as in the first submission.
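
      The marker comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the toy counts, and the specific hemoglobin gene symbols (Hba-a1, Hbb-bs) are assumptions chosen for illustration. The idea is simply to compare the fraction of cells expressing hemoglobin transcripts with the fraction expressing an erythrocyte marker such as Gypa.

```python
from typing import Dict, List

# Hypothetical toy data: each cell maps a gene symbol to a UMI count.
Cell = Dict[str, int]


def fraction_expressing(cells: List[Cell], gene: str, min_count: int = 1) -> float:
    """Return the fraction of cells with at least `min_count` counts for `gene`."""
    if not cells:
        return 0.0
    hits = sum(1 for cell in cells if cell.get(gene, 0) >= min_count)
    return hits / len(cells)


# Illustrative values only; these are not measurements from the study.
toy_myeloid_cells = [
    {"Hba-a1": 12, "Hbb-bs": 9, "Gypa": 0},
    {"Hba-a1": 4, "Hbb-bs": 0, "Gypa": 0},
    {"Hba-a1": 0, "Hbb-bs": 0, "Gypa": 0},
    {"Hba-a1": 7, "Hbb-bs": 3, "Gypa": 0},
]

for gene in ("Hba-a1", "Hbb-bs", "Gypa"):
    print(gene, fraction_expressing(toy_myeloid_cells, gene))
```

      In this toy example, hemoglobin transcripts are detected in most cells while the erythrocyte marker is detected in none, which is the pattern that would argue against RBC carry-over.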

      4) Within the text, there are several instances where the authors claim that a pathway is upregulated based on their Gene Ontology (GO) over-representation analysis (ORA). To come to this conclusion, the authors identify genes that are upregulated in one condition and then perform GO-ORA on these genes. However, the authors do not consider negative regulators, whose upregulation would actually decrease the pathway. Authors should either replace their GO-ORA analysis with one that considers the magnitude and direction of differentially expressed genes and provides an activation z-score (i.e., Ingenuity Pathway Analysis) or replace instances of 'upregulated' or 'downregulated' pathways with 'over-represented' pathways.

      Unfortunately, we did not have access to IPA for this project; we have therefore changed our analysis to report over- and under-represented pathways, as suggested.
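
      To make the distinction concrete, here is a minimal sketch of the one-sided hypergeometric test that underlies a typical over-representation analysis. The function name and arguments are assumptions for illustration, not the authors' code. The test only counts how many selected genes fall in a gene set; it does not use the direction or magnitude of change, which is why 'over-represented' is the wording it supports.

```python
from math import comb


def ora_p_value(n_background: int, n_in_set: int, n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: genes in the background universe
    n_in_set:     background genes annotated to the gene set
    n_selected:   genes in the query list (e.g. differentially expressed genes)
    n_overlap:    query genes that are annotated to the gene set
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 5 in the set, 5 selected, all 5 overlapping.
print(ora_p_value(10, 5, 5, 5))
```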

      5) For Fig.7A, a representative tSNE plot for each group (WT Female, KO Female, WT Male, KO Male) should be shown to ensure there is proper integration of the clusters across groups. There are some instances where the scRNA-Seq data do not appear to be integrated properly (i.e., Supplemental Figure 2C). The authors should explore integration techniques (i.e., Seurat; PMID: 29608179) to correct for potential batch effects within the analysis.

      We thank the reviewer for the suggestion of proper integration of the clusters across groups. We performed integration using the Cell Ranger aggregation (aggr) pipeline (see updated materials and methods section). In addition, many technical controls were performed to prevent batch effects between our samples. For sequencing, we used the 10x Genomics library sequencing depth and run parameters for both gene expression and multiplexing libraries. For all 3’ gene expression library sequencing, we sequenced at a depth of 20,000 read pairs per cell and for all cell multiplexing library sequencing we sequenced at a depth of 5,000 read pairs per cell. All libraries were paired-end dual indexed libraries and were pooled on one flow cell lane using a 4:1 ratio (3’ Gene expression: Multiplexing ratio) in the NovaSeq, as recommended by 10x Genomics, in order to maintain nucleotide diversity and prevent batch effects during the sequencing process. When performing integration/aggregation of all sample gene expression libraries using the Cell Ranger aggregation (aggr) pipeline, we performed sequencing depth normalization between all samples. Cell Ranger does this by equalizing the average read depth per cell between groups before merging all sample libraries and counts together. This is a default setting in the Cell Ranger aggr pipeline, and this approach avoids artifacts that may be introduced due to differences in sequencing depth. Thus, we are confident that changes we observed in gene expression and cell type populations are due to biological differences and not technical variability. Below we have provided a tSNE plot showing clustering of all 12 samples after we performed integration:

      We updated old Fig.7 (now Fig. 6) and included a representative tSNE plot for each group. We also updated the tSNE plot for Figure 5-figure supplement 2C (previously S2C) showing overall clustering amongst all groups. The largest population differences occurred in the fibroblast population and these population differences were largely due to sex differences. Because we are confident that integration was performed appropriately and that batch effects were controlled for, we believe these sex differences are a biological effect.
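
      The depth normalization mentioned in this response can be sketched as follows. This is an illustration of the idea only, not Cell Ranger's implementation: the function name and library labels are hypothetical. Each library is subsampled so that all libraries end up with the same mean read depth per cell as the shallowest one.

```python
from typing import Dict


def downsample_fractions(mean_reads_per_cell: Dict[str, float]) -> Dict[str, float]:
    """Fraction of reads to keep per library so all match the shallowest library."""
    if not mean_reads_per_cell:
        raise ValueError("at least one library is required")
    if any(depth <= 0 for depth in mean_reads_per_cell.values()):
        raise ValueError("read depths must be positive")
    target = min(mean_reads_per_cell.values())
    return {lib: target / depth for lib, depth in mean_reads_per_cell.items()}


# Hypothetical library names and depths, for illustration only.
toy_depths = {"WT_F": 20000, "KO_F": 40000, "WT_M": 25000}
print(downsample_fractions(toy_depths))
```

      The shallowest library keeps all of its reads and deeper libraries are subsampled, so differences in sequencing depth do not masquerade as differences in expression.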

      6) LncRNA Gm47283 is identified as a gene that is differentially expressed by genotype in HFD females (Fig. 7G); however, according to Ensembl this gene is encoded on the Y-chromosome (https://uswest.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000096768;r=Y:90796007-90827734). The authors should use the RELMα genotype and sex chromosomally-encoded genes to confirm that their multiplexing was appropriate.

      We agree with the reviewer that it is crucial to confirm that multiplexing and all subsequent analyses are performed correctly. Comparison between males and females contains internal controls that increase confidence, such as the Xist gene, which is expressed only in females, and Ddx3y, which is located on the Y chromosome. The lncRNA Gm47283 is located in the syntenic region of the Y chromosome and is also present in females, annotated as Gm21887 and located in the syntenic region of the X chromosome. It also has 100% alignment with Gm55594 on the X chromosome. Additionally, it is also referred to as erythroid differentiation regulator 1 (Erdr1), x or y depending on the chromosome, although the NCBI database specifies partial assembly and incomplete annotation. This explains why we see expression of this gene in females. We have discussed this in the text and revised it to refer to this lncRNA as Gm47283/Gm21887 to prevent further confusion. The RELMalpha genotype (absence in the KO) was also confirmed. Last, the PC analysis (see Fig 5) supports clustering by group.
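
      The internal-control logic described here (Xist in females, Ddx3y in males) can be expressed as a small consistency check. The sketch below is an assumption-laden illustration rather than the authors' code: the function names, the count threshold, and the toy samples are all hypothetical.

```python
from typing import Dict, List


def infer_sex(xist_count: int, ddx3y_count: int, min_count: int = 1) -> str:
    """Infer sex from Xist (female-expressed) and Ddx3y (Y-encoded) counts."""
    has_xist = xist_count >= min_count
    has_ddx3y = ddx3y_count >= min_count
    if has_xist and not has_ddx3y:
        return "female"
    if has_ddx3y and not has_xist:
        return "male"
    return "ambiguous"


def mismatched_samples(samples: Dict[str, Dict[str, object]]) -> List[str]:
    """Return sample IDs whose labeled sex disagrees with the marker-based call."""
    return [
        sample_id
        for sample_id, info in samples.items()
        if infer_sex(info["Xist"], info["Ddx3y"]) != info["labeled_sex"]
    ]


# Hypothetical pseudobulk counts per demultiplexed sample.
toy_samples = {
    "S1": {"labeled_sex": "female", "Xist": 540, "Ddx3y": 0},
    "S2": {"labeled_sex": "male", "Xist": 0, "Ddx3y": 210},
    "S3": {"labeled_sex": "female", "Xist": 0, "Ddx3y": 180},
}
print(mismatched_samples(toy_samples))
```

      A sample flagged by such a check would point to a demultiplexing or labeling problem; an empty list is consistent with correct sample assignment.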

      7) For Fig. 8, samples should be co-clustered and integrated across groups before performing trajectory analysis to allow for direct comparisons between groups.

      We appreciate the valuable feedback and suggestions, which have helped us clarify the trajectory analysis as follows:

      Regarding the co-clustering and integration of our samples across groups, we explain our trajectory analysis approach here. We co-clustered all of our samples using the align_cds function from the Monocle3 package. We have included the code for Figure 8 in our Github repository at https://github.com/rrugg002/Sexual-dimorphism-in-obesity-is-governed-by-RELM-regulation-of-adipose-macrophages-and-eosinophils/blob/main/Figure8.R. Specifically, lines 138, 166, 196 and 225 of the code indicate that the align_cds function was used to align our samples by "Sample.ID".

      The align_cds function in Monocle3 aligns cells from different groups within a cell_data_set (CDS) object. When given a grouping of the cells (here, "Sample.ID"), it removes discrete batch effects between the groups using a mutual nearest neighbor approach (via the Batchelor package) and stores the adjusted low-dimensional coordinates, so that downstream clustering and trajectory inference are performed on samples that have been aligned together. More details about align_cds can be found here https://rdrr.io/github/cole-trapnell-lab/monocle3/man/align_cds.html .

      We hope that this additional information alleviates the reviewer’s concerns.

      8) Since the experiments presented in this report were from young mice using a single diet intervention, the authors should comment on how age and other obesogenic diets may impact the results found here. Also, the authors should expand their discussion as to what upstream regulators (i.e., hormones or genetics) may be driving the sex differences in RELMα expression in response to HFD.

      We thank the reviewer for the suggestion. We included several sentences to address this comment. However, since reviewers commented that some of the text needs to be trimmed down, extensive discussion of the numerous reasons for sex differences is outside the scope of this manuscript. For example, sex differences can arise from any or all of these:

      1. Sex steroid hormones (estrogen and testosterone) are an obvious possibility for sex differences and this discussion has been included below and in the text.

      2. Sex differences we observe may stem from a variety of other factors besides ovarian estrogen, including extraovarian estrogen, primarily estrogen produced in adipose tissues (32119876).

      3. Sex differences exist in fat deposition, which may or may not be estrogen dependent (25578600, 21834845).

      4. Sex differences were reported in metabolic rate and oxidative phosphorylation, which may also be independent of estrogen (28650095, and reviewed in 26339468).

      5. Sex differences exist in the immune system, some of which are estrogen independent, but dependent on sex chromosomes (32193609).

      6. Sex differences exist particularly in the myeloid lineage, which may also be estrogen independent (25869128).

      7. Sex differences were determined in adipokine levels, including leptin and adiponectin, which influence immune cells in adipose tissues (33268480).

      The role of estrogen is not clear either, and thus extensive discussion is not possible. Numerous studies demonstrated that estrogen is protective from inflammation, thus it is possible that estrogen drives some of the sex differences observed herein. However, several studies determined that estrogen can be pro-inflammatory (20554954, 15879140, 18523261). Previous publications by us (30254630, 33268480) and others (25869128) demonstrated intrinsic sex differences in the immune system that may be dependent on sex chromosome complement and/or Xist expression (34103397, 30671059).

      Studies more consistently show that estrogen protects against weight gain: postmenopausal women with diminished estrogen and ovariectomized animal models gain weight. The effects of ovariectomy on weight gain and its additive effects with high fat diet were reported in Rhesus monkeys (for example PMID: 2663699; and PMID: 16421340) and in rodents (PMID: 7349433).

      The reviewer is correct that the effects of aging or estrogen on RELMa levels would be of significant interest, and could be a future direction of our studies. Aging-mediated increase in inflammation (including of adipose tissue, recently reviewed in 36875140), that may be dependent on estrogen, can exacerbate obesity-mediated inflammation. We have added this discussion.

      For these reasons we limited our discussion regarding possible differences and stated this in the discussion: “Several studies demonstrated the protective role of estrogen in obesity-mediated inflammation and in weight gain, as discussed above. Whether estrogen protection occurs via estrogen regulation of RELMa levels is a focus of our future studies. Alternatively, intrinsic sex differences in immune system have been demonstrated as well (30254630, 33268480, 25869128) that are dependent on sex chromosome complement and/or Xist expression (34103397, 30671059), and RELMa may be regulated by these as well. Additionally, ageing-mediated increase in inflammation (including of adipose tissue, recently reviewed in 36875140), may also occur via changes in RELMa levels. Our studies used young but developmentally mature mice (4-6 weeks old when placed on diet, 18 weeks old at sacrifice), and future work on aged mice would be needed to investigate aging-mediated inflammation. Furthermore, there are sex differences in fat deposition, metabolic rates and oxidative phosphorylation (reviewed in 26339468), and adipokine expression (Coss) that regulate cytokine and chemokines levels, and therefore may regulate levels of RELMa as well. These possibilities will be addressed in future studies.”

    1. Author Response

      Reviewer #2 (Public Review):

      Yamaguchi et al. studied the roles of two proteins, Calaxin and Armc4, in the assembly of the outer arm dynein (OAD) docking complex (DC). By combining improved cryo-ET analysis with gene knockout zebrafish lacking each of these proteins, they found that Armc4 plays a critical role in the docking of OAD and that Calaxin stabilizes the molecular interaction in the docking. They further showed evidence that Calaxin changes the conformation of another compartment of DC comprising CCDC151/114. This new information provides an important basis for understanding how the DC is assembled and regulates docking of OAD. The authors' conclusion is well supported by the data but some data presentation and discussion need to be completed.

      Gui et al. (2021) already reported a cryo-EM observation in bovine tracheal cilia, with a conclusion similar to this paper regarding the structure of OAD/DC on DMT. Using knockout zebrafish strains, the authors present the detailed interaction of calaxin with other DC components. They show that the binding of calaxin induces changes in the conformation of the N-terminal region of CCDC151/114. The conformation changes further in the presence of Ca2+: the specific conformation of the N-terminal region of CCDC151/114 becomes undetectable, and instead an additional structure appears in the vicinity of calaxin.

      1) The authors conclude that the Ca2+-dependent conformational change of DC is subtle and not dynamic. This result is eventually valuable information but may be somewhat unexpected from the point of view that calaxin plays an important role in the regulation of flagellar motility in Ciona sperm. The authors found that calaxin changes the conformation of N-terminal CCDC151/114 region but the core dynein structure shows no dynamic change. What about the changes in the interaction between calaxin, core dynein, and DMT? Is this beyond the resolution of cryo-ET analysis?

      Since Mizuno et al., 2009 reported that Ciona Calaxin switches its interactor depending on Ca2+ concentration, it is highly expected that zebrafish Calaxin also changes its interactor in 1 mM Ca2+ buffer conditions. However, the resolution of our cryo-ET data was insufficient to detect the change of Calaxin interactors. More detailed structural analyses are required to understand the OAD structures in the Ca2+ buffer conditions. We discussed this point as follows:

      (line 389-395)

      Regarding the Calaxin conformation, a previous biochemical analysis reported that Ciona Calaxin switches its interactor depending on Ca2+: β-tubulin at lower Ca2+ concentration and OAD γ-HC at higher Ca2+ concentration (Mizuno et al., 2009). Moreover, a crystal structure analysis revealed the conformational transition of Ciona Calaxin toward the closed state by Ca2+-binding (Shojima et al., 2018). In this study, however, such conformation change of Calaxin was not detected, probably due to insufficient resolution of our cryo-ET analysis. More detailed structural analyses in the Ca2+ condition are required to understand the mechanism of the Ca2+-dependent OAD regulation.

      2) It would be very helpful if the authors could add the cryo-ET images of calaxin-/- axoneme in the presence of 1 mM EGTA in Figure 7. Although these images are thought to be similar or identical to Figure 4F, it would help to confirm that the conformational changes in CCDC151/114 and additional part of DC are induced in a Ca2+-dependent manner.

      We added the cryo-ET images of calaxin-/- OAD-DC (1 mM EGTA) in Figure 7D.

      3) To clarify the molecular interaction of calaxin with other components, it would also be helpful if the authors added images rotated 80 degrees to Figure 4F and G, in a similar way to Figure 7.

      We added the images of OADs rotated 80 degrees in Figure 4F and G.

      4) Despite the molecular phylogenetic difference, there are several similarities between calaxin and Chlamydomonas DC3, not only in the in situ structure and configuration but in the phenotype of mutants; Chlamydomonas mutant lacking DC3 shows OAD loss in the distal part of a flagellum (Casey et al, MBC, 2003). It may be a good reference if the authors add the position of DC3 in Figure 4. A', B', and C.

      To answer this comment, we created Figure 4—figure supplement 1, which shows the cryo-ET structures and models of OAD-DCs in vertebrates and Chlamydomonas.

      5) There is a significant difference in sperm motility between WT and calaxin-/- or WT and armc4-/- (Figure 2E). However, it is not clear whether immotile sperm were included in the data for VAP (Figure 2F) or BCF (Figure 2G). For example, WT and calaxin-/- show similar VAP, although both are significantly different in the percent of motile sperm.

      In our CASA study, spermatozoa with velocities below 20 μm/s were considered immotile and excluded from the data for VAP (Figure 2F) and BCF (Figure 2G). To clarify this point, we revised the manuscript as follows:

      before

      Swimming velocity and beating frequency were calculated from the trajectories of the motile spermatozoa (Figure 2F-G; Figure 2—figure supplement 1; Video3).

      after (line 139-141)

      Swimming velocity (VAP) and beating frequency (BCF) were calculated from the trajectories of the motile spermatozoa, which have 20 μm/s or more velocities (Figure 2F-G; Figure 2—figure supplement 2; Video3).
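
      The effect of this filter can be shown with a minimal sketch. The threshold of 20 μm/s comes from the response above; the function name and the toy velocities are assumptions for illustration and are not CASA output. Because immotile tracks are excluded before averaging, two genotypes can differ in percent motility yet show similar mean VAP.

```python
from statistics import mean
from typing import List, Optional, Tuple

MOTILE_THRESHOLD_UM_PER_S = 20.0  # tracks at or above this velocity count as motile


def summarize_tracks(
    vap_values: List[float], threshold: float = MOTILE_THRESHOLD_UM_PER_S
) -> Tuple[float, Optional[float]]:
    """Return (percent motile, mean VAP of motile tracks only)."""
    if not vap_values:
        return 0.0, None
    motile = [v for v in vap_values if v >= threshold]
    percent_motile = 100.0 * len(motile) / len(vap_values)
    mean_vap = mean(motile) if motile else None
    return percent_motile, mean_vap


# Hypothetical velocities (μm/s); not data from the study.
toy_wt = [80.0, 90.0, 100.0, 10.0]
toy_mutant = [85.0, 95.0, 5.0, 8.0]
print(summarize_tracks(toy_wt))
print(summarize_tracks(toy_mutant))
```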

      6) In calaxin-/- zebrafish, OAD was clearly detected from the base to two-thirds of a flagellum with unclear border (Figure 2A). Typical distribution of OAD+class and OAD-class are shown in Figure 5 in the ~3 micrometer tomograms. Were these taken from around this unclear border? Are proximal most region of a flagellum occupied with OAD+class only? The authors should clearly indicate the region of a flagellum where the tomograms in Figure 5C and D were selected.

      7) Line 229~: It is not clear what the authors meant by "probably reflecting the different distance from the sperm head". In relation to this and the comment 6, does the "proximal" in the sentence "OAD loss occurred even in the proximal part of the flagella" (line 232) indicate the region near the base of a flagellum?

      In general, axonemes are tangled on the cryo-TEM grids, which makes it difficult to identify the ends of all axonemes, especially for the long zebrafish sperm flagella. Thus, we could not determine which region of the flagellum the tomograms shown in Figure 5D were taken from.

      However, to answer comments (6) and (7), we created Figure 5—figure supplement 1. In this experiment, we newly generated cryo-TEM grids with sparse sperm axonemes and succeeded in finding two areas containing clear axonemal ends with suitable ice conditions for cryo-ET observations (Figure 5—figure supplement 1B). The polarity of the axonemes was judged from the 3D-reconstructed structures of the axonemes (Figure 5—figure supplement 1B, red dotted lines). By the structural classification of OAD+ class and OAD- class in the tomograms, we confirmed the OAD loss in calaxin-/- even in the proximal part of the flagella, which is near the base of a flagellum (Figure 5—figure supplement 1D, (a) and (c)). To clarify these points, we revised the manuscript as follows:

      before

      In calaxin-/-, the ratio of OAD+ class to OAD- class varied among tomograms (Figure 5D), probably reflecting the different distance from the sperm head. However, all calaxin-/- tomograms showed multiple clusters of OAD- class, indicating that the OAD loss occurred even in the proximal part of the flagella.

      after (line 236-239)

      In calaxin-/-, the ratio of OAD+ class to OAD- class varied among tomograms (Figure 5D), reflecting the different distances from the sperm head. Analysis of detailed OAD distributions along calaxin-/- axoneme revealed that OAD loss occurred even in the proximal part of the flagella (Figure 5—figure supplement 1D).

      8) In conjunction with comment 7, it would be appreciated if the authors could share their idea on why the distal region of flagella tends to lack calaxin, if this is not already discussed in the text.

      We discussed this point as follows:

      (line 316-323)

      calaxin-/- spermatozoa exhibited a unique OAD distribution, with OAD-missing clusters at various regions of the flagella. Interestingly, OADs decreased gradually toward the distal end, by which the mechanism is unclear. The axoneme is elongated by adding flagellar components to its distal end during ciliogenesis (Johnson & Rosenbaum, 1992). IFT88, a component of the IFT machinery, disappears as the spermatozoa mature (San Agustin et al., 2015). Thus, we speculate that the OAD supply at the distal sperm axoneme is insufficient to compensate for the OAD dissociation in the calaxin-/-. Consistent with this idea, distal OAD loss is the sperm-specific phenotype, as olfactory epithelial cells in calaxin-/- have Dnah8 along the entire length of the cilia (Figure 6B).

      9) Immunofluorescence in twister-/- epithelial cilia showed that the localization of calaxin is independent of OAD (line 271-274). Based on the authors' finding, the localization of calaxin requires Armc4, which is preassembled with calaxin in the cytoplasm. If this is true and the localization of calaxin is NOT resulting from diffusion, Armc4 must be localized with calaxin along the entire length of cilia in twister-/- epithelial cilia (Figure 6D). Although Armc4 is shown localized in cryo-ET images (e.g. Figure 1, Figure 7), authors may provide the immunofluorescence of Armc4 along the entire length of sperm flagella and epithelial cilia.

      To answer this comment, we obtained a commercially available anti-ARMC4 (human) antibody and checked the cross-reactivity of the antibody against zebrafish Armc4, but no signal was detected in our western blot analysis. Thus, we could not assess the localization of zebrafish Armc4 in twister-/- epithelial cilia.

      In our study, we found an ectopic accumulation of Calaxin at the ciliary base in armc4-/- cells (Figure 6C, white arrowheads). The small molecular weight of Calaxin (~25 kDa) suggests the possible diffusional entry of Calaxin into the ciliary compartment. However, in armc4-/- cells, Calaxin accumulated at the ciliary base, strongly suggesting that Calaxin requires Armc4 to be localized to cilia.

      Reviewer #3 (Public Review):

      ODA-DC anchors ODA, the main force generator of ciliary beating, onto the doublet microtubules. Vertebrate ODA-DC contains 5 proteins, including Calaxin and Armc4, whose mutations are associated with defective ciliary motility in animals and humans. By generating calaxin-/- and armc4-/- knockout zebrafish lines, this manuscript examined the Kupffer's vesicle cilia and spermatozoa. They showed that calaxin-/- and armc4-/- knockouts both affect ciliary motility but to different degrees. The authors conducted careful structural analyses using cryo-ET and subtomo averaging on both mutants, revealing a partial loss of ODA in calaxin-/- and a complete loss of ODA in armc4-/-. I really like the distribution analysis of calaxin-/- OADs (Figure 5), which emphasizes the strength of cryo-ET in uncovering the molecule distribution of distinct conformational states in situ. Fitting of the atomic models of ODA and ODA-DC into the cryo-ET density maps and Calaxin rescue experiments showed how Calaxin stabilizes ODA in molecular detail. By using olfactory epithelium, the authors also presented the possible assembly mechanism of ODA-DC proteins, which is also a beautiful experiment. Finally, the authors also investigated how Ca2+ regulates the ODA-DC using cryo-ET.

      The thorough structural and functional analyses of Calaxin and Armc4 in WT and gene KO animals could serve as a reference for future study of the detailed function of other ciliary proteins. The experiments are overall well designed and conducted, but some aspects need to be clarified and improved.

      The authors interpret the vertebrate ODA-DC to include four linkers (line 193). However, the authors also said that loss of one linker (Calaxin) leaves ODA attached to the DMT through two linkers (line 199 and 246). These descriptions are confusing. It would make more sense to interpret the vertebrate ODA-DC as containing three linkers (CCDC151/114, Armc4/TTC25, Calaxin).

      This comment is reasonable because vertebrate OAD is tethered to DMT through three linker structures (the distal CCDC151/114, Armc4/TTC25, and Calaxin). However, vertebrate DC is composed of four parts (a) Calaxin, (b) the Armc4-TTC25 complex, (c) the proximal CCDC151/114, and (d) the distal CCDC151/114 (Figure 4E). The (c) part is embedded in the cleft between protofilaments A07 and A08. To clarify this point, we revised the manuscript as follows:

      before

      The bovine DC model shows that vertebrate DC is composed of four linker structures: (a) Calaxin, (b) the Armc4-TTC25 complex, (c) the proximal CCDC151/114, and (d) the distal CCDC151/114 (Figure 4E).

      after (line 196-200)

      The bovine DC model shows that vertebrate DC is composed of four parts: (a) Calaxin, (b) the Armc4-TTC25 complex, (c) the proximal CCDC151/114, and (d) the distal CCDC151/114 (Figure 4E). Among the four parts, three (a, b, and d) work as linkers between OAD and DMT, while (c) the proximal CCDC151/114 is embedded in the cleft between protofilaments of the DMT.

      To confirm whether Calaxin directly interacts with β-tubulin (line 213), a control experiment may be needed: incubating WT axoneme with mEGFP-Calaxin, followed by IF imaging.

      In our manuscript, we wrote as follows:

      (line 218-224)

      To assess the specificity of Calaxin binding, we also performed a rescue experiment with mEGFP-Calaxin (Figure 4H-I; Figure 4—figure supplement 2). Ciona Calaxin was reported to interact with β-tubulin (Mizuno et al., 2009), suggesting the possible binding of Calaxin along the entire length of the axoneme. However, the rescued axonemes showed partial loss of EGFP signal (Figure 4H, white arrowheads). This pattern resembled the OAD localization of calaxin-/- in immunofluorescence microscopy, suggesting the preferential binding of Calaxin to the remaining OAD-DC. mEGFP alone showed no interaction with the axoneme (Figure 4H, asterisk).

      Therefore, our manuscript is NOT intended to support or deny the interaction between Calaxin and β-tubulin, which was reported by Mizuno et al., 2009. Instead, we focused on the interaction between Calaxin and OAD-DC, revealing that Calaxin binds to Calaxin-deficient OAD-DC (Figure 4G, H, and I). Thus, we assume this comment refers to the interaction between Calaxin and OAD-DC.

      To further discuss the interaction between Calaxin and OAD-DC, we created Figure 4—figure supplement 2. We tested Calaxin’s interaction by incubating recombinant mEGFP-Calaxin with sperm axonemes of calaxin-/-, armc4-/- (representing OAD-missing DMT), and WT (representing DMT with Calaxin and OAD). The localization of mEGFP-Calaxin was assessed by fluorescence microscopy of mEGFP signals. In calaxin-/-, mEGFP-Calaxin bound to a limited region of the axoneme, with partial loss of EGFP signals (Figure 4—figure supplement 2A, white arrowheads), consistent with Figure 4H. On the other hand, mEGFP-Calaxin showed no significant interaction with armc4-/- axoneme (Figure 4—figure supplement 2B) or WT axoneme (Figure 4—figure supplement 2C). These data show that Calaxin binds preferentially to the Calaxin-deficient OAD-DC rather than to OAD-missing DMT or WT OAD. Although Mizuno et al., 2009 reported the interaction between Calaxin and β-tubulin, our analysis could not detect signals for such an interaction, probably because Calaxin binds OAD-DC and β-tubulin with different affinities.

      The immunoblotting experiment in Figure 5E should be improved. Could the authors get the same results in repeated experiments? Why is the Dnah8 signal higher in 50 mM NaCl of the (+)Calaxin group compared to that in 0 mM NaCl? This makes me doubt whether the difference between the (-)Calaxin and (+)Calaxin groups is significant.

      This comment is reasonable because NaCl concentration-dependent detachment of OAD from DMT would predict the highest Dnah8 signal at 0 mM NaCl in the (+)Calaxin group. To address this point, we created Figure 5—figure supplement 2, which shows an experimental replication of the immunoblot analysis in Figure 5E. In this experiment, we used calaxin-/- sperm axonemes collected independently of the Figure 5E data.

      However, again, the Dnah8 signal was higher in 50 mM NaCl of the (+)Calaxin group than that in 0 mM NaCl, confirming the result in Figure 5E. One possible explanation for this result is that the NaCl concentration affects the rescue efficiency of the Calaxin protein. We speculate that the Calaxin protein requires NaCl for efficient binding to OAD-DC, which caused the lower amount of OAD in 0 mM NaCl of the (+)Calaxin group compared to that in 50 mM NaCl.

      The authors have covered several important points in the Discussion section. Now that the function of Calaxin in both mouse and zebrafish has been reported, the authors could discuss the similarities and differences of Calaxin function in different species and tissues.

      To discuss this point, we inserted the following paragraph:

      (line 324-333)

      In the mouse Calaxin-/- mutant, motile cilia in various organs (sperm flagella, tracheal cilia, and brain cilia) showed abnormal motilities, although OADs in the mutant cilia/flagella seemed mostly intact when observed by conventional transmission electron microscopy (Sasaki et al., 2019). In our study, however, detailed cryo-ET analysis and immunofluorescence microscopy revealed that mutation of zebrafish calaxin caused OAD-missing clusters at various regions of the flagella. Thus, we speculate that OAD defects similar to those in zebrafish calaxin-/- caused the abnormal ciliary motilities in the mouse Calaxin-/- mutant. One exception is the mouse nodal cilia. In the mouse Calaxin-/- mutant, the formation of nodal cilia was significantly disrupted (Sasaki et al., 2019). On the other hand, the zebrafish calaxin-/- mutant showed normal formation of Kupffer’s vesicle cilia (orthologous to the mouse nodal cilia), suggesting a tissue-specific function of Calaxin in ciliary formation.

      Because of the limited resolution, the authors should be more careful when interpreting the small densities in the difference map, for example, in Figure 4F-G black arrows. Considering that the CCDC151/114 coiled coil is overall poorly resolved both in the WT and mutant cryo-ET maps, the different densities could be due to different map quality or data processing. This makes the following statement suspicious "This structure corresponds to the N-terminus region of CCDC151/114, suggesting that Calaxin affects the conformation of neighboring DC components".

      This comment is reasonable because the resolution of our cryo-ET data was insufficient to identify each molecule in the cryo-ET map. To be more careful about the interpretation of our cryo-ET structures, we revised the manuscript as follows:

      before

      However, the difference map also showed an additional missing structure adjacent to Calaxin (Figure 4F’, black arrowhead). This structure corresponds to the N-terminus region of CCDC151/114, suggesting that Calaxin affects the conformation of neighboring DC components.

      after (line 207-210)

      However, the difference map also showed an additional missing structure adjacent to Calaxin (Figure 4F’, black arrowhead). When fitting the bovine DC model, this structure overlapped the N-terminus region of CCDC151/114, indicating that Calaxin can affect the conformation of neighboring DC components.

      To discuss the map quality and data processing of our cryo-ET analysis, we summarized the following points that can support the confidence of our data:

      (1) Two independent experiments showed the same results of OAD-DC structures, suggesting that the small changes in DC conformations were not due to different map quality or data processing:

      (a) For OAD structures in 1 mM EGTA condition, we analyzed the WT OAD (Figure 4D) and the calaxin-/- OAD rescued with recombinant Calaxin (Figure 4G). These samples were prepared in completely independent processes. However, in both cases, the small densities overlapping the N-terminus region of CCDC151/114 were visualized adjacent to Calaxin (Figure 4D and G, black arrowhead).

      (b) For OAD structures in 1 mM Ca2+ condition, we analyzed the WT OAD (Figure 7B) and the calaxin-/- OAD rescued with recombinant Calaxin (Figure 7C). These samples were prepared in completely independent processes. However, in both cases, the small densities overlapping the N-terminus region of CCDC151/114 were not observed. Instead, the additional densities appeared around DC (Figure 7B and C, white arrowheads).

      (2) We assessed the statistical significance of the changes in DC conformations. We applied Student’s t-test to the WT and calaxin-/- OAD-DC structures and created Figure 7—figure supplement 1. The p-value of each voxel was calculated as described in Oda & Kikkawa, 2013. The isosurface threshold of p-values corresponds to 0.05% probability in a one-tailed test. The p-value maps identify not only the Calaxin structures but also the adjacent small density (Figure 7—figure supplement 1A, black arrowhead) and the additional density around DC (Figure 7—figure supplement 1B, white arrowheads) as statistically significant differences between WT and calaxin-/- OAD-DC.

    1. Author Response

      Reviewer #1 (Public Review):

      This project aimed to understand if decision making impairments commonly observed in older adults arise from working memory (WM) or reinforcement learning (RL) deficits. Evidence in the paper suggests it is the former; they observe poorer task accuracy in older adults, accompanied by faster memory decay, using a novel hierarchical instantiation of a previously validated computational model. There were no similar changes in RL in this model. These results are extended using Magnetic Resonance Spectroscopy (MRS) to measure glutamate and GABA levels in striatum, prefrontal and parietal regions. They found that impairments in working memory were linked to reductions of glutamate in PFC, particularly in the older adult group.

      The task employed is elegant, has been studied extensively in different populations, and is well validated (though here a hierarchical Bayesian extension is developed and validated). The results, however, may not be definitive in some respects; the paper did not replicate previously observed RL deficits. It therefore remains possible that this reflects the task's sensitivity to the RL component in ageing, and future work is needed to fully bridge the gap in the literature.

      Thank you for the comment. If our understanding of the comment is correct, our results suggesting no impairments in the RL system conflict with previously observed RL deficits in older adults. In the introduction section, we discuss previous literature on RL deficits in older adults, which yields largely mixed conclusions: some experiments show RL impairments (Frank and Kong, 2008; Hämmerer et al., 2011; Samanez-Larkin et al., 2014) and some do not (Grogan et al., 2019; Radulescu et al., 2016). Placing our experiment in the context of these mixed results, we aimed to use a task that addresses these inconsistencies, reasoning that commonly used RL tasks and models do not account for additional processes that may contribute to learning (e.g. executive function/WM/attention), which would explain why the deficits are sometimes observed and sometimes not. We can also point to our model parameter recovery (Appendix 1 - Figure 9), where we show that RL model parameters (e.g. learning rate) are successfully recovered, indicating that our model is sensitive to RL variability in participants, yet we observe no differences between age groups.

      Although the study is well-executed, there is an obvious limitation in the use of a cross-sectional design to address this question. The authors acknowledge this limitation in the discussion but could go further to highlight the potential confound of cohort effects on gaming, RL and WM tasks more generally. Without within-person change data, the evidence can only be suggestive of potential age-related decline. For this reason, it may be more appropriate to use the terminology "age-related differences" rather than "age-related declines" given the study design.

      Thank you for the comment. We have attempted to address the cohort effects by administering RBANS to old and young participants. Age-normed total RBANS (Randolph et al., 1998) scores were similar in both age groups (described in the first paragraph of the results section), which we took to suggest that our cohorts reflected comparable samples of the population with respect to overall cognitive ability. In addition, we show that certain aspects of performance (e.g. accuracy) decline within the group of older adults, and not just between the two groups, which would constitute an argument against cohort-based effects. We now elaborate further on the point of cross-sectional design in the discussion section on lines 410-417. As suggested by the reviewer, we have also adjusted the language throughout the manuscript to imply age-related differences instead of age-related decline.

      Reviewer #2 (Public Review):

      In this study, Rmus and colleagues contribute to the important open question of whether reinforcement learning deficits observed in older adults are due to impairments in basic learning processes, or can be attributed to a decline in working memory function. The authors present cross-sectional behavioral data from a task designed to assess the role of working memory in reinforcement learning. And they use computational modeling in conjunction with MR spectroscopy to demonstrate a relationship between prefrontal glutamate and age-related impairments in learning specific to working memory decay. I found the overall story compelling, the data novel, and the analysis carefully executed. Below I outline some areas in which the claims of the manuscript could be strengthened.

      1) I may have missed this, but does glutamate correlate with other model parameters? Or did the authors only focus on the WM parameters because of the age difference? In support of the specificity argument, it would be important to show that glutamate only predicts WM related parameters regardless of whether there was an age difference or not.

      Thank you for your suggestion. In Appendix 1-figure 7, we show correlations between glutamate and all model parameters. If glutamate captured impairments in RL computational processes, we would expect to see a correlation between glutamate and the learning rate. Below we show that glutamate does positively correlate with RL learning rate. However, there are parameter correlations within the model itself, making the direct correlations hard to interpret. To better understand the relationships between learning rate, working memory, and glutamate, we ran a model predicting MFG glutamate using all parameters that significantly correlated with MFG glutamate (MFG glutamate ~ 1 + learning rate + decay + omega3 + negative learning rate), and found that only WM decay predicted MFG glutamate when controlling for other factors (learning rate: t = -0.42, β = -.03, p = 0.67; WM decay: t = -3.14, β = -0.30, p = .002; omega3: t = 1.84, β = .16, p = .07; negative learning rate: t = .56, β = .03, p = .57). Thus, while glutamate measures correlate with RL learning rate, these correlations seem to be driven by the fact that both glutamate and RL learning rate correlate with WM decay. Note that negative learning rate influences the updating of both RL and WM processes (see computational modeling section), and thus cannot help us make claims about specificity of RL or WM mechanisms alone being related to glutamate.

      2) As it is somewhat common with these tasks, it seems like the model does not fully capture the performance deficit in OA (Fig. 2B), even when all the individual difference parameters in WM are allowed to vary. Can the authors say more about the discrepancy? This is an interesting datapoint which may give clues to mechanism.

      Thank you for your comment. We elaborated on this in detail in the Appendix 1 (Posterior predictive checks section). We have observed that in some blocks (particularly in ns=6 blocks), older adults only learned a correct response for a subset of the presented stimuli, and neglected to learn responses to other stimuli altogether. We have interpreted this as a possible strategy older adults used to reduce the difficulty of the ns=6 condition. This would explain the discrepancy between the data and the model predictions, as the model has no way of accounting for stimulus identity effects on learning (since the model predicts similar performance for all the stimuli). To test our reasoning, we have fit the model to a subset of data - excluding participants who have implemented this strategy, and predicted that this should reduce the model misfit. We found that this is indeed the case (Appendix 1 - Figure 4). This confirms that strategic prioritization of stimuli in some older adults negatively affected the fit of the model. While we believe that a better understanding of these contaminant response patterns in the RL-WM model is worthy of further investigation, we feel that it is beyond the scope of this paper, and might require task designs with even higher set sizes to elicit the strategic stimulus prioritization more robustly. We have now added a paragraph in the discussion to discuss this issue.

      3) Relatedly, it may not be possible with these data alone, but can the authors discuss what the WM decay parameter captures? In particular for OA, the distinction between generating and maintaining a "task set" has been extensively written about. Older adults tend to have difficulty internally generating and flexibly deploying task sets, but somewhat paradoxically can perform better than YA in certain decision situations (e.g. when reward is dependent on previous choices, see Worthy et al., 2011). The task in this study necessarily pushes OA into a regime in which relying on familiar decision strategies is sub-optimal, and task sets must be continuously generated. Is there a type of intervention the authors expect would reverse the observed deficit in WM?

      In the RLWM model, WM stores stimulus-action-outcome weights. Using WM decay we can gradually reduce the stimulus-dependent weights on each trial where the stimulus is not observed (i.e. forgetting). These weights are therefore reduced at the rate of decay, being pulled towards the uniform/uninformative values (1/nA, where nA is the number of actions) they were initialized to. The parameter effectively captures forgetting of information with increased time delays (here time = number of intervening trials between successive stimulus presentations where the stimulus is not observed). It is possible that older adults prioritize storage of different types of (irrelevant) task information (e.g. category of stimuli, or relationships between the stimuli) that younger adults neglect, resulting in a tradeoff that might lead to faster decay in older adults. This could also explain the discrepancies between our model and older adults described above, as the model does not hold any assumptions about how stimulus identities might impact task performance strategy. If this were the case, older adults probed about such task-irrelevant prioritized information could potentially perform better than younger adults (much as, in Worthy et al. (2011), older adults perform better than younger adults on a choice-dependent task). We are unable to test this idea in our dataset, but we believe that it could be a promising avenue for future research.
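
      The decay step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' fitting code: the function name `decay_wm`, the dictionary layout of the weights, and the value phi = 0.2 are assumptions chosen for the example. It only implements the verbal rule stated in the response, namely that weights of stimuli not observed on a trial are pulled towards 1/nA at the rate of decay.

```python
def decay_wm(weights, seen_stimulus, phi, n_actions):
    """Return WM weights after one trial of decay.

    weights: dict mapping stimulus -> list of action weights (hypothetical layout).
    Weights of every stimulus other than `seen_stimulus` move a fraction `phi`
    of the way towards the uniform value 1 / n_actions.
    """
    w0 = 1.0 / n_actions  # uninformative value the weights were initialized to
    decayed = {}
    for stimulus, row in weights.items():
        if stimulus == seen_stimulus:
            decayed[stimulus] = list(row)  # observed stimulus: no decay on this trial
        else:
            decayed[stimulus] = [w + phi * (w0 - w) for w in row]
    return decayed


# Stimulus "A" was learned perfectly, then "B" is shown on three consecutive trials.
weights = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
for _ in range(3):
    weights = decay_wm(weights, seen_stimulus="B", phi=0.2, n_actions=3)

print(weights["A"])  # drifts towards [1/3, 1/3, 1/3]; the row still sums to 1
```

      Under this rule, the distance of each weight from 1/nA shrinks by a factor of (1 - phi) on every intervening trial, so a larger decay parameter means that information about a stimulus is lost over fewer intervening trials.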

      4) There is a wealth of evidence suggesting striatal DA loss in older adults, which served as the basis for many of the original investigations and hypotheses regarding a simple RL deficit in OA (e.g. work by Shu-Chen Li and others). While the authors do not directly measure DA in this study, it would be helpful to place the results in the context of that literature.

      Thank you for pointing this out. In the introduction, we have discussed the mixed results from research on RL/dopamine deficits in older adults. Some of the literature suggests impairments in striatal dopamine in older adults (Samanez-Larkin et al., 2014; Bäckman et al., 2006), while some suggests an absence of impairments (Grogan et al., 2019). Furthermore, while DA is important for RL updating, it is also potentially important for WM updating (O’Reilly and Frank, 2006); therefore a potential DA loss could affect both RL and WM, and not RL exclusively. Prior research also suggests that, although a correlative relationship between DA and cognitive functions has been recorded, the extent of generality/specificity of the effects of DA on cognition in aging (Bäckman et al., 2006), compared to resulting noise that impairs cognition (Li et al., 2001), should be studied more extensively in the future. We have not focused on dopamine in the study, but have now added a paragraph in the discussion section to address this on lines 402-407.

      5) Finally, the main argument of the paper as I read it is that PFC glutamate mediates the performance deficits observed in RL because it reflects a compromised WM system. Sample size permitting, it would be helpful to see a formal test of this mediation relationship.

      As highlighted in the response to the mediation point in essential revisions, we observe that glutamate mediates the effect of WM on task performance, but this mediation approach might be difficult to justify because WM decay and task performance share signal and noise (since WM decay is estimated from task performance). We have now included the mediation analysis in Appendix 1 and provided a conservative interpretation of it in the results section.

      Reviewer #3 (Public Review):

      Aging impacts many cognitive functions, and how these changes affect performance in different tasks is an important question. By testing 42 older and 36 younger healthy adults with a novel learning task and MR spectroscopy, Rmus et al addressed the important question whether age-related declines in learning are driven by WM, or by deficiencies of the RL system. The task varied the role of working memory in learning by asking participants to learn about either 3 or 6 stimulus response associations from feedback (set sizes 3 and 6). The paper combines a detailed computational account of participants' behaviour and striatal and prefrontal/parietal MR spectroscopy in order to assess individual glutamate and GABA levels.

      The authors report an effect of set-size on learning in both age groups, and show that participant age is associated with (1) worse accuracy, (2) a larger set size performance difference, and (3) a heightened sensitivity to reward. Computational modeling showed that working memory decay differed between age groups, but that reliance on WM to perform the task at hand was similar in both age groups (similarly differing between conditions in both groups). Turning to the MRS results, the paper shows that an aggregate measure of glutamate relates to aggregate task performance, that prefrontal glutamate specifically relates to WM decay observed in the task, and that age was negatively associated with glutamate levels.

      While the paper is well worth reading and offers many interesting data points, the title's suggestion that "Age-related decline in prefrontal glutamate predicts failure to efficiently deploy working memory in working memory" is, in my opinion, not fully supported by the evidence. First, the authors don't report clear evidence for any age-related differences in WM reliance in the task overall. Second, the authors find that MFG glutamate relates significantly only to WM decay, not the parameter that captures WM deployment. Third, correlations don't imply predictive relations.

      We apologize for the lack of clarity in our wording. We agree that the title of the paper implies that the reliance on WM parameter differentiates older and young adults, while the results show that the difference is mostly captured by the WM decay parameter. We meant to communicate that the age-difference seems to be particularly rooted in the WM, but have chosen misleading/confusing words. We have proposed changing the title of the manuscript to “Age-related differences in prefrontal glutamate are associated with increased working memory decay that gives appearance of learning deficits” to minimize confusion. With regards to your last point, as outlined in our response to essential revisions, we agree that we should modify the language used in our manuscript to be more consistent with the associative rather than predictive nature of our results.

      Another important open question relates to the relatively large age difference in the effect of set-size on performance. The authors write that working memory will contribute less to performance in higher set size conditions. Yet, age differences are largest in the set size 6 condition, suggesting that RL-dependent learning is most severely impaired in learning (set size 6 performance), rather than WM dependent learning (set size 3 performance). Finally, a statistically significant age difference in reward sensitivity seems to be hardly integrated into the authors' overall interpretation.

      Working memory does contribute less in the higher set-size condition; however, given the higher number of items, the delays between successive presentations of the stimuli in the high set-size condition are on average longer, which makes the effect of WM forgetting more pronounced. Furthermore, a WM impairment can have an indirect effect on RL, in that frequent failure to select the correct action through WM leads to a reduced ability to train RL on encoding correct responses (especially earlier in training, when the incremental RL hasn’t ‘caught up’ yet), and thus worse performance overall. As such, a larger effect of set size could potentially be indicative of deficits in either or both of the WM and RL processes. This most clearly underscores the importance of modeling: these complex interactions are difficult to intuit, but modeling allows us to establish cleaner mechanistic explanations of observed behavioral patterns/group performance deficits (e.g. while on the surface an impairment might look RL driven, it is actually better explained by a WM parameter, such as WM decay in older adults). With regards to reward sensitivity, the same explanation applies: there are multiple mechanisms through which differences in reward sensitivity could occur (e.g. slower learning rate, or increased RL recruitment due to failure of WM), which further emphasizes the need for modeling.

      In short, in a complex task, there are often multiple ways to explain the same qualitative feature, and here we have leaned on computational modeling to identify the computational elements that differed across groups. However, we have now also simulated data from our computational models using posterior predictive checks to show that they can reproduce core descriptive features of the original data, including those noted above, and to examine the degree to which different features can be mapped onto the working memory decay parameter (Appendix 1 Figure 5).
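
      The point about delays can be made concrete with a short calculation. The sketch below is illustrative only and rests on two assumptions that are not taken from the manuscript: that WM weights lose a fixed fraction phi of their distance from the uniform value on every intervening trial, and that the typical number of intervening trials between two presentations of the same stimulus is roughly the set size minus one. The function name `retained_fraction` and the value phi = 0.1 are hypothetical.

```python
def retained_fraction(phi, delay):
    """Fraction of the learned (non-uniform) part of a WM weight that
    survives `delay` intervening trials, assuming geometric decay at rate phi."""
    return (1.0 - phi) ** delay


phi = 0.1  # hypothetical decay rate, for illustration only
for set_size in (3, 6):
    delay = set_size - 1  # assumed typical number of intervening trials
    print(set_size, round(retained_fraction(phi, delay), 3))
```

      With the same decay rate, less of the stored information survives in the larger set size simply because more trials intervene, which is why a WM decay difference between groups can show up as a larger set-size effect.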

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents a thorough biochemical characterization of inferred ancestral versions of the Dicer helicase function. Probably the most significant finding is that the deepest ancestral protein reconstructed (AncD1D2) has significant double-stranded RNA-stimulated ATPase activity that was lost later, along the vertebrate lineage. These results strongly suggest that the previously known differences in ATPase activity between extant vertebrates and, for example, extant arthropods are due to loss of the ATPase activity over evolutionary time as opposed to gains in specific lineages. Based on their analysis, the authors also "restore" ATPase function in the vertebrate Dicer, but they did so by making many (over 40) mutations in the vertebrate protein, and it is not clear which of these many mutations are required for the restoration of the activity. Thus, it is difficult to discern how the results of this experiment relate to the evolutionary history.

      We completely agree with this reviewer's assessment of our paper. Our Michaelis-Menten analyses raised the intriguing idea that loss of ATPase activity in the helicase domain of the vertebrate ancestor may indicate loss of the ability to couple dsRNA binding to formation of the active conformation. Our rescue experiments support this idea, albeit in future studies we hope to create an active ancestor with fewer amino acid changes. While the rescue experiments validate what these analyses told us, as the reviewer suggests, they do not themselves inform on the evolutionary history.

      A criticism of the paper is the authors' tendency (probably unconscious) to ascribe a purposefulness to evolution. For example, in the introduction, "We speculate that the unique role of the RLR's in the interferon signaling pathway in vertebrates...created an incentive to jettison an active helicase in vertebrates." Although this sentence is clearly labelled as speculation and "incentive" is clearly a metaphor, the implication is that evolution somehow has forethought. (There are other instances of this notion in the paper, for example, in the last line of the abstract). The authors' statement also implies that the developing interferon system somehow caused the loss of active helicase, but it seems equally plausible that the helicase function was lost before the interferon system co-opted it.

      We agree with the stated critiques and have rephrased language that suggests that evolution is an active force. In addition to changing the last line of the abstract (page 2, line 35) and removing the quoted sentence from the Introduction, we have included a more nuanced discussion of the order of evolutionary events that may have preceded or followed the loss of helicase function in Dicer (page 18, lines 418-430).

      Reviewer #2 (Public Review):

      The manuscript by Aderounmu presents an interesting attempt to reconstruct evolution of the function of the helicase domain in ancestral Dicers, RNase III enzymes producing siRNAs from long double-stranded RNA and microRNAs from small hairpin precursors. The helicase has a role in long dsRNA recognition and processing and this function could have an antiviral role. Authors show on reconstructed ancestral Dicer variants that the helicase was losing dsRNA binding affinity and ATPase activity during evolution of the lineage leading to vertebrates while an early divergent Dicer-2 variant in Arthropods retained high activity and seemed better adapted for blunt ended long dsRNA, which would be consistent with antiviral function.

      The work is consistent with apparent adaptation of vertebrate Dicers for miRNA biogenesis and two known modes of substrate loading: "bottom up" dsRNA threading through the helicase domain where the helicase domain recognizes the end of dsRNA and feeds it into the enzyme and "top-down" where the substrate is first anchored in the PAZ domain before it locks into the enzyme. Some extant Dicer variants are known to be adapted for just one of these two modes while Dicer in C. elegans exemplifies an "ambidextrous" variant. The reconstruction of the helicase domain complex enabled authors to test how well would be ancestral helicases supporting the "bottom up" feeding of long dsRNA and whether the helicase would be distinguishing blunt-end dsRNA and 3' 2 nucleotide overhang. Although the reconstruction of an ancestral protein from highly divergent extant sequences yields just a hypothetical ancestor, which cannot be validated, the work provides remarkable data for interpreting evolutionary history of the helicase domain and RNA silencing in more general. While it is not surprising that the ancestral helicase was a functional ATPase stimulated by dsRNA, particularly new and interesting are data that the decline of the helicase function started already at the level of the common deuterostome ancestor and the helicase was essentially dead in the vertebrate ancestor. It has been reported two decades ago that human Dicer carries a helicase, which has highly conserved critical residues in the ATPase domain but it is non-functional (10.1093/emboj/cdf582). Recently published mouse mutants showed that these highly conserved residues are not important in vivo (10.1016/j.molcel.2022.10.010). Aderounmu et al. now suggest that Dicer carried this dead ATPase with conserved residues for over 500 million years of vertebrate evolution.

      I do not have any major comments on the biochemical analyses, and while I think that the ancestral protein reconstruction could yield hypothetical sequences that never existed, I think they represent reasonable reconstructions, which yielded data worth interpreting. My major criticism of the work concerns clarity for the readership and the interpretation of some results, where I wish the authors would clarify or revise the text. The following three examples are particularly significant:

      1) It should be explained to which common ancestor during metazoan evolution belongs the ancestral helicase AncD1D2 or at least what that sequence might represent in terms of common ancestry during metazoan evolution.

      We thank the reviewer for bringing this issue to our attention, and we have now included a brief discussion of the complexity in identifying AncD1D2’s exact position in metazoan evolution (page 6, lines 124-134). Our maximum likelihood phylogeny is constructed from Dicer’s helicase and DUF283 subdomains which evidently do not contain enough phylogenetic signal to resolve the finer details of early metazoan evolutionary events surrounding the divergence of non-bilaterians: Porifera, Ctenophora, Cnidaria and Placozoa. In our tree, Cnidaria even diverges later than the Nematode bilaterian branch reflecting the fact that our reported phylogeny does not match consensus species relationships, especially in the invertebrate clades. This means we cannot pinpoint AncD1D2’s exact position with certainty. While we do not intend to overinterpret the evolutionary trends from these hypothetical ancestral constructs, we believe the functional differences in biochemical activity are meaningful and correspond to big-picture changes over evolutionary time. AncD1D2 thus corresponds to some early metazoan ancestor that existed before the divergence of bilaterians from non-bilaterians. In support of this interpretation, when the phylogeny is constrained such that the bilaterian branches match the consensus species tree (Figure 1-figure supplement 2A) we observe that AncD1D2 is ancestral to the bilaterian ancestor, AncD1BILAT (now labeled on the figure), but retains 95% identity to the version of AncD1D2 constructed from the maximum likelihood phylogeny (Figure 1-figure supplement 3B).

      2) This is linked to the first point: the authors work with phylogenetic trees reconstructed from a single protein sequence, which are not well aligned with the predicted early metazoan divergence (https://doi.org/10.1098/rstb.2015.0036). While their sequence-based trees show early branching of Dicer-2, as if the two Dicers existed in the common ancestor of almost all animals (except Placozoa), I do not think there is sufficient support for such a statement, especially since antiviral RNAi-dedicated Dicers evolve faster and Dicer-2 is restricted to a few distant taxonomic groups, which might be better explained by independent duplications of ambidextrous ancestral Dicers. I would appreciate it if the authors would discuss this issue in more detail and make readers more aware of the complexity of the problem.

      We agree with the reviewer that in our initial submission we did not properly address the incongruence between our maximum likelihood phylogeny and the consensus species tree of life. We have now addressed this by revisions that discuss the difficulty in using a single gene or protein to accurately date ancient evolutionary events, especially in the case of Dicer, a protein whose evolutionary history is littered with multiple duplication events (page 6, lines 124-147, beginning with “Importantly, we observed multiple instances…”; page 16, lines 365-371, sentence beginning with “Uncertainty in the single gene or protein phylogeny…”). Our assumption that an early gene duplication produced the arthropod Dicer-2 clade is consistent with previous Dicer phylogenies that have been constructed with maximum likelihood algorithms with different parameters (https://doi.org/10.1371/journal.pone.0095350, https://doi.org/10.1093/molbev/msx187, https://doi.org/10.1093/molbev/mss263) using full length Dicer sequences with different taxon sampling depths and tree construction parameters. Removing other fast evolving taxa with long branch lengths from the sequence alignment still resulted in arthropod Dicer-2 branching out early in metazoan phylogeny (https://doi.org/10.1093/molbev/mss263).

      In analyses not included in our manuscript, we also independently constructed trees from full-length metazoan Dicers and from the helicase and DUF283 subdomains, using both RAxML-NG and MrBayes. We tried different taxon sampling depths, rooted the tree with either a non-bilaterian outgroup or a fungal outgroup, and tried breaking up potential long-branch attraction with deep taxon sampling. In every iteration, the arthropod Dicer-2 clade diverged early in animal evolution, at some point before or during non-bilaterian evolution. We recognize that all these efforts are still prone to long-branch attraction, which may cause the rapidly evolving Dicer-2 clade to cluster artificially with distant outgroups, but so far the only evidence to support an arthropod-specific duplication event is parsimony. This parsimony model is plausible, and one might expect a recently duplicated arthropod Dicer-2 to cluster closely with nematode Dicer-1, another antiviral Dicer that would have descended from a common ecdysozoan ancestor, but this is not the case. The nematode HEL-DUF clade does get attracted to the non-bilaterian Cnidaria clade in our ML tree, but unlike the position of the arthropod Dicer-2 clade, this position varied depending on the parameters of the phylogenetic analysis, and so we cannot conclude that arthropod Dicer-2's position is due to long-branch attraction. More sophisticated phylogenetic and statistical tools are needed to answer this question definitively, so we decided to proceed with the highest-scoring maximum-likelihood phylogeny generated by our analysis.

      While we have now included a short discussion on the nature of this uncertainty in the revised manuscript (page 6, line 124; page 16, lines 365-371), we have excluded these additional details (paragraph above) from the main text in an attempt to prioritize readability for the generalist reader, and we hope that more specialized readers will find this discussion in the public comments helpful.

      3) The authors should take existing literature and data more into account when hypothesizing about sequences of events. Some decline of the helicase activity is apparent in AncD1DEUT, suggesting that it began between AncD1D2 and AncD1DEUT. This implies that a) the antiviral role of Dicer was becoming redundant with other cellular protein sensors by then and b) Dicer was already becoming adapted for miRNA biogenesis, which further progressed in the lineage leading to vertebrates to the unique top-down loading with the distinct pre-dicing state where the helicase forms a rigid arm. The authors even cite Qiao et al. (https://doi.org/10.1016/j.dci.2021.103997), who report a primitive interferon-like system in molluscs. This places the ancestry of the interferon response upstream of AncD1DEUT and suggests that this ancestral protein-based system was taking over the antiviral role of Dicer much earlier. In fact, the somewhat weaker performance of AncD1LOPH/DEUT, combined with the aforementioned interferon-like system and the massive miRNA expansion in extant molluscs (10.1126/sciadv.add9938), suggests that molluscs possibly followed a convergent path like mammals. While I am missing this kind of discussion in the manuscript, I think that the model where "interferon appears ..." in AncD1VERT (Fig. 6) is incorrect and misleading.

      This comment is similar to others, including point 3 of Essential revisions, and we have revised our model in Figure 6 accordingly. We agree with the reviewer that we did not sufficiently explore the significance of the decline in Dicer helicase function between AncD1D2 and AncD1DEUT. In addition to the changes noted in point 3 of Essential revisions, we have corrected this by adding or modifying sentences in the Results (page 9, sentence beginning on line 197 “This reduction in ATP hydrolysis efficiency prior to deuterostome divergence may have coincided with…”, and page 11, sentence beginning on line 247 “One possibility is that between AncD1D2 and the deuterostome ancestor…”).

      We did not intend to suggest that this loss of Dicer helicase function was unique to vertebrates, but we focused on the deuterostome-to-vertebrate transition for the following reasons:

      a) The mollusk clade in our analysis is incongruent with its expected species position as a protostome. In our tree it clusters with deuterostomes instead. On one hand, this is probably an artefact of incomplete lineage sorting or long branch attraction. On the other hand, it is possible that this clade’s position is an underlying signal of the convergent evolution proposed by the reviewer. In support of the latter, some extant mollusk Dicer helicases (ACCESSION: XP_014781474, ACCESSION: XP_022331683) show a loss of amino acid conservation in Dicer’s ATPase motifs implying that extant mollusks have also lost Dicer helicase function like vertebrates. However, this is in contrast to vertebrate Dicer helicase where loss of function exists, but ATPase motifs remain conserved. We do not discuss this in the paper because the evidence remains inconclusive until extant mollusk Dicers can be functionally characterized, similar to Human Dicer and Drosophila Dicer-1, to determine that they are truly specialized for miRNA processing to the detriment of helicase function.

      b) Caenorhabditis elegans Dicer is an example of an ambidextrous Dicer, that processes both miRNAs, with the top-down mechanism, and viral dsRNAs, with the bottom-up mechanism. Recently, work has been published that suggests that C. elegans also possesses a protein-based innate immune defense mechanism, but instead of competing with the RNA interference mechanism, both mechanisms seem to work in concert and even share a protein in both pathways: DRH-1, a RIG-I-Like receptor homolog (https://doi.org/10.1128/JVI.01173-19). Furthermore, a protein-based pathway has also been reported in Drosophila and in this scenario Drosophila Dicer-2 is the dsRNA sensor that is common to both pathways (https://doi.org/10.1371/journal.pntd.0002823). This collaboration observed in ecdysozoan invertebrates is different from the competition that has been well established in vertebrates. More data is needed to understand whether a model of competition or collaboration exists in lophotrochozoan invertebrates like mollusks.

    1. Author Response

      Reviewer #1 (Public Review):

      VO2max is one of the most important gross criteria of peak performance ability and a plethora of studies focused on VO2max prediction. This manuscript provides huge and comprehensive data from male runners and male cyclists. The endurance-trained athletes performed cardiopulmonary exercise testing on a treadmill (n= 3330) or cycle ergometer (n=1094). In contrast to former studies, the authors used machine learning for algorithms and VO2max prediction. Models were derived and internally validated with multiple linear regression. The present study substantially expands current research.

      Sadly, the manuscript has an important and relevant main shortcoming as the limitations of the study had not been addressed properly:

      • The authors paid no attention to the fact that their results are strongly influenced by the exercise protocol used. It is obvious e.g. that maximal performance attainable in protocols with 2-minute exercise steps will be higher compared to an identical protocol with 3- or 4-minute steps.

      • The exercise intensity was kept constant for only 2 minutes before the workload was increased (by 1 km/h on the treadmill or by 20-30 W on the cycle ergometer). Due to the kinetics of lactate, VO2, etc., it is evident that the short 2-min intervals hamper the correct determination of the aerobic and anaerobic thresholds. It is well known that longer-lasting constant exercise steps (e.g. 4 minutes) are better when the focus is on threshold determination.

      The quality of this manuscript would be substantially improved if the authors added a comprehensive and candid paragraph setting out the limitations of their study.

      We have added a discussion of the study's limitations to the manuscript, as recommended. It is reasonable to suspect that the type of protocol used affects the cardiorespiratory indices obtained. Interestingly, according to available studies, this effect is more pronounced for the determination of cyclists' threshold power output or runners' treadmill running speed than for threshold and maximum cardiorespiratory indices such as VO2max or HRmax (Silva et al. 2021; Weston et al. 2002; Vucetić et al. 2014).

      In the regression models presented, the main explanatory variables with the largest effect on the prediction value are the AT/RCP threshold VO2 values (rVO2RCP; rVO2AT). The coefficients for the other explanatory variables are relatively low and differences in their values due to the use of potentially different protocols appear to be marginal. Nevertheless, we see the possibility of worsening the prediction when using less suitable testing protocols for athletes such as ramp tests or typically clinical tests such as the Bruce test.

    1. Author Response

      Reviewer #1 (Public Review):

      This study represents an important work in the field of (CAR)T-cell immunotherapy by analyzing the effect of different oxygen tension on the function and differentiation of T-cells (especially CD8+). Although it has been described that low oxygen levels can influence effector function/differentiation of T-cells, as nicely acknowledged by the authors in the introduction, a comprehensive analysis in the context of immunotherapy has been missing so far and this study adds significant findings that will be relevant for patient care in all fields applying (CAR)T-cell immunotherapy.

      The strength of the evidence is generally solid although there are some discrepancies between the different ways to induce HIF-1α (i.e. low O2, pharmacological inhibition, shRNA knockdown) that need to be clearly stated and/or discussed.

      1) The first section of the results determines the impact of low oxygen and pharmacological HIF-1α stabilization on CD8+ T-cell activation/differentiation. Low oxygen diminishes cell growth but induces T-cell activation and effector cytokines, while HIF-1a stabilization mimics the effects on activation without alterations in expansion. Unfortunately, it remains unclear why effects upon low O2 are more pronounced although pharmacological HIF-1a stabilization is more efficient.

      2) As a next step, in vitro conditioned T-cells are transferred into a subcutaneous B16-OVA model. Although only the low O2 levels increase T-cell numbers in vivo after the transfer, the initial tumor burden was nicely decreased by both low O2 and HIF-1a stabilization. However, only the latter significantly improved survival and it remains unclear and uncommented why.

      3) Next, the authors address whether pre-conditioning of human CART-cells to induce HIF-1α either by pharmacological stabilization or by silencing of VHL shows similar effects. Surprisingly, both ways of HIF-1a stabilization resulted in different effects concerning differential gene expression and cytotoxic capacity of CART-cells. Accordingly, pharmacologically pre-conditioned CART-cells did not have a significant impact on survival in an in vivo model, while the VHL-silenced ones did significantly improve animal survival. This discrepancy between the two modes of HIF-1a stabilization remains uncommented. Unfortunately, it also remains unclear why the pharmacological HIF-1a stabilization significantly improved the survival in animals of the B16-OVA model and not in the human CART-cell model.

      4) After this, the researchers determine how the timing of hypoxic conditioning affects the (CAR)T-cells. Here it is convincingly shown that already a short period of hypoxic conditioning (1 day) with a subsequent expansion phase (additional 6 days) is sufficient to induce HIF-1a mediated alterations (e.g. metabolic changes, calcium flux, intracellular signaling). Although this section is coherent in itself, the switch between different times of hypoxic conditioning, expansion, and analysis is difficult to follow and might lead to confusion. The expression pattern of e.g. HIF-1a on day 1 and day 7 together with the nuclear amounts of NFAT and c-Myc might be misunderstood, like the other presented data as well.

      5) Last, short-term hypoxic conditioning of CART cells is tested in a solid tumour mouse model. The previously identified conditioning protocol also increases CART-cell function against solid tumours (as shown by enhanced cytotoxicity, reduced tumour burden, and prolonged survival). Unfortunately, although both HER2-CART-cells and CD19-CART-cells are shown to have superior cytotoxicity in vitro after the pre-conditioning, only HER2-CART-cells are demonstrated to be superior upon low O2 conditioning in an in vivo adoptive transfer mouse model and CD19-CART-cells remain an open question.

      Generally speaking, the limitations of the manuscript are:

      1) The discrepancies between the effects of the different modes of HIF-1α stabilization, which are certainly caused by the complex nature of the HIF-1α regulatory network; and

      We now extend our observations and discuss these concerns more extensively in the manuscript.

      2) The restriction of the detected effects primarily to CD8+ T cells, whereas CAR-T cell products are usually a mixture of CD4+ and CD8+ cells.

      Figure S6H now shows that the effects of shorter periods of low oxygen conditioning obtained with CAR-T cells generated from isolated CD8+ T cells are reproducible in CAR-T cells generated from PBMCs. We have found that a 24 h incubation of PBMC-derived CAR-T cells in 1% O2 increases cytotoxicity against target cells and effector differentiation at day 7, compared with cells cultured at 21% oxygen.

      Reviewer #3 (Public Review):

      In this study, Cunha et al. examined the role of different oxygen tensions (21%, 5%, and 1% O2) and HIF-1α stabilisation in regulating murine and human CD8+ T cell proliferation and function. The authors find that hypoxia (1% O2) and pharmacological PHD inhibition with FG-4592, enhance murine T cell activation but impair proliferation. Furthermore, adoptive cell transfer (ACT) therapy of CD8+ T cells from both conditions reduced tumour burden in a B16-OVA melanoma model. Short hypoxic conditioning (1% O2) of human CD8+ T cells for 1 day increased HIF-1α stabilisation, with increased activation, glycolysis, and mitochondrial function still observed following 6 days of normoxic cell culture. Short hypoxic conditioning of HER2 and CD19 CAR-T cells improved their activation and cytotoxicity in vitro, while HER2 CAR-T cell counts were increased in vivo, reducing tumour burden, and increasing survival when compared to 21% O2.

      Strengths:

      The paper convincingly demonstrates that short hypoxic conditioning in a defined window improves CAR-T cell function through in vitro cytotoxicity assays and following adoptive transfer in a preclinical HER2+-SKOV3+ positive tumour model. Thus, the major conclusion of the paper is mostly well supported by the data and could represent a novel strategy to improve CAR-T cell immunotherapy for solid tumours in the future.

      Weaknesses:

      The extent to which the hypoxic conditioning-mediated improvement in CAR-T cell function depends on HIF-1-driven metabolic reprogramming is unclear, and other potential mechanisms are not explored. FG-4592 and VHL silencing in HER2 CAR-T cells did not phenocopy each other faithfully. In addition, neither approach was as effective as short hypoxic conditioning with 1% O2 in improving CAR-T cell function in vitro or in vivo. Although the authors suggest that the temporal dynamics of HIF-1α stabilisation is the key point, this is not convincingly proven, and no metabolic characterisation of these CAR-T cells was performed.

      The revised manuscript now includes live metabolic analyses in a Seahorse setup, using T cells following FG-4592 treatment or VHL silencing. We found that exposure of human CD8+ T cells to FG-4592 suppresses their oxygen consumption rates, at both basal and maximal levels. This could underpin the observed reduced expression of effector molecules (PMID: 33398183). Treatment of human T cells with FG-4592 resulted in a dose-dependent reduction of in vitro cytotoxicity, similar to that observed with exposure to low oxygen (e.g., 7-day OT-I expansion in 1% O2 impairs antitumour function [Figure supplement 6L]).

      Regarding VHL silencing, we did not observe metabolic differences compared to controls. This might arise from the fact that the shVHL vectors only caused an overall 30% reduction in VHL protein expression, and that the silencing occurred after T cells had been activated. As we show, the moment of activation is key for T cell differentiation and function, and this could explain the lack of metabolic differences between shNCT- and shVHL-expressing cells. These points are now added to the 5th paragraph of the Discussion section.

      It is unclear how changes elicited during short hypoxic conditioning are maintained following continued normoxic cell culture. Hypoxia is known to rapidly regulate histone methylation and chromatin structure in a HIF-independent manner (PMID: 30872525; PMID: 30872526). Are similar epigenetic changes observed in T cells, and if so, could these epigenetic changes underlie improved T cell activation?

      We thank the reviewer for the insightful comment on potential epigenetic changes in T cells cultured in hypoxia. We have now carried out an extensive analysis of histone methylation and acetylation (Figure 4H). Human CD8+ T cells cultured for 1 day in 1% O2 and 6 days in 21% O2 showed decreased acetylation of H3K9 and H3K27, reduced trimethylation of H3K4 and H3K27, and increased dimethylation of H3K9 (H3K9me2), compared with cells continuously grown in ambient oxygen. These differences might underpin the altered differentiation and metabolic shifts of 1% O2-cultured T cells and further indicate that the oxygen tensions during the first 24 hours of activation elicit permanent alterations in T cells. Future work will be dedicated to understanding the link between the observed alterations in histone post-translational modifications and T cell function in response to hypoxia.

      Complications may also arise when comparing different oxygen tensions, given recent data (https://www.biorxiv.org/content/10.1101/2022.11.29.516437v1) suggesting that standard cell culture conditions can lead to local hypoxia through a combination of cellular respiration and poor O2 diffusion. Although it is unclear how this will impact suspension T cells, it does raise the question of whether HIF-1α stability following T cell activation is (at least in part) mediated by pericellular O2 limitations in cell culture over time, even in presumed hyperoxic (21% O2) conditions. It also raises the question of whether T cells subsequently cultured at 21% O2 following short hypoxic conditioning (1% O2) still experience local hypoxia during the 6-day culturing protocol. It would be important to assess this in future work and at least discuss these potential weaknesses.

      Upon analysing HIF-1α accumulation on day 7, we only found substantial HIF levels in cells that had been in low oxygen tensions for the last 3 days of culture (Figure S4G). This suggests that cells were not experiencing hypoxia at the time of analysis on day 7, given that we did not observe substantial HIF accumulation. We have additionally designed an experiment where 21% and 1% 1 day T cells were cultured for 7 days with a single media change on day 4 (standard) or with 5 media changes (each media change performed on separate days to minimize local hypoxia in ambient oxygen). Regardless of the number of media changes, 1% 1d cultures showed increased effector differentiation and expression of effector molecules, relative to 21% cells (Figure S4H). We also did not observe any differences between control cells cultured with 1 or 5 media changes. As hypoxia elicits changes in T cell differentiation, this suggests cells do not experience local hypoxia during the phase of ambient oxygen expansion. Nevertheless, we very much agree that it is important to accurately assess oxygen concentrations in cell culture media.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors provide evidence that chromatin, which in Drosophila muscle cells is localized at the nuclear periphery while the central region is depleted of chromatin, is organised such that RNA polymerase II (RNAp) surrounds dense regions of chromatin. The authors theoretically study the formation of these regions by describing chromatin as a multi-block copolymer, where the blocks correspond to active and inactive chromatin regions. These regions are assumed to phase separate and to have different solubilities. The solubility of the active region is regulated by the binding of RNAp. The authors study the core-shell organization in a layered geometry by analyzing the various contributions to the free energy. In this way, they obtain in particular the dependence of the thickness of the shell layer, which is described as a polymer brush. From these results, they infer chromatin organization in spherical core-shell chromatin domains and compare these results to Brownian dynamics simulations.

      The work is well done and even though it uses standard methods for studying block copolymers and polymer brushes obtains interesting information about local chromatin organization. These findings should be of great interest to researchers in the field of chromatin organization and in general to everybody interested in understanding the physical principles of biological organization.

      The work has two main weaknesses: The experimental evidence for RNAp and chromatin microorganization is weak as only one example is shown. It remains unclear whether the observed organization pattern is common or not. Also, no data is shown concerning the dependence of the extensions of the active and inactive phases on parameters, for example, solvent properties or transcriptional activity. Second, some parts could prove difficult for biologists to assess. For example, the expression for the brush-free energy should be explained in more detail and notions like that of 'mushrooms' need to be introduced. As a second example, biologists might benefit from a better explanation of the concept of a theta solvent and its relevance.

      We thank Reviewer #1 for the positive review and critical feedback. Below we answer the points raised in the last paragraph of the review.

      In the original version of the manuscript we only showed a representative image of nuclei of muscle cells in an intact, live Drosophila larva. Notably, this organization is representative of many nuclei analyzed in muscle tissue. In the revised version we show that in a distinct tissue, e.g. salivary gland epithelium of live Drosophila larvae, RNA Pol II distribution is similarly facing the nucleoplasm, although chromatin condensation differs due to higher DNA ploidy. The new images were added as Supplementary Information (Fig A1). Since these representative images are the main motivation behind our theoretical analysis, we think that including them will help the reader understand the relevance of our minimal model. Determining the effects of different biological perturbations, such as changes in the repressive marks, on the core-shell structure requires extensive experiments that are outside the scope of the present paper. We also note that in live organisms (not just live cells) such as those studied here, one can only reliably use genetic perturbations; solvent quality is regulated by the organism and cannot be controlled as in synthetic polymer experiments. Our main focus in the present paper is to highlight an area that has been relatively unexplored by the chromatin organization community, namely how changes in the concentrations of chromatin binding partners may have a strong effect on nuclear architecture.

      We have also improved the explanation of the physical concepts for biologists. We added a more thorough explanation of the concept of a polymer brush and explained more clearly the concept of a theta solvent in terms of the scaling properties of a polymer in solution. We quote these revisions below.
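
      For orientation, the textbook Flory scaling picture behind the term "theta solvent" can be summarized as follows. This is a standard polymer-physics summary under assumed notation (R for the typical chain size, a for the monomer size, N for the number of monomers, all introduced here for illustration), and it is not a quotation from the revised manuscript:

      ```latex
      % Scaling of the chain size R with the number of monomers N
      % (a = monomer size; notation assumed for illustration).
      R \sim a\,N^{\nu}, \qquad
      \nu \approx
      \begin{cases}
        3/5 & \text{good solvent (swollen coil; monomers effectively repel)} \\
        1/2 & \text{theta solvent (ideal random walk; attraction and repulsion cancel)} \\
        1/3 & \text{poor solvent (collapsed globule; monomers effectively attract)}
      \end{cases}
      ```

      In this language, a change in effective solvent quality for the active blocks corresponds to moving between these regimes, from a compact state towards a more swollen one.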

      Reviewer #2 (Public Review):

      This work formulates a detailed theoretical polymer physics model intended to explain the observed morphology of chromatin in the Drosophila cell nucleus. The model is examined in detail by both analytical calculation and computer simulation. The central premise of the suggested theory is that it is again based on equilibrium statistical mechanics. Within this paradigm, the authors explore a model that views the chromatin fiber as a block copolymer and, most importantly, describes the role of RNA polymerase as it interacts with one of the copolymer blocks and regulates its effective solvent quality. Blocks are assumed to be fixed on the time scale of interest by, e.g., different levels of acetylation or methylation. RNA polymerase is supposed to interact only with one of the chromatin blocks, called active, and the assumed interaction is quite peculiar. Namely, the RNA polymerase complex may adsorb on the chromatin fiber and, the model assumes, the fiber decorated with adsorbed RNA polymerase molecules is less sticky to itself, or more repulsive, than the bare fiber. This peculiar assumption allows the authors to make interesting predictions about how proteins can regulate the genome folding architecture.

      We thank the reviewer for the positive and critical feedback. We agree that our assumption of changes in the effective solvent stemming from protein complexes binding to chromatin is at the core of our analysis and we justify it further below.

      STRENGTH

      The work includes a rather detailed theoretical description of the model and its equilibrium statistical mechanics. As both analytical theory and accompanying simulation indicate, the assumptions put forward in formulating the model do indeed produce the desired morphology, with isolated regions ("micelles") of core inactive chromatin surrounded by the less dense shell region in which RNA polymerization may potentially take place. Having such a detailed theory is potentially beneficial for the field and opens up avenues for further exploration.

      We thank the referee for appreciating the potential benefit of our minimal theory of solvent-quality regulation by binding processes.

      WEAKNESS

      The underlying assumption about the interaction of RNA polymerase complex with the fiber, although important and organic for the model, does not seem easy to justify from a molecular standpoint, especially thinking of the charges and electrostatic interactions.

      We visualize that the binding of RNA Pol II (mediated by different transcription factors) to chromatin is also associated with larger protein complexes that may contain hydrophobic and hydrophilic components, such as pre-initiation complexes. Some regions of these complexes might associate directly with chromatin due to positive charges on the surface of the Pol II complex , whereas the hydrophilic negative regions may be directed towards the solvent. Our theory is typical of the approach used in polymer physics where coarse-grained interactions are considered. While the origin of hydrophilic interactions lies in electrostatics, such interactions are highly screened in cells (typically 200 mM concentration of salts) and can be considered as short-ranged and competitive with hydrophobic interactions. Chromatin in solution is known to condense (see Gibson, et. al., Cell 2019 and Strickfaden, et. al., Cell 2020) and even phase separate from the nucleoplasm (see Amiad-Pavlov, et. al., Science Advances, 2021); this can arise either from hydrophobic interactions of the histone tails or from opposite charge attraction of the histones and linker DNA. In our model, this competes with the binding of protein complexes which then disrupt the self-attraction of chromatin. Previous work has shown that RNA Pol II associating with chromatin (in the absence of transcription) prevents the coarsening of dense chromatin domains (see Hilbert, et. al. Nat. Comm. 2021), which agrees with our modeling of protein complexes that bind to chromatin and interfere with its condensation; in addition, the binding of Pol-II and all its binding partners that form the pre-initiation complex (see Hahn, Nat. Struct. & Mol. Biol. 2004, 11) will result in effective, steric repulsion between different active and Pol II bound chromatin domains. 
      Another interesting observation is that most of the surface of RNA Polymerase II is negatively charged, with a few positively charged patches through which it specifically interacts with DNA, while others serve as exit paths for RNA (see Cramer et al., Science 2001). We agree that a more thorough analysis of the molecular interactions between what we call protein complexes and chromatin is interesting, but it is beyond the scope of our paper, which uses a coarse-grained, polymer physics approach. This approach also allows our model to be predictive as to the physical organization and growth of the domains, independent of those molecular details that are as yet unknown.

      Reviewer #3 (Public Review):

      This study provides a theoretical explanation for a puzzling question arising from recent experiments: How can chromosomes behave like polymers collapsed in a poor solvent but also contain "open" active chromatin sections? The authors propose that the binding of proteins (e.g. RNAPs) to the active sections can effectively change the solvent quality for these sections and thus open them. They suggest further that chromosomes show micellar structures with inactive blocks forming the cores of the micelles. Protein binding causes swelling of the micellar shells, which affects the whole chromosome structure by changing the total number of micelles. This theory fits well with live imaging data of chromatin in Drosophila larvae, like that shown in the striking Figure 1.

      The manuscript is written very clearly.

      My only suggestion is that the authors, in both the theory and simulation parts, are more explicit about how the interactions between the various components are modeled. From what I could see, in the theory part, one needs to look closely at Eq. 5 to understand how the influence of the binding of proteins affects the interaction between active monomers, and in the simulation part, one needs to go to the appendix to learn that interaction strengths between monomers within the active blocks and monomers within the inactive blocks have different values. The latter is crucial to understand the micellar structure shown at the top of Fig. 5A.

      We thank the reviewer for his positive response. We have explained Eq. 5 more carefully now and included other explanatory remarks throughout the text. We also explained more clearly the interactions considered in the simulations. Below we answer point by point and add quotes from the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Marchal-Duval et al studied the role of Prrx1 in lung fibroblasts. Prrx1 is a transcription factor expressed in lung fibroblasts but not in other cell types. The authors showed that Prrx1 gene expression was enhanced in IPF patients. Immunohistochemistry in IPF tissue suggested that Prrx1 was expressed in fibroblasts in fibroblastic foci. The authors then showed that Prrx1 expression was regulated by TGF-b1 stimulation or stiffness of substrate by in vitro experiments using primary human lung fibroblasts from either normal or IPF lungs. The authors also showed that Prrx1 regulated fibroblast proliferation and TGF-b signaling by regulating PPM1A and Tgfbr2 expression. Finally, the authors revealed that Prrx1 knockdown suppressed fibrosis in bleomycin-induced fibrosis or PCLS. This manuscript identified novel molecular roles of Prrx1 in fibroblast activation, which is expressed in not only lung fibroblasts but also in other injured or developing organs. To support the idea that Prrx1 plays a critical role in lung fibrosis, however, some discrepancies between in vitro and in vivo data need to be clarified.

      Comment #1. Although the authors showed that Prrx1 knockdown in primary fibroblasts reduced Smad2/3 phosphorylation, the reduction of Acta2 or Col1a1 after Prrx1 knockdown and TGF-b1 stimulation was not impressive (Fig. S6), suggesting that the inhibition of TGF-b signaling by Prrx1 knockdown is only partial. In contrast, Prrx1 knockdown by ASO in bleomycin-induced fibrosis showed remarkable fibrosis suppression (Fig. 6, 7). Admittedly there are differences in models and nucleotides used, but this discrepancy needs to be addressed.

      We agree with the reviewer that Prrx1 inhibition only partially affects the upregulation of ACTA2, but this effect was significant (around 50% inhibition at the protein level). As stated in the discussion (lines 569-572), our data show that key ECM proteins such as Collagen 1 and Fibronectin were still upregulated in TGF-β1-stimulated lung fibroblasts transfected with PRRX1 siRNA, whereas TNC and ELN mRNA expression levels were perturbed. These findings suggest that broader phenotypical changes are associated with Prrx1 knockdown. Notably, we also observed that Prrx1 inhibition impacted cell proliferation in vitro. We believe that the observed suppression of fibrosis in bleomycin-treated mice following Prrx1 knockdown by ASO is the result of both the partial inhibition of the TGF-β1 effect and the decrease in mesenchymal cell proliferation. Supporting this hypothesis, we observed a decrease in PDGFR-positive cell proliferation in Prrx1 ASO-treated animals (see comment #4 hereafter).

      Comment #2. Fig.6 and 7 lack control groups, where mice are treated with PBS instead of bleomycin and treated with either control ASO or Prrx1 ASO.

      As stated in the revised Materials and Methods (lines 683-686), the knockdown efficiency of Prrx1 ASO and the lack of effect of control ASO were first validated in naive mice, which were treated with either Prrx1 ASO or control ASO and compared to PBS-treated mice (see Figure R2 in the answer to comment #11 of reviewer 2). Those groups were not repeated or included in the first set of bleomycin experiments in order to comply with institutional regulations limiting animal usage. In the first set of experiments (Prrx1 ASO treatment between day 7 and day 13 after bleomycin insult), the saline + PBS group was used only to confirm fibrosis development, while the bleomycin + control ASO group was the proper control for the bleomycin + Prrx1 ASO group. In the new second set of experiments (ASO treatment between day 21 and day 28, as suggested by reviewer #2), we were authorized by our local animal ethics committee to include a control ASO group among the saline-treated animals to confirm the lack of effect of the control ASO compared to the PBS group (see new Figure 7-figure supplement 1).

      Comment #3. In Fig. 6F, the hydroxyproline content is shown with ug collagen/ug protein. Total protein in the lung is influenced by infiltration of hematopoietic cells, which are the major population in injured lungs by cell count. Fibrosis should be ideally assessed as ug hydroxyproline/lung (or lobe).

      We completely agree with the reviewer that hydroxyproline content should ideally be assessed per lobe or per lung. As stated in the revised Materials and Methods (lines 882-885), hydroxyproline and protein contents were measured on paraffin lung sections (15 sections of 10 µm per sample) with the Quickzyme Biosciences hydroxyproline assay and total protein assay kits, because access to material was limited and we sought to limit animal usage. Furthermore, the infiltration of hematopoietic cells would, if anything, lead to an underestimate of the effect of Prrx1 ASO (less fibrosis and inflammation), since the contribution of those cells would be higher in control ASO-treated bleomycin mice. Considering the reviewer's concern, a complete lobe was used to measure hydroxyproline content in the new set of experiments generated during the revision of the manuscript (see new Figure 7-figure supplement 1).

      Comment #4. Major proliferating populations in bleomycin-treated lungs are not mesenchymal cells but epithelial/endothelial/hematopoietic cells. Mki67+ cells (Fig. 7D) need to be identified by co-staining with mesenchymal markers if the authors claim that Prrx1 knockdown suppresses fibroblast proliferation in vivo.

      We agree with the reviewer that epithelial/endothelial/hematopoietic cells are the main proliferating populations in bleomycin-treated animals at day 14. As suggested by the reviewer, we performed MKI67/PDGFR co-staining to identify proliferating mesenchymal cells and confirmed a decrease in proliferation in these cells after Prrx1 knockdown in bleomycin-treated mice (see lines 448-451 and Figure 6-figure supplement 3).

      Comment #5. Bleomycin-injured lungs or IPF tissue are patchy and mixed with normal and abnormal areas. Therefore, how areas of interest are chosen for histological quantifications (Fig. 6C, S14D) needs to be described in the methods section.

      As now stated in the revised Materials section (lines 864-866), areas of interest were chosen according to the presence of major alveolar thickening as well as fibrous changes and masses (confirmed by picrosirius staining on serial sections).

      Reviewer #2 (Public Review):

      The paper from Marchal-Duval et al reports for the first time the important role played by the transcription factor PRRX1, expressed specifically in the mesenchyme of the lung, in the context of fibrosis. The authors used a combination of human (Donor and IPF) and mouse lungs (saline and bleomycin treated) as well as associated fibroblasts and PCLS to test the functional role of PRRX1 in the context of proliferation and differentiation induced by TGFb1. The work is supported by an impressive amount of data (7 main figures and 14 supplementary figures).

      Comment #1: A main weakness in this work is the counterintuitive result that PRRX1 is downregulated in human lung fibroblasts (from both IPF and Donor) treated with TGFb1.

      We agree with the reviewer that PRRX1 downregulation upon TGFb1 treatment may appear counterintuitive. First, as stated in the manuscript, this inhibitory effect is partial. Second, as suggested by the reviewer, we performed additional experiments for the revised manuscript to better understand the timing of PRRX1 downregulation in response to TGF-b1 in lung fibroblasts. Time course analysis of PRRX1 isoform expression levels showed that PRRX1 was downregulated only after 48h. This late downregulation of PRRX1 in response to TGF-b1 could be the signature of a negative feedback loop that limits cell responsiveness to TGF-b1 once lung fibroblasts are fully differentiated into myofibroblasts at 48h, as discussed in the revised manuscript (see lines 175-180 and lines 589-594).

      Comment #2: Another smaller weakness is the inactivation of Prrx1 in vivo using ASO starting at d7 post bleomycin treatment.

      In our study of Prrx1 inhibition in vivo, we followed a therapeutic/interventional protocol consistent with current literature on the bleomycin model of lung fibrosis (Moeller A. et al., Int J Biochem Cell Biol 2008 and Kolb M. et al., Eur Resp J. 2020), treating the animals with either control or Prrx1 ASO every other day between day 7 and day 14 during the active fibrotic phase. In the revised manuscript, we extended our investigation to assess the potential effect of Prrx1 inhibition during the late fibrosis phase after bleomycin treatment at day 28, treating the animals with either control or Prrx1 ASO every other day between day 21 and day 27. Interestingly, we found that the effects of Prrx1 inhibition during the late fibrosis phase were less potent than during the active fibrotic phase, but still detectable (see Figure 7-figure supplement 1).

    1. Author Response

      Reviewer #2 (Public Review):

      We thank the reviewer for their assessment that our work “supports the idea that epithelial-endothelial crosstalk is important for lung regeneration and proposes a potential candidate for this process” and their helpful suggestions for strengthening and clarifying our work.

      1) The scRNA-seq analysis is performed in two separate objects ("control lung" and "H1N1 infected lung 14dpi"). For these two sets of data to be comparable, the authors should have integrated the objects and analyzed them together. This is not only important for deciding the clusters' identities and making sure that the same clusters are compared between control and infected, but also necessary to compare gene expression.

      We have integrated the control and H1N1-infected scRNA-seq datasets and reanalyzed the integrated data. We then analyzed CAP1_A and CAP1_B populations, comparing their gene expression between control and influenza conditions. Unbiased clustering of the integrated dataset reveals the same clusters we identified in the individual datasets, with cells from control and flu contributing to each cluster (with the exception of proliferating endothelial cells, which are found only in the H1N1-infected lung). We have added a supplemental figure outlining these data (Figure 1 – Figure Supplement 3).

      2) ATF3 is not only present in Cap1_B; in the infected lung, Cap1_A appears to express less ATF3. The authors should comment on this difference.

      We have added violin plots to Figure 1, which we feel will better represent the greater Atf3 expression in CAP1_Bs relative to other endothelial cell subtypes. The reviewer is correct that Atf3-expressing cells are found in large vessels, but they are also numerous in the alveolar capillary space and increase with influenza in these regions. We have added lower-magnification, higher-resolution images of Atf3CreER; ROSA26tdTomato animals, both control and influenza-infected, to illustrate this expansion in a new Figure 2 – Figure Supplement 3. This increase is also quantified in Figure 2C. We have also clarified this in the text.

      3) It is unclear how the clusters Cap1_A and Cap1_B were decided. The manuscript would benefit from clarification.

      We have added text to the Materials and Methods section to clarify this.

      4) It would be beneficial to see via immunofluorescence the morphological and spatial differences between ATF3-expressing and non-expressing endothelial cells since this transcription factor is expressed in multiple endothelial cell types.

      We have added lower-magnification, higher-resolution images of Atf3CreER; ROSA26tdTomato animals, both control and influenza-infected, to illustrate the spatial distribution of Atf3-expressing endothelial cells. This data is now shown in the new Figure 2 – Figure Supplement 3. We have also added further data to the new Figure 5 – Figure Supplement 1 to include the cytoplasmic endothelial marker Endomucin-1 (EndoM1) in an analysis of the spatial distribution of endothelial cells in wild-type and Atf3-knockout animals at 21 dpi.

      5) The authors mention ATF3 is not endothelial-specific. Expression of ATF3 in other cell types should be evaluated via immunofluorescence.

      This data is present in Figure 2 – Figure Supplement 2.

      6) The authors should have shown evidence of the deletion in their Atf3EC-KO mouse and addressed whether they had residual ATF3. If there is no antibody available, RNAscope could be used, or Western Blot or RT-PCR on sorted endothelial cells.

      We agree that this is an important quantification to make. We have performed qRT-PCR for Atf3 in both the animals used to perform the RNA sequencing experiment as well as a new cohort of animals to confirm Atf3 deletion. We have added these results to a new supplemental figure accompanying Figure 4 (Figure 4 – Figure Supplement 1).

      7) The authors only show the epithelium as evidence that the alveolar region is altered in their mutant after infection. The endothelium should have also been investigated, especially since their mutant is an endothelial-specific deletion. Within this, the different endothelial cells should have been assessed by a method other than RNAscope such as immunofluorescence, given that this method is unable to show morphology and there are antibodies available.

      This data is present in Figure 5. We have also added additional data to the new Figure 5 – Figure Supplement 1 to extend our analysis to 21 dpi and to incorporate a cytoplasmic marker of endothelial cells, Endomucin (EndoM1).

      8) Bulk RNA-seq from endothelial cells is used in the manuscript. However, because ATF3 is not specific to Cap1_B cells or even capillaries alone, the downstream gene expression analysis of bulk RNA should be placed into the context of lung endothelial heterogeneity.

      We have added qRT-PCR analysis of several downstream genes to address the comments of Reviewer #3, point #3. To place this into the context of endothelial heterogeneity, we have added dot plots to show the expression of selected genes from the RNA-seq analysis in each endothelial subtype from the H1N1 scRNA-seq dataset. These data can be found in the new Figure 4 – Figure Supplement 1. However, because of the relatively low sequencing depth of scRNA-seq compared to bulk RNA-seq, many of the transcripts examined were only present in a small percentage of endothelial cells in the scRNA-seq dataset, so the differences seen are more striking in the RNA-seq data.

      9) Although the authors mentioned that the infection with H1N1 influenza can have regional differences, they do not show how they picked regions for their analysis and quantification, and whether ATF3 upregulation was found in more severely affected regions. Furthermore, since they quantified via FACS, this heterogeneity in the infection itself could have affected their observations.

      We agree that it is essential both to define the extent of H1N1-mediated inflammation in Atf3 wild-type and knockout mice and to compare this factor between genotypes. We have therefore used a previously published method for quantifying regions of severe, damaged, and normal tissue structure (Liberti et al., Cell Reports 2021) in both Atf3 wild-type and knockout animals. Our results show that Atf3 wild-type and knockout mice have similar levels of tissue damage, and we have added a supplemental figure demonstrating these data (new Figure 3 – Figure Supplement 2). We have also clarified how regions were selected for quantification of alveolar area.

      H1N1 influenza injury in mice is heterogeneous, with regions of severe alveolar destruction marked by densely packed immune cells, adjacent regions of damaged tissue, and regions of tissue that appear to have normal tissue structure, as we and others have previously described (Zacharias, Frank et al., Nature 2018; Liberti et al., Cell Reports 2021; Niethamer et al., eLife 2020). However, it has become increasingly apparent that these regions where tissue structure appears normal are actually regions of active regeneration, and endothelial cell proliferation is increased in these regions (Niethamer et al., eLife 2020). We therefore selected 20X fields in these areas to use for quantifying alveolar area, as these are actively regenerating regions where alveolar structures are present for quantification. Because of the changes to tissue structure seen in damaged or destroyed tissue areas, we did not select these regions for quantification, although they were present at similar frequency in Atf3 wild-type and knockout animals.

    1. Author Response

      Joint Public Review:

      These RNAs come from a screen which is not well described and the descriptions of the sequence analyses are unclear, so it is difficult to know exactly what they are analyzing in the manuscript.

      We apologize for not including the required details in the manuscript. The cell cycle lncRNA screen in which we identified the initial SNUL-1 probe was published in an earlier paper (ref. 6). By performing RNA-seq in cell cycle-synchronized samples, we identified several hundred lncRNAs that were differentially expressed in a particular stage of the cell cycle. We performed a large-scale RNA-FISH-based screen to characterize the localization of these cell cycle-regulated lncRNAs. One of the probes in this screen hybridized to SNUL-1 RNA in the nucleolus. The original double-stranded DNA probe that detected the SNUL-1 RNA cloud(s) was mapped to the hg38-Chr17: 39549507-39550130 genomic region, which encodes a lncRNA. However, other unique non-overlapping probes generated from the Chr17-encoded lncRNA failed to detect the SNUL-1 RNA cloud. Furthermore, BLAST-based analyses failed to align the SNUL-1 hybridized sequence to any other genomic loci. Since a large proportion of the p-arms of nucleolus-associated NOR-containing acrocentric chromosomes is not yet annotated, we speculated that SNUL-1 could be transcribed from an unannotated genomic region in the acrocentric p-arms.

      We have now provided the information in the revised manuscript. Specifically, we have provided the details of the PacBio iso-seq, nanopore seq analyses as well as the bioinformatic approaches that were conducted to determine the identity of the full-length SNUL-1 ncRNA.

      If these are RNAs with reasonable abundance, then they should be findable without the extensive PCR amplification they appear to have done for the PacBio sequencing (the methods section is not clear on exactly how many rounds of PCR were performed).

      We apologize for not providing the essential details. In the PacBio Iso-seq analyses, we utilized the standard protocol (recommended by the scientists from PacBio, who are authors of the manuscript), which included 13 PCR cycles. However, as described in the manuscript, in parallel to PacBio-seq, we also performed nanopore sequencing of the nucleolus-enriched RNA without any amplification. The SNUL-1 full-length candidate sequence (CS) that we described in the manuscript is the ncRNA that showed 100% sequence similarity in both the independent PacBio Iso-seq and nanopore-seq analyses. We argue that if the SNUL-1 candidate transcripts had been an artifact of PCR amplification in PacBio-seq, then we would not have obtained the full-length sequence with a 100% match in the nanopore-seq reads. We have now included the detailed bioinformatic analyses in the methods section of the manuscript.

      Moreover, given the acknowledged sequence similarities of the SNULs with other RNAs, the possibility of chimaera formation during PCR amplification is high. They are clearly detecting RNAs associated with nucleoli but exactly what they are examining is unclear.

      Please see our response above (public Reviewer comment 2). In addition, we performed detailed bioinformatic analyses to test whether the SNUL-1 full-length sequence obtained by PacBio-seq is an artifact of PCR amplification. This analysis is described in detail in the methods section under the sub-title “sequencing analyses”.

      It is possible that a clear determination of the genomic origin of these RNAs will be complicated by the repetitive sequences in the regions of the genome where they reside.

      We thank the reviewer for acknowledging the technical limitation of mapping the genomic locus of the SNUL-1 genes. We have pointed this out as a limitation of the present manuscript. Mapping the SNUL-1 genomic locus and characterizing the regulatory sequence elements and factors that control the monoallelic expression of SNULs will be part of future research plans.

      Note also that the idea of monoallelic expression from rRNA-encoding loci is interesting, but was already established in 2009. Title: Allelic inactivation of rDNA loci. Genes Dev. 2009 Oct 15;23(20):2437-47. doi: 10.1101/gad.544509.

      We thank the reviewer for pointing out the study from the Cedar lab published in 2009. To test the idea that SNULs contribute to allele-specific expression of rRNA, which was previously reported by the Cedar lab in their 2009 G&D paper, we performed the same set of experiments described in their paper in three different cell lines in the presence or absence of SNULs (please see the response to Editorial comment-2). However, we could not reproduce any of the data presented in the G&D manuscript. Also, we have not seen any follow-up study in which mono-allelic expression of rDNA genes was observed. Currently, no concrete data support monoallelic expression of rRNA (ref. 5). We therefore argue that our current study is the first to demonstrate the mono-allelic association of a ncRNA from the p-arm containing the rDNA cluster.

    1. Author Response

      Reviewer #1 (Public Review):

      The shift from outcrossing to selfing is one of the most prevalent evolutionary events in flowering plants. The ecological and genetic backgrounds of these transitions have been of major interest for decades, and one of the key questions was the dating of this transition. Timing of pseudogenization of the self-incompatibility (SI) genes has been used as a proxy for this transition because loss-of-function mutations of SI genes are often responsible for the evolution of predominant selfing. However, SI genes are identified only in a limited number of taxa, and in some cases, the evolution of selfing is not necessarily associated with loss of SI. Therefore, an independent time estimate of the evolution of selfing by genome-wide polymorphism data has been considered important in this field.

      This study provides two statistical methods: SMC-based and ABC-based methods. Both methods intend to detect the genome-wide signatures of the outcrossing-to-selfing transition that alters the ratio of population recombination rate and mutation rate. Authors validated these methods by using the simulated data, confirming that both methods can generally infer the timing of the outcrossing-to-selfing transition jointly with population size changes, although its precision depends on several population history settings.

      This study would be an important contribution to the field of mating system evolution. By applying the proposed methods to many other selfing organisms, we may be able to see a general picture of the timescale of the outcrossing-to-selfing transition combined with population size dynamics. At the same time, this is one of the extensions of the SMC method, which has already been well utilized for various inferences, including population size and recombination rate heterogeneity.

      We thank the reviewer for his positive comments and for acknowledging the novelty and relevance of our study for the field.

      I do not find a major weakness in the methodologies of this study, but I have a few comments on their applications to the data of Arabidopsis thaliana. It is important that these estimates largely depend on what input data is used, especially the mutation rate and recombination rate. While the authors claim that their estimate is older than Bechsgaard's estimate (<413 kyrs), these two studies used different mutation rates: the authors used Ossowski's mutation rate, and Bechsgaard used Koch's mutation rate (Koch et al. MBE 2010). To compare these two estimates, it is important to use the same mutation rate. Shimizu & Tsuchimatsu (2015; Ann Rev Eco Evo Syst) in detail discussed this point and showed that Bechsgaard's estimate becomes <1.48 myrs when Ossowski's mutation rate was used (see Figure 4). Then it happens to overlap with the estimate of this study.

      Thank you very much for identifying this important problem. It is indeed critical to re-scale Bechsgaard’s age of the transition using the same mutation rate as used in our analysis (Ossowski et al. 2010). We now use the rescaled estimate published in your review (Shimizu and Tsuchimatsu 2015, Figure 4). We note that Bechsgaard et al. did not publish a measure of uncertainty around their estimate of the transition, which makes it difficult to compare with our posterior distributions. However, Bechsgaard’s estimate is not contained within the credibility intervals of our posteriors for t_sigma, and we therefore consider the two results significantly different. We have modified the text accordingly, at p. 4 l. 8-10 and p. 12 l. 27 to p. 13 l. 5.
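
      As an illustration of the rescaling discussed above, here is a minimal sketch. It assumes the age estimate is inversely proportional to the mutation rate and that both rates are expressed in the same units. The function name `rescale_age` and the numbers are hypothetical and are not the rates of Koch et al. or Ossowski et al.

```python
def rescale_age(age, rate_used, rate_target):
    """Rescale a divergence-based age estimate to a different mutation rate.

    Assumes age is proportional to 1 / mutation rate, with both rates in the
    same units (e.g. both per site per year).
    """
    if rate_used <= 0 or rate_target <= 0:
        raise ValueError("mutation rates must be positive")
    return age * rate_used / rate_target


# Hypothetical values for illustration only: halving the assumed mutation
# rate doubles the inferred age.
rescaled = rescale_age(100.0, rate_used=2.0e-8, rate_target=1.0e-8)
print(rescaled)
```

      Two age estimates are directly comparable only after both have been expressed under the same rate in this way.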

      I am also concerned about the genomic regions of Arabidopsis thaliana used for this study. Authors chose specific five regions based on homogeneity of recombination rates and diversity, but how does the estimated change when randomly chosen genomic regions are used? If it is important to choose "preferable" regions according to the homogeneity of recombination rates and diversity, it may be useful to describe how these regions should be chosen for future applications of this method to other organisms.

      The genomic intervals used for the application to A. thaliana are indeed not random. They were defined so as to avoid, on each chromosome, the increased diversity observed at and around pericentromeric regions. This effect has already been described by Clark et al. (2007, Science); however, no explanation for this pattern has been published yet. We have updated the text, including a recommendation for future application to other species, at p. 13 l. 8-15 and p. 18 l. 25-30, and Figure S15. We have also replicated our analysis of the A. thaliana data using a different set of genomic intervals located outside pericentromeric regions (Figures S15 and S16).

      Reviewer #2 (Public Review):

      This submission seeks to detect changes in the rate of selfing through pairwise comparison of haplotypes sampled from a population. It begins, as did a previous paper by a subset of the authors (Sellinger et al. 2020), with the well-known theoretical finding that partial selfing increases the rate of coalescence and decreases the rate of crossing-over events in genealogical histories.

      I am supportive of pitching this contribution as primarily theoretical, with the very short discussion of the Arabidopsis data provided as a worked example. This perspective increases my enthusiasm, compared to an initial reading. My comments are intended to encourage development.

      Some thematic characteristics reduce the impact of the submission. Among these are:

      (1) a rather less than scholarly perspective on previous literature;

      (2) tendency to avoid theoretical development in favor of computation;

      (3) little interpretation of results of their only analysis of real data.

      We have now revised the manuscript along the lines suggested by reviewer 2. We provide more references when needed, have emphasized in the abstract and in the theoretical part of the manuscript that it is primarily a new theoretical/methodological development with an application to A. thaliana data, and have improved the interpretation of the A. thaliana data (see reply to reviewer 1).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors of this study sought to test whether the optogenetic induction of context-related freezing behavior could be enhanced by synchronizing light pulses to the ongoing hippocampal theta rhythm. Theta is a hippocampus-wide oscillation that strongly modulates almost every cell in this structure, which suggests that causal interventions locked to theta could have a more pronounced impact than open-loop ones. Indeed, the authors found that activating engram-associated dentate gyrus (DG) neurons at the trough of theta resulted in an increase in freezing relative to baseline when averaging across all stimulation epochs. In contrast, open-loop stimulation and peak-locked stimulation had weaker effects. Analysis of local field potentials showed that only the theta-locked stimulation facilitated coupling between theta and mid-gamma, indicating that this manipulation likely enhances the flow of activity from DG to CA1 via CA3 (as opposed to promoting transmission from entorhinal cortex to CA1). Previous results from mice, rats, and humans support the hypothesis that memory encoding and recall occur at distinct phases of theta. This work further strengthens the case for phase-specific segregation of memory-related functions and opens up a path toward more precise clinical interventions that take advantage of intrinsic theta rhythm.

      Strengths:

      This study recognizes that, when artificially reactivating a context-specific memory, the brain's internal context matters. In contrast to previous attempts at optogenetically inducing recall, this work adds an additional layer of precision by synchronizing the light stimulus to the ongoing theta rhythm. This approach is more challenging, because, in addition to viral expression and bilateral optical fibers, it also requires a recording electrode and real-time signal processing. The results indicate that this additional effort is worth it, as it results in a more effective intervention.

      The findings on theta-gamma cross-frequency coupling suggest a possible mechanism underlying the observed behavioral effects: trough stimulation enhances DG to CA1 interactions via CA3. LFP recordings showed that stimulation increases the coupling between theta and mid-gamma (though not in all mice), and the percentage of freezing during reactivation is correlated with the gamma modulation index.
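      The theta-gamma coupling measure discussed above can be illustrated with a minimal sketch. It assumes the commonly used Kullback-Leibler-based modulation index (gamma amplitude binned by theta phase); the review does not state which definition the authors used, so the function name, the 18-bin default, and the input format are all assumptions made for illustration only.

```python
import math

def modulation_index(phases, amplitudes, n_bins=18):
    """KL-based modulation index (assumed definition, not confirmed by the source).

    phases: theta phase per sample, in radians within [-pi, pi].
    amplitudes: gamma amplitude envelope per sample (same length).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    width = 2 * math.pi / n_bins
    for phi, amp in zip(phases, amplitudes):
        # Map the phase to a bin index; phi == pi is clamped into the last bin.
        idx = min(int((phi + math.pi) / width), n_bins - 1)
        sums[idx] += amp
        counts[idx] += 1
    # Mean gamma amplitude per theta-phase bin (empty bins contribute zero).
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(means)
    if total == 0:
        return 0.0
    p = [m / total for m in means]
    # KL divergence from the uniform distribution, normalised by log(n_bins).
    kl = sum(pj * math.log(pj * n_bins) for pj in p if pj > 0)
    return kl / math.log(n_bins)
```

      Under this definition the index is 0 when gamma amplitude is identical at every theta phase and 1 when all amplitude falls in a single phase bin.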

      Weaknesses:

      Given the precision of the intervention being performed, one might expect to see a stronger behavioral impact. Instead, the overall effect is subtle, and quite variable across mice. Looking at individual data points, the biggest overall increase in freezing actually occurred in 2 mice during the 6 Hz stimulation condition. Furthermore, trough stimulation decreased freezing in 3 mice. This is not a weakness in itself; rather, the weakness lies in the lack of an attempt to make sense of this variability. There are a number of factors that could explain these differences, such as viral expression levels, electrode/fiber placement, and behavior during baseline. There is of course a risk of over-interpreting results from a few mice, but there is also a chance that the results will appear more consistent after accounting for these additional sources of variation.

      Although two mice had negative light-induced freezing for trough stimulation, the other 15 mice showed a positive result. Stringent inclusion criteria were used to ensure that mice had adequate viral expression levels and behavior during baseline. Mice without at least 5% light-induced freezing in at least two of the four epochs were not included in the study. The negative behavior from some mice is further explained through the correlation between MI and light-induced freezing (Figure 5D). 6 Hz stimulation showed mixed behavioral results across the different behavioral measures quantified. Additionally, 6 Hz stimulation did not show the physiological hallmarks of memory reactivation through the theta-gamma modulation index, so an increased number of negative light-induced freezing samples is expected.
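      The inclusion rule stated in this reply (at least 5% light-induced freezing in at least two of the four epochs) can be written out as a short sketch. The function name, the input format (one percentage value per stimulation epoch), and the inclusive reading of "at least 5%" are assumptions for illustration; the example values are hypothetical and are not data from the study.

```python
def meets_inclusion(light_induced_freezing, threshold=5.0, min_epochs=2):
    """Return True if enough epochs reach the freezing threshold.

    light_induced_freezing: light-induced freezing per epoch, in percent
    (hypothetical input format; negative values mean freezing decreased).
    """
    # Count the epochs whose light-induced freezing reaches the threshold.
    passing = sum(1 for value in light_induced_freezing if value >= threshold)
    return passing >= min_epochs


# Hypothetical mouse with two of four epochs at or above 5%: included.
included = meets_inclusion([6.0, 1.0, 7.5, -2.0])
# Hypothetical mouse with only one qualifying epoch: excluded.
excluded = not meets_inclusion([6.0, 1.0, 4.9, -2.0])
```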

      Finally, the elevated baseline freezing rate relative to previous literature could have masked some of the behavioral effect.

      In the revised manuscript, we discuss the effects of exclusion criteria more clearly.

      While trough-locked optogenetic stimulation significantly increases freezing, the effects are much weaker than placing the mouse in the actual fear-conditioned context (average time freezing of 15% vs. 50%). The discussion would benefit from additional treatment of ways to further increase the specificity and effectiveness of artificial memory reactivation.

      We have included content on future directions for artificial memory reactivation to further approach the behavioral response of natural recall. We believe that incorporating time-varying stimulation of different cells or parts of the hippocampus could improve induced recall, as all current methods stimulate the entire sub-region simultaneously.

      Using an open-source platform (RTXI) for real-time signal processing is commendable; however, more work could be done to make it easier to adopt these methods and make them compatible with other tools. The RTXI plugin used for closed-loop stimulation should be fully documented and publicly available, to allow others to replicate these results.

      The RTXI plugin can be found here: (https://github.com/ndlBU/phase_specific_stim). The URL has been added in the description of Figure 1.

    1. Author Response

      Reviewer #1 (Public Review):

      The screening effort has revealed a number of interesting and novel suggestions of new modulators of nuclear appearance that are exciting and have the potential to be of value to the field.

      We appreciate the reviewer’s view that identification of new modulators of nuclear morphology is exciting and of value to the field.

      Major Points:

      1) The discussion of the screen hits and prior knowledge key to their interpretation is lacking. For example, the authors only report on the purported localization of the hits without an unbiased analysis of their function(s). As a sole example, multiple members of the condensin complex are hits in Fig. 1 while multiple members of the cohesin complex are hits in Fig. 2 - but there are many more factors worthy of further discussion. Moreover, the authors need to provide more information on the data used to assign the localization of the hits and how rigorous these assignments may be. For example, multiple CHMP proteins (ESCRTs) are listed - indeed CHMP4B is the highest scoring hit in Fig. 1 - but this protein does not reside at the nuclear envelope at steady-state; rather, it is specifically recruited at mitotic exit to drive nuclear envelope sealing. Moreover, there are many hits for which there is prior published evidence of a connection to nuclear shape or size that are ignored: examples include BANF1, CHMP7, Nup155 (and likely far more that I am not aware of). This is a missed opportunity to put the findings into context and to provide a more mechanistic interpretation of the type of effects that lead to the observed changes in nuclear appearance. For example, there are already hints as to whether the effects occur as a mitotic exit defect versus an interphase defect, but conceptually this is not addressed.

      We appreciate this important point. We find that one of the major challenges in presentation of screening results is to provide detailed information on all interesting hits within the length limits of a manuscript! To provide a more comprehensive picture, we have now performed pathway analysis using STRING to display protein interaction networks to more comprehensively classify hits and groups of hits (Figures S7 and S8). We find highly connected regions in the network corresponding to condensin and histone modifiers in fibroblast hits altering nuclear shape. In contrast, MCF10AT hits showed increased connectivity with nucleoporin proteins. Fibroblast hits displaying an increase in nuclear size identified multiple nucleoporins and MCF10AT hit analysis identified components of DNA replication. We have added these findings to Supplementary Figures 7 and 8 and discuss them on page 16. Also, as requested, we added more than 20 new references and additional information on previously identified functions of some hits discussed in the text on p. 22-24.

      2) Validation of the screen is lacking. There appears to be no evidence that the authors validated the initial screen hits by additional siRNA experiments in which the levels of the knock-down could be assessed. As an example: do nucleoporin hits decrease in their abundance at the nuclear envelope in these conditions? This validation is absolutely essential.

      As requested, we now include in Tables S6A-C, data from independent validation experiments in which we selected the primary hits and validated them using an independent set of siRNAs with distinct chemistry and target sequences. Additionally, we demonstrate efficient knockdown capabilities for 8 targets in Supplementary Figure 9 with knockdown levels for most siRNAs of at least 60%. We find no strong relationship between knockdown efficiency and the extent of the observed phenotype (compare Figure S9 and Figure S10).

      3) Differences in cell type - the authors' interpretation that a lack of overlap in the hits across cell types reveals that there are fundamentally cell type-specific mechanisms at play is a stretch. This could also reflect a lack of robustness in the screen, which should be addressed by directly testing the knock-down of the hits from one cell line in the other. Even if this approach reinforces the cell type specificity, the difference may lie in the biology beyond the nucleus itself (an obvious example being the mechanical state of the cell: the organization of the cytoskeleton, adhesions, etc. that influence the forces exerted on the nucleus) rather than in the nuclear response. These caveats need to be explicitly acknowledged.

      As requested, we have now performed side by side experiments between both cell lines to directly compare a subset of nuclear morphology hits in parallel. They are shown in Supplementary Figure 10. We find a number of hits display strong nuclear shape abnormalities in either fibroblasts or MCF10AT cells but not both, with the exception of LMNA, which confirms our screen data. In addition, we compared the hits from our screen with previously published reports of other factors which regulate nuclear morphology to further strengthen our findings. We mention these results on p. 16. Despite these results, we have now toned down our statements regarding cell-type specificity of individual hits considering the small number of cell lines analyzed and the possible cellular factors which could contribute to cell-type specific differences.

      4) There are major issues with the interpretation of the presented biochemistry. For example, the basis for the supposed effect of the monomer/dimer state of lamin is confusing and likely misinterpreted. It is well established that GST imposes dimerization on proteins expressed as GST fusions independent of cysteines. Any effect of DTT would have to manifest through some other mechanism (disulfides between the lamin domains, presumably what the authors are thinking). Further, GST will impose dimerization of lamin A and lamin C in the co-incubation experiments. It is therefore entirely expected that, if lamin A binds H3 and lamin C does not, the mixed dimers will bind H3 with lower affinity. Critically, this does not, however, address how full-length lamin C influences binding of lamin A to H3 in vivo. Last, how an effect of lamin C on lamin A would manifest through a disulfide bond in the nucleus, which has a reducing environment, is entirely unclear.

      We directly tested the possibility that GST causes artifactual dimerization of lamins by mutating cysteines to alanine in GST-lamin and assessing the effect on histone binding. We show the results in Supplementary Figure 14E. If the observed binding were artifactually due to GST-mediated dimerization, we should not expect an effect of the cysteine mutants on histone binding. We find, however, that the C522A mutation in lamin A results in increased binding of H3 in the presence of lamin C, demonstrating that the observed effects are not due to GST dimerization. We discuss these results on p. 18 and p. 19.

      We agree with the referee that it will be exceptionally challenging to determine the in-vivo relevance of disulfide bonds, not knowing what the precise environment of the nucleus is. Given these caveats, we have now toned down this point and discuss the limitations of these findings in more detail on p. 19, 23, 24, and 25.

      5) It is important for the authors to address the concept of nuclear size changes versus changes in the nuclear to cell volume ratio – biologically these could be quite different conditions, but obviously these cannot be distinguished by measuring nuclear volume alone. Addressing this experimentally would be best (to provide more depth to the size measurements).

      This is an important point. As requested, we now clearly indicate on p. 23 that we are measuring nuclear area using nuclear cross-sections as a proxy for nuclear size rather than nuclear to cell volume ratio. We have found in our imaging studies over the past two decades that measuring cell volumes is exquisitely challenging and often highly inaccurate. A major challenge in these approaches is the correct identification of cell boundaries. This is particularly challenging in a high-throughput setting, since cell volume measurements require z-stacks, which greatly complicate the imaging and quantitative analysis of the millions of cells analyzed in a screen. Ultimately, measurements of cell volume for adherent cells will only be estimates (see for example PMID 28622449). We now clearly indicate this limitation of our approach and discuss on p. 15 and 23 previous studies measuring nuclear size and nuclear to cell volume ratio, and how they compare to measuring nuclear area alone. We have also added several references on this topic on p. 15 and 23.

      6) There are important caveats to the approach of using the nuclear area as proxy measurement for nuclear size, most prominently that it is highly responsive to changes in nuclear height that can occur for a multitude of reasons (increased height = small radius and decreased height = larger radius), particularly given the different cell types. This needs to be acknowledged directly.

      Along the lines of point 5 and as requested, we now more clearly acknowledge on p. 23 these caveats due to our screening method of measuring nuclear area as a proxy for nuclear size. Nuclear cross-sectional area has been experimentally shown to be a good proxy for nuclear size in many systems (see PMID 31085625). For this reason, and because quantifying nuclear size from z-stacks would have greatly complicated the imaging and quantitative analysis, we chose to use nuclear cross-sectional area as our metric for nuclear size. In looking through our data, we did not find any significant differences in nuclear height between the two cell lines used or amongst hits and non-hits. With respect to the issue of different cell types, our analysis focused on RNAi knockdowns that altered nuclear morphology in a given cell line and we did not compare cell lines against each other. Separate analyses were performed for each cell line, so possible differences in nuclear height between the different cell lines used should not affect our analysis. We now discuss these issues on p. 23.

      7) What is the evidence that the H3 effects manifest through lamins rather than directly?

      We apologize for not being clear. We did not intend to state that H3 acts via lamins. We do find that H3 physically interacts with lamins and that H3.3 mutants (K9M, K27M, and K36M) result in nuclear morphology defects. We now also show in the new Figure S17 that H3.3 mutants slightly affect lamin levels. However, as pointed out by the reviewer, these observations do not categorically rule out non-lamin related mechanisms and we now make it clear in our discussion on p. 20 that the effect of H3 may either be mediated via lamins or independently.

      8) Context is needed for the "methyl-methyl" histone states described as being the highest binders in the peptide array experiments. Are these states commonly found? Where in the genome? Does this match any DamID data? Again - more depth of investigation is required.

      This is a good point. Unfortunately, to our knowledge there is currently no ChIP-seq human genome map of di-methyl modifications on histone tails available. We were unable to generate or procure the individual dually methylated peptides, and methyl-methyl H3 antibodies are not available, so we are not able to perform quantitative binding assays. However, to begin to address this issue, we now provide in a new Supplementary Table 8 quantitative data of binding intensities. Given these limitations, we have now toned down the claims regarding the methyl-binding sites.

      9) That oncohistones induce changes in nuclear shape or size does not mean that this is related to the mechanism in cancer. Also, it is unclear how over-expression of H3 without its obligate partner H4 could disrupt the cell, and the lack of an assessment of the extent of oncohistone incorporation into chromatin achieved in these experiments makes the results challenging to interpret.

      We agree and did not intend to imply that the oncogenic function of the histone mutants involves changes in nuclear morphology. We now clearly state so on p. 25 and we also mention the caveat of the overexpression experiment.

      10) Throughout the manuscript it would be helpful to the reader if the author would provide at minimum a brief statement on the previously identified functions of the hits that are explicitly discussed beyond their localization (membrane versus chromatin). References would also be helpful (for example, again - what is the evidence that SLC27A3 resides at the nuclear envelope?).

      As requested, we added more than 20 new references and now provide additional information and previously identified functions of many of the hits mentioned in the text.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Huang et al. examines the potential "self-policing" of Bacillus cells within a biofilm. The authors first discover the co-regulation of lethal extracellular toxins (BAs) and the self-immunity mechanisms; the global regulator Spo0A controls both. The authors further show that a subpopulation of cells co-express these genes and speculate that these cells engage in preferential cooperation for biofilm formation (over cells that produce neither). Based on previous literature, the authors then evaluate the relative fitness of the wild-type strain compared to mutants locked into either constantly exporting the toxins or permanently immune to these poisons. The wild-type exhibited increased fitness (compared to the mutants) for the tested biofilm conditions. The manuscript raises interesting ideas and provides a potential model to probe questions of cooperatively in Bacillus biofilms.

      Strengths:

      • The authors use fluorescence-producing reporter strains to discern the spatial expression patterns within biofilms. This real-time imaging provides striking confirmation of their conclusions about shared co-regulation.

      • The authors also nicely deploy genetic constructs in microbiological assays to show how toxin production and immunity can influence biofilm phenotypes, including resilience to stress.

      Thank you very much for your positive comments. The detailed responses to your comments and suggestions are as follows.

      Concerns:

      • My biggest concern is that the claim of policing on a single-cell level needs more quantitative microscopy, particularly of the xylose-induced strain. The data support a more tempered consideration of self-policing via BAs and self-resistance in this Bacillus species. It seems sufficient that this manuscript opens the door for a novel and readily examinable system for examining potential cooperation and its molecular controls (without making broader claims).

      Thank you very much for your comments. We demonstrated the policing system on a single-cell level by re-filming the progression of individual nonproducers from living to dead, and even to disappearance, in a biofilm population (please see the pictures in Figure 2 and the statistical data in Figure 2-figure supplement 1 of the revised manuscript, as well as revised Figure 2-video 1-4). Separately, the xylose-induced strain (SQR9-Pxyl-accDA) was constructed to assess the involvement of AccDA expression (controlled by Spo0A~P in the wild type, but induced by exogenous xylose here) in regulating BAs synthesis and immunity. The expression of AccDA is likely to be homogeneous in the colony with xylose addition, in contrast to the heterogeneous expression in the wild-type population.

      • The discussion is more speculative than the presented data warrants. For example, the speculation in lines 289 - 310 is not anchored in the results. It is hard for this reviewer to imagine how one would use the genetic framework and tools developed in this manuscript to address the ideas proposed in lines 289 - 310.

      Thank you for your comments. We have revised the discussion so that it is anchored more in the data than in speculation. As a complement to the molecular mechanism of the policing system in the discussion, the hypothesis on the evolution of this system (lines 289-310 in the original version) was included to suggest how it may have arisen, based on several ecological theories regarding division of labour and kin selection (refs. 4-6); we have shortened this discussion in the revised manuscript.

      • Some conclusions (in the results section) are more decisive than the data supports. For example, the microscopy of the PI staining (as presented in Figure 2 and the supplemental movies) does not prove that only non-expressing cells die. Yet the conclusion in line 143 states that "ECM and BAs producers selectively punish the nonproducing siblings." Also, the presented data shows many non-labeled cells without PI; why do some nearby non-gfp-expressing cells remain alive?

      Thank you for your constructive comments. Following the reviewer's suggestion, we agree that an observation covering a more complete part of the biofilm-forming process, together with more convincing statistics, was needed. We therefore repeated the microscopy, observing for 3 h during biofilm formation, and assessed the source and location of dead cells for statistical analysis. The results showed that all dead cells originated from the subpopulation that did not express gfp (the nonproducers), and that the number of dead cells adjacent to the producers was significantly higher than the number close to the nonproducers (please see the pictures in revised Figure 2 and Figure 2-figure supplement 1).

      In addition, regarding the survival of some non-gfp-expressing cells near the producers, based on several relevant reports (refs. 1-3) and the observations in the present study, we speculate that the coordination system for optimizing the division of labor is relatively moderate, so that only a part of the nonproducers (relatively sensitive cells, or cells facing higher concentrations of the toxin) are eliminated. We think this represents a balance between restraining the cheater-like subpopulation and retaining the advantages of cell differentiation.

    1. Author Response

      Reviewer #1 (Public Review):

      The work in this study builds on previous studies by some of the same authors and aims to test whether the heartbeat evoked response was modulated by the local/global auditory regularities and whether this differed in post-comatose patients with different consciousness diagnoses. The authors report that during the global effect there were differences between the MCS and UWS patients.

      The study is well constructed and analysed and has data from 148 participants (although the maximum in any one group was 59). The reporting of the results is excellent and the conclusions are supported by the results presented. This study and the results presented are discussed as evidence that EEG-based techniques may be a low-cost diagnostic tool for consciousness in post-comatose patients, although it should be stressed that here no classification of diagnostics was performed on the EEG data.

      One potential weakness was the relationship between the design of the experiment and the analysis pathway for the results. If I have understood the experimental design correctly, the auditory regularity changed depending on whether the local/global regularity was standard/deviant. In the analysis the differences between all conditions in which the local or global regularity were compared between the standard and deviant trials. This difference was then compared between MCS and UWS patient groups. For these analyses the results for the healthy and emerging MCS groups were not included. If this is correct it would be interesting to understand the motivation for this. Relatedly, it would be good to clarify if the effects reported were corrected for the multiple planned contrasts and if not why they should not be corrected.

      Thank you for the appreciation of and constructive comments on our work. The misdiagnosis of MCS/UWS patients in clinical practice occurs because covert consciousness goes undetected in the absence of overt behavioral signs of consciousness. Therefore, the main motivation of our study is to contribute to a better distinction between those two patient groups.

      We have modified the introduction to clarify that the objective of the paper is to show in greater detail the group differences between MCS and UWS patients:

      "In this study, we analyze HERs following the presentation of auditory irregularities, with special regard on distinguishing UWS (n=40) and MCS (n=46) patients. Note that the automated classification of this cohort was previously performed in another study (Raimondo et al., 2017). Therefore, our aim is to characterize the group-wise differences between UWS and MCS patients that may allow a multi-dimensional cognitive evaluation to infer the presence of consciousness (Sergent et al., 2017), but also complement the bedside diagnosis performed with neuroimaging methods that capture neural correlates of covert consciousness (Sanz et al., 2021)."

      Reviewer #2 (Public Review):

      The goal of this study was to determine whether heartbeat-evoked responses measured at the scalp level with EEG, which followed regularity violations, could potentially help inform the diagnosis of patients with altered states of consciousness.

      The authors use high density EEG and an oddball paradigm that probes violations of both local and global regularities. Four groups were considered, including unresponsive wakefulness syndrome patients, minimally conscious patients, emerging minimally conscious patients and healthy controls. A difference was found between unresponsive and minimally conscious patients in the amplitude of the heartbeat evoked responses measured with EEG following a sound that violated a global regularity. Similarly, differences were found between the variance of these responses between the two above mentioned groups (N=58 and N=59), but no differences were found in relation to the healthy control group, which appears to be "in between" the two other groups (at least for global effect of HER). I thought this was a little counterintuitive and raises some questions about what this neural signature can tell us about the state of consciousness. Having said that, the healthy control sample was very small, more than 5 times smaller (only N=11).

      We thank the reviewer for their comments. As described above, distinguishing between MCS and UWS patients is one of the main challenges in clinical practice. We have modified the manuscript to show the differences between these two patient groups. Further data on EMCS and healthy participants are not included in this revision because of the new inclusion criteria.

      In general, I thought the Discussion section was a little light on the implications of the findings, what they tell us about the brain mechanisms of consciousness and their different levels/states. A question is raised about whether it is necessary to lock EEG to heartbeats to find differences between patients. The data appeared to say that this is not the case but the discussion does not appear to reflect that very clearly.

      We have enriched the discussion to comment on the relation of HERs to perception:

      "Our results contribute to the extensive experimental evidence showing that brain-heart interactions, as measured with HERs, are related to perceptual awareness (Azzalini et al., 2019; Skora et al., 2022). For instance, neural responses to heartbeats correlate with perception in a visual detection task (Park et al., 2014). Further evidence exists on somatosensory perception, where a higher detection of somatosensory stimuli occurs when the cardiac cycle is in diastole and it is reflected in HERs (Al et al., 2020). Evidence on heart transplanted patients shows that the ability of heartbeats sensation is reduced after surgery and recovered after one year, with the evolution of the heartbeats sensation recovery reflected in the neural responses to heartbeats as well (Salamone et al., 2020). The responses to heartbeats also covary with self-perception: bodily-self-identification of the full body (Park et al., 2016), and face (Sel et al., 2017), and the self-relatedness of spontaneous thoughts (Babo-Rebelo et al., 2016) and imagination (Babo-Rebelo et al., 2019). Moreover, brain-heart interactions measured from heart rate variability correlate with conscious auditory perception as well (Banellis and Cruse, 2020; Pérez et al., 2021; Pfeiffer and Lucia, 2017)."

      Reviewer #3 (Public Review):

      I found the results very interesting but wondered why the ERP results for the global vs. local effects are not reported. This analysis is mentioned in the methods section, but I do not find it in the results. Is this what is shown in the mid row in panel D? If yes, it should be made clearer. Is there a significant local and global deviant response in each patient group?

      We thank the reviewer for their appreciation of our work and their comments.

      We have reported the new results showing clustered effects in both ERPs and HERs.

      Additionally, eyeballing Figure 1, there are a few potential issues that may be affecting the conclusion re HER:

      (1) Panel D top: it seems that the orange trace (MCS) is largely the same in both the "Local" and "global" condition. But the blue trace (UWS) shows a larger negative going deflection in the "global" case. Put differently, the UWS, but not MCS patients appear to generate a different response to the Global effect relative to the local effect. Is this the case?

      We have separated Figure 1 into three new figures to clarify the results, and we also provide a more detailed description of our results.

      In brief, our results show that MCS may have a distinctive response to global and local effects. We have included a new correlation analysis in which we show that the responses to global and local effects are uncorrelated (Table 2).

      With respect to the “negative” responses in UWS, note that the measured effect corresponds to a linear combination of evoked potentials, e.g.: global effect = mean(global deviants) – mean(global standards). Therefore, the negative group-wise response may imply that global standard responses are larger than global deviant responses. We have included in Table 1 the statistical tests showing whether the responses to local and global effects are different from zero.
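      The contrast described in this reply can be made concrete with a minimal sketch. The function name and the amplitude values are hypothetical and chosen only to show how a negative effect arises when responses to standards exceed responses to deviants; they are not data from the study.

```python
from statistics import mean

def contrast_effect(deviant_amplitudes, standard_amplitudes):
    """Effect = mean(deviant responses) - mean(standard responses)."""
    return mean(deviant_amplitudes) - mean(standard_amplitudes)


# Hypothetical single-subject HER amplitudes (arbitrary units).
global_deviants = [1.0, 2.0, 3.0]   # mean 2.0
global_standards = [2.5, 3.5]       # mean 3.0

# The standards are larger on average, so the global effect is negative.
global_effect = contrast_effect(global_deviants, global_standards)
```

      A negative value here says nothing about the sign of the underlying evoked responses themselves, only about their difference.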

      (2) There are some MCS subjects that appear to show a global effect that is larger than that observed in EMCS and healthy controls. How do you interpret these data?

      We have included in the discussion a paragraph in which we discuss the outliers:

      "Note that outliers are expected in disorders of consciousness and exact physiological characterization of the different levels of consciousness remains challenging. First, the standard assessment of consciousness based on behavioral measures has shown a high rate of misdiagnosis in MCS and UWS (Stender et al., 2014). The cause of the misdiagnosis of consciousness arises because consciousness does not necessarily translate into overt behavior (Hermann et al., 2021). Unresponsive and minimally conscious patients, namely non-behavioral MCS (Thibaut et al., 2021), represent the main diagnostic challenge in clinical practice. Second, some of these patients suffer from conditions that may translate to no response to stimuli, even in presence of consciousness. For instance, when they suffer from constant pain, fluctuations in arousal levels, or sensory impairments caused by brain damage (Chennu et al., 2013). Third, these patients were recorded in clinical setups, which may lead to a lower signal-to-noise ratio, and lead to biased measurements in evoked potentials (Clayson et al., 2013)."

      (3) How do you interpret the negative average HER data shown by many UWS patients?

      As mentioned above, the negative HER is a result of a linear combination of different HER-based markers (deviants minus standard).

    1. Author Response

      Reviewer #1 (Public Review):

      Kazrin appears to be implicated in many diverse cellular functions, and accordingly, localizes to many subcellular sites. Exactly what it does is unclear. The authors perform a fairly detailed analysis of Kazrin in-cell function, and find that it is important for the perinuclear localization of Tfn, and that it binds to members of the AP-1 complex (e.g., gamma-adaptin). The authors note that the C-terminus of Kazrin (which is predicted to be intrinsically disordered) forms punctate structures in the cytoplasm that colocalize with components of the endosomal machinery. Finally, the authors employ co-immunoprecipitation assays to show that both the N- and C-termini of Kazrin interact with dynactin and the dynein light-intermediate chain.

      Much of the data presented in the manuscript are of fairly high quality and describe a potentially novel function for Kazrin C. However, I had a few issues with some of the language used throughout, the manner of data presentation, and some of their interpretations. Most notably, I think in its current form, the manuscript does not strongly support the authors' main conclusion: that Kazrin is a dynein-dynactin adaptor, as stated in their title. Without more direct support for this function, the authors need to soften their language. Specific points are listed below.

      Major comments:

      1) I agree with the authors that the data provided in the manuscript suggest that Kazrin may indeed be an endosomal adaptor for dynein-dynactin. However, without more direct evidence to support this notion, the authors need to soften their language stating as much. For example, the title as stated would need to be changed, as would much of the language in the first paragraph of the discussion. Alternatively, the manuscript could be significantly strengthened if the authors performed a more direct assay to test this idea. For example, the authors could use methods employed previously (e.g., McKenney et al., Science 2014) to this end. In brief, the authors can simply use their recombinant Kazrin C (with a GFP) to pull out dynein-dynactin from cell extracts and perform single molecule assays as previously described.

      While this is certainly an excellent suggestion, in vitro dynein/dynactin motility assays are not straightforward experiments for laboratories that do not use them as a routine protocol. That is why we asked Dr. Thomas Surrey (Centre for Genomic Regulation, Barcelona), an expert in the biochemistry and biophysics of microtubule dynamics, to help us with this kind of analysis. In their setting, TIRF microscopy is used to follow EGFP-dynein/dynactin motility along microtubules immobilized on cover slides (Jha et al., 2017). As shown in figure R1, more binding of EGFP-dynein to the microtubules is observed when purified kazrin is added to the assay (from 20 to 400 nM), but there is no increase in the number or processivity of the EGFP-dynein motility events. These results are hard to interpret at this point. Kazrin might still be an activating adaptor but a component is missing in the assay (i.e., an activating posttranslational modification or a particular subunit of the dynein or dynactin complexes), or it could increase the processivity of dynein-dynactin in complex with another bona fide activating adaptor, as has been demonstrated for LIS1 (Baumbach et al., 2017; Gutierrez et al., 2017). Alternatively, kazrin could transport dynactin and/or dynein to the microtubule plus ends in a kinesin 1-dependent manner, in order to load the peripheral endosomes with the minus end-directed motor (Yamada et al., 2008).

      Figure R1. Kazrin C purified from E. coli increases binding of dynein to microtubules but does not increase the number or processivity of EGFP-dynein motility events. A. TIRF (Total Internal Reflection Fluorescence) micrographs of microtubule-coated cover slides incubated with 10 nM EGFP-dynein and 20 nM dynactin in the presence or absence of 20 nM kazrin C, expressed and purified from E. coli. B. Kymographs of TIRF movies of microtubule-coated cover slides incubated in the presence of purified 10 nM EGFP-dynein, 20 nM dynactin and either 400 nM of the activating adaptor BICD2 (1:2:40 ratio) (left panel) or kazrin C (right panel). Red squares indicate processive dynein motility events induced by BICD2.

      Investigating the molecular activity of kazrin on dynein/dynactin motility is a whole project in itself, which we feel is beyond the scope of the present manuscript. Therefore, as suggested by the BRE, we have chosen to soften the conclusions and classify kazrin as a “candidate” dynein/dynactin adaptor based on its interactome, domain organization and subcellular localization, as well as on the defects in endosome motility observed in vivo upon its depletion. We also discuss other possibilities, such as those outlined above.

      2) I'm not sure I agree with the use of the term 'condensates' used throughout the manuscript to describe the cytoplasmic Kazrin foci. 'Condensates' is a very specific term that is used to describe membraneless organelles. Given the presumed association of Kazrin with membrane-bound compartments, I think it's more reasonable to assume these foci are quite distinct from condensates.

      We actually used condensates to avoid implying that the kazrin IDR generates membraneless compartments or induces liquid-liquid phase separation, which is certainly not a conclusion of the manuscript. However, since all reviewers agreed that the word was misleading, we have replaced the term condensates with foci throughout the manuscript.

      3) The authors note the localization of Tfn as perinuclear. Although I agree the localization pattern in the kazKO cells is indeed distinct, it does not appear perinuclear to me. It might be useful to stain for a centrosomal marker (such as pericentrin, used in Figure 5B) to assess Tfn/EEA1 with respect to MT minus ends.

      We have now replaced the term perinuclear, which implies that endosomes surround the nucleus, with the term juxtanuclear, which more accurately defines what we wanted to indicate (close to). We thank the reviewer for pointing out this lack of accuracy. We also describe more clearly in the text that in fibroblasts, the Golgi apparatus and the Recycling Endosomes (REs) gather around the pericentriolar region (Granger et al., 2014, and references therein), which is usually close to the nucleus (Tang and Marshall, 2012, and references therein). Nevertheless, as suggested by the reviewer, we have included pictures of the TxR-Tfn and EEA1-labelled endosomes accumulating around pericentrin in wild type mouse embryonic fibroblasts (MEFs) (Figure 1–supplement figure 3) to illustrate these points.

      4) "Treatment with the microtubule depolymerizing drug nocodazole disrupted the perinuclear localization of GFP-kazrin C, as well as the concomitant perinuclear accumulation of EE (Fig. 5C & D), indicating that EEs and GFP-kazrin C localization at the pericentrosomal region required minus end-directed microtubule-dependent transport, mostly affected by the dynactin/dynein complex (Flores-Rodriguez et al., 2011)."

      • I don't agree that the nocodazole experiment indicates that minus end-directed motility is required for this perinuclear localization. In the absence of other experiments, it simply indicates that microtubules are required. It might, however, "suggest" the involvement of dynein. The same is true for the subsequent sentence ("Our observations indicated that kazrin C can be transported in and out of the pericentriolar region along microtubule tracks...").

      We agree with the reviewer. To reinforce the point that GFP-kazrin C localization and the pericentriolar accumulation of EEA1 rely on dynein-dependent transport, we have now added an experiment in figure 5E and F, where we use ciliobrevin to inhibit dynein in cells expressing GFP-kazrin C. In the treated cells, we see that the GFP-kazrin C staining in the pericentrin foci is lost and that EEs have a more dispersed distribution, similar to kazKO MEFs. We have also completed and rearranged the in vivo fluorescence microscopy data to show more clearly that small GFP-kazrin C foci can be observed moving towards the cell centre (Figure 5-S1 and movies 6 and 7). Taking all these data together, we think we can now suggest that kazrin might travel into the pericentriolar region, possibly along microtubules and powered by dynein.

      5) Although I see a few examples of directed motion of Tfn foci in the supplemental movies, it would be more useful to see the kymographs used for quantitation (and noted by the authors on line 272). Also related to this analysis, by "centripetal trajectories", I assume the authors are referring to those moving in a retrograde manner. If so, it would be more consistent with common vernacular (and thus more clear to readers) to use 'retrograde' transport.

      We have now included some more examples of the time projections used in the analysis in figure 6-S1 and 2, where we have coloured in blue the fairly straight, longer trajectories, as opposed to the more confined movements that appeared as round dots in the time projections (coloured in red). We have also added more videos illustrating the differences observed in cells expressing endogenous or GFP-kazrin C versus kazKO cells or kazKO cells expressing GFP or GFP-kazrin C-Nt. Movies 8 and 11 show the endosome motility in representative WT and kazKO cells (movie 8) and kazKO cells expressing GFP, GFP-kazrin C or GFP-kazrin C-Nt (movie 11). Movies 9 and 10 show endosome motility in four magnified fields of different WT and kazKO cells, where longer and faster motility events can be observed when endogenous kazrin is expressed. Movies 12 to 14 show endosome motility in four magnified fields of different kazKO cells expressing GFP-kazrin C (movie 12), GFP (movie 13) and GFP-kazrin C-Nt (movie 14). Longer and faster movements can be observed in the different insets of movie 12, as compared with movies 13 and 14. Finally, as suggested by the reviewer, we have re-worded centripetal movement to retrograde movement throughout the manuscript.

      6) The error bars on most of the plots appear to be extremely small, especially in light of the accompanying data used for quantitation. The authors state that they used SEM instead of SD, but their reasoning is not stated. All the former does is lead to an artificial reduction in the real deviation (by dividing SD by the square root of whatever they define as 'n', which isn't clear to me) of the data which I find to be misleading and very nonrepresentative of biological data. For example, the error bars for cell migration speed in Figure 2B suggest that the speeds for WT cells ranged from ~1.7-1.9 µm/sec, which I'm assuming is largely underrepresenting the range of values. Although I'm not a statistician, as someone that studies biochemical and biological processes, I strongly urge the authors to use plots and error bars that more accurately describe the data to your readers (e.g., scatter plots with standard deviation are the most transparent way to display data).

      We have now changed all plots to scatter plots with standard deviations, as suggested.
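      To make the reviewer's point about error bars concrete, here is a small sketch of the relationship between the standard deviation (SD) and the standard error of the mean (SEM), where SEM = SD / sqrt(n). The speed values and the function name are hypothetical; they are not taken from Figure 2B.

```python
from math import sqrt
from statistics import stdev


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of measurements."""
    sd = stdev(values)             # sample standard deviation (n - 1 denominator)
    sem = sd / sqrt(len(values))   # SEM shrinks as n grows, SD does not
    return sd, sem


# Hypothetical migration speeds, for illustration only.
speeds = [1.0, 2.0, 3.0, 4.0]
sd, sem = sd_and_sem(speeds)

print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
```

      With four measurements the SEM is exactly half the SD, which is why SEM bars can look much tighter than the spread of the underlying data points.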

    1. Author Response

      Reviewer #1 (Public Review):

      Nicotine preference is highly variable between individuals. The paper by Mondoloni et al. provided some insight into the potential link between IPN nAChR heterogeneity and male nicotine preference behavior. They scored mice using the amount of nicotine consumption, as well as their preference for the drug, in a two-bottle choice experiment. An interesting heterogeneity in nicotine-drinking profiles was observed in adult male mice, with about half of the mice ceasing nicotine consumption at high concentrations. They observed a negative association of nicotine intake with nicotine-evoked currents in the interpeduncular nucleus (IPN). They also identified beta4-containing nicotinic acetylcholine receptors, which exhibit an association with nicotine aversion. The behavioral differentiation of avoiders vs. non-avoiders and the identification of IPN variability, in both behavioral and electrophysiological aspects, add an important candidate for analyzing individual behavior in addiction.

      The native existence of beta4-nAchR heterogeneity is an important premise that supports the molecules to be the candidate substrate of variabilities. However, only knockout and re-expression models were used, which is insufficient to mimic the physiological state that leads to variability in nicotine preference.

      We’d like to thank reviewer 1 for his/her positive remarks and for suggesting important control experiments. Regarding the reviewer’s latest comment on the link between b4 and variability, we would like to point out that the experiment in which mice were put under chronic nicotine can be seen as another way to manipulate the physiological state of the animal. Indeed, we found that chronic nicotine downregulates b4 nAChR expression levels (but has no effect on residual nAChR currents in b4-/- mice) and reduces nicotine aversion. Therefore, these results also point toward a role of IPN b4 nAChRs in nicotine aversion. We have now performed additional experiments and analyses to address these concerns and to reinforce our demonstration.

      Reviewer #2 (Public Review):

      In the current study, Mondoloni and colleagues investigate the neural correlates contributing to nicotine aversion and its alteration following chronic nicotine exposure. The question asked is important to the field of individual vulnerability to drug addiction and has translational significance. First, the authors identify individual nicotine consumption profiles across isogenic mice. Further, they employed in vivo and ex vivo physiological approaches to define how the interpeduncular nucleus (IPn) neuronal response to nicotine is associated with nicotine avoidance. Additionally, the authors determine that chronic nicotine exposure impairs the normal IPn neuronal response to nicotine, thus contributing to higher amounts of nicotine consumption. Finally, they used transgenic and viral-mediated gene expression approaches to establish a causal link between b4 nicotinic receptor function and nicotine avoidance processes.

      The manuscript and experimental strategy are well designed and executed; the current dataset requires supplemental analyses and details to exclude possible alternatives. Overall, the results are exciting and provide helpful information to the field of drug addiction research, individual vulnerability to drug addiction, and neuronal physiology. Below are some comments aiming to help the authors improve this interesting study.

      We would like to thank the reviewer for his/her positive remarks and we hope the new version of the manuscript will clarify his/her concerns.

      1) The authors used a two-bottle choice behavioral paradigm to investigate the neurophysiological substrate contributing to nicotine avoidance behaviors. While the data set supporting the author's interpretation is compelling and the experiments are well-conducted, a few supplemental control analyses will strengthen the current manuscript.

      a) The bitter taste of nicotine might generate confounds in the data interpretation: are the mice avoiding the bitterness or the nicotine-induced physiological effect? To address this question, the authors mixed nicotine with saccharine, thus covering the bitterness of nicotine. Additionally, the authors show that all the mice exposed to quinine avoid it, and in comparison, the N-Av don't avoid the bitterness of the nicotine-saccharine solution. Yet it is unclear if Av and N-Av have different taste discrimination capacities and if such taste discrimination capacities drive the N-Av to consume less nicotine. Would Av and N-Av mice avoid quinine differently after the 20-day nicotine paradigm? Would the authors observe individual nicotine drinking behaviors if nicotine/quinine vs. quinine were offered to the mice?

      As requested by all three reviewers, we have now performed a two-bottle choice experiment to verify whether different sensitivities to the bitterness of the nicotine solution could explain the different sensitivities to the aversive properties of nicotine. Indeed, even though we used saccharine to mask the bitterness of the nicotine solution, we cannot fully exclude the possibility that the taste capacity of the mice could affect their nicotine consumption. Reviewers 1 and 2 suggested performing nicotine/quinine versus quinine preference tests, but we were afraid that forcing mice to drink an aversive, quinine-containing solution might affect the total volume of liquid consumed per day, and also might create a “generalized conditioned aversion to drinking water - detrimental to overall health and a confounding factor” as pointed out by reviewer 3. Therefore, we designed the experiment a little differently.

      In this two-bottle choice experiment, mice were first proposed a high concentration of nicotine (100 µg/ml) which has previously been shown to induce avoidance behavior in mice (Figure 3C). Then, mice were offered three increasing concentrations of quinine: 30, 100 and 300 µM. Quinine avoidance was dose dependent, as expected: it was moderate for 30 µM but almost absolute for 300 µM quinine. We then investigated whether nicotine and quinine avoidances were linked. We found no correlation between nicotine and quinine preference (new Figure: Figure 1- supplementary figure 1D). This new experiment strongly suggests that aversion to the drug is not directly tied to the sensitivity of mice to the bitter taste of nicotine.

      Other results reinforce this conclusion. First, none of the b4-/- mice (0/13) showed aversion to nicotine, whereas about half of the virally-rescued animals (8/17, b4 re-expressed in the IPN of b4-/- mice) showed nicotine aversion, a proportion similar to the one observed in WT mice. This experiment makes a clear, direct link between the expression of b4 nAChRs in the IPN and aversion to the drug.

      Furthermore, we also verified that the sensitivity of b4-/- mice to bitterness is not different from that of WT mice (new Figure 4 – figure supplement 1B). This new result indicates that the reason why b4-/- mice consume more nicotine than WT mice is not that they have a reduced sensitivity to bitterness.

      Together, these new experiments strongly suggest that interindividual differences in sensitivity to the bitterness of nicotine play little role in nicotine consumption behavior in mice.
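      The check for a link between nicotine and quinine avoidance described above can be sketched as a correlation between per-mouse preference scores. The response does not state which correlation statistic was used, so Pearson's r is an assumption here, and the preference values are hypothetical illustration data, not results from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical per-mouse preference scores (%), for illustration only.
nicotine_pref = [10.0, 20.0, 30.0, 40.0]
quinine_pref = [25.0, 15.0, 15.0, 25.0]

r = pearson_r(nicotine_pref, quinine_pref)
print(f"r = {r:.2f}")
```

      An r close to zero, as in this toy example, is what "no correlation" would look like; a real analysis would also report a p-value or confidence interval.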

      b) Metabolic variabilities amongst isogenic mice have been observed. Thus, while the mice consume different amounts of nicotine, changes in metabolic processes, thus blood nicotine concentrations, could explain differences in nicotine consumption and neurophysiology across individuals. The authors should control if the blood concentration of nicotine metabolites between N-Av and Av are similar when consuming identical amounts of nicotine (50ug/ml), different amounts (200ug/ml), and in response to an acute injection of a fixed nicotine quantity.

      We agree with the reviewer that metabolic variabilities could explain (at least in part) the differences observed between avoiders and non-avoiders. But other factors could also play a role, such as stress level (there is a strong interaction between stress and nicotine addiction, as shown by our group (PMID: 29155800, PMID: 30361503) and others), hierarchical ranking, epigenetic factors, etc. Our goal in this study is not to examine all possible sources of variability. What is striking about our results is that deletion of a single gene (encoding the nAChR b4 subunit) is sufficient to eliminate nicotine avoidance, and that re-expression of this receptor subunit in the IPN is sufficient to restore nicotine avoidance. In addition, we observe a strong correlation between the amplitude of nicotine-induced current in the IPN and nicotine consumption. Therefore, the expression level of b4 in the IPN is sufficient to explain most of the behavioral variability we observe. We do not feel the need to explore variations in metabolic activities, which are (by the way) very expensive experiments. However, we have added a sentence in the discussion to mention metabolic variabilities as a potential source of variability in nicotine consumption.

      2) Av mice exposed to nicotine_200ug/ml display minimal nicotine_50ug/ml consumption, yet would Av mice restore a percent nicotine consumption >20 when exposed to a more extended session at 50ug/kg? Such a data set will help identify and isolate learned avoidance processes from dose-dependent avoidance behaviors.

      We have now performed an additional two-bottle choice experiment to examine an extended time at 50 µg/ml, although we performed the experiment a little differently. We directly offered mice a high nicotine concentration (200 µg/ml), followed by 8 days at 50 µg/ml. We found that, overall, mice avoided the 200 µg/ml nicotine solution, and that the subsequent increase in nicotine preference was slow and gradual throughout the eight days at 50 µg/ml (Figure 2-figure supplement 1C). This slow adjustment to a lower dose contrasts with the rapid (within a day) change in intake observed when nicotine concentration increases (Figure 1-figure supplement 1A). About half of the mice (6/13) retained a steady, low nicotine preference (< 20%) throughout the eight days at 50 µg/ml, resembling what was observed for avoiders in Figure 2D. Together, these results suggest that some of the mice, the non-avoiders, rapidly adjust their intake to adapt to changes in nicotine concentration in the bottle. For avoiders, aversion to nicotine seems to involve a learning mechanism that, once triggered, results in prolonged cessation of nicotine consumption.
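      As an illustration of the 20% criterion mentioned above, the sketch below computes a daily nicotine preference and checks whether a mouse stays below the threshold on every day. Two assumptions are made that the response does not spell out: preference is taken as the volume drunk from the nicotine bottle as a percentage of total intake, and "steady, low" is read as below threshold on all days. Function names and volumes are hypothetical.

```python
def nicotine_preference(nicotine_ml, control_ml):
    """Percent of total daily intake taken from the nicotine bottle (assumed definition)."""
    total = nicotine_ml + control_ml
    if total <= 0:
        raise ValueError("total intake must be positive")
    return 100.0 * nicotine_ml / total


def retains_low_preference(daily_preferences, threshold=20.0):
    """True if preference stays strictly below the threshold on every day."""
    return all(p < threshold for p in daily_preferences)


# Hypothetical daily volumes (ml) as (nicotine bottle, control bottle) pairs.
daily_volumes = [(0.5, 4.5), (0.4, 3.6), (0.5, 4.5)]
preferences = [nicotine_preference(n, c) for n, c in daily_volumes]

print(preferences, retains_low_preference(preferences))
```

      A stricter or looser reading of "steady" (for example, the mean over the eight days) would only change `retains_low_preference`, not the preference calculation.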

      3) The author should further investigate the basal properties of IPn neuron in vivo firing rate activity recorded and establish if their spontaneous activity determines their nicotine responses in vivo, such as firing rate, ISI, tonic, or phasic patterns. These analyses will provide helpful information to the neurophysiologist investigating the function of IPn neurons and will also inform how chronic nicotine exposure shapes the IPn neurophysiological properties.

      We have performed additional analyses of the in vivo recordings. First, we have built maps of the recorded neurons, and we show that there is no anatomical bias in our sampling between the different groups. The only condition for which we did not sample neurons similarly is when we compare the responses to nicotine in vivo in WT and b4-/- mice (Figure 4E). The two groups were not distributed similarly along the dorso-ventral axis (Figure 4-figure supplement 2B). Yet, we do not think that the difference in nicotine responses observed between WT and b4-/- mice is due to a sampling bias. Indeed, we found no link between the response to nicotine and the dorso-ventral coordinates of the neurons, in any of the groups (MPNic and MP Sal in Figure 3-figure supplement 1D; WT and b4-/- mice in Figure 4-figure supplement 2C). Therefore, our different groups are directly comparable, and the conclusions drawn in our study are fully justified.

      As requested, we have looked at whether the basal firing rate of IPN neurons determines the response to nicotine and indeed, neurons with a higher firing rate show a greater change in firing frequency upon nicotine injection (Figure 3-figure supplement 1G and Figure 4-figure supplement 2F). We have also looked at the effect of chronic nicotine on the spontaneous firing rate of IPN neurons (Figure 3-figure supplement 1F) but found no evidence for a change in basal firing properties. Similarly, the deletion of b4 had no effect on the spontaneous activity of the recorded neurons (Figure 4-figure supplement 2F). Finally, we found no evidence for any link between the anatomical coordinates of the neurons and their basal firing rate (Figure 3-figure supplement 1E and Figure 4-figure supplement 2D).

      Reviewer #3 (Public Review):

      The manuscript by Mondoloni et al. characterizes two-bottle choice oral nicotine consumption and associated neurobiological phenotypes in the interpeduncular nucleus (IPN) using mice. The paper shows that mice exhibit differential oral nicotine consumption and correlates this difference with nicotine-evoked inward currents in neurons of the IPN. The beta4 nAChR subunit is likely involved in these responses. The paper suggests that prolonged exposure to nicotine results in reduced nAChR functional responses in IPN neurons. Many of these results or phenotypes are reversed or reduced in mice that are null for the beta4 subunit. These results are interesting and will add a contribution to the literature. However, there are several major concerns with the nicotine exposure model and a few other items that should be addressed.

      Strengths:

      Technical approaches are well-done. Oral nicotine, electrophysiology, and viral re-expression methods were strong and executed well. The scholarship is strong and the paper is generally well-written. The figures are high-quality.

      We would like to thank the reviewer for his/her comments and suggestions on how to improve the manuscript.

      Weaknesses:

      Two bottle choice (2BC) model. 2BC does not examine nicotine reinforcement, which is best shown as a volitional preference for the drug over the vehicle. Mice in this 2BC assay (and all such assays) only ever show indifference to nicotine at best - not preference. This is seen in the maximal 50% preference for the nicotine-containing bottle. 2BC assays using tastants such as saccharin are confounded. Taste responses can very likely differ from primary reinforcement and can be related to peripheral biology in the mouth/tongue rather than in the brain reward pathway.

      The two-bottle nicotine drinking test is a commonly used method to study addiction in mice (Matta, S. G. et al. 2006. Guidelines on nicotine dose selection for in vivo research. Psychopharmacology 190, 269–319). Like all methods, it has its limitations, but it also allows for different aspects to be addressed than those covered by self-administration protocols. The two-bottle nicotine drinking test simply measures the animals' preference for a solution containing nicotine over a control solution without nicotine: the animals are free to choose nicotine or not, which allows sensitivity and avoidance thresholds to be evaluated. What we show in this paper is precisely that despite interindividual differences in the way the drug is used (passively or actively), a significant proportion of the animals avoid the nicotine bottle at a certain concentration, suggesting that we are dealing with individual characteristics that are interesting to identify in the context of addiction and vulnerability. We agree that the two-bottle choice test cannot provide as much information about the reinforcing effects of the drug as self-administration procedures. We are aware of the limitations of the method and were careful not to interpret our data in terms of reinforcement to the drug. For instance, mice that consume nicotine were called “non-avoiders” and not “consumers”. We added a few sentences at the beginning of the discussion to highlight these limitations.

      The reviewer states that the mice in this 2BC assay (and all such assays) “only ever show indifference to nicotine at best - not preference”. This is seen in the maximal 50% preference for the nicotine-containing bottle. While this is true on average, it isn’t when we look at individual profiles, as we did here. We clearly observed that some mice have a strong preference for nicotine and, conversely, that some mice actively avoid nicotine after a certain concentration is proposed in the bottle.

      Regarding tastants, we indeed used saccharine to hide the bitter taste of nicotine and prevent taste-related side bias. This is a classical (though not perfect) paradigm in the field of nicotine research (Matta, S. G. et al. 2006. Guidelines on nicotine dose selection for in vivo research. Psychopharmacology 190, 269–319). To evaluate whether different sensitivities to the bitterness of nicotine may explain the interindividual differences in nicotine consumption, we performed new experiments (as suggested by all three reviewers). In this two-bottle choice experiment, mice were first offered a high concentration of nicotine (100 µg/ml), which has previously been shown to induce avoidance behavior in mice (Figure 3C). Then, mice were offered three increasing concentrations of quinine: 30, 100 and 300 µM. Quinine avoidance was dose dependent, as expected: it was moderate for 30 µM but almost absolute for 300 µM quinine. We then investigated whether nicotine and quinine avoidances were linked. We found no correlation between nicotine and quinine preference (new Figure: Figure 1- supplementary figure 1D). This new experiment strongly suggests that aversion to the drug is not directly tied to the sensitivity of mice to the bitter taste of nicotine. Other results reinforce this conclusion. First, none of the b4-/- mice (0/13) showed aversion to nicotine, whereas about half of the virally-rescued animals (8/17, b4 re-expressed in the IPN of b4-/- mice) showed nicotine aversion, a proportion similar to the one observed in WT mice. This experiment makes a clear, direct link between the expression of b4 nAChRs in the IPN and aversion to the drug. Furthermore, we also verified that the sensitivity of b4-/- mice to bitterness is not different from that of WT mice (new Figure 4 - figure supplement 1B). This new result indicates that the reason why b4-/- mice consume more nicotine than WT mice is not that they have a reduced sensitivity to bitterness. Together, these new experiments strongly suggest that interindividual differences in sensitivity to the bitterness of nicotine play little role in nicotine consumption behavior in mice.

      Moreover, this assay does not test free choice, as nicotine is mixed with water which the mice require to survive. Since most concentrations of nicotine are aversive, this may create a generalized conditioned aversion to drinking water - detrimental to overall health and a confounding factor.

      Mice are given a choice between two bottles, only one of which contains nicotine. Hence, even though their choices are not fully free (they are being presented with a limited set of options), mice can always decide to avoid nicotine and drink from the bottle containing water only. We do not understand how this situation may create a generalized aversion to drinking. In fact, we have never observed any mouse losing weight or with deteriorated health condition in this test, so we don’t think it is a confounding factor.

      What plasma concentrations of nicotine are achieved by 2BC? When nicotine is truly reinforcing, rodents and humans titrate their plasma concentrations up to 30-50 ng/mL. The Discussion states that oral self-administration in mice mimics administration in human smokers (lines 388-389). This is unjustified and should be removed. Similarly, the paragraph in lines 409-423 is quite speculative and difficult or impossible to test. This paragraph should be removed or substantially changed to avoid speculation. Overall, the 2BC model has substantial weaknesses, and/or it is limited in the conclusions it will support.

      The reviewer must have read another version of our article, because these sentences and paragraphs are not present in our manuscript.

      Regarding the actual concentration of nicotine in the plasma, this is indeed a good question. We have actually measured the plasma concentrations of nicotine for another study (article in preparation). The results from this experiment can be found below. The half-life of nicotine is very short in the blood and brain of mice (about 6 min, see Matta, S. G. et al. 2006. Guidelines on nicotine dose selection for in vivo research. Psychopharmacology 190, 269–319), making it very hard to assess. Therefore, we also assessed the plasma concentration of cotinine, the main metabolite of nicotine. We compared 4 different conditions: home-cage (forced drinking of 100 µg/ml nicotine solution); osmotic minipump (OP, 10 mg/kg/d, as in our current study); Souris-city (a large social environment developed by our group, see Torquet et al. Nat. Comm. 2018); and the two-bottle choice procedure (when a solution of nicotine 100 µg/ml was offered). The concentrations of plasma nicotine found were very low for all groups that drank nicotine, but not for the group that received nicotine through the osmotic minipump. This is most likely because mice did not drink any nicotine in the hour prior to being sampled and all nicotine was metabolized. Indeed, when we look at the plasma concentration of cotinine, we see that cotinine was present in all of the groups. The plasma concentration of cotinine was similar in the groups for which “consumption” was forced: forced drinking in the home cage (HC) or infusion through osmotic minipump. This indicates that the plasma concentration of cotinine is similar whether mice drink nicotine (100 µg/ml) or whether nicotine is infused with the minipump (10 mg/kg/d). For Souris-city and the two-bottle choice procedure, the cotinine concentrations were in the same range (mostly between 0 and 100 ng/ml).
Globally, the concentrations of nicotine and cotinine found in the plasma of mice that underwent the two-bottle choice procedure are in the range of what has been previously described (Matta, S. G. et al. 2006. Guidelines on nicotine dose selection for in vivo research. Psychopharmacology 190, 269–319).
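
      The reasoning that plasma nicotine is hard to detect once a mouse has stopped drinking can be illustrated with a minimal sketch. It assumes simple first-order (exponential) elimination and takes the approximate 6-min half-life quoted above; the function name and the 60-min interval are illustrative choices, not values measured in the study.

```python
# Minimal sketch, assuming first-order (exponential) elimination of nicotine.
# HALF_LIFE_MIN is the approximate half-life in mice quoted in the response;
# the function name and the 60-min example are illustrative assumptions.
HALF_LIFE_MIN = 6.0


def fraction_remaining(elapsed_min, half_life_min=HALF_LIFE_MIN):
    """Fraction of the initial plasma concentration left after elapsed_min."""
    return 0.5 ** (elapsed_min / half_life_min)


# One hour without drinking corresponds to ten half-lives.
one_hour = fraction_remaining(60.0)
print(f"Fraction of nicotine remaining after 60 min: {one_hour:.6f}")
```

      Under these assumptions, roughly one thousandth of the initial nicotine remains after an hour, which fits the very low plasma nicotine levels in the drinking groups and is the reason for measuring cotinine instead.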

      Regarding the limitations of the two-bottle choice test, we discuss them more extensively in the current version of the manuscript.

      Statistical testing on subgroups. Mice are run through an assay and assigned to subgroups based on being classified as avoiders or non-avoiders. The authors then perform statistical testing to show differences between the avoiders and non-avoiders. It is circular to do so. When the authors divided the mice into avoiders and non-avoiders, this implies that the mice are different or from different distributions in terms of nicotine intake. Conducting a statistical test within the null hypothesis framework, however, implies that the null hypothesis is being tested. The null hypothesis, by definition, is that the groups do NOT differ. Obviously, the authors will find a difference between the groups in a statistical test when they pre-sorted the mice into two groups, to begin with. Comparing effect sizes or some other comparison that does not invoke the null hypothesis would be appropriate.

      Our analysis, which can be summarized as follows, is fairly standard (see Krishnan, V. et al. (2007) Molecular adaptations underlying susceptibility and resistance to social defeat in brain reward regions. Cell 131, 391–404). Firstly, the mice are segregated into two groups based on their consumption profile, using the variability in their behavior. The two groups are obviously statistically different when comparing their consumption. This first analytical step allows us to highlight the variability and to establish the properties of each sub-population in terms of consumption. Our analysis could support the reviewer's comment if it ended at this point. However, our analysis doesn't end here and moves on to the second step. The separation of the mice into two groups (which is now a categorical variable) is used to compare the distribution of other variables, such as mouse choice strategy and current amplitude, based on the 2 categories. The null hypothesis tested is that the value of these other variables is not different between groups. There is no a priori obvious reason for the currents recorded in the IPN to be different in the two groups. These approaches allow us to show correlations between the variables. Finally, in the third and last step, one (or several) variable(s) are manipulated to check whether nicotine consumption is modified accordingly. Manipulation was performed by exposing mice to chronic nicotine, by using mutant mice with decreased nicotinic currents, and by re-expressing the deleted nAChR subunit only in the IPN. This procedure is fairly standard, and cannot be considered a circular analysis with a data-selection problem, as explained in (Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. F. & Baker, C. I. (2009) Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience 12, 535-540).

      Decreased nicotine-evoked currents following passive exposure to nicotine in minipumps are inconsistent with published results showing that similar nicotine exposure enhances nAChR function via several measures (Arvin et al, J Neurosci, 2019). The paper does acknowledge this previous paper and suggests that the discrepancy is explained by the fact that they used a higher concentration of nicotine (30 uM) that was able to recruit the beta4-containing receptor (whereas Arvin et al used a caged nicotine that was unable to do so). This may be true, but the citation of 30 uM nicotine undercuts the argument a bit because 30 uM nicotine is unlikely to be achieved in the brain of a person using tobacco products; nicotine levels in smokers are 100-500 nM. It should be noted in the paper that it is unclear whether the down-regulated receptors would be active at concentrations of nicotine found in the brain of a smoker.

      We indeed find opposite results compared to Arvin et al., and we give possible explanations for this discrepancy in the discussion. To be honest we don’t fully understand why we have opposite results. However, we clearly observed a decreased response to nicotine, both in vitro (with 30 µM nicotine on brain slices) and in vivo (with a classical dose of 30 µg/kg nicotine i.v.), while Arvin et al. only tested nicotine in vitro.

      Regarding the reviewer’s comment about the nicotine concentration used (30 µM): we used that concentration in vitro to measure nicotine-induced currents (it’s a concentration close to the EC50 for heteromeric receptors, which will likely recruit low affinity a3b4 receptors) and to evaluate the changes in nAChR current following nicotine exposure. We did not use that concentration to induce nAChR desensitization, so we don’t really understand the argument regarding the levels of nicotine in smokers. For inducing desensitization, we used a minipump that delivers a daily dose of 10 mg/kg/day, which is the amount of nicotine mice drink in our assay.

      The statement in lines 440-41 ("we show that concentrations of nicotine as low as 7.5 ug/kg can engage the IPN circuitry") is misleading, as the concentration in the water is not the same as the concentration in the CSF since the latter would be expected to build up over time. The paper did not provide measurements of nicotine in plasma or CSF, so concluding that the water concentration of nicotine is related to plasma concentrations of nicotine is only speculative.

      The sentence “we show that concentrations of nicotine as low as 7.5 ug/kg can engage the IPN circuitry" is not in the manuscript so the reviewer must have read another version of the paper.

      The results in Figure 2E do not appear to be from a normal distribution. For example, results cluster at low (~100 pA) responses, and a fraction of larger responses drive the similarities or differences.

      Indeed, that is why we performed a non-parametric Mann-Whitney test for comparing the two groups, as indicated in the legend of figure 2E.
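
      As a minimal sketch of why a rank-based test suits clustered, non-normal data, the Mann-Whitney U statistic can be computed from ranks alone. The implementation below is a simplified illustration (normal approximation, no tie or continuity correction), and the two sets of current amplitudes are hypothetical values chosen for the example, not data from Figure 2E; a real analysis would rely on an established statistics package.

```python
import math


def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) using the normal approximation.

    Simplified for illustration: no tie correction, no continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_min = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_min - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_two_sided


# Hypothetical current amplitudes in pA (NOT data from the study):
# most values cluster near 100 pA, with a few large responses.
group_a = [90, 100, 110, 105, 95, 400]
group_b = [300, 350, 500, 120, 450, 380]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(f"U = {u_stat}, two-sided p = {p_value:.3f}")
```

      Because only the ordering of the values enters the statistic, a handful of very large responses cannot dominate the comparison the way they would in a test based on means.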

      10 mg/kg/day in mice or rats is likely a non-physiological exposure to nicotine. Most rats take in 1.0 to 1.5 mg/kg over a 23-hour self-administration period (O'Dell, 2007). Mice achieve similar levels during SA (Fowler, Neuropharmacology 2011). Forced exposure to 10 mg/kg/day is therefore 5 to 10-fold higher than rodents would ever expose themselves to if given the choice. This should be acknowledged in a limitations section of the Discussion.

      The two-bottle choice task is very different from nicotine self-administration procedures in terms of administration route: oral versus injected (in the blood or in the brain), respectively. Therefore, the quantities of drug consumed cannot be directly compared. In our manuscript, mice consume on average 10 mg/kg/day of nicotine at the highest nicotine concentration tested, which is fully consistent with what was already published in many studies (20 mg/kg/day in Frahm et al. Neuron 2013, 5-10 mg/kg/day in Bagdas et al., NP 2020, 10-20 mg/kg/day in Bagdas et al. NP2019, to cite a few...). Hence, we used that concentration of nicotine (10 mg/kg/d) for chronic administration of nicotine using minipumps. This is also a nicotine concentration that is classically used in osmotic minipumps for chronic administration of nicotine: 10 mg/kg/d in Dongelmans et al. Nat. Com 2021 (our lab), 12 mg/kg/d in Arvin et al. J. Neuro. 2019 (Drenan lab), 12 mg/kg/d in Lotfipour et al. J. Neuro. 2013 (Boulter lab) etc… Therefore, we do not see the issue here.
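
      For readers unfamiliar with how an oral dose in mg/kg/day relates to the concentration in the bottle, the conversion can be sketched as follows. The daily intake volume and body weight below are hypothetical round numbers chosen for illustration, not measurements from the study; only the 100 µg/ml concentration appears in this response.

```python
def daily_dose_mg_per_kg(conc_ug_per_ml, intake_ml_per_day, body_weight_g):
    """Daily oral dose in mg/kg/day.

    Micrograms per gram of body weight is numerically equal to mg per kg,
    so no further unit conversion is needed.
    """
    return conc_ug_per_ml * intake_ml_per_day / body_weight_g


# Hypothetical example: a 30 g mouse drinking 3 ml/day of a 100 ug/ml solution.
# The intake and weight are assumed values for illustration only.
dose = daily_dose_mg_per_kg(conc_ug_per_ml=100.0,
                            intake_ml_per_day=3.0,
                            body_weight_g=30.0)
print(f"Daily dose: {dose} mg/kg/day")
```

      With these assumed values the dose is on the order of 10 mg/kg/day, the same order of magnitude as the consumption cited above.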

      Are the in vivo recordings in IPN enriched or specific for cells that have a spontaneous firing at rest? If so, this may or may not be the same set/type of cells that are recorded in patch experiments. The results could be biased toward a subset of neurons with spontaneous firing. There are MANY different types of neurons in IPN that are largely intermingled (see Ables et al, 2017 PNAS), so this is a potential problem.

      It is true that there are many types of neurons in the IPN. In-vivo electrophysiology and slice electrophysiology should be considered as two complementary methods to obtain detailed properties of IPN neurons. The populations sampled by these two methods are certainly not identical (IPR in patch-clamp versus mostly IPR and IPC in vivo), and indeed only spontaneously active neurons are recorded in in-vivo electrophysiology. The question is whether or not this is a potential problem. The results we obtained using in-vivo and brain-slice electrophysiology are consistent (i.e., a decreased response to nicotine), which indicates that our results are robust and do not depend on the selection of a particular subpopulation. In addition, we now provide the maps of the neurons recorded both in slices and in vivo (see supplementary figures, and response to the other two referees). We show that, overall, there is no sampling bias between the different groups. Together, these new analyses strongly suggest that the differences we observe between the groups are not due to sampling issues. We have added the Ables 2017 reference and discuss neuron variability more extensively in the revised manuscript.

      Related to the above issue, in which of the many different IPN neuron types did the group re-express beta4? Could that be controlled or did beta4 get re-expressed in an unknown set of neurons in IPN? There is insufficient information given in the methods for verification of stereotaxic injections.

      Re-expression of b4 was achieved with a strong, ubiquitous promoter (pGK), hence all cell types should in principle be transduced. This is now clearly stated in the result section, the figure legend and the method section. Unfortunately, we had no access to a specific mouse line to restrict expression of b4 to b4-expressing cells, since the b4-Cre line of GENSAT is no longer alive. This mouse line was problematic anyway because expression levels of the a3, a5 and b4 nAChR subunits, which belong to the same gene cluster, were reported to be affected. Yet, we show in this article that deleting b4 leads to a strong reduction of nicotine-induced currents in the IPR (80%, patch-clamp), and of the response to nicotine in vivo (65%). These results indicate that b4 is strongly expressed in the IPN, likely in a large majority of IPR and IPC neurons (see also our response to reviewer 1). In addition, we show that our re-expression strategy restores nicotine-induced currents in patch-clamp experiments and also the response to nicotine in vivo (new Figure 5C). Non-native expression levels could potentially be achieved (e.g. overexpression) but this is not what we observed: responses to nicotine were restored to the WT levels (in slices and in vivo). Importantly, this strategy also rescued the WT phenotype in terms of nicotine consumption. Expression of b4 alone in cells that do not express any other nAChR subunit (as, presumably, in the lateral parts of the IPN, see GENSAT images above) should not produce any functional nAChR, since alpha subunits are mandatory to produce functional receptors. As specified in the manuscript, proper transduction of the IPN was verified using post-hoc immunochemistry, and mice with transduction of b4 in the VTA were excluded from the analyses.

      Data showing that alpha3 or beta4 disruption alters MHb/IPN nAChR function and nicotine 2BC intake is not novel. In fact, some of the same authors were involved in a paper in 2011 (Frahm et al., Neuron) showing that enhanced alpha3beta4 nAChR function was associated with reduced nicotine consumption. The present paper would therefore seem to somewhat contradict prior findings from members of the research group.

      Frahm et al. used a transgenic mouse line (called TABAC) in which the expression of the a3b4 receptor is increased, and they observed reduced nicotine consumption. We do the exact opposite: we reduce (a3)b4 receptor expression (using the b4 knock-out line, or by putting mice under chronic nicotine), and observe increased consumption. There is thus no contradiction. In fact, we discuss our findings in the light of Frahm et al. in the discussion section.

      Sex differences. All studies were conducted in male mice, therefore nothing was reported regarding female nicotine intake or physiological responses. Nicotine-related biology often shows sex differences, and there should be a justification provided regarding the lack of data in females. A limitations section in the Discussion section is a good place for this.

      We agree with the reviewer. We added a sentence in the discussion.

    1. Author Response

      Reviewer #3 (Public Review):

      1) While the data are generally very convincing, the authors overstated the conclusions in several instances. For example, the authors state that EPAC and PKCε are "required" or "essential" for vesicle docking and release. However, the authors' own data show that both vesicle docking and release are clearly present (though reduced) in the absence of EPAC and PKCε, demonstrating they are not absolutely required. The language could be toned down without diminishing the impact of the excellent work.

      We thank you for these important comments. We have double-checked the manuscript and modified the language of our statements. In particular, we have changed the unnecessary words “required” and “essential” to “regulate” or “important”.

      2) The authors used analysis of cumulative EPSCs to estimate release probability (Pr) and the readily releasable pool (RRP) size. Unfortunately, this approach is likely not suited for low release probability synapses such as parallel fibers (the authors estimate Pr to be 0.04-0.06). Thanawala and Regehr (2016) extensively investigated the validity of cumulative EPSC analysis under a variety of conditions. They found that this analysis produces large errors in Pr and RRP at synapses with a Pr below ~0.2. In addition, 20 Hz EPSC stimulation (as was used here) produces much larger errors compared to the more commonly used 100 Hz stimulation. Between the low Pr at parallel fiber synapses and the low stimulus frequency used, it is likely that the cumulative EPSC analysis provides a poor estimate of Pr and RRP in this case.

      Thanks for the very insightful comment. In the previous experiments, we measured RRP and Pr based on parameters taken from work on hippocampal CA1 neurons (He et al., 2019), whose synapses, in our opinion, are similar to PF-PC synapses with respect to low release probability. We have carefully read the Thanawala and Regehr (2016) paper and compared the different methods. While the EQ method is in general more reliable for estimating a small RRP and a low Pr, it relies on p being constant throughout a stimulus train (Thanawala and Regehr, 2016). Although p may be constant for the calyx of Held synapses they studied, this cannot be the case for PF-PC synapses. Therefore, we decided to redo the estimations of RRP and Pr using a 100-Hz train (previously a 20-Hz train). This method does not require a constant p and allows us to obtain a better estimate of RRP and Pr at PF-PC synapses (Thanawala and Regehr, 2016).

      The new results are presented in new Fig. 2E and 2F. The PF-PC synapses were stimulated at a frequency of 100 Hz, the artifacts were truncated and the EPSCs were aligned (Fig. 2E and 2F). Note that the aim of this experiment was to investigate whether there is a difference between control and cKO mice. Indeed, we found that the amplitudes of both EPSC0 and follow-up EPSCs were smaller in cKO mice, indicating that both the initial release and the replenishment are reduced by the conditional knockout of EPACs or PKCε. Compared with the 20-Hz train, the 100-Hz train brought EPSCs into steady state faster. We created a linear fit from the normalized steady-state EPSCs and back-extrapolated the line to the y-axis to calculate Pr. Indeed, we found that the Pr value estimated from the 100-Hz train stimulus was significantly larger than that from the 20-Hz train: 0.17 (Math1-cre) and 0.19 (PKCεf/f) with 100 Hz, versus 0.07 (Math1-cre) and 0.08 (PKCεf/f) in the previous submission. This result is similar to Thanawala and Regehr (2016), who claimed that the accuracy of estimation from a 100-Hz train is about three times that from a 20-Hz train. Moreover, we found that the conditional knockout of either EPACs or PKCε produced a significant decrease in Pr (Math1-cre 0.17 vs Math1-cre;EPAC1cKOEPAC2cKO 0.11; PKCεf/f 0.19 vs PKCεcKO 0.12). These results have been added to the text and figure legend (Fig. 2E and 2F), and the corresponding methods have also been updated.
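
      The back-extrapolation described above can be sketched as follows. This is a minimal illustration of the general train method, not the authors' analysis code: it assumes that the cumulative EPSC amplitude is plotted against a 0-based stimulus index, that the y-intercept of a line fitted to the steady-state points estimates the RRP, and that Pr is the first EPSC divided by that intercept. The amplitudes are synthetic, in arbitrary units, and the number of steady-state points is an assumed parameter.

```python
from itertools import accumulate


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_rrp_and_pr(epsc_amplitudes, n_steady_state):
    """Back-extrapolate the steady-state part of the cumulative EPSC curve.

    Assumptions (illustrative): stimulus index starts at 0, the last
    n_steady_state responses are at steady state, RRP = y-intercept,
    Pr = first EPSC / RRP.
    """
    cumulative = list(accumulate(epsc_amplitudes))
    indices = list(range(len(epsc_amplitudes)))
    slope, intercept = linear_fit(indices[-n_steady_state:],
                                  cumulative[-n_steady_state:])
    rrp = intercept
    pr = epsc_amplitudes[0] / rrp
    return rrp, pr


# Synthetic, depressing train in arbitrary units (NOT data from the study).
synthetic_epscs = [2.0, 1.5, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5]
rrp_estimate, pr_estimate = estimate_rrp_and_pr(synthetic_epscs, n_steady_state=4)
print(f"RRP estimate = {rrp_estimate}, Pr estimate = {pr_estimate:.3f}")
```

      The estimate depends on which points are treated as steady state, which is one reason a train that reaches steady state quickly gives a more reliable fit.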

      3) Using a combination of genetic knockouts and pharmacology, this paper convincingly shows that presynaptic EPAC/PKCε are necessary for presynaptic LTP, but do not alter postsynaptic LTP/LTD. However, given the experimental conditions in the slice experiments, it is difficult to extrapolate from the slice data to in vivo plasticity during motor learning. Synaptic plasticity in the cerebellar cortex is quite complex and can depend significantly on age, temperature, location, and ionic conditions. Unfortunately, these were not well matched between slice and in vivo experiments. Slice experiments used P21 mice, while in vivo experiments were performed at P60. Slice experiments were performed in the vermis, while VOR expression/adaptation generally requires the vestibulo-cerebellum/flocculus. Slice experiments were performed at room temperature, not physiological temperature. Lastly, slice experiments used 2 mM Ca2+ in the ACSF, somewhat high compared to the physiological extracellular fluid. Each of these factors can significantly affect the induction and expression of plasticity. These differences leave one wondering how well the slice data translate into understanding plasticity in the in vivo context.

      This is a great question. To date, almost all PC plasticity in published work has been recorded in young adult mice (< 1 month) and at room temperature, whereas most behavioral experiments have been conducted at around 2-3 months of age. To better answer the reviewer’s comment, we tried our best to redo the LTP experiments under the requested, alternative conditions (in 2-month-old mice, low Ca2+ or high recording temperature). Our new data show that, under these conditions, EPACs and PKCε are still needed for the induction of presynaptic PC-LTP (Figure 3–figure supplement 2-4). In addition, we have tried to record PC EPSCs in the flocculus. Unfortunately, we found that PC EPSCs there were quite unstable, which might be due to the more complex orientation of PCs and their innervations. We have discussed the reviewer’s comment in the revised manuscript: “Second, presynaptic PF-PC LTP was performed in the cerebellar vermis in the present work, whereas VOR learning generally requires PC activity in the flocculus. Unfortunately, we found that PC-EPSCs in the flocculus were not suitable to record PC plasticity because they were unstable” (Line 557).

      4) Many experiments use synaptosomal preparation. The authors identify excitatory synapses by VGLUT labelling, but it is unclear how, or if, the authors distinguish between parallel fiber, climbing fiber, and mossy fiber synaptosomes. These synapses likely have very different properties and molecular composition; some quantification or estimation of how many synaptosomes are derived from each type of synapse would be helpful.

      We have performed synaptosome staining for vGluT1/vGluT2, EAAT4 and bassoon to identify PF-PC synapses (vGluT1+EAAT4+) or CF-PC synapses (vGluT2+EAAT4+). Our staining results showed that PF-PC synapses accounted for 88.8% of the total and CF-PC synapses for 7.5% of the total. Thus, we estimated mossy fiber synapses to make up less than 3.7% of the total, which would not affect our conclusion. These results have been presented in Figure 1–figure supplement 1.

      5) The math1-cre mouse line is used to selectively knockout EPAC or PKCε expression in cerebellar granule cells. This line also expresses Cre in unipolar brush cells (UBCs) of the cerebellum (Wang et al., 2021). This is likely not a factor in the molecular/slice studies of EPAC/PKC signaling, but UBC dysfunction could play a role in motor/learning deficits observed in vivo. This possibility is not considered in the text.

      There is indeed evidence that UBCs are involved in cerebellar ataxias (Kreko-Pierce et al., 2020). How UBCs precisely participate in motor learning or VOR learning is unclear, but they are suggested to be involved in motor performance (Mugnaini et al., 2011; Guo et al., 2021). So, we agree with the reviewer that this option cannot be excluded. Therefore, we have revised the discussion about the potential role of UBCs: “Two caveats should be considered in the present studies. First, Math1-Cre-induced deletion of EPAC or PKCε might affect the function of unipolar brush cells (UBCs), which are involved in cerebellar ataxias (Kreko-Pierce et al., 2020). However, we believe that the EPAC-PKCε module regulates VOR learning through presynaptic plasticity mechanism at PF-PC synapses rather than UBCs, in line with the observations in other granule cell-specific mutations (Galliano et al., 2013; Schonewille et al., 2021).” (Line 552).

      References:

      Mugnaini E, Sekerková G, Martina M. The unipolar brush cell: a remarkable neuron finally receiving deserved attention. Brain Res Rev. 2011;66(1-2):220-45.

      Guo C, Rudolph S, Neuwirth ME, Regehr WG. Purkinje cell outputs selectively inhibit a subset of unipolar brush cells in the input layer of the cerebellar cortex. Elife. 2021;10:e68802.

      Kreko-Pierce T, Boiko N, Harbidge DG, Marcus DC, Stockand JD, Pugh JR. Cerebellar ataxia caused by Type II unipolar brush cell dysfunction in the Asic5 knockout mouse. Sci Rep. 2020;10:2168.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper uses light field microscopy to measure calcium signals across the fly brain while it is walking and turning, and also while the fly is externally driven to walk and turn, using a treadmill. The authors drive calcium indicator expression using pan-neuronal drivers, as well as drivers specific to individual neurotransmitters and neuromodulators. From their experiments, the authors show that inhibitory and excitatory neurons in the brain are activated in similar patterns by walking and that neurons expressing machinery for different neuromodulatory amines tend to show differentially strong calcium signals during walking. By examining spontaneous and forced walking and turning, the authors identify brain regions that activate before spontaneous turning and that activate asymmetrically in concert with spontaneous or forced turning.

      Strengths: Overall, the strength of this paper is in its careful descriptions and analyses of whole brain activation patterns that correlate with spontaneous and forced behaviors. Showing how the pattern of activity relates to broad classes of cells is also useful for understanding brain activation. Especially in brain regions identified as preceding spontaneous walking and in being asymmetrically involved in spontaneous and forced turning, it provides a wealth of potential hypotheses for new experiments. Overall, it contributes to a coarse-grained understanding of broad changes in brain activity during behavior.

      Weaknesses: The primary weakness of this paper is that it presents some speculative interpretations and conclusions too strongly. Most importantly, average activity in a neuropil can represent the calcium activity of hundreds or thousands of neurons, and it is hard to know what fraction is active, for instance, or how expression pattern differences might play into calcium signals. Calcium signals also do not reliably indicate hyperpolarization, so a net increase in the average Ca++ indicator signal does not necessarily reflect that the average neuron is becoming more active, just that some labeled neurons are becoming more active, while others may be inactive or hyperpolarized. The conclusions about regions triggering walk (rather than just preceding it) are too strong for the manipulations in this paper, as are some of the links with individual neuron types. Thus, more substantial caveats need to be presented for the conclusions being drawn from the data presented here.

      We thank the reviewer for their assessment and the positive comments on our manuscript. We have made these caveats clear throughout the manuscript by adding text and removing overly strong conclusions and speculations.

      Reviewer #2 (Public Review):

      Aimon et al. used fast whole-brain imaging to investigate the relationship between walking and neural activity in adult fruit flies. They find that increases in brain-wide activity are tightly correlated with walking behavior, and not with grooming or flailing, and are independent of visual input. They reveal that excitatory, inhibitory, and neuromodulatory neurons all contribute to brain-wide increases in neural activity during walk. Aimon et al. extend their observations of brain-wide activity to reveal that activity in some inferior brain regions is more correlated with walk than in other brain regions. The authors further analyzed their imaging dataset to identify candidate brain regions and cell types that may be important for walking behavior, which will be useful in hypothesis generation in future studies. Finally, the authors show that brain-wide activity is similar between spontaneous and forced walk and that severing the connection between the ventral nerve cord and central brain abolishes walk-related increases in brain activity. These results suggest that increases in brain-wide activity during walking may be largely attributed to sensory and proprioceptive feedback ascending to the central brain from the ventral nerve cord rather than to top-down executive and motor control programs. The observations presented in this study suggest hypotheses that may be tested in future studies.

      Strengths: This paper presents a rich imaging dataset that is well-analyzed and cataloged, which will be valuable for researchers who use this paper for future hypothesis generation. The comparison of many different reagents, imaging speeds, and behavioral conditions suggests that the observed increases in brain-wide activity during walking are quite robust to imaging methods in adult fruit flies.

      Weaknesses: This study is largely observational, and the few experimental manipulations presented are insufficient to support the authors' broad claims about the generation of brain-wide neural activity.

      We thank the reviewer for their assessment and have toned down claims throughout the paper accordingly.

      Notably, the authors suggest that their image analysis can reveal individual cell types that are important for walking by matching their morphologies to registered components from whole-brain imaging experiments. While these predictions are a useful starting point for future experiments, they have not convincingly shown that their method can identify individual cell types in genetic reagents with more restricted expression patterns. Adding further validation to show that genetically subtracting the candidate neurons from the overall expression pattern of the calcium indicator abolishes that component from the response would strengthen this claim. Furthermore, imaging the matched candidate neuronal cell type to show that it recapitulates the activity dynamics of the proposed component would add additional evidence.

      We agree that the correspondence to specific neuron types is often very speculative. We have clarified this throughout the manuscript. There are a few exceptions where the neurons we discuss are the only known neurons in a specific GAL4 expression pattern in a given region, and where we find the exact anatomical pattern matching these neurons’ anatomy. Together, this makes us quite confident that the activity results indeed from these neurons. However, the experiments proposed by the reviewer would be interesting complementary approaches. We believe, however, that abolishing activity in one neuron will be difficult to interpret regarding the neuron type as it would affect the activity of other neurons in the network (which is, in our opinion, an interesting point and research direction). Nevertheless, we plan to perform such experiments and experiments looking at the activity in more restricted drivers in the future.

      In addition, increases in neural activity prior to walk onset in specific brain regions are intriguing but insufficient to demonstrate the neurons in these regions trigger walking. This claim should await further studies that employ targeted and acute manipulation of neural activity, as noted by the authors. Furthermore, that activity in these brain regions is significantly increased prior to walk onset awaits more rigorous statistical testing, as do the authors' claims that spontaneous versus forced walking alters these dynamics. The suggestion that walking increases brain-wide activity via feedback from the ventral nerve cord is an interesting possibility and would also benefit from additional experimental validation. Activating and silencing neurons that provide proprioceptive feedback from the legs and determining the effect of this manipulation on brain-wide neural activity would be a good starting point.

      We have removed claims of causality in the result section. We have also added a statistical test for activation before walk onset. Activating and silencing proprioceptive neurons from the legs would be interesting follow-up experiments, although these manipulations are likely to affect walking. Nevertheless, we are planning to carry out such experiments in the future. We have added this point in the discussion.

      Reviewer #3 (Public Review):

      Aimon and colleagues investigated brain activity in flies during spontaneous and forced walking. They used light-field microscopy to image calcium activity in the brain at high temporal resolution as the animal walked on a ball and they used the statistical inference methods PCA and ICA to tease out subregions of the brain that had distinct patterns of activity. They then sought to relate those patterns to walking. Most interesting are the experiments they performed comparing forced walking to spontaneous walking because this provides a framework to generate hypotheses about which aspects of neural activity are reporting the animal's movements versus generating those movements. The authors identify subregions and neuron types that may be involved in generating vs reporting walking. Their analysis is reasonable but could be further strengthened with a more powerful statistical framework that explicitly considered the multiple hypotheses being tested. More broadly, the work serves as a starting point to investigate the role of different regions in the brain and should spur follow-up investigations that involve more perturbative approaches in addition to the correlative approaches presented here.

      We thank the reviewer for their overall positive assessment of our work and fully agree with the conclusion of its current limitations.

    1. Author Response:

      Reviewer #1 (Public Review):

      Tomasi et al. performed a combination of bioinformatic analyses and next-generation tRNA sequencing experiments to predict the set of tRNA modifications and their corresponding genes in the tRNAs of the pathogenic bacterium Mycobacterium tuberculosis. Long known to be important for translation accuracy and efficiency, tRNA modifications are now emerging as having regulatory roles. However, the basic knowledge of the position and nature of the modifications present in a given organism is very sparse beyond a handful of model organisms. Studies that can generate the tRNA modification maps in different organisms along the tree of life are good starting points for further studies. The focus here on a major human pathogen that is studied by a large community raises the general interest of the study. Finally, deletion of the gene mnmA, responsible for the insertion of s2U at position 34, revealed defects in growth in macrophages but not in test tubes, suggesting regulatory roles that will warrant further studies. The conclusions of the paper are mostly supported by the data, but the partial nature of the bioinformatic analysis and the absence of mass spectrometry data make it incomplete. The authors do not take advantage of the mass spectrometry data published for Mycobacterium bovis (PMID: 27834374) to discuss what they find.

      Important points to be considered:

      1) The authors say they took a list of proteins involved in tRNA modifications from Modomics and manually added a few, but we do not know the exact set of proteins that were used to search the M. tuberculosis genome.

      Thank you for pointing out this issue. We will add the complete list of proteins used for the BLAST query.

      2) The absence of mnmGE genes in TB suggested that the xcm5U derivatives are absent. These are present in M. bovis (PMID: 27834374). Are the MnmEG genes found in M. bovis? If yes, then the authors should perform a phylogenetic distribution analysis in the Mycobacterial clade to see when they disappeared. If they are not present in M. bovis, then maybe a non-orthologous set of enzymes does the same reaction, and then the authors really do not know what modification is present or not at U34 without LC-MS. The exact same argument can be made for the xmo5U derivatives that are also found in M. bovis but not predicted by the authors in M. tuberculosis.

      The reviewer raises a valid point. In M. bovis, mnm5U and cmo5U derivatives were observed in LC-MS analysis. However, we did not identify candidate genes known to be involved in the biogenesis of mnm5U and cmo5U in the Mycobacteriaceae, including M. bovis and Mtb, suggesting that if these modifications are indeed present, they are not synthesized through canonical biogenesis pathways in this family. There are several examples where the same modification is generated by distinct modification enzymes (Kimura, 2021). These observations raise the interesting possibility that in the Mycobacteriaceae and most species in Actinomycetota (except for Bifidobacterium, Corynebacterium and Rhodococcus species), major wobble modifications are generated by biosynthesis pathways that are distinct from those employed by well-characterized organisms. Future studies will examine this hypothesis.

      3) Why is the Psi32 predicted by the authors because of the presence of the Rv3300c/Psu9 gene not detected by CMC-treated tRNA-seq while the other Psi residues are? Members of this family can modify both rRNA and tRNA. So the presence of the gene does not guarantee the presence of the modification in tRNAs.

      Thank you very much for the careful read. We did not include RluA in the list of query proteins because it is not classified as a tRNA modification enzyme in Modomics. Additionally, the CMC-coupled tRNA-seq is imperfect for detection of all pseudouridylated positions. Due to this limitation, we only assigned modifications that are both predicted by the presence of putative biosynthetic enzymes and RT-derived signatures. As the reviewer points out, we cannot rule out that this homolog targets only rRNAs. We will clarify this possibility in the revised manuscript. Also, RluA will be added to the query and the name of Rv3300c will be changed to RluA in the text and related figures.   

      4) Why are tsaBED not essential but tsaC (called sua5 by the authors) essential?

      Thank you for pointing out this interesting observation. We are also curious about differences in the essentiality among t6A biogenesis genes. We speculate that TsaC potentially has critical roles in cell viability other than t6A synthesis. TsaC synthesizes a compound, threonylcarbamoyl-AMP, as an intermediate for t6A biogenesis. Thus, it is possible that this intermediate has a role in other essential cellular activities besides t6A biogenesis. Further study of these factors in Mtb could reveal interesting crosstalk between modification synthesis and other cellular activities.

      Reviewer #2 (Public Review):

      In this study, Tomasi et al identify a series of tRNA modifying enzymes from Mtb, show their function in the relevant tRNA modifications and by using at least one deleted strain for MnmA, they show the relevance of tRNA modification in intra-host survival and postulate their potential role in pathogenesis.

      Conceptually it is a wonderful study, given that tRNA modifications are so fundamental to all life forms, showing their role in Mtb growth in the host is significant. However, the authors have not thoroughly analyzed the phenotype. The growth defect aspect or impact on pathogenesis needs to be adequately addressed.

      - The authors show that ΔmnmA grows equally well in the in vitro cultures as the WT. However, they show attenuated growth in the macrophages. Is it because Glu1_TTC and Gln1-TTG tRNAs are not the preferred tRNAs for incorporation of Glu and Gln, respectively? And for some reason, they get preferred over the alternate tRNAs during infection? What dictates this selectivity?

      Thank you very much for raising this excellent point. As the reviewer suggests, the attenuation of ΔmnmA Mtb growth inside of macrophages could be caused by disparate codon usage between genes required for in vitro growth and intracellular growth. Among multiple codons encoding Glu, Gln, or Lys, s2U modification-dependent codons might be preferentially distributed in genes associated with intracellular growth. For example, Mtb has two tRNA isoacceptors, Glu1_TTC and Glu2_CTC, to decipher two Glu codons, i.e., GAA and GAG. According to the wobble pairing rule, GAA is only decoded by Glu1_TTC, whereas GAG is decoded by both Glu1_TTC and Glu2_CTC; i.e., GAG can be deciphered by an s2U-independent tRNA. Thus, genes required for intracellular growth might be enriched with GAA, an s2U-dependent codon. The same thing can happen to other Gln and Lys codons deciphered by s2U-containing tRNAs. In the revised manuscript, we will include the perspective of codon usage for explaining the intracellular fitness defect of the ΔmnmA Mtb mutant.
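
      As an illustration only (not part of the authors' analysis), the codon-usage comparison described above could be sketched as follows. The response states that GAA is the s2U-dependent Glu codon and GAG the s2U-independent one; treating CAA/CAG (Gln) and AAA/AAG (Lys) the same way is an assumption made here by analogy, and the function names and the toy sequence are hypothetical.

```python
# Codon pairs: s2U-dependent codon -> synonymous codon readable by another tRNA.
# GAA/GAG follows the response; CAA/CAG and AAA/AAG are assumptions by analogy.
S2U_DEPENDENT = {"GAA": "GAG", "CAA": "CAG", "AAA": "AAG"}


def codons(cds):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def s2u_dependent_fraction(cds):
    """Fraction of Glu/Gln/Lys codons that are s2U-dependent (None if the gene has none)."""
    dependent = 0
    total = 0
    for codon in codons(cds):
        if codon in S2U_DEPENDENT:
            dependent += 1
            total += 1
        elif codon in S2U_DEPENDENT.values():
            total += 1
    return dependent / total if total else None


# Toy sequence (hypothetical): ATG GAA GAG CAA AAG TAA
toy_fraction = s2u_dependent_fraction("ATGGAAGAGCAAAAGTAA")
```

      Computing this fraction per gene, and then comparing its distribution between genes required for intracellular growth and the rest of the genome, would be one way to test the enrichment hypothesis; the choice of gene sets and of the statistical test is left open here.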

      - As such the growth defect shown in macrophages would be more convincing if the authors also show the phenotype of complementation with WT mnmA.

      The reviewer raises a valid point. We note however, that Rv3023c, a putative transposase, is downstream of MnmA and unlike MnmA, Rv3023c appears to be dispensable for in vivo growth, according to the Tn-seq database. Therefore, it is likely that the intracellular growth defect is caused by loss of mnmA.

      An important consideration here is the universal nature of these modifications across the life forms. Any strategy to utilize these enzymes as the potential therapeutic candidate would have to factor in this important aspect.

      This is a valid point. Targeting a pathogen-specific system enables avoidance of the adverse side effects caused by many therapeutic reagents. There are a couple of Mtb modification enzymes that are specific to bacteria and critical for Mtb fitness (e.g., TilS). These enzymes represent ideal potential therapeutic targets to suppress Mtb intracellular growth.

      Reviewer #3 (Public Review):

      The work presented in the manuscript tries to identify tRNA modifications present in Mycobacterium tuberculosis (Mtb) using reverse transcription-derived error signatures with tRNA-seq. The study identified enzyme homologs and correlates them with presence of respective tRNA modifications in Mtb. The study used several chemical treatments (IAA and alkali treatment) to further enhance the reverse transcription signals and confirms the presence of modifications in the bases. tRNA modifications by two enzymes TruB and MnmA were established by doing tRNA-seq of respective deletion mutants. Ultimately, authors show that MnmA-dependent tRNA modification is important for intracellular growth of Mtb. Overall, this report identifies multiple tRNA modifications and discuss their implication in Mtb infection.

      Important points to be considered:

      - The presence of tRNA-based modifications is well characterised across life forms including genus Mycobacterium (Mycobacterium tuberculosis: Varshney et al, NAR, 2004; Mycobacterium bovis: Chionh et al, Nat Commun, 2016; Mycobacterium abscessus: Thomas et al, NAR, 2020). These modifications are shown to be essential for pathogenesis of multiple organisms. A comparison of tRNA modification and their respective enzymes with host organism as well as other mycobacterium strains is required. This can be discussed in detail to understand the role of common as well as specific tRNA modifications implicated in pathogenesis.

      The reviewer raises a fair point. However, with the exception of Chionh et al., the other studies cited here are not genome-wide characterization of tRNA modification. We will add a discussion of the distribution of tRNA modification enzymes across multiple mycobacterium species and the implications of this distribution for pathogenesis to the revised manuscript.

      - Authors state in line 293 "Several strong signatures were detected in Mtb tRNAs but not in E. coli". Authors can elaborate more on the unique features identified and their relevance in Mtb infection in the discussion or result section.

      Thank you for the suggestion. We will lengthen the discussion of the RT-derived signatures observed in Mtb but not in E. coli but the relevance of these modifications for Mtb pathogenicity remains speculative at this point.

      - MnmA has been shown to be essential for E. coli growth under oxidative stress (Zhao et al, NAR, 2021). Along similar lines, MnmA-deleted Mtb struggles to grow in macrophages. Is oxidative stress in macrophages responsible for the slow Mtb growth?

      This is an excellent hypothesis which we will raise in the revised manuscript.

      - Authors state in line 311-312 "Mtb does not contain apparent homologs of the tRNA modifying enzymes that introduce the additional modifications to s2U". This can be characterised further to rule out the possibility of other enzyme specifically employed by Mtb to introduce additional modification.

      The reviewer raises a valid point. As discussed above (Reviewer #1, pt 2), Mtb may employ distinct enzymes to generate certain tRNA modifications. Future mass spec-based analyses of Mtb tRNAs will be carried out to identify the precise chemical structure of the sulfurated uridine, and subsequent studies will attempt to determine the enzymes that account for the biogenesis of these modifications.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors start the study with an interesting clinical observation, found in a small subset of prostate cancers: FOXP2-CPED1 fusion. They describe how this fusion results in enhanced FOXP2 protein levels, and further describe how FOXP2 increases anchorage-independent growth in vitro, and results in pre-malignant lesions in vivo. Intrinsically, this is an interesting observation. However, the mechanistic insights are relatively limited as it stands, and the main issues are described below.

      Main issues:

      1) While the study starts off with the FOXP2 fusion, the vast majority of the paper is actually about enhanced FOXP2 expression in tumorigenesis. Wouldn't it be more logical to remove the FOXP2 fusion data? These data seem quite interesting and novel but they are underdeveloped within the current manuscript design, which is a shame for such an exciting novel finding. Along the same lines, for a study that centres on the prostate lineage, it's not clear why the oncogenic potential of FOXP2 in mouse 3T3 fibroblasts was tested.

      We thank the reviewer very much for the comment. We followed the suggestion and added a set of data regarding the newly identified FOXP2 fusion in Figure 1 to make our manuscript more informative. We tested the oncogenic potential of FOXP2 in NIH3T3 fibroblasts because NIH3T3 cells are a widely used model to demonstrate the presence of transforming oncogenes (refs. 2, 3). In our study, we observed that when NIH3T3 cells acquired the exogenous FOXP2 gene, the cells lost the characteristic contact inhibition response, continued to proliferate and eventually formed clonal colonies. Please refer to "Answer to Essential Revisions #1 from the Editors" for details.

      2) While the FOXP2 data are compelling and convincing, it is not clear yet whether this effect is specific, or if FOXP2 is e.g. universally relevant for cell viability. Targeting FOXP2 by siRNA/shRNA in a non-transformed cell line would address this issue.

      We appreciate these helpful comments. Please refer to the "Answer to Essential Revisions #1 from the Editors" for details.

      3) Unfortunately, not a single chemical inhibitor is truly 100% specific. Therefore, the Foretinib and MK2206 experiments should be confirmed using shRNAs/KOs targeting MEK and AKT. With the inclusion of such data, the authors would make a very compelling argument that indeed MEK/AKT signalling is driving the phenotype.

      We thank the reviewer for highlighting this point and we agree with the reviewer's point that no chemical inhibitor is 100% specific. In this study, we used chemical inhibitors to provide further supportive data indicating that FOXP2 confers oncogenic effects by activating MET signaling. We characterized a FOXP2-binding fragment located in MET and HGF in LNCaP prostate cancer cells by utilizing the CUT&Tag method. We also found that MET restoration partially reversed oncogenic phenotypes in FOXP2-KD prostate cancer cells. All these data consistently supported that FOXP2 activates MET signaling in prostate cancer. Please refer to the "Answer to Essential Revisions #2 from the Editors" and to the "Answer to Essential Revisions #7 from the Editors" for details.

      4) With the FOXP2-CPED1 fusion being more stable as compared to wild-type transcripts, wouldn't one expect the fusion to have a more severe phenotype? This is a very exciting aspect of the start of the study, but it is not explored further in the manuscript. The authors would ideally elaborate on why the effects of the FOXP2-CPED1 fusion seem comparable to the FOXP2 wildtype, in their studies.

      We thank the reviewer very much for the comment. We quantified the number of colonies of FOXP2- and FOXP2-CPED1-overexpressing cells, and we found that wild-type FOXP2 and FOXP2-CPED1 had a comparable putative functional influence on the transformation of human prostate epithelial cells RWPE-1 and mouse primary fibroblasts NIH3T3 (P = 0.69, by Fisher's exact test for RWPE-1; P = 0.23, by Fisher's exact test for NIH3T3). We added the corresponding description to the Results section in Line 487 on Page 22 in the tracked changes version of the revised manuscript. Please refer to the "Answer to Essential Revisions #5 from the Editors" for details.
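
      For readers unfamiliar with the test named above, a minimal sketch of a two-sided Fisher's exact test on a 2x2 table is given below. It is illustrative only: the response reports the P values but not the underlying contingency tables, so the colony counts in the usage line are hypothetical placeholders and are not expected to reproduce P = 0.69 or P = 0.23.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows = FOXP2 vs FOXP2-CPED1, columns = two colony categories.
p_value = fisher_exact_two_sided(12, 8, 10, 10)
```

      A large P value from such a table indicates no detectable difference between the two constructs, which is the sense in which the response describes their effects as comparable.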

      5) The authors claim that FOXP2 functions as an oncogene, but the most-severe phenotype that is observed in vivo, is PIN lesions, not tumors. While this is an exciting observation, it is not the full story of an oncogene. Can the authors justifiably claim that FOXP2 is an oncogene, based on these results?

      We appreciate the comment, and we made the corresponding revision in the revised manuscript. Please refer to the "Answer to Essential Revisions #3 from the Editors" for details.

      6) The clinical and phenotypic observations are exciting and relevant. The mechanistic insights of the study are quite limited in the current stage. How does FOXP2 give its phenotype, and result in increased MET phosphorylation? The association is there, but it is unclear how this happens.

      We appreciate this valuable suggestion. In the current study, we used the CUT&Tag method to explore how FOXP2 activated MET signaling in LNCaP prostate cancer cells, and we identified potential FOXP2-binding fragments in MET and HGF. Therefore, we proposed that FOXP2 activates MET signaling in prostate cancer through its binding to MET and MET-associated genes. Please refer to the "Answer to Essential Revisions #2 from the Editors" for details.

      Reviewer #2 (Public Review):

      1) The manuscript entitled "FOXP2 confers oncogenic effects in prostate cancer through activating MET signalling" by Zhu et al describes the identification of a novel FOXP2-CPED1 gene fusion in 2 out of 100 primary prostate cancers. A byproduct of this gene fusion is the increased expression of FOXP2, which has been shown to be increased in prostate cancer relative to benign tissue. These data nominated FOXP2 as a potential oncogene. Accordingly, overexpression of FOXP2 in nontransformed mouse fibroblast NIH-3T3 and human prostate RWPE-1 cells induced transforming capabilities in both cell models. Mechanistically, convincing data were provided that indicate that FOXP2 promotes the expression and/or activity of the receptor tyrosine kinase MET, which has previously been shown to have oncogenic functions in prostate cancer. Notably, the authors create a new genetically engineered mouse model in which FOXP2 is overexpressed in the prostatic luminal epithelial cells. Overexpression of FOXP2 was sufficient to promote the development of prostatic intraepithelial neoplasia (PIN), a suspected precursor to prostate adenocarcinoma, and activate MET signaling.

      Strengths:

      This study makes a convincing case for FOXP2 as 1) a promoter of prostate cancer initiation and 2) an upstream regulator of pro-cancer MET signaling. This was done using both overexpression and knockdown models in cell lines and corroborated in new genetically engineered mouse models (GEMMs) of FOXP2 or FOXP2-CPED1 overexpression in prostate luminal epithelial cells as well as publicly available clinical cohort data.

      Major strengths of the study are the demonstration that FOXP2 or FOXP2-CPED1 overexpression transforms RWPE-1 cells to now grow in soft agar (hallmark of malignant transformation) and the creation of new genetically engineered mouse models (GEMMs) of FOXP2 or FOXP2-CPED1 overexpression in prostate luminal epithelial cells. In both mouse models, FOXP2 overexpression increased the incidence of PIN lesions, which are thought to be a precursor to prostate cancer. While FOXP2 alone was not sufficient to cause prostate cancer in mice, it is acknowledged that single gene alterations causing prostate cancer in mice are rare. Future studies will undoubtedly want to cross these GEMMs with established, relatively benign models of prostate cancer such as Hi-Myc or Pb-Pten mice to see if FOXP2 accelerates cancer progression (beyond the scope of this study).

      We appreciate these positive comments from the reviewer. We agree with the suggestion from the reviewer that it is worth exploring whether FOXP2 is able to cooperate with a known disease driver to accelerate the progression of prostate cancer. Therefore, we are going to cross Pb-FOXP2 transgenic mice with Pb-Pten KO mice to assess if FOXP2 is able to accelerate malignant progression.

      2) Weaknesses: It is unclear why the authors decided to use mouse fibroblast NIH3T3 cells for their transformation studies. In this regard, it appears likely that FOXP2 could function as an oncogene across diverse cell types. Given the focus on prostate cancer, it would have been preferable to corroborate the RWPE-1 data with another prostate cell model and test FOXP2's transforming ability in RWPE-1 xenograft models. To that end, there is no direct evidence that FOXP2 can cause cancer in vivo. The GEMM data, while compelling, only shows that FOXP2 can promote PIN in mice and the lone xenograft model chosen was for fibroblast NIH-3T3 cells.

      To determine the oncogenic activity of FOXP2 and the FOXP2-CPED1 fusion, we initially used mouse primary fibroblast NIH3T3 for transformation experiments, because NIH3T3 cells are a widely used cell model to discover novel oncogenes (refs. 2, 3, 10, 11). Subsequently, we observed that overexpression of FOXP2 and its fusion variant drove RWPE-1 cells to lose the characteristic contact inhibition response, led to their anchorage-independent growth in vitro, and promoted PIN in the transgenic mice. During preparation of the revised manuscript, we tested the transformation ability of FOXP2 and FOXP2-CPED1 in RWPE-1 xenograft models. We subcutaneously injected 2 × 10^6 RWPE-1 cells into the flanks of NOD-SCID mice. The NOD-SCID mice were divided into five groups (n = 5 mice in each group): control, FOXP2-overexpressing (two stable cell lines) and FOXP2-CPED1-overexpressing (two cell lines) groups. The experiment lasted for 4 months. We observed that no RWPE-1 cell-injected mice developed tumor masses. We propose that FOXP2 and its fusion alone are not sufficient to generate the microenvironment suitable for RWPE-1-xenograft growth. Collectively, our data suggest that FOXP2 has oncogenic potential in prostate cancer, but is not sufficient to act alone as an oncogene.

      3) There is a limited mechanism of action. While the authors provide correlative data suggesting that FOXP2 could increase the expression of MET signaling components, it is not clear how FOXP2 controls MET levels. It would be of interest to search for and validate the importance of potential FOXP2 binding sites in or around MET and the genes of MET-associated proteins. At a minimum, it should be confirmed whether MET is a primary or secondary target of FOXP2. The authors should also report on what happened to the 4-gene MET signature in the FOXP2 knockdown cell models. It would be equally significant to test if overexpression of MET can rescue the anti-growth effects of FOXP2 knockdown in prostate cancer cells (positive or negative results would be informative).

      We appreciate all the valuable comments. As suggested, we performed the corresponding experiments. Please refer to the "Answer to Essential Revisions #2 from the Editors", to the "Answer to Essential Revisions #6 from the Editors", and to the "Answer to Essential Revisions #7 from the Editors" for details.

      Reviewer #3 (Public Review):

      1) In this manuscript, the authors present data supporting FOXP2 as an oncogene in PCa. They show that FOXP2 is overexpressed in PCa patient tissue and is necessary and sufficient for PCa transformation/tumorigenesis depending on the model system. Overexpression and knock-down of FOXP2 lead to an increase/decrease in MET/PI3K/AKT transcripts and signaling and sensitizes cells to PI3K/AKT inhibition.

      Key strengths of the paper include multiple endpoints and model systems, an over-expression and knock-down approach to address sufficiency and necessity, a new mouse knock-in model, analysis of primary PCa patient tumors, and benchmarking finding against publicly available data. The central discovery that FOXP2 is an oncogene in PCa will be of interest to the field. However, there are several critically unanswered questions.

      1) No data are presented for how FOXP2 regulates MET signaling. ChIP would easily address if it is direct regulation of MET and analysis of FOXP2 ChIP-seq could provide insights.

      2) Beyond the 2 fusions in the 100 PCa patient cohort it is unclear how FOXP2 is overexpressed in PCa. In the discussion and in FS5 some data are presented indicating amplification and CNAs, however, these are not directly linked to FOXP2 expression.

      3) There are some hints that full-length FOXP2 and the FOXP2-CPED1 function differently. In SF2E the size/number of colonies between full-length FOXP2 and fusion are different. If the assay was run for the same length of time, then it indicates different biologies of the overexpressed FOXP2 and FOXP2-CPED1 fusion. Additionally, in F3E the sensitization is different depending on the transgene.

      We appreciate these valuable comments and constructive remarks. As suggested, we performed the CUT&Tag experiments to detect the binding of FOXP2 to MET, and to examine the association of CNAs of FOXP2 with its expression. Please refer to the "Answer to Essential Revisions #2 from the Editors" and the "Answer to Essential Revisions #4 from the Editors" for details. We also added detailed information to show the resemblance observed between FOXP2 fusion- and wild-type FOXP2-overexpressing cells. We added the corresponding description to the Results section in Line 487 on Page 22 in the tracked changes version of the revised manuscript. Please refer to the "Answer to Essential Revisions #5 from the Editors" for details.

      2) The relationship between FOXP2 and AR is not explored, which is important given 1) the critical role of the AR in PCa; and 2) the existing relationship between the AR and FOXP2 and other FOX gene members.

      We thank the reviewer very much for highlighting this point. We agree that it is important to examine the relationship between FOXP2 and AR. We therefore analyzed the expression dataset of 255 primary prostate tumors from TCGA and observed that the expression of FOXP2 was significantly correlated with the expression of AR (Spearman's ρ = 0.48, P < 0.001) (Figure 1. a). Next, we observed that both FOXP2- and FOXP2-CPED1-overexpressing 293T cells had a higher AR protein abundance than control cells (Figure 1. b). In addition, shRNA-mediated FOXP2 knockdown in LNCaP cells resulted in a decreased AR protein level compared to that in control cells (Figure 1. c). However, we analyzed our CUT&Tag data and observed no binding of FOXP2 to AR (Figure 1. d). Our data suggest that FOXP2 might be associated with AR expression.

      Figure 1. a. AR expression in a human prostate cancer dataset (TCGA, Prostate Adenocarcinoma, Provisional; n = 493) classified by FOXP2 expression level (bottom 25%, low expression, n = 120; top 25%, high expression, n = 120; negative expression, n = 15). P values were calculated by the Mann-Whitney U test. The correlation between FOXP2 and AR expression was evaluated by determining the Spearman's rank correlation coefficient. b. Immunoblot analysis of the expression levels of AR in 293T cells with overexpression of FOXP2 or FOXP2-CPED1. c. Immunoblot analysis of the expression levels of AR in LNCaP cells with stable expression of the scrambled vector or FOXP2 shRNA. d. CUT&Tag analysis of FOXP2 association with the promoter of AR. Representative track of FOXP2 at the AR gene locus is shown.
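
      As a reminder of what the Spearman's rank correlation coefficient used above measures, here is a minimal sketch: it is the Pearson correlation computed on ranks, with tied values sharing their average rank. The function names and the expression values in the usage lines are hypothetical and are not the TCGA data, so the result is not expected to match ρ = 0.48.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical expression values for five tumors.
foxp2_expression = [0.2, 1.5, 0.9, 3.1, 2.4]
ar_expression = [5.0, 6.2, 7.5, 8.8, 7.9]
rho = spearman_rho(foxp2_expression, ar_expression)
```

      Because only ranks enter the calculation, the coefficient is insensitive to monotonic transformations such as log-scaling of expression values.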

      References

      1. Mayr C, Bartel DP. Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell. 2009 Aug 21;138(4):673-84.
      2. Gara SK, Jia L, Merino MJ, Agarwal SK, Zhang L, Cam M et al., Germline HABP2 Mutation Causing Familial Nonmedullary Thyroid Cancer. N Engl J Med. 2015 Jul 30;373(5):448-55.
      3. Kohno T, Ichikawa H, Totoki Y, Yasuda K, Hiramoto M, Nammo T et al., KIF5B-RET fusions in lung adenocarcinoma. Nat Med. 2012 Feb 12;18(3):375-7.
      4. Chen F, Byrd AL, Liu J, Flight RM, DuCote TJ, Naughton KJ et al., Polycomb deficiency drives a FOXP2-high aggressive state targetable by epigenetic inhibitors. Nat Commun. 2023 Jan 20;14(1):336.
      5. Kaya-Okur HS, Wu SJ, Codomo CA, Pledger ES, Bryson TD, Henikoff JG et al., CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun. 2019 Apr 29;10(1):1930.
      6. Spiteri E, Konopka G, Coppola G, Bomar J, Oldham M, Ou J et al., Identification of the transcriptional targets of FOXP2, a gene linked to speech and language, in developing human brain. Am J Hum Genet. 2007 Dec;81(6):1144-57.
      7. Lai CS, Fisher SE, Hurst JA, Vargha-Khadem F, Monaco AP. A forkhead-domain gene is mutated in a severe speech and language disorder. Nature. 2001 Oct 4;413(6855):519-23.
      8. Hannenhalli S, Kaestner KH. The evolution of Fox genes and their role in development and disease. Nat Rev Genet. 2009 Apr;10(4):233-40.
      9. Shu W, Yang H, Zhang L, Lu MM, Morrisey EE. Characterization of a new subfamily of winged-helix/forkhead (Fox) genes that are expressed in the lung and act as transcriptional repressors. J Biol Chem. 2001 Jul 20;276(29):27488-97.
      10. Wang C, Liu H, Qiu Q, Zhang Z, Gu Y, He Z. TCRP1 promotes NIH/3T3 cell transformation by over-activating PDK1 and AKT1. Oncogenesis. 2017 Apr 24;6(4):e323.
      11. Suh YA, Arnold RS, Lassegue B, Shi J, Xu X, Sorescu D et al., Cell transformation by the superoxide-generating oxidase Mox1. Nature. 1999 Sep 2;401(6748):79-82.
    1. Author Response

      Reviewer #1 (Public Review):

      This refinement of their model, coupled with the demonstration that the Sis1 J protein chaperone does not appear to play a direct role in the inactivation phase of the HSR, provide a significant advance over their earlier work.

      We are pleased that the reviewer is satisfied that our new results represent a significant advance.

      A main weakness is that while the evidence that Sis1 is important for fitness of heat-stressed yeast cells is reasonable, exactly how Sis1 achieves this is not clear. In a single sentence the authors suggest that Sis1 might be an orphan ribosome chaperone, partly based on its nucleolar localization, but provide no evidence for this. If this were true, then one might expect a reduction in ribosome content under stress conditions (because there are more ORPS to take care of because of translation stalling?) and a decreased rate of protein synthesis (yes, this happens; how much of this is due to overall translation suppression versus there being fewer ribosomes to translate things is unknown and hard to test), which could be tested. Some further insights into this more general role of Sis1 would strengthen the authors' conclusions.

      We would like to make a distinction between the important biochemical roles for Sis1 in the cellular response to heat shock – which we explore elsewhere – and the role we are investigating here for the regulation of Sis1 expression by Hsf1. For new insights into the functional role of Sis1 as a chaperone for orphan ribosomal proteins, please see our recent preprint (Ali et al., https://www.biorxiv.org/content/10.1101/2022.11.09.515856v1). Here, we have focused on how Sis1 transcriptional regulation promotes fitness. Please see above for the description of the new mechanistic insight we have into the role of Sis1 expression tuning in controlling stress granules.

      Moreover, whether Sis1 plays a general role in the fitness of cells under stress has not been firmly established, i.e., is its mechanistic role the same in heat shock conditions and under nutrient stress conditions? Without knowing the mechanistic basis for how Sis1 maintains the fitness of heat-stressed cells, it is not possible to conclude that the same mechanism is at play in cells grown on a non-preferred carbon source.

      As described above, we have now provided evidence that the inability to properly tune Sis1 expression levels in the 2xSUP35-SIS1 strain results in disrupted stress granule homeostasis, linking a known function of Sis1 to a known process driven by nutrient stress.

      Figure 4: This is an ingenious experiment to study the subcellular localization of newly synthesized Sis1 in response to heat shock, compared to that of the heat-shock inducible Hsp70 Ssa1. However, based on the images presented in panel B it is hard to know how discrete the subnuclear distributions of Sis1 and Ssa1 really are, and ideally what is needed is to be able to analyze their localizations when both tagged proteins are expressed in the same cell, although this would obviously not be possible using the halo-tagged protein system. In addition, one would like to know the localization of Hsf1 in the cell at the same time. As it stands, these data seem overinterpreted, and it remains possible that some other event such as an inactivating post-translational modification of Sis1 under heat shock conditions might be involved in inactivating its function.

      To address this concern, we constructed two new imaging strains expressing Hsf1-mVenus/Halo-Sis1 and Hsf1-mVenus/Halo-Ssa1 (Hsp70) and used pulse-labeling followed by live lattice light sheet 3D imaging to resolve the subcellular localization of newly synthesized Sis1 and Hsp70 with respect to Hsf1 over a heat shock time course. Unfortunately, we cannot monitor newly induced Sis1 and newly induced Hsp70 simultaneously in the same cells with the HaloTag pulse labeling system. We found that a significantly greater fraction of newly synthesized Hsp70 colocalizes with Hsf1 than new Sis1. Thus, while we cannot directly image new Sis1 and Hsp70 in the same cell, we clearly observe a differential localization pattern with respect to Hsf1. These data are included in the revised Figure 4.

      One way to establish whether Sis1 nucleolar sequestration prevents it from acting on Hsf1 during the inactivation phase of the HSR would be to selectively disrupt its nucleolar localization signal while retaining its nuclear localization, and determine how expression of such a mutant perturbs the inactivation kinetics of the HSR.

      Unfortunately, there is no known Sis1 nucleolar localization signal that we could use in the experiment you propose. In the preprint described above, we show that direct interactions with oRPs recruit Sis1 to the nucleolar periphery, but we do not yet know whether binding to oRPs is competitive with binding to Hsf1.

      Reviewer #2 (Public Review):

      This study aims to provide a needed update and validation of a previously outlined mathematical model that describes HSR/Hsf1 regulation. The purpose of the update is to incorporate the impact of newly translated proteins as negative regulators of Hsf1 following heat shock. A requirement for ongoing translation to mount the HSR and activate Hsf1 has been described in several recent studies. Moreover, the study addresses the role of the Hsp70 cochaperone Sis1 in HSR regulation, including its potential function in negative feedback regulation following heat-shock.

      The main strength of the study is that it combines quantitative modeling with a well-defined experimental system to generate data. Overall, the model appears to accurately reflect the behavior of HSR under the employed experimental conditions and provides an elegant example of a formalized model for this simple regulatory circuit. Another strength of the study is that it addresses the functional involvement of Sis1 in HSR/Hsf1 regulatory mechanisms and rules out Sis1 involvement in negative feedback regulation of Hsf1 following heat shock. This finding is of importance in light of the complexity of Sis1 involvement in HSR/Hsf1 regulation suggested by the literature. The authors also document a need for endogenous SIS1 promoter regulation during growth on non-fermentable carbon sources.

      The study is important for the advancement of Hsf1 research and it may provide inspiration for the study of other chaperone-titrated transcriptional mechanisms such as the UPR or bacterial stress sigma factors.

      We thank the reviewer for the generous evaluation.

      Reviewer #3 (Public Review):

      This paper follows other excellent work from the Pincus laboratory detailing the molecular mechanisms of Hsf1 regulation and extending experimental observations into predictive mathematical models. Overall, the work is top-quality, however, the findings are incremental in nature with respect to our understanding of the HSR and refine existing models rather than break new experimental or conceptual ground. Additionally, the relevance of the non-fermentable carbon source growth phenotype for the 2XSUP35pr-SIS1 strain is unclear with respect to HSR regulation.

      We thank the reviewer for this fair assessment of the work.

    1. Author Response

      Reviewer #2 (Public Review):

      I believe the authors succeeded in finding neural evidence of reactivation during REM sleep. This is their main claim, and I applaud them for that. I also applaud their efforts to explore their data beyond this claim, and I think they included appropriate controls in their experimental design. However, I found other aspects of the paper to be unclear or lacking in support. I include major and medium-level comments:

      Major comments, grouped by theme with specifics below:

      Theta.

      Overall assessment: the theta effects are either over-emphasized or unclear. Please either remove the high/low theta effects or provide a better justification for why they are insightful.

      Lines ~ 115-121: Please include the statistics for low-theta power trials. Also, without a significant difference between high- and low-theta power trials, it is unclear why this analysis is being featured. Does theta actually matter for classification accuracy?

      Lines 123-128: What ARE the important bands for classification? I understand the point about it overlapping in time with the classification window without being discriminative between the conditions, but it still is not clear why theta is being featured given the non-significant differences between high/low theta and the lack of its involvement in classification. REM sleep is high in theta, but other than that, I do not understand the focus given this lack of empirical support for its relevance.

      Line 232-233: "8). In our data, trials with higher theta power show greater evidence of memory reactivation." Please do not use this language without a difference between high and low theta trials. You can say there was significance using high theta power and not with low theta power, but without the contrast, you cannot say this.

      Thank you, we have taken this point onboard. We thought the differences observed between classification in high and low theta power trials were interesting, but we can see why the reviewer feels there is a need for a stronger hypothesis here before reporting them. We have therefore removed this approach from the manuscript, and no longer split trials into high and low theta power.

      Physiology / Figure 2.

      Overall assessment: It would be helpful to include more physiological data.

      It would be nice, either in Figure 2 or in the supplement, to see the raw EEG traces in these conditions. These would be especially instructive because, with NREM TMR, the ERPs seem to take a stereotypical pattern that begins with a clear influence of slow oscillations (e.g., in Cairney et al., 2018), and it would be helpful to show the contrast here in REM.

      We thank the reviewer for these comments. We have now performed ERP and time-frequency analyses following a similar approach to that of (Cairney et al., 2018). We have added a section in the results for these analyses as follows:

      “Elicited response pattern after TMR cues

      We looked at the TMR-elicited response in both time-frequency and ERP analyses using a method similar to the one used in (Cairney et al., 2018), see methods. As shown in Figure 2a, the EEG response showed a rapid increase in theta band followed by an increase in beta band starting about one second after TMR onset. REM sleep is dominated by theta activity, which is thought to support the consolidation process (Diekelmann & Born, 2010), and increased theta power has previously been shown to occur after successful cueing during sleep (Schreiner & Rasch, 2015). We therefore analysed the TMR-elicited theta in more detail. Focussing on the first second post-TMR-onset, we found that theta was significantly higher here than in the baseline period, prior to the cue [-300 -100] ms, for both adaptation (Wilcoxon signed rank test, n = 14, p < 0.001) and experimental nights (Wilcoxon signed rank test, n = 14, p < 0.001). The absence of any difference in theta power between experimental and adaptation conditions (Wilcoxon signed rank test, n = 14, p = 0.68) suggests that this response is related to processing of the sound cue itself, not to memory reactivation. Turning to the ERP analysis, we found a small increase in ERP amplitude immediately after TMR onset, followed by a decrease in amplitude 500 ms after the cue. Comparison of ERPs from experimental and adaptation nights showed no significant difference (n = 14, p > 0.1). Similar to the time-frequency result, this suggests that the ERPs observed here relate to the processing of the sound cues rather than any associated memory.”

      And we have updated Figure 2.

      Also, please expand the classification window beyond 1 s for wake and 1.4 s for sleep. It seems the wake axis stops at 1 s and it would be instructive to know how long that lasts beyond 1 s. The sleep signal should also go longer. I suggest plotting it for at least 5 seconds, considering prior investigations (Cairney et al., 2018; Schreiner et al., 2018; Wang et al., 2019) found evidence of reactivation lasting beyond 1.4 s.

      Regarding the classification window, this is an interesting point. TMR cues in sleep were spaced 1.5 s apart and that is why we included only this window in our classification. Extending our window beyond 1.5 s would mean that we considered the time when the next TMR cue was presented. Similarly, in wake the duration of trials was 1.1 s thus at 1.1 s the next tone was presented.

      Following the reviewer’s comment, we have extended our window as requested even though this means encroaching on the next trial. We do this because there could be a transitional period between trials. When we extended the timing in wake and looked at reactivation in the range 0.5 s to 1.6 s, we found that the effect continued to ~1.2 s vs adaptation and chance, i.e. it continued 100 ms after the trial. Results are shown in the figures below.

      Temporal compression/dilation.

      Overall assessment: This could be cut from the paper. If the authors disagree, I am curious how they think it adds novel insight.

      Line 179 section: In my opinion, this does not show evidence for compression or dilation. If anything, it argues that reactivation unfolds on a similar scale, as the numbers are clustered around 1. I suggest the authors scrap this analysis, as I do not believe it supports any main point of their paper. If they do decide to keep it, they should expand the window of dilation beyond 1.4 in Figure 3B (why cut off the graph at a data point that is still significant?). And they should later emphasize that the main conclusion, if any, is that the scales are similar.

      Line 207 section on the temporal structure of reactivation, 1st paragraph: Once again, in my opinion, this whole concept is not worth mentioning here, as there is not really any relevant data in the paper that speaks to this concept.

      We thank the reviewer for these frank comments. On consideration, we have now removed the compression/dilation analysis.

      Behavioral effects.

      Overall assessment: Please provide additional analyses and discussion.

      Lines 171-178: Nice correlation! Was there any correlation between reactivation evidence and pre-sleep performance? If so, could the authors show those data, and also test whether this relationship holds while covarying out pre-sleep performance? The logic is that intact reactivation may rely on intact pre-sleep performance; conversely, there could be an inverse relationship if sleep reactivation is greater for initially weaker traces, as some have argued (e.g., Schapiro et al., 2018). This analysis will either strengthen their conclusion or change it -- either outcome is good.

      Thanks for these interesting points. We have now performed a new analysis to check if there was a correlation between classification performance and pre-sleep performance, but we found no significant correlation (n = 14, r = -0.39, p = 0.17). We have included this in the results section as follows:

      “Finally, we wanted to know whether the extent to which participants learned the sequence during training might predict the extent to which we could identify reactivation during subsequent sleep. We therefore checked for a correlation between classification performance and pre-sleep performance to determine whether the degree of pre-sleep learning predicted the extent of reactivation; this showed no significant correlation (n = 14, r = -0.39, p = 0.17).”

      Note that we calculated the behavioural improvement while subtracting pre-sleep performance and then normalising by it for both the cued and un-cued sequences as follows:

      [(random blocks after sleep - the best 4 blocks after sleep) – (random blocks pre-sleep – the best 4 blocks pre-sleep)] / (random blocks pre-sleep – the best 4 blocks pre-sleep).
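      A minimal sketch of this calculation, for illustration only. It assumes that each block is summarised by a mean reaction time, that the "best 4 blocks" are the four fastest sequence blocks, and that the function and argument names below are hypothetical rather than taken from the analysis code.

```python
from statistics import mean


def sequence_specific_skill(random_blocks, sequence_blocks, n_best=4):
    """Mean RT of the random blocks minus mean RT of the n_best fastest sequence blocks.

    Assumption: lower reaction time means better performance, so the
    "best" blocks are the ones with the smallest values.
    """
    best = sorted(sequence_blocks)[:n_best]
    return mean(random_blocks) - mean(best)


def normalised_improvement(pre_random, pre_sequence, post_random, post_sequence):
    """(post-sleep skill - pre-sleep skill) / pre-sleep skill, as in the formula above."""
    pre = sequence_specific_skill(pre_random, pre_sequence)
    post = sequence_specific_skill(post_random, post_sequence)
    return (post - pre) / pre
```

      The same function would be applied separately to the cued and the un-cued sequence, giving one normalised value per sequence and participant.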

      Unlike Schönauer et al. (2017), they found a strong correspondence between REM reactivation and memory improvement across sleep; however, there was no benefit of TMR cues overall. These two results in tandem are puzzling. Could the authors discuss this more? What does it mean to have the correlation without the overall effect? Or else, is there anything else that may drive the individual differences they allude to in the Discussion?

      We have now added a discussion of this point as follows:

      “We are at a very early phase in understanding what TMR does in REM sleep, however we do know that the connection between hippocampus and neocortex is inhibited by the high levels of Acetylcholine that are present in REM (Hasselmo, 1999). This means that the reactivation which we observe in the cortex is unlikely to be linked to corresponding hippocampal reactivation, so any consolidation which occurs as a result of this is also unlikely to be linked to the hippocampus. The SRTT is a sequencing task which relies heavily on the hippocampus, and our primary behavioural measure (Sequence Specific Skill) specifically examines the sequencing element of the task. Our own neuroimaging work has shown that TMR in non-REM sleep leads to extensive plasticity in the medial temporal lobe (Cousins et al., 2016). However, if TMR in REM sleep has no impact on the hippocampus then it is quite possible that it elicits cortical reactivation and leads to cortical plasticity but provides no measurable benefit to Sequence Specific Skill. Alternatively, because we only measured behavioural improvement right after sleep it is possible that we may have missed behavioural improvements that would have emerged several days later, as we know can occur in this task (Rakowska et al., 2021).”

      Medium-level comments

      Lines 63-65: "We used two sequences and replayed only one of them in sleep. For control, we also included an adaptation night in which participants slept in the lab, and the same tones that would later be played during the experimental night were played."

      I believe the authors could make a stronger point here: their design allowed them to show that they are not simply decoding SOUNDS but actual memories. The null finding on the adaptation night is definitely helpful in ruling this possibility out.

      We agree and would like to thank the reviewer for this point. We have now included this in the text as follows: “This provided an important control, as a null finding from this adaptation night would ensure that we are decoding actual memories, not just sounds.”

      Lines 129-141: Does reactivation evidence go down (like in their prior study, Belal et al., 2018)? All they report is theta activity rather than classification evidence. Also, I am unclear why the Wilcoxon comparison was performed rather than a simple correlation in theta activity across TMR cues (though again, it makes more sense to me to investigate reactivation evidence across TMR cues instead).

      Thanks a lot for the interesting point. In our prior study (Belal et al., 2018), the classification model was trained on wake data and then tested on sleep data, which enabled us to examine its performance at different timepoints in sleep. However, in the current study the classifier was trained on sleep and tested on wake, so we can only test for differential replay at different times during the night by dividing the training data. We fear that dividing sleep trials into smaller blocks in this way will lead to weakly trained classifiers with inaccurate weight estimation due to the few training trials, and that these will not be generalisable to testing data. Nevertheless, following your comment, we tried this by dividing our sleep trials into two blocks, i.e. the first half of stimulation during the night and the second half of stimulation during the night. When we ran the analysis on these blocks separately, no clusters were found for either the first or second half of stimulation compared to adaptation, probably due to the reasons cited above. Hence the differences in design between the two studies mean that the current study does not lend itself to this analysis.

      Line 201: It seems unclear whether they should call this "wake-like activity" when the classifier involved training on sleep first and then showing it could decode wake rather than vice versa. I agree with the author's logic that wake signals that are specific to wake will be unhelpful during sleep, but I am not sure "wake-like" fits here. I'm not going to belabor this point, but I do encourage the authors to think deeply about whether this is truly the term that fits.

      We agree that better terminology is needed, and have now changed this: “In this paper we demonstrated that memory reactivation after TMR cues in human REM sleep can be decoded using EEG classifiers. Such reactivation appears to be most prominent about one second after the sound cue onset.”

      Reviewer #3 (Public Review):

      The authors investigated whether reactivation of wake EEG patterns associated with left- and right-hand motor responses occurs in response to sound cues presented during REM sleep.

      The question of whether reactivation occurs during REM is of substantial practical and theoretical importance. While some rodent studies have found reactivation during REM, it has generally been more difficult to observe reactivation during REM than during NREM sleep in humans (with a few notable exceptions, e.g., Schonauer et al., 2017), and the nature and function of memory reactivation in REM sleep is much less well understood than the nature and function of reactivation in NREM sleep. Finding a procedure that yields clear reactivation in REM in response to sound cues would give researchers a new tool to explore these crucial questions.

      The main strength of the paper is that the core reactivation finding appears to be sound. This is an important contribution to the literature, for the reasons noted above.

      The main weakness of the paper is that the ancillary claims (about the nature of reactivation) may not be supported by the data.

      The claim that reactivation was mediated by high theta activity requires a significant difference in reactivation between trials with high theta power and trials with low theta, but this is not what the authors found (rather, they have a "difference of significances", where results were significant for high theta but not low theta). So, at present, the claim that theta activity is relevant is not adequately supported by the data.

      The authors claim that sleep replay was sometimes temporally compressed and sometimes dilated compared to wakeful experience, but I am not sure that the data show compression and dilation. Part of the issue is that the methods are not clear. For the compression/dilation analysis, what are the features that are going into the analysis? Are the feature vectors patterns of power coefficients across electrodes (or within single electrodes?) at a single time point? or raw data from multiple electrodes at a single time point? If the feature vectors are patterns of activity at a single time point, then I don't think it's possible to conclude anything about compression/dilation in time (in this case, the observed results could simply reflect autocorrelation in the time-point-specific feature vectors - if you have a pattern that is relatively stationary in time, then compressing or dilating it in the time dimension won't change it much). If the feature vectors are spatiotemporal patterns (i.e., the patterns being fed into the classifier reflect samples from multiple frequencies/electrodes / AND time points) then it might in principle be possible to look at compression, but here I just could not figure out what is going on.

      Thank you. We have removed the analysis of temporal compression and dilation from the manuscript, but we would still like to answer these questions. In this analysis, raw data were smoothed and used as time-domain features. The data were organized as trials x channels x timepoints, and we then segmented each trial in time according to the compression factor under test. For instance, to test whether sleep is 2x faster than wake, we take the wake trial length (1.1 s) and halve it (0.55 s). We then take different windows in time from the sleep data, such that each sleep trial yields multiple smaller segments of 0.55 s each; these segments are added as new trials and labelled with the respective trial label. Afterwards, we resize the segments temporally to match the length of wake trials. We then reshape the data from trials x channels x timepoints to trials x channels_timepoints, aggregating channels and timepoints into one dimension. This is fed to PCA to reduce the dimensionality of channels_timepoints into principal components, and the resulting features are fed to an LDA classifier. The whole process is repeated for every scaling factor, within participant, in the same fashion as the main classification; the error bars show standard errors. We compared the results from the experimental night to those of the adaptation night.
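      The segmentation, resizing and flattening steps described above can be sketched as follows. This is an illustrative sketch only: it assumes non-overlapping segments and linear interpolation for the temporal resizing (neither is specified in the description), the function names are hypothetical, and the PCA and LDA stages are left out.

```python
def resample_linear(signal, out_len):
    """Linearly interpolate a list of samples to out_len samples."""
    n = len(signal)
    if n == 1 or out_len == 1:
        return [signal[0]] * out_len
    out = []
    for i in range(out_len):
        pos = i * (n - 1) / (out_len - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def segment_and_rescale(trial, seg_len, out_len):
    """Turn one sleep trial into several wake-length feature vectors.

    trial   : list of channels, each a list of samples
    seg_len : segment length in samples (e.g. half the wake trial for a 2x factor)
    out_len : wake trial length in samples
    Each returned vector is channels x timepoints flattened into one dimension.
    """
    n_samples = len(trial[0])
    features = []
    # Assumption: segments do not overlap; leftover samples at the end are dropped.
    for start in range(0, n_samples - seg_len + 1, seg_len):
        vector = []
        for channel in trial:
            vector.extend(resample_linear(channel[start:start + seg_len], out_len))
        features.append(vector)
    return features
```

      Every vector produced from a trial would inherit that trial's label before being passed to the dimensionality reduction and classification steps.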

      For the analyses relating to classification performance and behavior, the authors presently show that there is a significant correlation for the cued sequence but not for the other sequence. This is a "difference of significances" but not a significant difference. To justify the claim that the correlation is sequence-specific, the authors would have to run an analysis that directly compares the two sequences.

      Thanks a lot. We have now followed this suggestion by examining the sequence-specific improvement after removing the effect of the un-cued sequence from the cued sequence. This was done by subtracting the improvement of the un-cued sequence from the improvement for the cued sequence, and then normalising the result by the improvement of the un-cued sequence. The resulting values, which we term ‘cued sequence improvement’, showed a significant correlation with classification performance (n = 14, r = 0.56, p = 0.04). We have therefore amended this section of the manuscript as follows: “We therefore set out to determine whether there was a relationship between the extent to which we could classify reactivation and overnight improvement on the cued sequence. This revealed a positive correlation (n = 14, r = 0.56, p = 0.04), Figure 3b.”
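      A minimal sketch of the ‘cued sequence improvement’ measure and its correlation with classification performance, for illustration only. It assumes that a Pearson correlation was used (the response does not name the correlation type), and the function names are hypothetical.

```python
from math import sqrt


def cued_sequence_improvement(cued, uncued):
    """(cued improvement - un-cued improvement) / un-cued improvement."""
    return (cued - uncued) / uncued


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

      With one cued and one un-cued improvement value per participant, `pearson_r` would be called on the list of `cued_sequence_improvement` values and the list of per-participant classification scores.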

    1. Author Response

      Reviewer #1 (Public Review):

      Pelentritou and colleagues investigated the brain’s ability to infer temporal regularities in sleep. To do so, they measured the effect on brain and cardiac activity to the omission of an expected sound. Participants were presented with three different categories of sounds: fixed sound-to-sound intervals (isochronous), fixed heartbeat-to-sound intervals (synchronous), and a control condition without any regularity (asynchronous). When omitting a sound, they observed a difference in the isochronous and synchronous conditions compared to the control condition, in both wakefulness and sleep (NREM stage 2). Furthermore, in the synchronous condition, sounds were temporally associated with sleep slow waves suggesting that temporal predictions could influence ongoing brain dynamics in sleep. Finally, at the level of cardiac activity, the synchronous condition was associated with a deceleration of cardiac frequency across vigilance states. Overall, this work suggests that the sleeping brain can learn temporal expectations and responds to their violation.

      We thank the reviewer for the very useful and informed comments, to which we carefully reply below.

      Major strengths and weaknesses:

      The paradigm is elegant and robust. It represents a clever way to investigate an important question: whether the sleeping brain can form and maintain predictions during sleep. Previous studies have so far highlighted the lack of evidence for predictive processes during sleep (e.g. (Makov et al., 2017; Strauss et al., 2015; Wilf et al., 2016)). This work shows that at least a certain type of prediction still takes place during sleep.

      However, there are some important aspects of the methodology and interpretations that appear problematic.

      (1) The methodology and how it compares to previous articles would need to be clarified. For example, the Methods section indicates that the authors used a right earlobe electrode as a reference. This is quite different from the nose reference used by SanMiguel et al. (2013) or in Dercksen et al. (2022). This could affect the polarity and topographies of the OEP or AEP and thus represents a very significant difference. Likewise, SOs are typically detected in a montage reference to the mastoids. Perhaps the left/right asymmetries present in many plots (e.g. Figure 3) could be due to the right earlobe reference used.

      We thank the reviewer for raising this important point which has prompted us to clarify the reference choice in the manuscript both for completing the information about data recordings in our experiment and for emphasizing the influence of the reference on the EEG results and how they compare to previous reports.

      First, we would like to clarify that although EEG data is referenced to the right earlobe online, electrophysiological data from both earlobes were acquired and offline re-referencing to paired earlobes was performed. This is now clarified in the Methods section on page 26, lines 648-651 as follows:

      ‘Continuous EEG (g.HIamp, g.tec medical engineering, Graz, Austria) was acquired at 1200 Hz from 63 active ring electrodes (g.LADYbird, g.tec medical engineering) arranged according to the international 10–10 system and referenced online to the right earlobe and offline to the left and right ear lobes.’

      Additionally, after preprocessing, we performed common average re-referencing, as is common practice and recommended in the literature (see e.g. Niso et al., 2022), and hence the initial online referencing is no longer of relevance. Nonetheless, we agree with the reviewer that different online and offline referencing schemes could explain why some results in the literature are not optimally reproducible. We have clarified this point in the discussion on page 17, lines 408-411 as follows:

      ‘Finally, while we used largely similar pre-processing (i.e. filters) and experiment implementation (i.e. online and offline reference) as in Chennu et al. (2016), this was not the case for other studies with which direct comparisons are unwarranted.’
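      The reason the online reference drops out after common average re-referencing can be seen in a short sketch: subtracting the mean across channels at each time point removes any signal shared by all channels, including the contribution of the original reference electrode. The function name and data layout below are hypothetical and for illustration only.

```python
def common_average_reference(data):
    """Re-reference EEG to the common average.

    data : list of channels, each a list of samples of equal length
    Returns a new list in which the across-channel mean at every time
    point has been subtracted from each channel.
    """
    n_channels = len(data)
    n_samples = len(data[0])
    average = [sum(ch[t] for ch in data) / n_channels for t in range(n_samples)]
    return [[ch[t] - average[t] for t in range(n_samples)] for ch in data]
```

      After this step the channels sum to zero at every time point, so adding a constant offset to all channels (as a change of online reference would) leaves the result unchanged.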

      Regarding the reference chosen for the SO analysis (linked earlobes online and common average offline in our case), we acknowledge that, as the reviewer mentioned, many groups indeed employ mastoid re-referencing for SO detection (e.g. Siclari et al., 2018; Schneider et al., 2020; Ameen et al., 2022). However, to the best of our knowledge, this is not a standard choice, as many other groups choose a linked earlobe reference for online SO detection and the mastoids only for offline SO detection (Ngo et al., 2013; Besedovsky et al., 2017; Ngo and Staresina, 2022). In addition, other recent studies used linked earlobe referencing (Bouchard et al., 2021) or common average re-referencing (Züst et al., 2019) for offline SO detection. In our study we opted to use the same average reference for SO detection and evoked potential analysis in order to be able to relate the results of the omission evoked response comparison to those of the SO analysis.

      Also, the authors did not use the same filters in wakefulness and sleep, which could introduce an important bias when comparing sleep and wake results or sleep results with previous wake papers.

      We fully agree with the reviewer and thank him/her for this suggestion. We have now re-analysed the wakefulness data using a bandpass filter of 0.5-30 Hz as used for the sleep data. The chosen filtering range is commonly used in sleep research. Moreover, Chennu et al. (2016) employed a very similar filtering range (0.5-25 Hz) in an omission EEG study, whose results are similar to ours (Chennu et al., 2016). This new preprocessing resulted in a higher number of valid trials (average trial number: before N=245, now N=286) in wakefulness. Hence, the data from more participants could be used (before N=21, now N=23) and the statistical power of observed differences in our comparisons was improved. The Methods section has been updated accordingly on page 31, lines 763-764 as follows:

      ‘Continuous raw EEG data were band-pass filtered using second-order Butterworth filters between 0.5 and 30 Hz for the wakefulness and sleep session.’

      (2) The ERP to sound omission shows significant differences between the isochronous and asynchronous conditions in wakefulness (Figure 3A and Supp. Fig.) but this difference is very different from previous reports in wakefulness. Topographies are also markedly different, which questions whether the same phenomenon is observed. For example, SanMiguel and colleagues observed an N1 in response to omitted but expected sounds. The authors argue that they observe a similar phenomenon in the iso vs baseline contrast, but the timing and topography of their effect are very different from the typical N1. The authors also mention that, within their study, wake and N2 OEPs were "largely similar" but they differ in terms of latencies and topographies (Figure 3A-B). It would be better to have a more objective way to explore differences and similarities across the different analyses of the paper or with the literature.

      We concur with the reviewer and reviewing editor, who both pointed out that the way we previously analysed (see our reply to the reviewer’s previous comment) and reported our data was sub-optimal. The new analysis of the wake data reveals more similarities with the MMN and to some extent with the omission literature (Figure 4). As requested, we also improved the description of the comparison of our results to those from the literature, in the Discussion section (pages 17-19, lines 391-458).

      (3) The authors applied a cluster permutation to identify clusters of significant time points. However, some aspects of this analysis are puzzling. Indeed, the authors restricted the cluster permutation to a temporal window of 0 to 350ms in wake (vs. -100 to 500ms in sleep). This can be misleading since the graphs show a larger temporal window (-100 to 500ms). Consequently, portions of this time window could show no cluster not because the analysis revealed an absence of significant clusters but because the cluster permutation was not applied there. Besides, some of the reported clusters are extremely brief (e.g. l. 195, cluster's duration: 62ms), which could question their physiological relevance or raise the possibility that some of these clusters could be false positives (there was no correction for multiple comparisons across the many cluster permutations performed). Finally, there seems to be a duplication of the bar graphs showing the number of significant electrodes in the positive and first negative cluster for Figure 2 Supp. Fig. 1.

      We thank the reviewer for raising this point. We have now performed cluster permutation statistical analysis over the entire -100 to 500 ms window in wakefulness, thus matching the temporal window used for the sleep data (Methods, page 34, lines 843-846). Please note that this modified temporal window was applied to the wake data for which the pre-processing had also been modified (see our reply to comment #1 above). With matching analysis for wakefulness and sleep, we now identify clusters of higher or similar significance compared to our earlier results (Cohen’s d for isoch vs asynch = 0.92 now and 0.67 before; for synch vs asynch = 0.91 now and 1.06 before). In addition, for the isoch vs asynch omission response comparisons, overlapping cluster periods are identified in wakefulness (114-159 ms) and sleep (85-223 ms). The relevant results are thoroughly described on pages 9-10, lines 202-210; page 11, lines 238-251, pages 38-39, lines 970-985.

      We would also like to mention that while multiple comparisons correction is performed across channels and latencies in the EEG using cluster permutation statistics, it is true that we do not perform multiple comparisons correction across the many comparisons. We now explicitly mention the lack of this correction for multiple comparisons in the Methods section page 34, lines 840-843 as follows:

      ‘Of note, the cluster permutation based multiple comparisons correction only applied across channels and latencies when comparing two experimental conditions, however no multiple comparisons correction was applied across the number of comparisons made in this study.’
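
      For readers who want to see what a correction across comparisons would involve, here is a minimal sketch of the Holm-Bonferroni step-down adjustment. This is not an analysis the authors performed (the quoted Methods text states that no such correction was applied), and the p-values in the usage note are hypothetical placeholders, not results from the study.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

      With three hypothetical cluster-level p-values, `holm_adjust([0.01, 0.04, 0.03])` yields adjusted values of approximately 0.03, 0.06 and 0.06, so only the first comparison would remain below 0.05.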

      (4) More generally, regarding statistics, the absence of exact p-values can render the interpretation of statistical outputs difficult. For example, the authors report a significant modulation of the sound-to-SO latency across conditions (p<0.05) but no significant effect of heartbeat peak-to-SO latency (p>0.05). They interpret this pattern of results rather strongly as evidence that the "readjustment of SOs was specific to auditory regularities and not to cardiac input". Yet, examining the reported chi-square values shows very close values between the two analyses (7.9 vs. 7.4). It seems thus difficult to argue for a real dissociation between the two effects. Providing exact p-values for all statistical tests could help avoid this pitfall.

      To assist the interpretation of statistical analysis results, we have now included exact p-values.
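
      The reviewer's point about the two chi-square statistics (7.9 vs. 7.4) can be made concrete with a short sketch that converts a chi-square value into an exact p-value. The degrees of freedom are not given in this exchange; `df = 3` is used below purely as an assumption, chosen because it places the two reported statistics on opposite sides of 0.05, which matches the reported pattern. The function name is hypothetical, and only the closed-form cases for 1 to 3 degrees of freedom are covered.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution.

    Closed forms are used, so only df = 1, 2 or 3 is supported.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("only df = 1, 2 or 3 is supported in this sketch")


# Assumed df = 3 (not stated in the source): the two reported statistics
# then give p-values that sit just below and just above 0.05.
p_sound = chi2_sf(7.9, 3)
p_heartbeat = chi2_sf(7.4, 3)
```

      Under that assumption the two p-values differ only slightly, which is why reporting exact values gives a fairer picture than a significant versus non-significant dichotomy.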

      Specifically, for SOs, we agree with the reviewer on the highly similar chi-squared values for the two analyses of Sound onset to SO peak and R peak onset to SO peak and have now included a comment in the discussion to reflect this on page 20, lines 478-480 as follows:

      ‘However, it should be noted that although not significant, we observed a trend of lower R peak to SO peak latencies during cardio-audio regularity compared to the other auditory conditions, possibly driven by the fixed relationship between heartbeat and sound in the synch condition.’

      Reviewer #2 (Public Review):

      This study was designed to study the cortical response to violations in auditory temporal sequences during wakefulness and sleep. To this end, the study had three levels of temporal sequence, a regular temporal sequence, an auditory tone that was yoked to the cardiac signal, and an irregular tone. The authors show significant EEG differences to an omitted tone when the auditory tone was predictable both during wakefulness and sleep.

      The authors analyze the ERP to the omitted tone as well as when aligned to the R-peak of the HEP. The analysis was comprehensive and the effects reported align with the interpretation given. Of particular interest was the fact that a deceleration of the heart rate was present for omissions when the auditory tone was yoked to the R-peak (synch) in all stages of wakefulness and sleep.

      We thank the reviewer for his/her positive judgment.

      However, one weakness was the rationale for the current study and how the results link to current theoretical frameworks for the role of interoception in perception and cognition. This was in contrast to the clear background and explanation to study the response to omissions for a predictable auditory sequence in wakefulness and sleep. It was unclear why the authors selected the cardiac signal to yoke their auditory stimuli. What is the specific motivation for the cardiac signal rather than the respiratory signal? This was not clear.

      In the revised Introduction section, we improved our description of these aspects, including the interaction between interoception and external stimulus processing. We hypothesized that cardiac signals would be more relevant than respiratory signals in coordinating temporal expectation because of existing prior experimental evidence thereof, as well as data showing a modulation of the neural response to heartbeat by levels of vigilance/consciousness, and the sharp cardiac R peak offering an ideal candidate for online temporal locking to administered sounds (see our detailed reply to the reviewer’s comment #2 below). However, we cannot exclude that respiratory signals could also be used by the brain to assist temporal regularities detection.

      Future studies may test for this possibility.

    1. Author Response

      Reviewer #1 (Public Review):

      Kozol et al adapt an important tool, in the form of the atlas, to the Astyanax research community. While broadly the atlas appears to correctly identify large brain regions, it is unclear what is the significance of the finer divisions. The external confirmations are restricted to just a few large brain regions (by independent human observer: e.g., optic tectum, hypothalamus. By molecular marker: hypothalamus only.). As such, interpretations of results from as many as 180 small subregions should be interpreted sceptically.

      The authors also suggest that some brain regions have increased in size during cavefish evolution (e.g., hypothalamus, subpallium). The analysis of progeny from a genetic cross of cave and surface morphs suggest a complex genetic program has evolved to control this variant set of brain structures. With the development of genetic manipulation tools in this species, an exciting series of experiments may link causal variants with brain development differences.

      MAJOR ISSUES

      Line 85+. Segmentation accuracy is not well established by the authors. For example, Figure S2 states that the pixel correlation is high between Astyanax populations. But the details of how this cross-correlation was done are sparse. Is the Y-axis here showing the fraction of pixels that are shared in the morphs? While the annotation appears to function similarly across morphs, the 80% machine:human correlation is difficult to put into context. On the one hand, this seems low. For what values should one strive? Are there common "mistakes" or differences in human & machine annotations that lead to certain regions being excluded? A discussion of these is warranted and will be useful to others who wish to use this approach.

      Line 87. "such as" is misleading since these were the only two antibodies used to confirm molecular definitions of regions.

      But more to the point, additional markers should be used to confirm more than just the ISL+ hypothalamic divisions.

      This is particularly warranted, as Fig 1d is not convincing. I believe that the yellow label is ISL; this is difficult to see in the figures. ISL is not ideal since this is widespread in the hypothalamus. There are no ISL-negative regions depicted, which would be necessary to demonstrate that the resolution of this subregion labeling tool is high. A complementary approach would be to find molecular markers that are more restricted than ISL which label only subsets of hypothalamic regions.

      Finally, do the mid/hindbrain ISL labeled regions correspond to known ISL+ subregions?

      We agree with the reviewer that the Islet1/2 assessment was insufficient for demonstrating automated segmentation accuracy and that the labeling was difficult to visualize in the previous version of the figure. We have addressed this reviewer’s concern by adding new molecular markers for verification of segment accuracy and through a modified presentation of the original data. The first, and in our opinion most convincing, is the addition of more markers of known neuroanatomical regions. This required not only adding extra antibody stains to our brain atlas, but also optimizing a Hybridization Chain Reaction (HCR) in situ protocol that could be coupled with immunohistochemistry, permitting automated segmentation via total ERK registration and brain atlas inverse registration. This novel protocol showed corresponding localization of markers, such as 5-hydroxytryptamine (5-HT), gastrulation brain homeobox 1 (gbx1), and oxytocin (oxt), in the expected neuroanatomical areas. It should be noted that these markers included both large neuroanatomical areas as well as small, well-defined areas such as the superior, and also labeled disparate neuroanatomical loci throughout the brain. We also modified our original figure to better illustrate the regions that islet+ staining labeled. These markers show that islet1/2 labels precise regions of the hypothalamus which correspond to known expression patterns. The updated methodology can be found in lines 422-440, while the results can be found in lines 105-118 of the text, Figure 1 and Figure 1 – Figure Supplement 1a.

      We believe these two changes address the reviewer’s concerns, and suggest that the neuroanatomical labels generated in this study faithfully label the Astyanax brain.

      The molecular and human-observed confirmations of brain regions suggest that the annotated borders of gross anatomical regions are correctly identified by the algorithm. However, data is not presented that indicates whether the smaller regions correspond to biologically meaningful compartments.

      We agree with the reviewer that our assessment of regional accuracy for automated segmentation necessitated additional markers, which labeled smaller, more refined compartments. To address this, we developed an HCR in situ hybridization strategy that was compatible with our brain atlas, and used several markers that label smaller regions, such as the 5-HT positive neurons of the dorsal raphe and oxytocin positive neurons of the medial preoptic region. Together, these results were consistent with our previous finding that anatomical regions confirmed by human observation and molecular staining did faithfully label the correct regions of the brain. These findings can be found in lines 105-118 in the text, along with Figure 1 and Figure 1 – Figure Supplement 1a-d. Together, we hope this shows that not only large neuroanatomical areas, but also finer areas are correctly labeled by CobraZ.

      Parameters used in CobraZ to perform the segmentation are not defined. More transparency is required here for others to replicate.

      We agree with the reviewer that the parameters used for CobraZ and Advanced Normalization Tools (ANTs) are necessary for reproducibility of our results. We have since added sentences to clarify that we did not change the original ANTs or CobraZ parameters from Gupta et al. 2018 (lines 474-475) and have added the CobraZ parameter file and ANTs bash scripts to our Dryad repository.

      Reviewer #3 (Public Review):

      In this manuscript the authors use novel techniques and analytical methods on an up and coming animal model for brain evolution. The manuscript utilizes the cavefish Astyanax mexicanus, which can provide future important insights into the field of neurobiology and in evolution in general.

      The authors however, only argue that Astyanax is a powerful system for functionally determining basic principles of brain evolution (which clearly it will be), but fail to actually describe what brain evolution insights Astyanax gives. The data is in the paper, but the interpretation needs refinement. This would be a much more valuable paper with a thorough evolutionary context based on the already existing, extensive literature. I believe this manuscript has the potential to be extremely impactful.

      We thank the reviewer for her positive critique of our manuscript, and more broadly for the thoughtful comments, the challenge to re-evaluate the way we have thought about our own data, and for pointing us in a scientific direction that is more impactful. We have spent a lot of time re-thinking this work to address this reviewer’s critique, and believe that it is a far better study for it.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors of this manuscript aimed to systematically evaluate the pleiotropic effects of MCR-1-mediated colistin resistance. They evaluated the effect of MCR-1 and MCR-3 carried on different plasmids on antimicrobial peptides (AMPs) and assessed their ultimate effect on virulence. The authors find that MCR-1-mediated colistin resistance correlates with increased resistance against some host AMPs, but also increased sensitivity to others. The authors also find that MCR-1 alone is associated with resistance to human serum and to elements of the complement system. This highlights a potential selective advantage for MCR-1-mediated resistance to host immune factors and a potential for enhanced virulence.

      The methods have been well established before and adequately support their main findings. While determining the role of MCR-1 in a single genetic background is important to better understand its potential pleiotropic effects against a diversity of AMPs and in a variety of scenarios, the impact and significance of the results are partially ameliorated because different genetic backgrounds, particularly those most relevant to a clinical (or agricultural) context were not considered. The results depicted here are still a necessary and important step towards a more comprehensive understanding of the pleiotropic effects of MCR-1. But, interactions between plasmids and host genomes and their co-evolution can have important effects more generally. The authors do mention this in the discussion and suggest it to be an important avenue for future work. However, given the objective of the study and the clinical and agricultural context in which the authors have framed their work, it seems more relevant to include those distinct genetic backgrounds already here.

      The conclusions stemming from the results found in Figure 3, and Figures 4c and d seem too overreaching to me. The associated resistance to AMPs from pigs seems to be only strong enough against one of the five tested AMPs and hence concluding that these impose a strong selective pressure in the pig's gut seems unsubstantiated. Similarly, the difference in survival probability within their in vivo system, though statistically significant, seems to be very mild between their MCR-1 and empty vector control.

      Thank you for the comment. We agree on the effect of MCR-MOR on AMP susceptibility and have edited the paragraph by removing the lines on strong selective pressure in the pig gut. As regards the 4c and 4d results (4e and 4f in the revised version), it is interesting and statistically convincing that MCR increases bacterial virulence despite the cost of MCR expression. And importantly, this effect is even stronger in the case of LPS treatment where the immune system is stimulated, expressing diverse host AMPs (PMID: 19897755). This shows MCR-mediated advantages to bacteria in the complex host environment.

      Reviewer #2 (Public Review):

      Jangir et al test the hypothesis that resistance to the antimicrobial peptide (AMP) colistin can simultaneously increase resistance to other AMPS with related modes of action. Because AMPS comprise part of innate immunity, their central concern is that colistin resistance may compromise host defenses and thereby increase bacterial virulence. Their results show that MCR-1, whether expressed from naturally circulating or synthetic plasmids, can increase the MIC to AMPS from humans, pigs, and chickens, and impart fitness benefits at sub-MIC concentrations. In addition, they find that MCR-1-containing strains have increased survival in human plasma and are more lethal in an insect infection model.

      The conclusions of the paper are generally well supported by the results, but some aspects could be clearer and better defended with a few small additional experiments.

      Strengths:

      Using both synthetic and natural plasmids makes it possible to cleanly separate the effects of MCR-1 from the effects of other plasmid-borne genes or plasmid copy numbers. This helps confirm the causal role of MCR-1 on altered AMP susceptibility.

      Testing the survival of transformed isolates in human serum and in insects points to relevance in the more immunologically complex host environment where cells are exposed to a suite of factors that reduce bacterial survival.

      Thank you!

      Weaknesses/suggestions:

      Although increases in MIC are evident for different AMPS, the effects are generally modest. To address this, it might be helpful to use pairwise competition assays, as in Figure 1, to establish that even small changes to MIC are associated with clear selective benefits.

      Thank you for the suggestion. We agree that in some cases the change in MIC is modest; however, we would like to highlight that small changes in resistance have important clinical implications. For example, resistance mutations conferring a small change in MIC can ensure the survival of pathogenic bacteria in antibiotic-treated hosts (PMID: 30131514). Additionally, a comparison between competition assays (Fig 1) and MICs (Fig 2) clearly shows that small changes in MIC are associated with substantial fitness benefits. For example, for pSEVA:MCR-1, the fold change in MIC of CATH2 (chicken), PMAP23 (pig), and LL37 (human) ranges between 1.05 and 1.5, whereas the competitive fitness ranges from 10% to 17%. This issue is discussed in the revised manuscript (lines 306-317, page 13).

      ….This would be especially helpful in assays with human serum and in Galleria where the concentrations of AMPS or other immune components are unknown.

      It is clear that MCR-1 increases resistance to serum and virulence (Figure 4). However, we agree with the reviewer that the selective benefits of MCR-1 in complex host environments are not known (i.e., serum or Galleria). We have revised the final paragraph of the discussion to reflect this limitation of our study (lines 370-382, page 15).

      Assays using human serum are interesting but challenging to interpret given the diverse causes of bacterial killing, including complement. Although this was partly addressed in Supplementary Figure 6, I found the predictions of these experiments unclear. First, I think these experiments are too central to be relegated to the supplemental materials; they belong in the main text. Secondly, it is important to explicitly spell out the expectations of using heat-killed serum (which will degrade any heat-labile components) or complement-deficient serum. It should be clearer under which conditions MCR-1-containing strains are predicted to do better or worse than controls.

      We have addressed this in the revised version. We have moved Supplementary Fig 6 to the main text, and have edited the text, clarifying the model prediction (lines 245-257, page 10).

      Galleria is a useful infection model for virulence, but it is unclear what drives differences between strains. First, bacterial numbers aren't measured in this assay, so it isn't known if increased virulence is due to increased bacterial growth or decreased bacterial clearance. As above, I think these assays would be stronger using the competition-based approach in Figure 1. This would indicate bacterial numbers through time and directly show the selective benefit associated with MCR-1. Second, it would be useful to elaborate on why MCR-1 increases virulence, especially any known similarities between Galleria AMPS and those tested in Figures 1 and 2. Overall, it would help if Galleria were less of a black box.

      We agree that the mechanism underlying increased virulence remains to be explored and thus, we have already discussed this in the discussion as a limitation (lines 370-382, page 15). However, elucidating the mechanisms by which MCR-1 increases virulence would clearly be an interesting line of research moving forward.

    1. Author Response

      Reviewer #1 (Public Review):

      The adhesion of Leishmania promastigotes to the stomodeal valve in the anterior region of the sandfly vector midgut is thought to be important to facilitate the transmission of the parasites by bite. The promastigote form found in attachment is termed a 'haptomonad', although its adhesion mechanism and role in facilitating transmission have not been well studied. Using 3D EM techniques, the paper provides detailed new information pertaining to the adhesion mechanism. Electron tomography was especially useful to reveal the ultrastructure of the attachment plaque and the extensive remodelling of the flagellum that occurs. A few of the attached haptomonads were found to be in division, which is a novel observation. The attachment of cultured promastigotes to plastic and glass surfaces in vitro was found to involve a similar remodeling of the flagellum and was exploited to image the sequential steps in attachment, flagellar remodeling, and haptomonad differentiation. The in vitro attachment was found to be Ca2+-dependent. Based mainly on the in vitro observations, a sound model of the haptomonad attachment plaque and differentiation process is provided.

      We thank the reviewer for highlighting the significant progress we have made in dissecting the adhesion mechanism and flagellum restructuring in the Leishmania haptomonad.

      Reviewer #2 (Public Review):

      The study by Yanase et al. investigated the details of the 3D architecture of Leishmania haptomonad promastigote's adhesion to the midgut of the insect vector. The authors generated a dataset of images that reveal intricate details of the formed adhesion plaque and expanded the study with in vitro alternatives for the exploration of how the strong, hemidesmosome-mediated adhesion of Leishmania promastigotes to surfaces can happen and be maintained. They show with unprecedented detail the ultrastructure of the attachment plaque. The in vitro dataset of the paper adds to the specific literature important details on how to explore micro/nanostructures involved in an important attachment step for this eukaryotic parasite. However, the in vitro data should be reconsidered in its discussion and conclusions as it does not support direct comparison with in vivo Leishmania forms as pictured by the authors. In general, the dataset presented in this manuscript adds valuable data and resources for the study of the attachment of Leishmania promastigotes to surfaces, especially to the thoracic midgut parts of its insect vector.

      The dataset of this paper is well-collected and robust, but some aspects of image analysis need to be clarified and extended. Also, the in vitro data from the manuscript will benefit from an extensive adjustment in its discussion. Points to focus on:

      We thank the reviewer for recognising the ultrastructural detail we have now provided of this cryptic parasite life cycle stage. Below we address each of your points in detail.

      1) The haptomonad promastigote is indeed a possible critical form for transmission, but it lacks formal demonstration still in all literature available. This should not be claimed without proper formal demonstration.

      We agree with the reviewer that any relationship between transmission and the haptomonad form has yet to be formally demonstrated. Hence, we revised the descriptions referring to the relationship between transmission and the haptomonad form (Line 22-23, 31 and 113-114).

      2) Literature available and cited in this manuscript regarding in vitro adhesion of culture Leishmania promastigotes does not provide direct evidence for haptomonad differentiation. Haptomonads are still a largely unknown promastigote form with no defined ontogeny. With that, to propose an in vitro haptomonad differentiation protocol, more detailed direct evidence of in vivo haptomonads will be necessary. The in vitro experiments available show how cultured promastigotes attach to surfaces. Detailed studies in vivo will be needed still to attribute the findings in vitro to haptomonads.

      We would like to highlight that promastigotes and haptomonads have morphological definitions within the literature and our cells are definitely more like haptomonads than promastigotes. As the reviewer highlights, the haptomonad-like cells we generate in vitro have an almost identical morphology and attachment plaque structure to those haptomonads we observed attached to the stomodeal valve. In addition, we have been able to watch individual cells that had a promastigote morphology acquire a haptomonad morphology and we believe this will provide future insights to the ontogeny of these forms. However, as there are currently no published molecular markers for haptomonads we have not been able to provide direct evidence other than the morphology and ultrastructure that in vitro attachment replicates in vivo haptomonad differentiation. Therefore, we have revised our nomenclature and now refer to the in vitro haptomonad-like cell. In the discussion, we have been careful to highlight that certain aspects of our model rely on in vitro data and therefore may not accurately reflect the situation in the sand fly.

      3) This manuscript will benefit by having a detailed description of how to analyze and get to the 3D models presented. This has a strong potential for usage beyond the Leishmania/sand fly field. Statistics should be made available with ease across the manuscript and with a dedicated section on methods.

      We added a detailed description of how to analyse the 3D models (Line 756-763), and added videos showing a rotated view of each 3D model (Figure 1—video 3 and 4, Figure 2—video 2, and Figure 3—video 2 and 4). We have deposited the SBF-SEM and tomography data on the Electron Microscopy Public Image Archive (EMPIAR; https://www.ebi.ac.uk/empiar/), enabling access to the raw data (Line 763-766). We have added a statistics section into the Materials and Methods (Line 864-868).

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Sampaio et al. tackle the role of fluid flow during left-right axis symmetry breaking. The left-right axis is broken in the left-right organiser (LRO) where cilia motility generates a directional flow that permit to dictate the left from the right embryonic side. By manipulating the fluid moved by cilia in zebrafish, the authors conclude that key symmetry breaking event occurs within 1 hour through a mechanosensory process.

      Overall, while the study undeniably represents a huge amount of work, the conclusions are not sufficiently backed up by the experiments. Furthermore, the results provided present a limited advance to the field: the transient activity of the LRO is well established, and narrowing down this activity to 1 hour (even though unclear from the presented data that it is a valid conclusion) does not help to understand better the mechanism of symmetry breaking.

      We thank Reviewer 1 for acknowledging the demanding experimental setup. However, we would argue that knowing the exact period that is most sensitive to fluid flow manipulations is a very important advance that we provide here, because this type of experiment gives us the physiological timing in a WT embryo. It is one thing to know that the system can respond to optical tweezers earlier than 5 ss and later than 5 ss, as the Yuan lab showed recently, but quite another to constrain the physiological timing at which the process occurs in a manner that is as unperturbed as possible. Our aim was the latter. Our rationale is that knowing the physiological time provides important clues. For example, we had the following questions at the time: Is the physiological time before or after cell rearrangements occur? Does it fall in a directional or non-directional flow regime? Is it governed by a mild flow or a stronger one? Is it before or after dand5 becomes asymmetric? Some of these questions, to which we think we all know the answers, could be challenged by our experiments, so it is indeed very important not to assume that we know the answer, and to ask the question again in an unbiased way with every new technique available. We wanted to be unbiased, and we think that is the beauty of our time-window experiment. Indeed, it shows that the physiological time window peaks at 5 ss, which is later than the Yuan lab’s calcium transient recordings and before dand5 asymmetric expression. In our opinion this is compatible and makes perfect sense: although the system already shows calcium transients earlier and can respond to lack of Pkd2 or to optical tweezer cilia manipulations at 1 ss – 3 ss, it is from 4 to 6 ss, peaking at 5 ss, that it is physiologically most responsive to fluid extraction and therefore to both mechanical and chemical perturbations.

      We have performed additional experiments using smFISH on WT embryos to detect dand5 expression with cellular resolution, and we have quantified asymmetries in the number of dand5 transcripts as early as 6 ss (new Figure 7 and new author: Catarina Bota), which further support our time-window claim. Degradation of dand5 mRNA, which is usually a very fast mechanism, has been suggested to underlie the asymmetric dand5 expression. This new piece of evidence supports the view that the physiological breaking of symmetry is stronger around 5 ss (see new discussion on this subject on page 27).

      Regarding symmetry breaking, the fact that anterior angular velocity was the major difference between embryos that recovered without LR defects and those that did not reveals that angular velocity must be tightly regulated by cilia motility and CFTR activity to bring back fluid and flow directionality, which together confer the robustness of flow. This is now better explained in the manuscript. We agree that the novelty regarding angular velocity may seem incremental compared to our work from 2014, where we only analyzed speed (Sampaio et al, 2014). However, here we provide more resolution and detailed parameters of angular velocity per section of the LRO, as well as tangential and radial velocities, the components of angular velocity. The radial component shows a trend towards left anterior that is now discussed in the text as evidence for a left difference. The present work shows that anterior angular velocity has a major role in the successful recovery of the symmetry breaking process, which was not claimed before. Here we challenged the embryo to bring to light the most important parameters.

      Importantly, the authors do not provide any convincing experiments to back up the mechanosensory hypothesis because the fluid extraction experiments affect both the chemical and physical features of the LRO, so it is impossible to disentangle the two with this approach.

      We agree that the first extraction experiment (Figures 1-3 and Table 1) affects both mechanisms and does not disentangle them; that was, in fact, our goal for the first experiment: finding the exact time window for symmetry breaking. However, in the second part of the work (Figures 4-5 and Table 2) we provide a 20,000-fold dilution experiment, which is very different from the extraction one. We apologize if this was not clear and hope to have made it clear this time.

      We agree with the reviewer that chemosensing is not excluded. In fact, we had provided a paragraph in the discussion about EV secretion rates to tone down our claim, and we did acknowledge that secretion could still overcome the dilution we are causing. We think we had already addressed this problem in the previous eLife manuscript, but we have now discussed the possibilities and the experimental evidence that supports each of them (see page 28, last paragraph). The key experiment that does not fit with secretion is pointed out at the end, and we ask the reviewer to read it in the context of wildtype animals. We agree that both scenarios must be discussed and leave space for future data on mmp21 and CIROP. However, so far, in zebrafish we cannot favor chemosensing as much as mechanosensing; we can only wait for more discoveries and remain open.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Lee and colleagues address the participation of NBR1 in chloroplast clearance after treatment with high light intensity. Authors use NBR1 fused to reporter proteins (GFP, mCherry), with the aid of nbr1, atg7, and nbr1-atg7 mutants, in combination with immunogold labelling to show localization of NBR1 to surface and interior of photodamaged chloroplasts, which follows with their engulfment in the vacuole, a process which is independent of ATG7. The combined use of ATG8 fused to GFP further shows that NBR1 and ATG8 are recruited independently to photodamaged chloroplasts. In addition, the use of mutant versions of NBR1 in combination with mutants lacking E3 ligases PUB4 and SP1 and mutant toc132-2 and tic40-4 lacking members of the TIC-TOC complex of protein translocation to the chloroplast, authors show that chloroplast localization of NBR1 requires the ubiquitin ligase domain (UBA2) of the protein, whereas, the PB1 domain exerts a negative effect on NBR1 chloroplast association, yet neither the PUB4 and SP1 E3 ligases nor the TOC-TIC are required for NBR1 association to photodamaged chloroplasts. All these approaches are well described and strongly support the authors' conclusions that the loss of chloroplast envelope integrity allows the entrance of cytosolic ubiquitin ligases and the participation of NBR1 in photodamaged chloroplast clearance by a process of microautophagy. All these findings add valuable information to our knowledge of chloroplast homeostasis in response to light stress.

      To further support these conclusions, authors perform a chloroplast proteomic analysis of the WT, nbr1, atg7, and nbr1-atg7 mutants. However, in contrast with the above results, the description of the proteomic data is rather confusing. The paragraph on Page 17 (lines 393-406) is hard to follow. The term "over-representation of less abundant chloroplast protein" is also quite confusing, like the data in Fig. 6 and supplementary to this figure (what does show the PCA analysis in Fig. 6-suppl. 1?). I wonder whether it would be possible to show all these data as supplementary and try to present the data supporting the major conclusion of these analyses (if I understood correctly, that nbr1, atg7, and the double mutant have lower contents of chloroplast proteins), in a more simple and clear format.

      Following the reviewer’s comments, we have rewritten the Results section describing the proteomic data to make it clearer and more concise. We have also simplified Figure 6 and generated new graphs for Figure 6 supplemental figures 1 and 2.

      Reviewer #2 (Public Review):

      The authors conducted a wide-ranging series of experiments which lead to the conclusion that NBR1 is involved in the clearance of photodamaged chloroplasts. It is a novel finding because the role of NBR1 in this process was never documented. Notably, the NBR1-mediated clearance is only one of the several possible mechanisms responsible for chloroplast turnover. It is not surprising, considering that the nbr1 mutants are viable. The work is arranged very well. The rationale of the subsequent experiments is logically justified and the outcomes are followed by clear conclusions. In consequence, the authors managed not only to observe the association of NBR1 with the chloroplasts but also threw some light on the corresponding mechanisms. The manuscript contains numerous high-quality images from a confocal microscope and from a transmission electron microscope. All images are accompanied by statistical analysis of the respective microscopic observations, which greatly improves the credibility of the conclusions. In short, the authors demonstrated that NBR1 decorates not only the exterior but also the interior of damaged chloroplasts in an ATG7-independent way. Next, they establish that NBR1 and ATG8 are recruited to different populations of damaged chloroplasts, and they document differences in chloroplast turnover, differences in chlorophyll abundance and chlorophyll photochemical properties, as well as differences in the total proteome of the nbr1 mutant in comparison to the wild type and atg7 mutant in two light regimes (low light and high light). Finally, they exclude the requirement for the known E3 ligases PUB4 and SP1 for NBR1-mediated degradation and show that the NBR1 internalization relies on chloroplastic membrane rupture rather than on TIC-TOC-dependent processes. In summary, the authors postulate that NBR1-mediated chloroplast clearance is a novel, not yet described mechanism and summarize it in a clear diagram.

      The work is interesting, the figures are convincing and the conclusions are justified by the results. It provides novel data on the function of the selective autophagy receptor NBR1 in plant cells; however, it also leaves the reader with some unanswered questions. The most important is the relative contribution of each of the chloroplast degradation routes to the turnover of these organelles in different stresses, light regimes, plant growth stages, etc. This is a difficult problem because the mutations in relevant genes have pleiotropic effects and it is difficult to separate the functions of the individual turnover routes. For example, the defects in core autophagy genes (like the atg7 mutant used in this study) result in an increased level of NBR1. These issues are not sufficiently addressed in the discussion.

      The reviewer is correct and indeed, we also detected higher levels of NBR1 in the atg7 mutant (Fig 2G). This could be, for example, the underlying reason why there are more chloroplasts decorated with NBR1 in the atg7 mutant than in complemented nbr1 plants, 24h after high light treatment (Fig 1F). However, the higher frequency of photodamaged chloroplasts observed in atg7 (Fig 2D) supports a different scenario: the higher number of photodamaged chloroplasts that are not successfully repaired or degraded by canonical autophagy in atg7 become substrates of NBR1. The increased levels of NBR1 in the atg7 mutant and how this could influence the effects seen in the mutants studied in this manuscript are now discussed in lines 670-673.

      Reviewer #3 (Public Review):

      The authors use an impressive array of techniques to determine the role of the NBR1 autophagy receptor protein specifically in the clearing of photodamaged chloroplasts. The authors describe the mechanism(s) by which this receptor operates in this context and demonstrate that this NBR1-mediated process occurs independently of SP1 and PUB4 (whose own roles in other aspects of chloroplast autophagy have previously been shown). The authors further dissect the functional domains of NBR1 to identify which are important in this process.

      The major strength of this work is the myriad techniques used to approach the problem. The data are of high quality, and on the whole, well replicated and statistically analysed. In the main, these data substantiate the findings of the authors, although some findings are quite correlative/descriptive. However, the authors show good circumspection in their conclusions and discussion. One potential weakness is that the genetic data (use of mutants) rely on single mutant alleles, therefore whilst genetic linkage to the mutations is assumed, it cannot strictly be guaranteed. The authors performed effective genetic complementation to analyse the domain structure of NBR1 shown in Figure 7. It would have been good if complementation of nbr1 and atg1 mutants and/or alternative mutant alleles had been used for experiments described in Figures 1 to 6. Without this, I think even more circumspection regarding the data obtained from these single-allele mutants would be advised.

      We agree with the reviewer that more mutant alleles would have provided stronger support to our conclusions, but we would also like to highlight that the atg7-2 (Chung et al 2010), nbr1-2, and atg7-2 nbr1-2 mutants (Jung et al 2020) have been well characterized previously, and that the nbr1-2 mutant has been shown to be rescued by the expression of fluorescently tagged NBR1 (Jung et al 2020). We are confident about the results on the localization of NBR1 in chloroplasts, not only because the fluorescently tagged NBR1 proteins are functional but also because we were able to corroborate the localization of NBR1 by using antibodies against the native proteins (Fig 2). That said, the reviewer does raise an important point and therefore, we have acknowledged more explicitly the limitation of our conclusions based on the analysis of single mutant alleles in lines 630-631 of the discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The model put forward by the authors in this manuscript is a simple and exciting one, explaining the function of AGS3 as a negative regulator of LGN, acting as a 'dominant-negative' version of LGN. Overall, the results support the model very well, and the results shown in Fig 6, which clearly reveal the functional relevance of AGS3, add strength to the paper.

      We thank the reviewer for their enthusiasm regarding our finding that AGS3 acts as an endogenous dominant-negative to inhibit LGN. We appreciate their assertion that the results support the model and that the functional relevance to epidermal stratification is a strength.

      In Figures 3A and B, the authors claim that AGS3 overexpression leads to depolarization of LGN in epidermal stem cells. However, in the example provided in Figure 3A, the LGN signal appears to be stronger than the control, with more LGN still on the apical side (many would categorize this as 'apically polarized'). In the scoring shown in Figure 3B, I am not sure if 'eyeballing' is the right way to decide whether it is polarized/depolarized/absent. The authors should come up with a bit more quantitative method to quantify the localization/amount of LGN and explain the method well in the manuscript. A similar concern regarding the determination of the LGN localization pattern applies to the rest of figure 3 as well.

      We agree with this important critique about the methodology used to assess LGN expression patterns. While we have historically included categorical analyses like those used in Fig. 3A,B in past publications (Williams et al, NCB 2014; Lough et al eLife, 2019), we have also now performed additional, unbiased, quantitative measures of LGN fluorescent intensity, as described in greater detail above. We added these new data in Fig. 4C-J, while the data previously in Fig. 3A,B have now been redistributed between Fig. 3E,F (overexpression) and Fig. 4A,B (knockdown).

      Reviewer #2 (Public Review):

      To date, only a handful of studies have addressed the importance of AGS3, a paralog of the relatively well-characterized spindle orientation factor LGN. The authors now show that AGS3 acts as a negative regulator of LGN and propose that this activity could work through competition for binding partner(s). Remarkably, regulation is temporally restricted in such a way that the conserved role played by LGN in metaphase spindle orientation is unaffected. Instead, AGS3 regulates a post-metaphase function for LGN, namely Telophase Correction. The article is well-written, the experiments are performed at a high level, and the claims are generally supported by the data. Two main points of confusion are raised in the current version. 1) The authors show that AGS3 regulates cortical localization of LGN, but would need to clarify how LGN is being affected. 2) The authors propose in the discussion that AGS3 might exert its regulatory effect through competition for NuMA, an important binding partner for LGN, but would need to clarify how and why NuMA would be involved in Telophase Correction.

      We thank the reviewer for appreciating the novelty of our findings regarding the understudied LGN/pins paralog AGS3. In regard to the first point, as described earlier, we have added additional quantitative analyses of how AGS3 affects cortical LGN fluorescent intensity in Fig. 4C-J. We now show that AGS3 loss leads to broader and higher expression levels throughout mitosis, and therefore we have amended our model to soften the claim that AGS3 primarily operates during telophase correction. This renders the second point somewhat moot, but we nonetheless have expanded our Discussion to note that NuMA can be cortically recruited to the anaphase cortex independent of LGN (lines 531-542). We also contextualize our findings with the Reviewer’s own recent study which proposes a “threshold model” of cortical Insc as a determinant of spindle orientation (Neville et al, 2023), and speculate that a similar model could apply in our system, perhaps with AGS3 binding and sequestering Insc rather than NuMA (lines 543-556).

      Reviewer #3 (Public Review):

      This paper examines the mechanisms that control division orientation in the basal layers of the epidermis. Previous work established LGN as a key promoter of divisions where one of the siblings populates the differentiated layers (perpendicular). This work addresses two important, related issues - the mechanisms that determine whether a particular division is planar vs perpendicular, and the function of AGS3, an LGN paralog that has been enigmatic. A central finding is that AGS3 is required for the normal distribution of planar and perpendicular divisions (roughly equal) such that in its absence the distribution is skewed towards the perpendicular. Interestingly, however, the authors find that AGS3 has no detectable effect on orientation if the orientation is measured at anaphase. This timing aspect builds upon previous work from this group demonstrating a phenomenon they term "telophase correction" in which the orientation changes at the latest phases of division (and possibly post division?). Thus AGS3 seems to exert its effect using these later mechanisms and this is supported by further analysis by the authors. Importantly, the authors show that AGS3 acts through LGN, based on localization data and an epistasis analysis. The function of AGS3 has been highly enigmatic so resolving this issue while providing a useful step towards understanding how the division orientation decision is made, makes for exciting progress towards an important problem. I found the overall narrative and presentation to be quite good and especially appreciated the thoughtful discussion section that did an excellent job of putting the results in context and speculating how unknown aspects of the mechanism might work based on current clues. With that said, I think there are some important issues that should be resolved.

      We thank the Reviewer for this excellent summary of our findings and appreciation of the significance of the issues that our study addresses.

      Regarding the orientation measurements, the authors should specify how the midbody marker was used to mark sibling cells, especially given the midbody can move following division. For example, how can the authors be confident that the siblings in the middle panel of 1A are correct and not an adjacent cell? Regarding quantification, it would be useful for the authors to comment on how the following would influence their measurements: 1) movements along the z-axis, and 2) movement of the nucleus within the cell

      We have used this methodology for over a decade, and while it is not flawless, we have included several safeguards to ensure that sibling cells are correctly identified. We have added additional details to the Methods section (lines 867-869, 873-879).

      A similar question is how much telophase correction really happens in telophase. How confident are the authors that the process actually occurs during division and not subsequent to it? What is drawn in their previous paper and in Figure 7A implies that post-division movements may be important. It would be useful for the authors to comment on whether they can make the distinction and whether or not it might be important.

      Our intent in coining the term “telophase correction” was to imply that this process initiates, rather than completes, during telophase. We apologize for this confusion and have clarified this in the text (lines 80-82). Since most mammalian cells complete M phase in ~1h, with the longest time spent in prophase, in the absence of direct evidence to the contrary, it may be prudent to assume that telophase, like metaphase and anaphase, is relatively short, on the order of minutes. Since we cannot directly observe reformation of the nuclear membrane in our movies, we cannot be sure when telophase ends. Likewise, we do not currently have a suitable marker of the spindle midbody for live-imaging, so cannot be sure when cytokinesis completes. That said, we feel confident that most of the reorientation is occurring prior to cytokinesis, because we have previously reported that the greatest changes in daughter cell positioning occur within the first 10-15 minutes of anaphase onset, when a gap in membrane-GFP/TdTomato is still visible (Lough et al, eLife, 2019). However, while we feel that our work raises many interesting questions about the timing of reorientation relative to specific mitotic stages (e.g. is the midbody asymmetrically positioned, inherited, or ejected?), these questions are beyond the scope of the present study.

      Does the division angle in the AGS3 OE experiment (Figure 1D) correlate with AGS3 levels within the cell?

      This is an interesting question, and indeed, our hypothesis would predict that it would. However, it is not straightforward to quantify AGS3 or mRFP1 levels, and as we explain in a new section of the Results (lines 212-237), we have some concerns that N-terminally tagged AGS3 may not be fully functional. We have added new data with C-terminally tagged AGS3-mKate2, which we feel provides even stronger evidence that mKate2+ cells show a planar shift compared to mKate2- cells (Fig. 3C,D). In the future, we could test this hypothesis at the population level by comparing division orientation profiles for AGS3-mKate2+ cells carrying either a non-targeting scramble or Gpsm1 1147 shRNA. We would predict that knocking down endogenous AGS3 while overexpressing AGS3-mKate2 should give an intermediate phenotype.

      I found the localization data to be the weakest part of the paper and feel that some reconsideration and reanalysis are warranted. First, the quantifications in Figures 2C, 3B, and 3F are unnecessarily vague scoring-based metrics. In 2C, "Localization pattern" should be replaced with membrane/cytoplasm ratio or an equivalent quantification. In 3B "LGN localization" should be replaced with apical/cytoplasmic and apical/basal ratios or equivalents. In 3F, "Polarized LGN frequency" should be replaced with apical/basal ratio or equivalent. It seems to me that non-AI processed data would be most appropriate for these quantifications unless such processing can be justified.

      This issue was raised by the previous two Reviewers and has been addressed by new data added to Figure 4.
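
      The ratio-based metrics requested in the comment above (apical/basal, apical/cytoplasmic, membrane/cytoplasm) can be illustrated with a minimal sketch. It is not the quantification pipeline used in the manuscript: the function name `intensity_ratio`, the region names, the pixel values, and the background level are all hypothetical, and the sketch assumes the regions of interest have already been segmented from unprocessed images.

```python
from statistics import mean


def intensity_ratio(numerator_pixels, denominator_pixels, background=0.0):
    """Ratio of mean background-subtracted intensities between two regions."""
    num = mean(numerator_pixels) - background
    den = mean(denominator_pixels) - background
    if den <= 0:
        # A region at or below background cannot serve as a denominator.
        raise ValueError("denominator region is at or below background")
    return num / den


# Hypothetical pixel intensities (arbitrary units) sampled from one mitotic cell.
apical_cortex = [220, 240, 230, 250]
basal_cortex = [110, 120, 130, 120]
cytoplasm = [60, 70, 65, 65]
background = 10

# Continuous readouts that could replace a polarized/depolarized/absent score.
apical_basal = intensity_ratio(apical_cortex, basal_cortex, background)
apical_cyto = intensity_ratio(apical_cortex, cytoplasm, background)
print(f"apical/basal = {apical_basal:.2f}, apical/cytoplasm = {apical_cyto:.2f}")
```

      A continuous ratio of this kind can be compared between genotypes with standard statistics, which is the advantage over categorical scoring that the comment points to.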

      Second, it is important to note that the cytoplasmic localization of AGS3 does not allow one to conclude that AGS3 is not on the membrane. Unfortunately, high cytoplasmic signal can preclude the determination of membrane-bound signal.

      We agree with the Reviewer and have softened our language throughout the text.

      Finally, I had difficulty reconciling the images of LGN shown in Figure 3 with the conclusions made by the authors.

      We have added additional, representative images of LGN expression in control and AGS3 KD cells in Figure 4C-E.

      The challenge of the localization data is troubling because an important conclusion of the paper is that AGS3 acts via LGN. The localization data provided one leg of support for this conclusion and the other is provided by an epistasis analysis. Unfortunately, this data seems to be right on the edge because it is based on the difference between the solid and dashed blue lines in Figure 5B not being significant. However, we can see how close this is by comparing the solid and dashed red lines in the adjacent 5C, which are significantly different. Between the localization data, which doesn't seem clear cut, and the epistasis experiment, which is on the razor's edge, I'm concerned that the conclusion that AGS3 acts through LGN may be going beyond what the data allows.

      We appreciate the Reviewer’s comments about the importance of these two lines of experimentation: 1) AGS3’s effect on LGN localization, and 2) epistasis experiments between AGS3/Gpsm1 and LGN/Gpsm2. We feel we have significantly strengthened this first pillar with the additional data presented in Fig. 4C-J. Regarding the second point, we would like to emphasize that we present three lines of evidence for the existence of an epistatic relationship between LGN and AGS3: 1) the static division orientation data comparing LGN single KOs to both LGN KO + AGS3 KD and AGS3+LGN dKOs (Fig. 6B); 2) live imaging division orientation/telophase correction comparing LGN KOs to AGS3+LGN dKOs (Fig. 6C-E); 3) lineage tracing data comparing LGN KOs to AGS3+LGN dKOs (Fig. 7H,I). Further, we think the reviewer may have misconstrued the data presented in Fig. 5C (now Fig. 6C). The dashed lines indicate orientation at anaphase and solid lines 1h after anaphase, so the shift between dashed and solid lines indicates telophase correction, which occurs to similar (and statistically significant) degrees in both LGN single mutants and AGS3+LGN dKOs. Comparisons between the single and double mutant would be between red and magenta solid lines or red and magenta dashed lines, and neither of these are statistically significant. We realize that our use of dashed lines in Fig. 5B (now Fig. 6B), which we normally only use to refer to anaphase entry in live imaging data, may have caused this confusion. Therefore, we have changed all plots to solid lines in Fig. 6B, and use light and dark magenta, respectively, to differentiate between LGN KO + AGS3 KD and AGS3+LGN dKOs.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors took a comprehensive set of analyses to examine the relationship between pupil diameter / derivative and BOLD-signal during rest in the ascending arousal system nuclei in 72 young participants. Focus is on the locus coeruleus, ventral tegmental area, substantia nigra, dorsal and median raphe nuclei and the basal forebrain. Analyses were performed using various processing pipelines: canonical versus custom hemodynamic response functions, with/without smoothing, time to peak analyses and cross spectral power density analyses to define the time lag between both measurements. The authors could not replicate previous correlations between locus coeruleus BOLD and pupil measurements using standard analytic approaches, and also found no relationship between locus coeruleus BOLD and pupil measurements when using custom hemodynamic response functions. When using time to peak and cross-correlation analyses, the authors found that coupling between pupil size and AAS BOLD patterns increases with decreasing time to peak, when the two signals were close in time. The authors conclude that these findings suggest that pupil size could be used as a noninvasive readout of AAS activity under passive conditions.

      These authors did a thorough assessment, and described the methods and results well and in a balanced manner.
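
      The time-lag idea behind the cross-correlation analyses summarized above can be illustrated with a generic sketch. It is not the authors' pipeline, which also involved custom hemodynamic response functions and cross-spectral analyses: the function names `pearson` and `best_lag`, the synthetic signals, and the three-sample delay are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag(pupil, bold, max_lag):
    """Return (lag, r) maximizing the correlation of pupil[t] with bold[t + lag]."""
    best = None
    for lag in range(max_lag + 1):
        x = pupil[: len(pupil) - lag]  # drop the last `lag` pupil samples
        y = bold[lag:]                 # drop the first `lag` BOLD samples
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Synthetic pupil trace and a "BOLD" trace that is the same signal delayed by 3 samples.
pupil = [math.sin(0.3 * t) + 0.5 * math.sin(0.11 * t) for t in range(200)]
delay = 3
bold = [0.0] * delay + pupil[:-delay]

lag, r = best_lag(pupil, bold, max_lag=10)
print(lag, round(r, 3))
```

      With real recordings the lag would be expressed in seconds rather than samples, and the two signals would first need to be resampled to a common rate.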

      Outstanding questions:

      • How reliable are these observations? Would we see the same findings in a different cohort or using a different sequence/field strength?

      • What is the independent association of each assessed nucleus with pupil dilation? That could be informative to understand their shared or unique role.

      We are grateful to the reviewer for their expert advice in helping us strengthen our manuscript. We agree with the reviewer that these two outstanding questions are important and we have done our best to answer these questions below. We believe that our manuscript has greatly improved, thanks to the reviewer’s suggestions for running these additional analyses.

    1. Author Response

      Reviewer #2 (Public Review):

      The availability of large collections of Mycobacterium tuberculosis (Mtb) isolates has enabled many important studies looking to identify mycobacterial genetic polymorphisms associated with anti-tuberculosis (TB) drug resistance, including both classical "resistance-conferring" mutations and novel "resistance-enabling" mutations. Importantly, these studies have expanded our understanding of mycobacterial genetic adaptations undermining chemotherapy, in many cases allowing for improved diagnostic tests and predictions of treatment failure. In this submission, Gao and colleagues adopt a different approach to the problem: although also applying a GWAS-type analysis, they instead attempt to elucidate polymorphisms implicated in poor outcomes of TB patients undergoing treatment for the drug-susceptible disease. Starting with a large dataset comprising 3496 samples with corresponding clinical (host) metadata, the authors generate Mtb whole-genome sequence data for 91 samples obtained from patients with "poor" outcomes and 3105 patients with "good" outcomes. These are used to identify 14 fixed and >230 unfixed mutations that might be associated with "poor" treatment outcomes, a conclusion which they argue is plausible given transcriptional evidence implicating many of the identified genes in the mycobacterial response in vitro to first-line drug exposure and/or hypoxia, both of which are considered relevant to clinical disease. Notably, they also identify a tendency for a greater proportion of "ROS mutational signatures" in unfixed mutations from "poor" outcome samples. Finally, incorporating these observations in a prediction model, the authors observe that the mycobacterial factors aren't adequate on their own but, when combined with key host factors - including patient age, sex, and duration of diagnostic delay (which have stronger predictive value) - they enhance predictive capacity. 
      In summary, this paper reports a novel approach yielding observations that offer tantalizing insight into the mycobacterial factors which might influence TB treatment outcomes independent of drug resistance; however, the following must be considered:

      (i) The manuscript provides little to no detail about how the samples were obtained, other than the fact that they comprise "pre-treatment" samples: are they all sputum samples? Were they induced? Similarly, no information is provided about sample propagation: were the samples cultured to achieve sufficient biomass for whole-genome sequencing? If so, in what growth media, for how long, and how many passages? Were all samples treated identically? And were they plated to single colonies - or are the "isolates" referred to throughout the manuscript actually heterogenous populations of potentially different Mtb clones obtained - and propagated - as a mixed sample? This information is critical given the potential that the identified polymorphisms - both fixed and (perhaps even more so) unfixed - might have arisen as a consequence of in vitro (laboratory) manipulation under standard aerobic conditions.

      Thanks for your encouraging comments. The requested information about sample propagation has been added to the methods section in the new version. For details, please see our response, above, to the essential revisions (Q1).

      (ii) A key question that arises from this study (and others like it) is whether causation has been adequately established. Ideally, the Mtb genotypes contained within samples obtained pre-treatment should be compared with samples obtained from the same patients following treatment - that is, when the "poor" outcome was manifest. The expectation is that the polymorphisms identified prior to initiation of therapy - especially the 14 fixed mutations - should be evident (even dominant) at the later stage when therapy failed (or at the subsequent presentation in cases of relapse). Recognizing that this is not easily accomplished, though, it seems fair to suggest that the perceived relevance of the identified mutations would be strengthened if the authors were able to provide any other evidence - perhaps from studies of drug-resistant Mtb isolates - supporting their inferred role in undermining frontline treatment.

      Thank you for these insightful questions. We sequenced the isolates obtained at the time of relapse for all 47 relapse cases and found that the 14 GWAS-identified fixed mutations were only detected in relapse isolates from the 13 patients whose first samples also contained the GWAS-identified mutations. None of the 14 mutations we identified were found in isolates from the other relapsed patients. We also searched for the presence or absence of these 14 mutations in published studies seeking noncanonical mutations associated with drug-resistant Mtb isolates [5-7]. None of the 14 mutations we identified were reported in any of these studies, but two of the genes (ctpB & metA) in which our mutations were found had been previously identified as potentially associated with first-line drug resistance.

      (iii) Related to the above, the authors make the valid point that their intention here was different from other studies which have deliberately utilized drug-resistant Mtb isolates to identify resistance-conferring and resistance-enabling mutations (such as in the study they cite by Hicks et al). It would be interesting to know, however, if any of the mutations identified in those other studies were also picked up in this work - and, if not, why that might be the case.

      As mentioned in our response to the previous question, none of our mutations were mentioned in prior studies. Our inference is that the 14 fixed mutations we identified had only limited effects on outcomes, which would explain why: they were not identified in previous studies; isolates from only 24.2% (22/91) of patients carried any of these 14 mutations; and none of the mutations were shared amongst all 22 patients.

      (iv) Finally, the analyses presented in this study are heavily dependent on the use of appropriate statistical methods to identify potentially rare genetic polymorphisms. However, as noted for sample processing (see my earlier comment above), there is very little detail provided about the methodology applied. This omission detracts from the interpretation, especially given that the predominance of lineage 2 (which contributes >75% of the isolates, with sublineage 2.3 constituting >50%) risks a lineage-specific association, rather than a more generalizable pathogenicity phenotype. Similarly, the heavy skew in the numbers of "good" (3105 samples) versus "poor" (91 samples) collections (approximately 34x difference in sample size) raises the possibility that mutations identified in the "poor" category might be artificially over-represented. More clarity in detailing the statistical methods is required to allay any concerns about the identification of candidate polymorphisms.

      Thank you for pointing this out. We have added details of our statistical methods to the methods section, and in the results section we have indicated the specific statistical methods used and the meaning of the statistical metrics.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper by Zhuang and colleagues seeks to answer an important clinical question by trying to come up with novel predictive biomarkers to predict high-risk T1 colorectal cancers that are at risk for nodal involvement. The current clinical features may both miss patients who underwent local therapy and who should have gone on to have surgery and patients for whom surgery was done based on risk features but perhaps unnecessarily. Using a training and validation set, they developed a protein-based classifier with an AUC of 0.825 based on mass spec analyses and proteomic analyses of patients with and without LN importantly linking biological rationale to the proteomic discoveries.

      In the training cohort, they took 105 candidate proteins reduced to 55, and did a validation in the training cohort first and then in two validation cohorts (one of which was prospective). They also looked at a 9-protein classifier which also performed well and furthermore looked at IHC for clinical ease.

      We appreciate the reviewers for the positive review and valuable comments. We have revised the manuscript according to the comments.

      Reviewer #2 (Public Review):

      The authors utilized a label-free LC-MS/MS analysis in formalin-fixed paraffin-embedded (FFPE) tumors from 143 LNM-negative and 78 LNM-positive patients with T1 CRC to identify protein biomarkers to determine LNM in T1 CRC.

      The authors used a fair number of clinical samples for the proteomics investigation. The experimental design is reasonable, and the statistical methods used in this manuscript are solid.

      The authors largely achieved their aims and the results supported their conclusion. The method used in this proteomic study can also be used for the proteomics analysis of other cancer types to identify diagnostic and prognostic biomarkers. In addition, the 9-marker panel has a potential clinical diagnosis practice in determining LNM in T1 CRC.

      Nevertheless, the authors need to justify their standards in selecting the biomarkers. For example, a p-value cut-off of 0.1 is not a usual criterion in similar proteomic studies. In addition, an identification frequency of 30% in patients seems not preferable for biomarker identification. The authors also need to justify the definition of fold change in the three subtypes with the Kruskal-Wallis test. The authors need to describe in more detail how they identified the 13 proteins from a 55-protein database. In addition, what is the connection between the final 9 proteins and the 19 proteins? What is the criterion to select 5 proteins for IHC validation from the 9 proteins?

      We appreciate the reviewers for the positive review and valuable comments. We have revised the manuscript according to the comments.

      The criteria and details of our standards in selecting are as follows.

      1) About p-value cut-off of 0.1:

      The purpose of this step is to screen appropriate variables for subsequent machine learning, rather than to compare differences between groups. A p-value cut-off of 0.1 is also an established strategy for variable selection in proteomics research. For example, it has been used in a study predicting the response to tumor necrosis factor-α inhibitors in rheumatoid arthritis (PMID: 28650254); in research on the circadian clock in mouse liver (PMID: 29674717); in proteomic biomarker discovery in atherosclerosis (PMID: 15496433); and in a proteomics and transcriptomics analysis of Bacillus subtilis (PMID: 19948795).

      Following the reviewer’s suggestion, we also used a p-value cutoff of 0.05 to screen for variables. In a training set of 70 lymph node-negative and 62 lymph node-positive cases, we identified 355 protein markers. We then entered these proteins into a lasso regression analysis and ultimately developed a lymph node metastasis prediction model consisting of 52 protein markers. We validated the model in VC1 and VC2; the AUC values were 1.000, 0.824, and 0.918 for the training set, VC1, and VC2, respectively. The predictive performance was slightly inferior to that of the model developed in this study (Figure 3- figure supplement 1C).

      2) About identification frequency of 30%:

      Restricting the analysis to proteins identified in > 30% of samples has been applied in previously published studies. For instance, in a study that used proteomic biomarkers to build a diagnostic model in lung cancer (PMID: 29576497), proteins identified in > 30% of cohort samples were used for downstream analysis. Similarly, in a study on the impact of Reptin on protein-protein interactions (PMID: 30862565), proteins were required to be present in > 30% of samples to be included in the proteome dataset.

      We compared our cohort with those of Jun Qin et al. and of Bing Zhang et al. (the latter published in Nature; PMID: 25043054) in terms of the number of proteins detected in more than 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100% of samples, respectively (Figure 2- figure supplement 1). The proportions of proteins detected at each sample cutoff in the three cohorts were 10% (0.60, 0.94, 0.48), 20% (0.52, 0.83, 0.38), 30% (0.46, 0.75, 0.31), 40% (0.41, 0.69, 0.26), 50% (0.37, 0.63, 0.23), 60% (0.33, 0.57, 0.18), 70% (0.29, 0.52, 0.15), 80% (0.25, 0.45, 0.11), 90% (0.19, 0.37, 0.11), 100% (0.07, 0.23, 0.10), respectively. These results indicate that our cohort is reliable.

      To investigate the impact of the protein identification frequency cutoff in our study, we performed comparative pathway enrichment analysis of the differentially expressed proteins (LNM+ vs. LNM-: p-value < 0.05, Wilcoxon rank-sum test) at different detection thresholds, namely proteins detected in more than 10%, more than 30%, and more than 50% of samples. The three thresholds yielded similar pathway enrichment: the mTOR signaling pathway and amino acid metabolism pathways were dominant in LNM-negative patients, whereas coagulation cascades and lipid metabolism pathways were overrepresented in LNM-positive patients (Figure 2- figure supplement 1).

      Following the reviewer’s suggestion, we also used a cutoff of 50% as the identification frequency for variables. The lasso regression was carried out in the training cohort (70 LNM-negative and 62 LNM-positive), with an AUC of 0.999. The model was validated in VC1 and VC2, with AUCs of 0.812 and 0.886, respectively (Figure 2- figure supplement 1).
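      The detection-frequency comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the toy matrix, and the encoding of "not detected" as None are all assumptions, and the strict "more than" comparison simply follows the wording of the response.

```python
def detection_fraction(matrix, cutoff):
    """Fraction of proteins detected in more than `cutoff` (0..1) of samples.

    `matrix` maps a protein name to its per-sample intensities, with None
    marking 'not detected' (a hypothetical encoding chosen for this sketch).
    """
    kept = 0
    for values in matrix.values():
        detected = sum(v is not None for v in values)
        if detected / len(values) > cutoff:  # strict "more than", as worded above
            kept += 1
    return kept / len(matrix)


# Toy data: 4 hypothetical proteins measured in 4 samples.
toy = {
    "P1": [1.0, 2.0, 3.0, 4.0],      # detected in 100% of samples
    "P2": [1.0, None, None, None],   # detected in 25%
    "P3": [None, None, 2.0, 5.0],    # detected in 50%
    "P4": [None, None, None, None],  # never detected
}

for cutoff in (0.1, 0.3, 0.5):
    print(cutoff, detection_fraction(toy, cutoff))
```

      Running the same function over several cohorts at each cutoff would give one triple of proportions per threshold, which is the shape of the comparison reported above.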

      3) About identification of the 13 proteins and the criterion to select 5 proteins for IHC validation from 55-protein database:

      The process of reducing the number of proteins from 55 to 13 and finally establishing a 5-molecule classifier based on the IHC score is shown in Figure 1- figure supplement 2 in the revision. From the 55 proteins, we first selected 19 proteins with |log2FC| > 1 (that is, log2FC > 1 or < -1) and p<0.05 (Wilcoxon rank-sum test) between the LNM-negative and LNM-positive groups in 221 patients. We then searched for antibodies against these 19 proteins and obtained 13 antibodies for immunohistochemistry. We performed immunohistochemical staining of the FFPE samples with the 13 antibodies and used the IHC score of each protein to build single-molecule prediction models by ROC curve analysis in SPSS. Because the principles of MS-based proteomics and IHC staining differ, not all identified proteins can be converted into IHC markers. Finally, the 5 IHC markers whose IHC scores had p-values below 0.05 (Student’s t-test) were selected to build the IHC classifier using logistic regression. We also updated the description in the “Result” section in the revised manuscript (line 718-722, page 34-35 in the revision).
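      The first filtering step above (fold change plus p-value) can be sketched as follows. This is a hedged illustration only: the protein names and statistics are invented for the example, and the p-values are taken as given rather than computed, since the response does not describe the implementation.

```python
def select_candidates(stats, fc_cutoff=1.0, p_cutoff=0.05):
    """Keep proteins with |log2FC| > fc_cutoff and p < p_cutoff.

    `stats` maps a protein name to a (log2FC, p_value) pair. In the response
    the p-value comes from a Wilcoxon rank-sum test; here it is an input.
    """
    return [
        name
        for name, (log2fc, p_value) in stats.items()
        if abs(log2fc) > fc_cutoff and p_value < p_cutoff
    ]


# Hypothetical statistics, not values from the study.
toy_stats = {
    "protA": (1.8, 0.001),   # up-regulated and significant: kept
    "protB": (0.4, 0.001),   # significant but small effect: dropped
    "protC": (-2.1, 0.20),   # large effect but not significant: dropped
    "protD": (-1.3, 0.003),  # down-regulated and significant: kept
}

print(select_candidates(toy_stats))
```

      Writing the threshold as an absolute value makes explicit that both up- and down-regulated proteins pass the filter.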

      4) About the connection between the final 9 proteins and the 19 proteins:

      To facilitate the clinical translation of the model, multiple logistic regression was used to obtain 9 core proteins from the 19 proteins (Figure 1- figure supplement 2 in the revision). We first performed logistic regression on the 19 proteins, eliminated the 10 proteins whose coefficients were not significant (Pr(>|z|) > 0.05), and retained the 9 proteins with Pr(>|z|) < 0.05. We then ran binary logistic regression again with these 9 proteins to build the simplified classifier. We also updated the description in the “Materials and methods” section in the revised manuscript (line 1092, page 51 in the revision).

      5) About the definition of fold change in the three subtypes with the Kruskal-Wallis test:

      The fold change in the three subtypes is the ratio of the mean expression in each group (well to moderately differentiated adenocarcinoma, poorly differentiated adenocarcinoma and mucinous adenocarcinoma) to the mean of the other two groups. The Kruskal-Wallis test was performed across the three subtypes.

      We also updated the description in the “Result” section in the revised manuscript (line 506-517, page 25 in the revision), and “Figure 1- figure supplement 2H in the revision”.
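      The one-versus-rest fold change defined in point 5 can be written out as a short sketch. One detail is an assumption: "the mean of the other two groups" is read here as the mean over the pooled samples of those two groups (it could also mean the average of the two group means; the two coincide when group sizes are equal). The group labels and values are invented.

```python
from statistics import mean


def one_vs_rest_fold_change(groups):
    """Ratio of each group's mean to the mean of all samples in the other groups.

    `groups` maps a subtype label to that subtype's expression values.
    Pooling the remaining samples is an assumption of this sketch.
    """
    result = {}
    for name, values in groups.items():
        rest = [v for other, vals in groups.items() if other != name for v in vals]
        result[name] = mean(values) / mean(rest)
    return result


# Hypothetical expression values for one protein in the three subtypes.
toy_groups = {
    "well_moderate": [2.0, 4.0],
    "poor": [1.0, 1.0],
    "mucinous": [3.0, 5.0],
}

for subtype, fc in one_vs_rest_fold_change(toy_groups).items():
    print(subtype, round(fc, 3))
```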

      Reviewer #3 (Public Review):

      This work provides a proteomic analysis of 132 early-stage (pT1) colorectal cancers (CRC) to attempt to identify proteins (or a signature pattern thereof) that might be used to predict the patient risk of lymph node metastases (LNM) and potentially stratify patients for further treatment or surveillance. The generated dataset is extensive and the methods appear solid. The work identifies a 55-protein signature that is strongly predictive of LNM in the training cohort and two validation cohorts and then generates two simplified classifiers: a 9-protein proteomic and a 5-protein immunohistochemical classifier. These also perform very well in predicting LNM. Loss of the small GTPase RHOT2 is identified as a poor prognostic factor and validated in a migration assay. The findings could allow better prognostication in CRC and, if confirmed and better validated and contextualized, might impact patient care.

      Strengths:

      A large training cohort of resected early-stage (pT1M0) CRCs was analyzed by rigorous methods including careful quantitative analysis. The data generated are unbiased and potentially useful. A number of proteins are found to be different between CRCs with and without lymph node metastases, which are used to train a machine learning model that performs flawlessly in predicting LNM in the training cohort and very well in predicting LNM in two validation cohorts. The authors then develop two simplified classifiers that might be more readily extended into clinical care: a 9-protein proteomic assay and a 5-protein immunohistochemical assay; both of these also perform well in predicting LNM. Because LNM is a key prognostic factor, and colectomy (which includes removal of lymph nodes needed to assess LNM) carries significant risk and morbidity, particularly in rectal cancer, classifiers like these are potentially interesting. Finally, the authors identify the loss of expression of RHOT2 as a novel prognostic factor.

      Weaknesses:

      Major points:

      The data are limited by a number of assumptions about metastasis, minimal contextualization of the results, and claims that are too strong given the data. Critically, the authors use the presence or absence of LNM as the study's only outcome; while LNM is a key predictor in CRC, it is uncommon in T1 CRC (generally 3-10%, 12% in this study), stochastic, inefficient, and incompletely identified by histologic evaluation. Larger resection (here, colectomy) removes both identified and occult LNM, which is probably best studied in randomized trials of lymphadenectomy in Japanese gastric cancer cohorts and should be better discussed. Critically, patient survival or disease-free survival would be more relevant outcomes. Further, absent longer-term data, many patients without identified LNM might nonetheless be high-risk and skew the cohorts. It is also not clear whether these findings would be generalizable to other early-stage colon cancers.

      The data are also not correlated with the genetics of the cases, which were not discussed.

      The results would benefit from the inclusion of standard-of-care MSI status. The classifiers would also be much more impactful if they were generalizable beyond T1 CRCs; this could be readily tested in public datasets.

      The authors explain the data as mechanistic, but, aside from one experiment modulating RHOT2 levels, they are fundamentally correlative and should be described as such.

      Although they focused on areas containing >80% tumor as judged by the reading pathologist, it is unclear whether the identified proteomic changes originate from the tumor or the microenvironment.

      The authors fail to properly contextualize the results or overstate the novelty of their study. A number of examples - the study is claimed as "the first proteomic study of T1 CRC" and "the first comprehensive proteomics study to focus on LNM in patients with submucosal T1 CRCs"; neither of these appears to be true, for example, Steffen et al. (Journal of Proteome Research, 2021, reference 18) may satisfy both of these, although the numbers are smaller. Many other results are reported without context, for example, proteomic characterization of mucinous carcinomas has been performed previously, a modest correlation in mucinous carcinoma is ascribed a large mechanistic role, and PDPN is discussed but is not contextualized as a protein that has been well-studied in the context of metastasis.

      The data on RHOT2 are promising but very preliminary. RHOT2 is described as ubiquitous in colorectal cancer cell lines; a brief search in Human Protein Atlas shows RHOT2 RNA and proteins are ubiquitously expressed throughout the body. While its loss appears potentially prognostic, it is unclear whether this is simply a surrogate for other features, such as loss of differentiation state, and whether this is unique to CRC; multivariate analysis would be important.

      We thank the reviewer for the constructive and insightful comments, which helped to improve the quality of this manuscript. We summarize the reviewer’s comments as follows: (1) lack of longer-term data and micrometastasis; (2) testing the classifier in public datasets; (3) inclusion of standard genetics and gene alterations; (4) the tumor purity of all tumor samples and whether the results were influenced by the tumor microenvironment; (5) contextualizing the results; (6) multivariate analysis of RHOT2.

      1) Lack of longer-term data and micrometastasis:

      We thank the reviewer for the comments. We fully acknowledge the limitations of our study, including the uncertainty associated with the detection of lymph node micrometastasis and the lack of long-term survival data, which can affect the strength of our conclusions. We agree that LNM is a key predictor in CRC and that it is uncommon in T1 CRC, with a reported incidence of 3-10%. We acknowledge that larger resections, such as colectomy, are generally recommended for patients with T1 CRC with LNM because of the potential risk of metastasis. However, our study aimed to establish a predictive model for LNM in T1 CRC, which could help guide clinical decision-making on whether additional surgery is needed after endoscopic resection, according to the current NCCN guidelines.

      We have taken the following steps to address these limitations:

      • We used propensity-score matching to reduce confounding biases in our training cohort, and patients were prospectively enrolled in our validation cohort, which was designed as a single-blinded prospective study to enhance the rigor and reliability of our findings.

      • Regarding the influence of micrometastases: following the reviewer’s suggestion, we discussed reports on lymph node micrometastases in Japanese gastric cancer cohorts (PMID: 17377930, 9070482) and also consulted articles on micrometastases in T1 CRC (PMID: 17661146, 16412600). About 5% of pT1N0 gastric cancer patients have ITCs in LN, and about 10% of pT1Nx CRC patients do. The effect of MMs on prognosis in pT1N0 CRC is still unclear. The presence of ITCs/MMs in LN may explain why nearly 13% (29 of 221) LNM-negative patients were classified into the high-risk group by the prediction model in our study.

      We have also added a section to the “Discussion” in the revised manuscript to discuss the potential impact of these limitations on the interpretation of our findings (line 856-873, page 41 in the revision), as follows:

      “In this study, to ensure the accuracy of LN status of the enrolled patients, the dissected number of LN in all patients including both surgical resection and ESD was more than 12. However, the longer-term follow-up data, including DFS, PFS, etc., are not available, due to limitations in sample collection time and the prognosis of such patients needs to be tracked over long periods of time, and may impact the strength of our conclusions. To address this limitation, we used propensity-score matching to reduce confounding biases in our training cohort. Patients were prospectively enrolled in our validation cohort (VC2), which was designed as a single-blinded prospective study to enhance the rigor and reliability of our findings. Furthermore, the presence of isolated tumor cells (ITCs) or micrometastases (MMs) within regional LN are not considered, due to conventional histopathologic examination cannot detected them. According to previous studies, there were about 5% pT1N0 gastric cancer patients have ITCs in LN, and 10% in pT1Nx CRC. The effect of MMs on prognosis in pT1N0 CRC is still unclear. The present of ITCs/MMs in LN may explain why there are nearly 13% (29 of 221) LNM-negative patients were classified into high-risk group by the prediction model in our study. Our study would provide a valuable database and could help for clinical decision-making in the context of T1 CRC. We will continuously follow the prognosis of the patients, and the ITCs/MMs in LN also need to be further validated in the future studies.”

      In conclusion, we appreciate reviewer’s comments and acknowledge the limitations of our study. We believe that our study provides valuable insights into the development of a predictive model for LNM in T1 CRC, which could potentially aid in clinical decision-making according to the current NCCN guidelines.

      2) Test the classifier in public datasets:

      Following the reviewer’s suggestions, we tested our classifier in two public datasets: the colon and rectal cancer study from CPTAC published in Nature (PMID: 25043054), and the metastatic colorectal cancer study published in Cancer Cell (PMID: 32888432). The details are discussed further in “point-to-point responses R3 Q2.”.

      3) Standard genetics and gene alterations:

      Following the reviewer’s suggestions, we assessed MSI status and CRC-associated gene mutations (RAS, BRAF and PIK3CA) in our cohort. The details are discussed further in “point-to-point responses R3 Q1.”

      4) The influence of microenvironment:

      We apologize for not explaining this clearly. To determine whether the differences between the two groups (LNM+ and LNM-) arise from the tumor microenvironment or from the tumor tissue, we first used xCell (PMID: 29141660) to study the composition of the tumor microenvironment (Figure2-source data 4 in the revision). The results showed no difference in the tumor microenvironment between the LNM-positive and LNM-negative groups (P > 0.05, Wilcoxon rank-sum test) (Figure RL1A). However, when we compared the xCell algorithm-based cell deconvolution results between the LNM-positive and LNM-negative groups, we found that 8 microenvironment-associated cell features differed between the two groups (p<0.05) (Figure RL1B). LNM-positive patients were characterized by chondrocytes and Th1 cells, whereas the remaining 6 features, including B cells, cDC, and myocytes, were all higher in LNM-negative patients. Correspondingly, 7 immune cell markers were also observed to be significantly different between the two groups (Log2FC>1 or <-1, P > 0.05, Wilcoxon rank-sum test) (Figure RL1C).

      Secondly, we checked the expression profile of the signature proteins detected in our study by The Human Protein Atlas (HPA). Among 9404 identified proteins, 7852 (83.4%) have HPA’s CRC IHC staining data, and 6249 (79.6%) showed medium to high tumor-specific staining in CRC samples (Figure RL1D). Of the signature proteins up-regulated in LNM-positive patients (LNM+ vs. LNM-: log2FC > 1 and p<0.05, Wilcoxon rank-sum test), 76 of 84 (90.5%) have IHC staining data in HPA, and 63 (82.9%) showed medium to high tumor-specific staining in CRC samples (Figure RL1E). For specific proteins of LNM-negative patients (LNM+ vs. LNM-: log2FC <-1 and p<0.05, Wilcoxon rank-sum test), 72 of 82 (87.8%) have IHC staining data in HPA, and 60 (83.3%) showed medium to high tumor-specific staining in CRC samples (Figure RL1F).

      Finally, we reviewed again all H&E-stained slides of tumor tissues of patients involved in the study, and supplemented tumor purity values of tumor samples of all the patients in Figure1-source data 1. We compared the tumor purity between the LNM-positive (with average 87.75%) and negative patients (with average 88.27%). The result showed there was no difference between the two groups (P = 0.46, Student’s t-test), demonstrating the high purity and quality of the tumor tissues. (Figure1-supplementary figure 1J in the revision).

      These results indicate that, in our study the differences between LNM-positive and LNM-negative groups are mainly caused by tumor tissues. However, the tumor microenvironment may also play a critical but not direct role in T1 CRC development and progression.

      Figure RL1. A. Comparison of xCell scores of immune and microenvironment between the LNM-negative group (n= 143) and LNM-positive group (n= 78). B&C. Immune/stromal signatures identified from xCell, together with derived relative abundance of immune and stromal cell types. D, E, F. Identified signature proteins (D), LNM-positive group up-regulated proteins (E) and LNM-negative group up-regulated proteins (F) were mostly validated by HPA IHC Staining Data. G. Barplot for tumor purity between LNM-negative and -positive patients.

      5) Contextualize the results:

      According to the reviewer’s advice, we have made corresponding adjustments in the revised manuscript, for example:

      • “We have made a comprehensive proteomic study of T1 CRC and provides a reliable data source for future research. “(line 342, page 17 in the revision)

      • “Here, we present a comprehensive proteomic study to focus on LNM in patients with submucosal T1 CRCs.” (line 788, page 37 in the revision)

      With regard to the problem of results are reported without context, we have provided supplementary descriptions of the context of the results in the “Result” section of the revised manuscript, for example:

      • “Mucinous adenocarcinoma was considered to be a significant risk factor of LNM in T1 CRC (PMID: 31620912).” (line 498, page 24 in the revision)

      • “Mucinous adenocarcinoma of the colorectal is a lethal cancer with unknown molecular etiology and a high propensity to lymph node metastasis. Previous proteomic studies on mucinous adenocarcinoma have found the proteins associated with treatment response in rectal mucinous adenocarcinoma and mechanisms of metastases in mucinous salivary adenocarcinoma.” (PMID: 34990823, 28249646) (line 534-538, page 26 in the revision)

      • “Previous studies have shown that PDPN expression correlated with LNM in numerous cancers, especially in early oral squamous cell carcinomas (PMID: 21105028).” (line 570, page 27 in the revision)

      6) Multivariate analysis of RHOT2:

      RHOT2 and its paralog RHOT1 play an important role in mitochondrial trafficking (PMID: 16630562). Although the function of RHOT2 in cancer is still unknown, the expression of RHOT1 affects metastasis in a variety of tumors, including pancreatic cancer (PMID: 26101710), gastric cancer (PMID: 35170374), small cell lung cancer (PMID: 33515563), etc. In addition, previous studies have found that Myc regulation of mitochondrial trafficking through RHOT1 and RHOT2 enables tumor cell motility and metastasis (PMID: 31061095).

      As shown in Figure 4 of the previous version, we found that RHOT2 was significantly down-regulated (Log2FC=-1.35; p=0.003, Wilcoxon rank-sum test) in LNM-positive patients compared with LNM-negative patients in our T1 CRC cohort, and that a low level of RHOT2 is associated with poor overall survival of patients with colon cancer in the TCGA cohort. Knockdown of RHOT2 expression markedly enhanced the migration ability of colon cancer cells.

      In order to further explore the influence of RHOT2 on T1 CRC LNM, in addition to the previous results, we carried out the following analysis as shown in Figure4 in the revision.

      We first calculated the correlations between the expression of RHOT2 and that of other proteins in our cohort (Figure 4). A total of 1,508 proteins were significantly correlated with RHOT2 (P < 0.05, Spearman): 1,354 showed a positive correlation (coefficient >0) and 154 a negative correlation (coefficient <0). However, when we performed GSEA on the RHOT2-associated proteins to identify biological signatures impacted by RHOT2, most of the resulting pathways (p<0.01) had an NES below 0, meaning that they were mainly enriched in the RHOT2-negatively-correlated group; only “mitochondrion” (GOCC) had a positive correlation (Figure 4). RHOT2 is known to be an important protein in the regulation of mitochondrial dynamics and mitophagy (PMID: 16630562). This result suggests that the involvement of RHOT2 in the regulation of mitochondrial function might contribute to the pathogenesis of metastasis in cancer, especially in early-stage CRC. Consistent with the previous results, the RHOT2-negatively-correlated group was significantly enriched for EMT (HALLMARK) and complement and coagulation cascades pathways. Proteins up-regulated in the LNM-positive group (LNM+ vs. LNM-: Log2FC >0; p<0.05, Wilcoxon rank-sum test) were negatively correlated with RHOT2 (p < 0.05, coefficient<0, Spearman), including CAP2, COL6A3, COL6A2, TNC, DPYSL3, PCOLCE and BGN in the EMT pathway, and GUCY1B3, VWF and F13A1 in the complement and coagulation cascades pathway (Figure 2E, L; Figure 4D in the revision). ECM, focal adhesion and dilated cardiomyopathy (DCM) pathways were also enriched in the negatively correlated group. Degradation of RHOT2 has already been reported to be associated with DCM (PMID: 31455181). Overall, combined with the previous results, these findings suggest that RHOT2 may play an important role in T1 CRC LNM (Figure 4D in the revision).

      As the reviewer noted, the data on RHOT2 are promising, but our understanding of it is preliminary. Further analyses and experiments are needed in future research to understand the specific role and mechanism of RHOT2 in tumor metastasis. In the revision, we discuss these limitations of our research.
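      The correlation screen described above can be sketched with a plain Spearman coefficient (Pearson correlation of average ranks). This is an illustration under stated assumptions: all names and data are invented, and the significance filter (P < 0.05) used in the response is omitted here, so the split below is by sign of the coefficient only.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average position of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def split_by_sign(target, others):
    """Partition proteins by the sign of their Spearman coefficient with `target`."""
    positive, negative = [], []
    for name, values in others.items():
        rho = spearman(target, values)
        if rho > 0:
            positive.append(name)
        elif rho < 0:
            negative.append(name)
    return positive, negative


# Hypothetical expression profiles across five samples.
target_profile = [1.0, 2.0, 3.0, 4.0, 5.0]
other_profiles = {
    "protUp": [10.0, 20.0, 30.0, 40.0, 50.0],
    "protDown": [5.0, 4.0, 3.0, 2.0, 1.0],
}

print(split_by_sign(target_profile, other_profiles))
```

      In a real analysis each coefficient would also need a p-value and multiple-testing control before proteins are assigned to the positively or negatively correlated group.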

    1. Author Response

      Reviewer #1 (Public Review):

      Lammer et al. examined the effects of social loneliness, and longitudinal change in social loneliness, on cognitive and brain aging. In a large sample longitudinal dataset, the authors found that both baseline loneliness and an increase in loneliness at follow-up were significantly associated with smaller hippocampal volume, reduced cortical thickness, and worse cognition in healthy older adults. In addition, those older adults with high loneliness at baseline showed even smaller hippocampal volume at follow-up. These results are interesting in identifying the importance of social support to cognitive and brain health in old age. With a longitudinal design, they were able to show that increased loneliness was related to reduced brain structural measures. Such results could help guide clinicians and policymakers in designing social support systems that would benefit the growing aging population.

      The strength of the current study lies in the large sample size and longitudinal follow-up design. The multilevel models used to separate within and between subject effects are well constructed. Combining neuroimaging data with behavioral changes provided further evidence that social loneliness may be related to accelerated brain aging. Stringent FDR correction, Bayes factor comparison, and the additional analyses for sensitivity showed the robustness and credibility of the results.

      Thank you for a thorough and overall positive evaluation of our manuscript and the constructive feedback. We considered all of your comments valuable, please see point-by-point responses below for more details.

      Weaknesses of the study were related to the interpretation and discussion of their findings.

      1a) Social loneliness is a relatively little-studied factor in cognitive ageing, and the authors should consider expanding the discussion, with some additional analyses, as to how their results could be used by clinicians and older adults to monitor social behaviors.

      We agree with the reviewer and are thankful for these suggestions. We have run additional analyses following the clinical cut-off of the questionnaire on social isolation and added those and their interpretations to the results and discussion section. Please see below response to questions 2a) and 3a) as well as to those in section b) to this reviewer how we implemented the reviewer’s advice in detail.

      2a) The authors examined the interaction between baseline and age change to see if higher baseline loneliness was associated with accelerated decline. The interaction was significant, but the authors did not further explore the interaction effect, which may have clinical significance. The authors should consider identifying a cut-off point in LSNS that suggests persons scoring less than this score on the LSNS may be at greater risk of accelerated brain decline than others. Such a cut-off point is important for clinicians, as well as for future researchers to compare their results.

      2a) Thanks to your recommendation, we decided to explore differences between handling LSNS as a categorical (using the standard threshold of 12) and continuous variable and recalculated all LMEs on HCV and cognitive functions with LSNS coded dichotomously. We found the results to be similarly good in detecting adverse effects of social isolation (see new Tables S16-18). The interaction of categorical LSNS with change in age on HCV tends towards showing an effect but does not reach significance even before FDR-correction.

      As cut-off points are central to clinical work, we are convinced that this expansion greatly improved our study and its usefulness to our readers, and we are thus very grateful for this valuable question.

      Our analyses indicate that the cut-off can be employed in clinical settings to detect social isolation that might harm patients’ brain health.

      However, this does not answer another important question, namely which public health strategy is most suitable to target social isolation for preventive purposes. Should it focus on the most isolated individuals (i.e. those categorised as socially isolated) or pursue a population strategy (Rose et al., 2008)? This is the topic of ongoing research in our group and we hope to answer it in future work. For now, we ran additional models testing an interaction effect of dichotomous LSNS with continuous LSNS. Finding evidence for such an interaction effect would suggest that having less social contact has stronger negative effects for those who are categorised as socially isolated. Roughly speaking, is it worse to have one instead of two reliable friends than it is to have four instead of five? If this were the case, it would point public health towards a high-risk rather than a population strategy. We did not find any evidence for such an interaction effect and thus cannot claim that more social contact ceases to be beneficial beyond the threshold score of 12. In addition to the new results, we have expanded on this in the discussion section where it now reads: “We showed that the established LSNS cut-off can be employed by clinicians to identify subjects likely to suffer adverse effects due to social isolation. However, the absence of evidence for more pronounced negative effects of less social contact amongst those that are deemed socially isolated by the cut-off renders a public health strategy focused on high-risk individuals questionable.”

      3a) Although it was not directly tested in the paper, LSNS scores did not seem to change with increasing age (Table 1). This general stability of LSNS scores in older adults should be discussed further. The authors should consider how their relatively healthy and high SES sample may be less vulnerable to loss of family or friends in old age, making this sample sub-optimal for the question they have. The significance of the subject effect suggests that some individuals still experience a loss of social connectedness. The authors may want to elaborate on this and give some explanations of such subject differences in the ageing effect on social loneliness. Although stress was not a significant mediating factor, is it related to baseline loneliness or changes in loneliness in the current sample?

      Concerning the link between change in age and LSNS, we indeed found a statistically significant effect of age change on higher social isolation in an ancillary LME. However, as the reviewer noticed, the per-year effect is very small: a participant would need to age more than 20 years to score one point higher on the LSNS sum score (see new Table S2; see also the answers below to questions 4a and 3b). We therefore tend to agree that in our sample, higher age does not affect social isolation substantially.
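
      The "more than 20 years per point" statement is simply the reciprocal of the per-year slope. A minimal sketch of that arithmetic follows; the slope of 0.04 points per year is a hypothetical value chosen for illustration, and the actual estimate is the one reported in Table S2.

```python
def years_per_point(slope_per_year):
    """Years of ageing needed for a one-point change in the sum score.

    `slope_per_year` is the fixed-effect estimate for change in age,
    in score points per year. It must be positive for the reciprocal
    to be meaningful.
    """
    if slope_per_year <= 0:
        raise ValueError("slope must be positive")
    return 1.0 / slope_per_year


# Hypothetical slope (not the value from Table S2): any slope below
# 0.05 points/year implies more than 20 years per point.
example_years = years_per_point(0.04)
```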

      Furthermore, we very much appreciated your recommendation to further discuss how our relatively high SES-sample might be less vulnerable to loss of social contact during the aging process. As a foundation for this discussion, we investigated the link between SES and LSNS using an LME and found the association to be highly significant (see new Table S2). Furthermore, we added a table showing which percentage of our participants fell into the SES quintiles that would be observed in a fully representative German sample to help our readers to interpret our findings (see new Table S3). Following your advice, we have added a comment highlighting how the relatively high SES of our sample might have contributed to this in the limitations section: “As we found higher SES to be associated with lower LSNS scores, this relatively high SES sample might have led to underestimation of the detrimental effects of social isolation and increases in social isolation in the aging process.”

      Regarding the importance of chronic stress to social isolation, not only did we find no mediating effect of stress, we also found no significant simple association between TICS and LSNS scores (see new Table S2). We are hesitant to attribute this finding to the incorrectness of the stress-buffering hypothesis, as the missingness in stress data makes all interpretations of analyses involving TICS scores problematic. We have expanded on this in the discussion section and added emphasis to the importance of also pursuing other mechanistic theories in our discussion, where it reads: “we could not find evidence that social isolation affected hippocampal volume through higher chronic stress measured with questionnaires, a hypothesis put forward by the stress buffering theory (Kawachi & Berkman, 2001). These latter analyses suffered from small sample sizes and a limited number of timepoints. Nonetheless, the lack of any significant link between chronic stress and social isolation (see Table S2) is hard to align with the stress-buffering hypothesis in spite of the missingness in the TICS.”

      4a) The presentation of longitudinal data (Figure 1) lacks dimensionality. The scatter plots presented here are more suitable for cross-sectional studies and could cause confusion regarding the interpretation of the results. The authors should consider individual growth curves or spaghetti plots in visualizing change within subjects.

      We are grateful for your advice to visualise individual developments in social isolation and outcome measures over time in spaghetti plots and have done so to give our readers insight into these developments (see new Fig. S1). As you had assumed, there is no unequivocal pattern of increasing social isolation over time (see also answer to 3a). In addition, we decided to keep presenting the results of the linear mixed-effects models as scatterplots in Figure 1, as this is regarded as the most appropriate visualization of the tested effects. Please see also our response to 5b.

      Reviewer #2 (Public Review):

      The paper by Laurenz Lammer and colleagues used cohort data to investigate the cross-sectional and longitudinal association between loneliness and brain structure and cognitive function. The main finding was that baseline social isolation and change in social isolation were associated with smaller hippocampus volumes, reduced cortical thickness, and poorer cognitive function. Given that more and more people feel lonely nowadays (e.g., due to the pandemic), the study by Lammer and colleagues addresses a highly relevant health concern of our time.

      Significant strengths of the study:

      • large cohort;

      • the cross-sectional and longitudinal analyses confirmed the findings;

      • the study was preregistered;

      • the study included men and women;

      • analyses were sound and controlled for essential confounders.

      Thank you for your time to thoroughly review the manuscript and for the encouraging comments. Please see below how we implemented your advice.

      The major weaknesses of the study:

      1a) it is unclear whether loneliness causally contributes to brain structure and cognitive function;

      Indeed, based on structural equation analyses of the available data from this cohort, we could not find strong evidence for either causality (social isolation causes brain/cognitive decline) or reverse causality (brain/cognitive decline causes social isolation). This could reflect a lack of power to detect such effects, given the drop in sample size for these analyses. Overall, regarding these two competing hypotheses, we see some minor indication of support for causality of social isolation in our data due to the presence of robust and significant associations in our very healthy sample, the absence of clear increases in effect size when including cognitively less healthy participants and the absence of clear decreases in effect sizes when only including participants with high MMST scores. Accordingly, we added this concluding synopsis to our paragraph on causality in our discussion: “Still, overall these results only add a modicum of corroboration to the case for a causal role of social isolation.” and pointed towards the key role of RCTs in understanding causality in this regard: “Intervention studies will be the gold standard to provide evidence with regards to the causal role and effect size of social isolation.”

      2a) the factors that may cause loneliness are unclear.

      Thank you very much for encouraging us to shed some light on participant characteristics of potential relevance to social isolation. Starting from the impulse to look into marital status and employment, we also investigated links to socioeconomic status, migration background, age at baseline, change in age, gender, living alone and the number of persons living in the participants’ dwelling. We found all of these factors except for gender and migration background to be significantly linked to social isolation. Results are presented in Table S2 and briefly referred to in the results section: “In our sample, social isolation was positively correlated with not living alone, being married, the number of persons living in the participants’ dwelling, being gainfully employed, younger baseline age and less change in age and being married but not to gender or having a migration background. See Tables S1-2 for descriptive statistics and details of the associations. To contextualise the observed link to SES, a comparison of SES category frequencies in LIFE-Adult and a fully representative sample (Lampert et al., 2013) is provided in Table S3.” We also added the following to the discussion: “Existing and future research on reasons for and the role of social isolation in health and disease should provide guidance for the urgently needed development and evaluation of tailored strategies against social isolation and its detrimental effects.”

    1. Author Response

      Reviewer #1 (Public Review):

      Weakness of the study include:

      1) There are no data supporting a role for insulin regulation of microtubule-dependent GLUT4-containing vesicle movement. The data in Fig.2B do not support a difference in the number of "moving" GLUT4 vesicles between basal and insulin-stimulated fibers. I found the statement on line 103 that they "observed a ~16% but insignificant increase" to be confusing. These data do not support an effect of insulin on the number of moving GLUT4 vesicles that can be detected in an individual experiment. There is also no effect of insulin on GLUT4 vesicles in the data reported in Fig.S2D, Fig.S5B, and Fig.S5F. However, the data in Fig. 2C suggest there was a consistent increase in "moving" vesicles in insulin-stimulated conditions in 4 independent experiments (how are these data normalized?). Because the basis of insulin-regulation of glucose uptake is the control of GLUT4 translocation to the plasma membrane, the authors need to clarify their thinking on why they do not detect robust insulin effects on GLUT4 dynamics in the individual experiments. Is it that they are not measuring the correct parameter? That the assay is not sensitive to the changes?

      The small (or no effect) of insulin distracts a bit from the findings that there is microtubule-dependent GLUT4 movement in basal and stimulated muscle fibers, and that disruption of this movement by depolymerization of microtubules or Kif5b knockdown blunts GLUT4 translocation. As noted above, the data strongly support microtubule-dependent GLUT4 dynamics as permissive for insulin-stimulated GLUT4 translocation even if this dynamics might not be a target of insulin action.

      In light of the reviewer's comment, and to avoid confusing or distracting readers, we have removed figure 2C showing the effect of insulin based on pooled data across all our independent experiments. We discuss several possibilities for the lack of significant insulin effect on GLUT4 movement in individual experiments in the discussion section (lines 342 to 361 in TC version of MS). The discussion has been updated to reflect the points raised by the reviewer. More sensitive techniques than currently available in our lab are required to firmly conclude whether microtubule-based GLUT4 trafficking is directly regulated by insulin.

      2) The analyses of GLUT4-containing structures are not particularly informative. Co-localization with other markers (beyond syntaxin6) are needed to understand these structures. Defining structures as small, medium or large is incomplete. In particular, it is important to probe the microtubule nucleation site clusters for other membrane markers. Transferrin receptor? IRAP?

      While our analysis based on structure-segmentation clearly demonstrates a microtubule-dependent effect on GLUT4 localization, we completely agree that additional work including co-labelling of GLUT4 and various compartment markers is required to fully understand the localization changes observed for GLUT4-containing structures upon microtubule disruption. However, for practical reasons, it is not currently feasible for us to complete these analyses within a reasonable time-frame, so we will reserve this for future studies.

      3) The Kinesore data do not support the authors' hypothesis. The data show that Kinesore increases the amount of GLUT4 in the plasma membrane of basal cells and that insulin further increases plasma membrane GLUT4 to the same extent as it does in control cells. How does that provide insight into the role of microtubules (or kif5b) in GLUT4 biology? Why does Kinesore increase plasma membrane GLUT4? Is it an effect of Kinesin 1 on GLUT4 vesicles? Kinesore is reported to remodel the microtubule cytoskeleton by a mechanism dependent on Kinesin 1. Is that the reason for the change in GLUT4?

      To better understand the effect of kinesore on GLUT4-dependent glucose uptake, we have now incubated EDL and Soleus muscles ± kinesore and ± insulin and measured 2-DG uptake (GLUT4 translocation and glucose transport is considered the rate-limiting step for 2-DG uptake in incubated muscles due to the lack of muscle perfusion in this model) and proximal insulin signaling. In contrast to the enhancing effect on membrane GLUT4 observed following kinesore treatment in basal and insulin stimulated L6 cells, kinesore did not stimulate basal 2-DG uptake in EDL and Soleus. Furthermore, kinesore markedly impaired insulin-stimulated 2-DG uptake (figure 4B). We also tested the effect of 2h kinesore treatment in differentiated primary human myotubes. In this model, kinesore reduced basal glucose uptake and blocked the insulin effect (figure 4C). Together, this suggests that kinesore inhibits GLUT4-dependent glucose uptake in adult muscle and primary human muscle cells, presumably by inhibiting the binding of GLUT4 containing cargo, despite kinesore also having an activating effect on Kinesin-1 motor function. This possibility is discussed in the current version of the manuscript (line 177-180, 203-211). These data are consistent with the KIF5B knockdown data in L6 and support a necessary role of this motor protein in skeletal muscle GLUT4 trafficking.

      To better understand why kinesore led to increased rather than decreased GLUT4 translocation in L6 cells, we also disrupted the microtubule network using nocodazole and colchicine prior to kinesore stimulation. Surprisingly, kinesore stimulation enhanced membrane GLUT4 even in microtubule-disrupted L6 cells, indicating that the effect of kinesore on GLUT4 translocation is microtubule-independent in L6 cells. With three of four data sets supporting a necessary role of Kinesin-1 motor proteins in GLUT4 trafficking, including the adult muscle data, we end up concluding:

      …our shRNA data in L6 myoblasts and kinesore data in adult muscle support the requirement of KIF5B-containing Kinesin-1 motor proteins in insulin-stimulated GLUT4-dependent glucose uptake in skeletal muscle.

      However, we would also like to include the discrepant effect of Kinesore in L6 myoblasts as this may be useful information to others using this compound and/or studying GLUT4 in cultured cells.

      4) The analysis of Kif5b is a bit cursory. Depolymerization of microtubules in muscle fibers essentially blocks all GLUT4 movement (only the insulin condition is shown in Fig.2B but I assume basal would be equally inhibited), and fully inhibits insulin-stimulated glucose uptake in muscle fibers. What are the effects of nocodazole in L6 cells (cell used for kif5b studies) and is it similar in magnitude to kif5b knockdown? Those data would identify whether there are non-Kif5b microtubule-dependent effects.

      To address the magnitude of reduced insulin-stimulated GLUT4 translocation in microtubule-disrupted L6 cells, we investigated the effect of nocodazole (13 µM) and colchicine (25 µM) on GLUT4 translocation in L6 cells.

      Insulin stimulated GLUT4 translocation was reduced but not blocked by either nocodazole or colchicine. This is in accordance with previous in vitro studies in 3T3 adipocytes and muscle cells (PMID: 11085918, PMID: 11145966, PMID: 24705014). Overall, these data still support that Kif5b is a major microtubule motor protein regulating GLUT4 translocation across cell-types.

      5) The authors need to show that the fibers isolated from the HFD mice remain insulin-resistant ex vivo by measuring glucose uptake. It is possible that once removed from the mice they "revert" to normal insulin-sensitivity, which might contribute to the differences reported in Fig5.

      This is an important point. In figure 5 figure supplement 1E, we show that the fibers isolated from the diet-induced obese mice display impaired insulin-induced p-Akt Thr308 and p-TBC1D4 Thr642 after isolation and in vitro culture. This shows that the insulin resistance is present at the muscular level and is preserved after isolation and in vitro culturing.

      6) Although it is interesting that the authors have included the insulin-resistance models/experiments, they are not well developed and therefore the conclusions are not particularly strong.

      In this study, we induced insulin resistance by two different means (C2 ceramide treatment and diet-induced obesity) and demonstrated at the level of p-Akt and p-TBC1D4 in cultured muscle fibers that we successfully achieved insulin resistance in our models. In particular the high fat diet model is arguably the most common in vivo model of obesity-linked insulin resistance. Thus, we were able to study GLUT4 trafficking on microtubules in normal vs. insulin-resistant muscle fibers and found this to be impaired in insulin-resistant muscle. Although one could always have done more, we believe that our data on adult muscle GLUT4 movement in insulin-resistance are robust, novel and do support our conclusions and title.

      7) The data do not support the title.

      We respectfully disagree. See our reply to comment 6 above.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The study finds Lyn to be degraded more efficiently via the proteasome and to be more tightly controlled by phosphatases when compared to Lck. However, rather than interpreting the findings as distinct kinase-intrinsic properties, one could attribute the slower degradation and stricter PTP control of Lyn to the fact that Lyn is the principal and predominant SFK in B cells and thus a "standard target" of the B-lymphoid molecular machinery, to which it is better adapted.

      We respectfully disagree with the reviewer’s comment that our interpretation is limited to “kinase-intrinsic properties”. In many points within the manuscript we refer to the “B-lymphoid molecular machinery”. More specifically:

      • Lines 62-64 in the original submission (lines 60-61 in the revised manuscript): “….enzymatic promiscuity of SFKs can be buffered by their differential susceptibility to regulatory control mechanisms designed for keeping global SFK activity levels under strict control….”

      • Lines 113-114 in the original submission (lines 137-138 in the revised manuscript): “Lck and Lyn differ in the efficiency for signal ignition and in their susceptibility to regulatory mechanisms in B-cells”

      • Lines 135-136 in the original submission (lines 159-160 in the revised manuscript): “Thus, the proteasomal degradation machinery constrains the abundance of Lyn, but not Lck, within B-cells.”

      • Lines 162-163 in the original submission (lines 185-186 in the revised manuscript): “Collectively these data show that the BCR signaling machinery is more responsive to the action of Lyn, at the same time imposing stricter regulation on its expression and activity levels.”

      • Lines 475-477 in the original submission (lines 527-528 in the revised manuscript): “…identified specialized control mechanisms designed to keep Lyn, but not Lck, activity levels under strict control.”

      However, we cannot rule out, as a mutually inclusive scenario, that intrinsic SFK features contribute to their differential regulation by cellular mechanisms, a possibility that we also refer to in the manuscript. More specifically:

      • Lines 335-337 in the original submission (modified text in the revised version, lines 372-374): “On one hand there is the total amount of SFK activity within the cell, and on the other the individuality of SFK family members, dictated by intrinsic molecular features.”

      • Lines 477-478 in the original submission (lines 528-529 in the revised manuscript): “These data may signify that SFKs have been evolutionarily diversified to best suit the needs of the cellular environment they are expressed in…”

      Based on the reviewer’s comment, and to clarify further, we have modified the revised version of the manuscript (lines 372-374) as follows:

      “On one hand there is the total amount of SFK activity within the cell, and on the other the individuality of SFK family members, dictated by intrinsic molecular features and/or adaptation to cell-specific regulatory mechanisms.”

      We hope that our clarifications satisfy the reviewer.

      2) Venn diagram depicting differentially regulated transcripts between Lck- and Lyn-expressing cells, it does not seem like Lck is able to regulate pathways which are not "canonically" regulated by Lyn.

      and

      As a distinct functional difference between Lck and Lyn is not established in this work, said SFKs' largely exclusive expression in T and B cells remains enigmatic.

      We thank the reviewer for the comment. We address this issue in the discussion section of the revised manuscript (lines 514-519).

      3) There is also the persisting problem of Lck being expressed to a much higher extent and the effect of the endogenously expressed Lyn since the model systems are not based on a Lyn-deficient cell line.

      For the purpose of the analysis, we tried to circumvent the discrepancies between Lck and Lyn expression levels by our equal GFP gating strategy (explained in Figure 1-figure supplement 3E/Fig.S3E in the original submission). Nevertheless, as shown in Figure 1C there is a physiological reason for the two SFKs not being equally expressed, and we refer to the biological implications of these individualities in the Discussion.

      The effect of endogenously expressed Lyn is represented by the phenotype of -Dox cells which we use as background in all our studies, especially since we show that there are no alterations on Lyn or any other SFK activation status resulting from Lck overexpression (Figure 1-figure supplement 2B/ Fig.S2B in the original submission), so we do not believe this is a problem. Additionally, a Lyn-deficient environment would also not be perfect, since very plausibly it could have undergone further signaling and survival adaptations that we could not account for.

      4) Lastly, the authors follow up their finding of deregulated transcripts belonging to the ER/UPR ontology cluster. Flow cytometric analysis indeed shows an influence of Lck and Lyn expression on ER homeostasis, which can be reverted with SFK inhibitors. Alas, additional follow-up experiments to functionally investigate the deregulated pathways suggested by the RNAseq analysis are not included in this study.

      We thank the reviewer for the comment, and we agree. However, it is beyond our capabilities and manpower, and beyond the scope of the present work, to perform numerous functional or semi-functional studies for every GO analysis pathway that emerged from the transcriptomics studies. Although follow-up work from our group will focus on comprehensive and meticulous analyses of gene expression profiles, such an effort would currently require long-lasting studies that would significantly extend the size of the manuscript and distract from the effects we wish to pinpoint with the present work, i.e. the unique adaptation of SFKs within the lymphocyte environment and gene expression profile tendencies exclusively controlled by SFK-generated signals.

      In an effort to satisfy the reviewer, we performed focused follow-up studies specifically on the ER effect of SFK-transduced signals, since it appears to be a so-far unknown aspect of their function. The new data are presented in the revised version of Figure 4 (panels C and D) and Supplementary Figure 4-figure supplement 1. Corresponding text can be found in lines 323-345 of the revised manuscript (results section) and lines 499-512 and line 531 of the discussion. In brief, we show an SFK kinase-activity-dependent activation of the ER-phagy receptor FAM134B, which is not accompanied by recruitment of LC3B, as dictated by the currently known canonical ER-phagy pathway. This is the first report of SFKs’ involvement in the ER-phagy process and the first time FAM134B activation is described in B-cells. Since this field is relatively new, and the role and regulation of ER-phagy is almost unexplored in B-cells, we hope that the reviewers will appreciate the novelty of the finding and its sufficiency for the current manuscript. We do realize that these initial data call for more detailed mechanistic investigation, which we are pursuing in the form of a more complete and comprehensive future study.

      Reviewer #2 (Public Review):

      1) Studies reveal no qualitative functional differences in Lck and Lyn that are likely to explain its unique ectopic expression of Lck in CLL

      and

      If Lck promotes pathophysiology by transduction of a qualitatively unique signal, one would expect that transcriptome analysis should reveal this difference.

      We thank the reviewer for the comment. We address this issue in the discussion section of the revised manuscript (lines 514-519).

      2) It is unclear from the material and methods whether the overexpressed Lyn is LynA or LynB. It appears in the text (lines 130-133) that they overexpress LynB specifically. A recent paper from Tania Freedman (Sci Adv 2022 PMID:35452291) suggests that LynA is more activating whereas LynB is more balanced with an inhibitory bias. The point is that it is important to discuss this because they may not be making a relevant comparison.

      We thank the reviewer for the comment. To clarify this, we now state the use of Lyn isoform B in the Materials and Methods section of the revised manuscript (under “Cloning and Plasmids”).

      We initially attempted to produce BJAB lines overexpressing LynA; however, expression levels of this isoform were particularly low and we could not proceed with further analyses. We therefore cannot comment on how LynA might behave in an overexpression model in B-cells, especially given the absence of relevant information in the existing literature.

      The recent Sci Adv 2022 PMID:35452291 study deals with germline LynA and LynB isoform-specific knockouts and their propensity towards autoimmunity in mice. The authors compared the single isoform (LynA or LynB) and total Lyn knockouts by performing systemic phenotypic analyses of autoimmunity features (splenomegaly, myeloid cell profiles, proinflammatory markers on myeloid cells, B cell development, expansion of activated and autoimmunity-associated B cell subsets, autoimmunity scores). Differences they pinpoint between LynA and LynB are summarized as follows:

      1. “It was found that LynB has the dominant regulatory role in mice of both sexes, but that LynA expression is uniquely required to prevent autoimmunity in female mice”. The etiology of this sex difference is unclear.

      2. “LynB generally appears to be the dominant immunosuppressive isoform, with LynB deletion causing severe autoimmune disease in male and female mice. For some indicators (splenomegaly, glomerular IgG and C3 deposition, and kidney fibrosis), LynBKO and total LynKO mice developed equally severe phenotypes. In other cases (serum IgM and BAFF, glomerular immune infiltration, myeloid cell polarization, and monocyte/granulocyte expansion), LynBKO mice had less severe phenotypes than total LynKO mice, suggesting an additive effect with LynA”.

      3. “LynA and LynB seemed equally capable of promoting B cell development, regulating myeloid cell polarization and restraining myeloid-driven inflammation. Given the increased number of activated/inflammatory B cell types in LynAKO and LynBKO mice, future studies will be aimed at determining whether the single-isoform knockouts have a more B cell–initiated than myeloid cell–initiated form of autoimmune disease”.

      After careful reading of the manuscript, we could not find any functional analyses on the activation status of the distinct isoforms, or signaling events they elicit. Furthermore, the authors do not report any conclusions that LynA is more activating at the molecular level. Based on the above, we cannot connect the data published in PMID:35452291 paper and our results for discussing “LynA being more activating” and implications this might have on our studies.

      To comply with the reviewer’s suggestion, in our revised manuscript we cite this study (ref number 29) in the following sentence appearing in lines 380-383:

      “Lyn exists as two alternatively spliced variants LynA and LynB. Distinct biological functions between the two isoforms still remain poorly understood. A recent study (29) documented that LynB provides an advantage in protecting against autoimmunity compared to LynA; however, the underlying mechanisms for this phenotype are unclear.”

    1. Author Response

      Reviewer #2 (Public Review):

      The authors use data from 3 cross-sectional age-stratified serosurveys on Enterovirus D68 from England between 2006 and 2017 to examine the transmission dynamics of this pathogen in this setting. A key public health challenge on EV-D68 has been its implication in outbreaks of acute flaccid myelitis over the past decade, and past circulation patterns and population immunity to this pathogen are not yet well-understood. Towards this end, the authors develop and compare a suite of catalytic models as fitted to this dataset and incorporate different assumptions on how the force of infection varies over time and age. They find high overall EV-D68 seroprevalence as measured by neutralizing antibodies, and detect increased transmission during this time period as measured by the annual probability of infection and basic reproduction number. Interestingly, their data indicate very high seroprevalence in the youngest children (1 year-olds), and to accommodate this observation, the authors separate the force of infection in this age class from the other groups. They then reconstruct the historical patterns of EV-D68 circulation using their models and conclude that, while the serologic data suggest that transmissibility has increased between serosurvey rounds, additional factors not accounted for here (e.g., changes in pathogenicity) are likely necessary to explain the recent emergence of AFM outbreaks, particularly given the broader age-profile of reported AFM cases. The Discussion mentions important current unknowns on the biological interpretation of EV-D68 neutralizing antibody titers for protection against infection and disease. The analysis is rigorous and the conclusions are well-supported, but a few aspects of the work need to be clarified and extended, detailed below:

      1) Due to the lack of a clear single cut-point for seropositivity on this assay, the authors sensibly present results for two cut-points in the main text (1:16 and 1:64). While some differences that stem from using different cut-points are fully expected (i.e., seroprevalence being higher using the less stringent cut-point), differences that are less expected should be further discussed. For instance, it was not clear in Figure 2 why the annual probability of infection decreased after 2010 using the 1:64 cut-point, while it continued to increase using the 1:16 cut-point. It would also be helpful to explain why overall seroprevalence and R0 continue to increase over this time period using the 1:64 cut-point. Lastly, it would be useful to see the x-axis in Figure 4 extended to the start of the time period that FOI is estimated, with accompanying credible intervals.

      For the discussion on differences between the two cut-offs, please see response to essential comment 1.

      Extending the x-axis before 2006 in Figure 4 is not possible. Estimates of the overall seroprevalence at a year y require FOI estimates up until y-40. This implies the first estimates we can provide are for 2006.
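The constraint above can be illustrated with a minimal sketch of a time-varying catalytic model. This is not the authors' fitted model: it assumes no seroreversion and no maternal antibodies, and the function names, the 40-year age range and the 1966 start of the FOI series are hypothetical values chosen only to be consistent with the "y-40" and "2006" figures quoted in the response.

```python
import math


def seroprevalence_by_age(foi_by_year, survey_year, max_age):
    """Return {age: P(seropositive)} for a survey in `survey_year`.

    Illustrative assumptions: yearly force of infection (FOI) that depends
    on calendar time only, no seroreversion, no maternal antibodies.
    """
    prevalence = {}
    for age in range(1, max_age + 1):
        # A person aged `age` has lived through the calendar years
        # survey_year - age, ..., survey_year - 1.
        years_lived = range(survey_year - age, survey_year)
        # Raises KeyError if the FOI series does not reach far enough back.
        cumulative_hazard = sum(foi_by_year[y] for y in years_lived)
        prevalence[age] = 1.0 - math.exp(-cumulative_hazard)
    return prevalence


def earliest_estimable_year(first_foi_year, max_age):
    """First year for which every age class up to max_age is covered."""
    return first_foi_year + max_age
```

With a hypothetical FOI series starting in 1966 and age classes up to 40 years, `earliest_estimable_year(1966, 40)` gives 2006, and asking for a 2005 survey fails because the oldest cohort would need an FOI value for 1965.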

      Credible intervals have been added to Figure 4.

      2) Additional context of EV-D68 in the study setting of England would be useful. While the Introduction does mention AFM cases "in the UK and elsewhere in Europe" (line 53), a summary of reported data on EV-D68/AFM in England prior to this study would provide important context. The Methods refers to "whether transmission had increased over time (before the first reported big outbreak of EV-D68 in the US in 2014)" (lines 133-134), rather than in this setting. It would be useful to summarize the viral genomic data from the region for additional context - particularly since the emergence of a viral clade is highlighted as a co-occurrence with the increased transmissibility detected in this analysis.

      We have added a figure (new Figure 1 – figure supplement 1) showing the annual number of EV-D68 detections reported by Public Health England from 2004 to 2020.

      We have also added the following text to the introduction: “Similarly, in the UK, reported EV-D68 virus detections also show a biennial pattern between 2014 and 2018 (Figure 1 – figure supplement 1).”

      We have also amended the sentence in the Methods.

      Finally, below is a screenshot of the Nextstrain tree for EV-D68 based on the VP1 region, with tips representing sequences from the UK (light blue) and European countries in colour. There is a lot of mixing between sequences from different regions, indicating widespread transmission and little regional clustering. We have added the following text to the Discussion: “Reported EV-D68 outbreaks in 2014 and 2016 were due to clade B viruses, while the 2018 outbreaks were reported to be linked to both B3 and A2 clade viruses in the UK (10), France (32) and elsewhere.”

      Reviewer #3 (Public Review):

      In the proposed manuscript, the authors use cross-sectional seroprevalence data from blood samples that were tested for evidence of antibodies against D68 for the UK. Samples were collected at 3 time points from individuals of all ages. The authors then fit a suite of serocatalytic models to explain the changing level of seropositivity by age. From each model they estimate the force of infection and assess whether there have been changes in transmissibility over the study period. D68 is an important pathogen, especially due to its links with acute flaccid myelitis, and its transmission intensity remains poorly understood.

      Serocatalytic models appear to be appropriate here. I have a few comments.

      The biggest challenge to this project is the difficulty in assigning individuals as seronegative or seropositive. There is no clear bimodal distribution in titers that would allow obvious discrimination and apparently no good validation data with controls with known serostatus. The authors tackle this problem by presenting results to four different cut-points (1:16 to 1:128) - resulting in seropositivity ranging from around 50% to around 80%. They then run the serocatalytic models with two of these (1:16 and 1:64) - leading to a range of FoI values of 0.25-0.90 for the 1 year olds and 0.05-0.25 for older age groups (depending on model and cutpoint). This represents a substantial amount of variability. While I certainly see the benefit of attacking this uncertainty head on, it does ultimately limit the inferences that can be made about the underlying risk of infection in UK communities, except that it's very uncertain and possibly quite high.

      I find the force of infection in 1 year olds very high (with a suggestion that up to 75% get infected within a year) and difficult to believe, especially as the force of infection is assumed much lower for all other ages.

      The authors exclude all <1s due to maternal antibodies, which seems sensible, however, does this mean that it is impossible for <1s to become infected in the model? We know for other pathogens (e.g., dengue virus) with protection from maternal antibodies that the protection from infection is gone after a few months. Maybe allowing for infections in the first year of life too would reduce the very large, and difficult to believe, difference in risk between 1 year olds and older age groups. I suspect you wouldn't need to rely on <1 serodata - just allow for infections in this time period.

      Relatedly, would it be possible to break the age data into months rather than years in these infants to help tease apart what happens in the critical early stages of life.

      Yes. We have added two figures (new Figures 1C and 1D) showing the prevalence of antibodies in children <1 yo. We show these data for the three serosurveys combined, because the number of individuals per month of age is very small.

      One of the major findings of the paper is that there is a steadily increasing R0. This again is difficult to understand. It would suggest there are either year on year increases in inherent transmissibility of the virus through fitness changes, or year on year increases in the mixing of the population. It would be useful for the authors to discuss potential explanations for an inferred gradual increase in R0.

      We have removed the estimates of R0 from the manuscript.

      On a similar note, I struggle to reconcile evidence of a stable or even small drop in FoI in the 1:64 models 4 and 5 from 2010/11 (Figure 3) with steadily increasing R0 in this period (Figure 4). Is this due to changes in the susceptibility proportion. It would be good to understand if there are important assumptions in the Farrington approach that may also contribute to this discrepancy.

      We have removed the estimates of R0 from the manuscript and only present the reconstruction of the annual number of new infections per age class and year (new Figure 5). We think this measure is more adapted to the discussion of the results.

      In addition, when using the classical expression $R_{0t} = 1/(1 - S(t))$, with $S(t)$ the annual proportion seropositive, the high seroprevalence estimates (new Figure 4) result in extremely high estimates of the basic reproduction number (median ranges: 11.6 – 29.7 for 1:16 and 3.3 – 7.6 for 1:64 during the period 2006 to 2017).
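As a worked illustration of the classical expression quoted above, the following sketch converts between the proportion seropositive and the implied basic reproduction number. The function names are hypothetical, and the calculation only inverts the stated formula; it is not the Farrington approach the authors used previously.

```python
def r0_from_seroprevalence(s):
    """Classical estimate R0 = 1 / (1 - S), with S the proportion seropositive."""
    if not 0.0 <= s < 1.0:
        raise ValueError("seroprevalence must lie in [0, 1)")
    return 1.0 / (1.0 - s)


def seroprevalence_from_r0(r0):
    """Inverse of the expression above: S = 1 - 1 / R0."""
    if r0 < 1.0:
        raise ValueError("R0 must be at least 1 for this expression")
    return 1.0 - 1.0 / r0
```

Under this expression, the reported median R0 values of 3.3 and 29.7 correspond to seroprevalences of roughly 70% and 97%, which shows how steeply the estimate grows as seroprevalence approaches 100%.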

      We had previously used the Farrington approach as it is adapted to cases when the force of infections is different for different age classes.

      The R0 estimates (Figure 4) should also be presented with uncertainty.

      R0 no longer presented, but estimates of overall seroprevalence now presented with uncertainty.

      Finally, given the substantial uncertainty in the assay, it seems optimistic to attempt to fit annual force of infections in the 30 year period prior to the start of the sampling periods. I would be tempted to include a constant lambda prior to the dates of the first study across the models considered.

      We thank the reviewers for the suggestion.

      We implemented this change (constant FOI before 2006) in the previous models without maternal antibodies and the result for the random-walk-based models was that the variance of the random walk was estimated over a very short period, thus resulting in a rather non-smoothed FOI.

      Implementing this change with the new models with maternal antibodies and random-walk on the FOI was technically a bit complex. We therefore kept the simple random-walk over the whole period and added the following paragraph to the Discussion:

      “It is important to interpret well the results for the estimates of the FOI over time from our analysis under the assumptions of the models. First, as the best model uses a random walk on the FOI, the change in transmission that we infer happens continuously over several years. In reality, this may have occurred differently (e.g. in a shorter period of time). Our ability to recover more complex changes in transmission is limited by the data available. It would not be surprising if EV-D68 has exhibited biennial (or longer) cycles of transmission in England over the last few years, as it has been shown in the US (7) and is common for other enteroviruses (30). However, it is difficult to recover changes at this finer time scale with serology data unless sampling is very frequent (at least annual). Therefore, our study can only reveal broader long-term secular changes. Second, interpretation of the results before 2006 must be avoided for two reasons. On the one hand, as we go backwards in time, there is more uncertainty about the time of seroconversion of the individuals informing the estimates of the FOI. On the other hand, because age and time are confounded in cross-sectional seroprevalence measurements, the random walk on time may account for possible differences in the FOI through age (possibly higher in the youngest age classes, and lowest in the oldest), which are not explicitly accounted for here. This may explain the decline in FOI when going backwards in time before the first cross-sectional study in 2006.”

    1. Author Response

      Reviewer #3 (Public Review):

      A large body of work in the literature has established that the diversity in cells of identical genetic background occurs due to two components: 1) intrinsic noise - such as stochastic fluctuations in gene expression - as well as 2) extrinsic noise - variability that arises from sources that are external to the biochemical process of gene expression, such as abundances of ribosomes or stage in the cell cycle. Note that this widely-accepted definition does not separate intrinsic and extrinsic from intracellular and extracellular. The authors cite a few of these seminal papers (which focus on noise introduced to gene expression) but then define their interpretation of intrinsic noise much more broadly "... intrinsic noise as phenotype(s) fluctuations across isogenic cell populations cultured under the same conditions. Measurement noise in some cases can also be thought of as intrinsic noise. Fluctuations in cellular phenotype(s) driven by the global environment will be referred to as extrinsic noise." This misuse of widely accepted terminology creates significant confusion in the interpretation of the results.

      A point of contention with redefining noise as the authors have done is that they are lumping all processes unique to the cell as intrinsic and all environmental factors as extrinsic. Thus, when statements are made such as "external factors that contribute to noise are principally manifest through convection" (line 40-41, page 2) the veracity of these assumptions must be established. For example, when a ligand binds and unbinds from a receptor due to thermal energy, that "noise" in cellular stimulation is not convection-based, yet an example of how extrinsic noise can influence cellular responses. The definition is important because the underlying premise for the pipeline presented is that "While intrinsic cell variability can be significant, we believe that it is the extrinsic factor(s) that drive sample variability in most experimental cellular systems" (lines 42-43, page 4).

      We thank the referee for this very important critical comment. The referee correctly points out that the terminology (intrinsic vs. extrinsic noise) used in the cited papers has to be adapted and more clearly stated.

      We wish to point out that the autonomous system in Michael Elowitz and colleagues’ original paper was a single protein within a single cell. The noise that was measured in these experiments was driven by temporal fluctuations. An example of extrinsic noise for this system is, indeed, as pointed out by the referee, ligand binding and unbinding from a receptor.

      By contrast, our autonomous system is an ensemble of cells isolated from other samples but still subject to fluctuations in the external environment. We did not continuously measure temporal fluctuations in individual cells, but recorded snapshot(s) of cellular phenotype(s) within a single sample. The source of noise in these measurements is variability between individual cells, and we referred to this type of noise as intrinsic because it is driven by the processes within the sample. We denoted as extrinsic noise that which is driven by factors external to this autonomous system (a particular sample), such as variability between different samples due to temperature, humidity, etc.

      All of these external factors (to the best of our knowledge) are related to movement and gradient formation of fluid or gas and, hence, from a physicochemical perspective, driven by convection process(es). The initial cell seeding that eventually leads to unique microenvironment formation can also be thought as an example of extrinsic noise using this terminology. The process of cell sedimentation and attachment is driven by advection, as the referee correctly points out. We have, therefore, adjusted the text accordingly.

      We hope that clarifying the intrinsic/extrinsic terminology in the "Introduction" section of the manuscript (line 37) should be sufficient to avoid the confusion the referee discusses. We are open (very reluctantly) to switching terminology to terms internal and external noise.

      Throughout, figures lack labels and sufficient explanation for interpretation, as well as the number of experiments used to generate the data that is processed through the pipeline for each condition. For a study designed to eliminate replicate culture conditions, the onus is on the authors to show that replicates are in fact fully recapitulated in the population variance after statistical binning/processing.

      To address this comment, we modified the figure legends and labels of most of the figures.

      We wish to emphasize that each point-injection experiment we performed is unique due to randomness in the local delivery method. This is due to the variability in the manual micro-injection release rate and direction of the initial flow. Several experiments (3+) were performed to improve the width of the label(s) distribution(s) and their mixing condition, and the results of the better optimized local delivery were selected as representative for the manuscript. Sample selection was independent of the outcome of drug action and based on initial label distribution only. An experimental improvement of our method, similar to initialization of the pseudo-random number generator in numerical experiments, is required to achieve systematic reproducibility of drug(s) distribution(s). One way to do so is robotically, but certainly the best is to design a system that utilizes a predictably constant drug gradient within a sample that contains large enough cells, a topic that will be the subject of future experiments.

      Ultimately, when the paper presents results such as Figure 9 as the culmination of the pipeline as applied to cell viability studies, it is unclear how useful insight is extracted from this methodology. Four drugs are applied in combination to adherent HeLa cells and time-dependent local cell density is provided as a proxy for cell viability. While it is stated that "The absolute drug concentration can be determined using the homogeneous delivery method discussed above" (line 421-422, page 19), this analysis is not performed, and I am left unsure of whether extrinsic factors are truly driving sample variability under this context. It is unclear to the reader how the point injections were administered, and no discussion of how the confounding factors of synergy or antagonism will be addressed through this methodology.

      We attempted to explain that data shown in Figure 9 were not meant to be the climactic point of the entire pipeline (rather, the data shown in Figure 6 represent our key achievement). In this four-drug experiment, we exhausted the fluorescent spectrum bandwidth necessary to distinguish drug labels (i.e., using commonly available microscopy tools). In order to estimate local cell density, we had to rely on bright field imaging data which is not the most accurate possible implementation (see further response to your comment below). More importantly, we had to wash samples between the measurements to remove detached (dead) cells and cell debris. This step can (and usually does) influence local cell density in a non-uniform fashion, since both media removal and deposition are performed locally by pipetting (cells in the vicinity of aspiration/media deposit sites can be washed off regardless of the drug treatment.)

      To clarify how point injections were administered, we added a detailed description in the Methods section. Please see section Drug labeling and delivery, pages 11-12.

      In this manuscript, we wished to establish possible applications of our method and avoid in depth analysis or biological interpretation of a specific drug combination that is dependent on the cell line or on a particular experimental condition. We added a paragraph in the "Discussion" section suggesting the necessity of future research dedicated to methodology and analytical interpretation of high-dimensional context-dependent drug interaction data.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors unexpectedly found that the protein Grb2, an adaptor protein that mediates the recruitment of the Ras guanine-nucleotide exchange factor, SOS, to the EGF receptor, can be recruited to membranes by the immune cell tyrosine kinase Btk. The authors show, using total internal reflection fluorescence (TIRF) microscopy that the interaction with Grb2 is reversible, dependent on the proline-rich region of Btk, and independent of PIP3. These experiments are well performed and unambiguous.

      The authors next asked whether Grb2 binding to Btk influences its kinase activity, by evaluating (i) Btk autophosphorylation and (ii) the phosphorylation of a peptide from the endogenous substrate PLCy1. The readout relies on non-specific antibody-mediated detection of phosphotyrosine but nevertheless reveals a concentration-dependent increase in both Btk autophosphorylation and PLCy1 phosphorylation. The experiments, however, have only been performed in duplicate and, particularly in the case of PLCy1 phosphorylation, exhibit enormous variability which is not reflected in the example blot the authors have chosen to display in Figure 3C. Comparison of the same, duplicate experiment presented in Figure 3 Supplement 2 paints a very different picture.

      We added an experiment wherein we measure phosphorylation of the PLC𝛾2-peptide fusion by Btk in the presence of different concentrations of Grb2, and we have carried out LC-MS/MS to probe which Tyr are phosphorylated in these experiments. We have also modified our presentation of the Western blot data to allow readers to view each replicate separately. We believe this makes it easier to evaluate the trends observed in each replicate, and because the intensity measured here is only semi-quantitative, due to limitations of the technique, we believe this is a more accurate way to present our results. Both Tyr of the PLC𝛾2-peptide are phosphorylated, as well as one Tyr at the very C-terminus of GFP (Figure 3 – Supplements 3-5).

      The authors next sought to determine which domains of Grb2 are required for activation of Btk. Again, these experiments were only performed in duplicates, and the authors’ claims that Grb2 can moderately stimulate the SH3-SH2-kinase module of Grb2 are not well supported by their data (Figure 4C-D).

      We have opted to remove the data for the activation of the SH3-SH2-kinase construct (Src module) from the revised manuscript. Upon further inspection, we agree that these experiments only showed a weak trend and believe that much more experimentation is needed to draw firm conclusions regarding this construct. We do still speculate that SH2 linker displacement may contribute to our observations of enhanced catalytic activity of Btk in the presence of Grb2, however this speculation is based solely on previous work with Btk and other kinases (Aryal et al., 2022; Moarefi et al., 1997).

      The authors next asked whether Grb2 stimulates Btk by promoting its dimerization and trans-autophosphorylation. The authors measured the diffusion coefficient of Btk on PIP3-containing supported lipid bilayers in the presence and absence of Grb2. They noted that the diffusion coefficient of individual Btk particles decreases with increasing unlabeled Btk, which they interpret as Btk dimerization. Grb2 does not appear to influence the diffusion of Btk on the membrane (Figure 5A). Presumably, the diffusion coefficient reported here is the average of a number of single-molecule tracks, which should result in error bars. It is unclear why these have not been reported. Next, the authors assessed the ability of Grb2 to stimulate a mutant of Btk that is impaired in its ability to dimerize on PIP3-containing membranes. In contrast to wild-type Btk, autophosphorylation of dimerization-deficient Btk is not enhanced by Grb2. Whilst the data are consistent with this conclusion, again, the experiment has only been repeated once and the western blot presented in Figure 5 Supplement 2 is unreadable. It is also puzzling why Grb2 gets phosphorylated in this experiment, but not in the same experiment reported in Figure 3 Supplement 2.

      The diffusion coefficient reported here is determined from a large number of single molecule tracks. We have expanded our explanation of how this is done in the Materials and Methods, as well as providing an example of the data and fits from one of the conditions in Figure 4 – Supplement 3. We are now including standard deviation for each diffusion coefficient determined from the fit of the step size distribution.
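To make the idea of obtaining a diffusion coefficient from a step-size distribution concrete, here is a minimal sketch. It is not the authors' analysis, whose details are in their Materials and Methods: it assumes a single population undergoing free two-dimensional Brownian motion, for which step lengths over a lag time `tau` follow a Rayleigh distribution and the maximum-likelihood estimate is D = ⟨r²⟩ / (4·tau). All names and parameter values are hypothetical.

```python
import math
import random


def estimate_diffusion(step_sizes, tau):
    """Maximum-likelihood D for 2D Brownian steps, with its standard error.

    Assumes one freely diffusing population, so that r^2 / (4 * D * tau)
    is exponentially distributed with unit mean.
    """
    n = len(step_sizes)
    if n == 0 or tau <= 0:
        raise ValueError("need at least one step and a positive lag time")
    mean_sq_step = sum(r * r for r in step_sizes) / n
    d_hat = mean_sq_step / (4.0 * tau)
    # For exponentially distributed r^2, the standard error is D / sqrt(n).
    return d_hat, d_hat / math.sqrt(n)


def simulate_steps(d, tau, n, seed=0):
    """Draw n step lengths from 2D Brownian motion with diffusion coefficient d."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * d * tau)  # per-axis displacement standard deviation
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(n)]
```

Running `estimate_diffusion(simulate_steps(1.0, 0.02, 20000), 0.02)` on the simulated steps should return a value close to the input coefficient. Real tracking data with several mobility states would need a multi-component fit instead of this single-population estimate.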

      We have opted to remove the data involving the dimerization-deficient Btk construct. We agree that these results are difficult to interpret.

      We have investigated the Grb2 phosphorylation signal and concluded that this is an off-target effect of the antibody. MS/MS cannot detect any phosphorylation on Grb2. We now comment on this in the figure legend of Figure 3 – Supplement 2.

      Finally, the authors argue that Grb2 facilitates the recruitment of Btk to molecular condensates of adaptor and scaffold proteins immobilized on a supported lipid bilayer (SLB) (Figure 6). This is a highly complex series of experiments in which various components are added to supported lipid bilayers and the diffusion of labelled Btk is measured. When Btk is added to SLBs containing the LAT adaptor protein (phosphorylated in situ by Hck immobilized on the membrane via its His tag), it exhibits similar mobility to LAT alone, and its mobility is decreased by the addition of Grb2. The addition of the proline-rich region (PRR) of SOS further decreases this mobility. In this final condition, the authors incubate the reactions for 1 h until LAT undergoes a phase transition, forming gel-like, protein-rich domains on the membrane, shown in Figure 6B. The authors’ conclusion that Btk is recruited into these phase-separated domains based on a slow-down in its diffusion is not well supported by the data, which rather indicates that Btk is excluded from these domains (Figure 6B – Btk punctae (green) are almost exclusively found in between the LAT condensates (red)). As such, the restricted mobility of Btk that the authors report may simply reflect the influence of barriers to diffusion on the membrane that result from LAT condensation into phase-separated domains. The authors also present data in Figure 6 Supplement 1 indicating that Grb2 recruitment to Btk is out-competed by SOS-PRR and that Btk does not support the co-recruitment of Grb2 and SOS-PRR to the membrane. These data would appear to suggest that the authors’ interpretation of the decreased mobility of Btk on the membrane may not be correct.

      We have now included an example of one of the single molecule videos, overlayed with the surrounding LAT phase, to more directly display the data that was recorded for this experiment. In this video, it is possible to see that the LAT dense phase occupies only some of the observed window, and although it is possible that these dense “islands” function as barriers to Btk diffusion, Btk would be expected to diffuse freely outside of the LAT dense areas of the bilayer. This property can be clearly seen in the video we have now included. This is reminiscent of what was observed previously during the LAT phase transition for tracking of LAT itself (Sun et al., 2022). Given the extensive previous analysis of LAT diffusion on supported lipid bilayers (Lin et al., 2022; Sun et al., 2022), we believe the necessary controls have been included to support our conclusions. However, we agree there is much to be learned about this interaction and we hope that future studies will further investigate the relationship between cytoplasmic kinases and plasma membrane associated signaling clusters.

      Reviewer #3 (Public Review):

      The study of Nocka and colleagues examines the role of membrane scaffolding in Btk kinase activation by the Grb2 adaptor protein. The studies appear to make a case for a reinterpretation of the "Saraste dimer" of Btk as a signaling entity and assigns roles to the component domains in the Src module in Btk activation. The point of distinction from earlier studies is that this work ascribes a function to an adaptor protein as promoting the kinase activation, rather than vice versa, and also illustrates why Btk can be activated via modes distinct from its close relative, such as Itk. Importantly, these studies address these key questions through membrane tethering of Btk, which is a successful, reductionist way to mimic cellular scenarios. The writing could be improved and can absolutely be more economical in word choice and use; currently, there is a good deal of background to each section that is not always comprehensive or crucial to contextualise the findings, while key information is often omitted. The results are currently not described in a detailed manner so there is an imbalance between the findings, which should be the focus, relative to background and interpretations or models.

      We have assessed the manuscript and made many improvements to shift the focus to the findings, while providing only the necessary background for readers unfamiliar with the specifics of Btk and Grb2 signaling and structure.

    1. Author Response

      Reviewer #1 (Public Review):

      Ge et al. examined sodium-glucose cotransporter-2 inhibitors (SGLT2i) in Alport syndrome (AS), and demonstrate that it was beneficial in AS through reduced lipotoxicity in podocytes as a key mechanism of action. The SGLT2i empagliflozin has been previously shown to have positive effects on hyperglycemia control, as well as on cardiovascular and renal outcomes of type II diabetes mellitus through tubuloglomerular feedback, but its effect on glomerular diseases such as AS is unknown to date. The authors have previously identified that cholesterol efflux in podocytes plays a critical pathogenic role in a diabetic kidney disease setting. The evidence that authors provide in favor of their hypothesis in a disease of non-metabolic origin such as AS, was supported as the SGLT2i was effective in reducing the deleterious effects of lipotoxicity in podocytes, ameliorated glomerular injury and proteinuria, and extending the life span of Col4a3 knockout mice. They further show that empagliflozin treatment mitigated AS podocytes from cell death through apoptosis, but did not impact the cell's cytotoxicity. These results support the notion that empagliflozin affects the regulation of an important metabolic switch in mouse kidneys, perhaps through decreasing lipid accumulation in podocytes.

      However, the authors solely rely on one IHC staining image of a human biopsy to demonstrate SGLT2 expression in podocytes in vivo. Although the authors have done several experiments which greatly increase the confidence in their findings that empagliflozin is beneficial in AS and would have clinical significance, their data does not rule out the possibility that empagliflozin has beneficial effects through the other glomerular cells in AS, or limited to impacting lipids in podocytes in AS.

      We thank the reviewer for recognizing the significance of our findings and for pointing out some additional concerns with our study. In this revised version, we have added experiments that focus on investigating the specific effect of empagliflozin on AS podocytes. We added immunofluorescence staining of AS mouse kidney sections which supports the idea that SGLT2 is expressed in podocytes. We investigated the effect of SGLT2 knockdown in AS podocyte using siRNA and compared the anti-lipotoxic effects of siSGLT2 to SGLT2i.

      Reviewer #3 (Public Review):

      Using cultured human podocytes the expression of SGLT2 is established using immunostaining and western blotting. An analysis of podocyte RNA wasn't performed, but the expression in cultured podocytes was comparable to that seen in human cultured proximal tubular cells. This work then paved the way for treatment of immortalized cells obtained from an Alport syndrome mouse model (Col4A3-/-), representing an autosomal recessive form of Alport syndrome. Podocytes from Alport syndrome mice showed a lipid droplet accumulation which was reduced to some extent by SGLT2 inhibition. In a series of metabolic experiments, it was shown that SGLT2 inhibition reduced the formation of pyruvate as a metabolic substrate in Alport podocytes. In vivo experiments showed an improvement in survival of Col4a3-/- mice treated with SGLT2 inhibition. When compared to ace inhibitor, SGLT2 inhibition has a similar effect on renal function and no additive effect was seen with SGLT2 inhibitor plus ace inhibitor. Like the cell assays, the in vivo treatment seemed to prevent the podocyte lipid accumulation in Alport syndrome mice.

      This data in cells and animals generally supports the findings in SGLT2 inhibitor human studies, where Alport syndrome patients with proteinuria and progressive CKD seem to benefit. The work paves the way for a dedicated trial of SGLT2i in Alport patients and a reassessment of the human podocyte disease phenotype in this condition, before and after treatment. There are patients with mutations in SGLT2 with familial renal glycosuria - it would be interesting to test via urine derived podocytes whether a similar metabolic switch was occurring and its consequences to pave the way for long term treatment regimes.

      We thank the reviewer for recognizing the significance of our findings. We appreciate the reviewer’s concern that podocyte SGLT2 RNA levels should be studied. In this revised version, we added the results of SGLT2 mRNA expression analysis in immortalized podocytes and tubular cells. These results were added in Figure 1E. We agree with the insightful suggestions to study the metabolic switch in familial renal glucosuria in patients with SGLT2 mutations, as well as to evaluate a Col4a5 AS model. We have included these insights in our discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors address the origin of the macrophage increase in sensory ganglia after peripheral nerve injury, showing that there is no major influx by blood-derived monocytes into ganglia after injury and that resident macrophages proliferate, which is dependent on CX3CR1 signaling.

      • Interesting and relevant question, mainly addressed with adequate experimental approaches.

      • Most conclusions are supported by the data, however, some important controls and experiments are missing.

      • The authors should demarcate their results from the study of Iwai et al, 2021 which addresses similar questions.

      Thank you for the positive comments. We hope that our point-by-point responses below and the important changes and inclusions in the MS satisfactorily address your concerns. We agree that some important controls were missing, and we have included additional data in the revised manuscript. The Iwai et al. paper is in line with our hypothesis. In fact, they suggest that in trigeminal ganglia (TG), resident macrophages proliferate after peripheral injury, although they detected few blood monocytes infiltrating the TG. Our paper confirms the Iwai et al. results using different and complementary approaches that are more specific than BM transfer in irradiated mice. We also advance the understanding of the mechanism by which these cells proliferate (CX3CR1 signalling) and of the impact of this proliferation on neuropathic pain development. We discuss these points in the new version of the MS. Please see page 4 lines 88-93.

      Reviewer #2 (Public Review):

      The investigators looked at mφs in lumbar DRG after a spared nerve injury in which two of the three branches of the sciatic nerve are transected and the third left intact. This is a classical preparation for studying neuropathic pain. This paper demonstrates that the increase of mφs is an increase in the number of CX3CR1+ (resident) mφs and not CCR2+ (infiltrating mφs) by using CX3CR1 and CCR2 individual reporter mice. Using a CX3CR1 conditional knockout (KO) mouse, they found that this receptor must be present on the mφs for the increase in number to occur. Next, they did a parabiosis experiment with GFP+ mice and found that neither of these mφ subtypes infiltrated into the DRG. To examine proliferation, they injected animals with Ki67 and found this label, which is an indication of proliferation, was present in the CX3CR1+ mφs (but not the CCR2+ mφs). Finally, they identified the CX3CR1 mφs to be the cells that express TNFα and IL-1β but not IL-6.

      An experiment that would be useful would be to determine if there is an increase or a decrease in the availability to mφs of the ligand CX3CL1 after the spared nerve injury. The authors state from the work of others that membrane-bound CX3CL1 is constitutively expressed and that it is decreased after nerve injury. They hypothesize that this indicates a release of the chemokine, but such a decrease could also indicate a decrease in expression. A few sentences on what is known in other systems on the importance and mode of action of membrane-bound and non-membrane-bound CX3CL1 would be useful.

      We thank the reviewer for the great summary of our manuscript. We have now performed a time course of Cx3cl1 expression in the DRG after spared nerve injury, and it is included in Figure 7A. We also apologise for the lack of information regarding the importance and mode of action of membrane-bound and non-membrane-bound CX3CL1, which is now included in the discussion section (Page 16).

      The main weakness of the manuscript is that many highly relevant previous findings, in some cases reporting nearly identical experiments sometimes with the same and sometimes with somewhat different results, are not mentioned. Kalinski et al. (which is cited but not in this context) reported a very similar parabiosis experiment. While they did not identify subtypes of mφs, they found an increase in infiltration of mφs, which was small (though statistically significant) compared to the larger increase that occurred in the distal nerve. In 2013 and 2018, Niemi et al. and Lindborg et al (J Neurosci and J Neuroinflammation respectively) reported that mφs in the DRG are somewhat decreased in a CCR2 KO mouse, suggesting again that there is some infiltration of mφs into the DRG after axotomy. They also showed that the mφ chemokine CCL2 increases in the DRG after sciatic nerve injury. With regard to proliferation, Yu et al. in 2020 (which again is cited but not in this context) also used a spared nerve paradigm, stained DRGs for CX3CR1+ mφs, and found an increase. They then stained DRG sections for Ki67 and demonstrated proliferation in this population. An earlier reference by Krishnan et al in 2018 published in J Neuropathol Exp Neurol is entitled "An Intimate Role for Adult Dorsal Root Ganglia Resident Cycling Cells in the Generation of Local Macrophages and Satellite Glial Cells". With regard to cytokine expression, in 1995, Murphy et al published a paper in J Neurosci demonstrating induction of interleukin-6 in axotomized sensory neurons.

      Thank you for the comment. The papers you have indicated are the main reason we conceived this MS: there is controversy over whether infiltration of peripheral blood monocytes accounts for the increase in the number of macrophages in the sensory ganglia after peripheral nerve injury. Furthermore, some of the papers you indicated came out during the execution of this work, and they also raised controversies or left some points unexplored. Therefore, we believe that our work, by using different and complementary approaches, strongly supports the hypothesis that after peripheral nerve injury, peripheral blood monocytes do not infiltrate the DRGs significantly, and that the increase in the macrophage population is due to the proliferation of resident macrophages. Furthermore, we provide novel mechanistic evidence of the role of CX3CR1 signalling in the proliferation of these cells (Figures 7 and S6). In addition, the new experiments suggested by the referees and editor indicate that CX3CR1-dependent proliferation of DRG macrophages is involved in the development of neuropathic pain (Figures 6D and 7E). We will make these points clear in the new version of the MS. Please see pages 11, 12, 14 and 17 (discussion and introduction section).

      Reviewer #3 (Public Review):

      This paper addresses the mechanism underlying a well-documented finding whereby the numbers of resident macrophages increase in dorsal root ganglia following peripheral nerve injury. It delineates the relative contribution of monocyte recruitment via circulation and local proliferation. The paper is clearly structured and written, and the data overall support the main conclusion that the increase in nerve-associated macrophages is primarily driven by proliferation, not monocyte recruitment. Its main weakness is that the question that is being asked is rather restricted, so the additional insight gained for the field will be incremental. It would be particularly interesting in the future to address whether the existence of a protective barrier indeed is the reason peripheral cells are not recruited to the nerve injury lesion and to assess e.g. whether forced breaching of this barrier results in monocyte influx and altered injury response.

      We appreciate your comments and suggestions. In the new version of the MS, we present a series of novel experiments that confirm and support our initial hypothesis. Furthermore, novel experiments also explore the importance of this phenomenon in the context of neuropathic pain development. Regarding your suggestion about the next steps, we are now working to understand why these cells are not able to infiltrate the DRGs after injury. Interestingly, one paper that came out during the revision of this work showed that CD8+ T cells, which are not able to infiltrate the DRGs after nerve injury in adult mice, start to infiltrate the DRGs of old mice (Zhou et al. 2022), indicating that the ageing process may promote changes in this protective barrier. In addition, we have published a recent paper indicating that immune cells infiltrate the dorsal root leptomeninges after SNI (Maganin et al. 2022). We included these references and discussed these points in the new version of our MS. Please see page 15 lines 366 and 370.

      References:

      Zhou, L., G. Kong, I. Palmisano, M. T. Cencioni, M. Danzi, F. De Virgiliis, J. S. Chadwick, G. Crawford, Z. Yu, F. De Winter, V. Lemmon, J. Bixby, R. Puttagunta, J. Verhaagen, C. Pospori, C. Lo Celso, J. Strid, M. Botto, and S. Di Giovanni. 2022. "Reversible CD8 T cell-neuron cross-talk causes aging-dependent neuronal regenerative decline." Science 376 (6594): eabd5926. https://doi.org/10.1126/science.abd5926.

      Maganin, A. G., G. R. Souza, M. D. Fonseca, A. H. Lopes, R. M. Guimarães, A. Dagostin, N. T. Cecilio, A. S. Mendes, W. A. Gonçalves, C. E. Silva, F. I. Fernandes Gomes, L. M. Mauriz Marques, R. L. Silva, L. M. Arruda, D. A. Santana, H. Lemos, L. Huang, M. Davoli-Ferreira, D. Santana-Coelho, M. B. Sant'Anna, R. Kusuda, J. Talbot, G. Pacholczyk, G. A. Buqui, N. P. Lopes, J. C. Alves-Filho, R. M. Leão, J. C. O'Connor, F. Q. Cunha, A. Mellor, and T. M. Cunha. 2022. "Meningeal dendritic cells drive neuropathic pain through elevation of the kynurenine metabolic pathway in mice." J Clin Invest 132 (23). https://doi.org/10.1172/JCI153805.

  2. Mar 2023
    1. Author Response

      Reviewer #1 (Public Review):

      This study investigates how pathogens might shape animal societies by driving the evolution of different social movement rules. The authors find that higher disease costs induce shifts away from positive social movement (preference to move towards others) to negative social movement (avoidance from others). This then has repercussions on social structure and pathogen spread.

      Overall, the study comprises a good mixture of intuitive and less intuitive results. One major weakness of the work, however, is that the model is constructed around one pathogen that repeatedly enters a population across hundreds of generations. While the authors provide some justification for this, it does not capture any biological realism in terms of the evolution of the pathogen itself, which would be expected. The lack of co-evolution in the model substantially limits the generality of the results. For example, a number of recent studies have reported that animals might be expected to become very social when pathogens are very infectious, because if the pathogen is unavoidable they may as well gain the benefits of being social. The authors make some arguments about being focused on introduction events, but this does not really align well with their study design that carries through many generations after the introduction. Given the rapid evolutionary dynamics, perhaps the study could have a more focused period immediately after the initial introduction of the pathogen to look at rapid evolutionary responses (albeit this may need some sensitivity analyses around the parameters such as the mutation rates).

      We appreciate the reviewer’s evaluation of our work, and acknowledge that we have not currently included evolutionary dynamics for the pathogen.

      One conceptual impediment to such inclusion is knowing how pathogen traits could be modelled in a mechanistic way. For example, it is widely held that there is a trade-off between infection cost and transmissibility, with a quadratic relationship between them, but this is a pattern and not a process per se. We are unsure which mechanisms could be modelled that impinge upon both infection cost and transmissibility.

      On the practical side, we feel that a mechanistic, individual-based model that includes both pathogen and host evolution would become very challenging to interpret. It might be more tractable to begin with a mechanistic, spatial model that examines pathogen trait evolution with an unchanging host (such as an adaptation of Lion and Boots, 2010). We would be happy to take this on in future work, with a view to combining models thereafter.

      We have taken the suggestion to focus on the period immediately after the introduction, and we now focus on the following 500 generations. While 500 generations is still a long time, we would note that our model dynamics typically stabilise within 200 generations. We show the following generations primarily to check that some stability in the dynamics has indeed been reached (but see our new scenario 2).

      We also appreciate the point regarding mutation rates. Our mutation rates are relatively high to account for the small size of our population. We have found that with smaller mutation rates (0.001 rather than 0.01), evolutionary shifts in our population do not occur within the first 500 generations. This is primarily because prior to pathogen introduction, the ‘agent avoiding’ strategy that becomes common later is actually quite rare. Whether a rapid transition takes place thus depends on whether there are any agent avoiding individuals in the population at the moment of pathogen introduction, or on whether such individuals emerge rapidly thereafter through mutations on the social weights. We expect that with larger population sizes, we would be able to recover our results with smaller mutation rates as well.

      A final, and much more minor, comment is whether this is really a paper about movement. The model does not really look at evolutionary changes in how animals move, but rather at where they move. How important is the actual movement process under this model? For example, would the results change if the model was constructed without explicit consideration of space and resources, but instead simply modelled individuals' decisions to form and break ties? (Similar to the recent paper by Ashby & Farine https://onlinelibrary.wiley.com/doi/full/10.1111/evo.14491 ). It might help to provide more information about how putting social decisions into a spatially explicit framework is expected to extend studies that have not done so (e.g., because they are analytical).

      This paper is indeed about movement, as where to move is a key part of the movement ecology paradigm (Nathan et al. 2008). That said, we appreciate the advice to emphasise the importance of social decisions in a spatial context, and we have added this emphasis to the Introduction (L. 79 – 81) and Discussion (L. 559 – 562). In brief, we do expect different dynamics to result from the explicit spatial context, as compared to a model in which social associations are probabilistic and could occur with any individual in the population.

      In our models, individual social tendency (whether they prefer moving towards others) is separated from individual sociality (whether they actually associate with other individuals). This can be seen from our (new) Fig. 3D, in which individuals of each of the social strategies can sometimes have similar numbers of associations (although modulated by movement). This separation of the pattern from the underlying process is possible, we believe, due to the heterogeneity in the social landscape created by the explicit spatial context.

      Reviewer #2 (Public Review):

      This theoretical study looks at individuals' strategies to acquire information before and after the introduction of pathogens into the system. The manuscript is well-written and gives a good summary of the previous literature. I enjoyed reading it and the authors present several interesting findings about the development of social movement strategies. The authors successfully present a model to look at the costs and benefits of sociality.

      I have a couple of major comments about the work in its current form that I think are very important for the authors to address. That said, I think this is a promising start and that with some revisions, this could be a valuable contribution to the literature on behavioral ecology.

      We appreciate the reviewer’s kind words.

      Before starting, I would like to make clear that, given the scope of the models and the number of parameter choices that were necessary, I am going to avoid criticisms of the decisions made when designing the models. However, there are a few assumptions I find rather problematic and would like to give proper attention to.

      The first regards social vs. personal information. Most of the model argumentation is based on the reliance on social information (considering four, but to me overlapping, social strategies that are somehow static and heritable) but in fact, individuals may oscillate between relying on their personal information and/or on social information -- which may depend on the availability of resources, population density, stochastic factors, among others (Dall et al. 2005 Trends Ecol. Evol., Duboscq et al. 2016 Frontiers in Psychology). In my opinion, ignoring the influence of personal and social information decreases the significance of this work. I am aware that the authors consider the detection of food present in the model, but this is considered to a much smaller extent (as seen in their weight on individual decisions) than the social information cues.

      We appreciate the point that individuals can switch between relying on social and personal information. However, we would point out that in our model, the social strategies are not static. The social strategy is a convenient way of representing individuals’ position in behavioural trait-space (the ‘behavioural hypervolume’ of Bastille-Rousseau and Wittemyer 2019). This essentially means that the importance assigned to each of the three cues available in our model varies among individuals. There are indeed individuals that are primarily guided by the density of food items, and this is the commonest ‘overall’ movement strategy before the pathogen is introduced. We represent this by showing how the importance of social information is low before pathogen introduction (Fig. 2B).
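
      As an illustration only, the idea of a position in cue-weight space mapping onto a named social strategy can be sketched as follows. This assumes each individual carries one weight per cue (food items, handlers, non-handlers). The class and function names, the sign-based classification, the labels 'agent tracking' and 'non-handler tracking', and the formula for the importance of social information are all assumptions made for this sketch and are not taken from the model's code.

```python
from dataclasses import dataclass


@dataclass
class CueWeights:
    """Hypothetical per-individual weights for the three movement cues."""
    food: float          # weight on food item density
    handlers: float      # weight on individuals currently handling food
    non_handlers: float  # weight on individuals not handling food


def social_strategy(w: CueWeights) -> str:
    """Label a strategy from the signs of the two social weights (assumed rule)."""
    if w.handlers >= 0 and w.non_handlers >= 0:
        return "agent tracking"
    if w.handlers >= 0 and w.non_handlers < 0:
        return "handler tracking"
    if w.handlers < 0 and w.non_handlers >= 0:
        return "non-handler tracking"
    return "agent avoiding"


def social_information_importance(w: CueWeights) -> float:
    """Share of total absolute cue weight assigned to social cues (assumed measure)."""
    total = abs(w.food) + abs(w.handlers) + abs(w.non_handlers)
    if total == 0:
        return 0.0
    return (abs(w.handlers) + abs(w.non_handlers)) / total
```

      Under these assumptions, an individual with a large food weight and small social weights is still assigned a social strategy label, while its social information importance stays low.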

      While we primarily focus on the importance of social information, this is because the population quite understandably evolves a persistent preference for moving towards food items (i.e., using personal information if available). We have made this clearer in the text on lines 367 – 371.

      Critically, it is also unclear how, if at all, the information and pathogen traits are related to each other. If a handler gets sick, how does this affect its foraging activity (does it stop foraging, slow its activities, or does it show signs of sickness)? Perhaps this model is attempting to explore the emergence of social movement strategies only, but how they disentangle an individual's sickness status and behavioral response is unclear.

      We appreciate that infection may lead to physiological effects (e.g. altered metabolic rates, reduction in cognitive capacity) that may then influence behaviour. Our model aims to be a relatively simple and general one, and does not consider the explicit mechanisms by which infection imposes a cost on fitness. Thus we do not include any behavioural modifications due to infection, as we feel that these would be much too complex to include in such a model. We would be happy to explore, in future work, phenomena such as the evolution of self-isolation and infection detection, which are common among animals such as social insects (Stroeymeyt et al. 2018, Pusceddu et al. 2021).

      However, we have considered an alternative implementation of our model’s scenario 1 which could be interpreted as the infection reducing foraging efficiency by a certain percentage (other interpretations of the redirection of energy away from reproduction are also possible). We show how this implementation leads to very similar outcomes as those seen in our

      Very little is presented about the virulence of the pathogens and how they could affect the emergence of social strategies. The authors keep their main argumentation based on the introduction of novel pathogens (without distinctions on their pathogenicity), but a behavioral response is rather influenced by how fast individuals are infected and what their chances of recovering are. Besides, they consider that only one or two social interactions would be enough for pathogen transmission to occur.

      We have indeed considered a fixed transmission probability of 0.05, a relatively modest attack rate. Setting transmission probability to two other values (0.025, 0.1), we find that our general results are recovered - there is an evolutionary transition away from sociality, with the proportion of agent avoidance evolved increasing with the transmission probability. While we do not show these results in the main text, we have included figures showing the proportions of each social movement strategy here for the reviewers’ reference.

      Figures showing the proportion of social movement strategies in two simulation runs of our default implementation of scenario 1 (dE = 0.25, R = 2, pathogen introduction begins from G = 500). Top: Probability of transmission = 0.025 (half of the default). Bottom: Probability of transmission = 0.10 (double the default). Overall, the proportion of agent avoidance evolved (purple) increases with the probability of transmission. Each figure shows a single replicate of each parameter combination, for only 1,000 generations.
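
      To put the three transmission probabilities in context, a short worked calculation can help. The sketch below assumes that each encounter with an infected individual is an independent transmission opportunity with a fixed probability p, so that the chance of being infected after n such encounters is 1 - (1 - p)^n. The independence assumption and the function name are introduced here for illustration and are not taken from the model description.

```python
def prob_infected(p: float, n_encounters: int) -> float:
    """Probability of at least one successful transmission in n independent encounters."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n_encounters < 0:
        raise ValueError("n_encounters must be non-negative")
    return 1.0 - (1.0 - p) ** n_encounters


if __name__ == "__main__":
    # Compare the three per-encounter probabilities mentioned in the response.
    for p in (0.025, 0.05, 0.1):
        row = ", ".join(
            f"n={n}: {prob_infected(p, n):.3f}" for n in (1, 2, 10, 50)
        )
        print(f"p={p}: {row}")
```

      Under this assumption, one or two encounters at p = 0.05 leave infection unlikely, and it is repeated association that makes infection probable.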

      Another important component is that individuals do not die, and it seems that they always have a chance (even if it is small) to reproduce. So, how the authors treat unsuccessful strategies in the model outputs, and how these social strategies would potentially be "dismissed" by natural selection, is not considered.

      We appreciate the point that our simulation does not include mortality effects, and that all individuals have some small chance of reproducing. There are a few practical and conceptual challenges when incorporating this level of realism in a general model. Including mortality effects could allow for the emergence of more complex density-dependent dynamics, as dead individuals would not be able to transmit the pathogen to other foragers (although for some pathogens, this could be a valid choice), nor would they be sources of social information. This would make the model much more challenging to interpret, and we have tried to keep this model as simple as possible.

      We have also sought to keep the model’s focus on the evolutionary dynamics, and to not focus on mortality. In order to balance this aim with the reviewer's suggestion, we have included a new implementation of the model’s scenario 1 which has a threshold on reproduction. That means that only individuals with a positive energy balance (intake > infection costs) are allowed to reproduce. We show a potentially counter-intuitive result, that the more social ‘handler tracking’ strategy persists at a higher frequency than in our default implementation, despite having a higher infection rate than the ‘agent avoiding’ strategy. We suggest that this is because the ‘agent avoiding’ individuals have very low or no intake. This is sufficient in our default implementation to have relatively higher fitness than the more frequently infected handler tracking individuals.

      Reviewer #3 (Public Review):

      Gupte and colleagues develop an individual-based model to examine how the introduction of a novel pathogen influences the evolution of social cue use in a population of agents for which social cues can both facilitate more efficient foraging, but also expose individuals to infection. In their simulations, individuals move across a landscape in search of food, and their movements are guided by a combination of cues related to food patches, individuals that are currently handling food items, and individuals that are not actively handling food. The latter two cues can provide indirect information about the likely presence of food due to the patchiness of food across the landscape.

      The authors find that prior to introducing the novel pathogen, selection favors strategies that home in on agents, regardless of whether those agents are currently handling food items. The overall contribution of these social cues to movement decisions, however, tends to be relatively small. After pathogen introduction, agents evolve to rely more heavily on social information and to either be more selective in their use of it (attending to other agents that are currently handling food and avoiding non-handlers) or avoiding other agents altogether. Gupte and colleagues further examine the ecological consequences of these shifts in social decision-making in terms of individuals' overall movement, food consumption, and infection risk. Relative to pre-introduction conditions, individuals move more, consume less food, and are less likely to be infected due to reduced contact with others. Epidemiological models on emergent social networks confirm that evolved behavioral changes generate networks that impede the spread of disease.

      The introduction of novel pathogens into wild populations is expected to be increasingly common due to climate change and increasing global connectedness. The approach taken here by the authors is a potentially worthwhile avenue to explore the potential eco-evolutionary consequences of such introductions. A major strength of this study is how it couples ecological and evolutionary timescales. Dominant behavioral strategies evolve over time in response to changing environmental conditions and impact social, foraging, and epidemiological dynamics within generations. I imagine there are many further questions that could be fruitfully explored using the authors' framework. There are, however, important caveats that impact the interpretation of the authors' findings.

      First, reproduction bears no cost in this model. Individuals produce offspring in proportion to their lifetime net energy intake, which is increased by consuming food and decreased by a set amount per turn once infected. However, prior to reproduction, net energy intake is normalized (0-1) according to the lowest individual value within the generation. This means that individuals need not maintain a positive energy balance nor even consume food at all to successfully reproduce, so long as they perform reasonably well relative to other members of the population. Since consuming food is not necessary to reproduce, declining per capita intake due to evolved social avoidance (Fig. 1d) likely decreases the importance of food to an individual's reproductive success relative to simply avoiding infection. This dynamic could explain the delayed emergence of the 'agent avoiding' strategy (Fig. 1a), as this strategy potentially is only viable once per capita intake reaches a sufficiently low level across the population (Fig. 1d). I am curious to know what the results would be if reproduction required some minimal positive net energy, such that individuals must risk food patches in order to reproduce. It would also be useful for the authors to provide information on how net energy intake changes across generations, as well as whether (and if so, how) attraction to the food itself may change over time.

      We thank the reviewer for their assessment of our work, and appreciate the point raised here (and in an earlier review) about individuals potentially reproducing without any intake. We have addressed this by running our default model [repeated introductions, R = 2, dE = 0.25], with a threshold on reproduction such that only individuals with a positive energy balance can reproduce. We mention these results in the text (L. 495 – 500), and show related figures in the SI Appendix. In brief, as the reviewer suggests, agent avoiding is less common for our default parameter combination, but becomes as common as the default combination when the infection cost is doubled (to dE = 0.5).
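
      The difference between the default reproduction rule and the thresholded one can be sketched as follows. This is a minimal illustration, assuming that net energy is intake minus total infection cost, that normalisation is a min-max rescaling to the range 0-1 within a generation, and that parents are drawn by a weighted lottery. The function names and the exact normalisation are assumptions for this sketch, not the model's implementation.

```python
import random
from typing import List, Sequence


def normalise(net_energy: Sequence[float]) -> List[float]:
    """Min-max rescale net energy to 0-1 within a generation (assumed rule)."""
    lo, hi = min(net_energy), max(net_energy)
    if hi == lo:
        return [1.0] * len(net_energy)  # everyone equal: uniform weights
    return [(e - lo) / (hi - lo) for e in net_energy]


def parent_weights(
    intake: Sequence[float],
    infection_cost: Sequence[float],
    threshold: bool = False,
) -> List[float]:
    """Reproductive weights; with threshold=True, only positive net energy reproduces."""
    net = [i - c for i, c in zip(intake, infection_cost)]
    weights = normalise(net)
    if threshold:
        weights = [w if e > 0 else 0.0 for w, e in zip(weights, net)]
    return weights


def draw_parents(weights: Sequence[float], n_offspring: int, rng: random.Random) -> List[int]:
    """Draw parent indices in proportion to their weights."""
    if sum(weights) <= 0:
        raise ValueError("no individual is eligible to reproduce")
    return rng.choices(range(len(weights)), weights=weights, k=n_offspring)
```

      In this sketch, an individual with a negative energy balance can still receive a non-zero weight under the default rule if others do worse, whereas the threshold removes it from the lottery entirely.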

      We appreciate the reviewer’s suggestion about decreasing per-capita intake being a precondition for the proliferation of the agent avoiding strategy. With our new results, we now show that there is no overall decrease in intake, but the agent avoiding strategy still becomes a common strategy after pathogen introduction. As the reviewer suggests, this is because these individuals have an equivalent net energy as handler tracking individuals, as they are less frequently infected.

      We suggest that the delayed emergence of the agent avoiding strategy is primarily due to mutation limitations – such individuals are uncommon or non-existent in the simulation before pathogen introduction, and random mutations are required for them to emerge. As we have noted in response to an earlier comment, this becomes clear when the mutation rate is reduced from 0.01 to 0.001 – agent avoidance usually does not evolve at all.

      A second important caveat is that the evolutionary responses observed in the model only appear when novel pathogen introductions are extremely frequent. The model assumes no pathogen co-evolution, but rather that the same (or a functionally identical) pathogen is re-introduced every generation (spillover rate = 1.0). When the authors considered whether evolutionary responses were robust to less frequent introductions, however, they found that even with a per-generation spillover rate of 0.5, there was no impact on social movement strategies. The authors do discuss this caveat, but it is worth highlighting here as it bears on how general the study's conclusions may be.

      We appreciate the reviewer’s point entirely. We would point out that current knowledge about pathogen introductions across species and populations in the wild is very poor. However, the ongoing highly pathogenic avian influenza outbreak (Wille and Barr 2022), the spread of multiple strains of SARS-CoV-2 to wild deer in several different human-to-wildlife transmission events, and recent work on the potential for coronavirus spillovers from bats to humans, all suggest that at least some generalist pathogens must circulate quite widely among wildlife, often crossing into novel host species or populations. We have added these considerations to the text on lines 218 – 231.

      We have also added, in order to confront this point more squarely, a new scenario of our model in which the pathogen is introduced just once, and then transmits vertically and horizontally among individuals (lines 519 – 557). This scenario more clearly suggests when evolutionary responses to pathogen introductions are likely to occur, and what their consequences might be for a pathogen becoming endemic in a population. This scenario also serves as a potential starting point for models of host-pathogen trait co-evolution, and we have added this consideration to the text on lines 613 – 623.

      References

      ● Albery, G. F. et al. 2021. Multiple spatial behaviours govern social network positions in a wild ungulate. - Ecology Letters 24: 676–686.

      ● Bastille-Rousseau, G. and Wittemyer, G. 2019. Leveraging multidimensional heterogeneity in resource selection to define movement tactics of animals. - Ecology Letters 22: 1417–1427.

      ● Gupte, P. R. et al. 2021. The joint evolution of animal movement and competition strategies. - bioRxiv in press.

      ● Lion, S. and Boots, M. 2010. Are parasites “prudent” in space? - Ecology Letters 13: 1245–1255.

      ● Lloyd-Smith, J. O. et al. 2005. Superspreading and the effect of individual variation on disease emergence. - Nature 438: 355–359.

      ● Nathan, R. et al. 2008. A movement ecology paradigm for unifying organismal movement research. - PNAS 105: 19052–19059.

      ● Pusceddu, M. et al. 2021. Honey bees increase social distancing when facing the ectoparasite varroa destructor. - Science Advances 7: eabj1398.

      ● Sánchez, C. A. et al. 2022. A strategy to assess spillover risk of bat SARS-related coronaviruses in Southeast Asia. - Nat Commun 13: 4380.

      ● Stroeymeyt, N. et al. 2018. Social network plasticity decreases disease transmission in a eusocial insect. - Science 362: 941–945.

      ● Wilber, M. Q. et al. 2022. A model for leveraging animal movement to understand spatio-temporal disease dynamics. - Ecology Letters in press.

      ● Wille, M. and Barr, I. G. 2022. Resurgence of avian influenza virus. - Science 376: 459–460.

    1. Author Response

      Reviewer #1 (Public Review):

      This study focuses on the role of polo like kinase 1 (PLK-1) during oocyte meiosis. In mammalian oocytes, Plk1 localizes to chromosomes and spindle poles, and there is evidence that it is required for nuclear envelope breakdown, spindle formation, chromosome segregation, and polar body extrusion. However, how Plk1 is targeted to its various locations and how it performs these functions is not well understood. This study uses C. elegans oocytes as a model to explore PLK-1 function during meiosis. They take advantage of an analogue-sensitive allele of plk-1, which enabled them to bypass nuclear envelope breakdown defects that occur following PLK-1 RNAi. This allowed them to dissect later roles of PLK-1 in oocytes, demonstrating that depletion causes defects in spindle organization, chromosome congression, segregation, and polar body extrusion. Moreover, the authors defined mechanisms by which PLK-1 is targeted to chromosomes, showing that CENP-C (HCP-4) is required for localization to chromosome arms and that BUB-1 is required for targeting to the midbivalent region. Finally, they demonstrate that upon removal of PLK-1 from both domains, there are severe meiotic defects. These findings are interesting. However, there is a need for additional analysis to better support some of their conclusions, and to aid in interpretation of particular phenotypes. Specific comments are below.

      • For many important claims of the paper, a single representative image is shown but the n is not noted. This is an issue throughout the paper for much of the localization analysis (e.g. Figure 1B, 1C, 1D, 2A, 2B, 3A, 3B, 3C, etc.); in cases like this, numbers should be included to increase the rigor of the presented data. How many images or movies were analyzed that looked like the one shown? For linescans, were they done only on one image? How many independent experiments were done, etc?

      We had initially chosen a representative image. Localisation was the same in all images that allowed ‘proper’ assessment of PLK-1 localisation. In our case, this means that we can only analyse bivalents that are perpendicular to the light path to distinguish between bivalent, chromosome arms, and kinetochore. We now report the number of oocytes (N) and bivalents (n) analysed for each condition. The line scans were done in one representative image.

      • In the abstract, it is stated that PLK-1 plays a role in spindle assembly/stability (this is also stated elsewhere, e.g. line 101). This phrasing implies that the authors have demonstrated roles in both spindle assembly and stability. However, to distinguish between these roles, they would have to show that removal of PLK-1 before spindle assembly causes defects, and also that removal of PLK-1 from pre-formed spindles causes collapse. I don't think it is necessary to do this, as the spindle roles of PLK-1 are not a focus of the paper. However, the language should be altered so that it does not imply that the paper has demonstrated roles in both. A good place to do this would be in the section from lines 144-147, where they first discuss the spindle defects. It would be straightforward to explain that their approach does not distinguish between spindle assembly and stability, and that PLK-1 could have a role in either or both.

      We fully agree with this comment. We cannot distinguish between spindle assembly and stability, and it is also not the focus of our current work. We have changed the text accordingly.

      • It is stated that there is kinetochore localization of PLK-1 (and I do see some dim cup-like localization in images after PLK-1 is removed from the chromosome arms via HCP-4 RNAi). However, this cup-like localization is not clear in most wild-type images (e.g. Figure 1B, 1D, 2A, 3A, etc.). Although I recognize that the chromatin staining might be obscuring kinetochore localization, if PLK-1 was truly a kinetochore protein I would also expect it to localize to filaments within the spindle (as many other kinetochore proteins do), especially since the authors state that BUB-1 targets PLK-1 to the kinetochore (and BUB-1 is in the filaments). In fact, the only images where it looks like PLK-1 may be localized to filaments are in Figure 4C and 6A, when HCP-4 has been depleted (though I don't know if this is generally true across all HCP-4 RNAi images). For me, this calls into question the conclusion that PLK-1 truly is on the kinetochore in wild-type conditions - could it be that PLK-1 only localizes to the kinetochore (and to the filaments) when HCP-4 is depleted? The authors need to resolve this issue and provide better evidence that PLK-1 normally localizes to the kinetochore, if they want to make this claim. Additionally, the observation that PLK-1 is not on the kinetochore filaments (in wild-type conditions) should be addressed in the text somewhere - do the authors think that this is a special type of kinetochore protein that does not localize to the filaments?

      While our initial claim of PLK-1 kinetochore localisation was based on its cup-like localisation, we have now performed additional analysis and experiments to confirm this claim. First, we corroborated that the PLK-1 cup-like pattern co-localises with the Mis12 complex component KNL-3 (New Figure 5-figure supplement 1). Second, we show that PLK-1 is present in the so-called ‘linear elements’ (filaments) both within the spindle and in the cortex. Since PLK-1 presence in these filaments is seen in wild-type as well as hcp-4 mutant oocytes, we conclude that PLK-1 likely localises to the kinetochore under normal conditions.

      • The authors should provide a control experiment, treating wild-type worms with 10uM 3-IB-PP1. This would be important to ensure that the spindle defects seen at this concentration in the plk-1as strain are not non-specific effects of the inhibitor. There is a control in Figure 1 - figure supplement 3 using 1uM 3-IB-PP1, but I didn't see a control for 10uM (the concentration at which spindle defects are observed).

      This control has now been included in Figure 1-figure supplement 3.

      • In Figure 2F, the gels for BUB-1+PLK-1 look different in the presence and absence of phosphorylation by Cdk1 - for these data, I agree with the authors that it looks as if the complex elutes at a higher volume if BUB-1 is not phosphorylated (lines 200-204). However, Figure 2G has a repeat of the condition with phosphorylated BUB-1, and in this panel, the complex appears to elute at a higher volume than it did on the gel in panel F. The gel in panel G looks much more similar to the unphosphorylated condition in panel F. The authors need to explain this discrepancy (i.e., Is there a reason why the gels cannot be compared between panels? How reproducible are these data?). Ideally, the authors would include a repeat of the unphosphorylated BUB-1 + PLK-1 condition in panel G, done at the same time as the conditions shown in that panel, to avoid the impression that their results may not be reproducible.

      The specific elution volume cannot be compared in different experiments as the column has proven to “drift” over time – with proteins eluting at a later volume than they did previously despite extensive washing. What is reproducible under the experimental conditions is that the unphosphorylated wild type proteins, or the phosphorylated T527A/T163A mutant proteins A) elute at a later volume than the phosphorylated wild type proteins and B) bind to a lower proportion of the MBP-PLK1PBD (as you can see in the relative absorbance profiles and Coomassie gels).

      • The authors would need to provide convincing evidence that co-depletion of BUB-1 and HCP-4 delocalizes PLK-1 from the chromosomes entirely, and that this co-depletion condition is more severe than either single depletion alone.

      We now provide a quantification of total PLK-1 levels to accompany the images (New Figure 8-figure supplement 1).

      Additionally, the bub-1T527A and hcp-4T163A alleles are nice tools to, in theory, more specifically delocalize PLK-1 from the midbivalent and chromosome arms, respectively, to explore the functions of chromosome-associated PLK-1. However, I think the authors cannot rule out the possibility that other proteins are also being depleted from the midbivalent and/or chromosome arms in their conditions, and that this delocalization may contribute to the phenotypes observed. For example, hcp-4 depletion was recently shown to delocalize KLP-19 from the chromosome arms (Horton et al. 2022), so in the experiment shown in Figure 6E (HCP-4 RNAi in the bub-1 mutant), PLK-1 was likely not the only protein missing from the chromosome arms. Therefore, understanding if other proteins are absent from these domains (in the bub-1T527A and hcp-4T163A mutants) would help the reader understand and interpret the presented phenotypes (and how specific they are to PLK-1 loss). Consequently, I think that to better understand the co-depletion analysis presented in Figure 6 (and Figure 6 supplement 1), the authors should analyze other midbivalent and chromosome arm proteins, to determine if any are also delocalized (e.g. SUMO, KLP-19, MCAK, etc.).

      As stated above, this paper focuses on identifying the specific meiotic events PLK-1 plays a role in and characterising its targeting mechanism. We are following on this work to understand what proteins are regulated by PLK-1 in different chromosome domains and how this relates to the observed phenotypes.

      For the current study, we should emphasise that mutating a single Thr residue within an STP motif in a largely disordered region is far more specific than depleting HCP-4 or BUB-1, making it likely that the observed effects are mediated through PLK-1 targeting. It should be noted that the finding presented in Horton et al. 2022 contradicts another study in which hcp-4 depletion did not impact KLP-19 localisation (Hattersley et al. 2022).

      Additionally, instead of performing a combination of mutant and RNAi analysis (i.e. HCP-4 RNAi in the bub-1 mutant (Figure 6) and BUB-1 RNAi in the hcp-4 mutant (Figure 6 figure supplement 1)), it would be more powerful to generate a double mutant - this has a higher chance of being a more specific depletion condition.

      We have performed these experiments, which are now presented in Figure 9.

    1. Author Response

      Reviewer #1 (Public Review):

      This work by Shen et al. demonstrates a single molecule imaging method that can track the motions of individual protein molecules in dilute and condensed phases of protein solutions in vitro. The authors applied the method to determine the precise locations of individual molecules in 2D condensates, which show heterogeneity inside condensates. Using the time-series data, they could obtain the displacement distributions in both phases, and by assuming a two-state model of trapped and mobile states for the condensed phase, they could extract diffusion behaviors of both states. This approach was then applied to 3D condensate systems, and it was shown that the estimates from the model (i.e., mobile fraction and diffusion coefficients) are useful to quantitatively compare the motions inside condensates. The data can also be used to reconstruct the FRAP curves, which experimentally quantify the mobility of the protein solution.

      This work introduces an experimental method to track single molecules in a protein solution and analyzes the data based on a simple model. The simplicity of the model helps a clear understanding of the situation in a test tube, and I think that the model is quite useful in analyzing the condensate behaviors and it will benefit the field greatly. However, the manuscript in its current form fails to situate the work in the right context; many previous works are omitted in this manuscript, exaggerating the novelty of the work. Also, the two-state model is simple and useful, but I am concerned about the limits of the model. They extract the parameters from the experimental data by assuming the model. It is also likely that the molecules have a continuum between fully trapped and fully mobile states, and that this continuum model can also explain the experimental data well.

      We thank the reviewer for the warm overview of our work and the insightful comments on the areas that need to be improved. We are very encouraged by the reviewer’s generally positive assessment of our approach. We have addressed these comments in the revised manuscript.

      Reviewer #2 (Public Review):

      In this paper, Shen and co-workers report the results of experiments using single particle tracking and FRAP combined with modeling and simulation to study the diffusion of molecules in the dense and dilute phases of various kinds of condensates, including those with strong specific interactions as well as weak specific interactions (IDR-driven). Their central finding is that molecules in the dense phase of condensates with strong specific interactions tend to switch between a confined state with low diffusivity and a mobile state with a diffusivity that is comparable to that of molecules in the dilute phase. In doing so, the study provides experimental evidence for the effect of molecular percolation in biomolecular condensates.

      Overall, the experiments are remarkably sophisticated and carefully performed, and the work will certainly be a valuable contribution to the literature. The authors' inquiry into single particle diffusivity is useful for understanding the dynamics and exchange of molecules and how they change when the specific interaction is weak or strong. However, there are several concerns regarding the analysis and interpretation of the results that need to be addressed, and some control experiments that are needed for appropriate interpretation of the results, as detailed further below.

      We thank the reviewer for the warm support of our work (assessing that our work is “remarkably sophisticated and carefully performed” and “will certainly be a valuable contribution”) and for the constructive comments/critiques, which we have now addressed in the revised manuscript (please refer to our detailed responses below).

      (1) The central finding that the molecules tend to experience transiently confined states in the condensed phase is remarkable and important. This finding is reminiscent of transient "caging"/"trapping" dynamics observed in diverse other crowded and confined systems. Given this, it is very surprising to see the authors interpret the single-molecule motion as being 'normal' diffusion (within the context of a two-state diffusion model), instead of analyzing their data within the context of continuous time random walks or anomalous diffusion, which is generally known to arise from transient trapping in crowded/confined systems. It is not clear that interpreting the results within the context of simple diffusion is appropriate, given their general finding of the two confined and mobile states. Such a process of transient trapping/confinement is known to lead to transient subdiffusion at short times and then diffusive behavior at sufficiently long times. There is a hint of this in the inset of Fig 3, but these data need to be shown on log-log axes to be clearly interpreted. I encourage the authors to think more carefully and critically about the nature of the diffusive model to be used to interpret their results.

      We thank the reviewer for the insightful comments and suggestions, which have been very helpful for us to think deeper about the experimental data and the possible underlying mechanism of our findings. Indeed, the phase-separated systems studied here resemble previously studied crowded and confined systems with transient caging/trapping dynamics in the literature (see, for example, Akimoto et al., 2011; Bhattacharjee and Datta, 2019; Wong et al., 2004; references have been added in the revised manuscript). In our PSD system in Figure 3, the caging/trapping of NR2B in the condensed phase is likely due to its binding to the percolated PSD network. Thus, NR2B molecules in the condensed phase should undergo subdiffusive motions. Indeed, from our single molecule tracking data, the motion of NR2B fits well with the continuous time random walk (CTRW) model, as surmised by this reviewer. We have now fitted the MSD curve of all tracks of NR2B in the condensed phase with an anomalous diffusion model: MSD(t) = 4Dt^α (see Response Figure 1 below). The fitted α is 0.74±0.03, indicating that NR2B molecules in the condensed phase indeed undergo subdiffusive motions. The fitted diffusion coefficient D is 0.014±0.001 μm²/s. We have now replaced the Brownian motion fitting in Figure 3E in the original manuscript with this subdiffusive model fitting in the revised manuscript to highlight the complexity of the NR2B diffusion we observed in the PSD condensed phase.

      Response Figure 1: Fit of the MSD curve in the condensed phase (mean values as red dots, with standard errors as error bars) to an anomalous diffusion model (blue curve, MSD = 4Dt^α). The fit gives D = 0.014±0.001 μm²/s and α = 0.74±0.03.
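
      The power-law fit described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' fitting procedure (which is not specified here): it fits MSD(t) = 4Dt^α by ordinary least squares on log-log axes, uses noise-free synthetic data generated from the reported values (D = 0.014, α = 0.74) and the 0.03 s frame interval, and the function name `fit_anomalous_msd` is hypothetical.

```python
import math

def fit_anomalous_msd(lags, msd):
    """Fit MSD(t) = 4*D*t**alpha by linear regression of log(MSD) on log(t).

    Returns (D, alpha). The slope of the log-log line is alpha and the
    intercept is log(4*D).
    """
    xs = [math.log(t) for t in lags]
    ys = [math.log(m) for m in msd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    intercept = mean_y - alpha * mean_x
    return math.exp(intercept) / 4.0, alpha

# Synthetic, noise-free MSD curve (assumed values taken from the reported fit).
frame_interval = 0.03                      # seconds per frame
lags = [frame_interval * k for k in range(1, 21)]
msd = [4 * 0.014 * t ** 0.74 for t in lags]

D_fit, alpha_fit = fit_anomalous_msd(lags, msd)
print(f"D = {D_fit:.4f}, alpha = {alpha_fit:.2f}")
```

      An exponent α below 1 on such a log-log fit is the signature of subdiffusion; with real, noisy tracks a weighted or nonlinear fit may be preferable to this unweighted regression.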

      We find it useful to interpret the apparent diffusion coefficient (D=0.014±0.001 μm2/s) derived from this particular anomalous diffusion model as containing information of NR2B motions in a broadly construed mobile state (i.e., corresponding to the network unbound form) as well as in a broadly construed confined state (i.e., corresponding to NR2B molecules bound to percolated PSD networks). The global fitting using the sub-diffusive model does not pin down motion properties of NR2B in these different motion states. This is why we used, at least as a first approximation, the two-state motion switch model (HMM model) to analyse our data (please refer also to our detailed response to the comment #7 from reviewer 1 and corresponding additional analyses made during the revision as highlighted in Response Figure 4).

      As described in our response to the comment points #4 and #7 from reviewer 1, the two-state model is most likely a simplification of NR2B motions in the condensed phase. Both the mobile state and the confined state in our simplified interpretative framework likely represent ensemble averages of their respective motion states. However, the tracking data available currently do not allow us to further distinguish the substates, but further analysis using more refined models in the future may provide more physical insight, as we now emphasize in the revised “Discussion” section: “With this in mind, the two motion states in our simple two-state model for condensed-phase dynamics should be understood to be consisting of multiple sub-states. For instance, one might envision that the percolated molecular network in the condensed phase is not uniform (e.g., existence of locally denser or looser local networks) and dynamic (i.e., local network breaking and forming). Therefore, individual proteins binding to different sub-regions of the network will have different motion properties/states. … In light of this basic understanding, the “confined state” and “mobile state” as well as the derived diffusion coefficients in this work should be understood as reflections of ensemble-averaged properties arising from such an underlying continuum of mobilities. Further development of experimental techniques in conjunction with more refined models of anomalous diffusion (Joo et al., 2020; Kuhn et al., 2021; Muñoz-Gil et al., 2021) will be necessary to characterize these more subtle dynamic properties and to ascertain their physical origins” (p.23 of the revised manuscript).

      A practical reason for using the two-state motion switch HMM model to analyse our tracking data in the condensed phase is that the lifetime of the putative mobile state (when the per-frame molecular displacements are relatively large) is very short and such relatively faster short trajectories are interspersed by long confined states (see Response Figure 4C for an example). Statistically, ascertaining a particular anomalous diffusion model by fitting to such short tracks is likely not reliable. Therefore, here we opted for a semi-quantitative interpretative framework by using fitted diffusion coefficients in a two-state HMM as well as the new correlation-based approach for demarcating a low-mobility state and a high-mobility state (see our detailed response to reviewer 1’s point #7) in the present manuscript (which is quite an extensive study already) while leaving refinements of our computational modelling to future effort.

      Even in the context of the 'normal' two-state diffusion model they present, if they wish to stick with that (although it seems inappropriate to do so), can the authors provide some physical intuition for what exactly sets the diffusivities they extract from their data (0.17 and 0.013 microns squared per second for the mobile and confined states)? Can these be understood using, e.g., the Stokes-Einstein or Ogston models somehow?

      As stated above, we are in general agreement with this reviewer that the motion of NR2B in the condensed phase is more complex than the simple two-state picture we adopted as a semi-quantitative interpretation that is adequate for our present purposes. Within the multi-pronged analysis we have performed thus far, NR2B molecules clearly undergo anomalous diffusion in solution containing dense, percolated, and NR2B-binding molecular networks. As a first approximation, our simple two-state HMM analysis yielded two simple diffusion coefficients (0.17 μm²/s for the mobile state and 0.013 μm²/s for the confined state). For the diffusion coefficient in the mobile state, we regard it as providing a time scale for relatively faster diffusive motions (which may be further classified into various motion substates in the future) of molecules that are not bound or only weakly associated with the percolated network of strong interactions in the PSD condensed phase. For the confined or low-mobility state in our present formulation, these molecules are likely bound relatively tightly to the percolated networks, thus the diffusion coefficient should be much smaller than that of the unbound form (i.e., the mobile state) according to the Stokes-Einstein model. However, due to the detection limitation of the super-resolution imaging method (resolution of ~20 nm), we could not definitively tell the actual diffusivity beyond the resolution limit. So the diffusion coefficient in the confined state can also be interpreted as a Gaussian-distributed microscope detection error, f(x) = (1/(σ√(2π))) exp(−x²/(2σ²)), i.e., x ~ N(0, σ²), where σ is the standard deviation of the Gaussian distribution viewed as the resolution of localization-based microscopy and x is the detection error between the recorded localization and the molecule’s actual position. The track length in the confined state is the distance between localizations in consecutive frames, which can be calculated by subtraction of two independent Gaussian distributions, and the distribution of this track length (r) will be r ~ N(0, 2σ²). To link the detection error with the fitted diffusion coefficient, we calculated the log likelihood function of the Gaussian-distributed localization error (ln L = −Σᵢ [rᵢ²/(4σ²) + ½ ln(4πσ²)], where σ is the standard deviation of the Gaussian distribution) for the maximum likelihood estimation process to fit the HMM model. The random walk shares a similar log likelihood term (−rᵢ²/(4Dt)) in performing maximum likelihood estimation.

      These two log likelihood functions produce the same fitting results when σ² is identified with Dt (i.e., 4σ² is equivalent to 4Dt). In this way, the diffusion coefficient yielded by our HMM analyses for the confined state (0.0127 μm²/s) can be interpreted as the standard deviation of the localization detection error (or microscope resolution limit), which is σ = √(Dt) = 19.5 nm. We have included this consideration as an alternate interpretation of the confined-state or low-mobility motions with the results now provided in the “Materials and Methods” section in the sentence, viz., “… the L-component distribution may be reasonably fitted (albeit with some deviations, see below) to a simple-diffusion functional form with a parameter s = 13.6 ± 3.7 nm, where s may be interpreted as a microscope detection error due to imaging limits or alternately expressed as s = √(DLt) with DL = 0.006149 μm²/s being the fitted confined-state diffusion coefficient and t = 0.03 s is the time interval of the time step between experimental frames. (The HMM-estimated confined-state Dc = 0.0127 μm²/s corresponds to s = 19.5 nm)” (p.32 of the revised manuscript).
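
      The link between a localization error and an apparent diffusion coefficient can be checked with a short calculation and simulation. The sketch below is illustrative only and is not the authors' analysis code: it assumes an immobile molecule whose position along one axis is recorded with independent Gaussian error σ in each frame, takes the frame interval t = 0.03 s from the text, and uses hypothetical helper names. Under these assumptions the frame-to-frame displacement has variance 2σ², so a simple-diffusion model (variance 2Dt per axis) assigns D ≈ σ²/t. The relation σ = √(D·t) reproduces the 19.5 nm and 13.6 nm values quoted for D = 0.0127 and 0.006149 μm²/s.

```python
import math
import random

FRAME_INTERVAL_S = 0.03  # time between frames, as stated in the text

def sigma_from_D(D_um2_per_s, dt_s=FRAME_INTERVAL_S):
    """Localization error in nm equivalent to an apparent D, assuming sigma^2 = D*t."""
    return math.sqrt(D_um2_per_s * dt_s) * 1000.0

def apparent_D_of_immobile_molecule(sigma_um, dt_s=FRAME_INTERVAL_S,
                                    n_frames=100_000, seed=0):
    """Simulate noisy 1D localizations of a fixed molecule and return the D that a
    simple-diffusion model would assign to the frame-to-frame displacements."""
    rng = random.Random(seed)
    positions = [rng.gauss(0.0, sigma_um) for _ in range(n_frames)]
    steps = [b - a for a, b in zip(positions, positions[1:])]
    mean_sq = sum(s * s for s in steps) / len(steps)  # close to 2*sigma^2
    return mean_sq / (2.0 * dt_s)                     # 1D diffusion: <dx^2> = 2*D*t

sigma_hmm = sigma_from_D(0.0127)      # HMM-estimated confined-state D
sigma_l = sigma_from_D(0.006149)      # fitted L-component D
D_apparent = apparent_D_of_immobile_molecule(sigma_um=0.0195)

print(f"sigma (HMM) = {sigma_hmm:.1f} nm, sigma (L-component) = {sigma_l:.1f} nm")
print(f"apparent D for a fixed molecule with 19.5 nm error = {D_apparent:.4f} um^2/s")
```

      In other words, a completely immobile molecule imaged with ~20 nm localization error would appear, in this simplified picture, to diffuse at roughly the confined-state coefficient, which is why the two interpretations cannot be separated from these data alone.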

      (2) Equation 1 (and hence equation 2) is concerning. Consider a limit when P_m=1, that is, in the condensed phase, there are no confined particles, then the model becomes a diffusion equation with spatially dependent diffusivity, \partial c /\partial t = \nabla * (D(x) \nabla c). The molecules' diffusivity D(x) is D_d in the dilute phase and D_m in the condensed phase. No matter what values D_d and D_m are, at equilibrium the concentration should always be uniform everywhere. According to Equation 1, the concentration ratio will be D_d/D_m, so if D_d/D_m \neq 1, a concentration gradient is generated spontaneously, which violates the second law of thermodynamics. Can the authors please justify the use of this equation?

      Indeed, the derivation of Equation 1 appears to be concerning. The flux J is proportional to D * dc/dx (not kDc as in the manuscript). At equilibrium dc/dx = 0 on both sides and c is constant everywhere. Can the authors please comment?

      So then another question is, why does the Monte Carlo simulation result agree with Equation 1? I suspect this has to do with the behavior of particles crossing the boundary. Consider another limit where D_m = 0, that is, particles freeze in the condensed phase. If once a particle enters the condensed phase, it cannot escape, then eventually all particles will end up in the condensed phase and EF=infty. The authors likely used this scheme. But as mentioned above this appears to violate the second law.
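
      The limiting case raised here can be illustrated numerically. The following is a minimal sketch, not the authors' Monte Carlo scheme: it assumes the Fickian form ∂c/∂t = ∇·(D(x)∇c) written above, a 1D box with reflecting walls and unit grid spacing, and arbitrary (hypothetical) choices of diffusivities, time step, and number of steps. Under these assumptions the concentration relaxes to a uniform profile whatever the ratio D_d/D_m, which is the behaviour described for P_m = 1.

```python
def relax_concentration(D_dilute, D_mobile, n_cells=20, dt=0.4, steps=40_000):
    """Explicit finite-volume integration of dc/dt = d/dx( D(x) dc/dx ) on a 1D grid.

    The left half of the box has diffusivity D_dilute and the right half D_mobile.
    Walls are reflecting (no flux), grid spacing is 1, and all material starts
    in the left half. Stability requires dt <= 1 / (2 * max(D)).
    """
    half = n_cells // 2
    D = [D_dilute] * half + [D_mobile] * (n_cells - half)
    # Diffusivity at each interior cell interface (harmonic mean of the neighbours).
    D_face = [2 * D[i] * D[i + 1] / (D[i] + D[i + 1]) for i in range(n_cells - 1)]
    c = [1.0] * half + [0.0] * (n_cells - half)
    for _ in range(steps):
        # Fickian flux from cell i to cell i+1, evaluated from the current profile.
        flux = [-D_face[i] * (c[i + 1] - c[i]) for i in range(n_cells - 1)]
        for i, j in enumerate(flux):
            c[i] -= dt * j
            c[i + 1] += dt * j
    return c

profile = relax_concentration(D_dilute=1.0, D_mobile=0.1)
print(f"min = {min(profile):.6f}, max = {max(profile):.6f}, total = {sum(profile):.6f}")
```

      A tenfold difference in diffusivity changes how fast the profile equilibrates but not the final state, so any enrichment in such a model has to come from something other than D(x) in this flux law, such as a binding equilibrium or a boundary rule.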

      Thanks for the incisive comment. After much in-depth consideration, we are in agreement with the reviewer that Eq.1 should not be presented as a relation that is generally applicable to diffusive motions of molecules in all phase-separated systems. There are cases in which this relation can lead to unphysical outcomes, as correctly pointed out by the reviewer.

      Nonetheless, based on our theoretical/computational modeling, it is also clear, empirically, that Eq.1 holds approximately for the NR2B/PSD system we studied, and as such it is a useful approximate relation in our analysis. We have therefore provided a plausible physical perspective for Eq.1’s applicability as an approximate relation based upon a schematic consideration of diffusion on an underlying rugged (free) energy landscape (Zhang and Chan, 2012) of a phase-separated system (See Figure 3G in the revised manuscript), while leaving further studies of such energy landscape models to future investigations.

      This additional perspective is now included in the following added passage under a new subheading in the revised manuscript:

      "Physical picture and a two-state, two-phase diffusion model for equilibrium and dynamic properties of PSD condensates"

      (3) Despite the above two major concerns described in (1) and (2), the enrichment due to the presence of a "confined state", is reasonable. The equilibrium between "confined" and "mobile" states is determined by its interaction with the other proteins and their ratio at equilibrium corresponds to the equilibrium constant. Therefore EF=1/Pm is reasonable and comes solely from thermodynamics. In fact, the equilibrium partition between the dilute and dense phases should solely be a thermodynamic property, and therefore one may expect that it should not have anything to do with diffusivity. Can the authors please comment on this alternative interpretation?

      Thanks for this thought-provoking comment. We agree with the reviewer that the relative molecular densities in the condensed versus dilute phases are governed by thermodynamics unless there is energy input into the system. However, in our formulation, the mobile ratio should not be the only parameter determining the enrichment fold in a phase-separated system. In fact, the approximate relation (Eq.1) is EF ≈ Dd/(PmDm), and thus EF ≈ 1/Pm only when Dd ≈ Dm. But the speed of mobile-state diffusion in the condensed phase is found to be appreciably smaller than that of diffusion in the dilute phase (Dd > Dm). In general, a hallmark of a phase separation system is to enrich involved molecules in the condensed phase, regardless of whether the molecule is a driver (or scaffold) or a client of the system. Such enrichment is expected to result from the net free energy gain due to increased molecular interactions in the condensed phase (as envisioned in Response Figure 9). For example, in the phase separation systems containing PrLD-SAMME (Figure 4 of the manuscript), Pm is close to 1, but the enrichment of PrLD-SAMME in the condensed phase is much greater than 1 (estimated to be ~77, based on the fluorescence intensity of the protein in the dilute and condensed phase; Figure 5—figure supplement 1). As far as Eq.1 is concerned, this is mathematically correct because the diffusion coefficient of PrLD-SAMME in the condensed phase (D ~ 0.2 μm²/s) is much smaller than the diffusion coefficient of a monomeric molecule with a similar molecular mass in dilute solution (D ~ 100 μm²/s, measured by FRAP-based assay; the mobility of the molecules in the dilute solution in 3D is too fast to be tracked). Physically, it is most likely that the slower molecular motion in the condensed phase is caused by favorable intermolecular interactions, and that the same favorable interactions underpinning the dynamic effects also lead to a larger equilibrium Boltzmann population.

    1. Author Response

      Reviewer #1 (Public Review):

      Sorkac et al. devised a genetically encoded retrograde synaptic tracing method they call retro-Tango based on their previously developed anterograde synaptic tracing method trans-Tango. The development of genetically encoded trans-synaptic tracers has long been a difficult stumbling block in the field, and the development of trans-Tango a few years back was a breakthrough that was immediately, widely, and successfully applied. The recent development of the retrograde tracer method BAcTrace was also exciting for the field, but it requires lexA driver lines and, by design, requires testing candidate presynaptic neurons rather than providing an unbiased test for connectivity.

      Retro-Tango now provides an unbiased retrograde tracer. They cleverly used the same reporter system as for trans-Tango by reversing the signaling modules to be placed in pre-synaptic neurons instead of post-synaptic neurons. Therefore, synaptic tracing leads to the labeling of pre-synaptic neurons under the regulation of the QUAS system. Using visual, olfactory, as well as sexually dimorphic circuits, the authors went on to provide examples of the specificity, efficiency, and usefulness of the retro-Tango method. The authors successfully demonstrated that many of the known pre-synaptic neurons can be successfully and specifically labelled using the retro-Tango method.

      Most importantly, because it is based on the most used, very well tested and widely adopted trans-Tango method, retro-Tango promises to not just be a clever development, but a really widely and well-used technique as well. This is an outstanding contribution.

      We would like to thank Dr. Hiesinger for his very kind words and for the overall appreciation of the contribution of the development of retro-Tango to the field. We are also grateful for the suggestions below aimed at improving the clarity of our manuscript. We individually address the points raised by Dr. Hiesinger below.

      Reviewer #2 (Public Review):

      Tools that enable labeling and genetic manipulations of synaptic partners are important to reveal the structure and function of neural circuits. In a previous study, Barnea and colleagues developed an anterograde tracing method in Drosophila, trans-TANGO, which targets a synthetic ligand to presynaptic terminals to activate a postsynaptic receptor and trigger nuclear translocation of a transcription factor. This allows the labeling and genetic manipulation of cells postsynaptic to the ligand-expressing starter cells. Here, the same group modified trans-TANGO by targeting the ligand to the dendrites of starter cells to genetically access pre-synaptic partners of the starter cells; they call this method retro-TANGO. The authors applied retro-TANGO to various neural circuits, including those involved in escape response, navigation, and sensory circuits for sex peptides and odorants. They also compared their retro-TANGO data with synaptic connectivity derived from serial electron microscopy (EM) reconstruction and concluded that retro-TANGO can allow trans-synaptic labeling of presynaptic neurons that make ~17 synapses or more with the starter cells.

      Overall, this study has generated and characterized a valuable retrograde transsynaptic tracing tool in Drosophila. It's simpler to use than the recently described BAcTrace (Cachero et al., 2020) and can also be adapted to other species. However, the manuscript can be substantially strengthened by providing more quantitative data and more evidence supporting retrograde specificity.

      We thank Dr. Luo for his kind words and his assessment of the value of retro-Tango as a new tool in the transsynaptic labeling toolkit in Drosophila. We followed the suggestions of Dr. Luo for providing more quantitative data and addressing the specificity and directionality of retro-Tango. We strongly believe that the implementation of his suggestions did enhance the quality of our manuscript.

      Reviewer #3 (Public Review):

      This is a valuable addition to the currently available arsenal of methods to study the Drosophila brain.

      There are many positives to the present manuscript as it is:

      (i) The introduction makes a clear and fair comparison with other available tracing methods.

      (ii) The authors do a systematic analysis of the factors that influence the labeling by retro-tango (age, temperature, male versus female, etc...)

      (iii) The authors acknowledge that there are some limitations to retro-Tango. For example, retro-Tango does not label all the expected neurons indicated by the EM connectome. This is fine because no technique is perfect, and it is very laudable that the authors did a serious study of what one should expect from retro-Tango (for example, a threshold determined by the number of synapses between the connected neurons).

      We would like to thank the reviewer for the kind words and the positive assessment of our manuscript. In addition, we would like to acknowledge the reviewer for the recommendations below, which we followed and we think made our manuscript stronger.

    1. Author Response

      Reviewer #1 (Public Review):

      Bustion and colleagues outline the creation and testing of an in silico method to query gut microbiome databases for genes encoding enzymes predicted to catalyze a reaction of interest, which is provided by the user. Strengths of the tool include attempts to examine nearly 9,000 MetaCyc reactions in a pre-calculated fashion and to rank order enzymes based on their likelihood of catalyzing a reaction. Substrates, products, and even cofactors, if known, are employed to strengthen the power of the search algorithm, which also employs a hidden Markov model to improve the selection of putative hit enzymes. The authors outline high success rates with examples presented and compare those results with other extant methods, which are reported to perform in a less robust manner. Weaknesses include lack of evidence of success on a more difficult "real world" example. However, the tool outlined is a clear advance over existing methods and will be useful to explore the diversity of chemical transformations performed by commensal microbiota.

      We thank Reviewer 1 for their positive feedback and constructive summary. We agree that a real-world example would add confidence to our findings. We previously demonstrated SIMMER’s utility using published datasets. To expand upon these findings, we added another evaluation on an external dataset (Artacho et al., 2020) and performed new experiments to test SIMMER predictions for methotrexate metabolism into DAMPA and glutamate, a reaction known to be performed by the human microbiome but for which human gut strains and specific gut enzymes were not previously known. Both the new external dataset and our experimental findings validate SIMMER’s predictions of bacteria capable of metabolizing methotrexate, the mainline therapeutic for rheumatoid arthritis patients.

      Reviewer #2 (Public Review):

      This work provides a new computational tool for the systematic characterization of biotransformation reactions in the human gut microbiome: given a biotransformation reaction of interest, it predicts a list of candidate bacterial species, enzymes, and EC identifiers putatively capable of performing the queried reaction. The method is innovative and clearly presented.

      The pipeline that relies on both chemical and protein similarity algorithms, is in principle applicable to any biotransformation reaction that can be formulated as linked substrates and products (possibly including co-factors). This contrasts with other approaches that, for example, only rely on smaller databases and solely rely on substrates and chemical similarity. Moreover, SIMMER outperformed two other recently developed methods, against which it was benchmarked for its prediction accuracy when tested on a control test set derived from literature.

      The work interestingly focuses on predicting bacterial enzymes responsible for drug biotransformation, therefore showcasing its potential as a hypothesis generator for characterizing and validating novel bacterial enzymes in vitro.

      The authors correctly describe the relevance of an accurate input (in terms of reaction completeness, including cofactors and reaction products) as paramount for the quality of the prediction.

      The conclusions of this paper are mostly well supported by data, but some aspects of performance evaluation and its generality might benefit from additional elaborations and clarifications.

      1) Great emphasis has been dedicated to the prediction performance of SIMMER over a positive control set derived from the available literature. However, a more extensive description and analysis of false positive results are needed to better understand the possible impact of the (potentially many) false positive predictions listed for each reaction.

      We agree that our analysis would benefit from an assessment of false positives. Unfortunately, current literature usually reports which reactions an enzyme is capable, rather than incapable, of performing. For this reason, we took a conservative approach and defined as false positives all reactions preceding the one that yielded a positive control enzyme sequence. This is now described above in Essential Revisions Response 1.3.
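
      One way to read the conservative definition above is as a count over an ordered list of results: every reaction that appears before the first one yielding a positive control enzyme sequence is treated as a false positive. The Python sketch below assumes an ordered list of reaction identifiers and a set of identifiers known to yield a positive control. The function name and the example identifiers are hypothetical and are not taken from SIMMER's code.

```python
def count_false_positives(ranked_reactions, positive_reactions):
    """Count reactions listed ahead of the first known positive.

    ranked_reactions: reaction identifiers in the order they were returned
    positive_reactions: set of identifiers that yield a positive control enzyme
    Returns None when no known positive appears in the list at all.
    """
    for position, reaction in enumerate(ranked_reactions):
        if reaction in positive_reactions:
            # Everything before this position is counted as a false positive.
            return position
    return None


# Hypothetical example: the known positive is the third reaction returned.
example_ranking = ["rxn_A", "rxn_B", "rxn_C", "rxn_D"]
print(count_false_positives(example_ranking, {"rxn_C"}))
```

      Under this reading the count is an upper bound, since some of the reactions counted as false positives may simply lack a literature report.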

      2) The authors imply that the current method is superior to two other methods based on accuracy. However, a more extensive description of the benchmarking results would strengthen these benchmarking efforts.

      We have addressed this concern in Essential Revisions Response 3.

      3) The authors only showcase SIMMER in the context of drug metabolism but claim its applicability to be general enough to also describe other biotransformation in the human gut microbiota. Although in principle believable, the authors could improve the credibility and generalizability of their method by demonstrating another use case, e.g., food compounds, for which extensive metagenomic and metabolomic data are already available from previous gut microbiome studies.

      We agree that assessments of SIMMER’s predictions on food metabolism would improve the generalizability of the method. We have edited the text to focus on drug metabolism, as we believe SIMMER’s application to food metabolism merits a more thorough, future investigation.

      4) Showcasing experimental in vitro validation of SIMMER predicted enzyme(s) could greatly strengthen the relevance of this work.

      We have addressed this in Essential Revisions Response 2.

      5) Throughout the text and the title, a more careful and precise phrasing of the tool's scope (characterization of microbiome-encoded enzymatic reactions and not the identification of novel chemical transformations) would improve the reader's understanding of the work.

      We agree, and have reworded many key phrases in the text, including the title.

      Reviewer #3 (Public Review):

      This manuscript presents a new tool, SIMMER, to predict bacterial enzyme-mediated transformations of compounds, an important and incompletely understood aspect of microbiome drug metabolism. The authors compare their resource to existing resources that allow users to generate hypotheses related to compound toxicity and putative routes of compound metabolism. The authors identify the key innovations of their resource as including full chemical representations of reactions and a novel method to predict an enzyme's EC number (a description of function) from its reaction.

      Strengths

      Generating user-friendly tools to explore existing knowledge of bacterial enzymes and their reactions is important.

      SIMMER is a novel resource where the user provides the substrates and products as input and receives a list of potential microbiome enzymes as output.

      SIMMER includes a novel EC predictor based on reaction rather than based on sequence.

      Weaknesses

      Validation claims are not well supported by the results.

      We have extensively edited the manuscript to better describe our previous computational validations, and we have added new analyses to further evaluate SIMMER. We added an additional validation on an external dataset, an in vitro experimental assessment of SIMMER’s predictions for methotrexate metabolism, two new reactions to the positive control analysis, a false positive rate, and additional comparisons to the two competing methods.

      Need for the user to know both the substrate and the product for a reaction of interest limits the utility of the resource.

      We agree that this is a limitation for the user, but as we show in our Results, relying on substrates alone does not yield appropriate representations of reactions and therefore does not allow for accurate predictions of responsible species/strains and enzymes (i.e., finding True Positives, and confirming associations from previously collected data). We agree that tools requiring only substrates are convenient, but our results show that they are less helpful in finding appropriate metabolism and enzyme predictions. Many studies of biotransformation in the human gut identify the product information or product structure via HPLC, LC-MS, and NMR techniques. In cases where such data was not gathered, or not gathered with enough structural resolution, researchers can use tools such as Biotransformer to make product template predictions before inputting a query to SIMMER. This recommendation is included in the present manuscript’s lines 376–391:

      In instances when DrugBug and MicrobeFDT did make predictions, they suffered from low accuracy (Table 1), which we hypothesized was due to both methods’ reliance on substrate rather than reaction chemistry. Biotransformations involve the relationship between substrate(s), cofactor(s), and an enzyme to yield a particular product(s). As one substrate can exhibit affinity for multiple enzymes, resulting in multiple unique products, sole employment of substrates in a chemical fingerprint does not achieve the resolution necessary to make relevant predictions. To test if SIMMER’s better performance could be attributed to including cofactors and products, we modified our code to run with a chemical representation that includes only the substrate of each positive control reaction. Enzyme prediction accuracy dropped from 88% down to 33%, and EC prediction accuracy dropped from 93% down to 48% (Table 1—source data), supporting the hypothesis that SIMMER’s better performance when compared to DrugBug and MicrobeFDT is due in large part to our using chemical representations that include the full reaction. These results are in line with our previous demonstration that SIMMER clusters enzymatic reaction chemistry only when a full reaction is employed (Figure 2, Figure 2—figure supplement 4).

      Reliance on homology transfer annotation to predict enzyme function; this approach has important, microbiome-relevant, limitations.

      Please refer to our separate Common_Questions.pdf document, Common question 1: Are EC codes sufficient to select enzyme orthologs within an overall class?

    1. Author Response:

      The authors would like to thank the Editors and reviewers for their careful consideration of our article and we express our appreciation for the work required by both Editors and reviewers to study and produce the detailed reviewer reports. We are pleased at the general consensus that our paper is of interest and highlights an important region of the channel for drug-protein interaction. We are also cognizant that the reviewer reports highlight areas where important revisions need to be made to our work before it can be considered fully complete. We will revise the paper according to the comments of the reviewers and submit a new version in the near future which we hope will become the version of record.

    1. Author Response

      We thank the editors and reviewers for their support of our work, as well as their constructive feedback and useful suggestions, which have improved the readability and presentation of the manuscript for a broader audience.

    1. Author Response

      Reviewer 1 (Public Review):

      Fox, Birman, and Gardner use a previously proposed convolutional neural network of the ventral visual pathway to test the behavioral and physiological impact of an attentional gain spotlight operating on the inputs to the network. They show that a gain modulation that matches the behavioral benefit of attentional cueing in a matching behavioral task, induces changes in the receptive fields (RFs) of the model units, which are consistent with previous neurophysiological reports: RF scaling, RF shift towards the attentional focus, and RF shrinkage around the focus of attention. Ingenious simulations then allow them to isolate the specific impact of these RF modulations in achieving performance improvements. The simulations show that RF scaling is primarily responsible for the improvement in performance in this computational model, whereas RF shift does not induce any significant change in decoding performance. This is significant because many previous studies have hypothesized a leading role of RF shifts in attentional selection. With their elegant approach, the authors show in this manuscript that this is questionable and argue that changes in the shape of RFs are epiphenomena of the truly relevant modulation, which is the multiplicative scaling of neural responses.

      Strengths:

      The use of a multi-layer network that accomplishes visual processing, with an approximate correspondence with the visual system, is a strength of this manuscript that allows it to address in a principled way the behavioral advantage contributed by various attentional neural modulations.

      The simulations designed to isolate the contributions of the various RF modulations are very ingenious and convincingly demonstrate a superior role of gain modulation over RF shifts in improving detection performance in the model.

      We thank the reviewer for these supportive comments.

      Weaknesses:

      There is no mention of a possible specificity of the manuscript conclusions in relation to the type of task to be performed. It is conceivable that mechanisms that are not important for detection tasks are instead crucial for a reproduction task, as in Vo et al. (2017).

      We agree that other behavioral tasks may rely on different attentional mechanisms than the ones we have studied here for detection and discrimination and now specifically point this out in the discussion [379-395].

      The manuscript puts emphasis on the biological plausibility of the model, and some quantitative agreements. But at some important points these comparisons do not appear very consistent:

      1) It is unclear what output of the model at each cortical area is to be compared with neurophysiological data. On the one hand, the manuscript argues that a 1.25 attentional factor is consistent with single-neuron results, but here this factor is applied to the inputs into V1 units. When this modulation goes through normalization in area V1, the output of V1 has a 2x gain. Intuitively, one would think that recordings in V1 neurons would correspond to layer V1 outputs in the model, but this is not the approach taken in the manuscript. This needs clarification. Also, note that the 20-40% gain reported in line 287 corresponds to high-order visual areas (V4 or MT), but not to V1, in the cited references. The quantitative correspondence between gain factors at various processing steps in the model and in the data is confusing and should be clearer.

      We agree that making a one-to-one mapping of gain effects measured in neurophysiology and different layers of the CNN is problematic. We therefore have clarified that the introduction of gain at the earliest stages of processing is meant to study how gain propagates through a complex CNN and has downstream effects [49-52 and 410-447], and we have also clarified the various uncertainties in making a one-to-one mapping from the CNN to neurophysiological measurements of gain [410-447].

      2) The model assumes a gain modulation in the inputs to V1. This would correspond to an attentional gain modulation in LGN unit outputs. There is little evidence of such strong modulation of LGN activity by attention. Also in V1 attentional modulation is small. As stated in Discussion (line 295), there is no reason to favor the current model as opposed to a model where the attentional gain is imposed later on in the visual hierarchy (for example V4). If anything, neurophysiology would be more consistent with this last scenario, given the evidence for direct V4 gain control from frontal eye fields (Moore and Armstrong, Nature 2003). The rationale for focusing on a model that incorporates the attentional spotlight on the inputs to V1 should be disclosed.

      We agree that measurements of gain changes with attention appear larger in later stages of visual processing and do not wish to explicitly link the gain changes imposed at the earliest stages of processing in our CNN observer model with changes in input from LGN as we agree this would be unrealistic. Instead, our goal was to examine how gain changes can propagate through complex neural networks and cause downstream effects on spatial tuning properties and the efficacy of readout. We have substantially re-written the manuscript, in particular the introduction [24-38, 49-52] and discussion [441-447] to better describe this rationale. We also now explicitly discuss how our propagated gain test demonstrates exactly the reviewer’s point - that gain can be injected late in the system, rather than at the earliest stages [274-276, 441-447].

      3) The model chosen is the CORnet-z model, but this model does not include recurrent dynamics within each layer. Recurrent dynamics is a prominent feature in the cortex, and there is evidence indicating that attentional modulations operate differently in feedforward and in recurrent architectures (Compte and Wang, Cerebral Cortex 2006). A specific feature of recurrent models is that the attentional spotlight need not be a multiplicative factor (which is biologically complicated) but an additive term before the ReLU non-linearity, which achieves the expected RF modulations (Compte and Wang, 2006). A model with recurrence thus represents another architecture that links gain and shift in a way that has not been explored in this manuscript, and this may limit the generalization of the conclusions (line 205).

      We appreciate the reviewer pointing us toward the Compte paper and we’ve added a discussion of recurrence as an alternate model [410-423].
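
      The contrast the reviewer draws between a multiplicative attentional factor and an additive term before the ReLU non-linearity can be illustrated on a single rectified unit. The Python sketch below is only that: it does not model recurrence, CORnet-Z, or the Compte and Wang network, and the function names, input values, and the additive bias of 1.0 are assumptions chosen for illustration. The gain of 1.25 is the attentional factor mentioned in the review.

```python
def relu(x):
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return max(0.0, x)


def multiplicative_gain(drive, gain):
    """Attention as a gain factor: scale the input drive, then rectify."""
    return relu(gain * drive)


def additive_bias(drive, bias):
    """Attention as an additive term: shift the input drive, then rectify."""
    return relu(drive + bias)


# Hypothetical input drives: one below threshold, two above.
drives = [-0.5, 0.5, 2.0]

scaled = [multiplicative_gain(d, gain=1.25) for d in drives]
shifted = [additive_bias(d, bias=1.0) for d in drives]

print(scaled)
print(shifted)
```

      In this toy case the gain factor leaves a subthreshold input silent and preserves the ratio between suprathreshold responses, whereas the additive term lifts the subthreshold input above zero and changes that ratio. Whether an additive term reproduces the receptive field modulations discussed above depends on the recurrent dynamics, which this sketch leaves out.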

      Reviewer 2 (Public Review):

      This manuscript by Fox, Birman, and Gardner combines human behavioral experiments with spatial attention manipulation and computational modeling (image-computable convolutional neural network models) to investigate the computational mechanisms that may underlie improvements in behavioral performance when deploying spatial attention.

      Strengths:

      • The manuscript is clear and the analyses, modeling, and exposition are executed well.

      • The behavioral experiments are carefully conducted and of high quality.

      • The manuscript takes a creative approach to constructing a “neural network observer model”, that is, coupling an image-computable model to a potential readout mechanism that specifies how the representations might be used for the purposes of behavior. The focused analyses of the model innards (architecture, parameters) provide insight into how different model components lead to the final behavior of the model.

      Thank you for these supportive comments.

      Weaknesses:

      • The overall conclusions and insights gained seem heavily dependent on particular choices and design decisions made in this specific model. In particular, the readout mechanism lacks some critical descriptive details, and it is not clear whether the readout mechanism (512-dimensional representation that reflects summing over visual space) is a reasonable choice. As such, while the computational analyses and results may be correct for this model, it is not clear whether the strong general conclusions are justified. Thus, the results in their current form feel more like exploratory work showing proof of concept of how the issue of attention and underlying computational mechanisms can be studied in a rigorous and concrete computational modeling context, rather than definitive results concerning how attention operates in the visual system.

      Please see below for our response to the issue with readout and conclusions.

      Overall, the work is solidly constructed, but the overall generality and strength of the conclusions require substantial dampening.

    1. Author Response:

      We would like to thank the reviewers for their time, insights, and constructive feedback. We appreciate the recognition by the reviewers of the value and importance of our study. The reviewers also highlighted: the importance of carefully using and interpreting data from small molecule inhibitors due to possible off-target effects, considering inter-study differences in the cardiomyocyte cell trajectories, examining a possible role of PI3K signaling in proliferation and the intriguing yet not fully elucidated role of membrane protrusions in cardiac fusion. We agree with this important feedback. We plan to address these comments and others directly, in detail.

    1. Author Response:

      We thank the reviewers and editors for their careful reading and reviews of our work. We are grateful that they appreciate the value in our experimental approach and results. We acknowledge what we interpret as the major criticism, that in our original manuscript we focused too heavily on the hypothesized role of GABAergic neurons in driving habituation. This hypothesis will remain only indirectly supported until we can identify a GABAergic population of neurons that drives habituation. Therefore, we will revise our manuscript, decreasing the focus on GABA, and rather emphasizing the following three points:

      1. By performing the first Ca2+ imaging experiments during dark flash habituation, we identify multiple distinct functional classes of neurons which have different adaptation profiles, including non-adapting and potentiating classes. These neurons are spread throughout the brain, indicating that habituation is a complex and distributed process. 

      2. By performing a pharmacological screen for dark flash habituation modifiers, we confirm habituation behaviour manifests from multiple distinct molecular mechanisms that independently modulate different behavioural outputs. We also implicate multiple novel pathways in habituation plasticity, some of which we have validated through dose-response studies.

      3. By combining pharmacology and Ca2+ imaging, we did not observe a simple relationship between the behavioural effects of a drug treatment and functional alterations in neurons. This observation further supports our model that habituation is a multidimensional process, for which a simple circuit model will be insufficient. 

      We would like to point out that, in our opinion, there appears to be a factual error in the final sentence of the eLife assessment: “However, the data presented are incomplete and do not show a convincing causative link between pharmacological manipulations, neural activity patterns, and behavioral outcomes”. We believe that a “convincing causative link” between pharmacological manipulations and behavioural outcomes has been clearly demonstrated for PTX, Melatonin, Estradiol and Hexestrol through our dose-response experiments. Similarly, a link between pharmacology and neural activity patterns has also been directly demonstrated. As mentioned in (3), we acknowledge that our data linking neural activity and behaviour is more tenuous, as will be more explicitly reflected in our revised manuscript. Nevertheless, we maintain that one of the primary strengths of our study is our attempt to integrate analyses that span the behavioural, pharmacological, and neural activity levels.

    1. Author Response

      Reviewer #1 (Public Review):

      Rosas et al. studied the mechanism(s) that enabled carbapenem resistance of a Klebsiella isolate, FK688, which was isolated from an infected patient. To identify and characterize this mechanism, they used a combination of multiple methods. They started by sequencing the genome of this strain by a combination of short and long read sequencing. They show that Klebsiella FK688 does not encode a carbapenemase, and thus looked for other mechanisms that can explain this resistance. They discover that both DHA-1 (located on the mega-plasmid) and an inactivation of the porin OmpK36 are required for carbapenem resistance in this strain. By using experimental evolution, it was shown that resistance is lost rapidly in the absence of antibiotic selection, by a deletion in pNAR1 that removed blaDHA-1. Moreover, their results suggested that it is likely that exposure to other antibiotics selected for the acquisition of the mega-plasmid that carries DHA-1, which then enabled this strain to gain resistance to carbapenems by a single deletion.

      The major strength of this study is the use of various approaches, to tackle an important and interesting problem.

      The conclusions of this paper are mostly well supported by data, but one aspect is not clear enough. The description of the evolutionary experiment is not clear. I could not find a clear description of the names of the evolved populations. However, the authors describe strains B3 and A2, but their source is not clear. The legends of the relevant figure (Figure 5) are confusing. For example, the text describing panel B is not related to the image shown in this panel. Moreover, it is shown in panel C (and written in the main text) that the OmpK36+ evolved populations had only translucent colonies, so what is the source of B3(o)?

      We appreciate the point and in response have added a panel to Figure 5 (in the revised paper this is now Fig. 5A) to illustrate the evolutionary experiment and specify that there are two lineages (A and B) with 20 replicates each that, after 200 generations of evolution, give rise to populations of which A2 and B3 are the exemplars characterized.

      We have corrected the legends in Figure 5.

      We now explain (sentence starting on Line 197) that B3(o) is the single isolate of an opaque colony from lineage B3; it is the only such colony that we identified out of 595 colonies observed in the B3 population. B3(o) was sequenced and analysed as a comparator and has some value in that regard, despite being an anomaly.

      Reviewer #2 (Public Review):

      The authors sequenced a clinical pathogen, Klebsiella FK688, and definitively establish the genetic basis of the carbapenem-resistance phenotype of this strain. They also show that the causal mutations confer reduced fitness under laboratory conditions, and that carbapenem sensitivity readily re-evolves in the lab due to the fitness costs associated with the resistance mutations in the clinical isolate. They also establish that subinhibitory concentrations of ceftazidime select for the otherwise deleterious blaDHA-1 gene. Based on this finding the authors speculate that prior beta-lactam selection faced by the ancestors of Klebsiella FK688 potentiated the evolution of the carbapenem-resistance phenotype of this strain. If this hypothesis is true, then prior history of beta-lactam exposure may generally potentiate the evolution of carbapenem resistance.

      Strengths:

      From a technical perspective, the findings in this paper are solid. In addition, the authors establish a simple genetic basis for carbapenem resistance in a clinical strain, which is a valuable and non-trivial finding (i.e. they show that the CRE phenotype in this strain is not an omnigenic trait distributed over hundreds of loci).

      Weaknesses:

      The main weakness of this paper is that the authors draw overly broad conclusions of a conceptual nature from narrow experimental findings. This could be addressed by drawing more modest and narrow implications from the findings.

      1) The title of this paper is "Treatment history shapes the evolution of complex carbapenem-resistant phenotypes in Klebsiella spp." But the authors provide no data on the treatment history of the patient from whom this strain was isolated. Therefore, the authors have no evidence to support their central claim. Indeed, it is completely possible that this strain never faced beta-lactam selection in the past, or that the patient's hypothetical history of beta-lactamase was irrelevant for the evolution of FK688. First, it is completely possible that this is a hospital-acquired infection, such that the history of this strain is due to selection in other contexts in the hospital that have little to do with the patient's treatment history. Second, it is completely possible that this strain (the chromosome anyway) has no prior history of beta-lactamase selection, and that it acquired the megaplasmid containing blaDHA-1 via conjugation from some other strain. In this second hypothetical scenario, it is possible that the fitness cost of the blaDHA-1 gene is not particularly high in a different source strain, but that it has some cost in the FK688 strain that it was isolated from. And of course, fitness costs in the human host could be very different than fitness costs in the laboratory, where strains are evolving under strong selection for fast growth. And given the benefit of resistance, it's clear that this strain has a strong fitness advantage over faster-growing sensitive strains in the context of the source patient under antibiotic treatment.

      My general point here is that the broad claims made about patient history or prior history shaping the evolution of this strain are largely indefensible because there is no data here to make solid inferences about how prior history shaped the evolution of this strain.

      We appreciate the point and have changed our title and scaled back the strength of our conclusions regarding patient treatment history.

      2) Historical contingency. The authors claim that their work shows how historical contingency shapes the evolution of resistance. One problem with this claim is that it is trivial- this is only a significant claim if the reader believes that prior history is not important in the evolution of antibiotic resistance, which is a straw-man null hypothesis, to mix a couple metaphors. To be more concrete, clearly strain background (prior history) matters-eliminating the plasmid with the resistance gene eliminates resistance. But that is not particularly surprising, given the past 50 years of evolutionary microbiology literature on plasmids and resistance. By contrast to this work, the major contribution of papers that examine the role of historical contingency in evolution (i.e. various Lenski papers) is that those works quantitatively measure the role of history in comparison to other factors (chance, adaptation). Since this work is a deep dive into a single clinical isolate, the data presented here do not and cannot shed light on the role of historical contingency in the emergence of this strain. The authors' claims about the prior history that led to the CRE phenotype are reasonable- but are fundamentally speculative. I have nothing against speculation, as long as it is clear what claims are speculative, and what are concrete implications. But the authors frame these speculative claims as concrete implications of their findings.

      This is a fair point. We have reframed the study to not focus on historical contingency.

      As the reviewer points out, any discussion about historical contingency in the context of evolution is trivial in one sense. One of the reasons that the studies of Lenski and Blount provide new insights into the role of history in evolution is that they knew the history of their populations (at least for the number of generations since the LTEE began), and had a high degree of control and understanding of the growth conditions where the trait evolved. As such, they could go back to time points before the trait evolved, and then repeat the evolution experiment many times, in the exact same environment where the trait originally evolved, and then count how often they observed the evolution of that trait.

      Here we study a clinical isolate, and have less understanding of the evolutionary history of our strain. While we cannot re-evolve carbapenem resistance in the exact same environment experienced by the FK688 strain, we did test the capacity for the wild type, and two possible intermediate genotypes, to evolve carbapenem resistance in growth media with carbapenem.

      Altogether, we have comprehensive evidence for the genetic cause of carbapenem resistance: the BLA1 plasmid + OmpK36. We showed, by experiment, that it is much more likely for carbapenem resistance to evolve in an FK688 strain that carries the BLA1 plasmid than in an FK688 strain that did not carry the plasmid, even if it had acquired the OmpK36 mutation. We think this is not trivial because a significant proportion of all of the carbapenem-resistant Klebsiella that have been isolated are non-carbapenemase CRE. Our reconstruction provides a plausible explanation for why non-carbapenemase CRE evolve: they are evolving from strains that have already been treated with a non-carbapenem beta-lactam drug and have thereby selected for the presence of a beta-lactamase (that is not a carbapenemase).

      So, while we have scaled back the strength of our claims, we do think that our results can provide some insight into how the evolutionary history of a pathogen can shape the molecular path to antibiotic resistance.

      3) The authors claim that "[This work] suggests that the strategic combinations of antibiotics could direct the evolution of low-fitness, drug-resistant genotypes". I suppose this is true, but I also think this is a stretch of an implication given these findings. To be blunt, while I suppose it's better to have costly resistance variants that re-evolve sensitivity than to have low-cost high-resistance strains circulating, I think the patient's family would probably disagree that the evolution of a low-fitness drug-resistant genotype was good or strategic in the clinical context, even if better from a public health perspective. Low-fitness drug-resistant strains are just as lethal under clinical antibiotic concentrations!

      Thank you for the comment. We see how this sentence could be seen as too strong a conclusion and have rewritten the last sentence of the DISCUSSION (line 351):

      “These results show how an individual’s treatment history might shape the evolution of AMR, and should be taken into consideration in order to explain the evolution of non-carbapenemase CRE”

      The authors do show the plausibility of their hypothesis/model that prior beta-lactam selection is sufficient to potentiate the evolution of carbapenem-resistance (by the additional ompK loss-of-function mutation). I think those findings are very nice. But the authors undermine their results by extrapolating too far from their data. Hence, I think narrowing the scope of the implications would improve this paper.

      In addition to narrowing the scope of the implications as written, I also would like to add that there may be other ways of framing this paper (other than historical contingency) that may make the significance of this work more apparent to a broader audience. This may be worth considering during the revision process.

      We have taken these suggestions on board and have re-framed the final sentences of the ABSTRACT, INTRODUCTION and DISCUSSION accordingly. Specifically, we have removed reference to historical contingency and instead have reframed our experiments as providing a genetic and evolutionary explanation for an interesting and concerning cause of antibiotic resistance – non-carbapenemase CRE.

    1. Author Response

      Reviewer #1 (Public Review):

      During the height of the Covid19-pandemic, there was great and widely spread concern about the lowered protection the screening programs within the cancer area could offer. Not only were programs halted for some periods because of a lack of staff or concern about the spreading of SARS CoV2. When screening activities were upheld, participation decreased, and follow-up of positive test results was delayed. Mariam El-Zein and coworkers have addressed this concern in the context of cervical screening in Canada, one of the rather few countries in the world with well organized, population-based, although regionalized, cervical screening program.

      Comment 1: Despite the existence of screening registries, they choose to do this in form of a survey on the internet, to different professional groups within the chain of care in cervical screening and colposcopy. The reason for taking this "soft data" approach is somewhat diffuse.

      We are happy to provide a counterargument to the reviewer’s concern about the “soft data” approach. Our unit – McGill’s Division of Cancer Epidemiology – is a major stakeholder in policymaking and cervical screening guideline development in Canada. It is one of the components in a McGill Task Force on COVID-19 and Cancer that has been widely engaged in assessing the pandemic’s impact on the entire spectrum of cancer control and care (examples: PMID: 33669102, PMID: 34843106). Canada is a country of continental size, and during the pandemic even travel between provinces was interrupted. It is only via a web-based survey that one could have captured the required information. We took advantage of our unit’s credibility and stature to secure a substantial response to the survey, which elicited a high level of detail.

      The survey questionnaire instrument was thoughtfully developed with input from Canadian experts who are active in the field of cervical cancer prevention and involved in clinical care to comprehensively formulate informative questions (and practical, reasonable responses) underpinning each of the themes covered. Of note, some of these coinvestigators, having executive roles in relevant clinical professional bodies, advised our team on the logistics of circulating the survey to members. The administration of the survey was coordinated with the pertinent societies. Our aim was to provide an overall portrait across Canada of the extent of the harms to cervical cancer screening and treatment processes at the beginning of the COVID-19 pandemic (specifically a snapshot from mid-March to mid-August 2020), as perceived by professional groups in multiple health disciplines.

      Indeed, as the reviewer mentioned, there are fully (i.e., for Saskatchewan) and partially (i.e., for British Columbia, Alberta, Manitoba, Ontario) organized cervical cancer screening programs in Canada in addition to opportunistic programs (i.e., for Northwest Territories, Yukon, Nunavut, Quebec). The Canadian Partnership Against Cancer also collects information on cervical cancer screening programs and/or strategies across Canada. Using data from these different sources enables a quantitative assessment of the impact of the pandemic on cervical cancer screening, but this was not the research methodology used; the survey approach was our research strategy as we attempted to collect responses from all provinces and territories, regardless of the different screening programs and modalities implemented across the country, and including regions that do not have an official screening program.

      Since the effects of the COVID-19 pandemic will stay with us for years to come, our research team is also examining – using a “hard data” approach via administrative healthcare datasets – the long-term effects that will accrue on cervical cancer morbidity and mortality from the interruptions and delays in screening processes and other activities in the process of care. A discussion of this is, however, beyond the scope and objectives of our manuscript.

      No modifications were made in the manuscript to address this comment.

      Comment 2: The authors claim they want to "capture modifications". However, the suggestions that come from this study are limited and are submitted for publication 2 years after the survey when the height of the pandemic has passed long since, and its burden on the screening program has largely disappeared. The value of the study had been larger if either the conclusions had been communicated almost directly, or if the survey had been done later, to sum up the total effect of the pandemic on the Canadian cervical screening program.

      We appreciate this comment. As part of our commitment to transparency, we now plainly acknowledge that considerable time (1.5 years) has elapsed between the time the survey data were available (March 2021) and manuscript submission (September 2022) for publication in the special issue, curated by eLife, on the impact of the COVID-19 pandemic on cancer prevention, control, care and survivorship. However, we also argue that this lag time is reasonable given the undertaking of data management, analysis, and reporting of a large amount of data, including the synthesis of replies to open-ended questions. We also took this opportunity to expose two graduate students to the research process.

      Changes made: Page 15, Lines 437-440.

      In terms of assessing the total effect of the pandemic on the Canadian cervical screening program, this work is in progress, but not within the current manuscript. The PubMed references mentioned above show examples of directions we are taking. Also, as mentioned in our response to Comment 1, we will use data from administrative healthcare datasets (medical and drug claims, hospitalization data, death registry data) and hospital cancer registries (clinical characteristics such as cancer stage, grade, and biomarkers) on cancer patients diagnosed in Quebec between 2010 and 2026. Using these datasets, we intend to compare the pre- and post-pandemic eras in order to analyze changes in patterns of cancer care, cancer prognosis, and survival, including shifts in stage at diagnosis.

      Comment 3: Another major problem with this study is the coverage. The results of persistent activities to get a large uptake is somewhat depressing although this is not expressed by the authors. 510 professionals filled out the survey partially or in total. 10 professions were targeted. The authors make no attempt to assess the coverage or the validity of the sample. They state the method used does not make that possible. But the number of family practicians, colposcopists, cytotechnicians, etc. involved in the program should roughly be known and the proportion of those who answered the survey could have been calculated. My guess is that it is far below 10%.

      There were no extensive additional efforts to increase the participation rate, apart from follow-up reminder emails to complete the survey, which is standard practice followed by the societies that administered the survey to their constituents. We respectfully disagree with the reviewer concerning coverage being a major limitation, particularly in view of the general difficulty of securing a high response rate in a survey such as ours, at a time like the middle of the pandemic. Although computing a classic non-response rate may seem easy, information on the “population of interest” (i.e., the number of professionals approached, in addition to those reached through the advertisement of the survey on social media platforms) is not available to estimate the extent of non-response. Even if the response rate is below 10% as suggested by the Reviewer, our survey and findings should be considered on their merits; the target population was involved in the survey design to ensure the validity of coverage of the questions along the continuum of care in cervical cancer screening and treatment. In addition, we followed the Checklist for Reporting Results of Internet E-surveys to inform the design, conduct, and reporting of our survey research.

      Changes made: Page 14, Lines 421-425.

      Comment 4: The national distribution seems skewed despite the authors boosting its pan-Canadian character. I am just faintly familiar with the Canadian regions, but, as an example, only 2 replies from Quebec must question the national validity of this survey.

      We apologize for this typographical error in Table 1; many cells were accidentally shifted down (the last couple of provinces had the wrong numbers). There were actually 21 survey respondents from the province of Quebec. This has now been corrected.

      Changes made: Page 19.

      Comment 5: The result section is dominated by quantitative data from the responses to the 61 questions. All questions and their answers are tabulated. As there is no way to assess the selection bias of the answers these quantitative results have no real value from an epidemiological standpoint.

      Indeed, we opted to provide the reader with descriptive results on all the questions and sub-questions that were asked, with explicit annotation to each question number and clear reference to the formulated question by appending the full survey instrument to the manuscript. We designed the survey as a descriptive and not an analytical study, contrary to traditional epidemiology studies that investigate a specific exposure-outcome relationship.

      Changes made: Page 12, Lines 366-368.

      In the spirit of other papers in the special issue on COVID-19 and cancer, curated by eLife, we measured the impact of the pandemic on the process of care like many other eLife articles did. The eLife collection is a snapshot of a period when not only was cancer control disrupted, but the ability to conduct valid research was also severely curtailed. The reviewer will likely agree that our paper is not the only one to suffer from these methodological shortcomings. Yet, taken together, the gestalt value of the eLife collection will inform epidemiologic modellers for the next long while on how this period affected cancer control. We are happy to contribute with this paper a few more pieces of the puzzle, adding to that which eLife published for many other jurisdictions.

      Comment 6: The replies to the open-ended questions are summarized in a table and in the text. The main conclusion of the content analysis of the answers to the direct questions, and one of the main conclusions of the study, is that the majority favors HPV self-sampling in light of the pandemic. However, this not-surprising view is taken by only 80 responders while almost as many (n=60) had no knowledge about HPV self-sampling.

      Another aim of our survey was to identify the windows of opportunity that were created by the pandemic and pinpoint positive aspects that could enable the transformation of cervical cancer screening (i.e., HPV primary based screening and HPV self-sampling). We found that 33% of respondents were of the opinion that the pandemic context could facilitate the implementation of self-sampling and that 50.1% were in favor of the implementation of this new screening practice (described in Results Theme 1: Screening Practice and Stable 5).

      Changes made: Page 4, Lines 93-97.

      The reviewer is correct that in the open-ended sub-question of Question 23 “Are you in favor of the implementation of HPV self-sampling as an alternative screening method in your clinical practice?”, 60 respondents justified their answer to the nominal question by their lack of familiarity with HPV self-sampling, compared to 80 who shared positive comments. However, we would like to draw the reviewer’s attention to the responses to the nominal part of the question in Stable 5. Of those who answered “Maybe”, 47.1% said that they were not familiar enough to express a favorable or unfavorable opinion. We would also like to draw the reviewer’s attention to the results of our cross-tabulation of profession and the question of relevance (described in Results Theme 1: Screening Practice). The lack of familiarity with novel screening practices such as self-sampling can be explained by the fact that most (75.0%) of those who expressed these views were primary healthcare professionals, and not secondary and tertiary specialists.

      Changes made: Page 12, Lines 344-346

      Comment 7: The authors conclude that their study identified the need for recommendations and strategies and building resilience in the screening system. No one would dispute the need, but the additional weight this study adds, unfortunately, is low, from a scientific standpoint.

      Although no one would dispute the need, as the reviewer suggests, as epidemiologists we needed to collect this empirical evidence. We urge the reviewer to consider that this article contributes to a more complete picture of the collective process of discovery of the impact of the pandemic initiated by eLife’s special issue.

      No modifications were made in the manuscript to address this comment.

      Comment 8: The conclusion I draw from this study is that the authors have done a good job in identifying some possible areas within the Canadian screening programs where the SARS-Cov2 pandemic had negative effects and received some support for that in a survey. Furthermore, they listed a few actions that could be taken to alleviate the vulnerability of the program in a future similar situation, and received limited support for that. No more, no less.

      We thank the Reviewer for the positive feedback provided in the first part of the comment. As for the rest, we believe we have addressed the reviewer’s concerns above.

      Reviewer #2 (Public Review):

      The study aimed to provide information on the extent to which the COVID-19 pandemic impacted cervical cancer (CC) screening and treatment in 3 Canadian provinces. The survey methodology is appropriate, and the results provide detailed descriptive statistics by province and type of practice. The results support the authors' conclusions. This evidence together with data gathered from other national surveys may provide baseline data on the impact of the pandemic on CC outcomes such as late-stage diagnoses and CC treatment outcomes due to these delays.

      We are flattered by the Reviewer’s overall assessment of our manuscript.

      Comment: This study relies mostly on descriptive statistics and open-ended questions that provide details about what CC screening and treatment procedures were delayed. It is unclear how the reader would use the results to affect current or future practice.

      As mentioned in our reply above to a similar comment raised by reviewer 1, our overarching aim was to portray in a purely descriptive manner the negative and positive impacts of the COVID-19 pandemic on cervical cancer screening-related activities, as perceived by healthcare professionals. Please refer to arguments above.

      Changes made: Page 12, Lines 366-368; Page 15, Lines 437-440.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the authors set out to determine the degree to which early language experience affects neural representations of concepts. To do so, they use fMRI to measure responses to 90 words in adults who are deaf. One group of deaf adults (n=16) were native signers (and thus had early language exposure); a second group (n=21) was exposed to sign language later on. The groups were relatively well-matched in other respects. The primary finding was that the high dimensional representations of concepts in the left lateral anterior temporal lobe (ATL) differed between native and delayed signers, suggesting a role for early language experience in concept representation.

      The analyses are carefully conducted and reflect a number of thoughtful choices. These include the "inverted MDS" method for constructing semantic RDMs, a normal hearing comparison group for both behavioral and fMRI data, and care taken to avoid bias in defining functional ROIs. And, comparing early and delayed signing groups is a clever way to study the role of early language experience on adult language representations.

      We greatly appreciate the reviewer’s positive evaluation and constructive comments on our study.

      One interesting result that I struggled to put in a broader context relates to the disconnect between behavioral and neural results. Specifically, the behavioral semantic RDMs (Figure 1a) did not differ between any of the groups of participants. This suggests that the representations of the 90 concepts are represented similarly in all of the participants. However, the similarity of the neural RDMs in left lateral ATL differs between the native and delayed signing groups (but not in other regions). Given the similarity of the behavioral semantic RDMs, it is unclear how to interpret the difference in left lateral ATL representations. In other words, the neural differences in left ATL do not affect behavior (semantic representation). The importance of the differences in neural RDMs is therefore questionable.

      Thank you for this comment. In the Revision we have added explicit discussions about this important issue of the relationship between the behavioral and neural profiles for semantics:

      Introduction (pages 4-5): “(previous) studies have reported little effects on semantics behaviors, including semantic interference effects in the picture-sign paradigm (Baus et al., 2008), scalar implicature (Davidson and Mayberry, 2015), or accuracy scores of several written word semantic tasks (e.g., synonym judgment) (Choubsaz and Gheitury, 2017). However, as shown by the color knowledge in the congenitally blind studies (e.g., Wang et al., 2020), similar semantic behaviors may arise from (partly) different neural representations. Semantic processing is supported by a multifaceted cognitive system and a complex neural network entailing distributed semantic regions (Bi, 2021; Binder and Desai, 2011; Lambon Ralph et al., 2017; Martin, 2016), and thus focal neural changes may not necessarily lead to semantic behavioral changes. Neurally, neurophysiological signatures assumed to reflect semantic processes showed incongruent effects across studies: N400 effects in the semantic violation of written sentences were not affected (Skotara et al., 2012), whereas M400 in the picture-sign matching task showed atypical activation patterns (reduced recruitment of left fronto-temporal regions and involvement of right parietal and occipital regions) (Ferjan Ramirez et al., 2016, 2014; Mayberry et al., 2018). It remains to be tested whether and where delayed L1 acquisition affects how semantics are neurally represented, using imaging techniques with higher spatial resolutions.”

      Discussion (pages 17-18): “Notably, different from phonological and syntactic processes, where both visible behavioral underdevelopment (e.g., Caselli et al., 2021; Cheng and Mayberry, 2021; Mayberry et al., 2002) and brain functional changes (Mayberry et al., 2011; Richardson et al., 2020; Twomey et al., 2020) were observed, for semantics we only observed brain functional changes in dATL but no visible behavioral effects. Consistent with the literature where deaf delayed signers did not show differences to controls in semantic interference effects in the picture-sign paradigm (Baus et al., 2008), scalar implicature (Davidson and Mayberry, 2015), or N400 measures (Skotara et al., 2012), we did not observe visible differences in terms of semantic distance structures (Figure 1a) or reaction time of lexical decision and word-triplet semantic judgment (Supplementary file 1). As reasoned in the Introduction, this seeming neuro-behavior discrepancy might be related to the multifaceted, distributed nature of the cognitive and neural basis of semantics more broadly. The general semantic behavioral tasks we employed could be achieved with representations derived from multiple types of experiences, supported by highly distributed neural systems (e.g., (Bi, 2021; Binder and Desai, 2011; Lambon Ralph et al., 2017; Martin, 2016), including those not affected by the delayed L1 acquisition in regions beyond the dATL. This finding invites future studies to specify the exact developmental mechanisms in the left dATL (Fu et al., 2022; Unger and Fisher, 2021) and to uncover semantic behavioral consequences related to the functionality of this area.”

      An important point is that, if I understand correctly, the semantic space is defined by the 90 experimental items. That is, behavioral RDMs were created by having normal hearing participants arrange 90 items spatially, and neural RDMs were created by comparing patterns of responses to these 90 experimental items. This 90-dimensional space is thus both (a) lower dimensional than many semantic space models that include hundreds of directions and (b) constrained by the specific 90 experimental items chosen. On the one hand, this seems to limit the generalizability of the findings for semantic representations more broadly.

      Indeed, for the RDM the spaces were constructed by the relations among the 90 items, as is the standard practice for current RSA analyses. Regarding the dimensionality issue, we would like to clarify that although the space is a 90 x 90 matrix, the semantic distance for each pair was obtained by the subjects’ ratings, i.e., the psychological space, which is likely to be high-dimensional in nature. That is, we compressed the potentially high-dimensional psychological construct into one measure to construct the 90 x 90 matrix. If we understood correctly, the semantic space models with hundreds of directions that the reviewer referred to are various types of embedding and/or distributional models. There, although each word is projected onto a high-dimensional vector, the distance for each pair is still extracted (e.g., by cosine similarity) to construct the cross-item similarity matrix for RSA. Regarding the generalization of the findings across items, we greatly appreciate this concern, and indeed that was one of the reasons why we extracted the categorical structure based on the clustering of the items (see also response to the next Comment). We also examined the univariate abstractness contrast, which looked at the broad categorical effects rather than specific items. We have made clarifications accordingly in the Revision to address these concerns (page 8).
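
      To illustrate the point about embedding models, the following is a minimal sketch (not the pipeline used in the study) of how high-dimensional word vectors are still reduced to one distance per item pair before RSA. The toy vectors, the two-category model RDM, and all function names are hypothetical, and Pearson correlation is used here only for brevity; rank-based correlations are also common in RSA.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def build_rdm(vectors):
    """Collapse each pair of vectors into a single dissimilarity value."""
    n = len(vectors)
    return [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

def upper_triangle(rdm):
    """Keep each unordered item pair once, skipping the diagonal."""
    n = len(rdm)
    return [rdm[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical 3-dimensional "embeddings" for four items in two categories.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
model_rdm = build_rdm(embeddings)

# Hypothetical categorical RDM: 0 within a category, 1 across categories.
categorical_rdm = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

# RSA score: agreement between the two RDMs over all unique item pairs.
rsa_score = pearson(upper_triangle(model_rdm), upper_triangle(categorical_rdm))
print(round(rsa_score, 3))
```

      Whatever the dimensionality of the underlying vectors, the RDM for n items always has n(n-1)/2 unique entries, which is why a 90-item design yields a 90 x 90 matrix in both the rating-based and the embedding-based case.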

      The logic behind using a categorical semantic RDM (e.g., Figure 2a) was not clear. The behavioral semantic RDMs (Figure 1a) clearly show gradations in dissimilarity, particularly for the abstract categories. It would seem that using the behavioral semantic RDM would capture a more accurate representation of the semantic space than the categorical one.

      Thank you for this suggestion. We opted for the categorical structural similarity based on the clustering analyses to boost signal and to allow for better generalization across items (i.e., along the categorical structure). Agreeing with the reviewer that such an approach may lose the important graded space especially for the abstract items, we added an analysis using continuous semantic distances specifically focused on the abstract items (page 10):

      “1) Types of semantic distance measures: While semantic categories for concrete/object words are robust and well-documented, the semantic categorization within the abstract/nonobject words is much fuzzier and remains controversial (Catricalà et al., 2014; Wang et al., 2021). The behavioral semantic RDM in Figure 1a indeed shows gradations in dissimilarity for abstract/nonobject words. We thus checked the two groups’ semantic RDMs using the continuous behavioral measures and further examined whether group differences in the left dATL were affected by the types of semantic distance (categorical vs. continuous) being used for abstract/nonobject words. The two deaf groups showed comparable similarities to the hearing benchmark (by correlating each deaf subject’s RDM with the group-averaged RDM of hearing subjects, Welch’s t23.0 = -0.12, two-tailed p = .90). RSA was performed by correlating each deaf subject’s neural RDM in the left dATL with these two types of semantic RDMs. Significant group differences were observed (Figure 3), for both the categorical RDM (Welch’s t31.0 = 3.06, two-tailed p = .005, Hedges’ g = 0.98) and the continuous behavioral semantic RDM (Welch’s t36.7 = 2.47, two-tailed p = .018, Hedges’ g = 0.76), with significant semantic encoding in dATL observed in both analyses for native signers (one-tailed ps < .003) and neither for delayed signers (one-tailed ps > .42). These results indicate that the reduced dATL encoding of abstract/nonobject word meanings induced by delayed L1 acquisition was reliable across semantic distance measures.”

      As the reviewer suggested, we could also carry out RSA using the 90-word behavioral semantic RDM. We did observe similar group differences with this RDM, with delayed signers showing a trend of semantic encoding reduction in the left dATL relative to native signers (native signers, mean (SD): 0.019 (0.023); delayed signers, mean (SD): 0.006 (0.022), Welch’s t31.5 = 1.78, two-tailed p = .085; a delayed signer was excluded from this analysis for being an outlier beyond 3 standard deviations). It appears that the behavioral semantic RDM yielded smaller effect sizes in group differences than the categorical RDM, but the ANOVA (the within-subject factor - RDM-type: categorical, behavioral; the between-subject factor – group: native, delayed) revealed no significant effects of RDM-type or its interaction with the group (ps > .71), but a significant main effect of group (F(1,36) = 9.19, p = .004). The seemingly weaker group differences using the behavioral semantic RDM should not be over-interpreted.
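
      For readers less familiar with the statistics reported above, here is a minimal sketch of how a Welch's t statistic (with its fractional Welch-Satterthwaite degrees of freedom) and a Hedges' g can be computed for two independent groups. The input values are hypothetical and are not the study data; the small-sample correction shown is one common approximation, and we assume rather than know that it matches the variant used in the manuscript. The p-value is omitted because it requires a t-distribution routine that is outside the standard library.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    # Squared standard error contributed by each group (sample variance / n).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def hedges_g(a, b):
    """Cohen's d with pooled SD, scaled by an approximate small-sample correction."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    d = (mean(a) - mean(b)) / math.sqrt(pooled_var)
    correction = 1.0 - 3.0 / (4.0 * (na + nb) - 9.0)
    return d * correction

# Hypothetical per-subject RSA scores for two groups (illustration only).
group_a = [1.0, 2.0, 3.0, 4.0]
group_b = [2.0, 3.0, 4.0, 5.0]

t_stat, dof = welch_t(group_a, group_b)
g = hedges_g(group_a, group_b)
print(round(t_stat, 3), round(dof, 1), round(g, 3))
```

      With unequal group sizes or variances the degrees of freedom fall below n1 + n2 - 2, which is why values such as 31.5 or 36.7 appear as the subscript of t.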

      Reviewer #2 (Public Review):

      The authors investigated patterns of fMRI activation for familiar words in two groups of deaf people. One "language rich" group received exposure to sign from birth, whereas the "language poor" group included kids born to hearing parents who had limited exposure to language during the first few years of life. The primary findings involved group differences in BOLD activation patterns across different areas of interest within the semantic network when participants made intermittent 1-back category judgments for words appearing in succession.

      There was much to be liked about this study, including the rigor of the methods and the novel contrasts of two deaf samples. These strengths were balanced by a number of questions about the assumptions and theoretical interpretations underlying the data. I will elaborate on the major points in the paragraphs to follow, but briefly, the ways in which the authors are framing critical period constraints in language fundamentally differ from the standard nativist perspectives (e.g., Chomsky, Lenneberg). The assumptions of what constitutes a deprivation model require further justification and perhaps recasting to avoid unnecessary stigma (i.e., this reviewer was uncomfortable with the assertion that being born deaf to hearing parents by default constitutes deprivation). The introduction lacked principled hypotheses that motivated the choice of comparing abstract and concrete words, and potential accounts of group differences were underdeveloped (e.g., how do parents in China typically react to having a deaf child, and what supports are in place for preventing language deprivation? Are newborn infants universally screened for hearing loss in China? The answers to these questions might help the readers to understand why/how deaf children in this circumstance might experience deprivation).

      We appreciate the reviewer’s positive evaluations and constructive comments on our study. We have revised the manuscript substantially in light of these comments (see below).

      References to critical periods require a bit more elaboration with respect to lexical-semantic vs. semantic acquisition. The nature of the critical period in language acquisition remains controversial with respect to its constraints. Lenneberg and Chomsky speculated that the limit of the critical period for language acquisition was about puberty (13ish years of age). This is much older than the deaf sample tested here so arguments about aging out of the critical period at least for language acquisition need more nuance. Another issue relates to learning semantic mappings vs. learning language as falling under the same critical period umbrella. This seems highly unlikely as semantic acquisition in early childhood is aided by linguistic labeling but would likely occur in parallel even in the context of language deprivation. Much of the prior literature on critical periods and nativist approaches to language development has focused on syntactic acquisition and elements such as recursion rather than a mapping of symbols to conceptual referents. This makes the critical period group comparison somewhat tenuous because what you are really interested in is a critical period for word meaning acquisition not the more general case of syntactic competency.

      The point above is highlighted in the following statement underlying one of the primary assumptions of the study:

      Pg. 3, "Here, we take advantage of a special early-life language-deprivation human model: individuals who were born profoundly deaf in hearing families and thus had very limited natural language exposure (speech or sign) during the critical period of language acquisition in early childhood"

      "hypofunction of the language system as a result of missing the critical period of language acquisition" (pg 3), same critique as previous - the critical period window is thought to be 13ish years old.

      There are a couple of problems with this assertion/assumption. Although it is true that most children who are born deaf have hearing parents, it is not justifiable to label this condition an early-life deprivation model. Hearing parents who are extremely motivated to learn sign language and pursue related language enrichment strategies can successfully offset many of these effects. Similarly, it is not inconceivable that a deaf child born to a deaf parent might be neglected or abandoned without the benefit of early sign exposure. My argument here is that classifying deaf children born to hearing parents as automatically 'language deprived' is potentially both stigmatizing and scientifically unjustified.

      We originally used the term “language deprivation” because it has been recently advocated in the deaf field mainly to increase society’s awareness of the risks of language deprivation and the lifelong impact that deaf and hard-of-hearing children face (e.g., Hall, 2017, Maternal and Child Health Journal; Lillo-Martin & Henner, 2020, Annual Review of Linguistics). In the current context, we agree with the reviewer that the “early-life deprivation” model may not precisely describe the language acquisition condition of delayed signers. Indeed, for some of the delayed subjects in our study, their hearing parents actively tried to provide additional exposure to signs (via preschool special education programs; learning signs by themselves) or speech (via hearing aids). In the revision, we avoided the term “language deprivation” and used the terms “subjects with varying amounts and qualities of early language exposure” or “delayed L1 acquisition” to more precisely describe our experimental manipulation throughout the revised manuscript.

      We fully agree with the reviewer that the “critical period” of language acquisition is too much of an umbrella term, which may be taken in the literature to refer to critical periods for different, specific aspects of cognitive and/or neural development. In the revision we avoided using this term to reduce ambiguity. Instead, we now make explicit throughout the specific processes being discussed (phonology, syntax, semantics). The effects of early language experience (reduced in delayed L1 acquisition) on the behavioral and neural patterns relating to phonology, syntax, and semantics are now elaborated and discussed separately and explicitly in both the Introduction and Discussion (pages 3-4, 17-18).

      Regarding the potential nonlinguistic socio-environmental differences (e.g., coping strategies after deafness awareness), we have added further clarifications (page 15): “Notably, routine nation-wide neonate hearing screening in China did not start until 2009, years after the early childhood of our participants (born before 2000), and some hearing parents may nonetheless try to give deaf children additional aids of exposure to signs (via preschool special education programs) or speech (via hearing aids). Critically, our positive results of the robust group differences in dATL suggest that early homesign/aid measures and later formal education for sign and written language experiences are insufficient for typical dATL neurodevelopment; the full-fledged language experience during early infancy and childhood (before school age) plays a necessary role in this process.” Relevant information has also been added in the Method/Result sections.

      Pg. 6 "It should be noted that the neural semantic abstractness effect does not equate with language-derived semantic knowledge, as it might arise from some nonverbal cognitive processes that are more engaged in abstract word processing (Binder et al., 2016)." - I had great difficulty understanding what this meant.

      We have revised this sentence as follows: “While the abstractness effect has often been used to reflect linguistic processes (e.g., (Wang et al., 2010)), “abstractness” is not a single dimension and instead relates to both linguistic and nonlinguistic (e.g., emotion) cognitive processes (Binder et al., 2016; Troche et al., 2014; Wang et al., 2018).” (page 11)

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors present a method for discovering response properties of neurons, which often have complex relationships with other experimentally measured variables, like stimuli and animal behaviors. To find these relationships, the authors fit neural data with artificial neural networks, which are chosen to have an architecture that is tractable and interpretable. To interpret the results, they examine the first- and second-order approximations of the fitted artificial neural network models. They apply their method profitably to two datasets.

      The strength of this paper is in the problem it is attempting to solve: it is important for the field to develop more useful ways to analyze and understand the massive neural datasets collected with modern imaging techniques.

      The weaknesses of this paper lie in its claims (1) to be model free and (2) to distinguish the method from prior methods for systems identification, including spike triggered averaging and covariance (or rather their continuous response equivalents). On the first claim, the systems identification methods are arguably a substantially more model-free approach. On the second claim, this reviewer would require more evidence that the presented approach is substantially different from or an improvement on systems identification methods in common use applied directly to the data.

      We thank the reviewer for carefully engaging with the manuscript and believe that our revisions address these points of critique both through novel analysis and through clarifications.

      First claim: We fully agree that systems identification approaches are in theory truly model-free while MINE imposes constraints through the chosen architecture. However, our new analysis comparing MINE to direct fitting of the kernels of a Volterra expansion highlights that this is not really the case in practice. In order to obtain good fits, the model-free-ness has to be substantially reduced by imposing constraints on the degrees of freedom. We quantify this reduction in Figure S3 and directly compare it to the effective degrees of freedom of the CNN. Reducing degrees of freedom is also a theme that can be found throughout the literature on systems-identification, especially when the analysis does not involve Gaussian white noise as input stimuli. We therefore stand by our claim that MINE is “essentially model-free” in the sense that it does not rely on defining a model a-priori much like systems identification. And we also clarify our choice of calling the method “model-free” in the introduction where we state: “While the architecture and hyper-parameters of the CNN used by MINE do impose constraints on which relationships can be modeled, we consider the convolutional network “model-free” because it does not make any explicit assumptions about the underlying probability distributions or functional forms of the data.”

      Second claim: We believe that our new analysis for the comparison with the Volterra expansion approach of systems identification addresses this point. By directly fitting Volterra kernels instead of relying on spike-triggered analysis we put the comparison on a more equal footing than our previous STA/STC exposition. We can show that while the methods are equivalent for Gaussian white noise stimuli, MINE is superior for highly correlated input stimuli. We show that imposing constraints on the regression used to identify the Volterra kernels can overcome this gap to a large extent, but MINE still produces a model that has higher predictive power and MINE also does more than extracting receptive fields. We are also not entirely sure to what extent Wiener/Volterra analysis has been applied to calcium imaging data. While there is a vast body of literature on systems identification, there is little evidence that it has been widely applied to data in which both inputs and outputs are highly correlated across time, such as calcium imaging experiments using naturalistic stimuli. While this doesn’t have to mean anything in and of itself it might point to the fact that this analysis is not easily accessible and requires ample tuning. These are precisely two problems that MINE aims to overcome. We now more explicitly state in the manuscript that we believe this accessibility to be one of the core strengths of MINE.

      Reviewer #2 (Public Review):

      This paper describes a relatively unbiased and sensitive method for identifying the contributions of different behavioral parameters to neural activity. Their approach addresses, in an elegant way, several difficulties that arise in modeling of neuronal responses in population imaging data, namely variations in temporal filtering and latency, the effects of calcium indicator kinetics, interactions between different variables, and non-linear computations. Typical approaches to solving these problems require the introduction of prior knowledge or assumptions that bias the output, or involve a trade-off between model complexity and interpretability. The authors fit individual neuron's responses using neural network models that allow for complex non-linear relationships between behavioral variables and outputs, but combine this with analysis, based on Taylor series approximations of the network function, that gives insight into how different variables are contributing to the model.

      The authors have thoroughly validated their method using simulated data as well as showing its applicability to example state of the art data sets from mouse and zebrafish. They provide evidence that it can outperform current approaches based on linear regression for the identification of neurons carrying behaviorally relevant signals. They also demonstrate use cases showing how their approach can be used to classify neurons based on computational features. They have provided Python code for the implementation and have explained the methods well, so it will be easy for other groups to replicate their work. The method could be applied productively to many types of experiments in behavioral and systems neuroscience across different model systems. Overall, the paper is clearly written and the experiments are well designed and analysed, and represent a useful contribution to the neuroscience field.

      We thank the reviewer for their favorable assessment of our work.

      Reviewer #3 (Public Review):

      In the current study, the authors present a novel and original approach (termed MINE) to analyze neuronal recordings in terms of task features. The method proposed combines the interpretability of regressor-based methods with the flexibility of convolutional neural networks and the aim is to provide an unbiased, "model-free" approach to this very important problem.

      In my opinion, the authors succeed in most of these aspects. They use three datasets: an artificially-generated one that provides a ground-truth, a published dataset from wide-scale cortical mouse recordings and a novel one that studies thermosensation in larval zebrafish. MINE compares favorably in all three cases.

      I believe that the paper would mostly benefit from an increased effort in clear exposition of the Taylor expansion approach, which is at the core of the method. The methods section describes the mathematics, but I wonder whether it would be possible to illustrate or schematize this in a main Figure, e.g. as an addition to Figure 1 or as a new figure. Around line 185, the manuscript reads: "We therefore perform local Taylor expansions of the network at different experimental timepoints. In other words, we differentiate the network's learned transfer function that transforms predictors into neural activity."

      It would help to explicitly state with respect to what the derivative is being computed (i.e. time) and maybe a diagram (which I had to draw to understand the paper) in which a neuronal activity trace is shown and from time t onwards a prediction is computed using terms in the Taylor expansion would be very instructive (showing on an actual trace how disregarding certain terms changes the prediction and hence the conclusions about the actual dependence of the trace on the behavioral features). The formulation in terms of Jacobians and Hessians can then be restricted to the Methods section and the paper will be easier to read for a wider audience.

      We agree with the reviewer that readability is key. We hope that our re-write and re-organization of the manuscript makes it easier to follow. We now start with a unified description of complexity and non-linearity both derived from a Taylor decomposition around the data-average. We use this section (starting Line 91) to lay out the logic of the Taylor expansion and explicitly state that the derivatives describe the expected change in output given any change in predictors. We did not want to remove the math entirely from the paper, simply because we found it hard to explain the concept entirely without it. We have provided an annotation to the formula parts in the new Figure 2 and a small schematic to illustrate the pointwise expansion of the Taylor metric in the new Figure 4.

      The method is presented as a "model-free" approach (title and introduction). I think it would help to discuss this with some precision. The Taylor expansion approach does imply certain beliefs on the structure of the data (which are well founded in most cases). Do the authors agree that MINE would encapsulate any regression model where both linear and interaction terms are allowed to include an arbitrary non-linearity (in the case of the interaction terms, different non-linearities for both variables)? If this is the case, maybe an explicit statement would allow the reader to quickly identify the versatility of MINE.

      We are now attempting to make the statement of model-free more precise through quantifications in our rewritten section on deriving receptive fields. We now provide an explanation in the introduction for why we believe that “model-free” is justified. We state: “While the architecture and hyper-parameters of the CNN used by MINE do impose constraints on which relationships can be modeled, we consider the convolutional network “model-free” because it does not make any explicit assumptions about the underlying probability distributions or functional forms of the data.”

      In principle, MINE can accommodate higher-order interactions as well (say of the form x*y*z or x*y^2) and it certainly has flexibility in applying nonlinear transformations. However, we did not find a satisfying way to quantify the space of possible models MINE can represent exactly and therefore do not feel comfortable making a precise statement about this.

      I find the section relating to non-linearities interesting, but was slightly disappointed to find that the authors do not propose a single method. In Figure 3E, the authors show that a logistic regression model that combines the curvature and NLC approaches outperforms either, but the model is not described in any sort of detail. I appreciate the attempt made by the authors to apply this to the zebrafish imaging dataset in Figure 7, but it was still unclear to me how non-linearities and complexity are related.

      We fully agree with the reviewer. We have now merged non-linearity and complexity determination. We hope that this a) simplifies the paper and b) creates a metric that likely generalizes better and in which specific values are more interpretable. In brief, we now define both the nonlinearity and complexity based on truncations of the Taylor expansion around the data average. This new result section (Lines 90-142) also gives us a chance to (hopefully) better introduce the Taylor expansion approach.
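
      To make the idea of truncating a Taylor expansion around the data average concrete, here is a minimal one-dimensional sketch. It is not the MINE implementation: the function names, the use of finite differences in place of network gradients, and the use of R² between the truncated and the full prediction as a score are all assumptions made for illustration.

```python
def taylor_truncations(f, xs, h=1e-3):
    """Return first- and second-order Taylor predictions of f around mean(xs).

    Derivatives are estimated with central finite differences; a fitted
    network would instead supply its own gradient and Hessian.
    """
    x0 = sum(xs) / len(xs)                     # expansion point: the data average
    f0 = f(x0)
    d1 = (f(x0 + h) - f(x0 - h)) / (2 * h)     # first derivative at x0
    d2 = (f(x0 + h) - 2 * f0 + f(x0 - h)) / h ** 2  # second derivative at x0
    linear = [f0 + d1 * (x - x0) for x in xs]
    quadratic = [lin + 0.5 * d2 * (x - x0) ** 2 for lin, x in zip(linear, xs)]
    return linear, quadratic


def r_squared(truth, pred):
    """Fraction of variance in `truth` explained by `pred` (can be negative)."""
    m = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - m) ** 2 for t in truth)
    return 1 - ss_res / ss_tot
```

      Under these assumptions, a response that the linear truncation already explains well would count as linear, one that needs the second-order term as nonlinear but of low complexity, and one that neither truncation captures as more complex.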

    1. Author Response

      Reviewer #1 (Public Review):

      Li et al investigated the behavioral response and fMRI activations associated with deep brain stimulation (DBS) of the lateral habenula (LHb) in 2 distinct rodent models of depression. They found that a) LHb DBS reduces depressive and anxiety behaviors using multiple behavioral tests: sucrose preference, forced swim, and open field. These results held across multiple models of depression and multiple tests, and generally restored results of these behavioral tests to parity with controls. Furthermore, fMRI activations of brain regions with known connectivity to LHb strongly correlated with behavioral responses to LHb DBS, particularly in limbic regions. These behavioral responses clearly depended on electrode location, with more medial placements within the LHb producing a more robust behavioral effect.

      The conclusions of this paper are generally well supported by the data, with the primary weaknesses of the study being 1) limited novelty due to LHb already being a well-established target for DBS in depression, and 2) the questionable validity of rodent models of depression in general. The authors deal with the first point (novelty) by extending their study to electrode localization and fMRI correlates with the behavioral response, leading to insight into surgical targeting as well as mechanism of effect, respectively. They also partially mitigate fundamental problems with rodent models of depression by using 2 different models and showing consistent responses to LHb DBS across both. The methods used in this study were sound, with high-quality techniques used for electrode implantation, confirmation of electrode placement, fMRI acquisition, anesthesia and physiological monitoring, as well as an appropriate statistical analytic approach.

      We thank the reviewer deeply for the positive assessment of our work.

      Reviewer #2 (Public Review):

      This important paper is a real tour de force and combines functional MRI, behaviour, and brain stimulation to characterise the effect of stimulation of the lateral habenula in a rodent model for depression. The results are stunning and the data presented seems compelling.

      My only comment is I would like more discussion on the relevance of these results for the treatment of depression in humans, both in terms of the rodent model and in terms of the results shown in this study.

      We thank the reviewer deeply for the positive assessment of our work. We have added discussion on the relevance of our findings for the treatment of depression in humans on Page 17 of the revised manuscript as follows:

      “The WKY and LPS-treated depressive rat models share similar characteristics, including abnormalities in various neurotransmitter and endocrine systems and emotional changes resulting from inflammatory stimuli. These models are widely used in pharmacological and nonpharmacological depression treatment studies (Caldarone et al., 2015; Aleksandrova et al., 2019; Lasselin et al., 2020). Previous research indicates that classic antidepressants used in humans, such as selective serotonin reuptake inhibitors, also cause an antidepressant reaction in WKY rats. Ketamine, a rapid-acting antidepressant in clinical practice, has been shown to be effective in both WKY and LPS-treated rats (Aleksandrova et al., 2019; J. Zhao et al., 2020). In WKY rats, DBS of the NAc increased exploratory activity and exerted anxiolytic effects, and NAc-DBS was found to be effective for TRD treatment in humans (Dandekar et al., 2018; Aleksandrova et al., 2019). These results suggest that the depression rat models can provide valuable information about the efficacy of various pharmacological and nonpharmacological therapies. In a recent case report, researchers observed acute stimulation effects in addition to long-term clinical improvements in depression, anxiety, and sleep in a patient with TRD upon administering LHb-DBS (Wang et al., 2020). This finding supports the clinical relevance of our observations. However, no animal model of depression can completely replicate human symptoms, and further research is necessary to validate our findings in human patients. Additionally, the long-term efficacy and side effects of LHb-DBS require further investigation. Nevertheless, we believe that our findings propose a promising addition to the rapid-acting therapeutic options for the most refractory depression patients.”

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript is clear in that it shows no/minimal weight gain in a mouse model of trisomy 21 compared to the control mouse, even under a high-calorie diet. The difference is the clear demonstration of the increased expression of sarcolipin. It is important that the expression of SERCA was also shown not different between the genotypes. Additionally, an important result is that manipulating the skeletal muscle was sufficient to promote weight loss without the need for hypermetabolism in other tissues such as adipose tissue.

      • A clear explanation of why the expression of sarcolipin/hypermetabolism is different between mouse and human under the same condition would be useful.

      Overexpression of sarcolipin is only seen in this particular mouse model carrying the near complete human chromosome 21. In another widely used mouse model (Ts65Dn) of Down syndrome where all the triplicated genes (~40% of the human Chr21 orthologs) are of mouse origin, we did not observe the same overexpression of sarcolipin (PMID: 36587842). The reason for this is presently unknown. Human Chr21 contains a significant number of non-coding human genes (>400) with uncertain effects on the mouse transcriptome. Data in Figure 8 represent our efforts to understand what drives the overexpression of the mouse sarcolipin (Sln) gene in the TcMAC21 mouse model. Although we narrowed it down and highlighted some potential candidate transcriptional drivers of Sln overexpression (Fig. 8), future work is clearly needed to establish whether any of those candidates is a bona fide driver.

      • p.12-13 and15. The language around 'futile' cycling is not correct because Ca movement through the sarcoplasmic reticulum of the resting fiber is essential to the function of the muscle. Firstly, the cycle of Ca through the SR is through the ryanodine receptor (RyR) as well as due to slippage through the SERCA (PMID: 11306667, PMID: 35311921). This is not made clear anywhere in the manuscript. Ca leak out of the SR through RyR is an essential component to the control/setting of the resting cytoplasmic [Ca2+] via the activation of store-operated Ca2+ entry, which is in a balance with the activation of the PMCA on the t-system membrane (PMID: 35218018). The SERCA resequesters the leaked Ca2+ from the SR. It is not possible that the resting [Ca2+] is set by the reduced efficiency of the SERCA, as indicated in the ms (PMID: 20709761). It is expected that the mito [Ca2+] steady state is set by the raised resting cyto [Ca2+] (PMID: 20709761). Ca2+ transients during EC coupling will promote transient increases in mito Ca2+ (PMID: 21795684, PMID: 36121378), but not steady-state increases. Some of these problems are highlighted by the errors in the diagram Fig 5D: please change/correct (i) the invagination of the sarcolemma is called the t-system; (ii) the cycle of Ca leak through the SR starts with RyR Ca leak, where the Ca is resequestered by the SERCA, in addition to Ca slippage through the pump. Draw a RyR opposite the t-system on the SR terminal cisternae. The heat generated by SERCA is absorbed in the cytoplasm, metabolites enter the mito and the OxPhos generates heat (PMID: 31346851). (iii) Ca does not enter mito because it cannot get into the SR (the resting cyto Ca is controlled by the t-system/plasma membrane, PMID: 20709761, PMID: 35218018). Please redraw.

      We have redrawn the Fig. 6D diagram as suggested by the reviewer. We have also clarified the information presented in the revised Fig. 6D in the text and figure legend. Heat is generated by mitochondrial oxidative activity. In addition, ATP hydrolysis by the Ca2+ ATPase (SERCA pump) also generates heat (PMID: 12512777; PMID: 34826239; PMID: 11342561; PMID: 17018526; PMID: 12887329). In resting muscle, for every ATP hydrolyzed by the SERCA pump, 2 Ca2+ ions get transported into the sarcoplasmic reticulum (SR) (PMID: 15189143). In the presence of sarcolipin (SLN), more ATP needs to be hydrolyzed to move the same number of Ca2+ ions into the SR, due to Ca2+ slippage (PMID: 34826239; PMID: 23341466). In essence, ATP hydrolysis and Ca2+ transport into the SR by SERCA become uncoupled in the presence of SLN. This uncoupling of the SERCA pump, in the context of Ca2+ cycling in and out of the SR (also involving Ryr1), represents the ATP-consuming futile cycle in the skeletal muscle (PMID: 34741717). Since SLN is persistently overexpressed, the ATP-consuming futile activity of the SERCA pump is presumably happening in resting muscle, as well as during EC coupling (since the TcMAC21 mice are also hyperactive).

      • The changing of the properties of the muscle towards oxidative properties is consistent with the expression of sarcolipin in mouse muscle (all of it is in type II fibers). It is important to show whether the muscles have fiber-type shifts. Please report the fiber types of the muscles that have been surveyed in this project.

      In the qPCR data shown in Figure 6C, we have profiled many genes associated with slow- and fast-twitch muscle fibers in gastrocnemius, and little if any change was noted. At least at the level of the transcript, there is no indication of fiber type switching in gastrocnemius muscle. However, we did not perform the same qPCR analyses for all the other muscle types isolated (i.e., EDL, quadriceps, plantaris, soleus, and tongue). The main reason for this is that we had used all of these muscle tissues in our respirometry analysis as shown in Figure 6O-Q and Figure 6-Figure Supplement 4-9. Unfortunately, we did not have any leftover muscle tissues to profile muscle fiber types.

      • Non-shivering thermogenesis (NST) is mentioned in this manuscript as the means of hypermetabolism, as has the lengthened duration of the cyto Ca transients during EC coupling. It is not clear at all what the contribution of NST compared to the increased work of the SERCA to clear released Ca from the cyto to the hypermetabolism. What are the relative proportions? If sarcolipin is largely for NST, then hypermetabolism is about the resting muscle.

      In our view, the hypermetabolism we observed in the TcMAC21 mice is primarily due to SLN-mediated uncoupling of the SERCA pump. Chronic SLN overexpression elevates ATP consumption by the SERCA pump and drives the catabolic process (i.e., increased mitochondrial OXPHOS) to generate the ATP needed to meet the demand created by the persistent uncoupling of the SERCA pump. However, the TcMAC21 mice are also hyperactive, and this can also contribute to increased metabolic rate. Since the mice are both hyperactive and hypermetabolic, we do not know the relative contribution of each to the overall phenotype of the mice.

      • The link that SLN is causing more ATP use at the pump but the heat generated by OxPhos in mito is important and should be made, see Barclays' work (eg. PMID: 31346851). A direct link between the SERCA function and mito function is occurring but I currently don't see one being made in the ms. This could be made clear in Fig 5D diagram.

      We have modified and clarified Figure 6D as suggested.

      p.22. "The reprogramming of glycolytic...elevated Ca transients...". The language is wrong here. Oxidative fibers do not have elevated Ca transients compared to glycolytic. The amplitude of Ca release is greater in glycolytic and the duration of the transient is longer in the oxidative (eg. PMID: 12813151).

      We have corrected this in the text and added the citation.

      • p.22. "as less calcium is being transported into the SR due to uncoupling of the SERCA pumps". The same amount of Ca is being transported, just at the expense of more ATP than would be the case in the absence of SLN. Otherwise, the SR Ca2+ content would not be at a steady state while the SR continuously leaks Ca2+.

      We have corrected this in the revised text. The incorrect statement has been deleted.

      • p.23. Tavi & Westerblad (PMID: 21911615) show how Ca transient amplitude and frequency signal in slow and fast twitch fibres. Here, we are not concerned with what is happening in myotubes, where the SR is less developed than in adult fibres.

      We did not use any myotubes in the present study. The myotube was mentioned in the context of discussing a published work (PMID: 30208317).

      Reviewer #3 (Public Review):

      Sarver et al., propose that TcMAC21 mice are hypermetabolic and that this is the cause of their reduced weight. Unfortunately, the developmental defects of TcMAC21 mice make this a challenging question to definitively answer. The authors claim that TcMAC21 mice are hypermetabolic due to a futile calcium cycling in skeletal muscle, which is caused by up-regulation of SLN. However, all of the data that would go into the energy balance equation (food intake, energy absorption, and energy expenditure) have been improperly analyzed. TcMAC21 pups are 8.5 g lighter than euploid littermates. The body weight data and images in Fig. 3A indicate that TcMAC21 mice runted. This difference is primarily a result of lower lean mass (FIG. 2B). This is important as it sets up many concerns that need to be addressed. Specific comments are noted below.

      There is no overt developmental defect in the TcMAC21 mice, as their birth weights are not different from those of the euploid controls (PMID: 32597754). A “runted” mouse is considered very small, poorly developed, and less competitive (PMID: 22822473). The lean phenotype of TcMAC21 mice is due to their hypermetabolism and not the result of developmental defects. The absolute lean mass of TcMAC21 mice is lower than that of the euploid controls. This is to be expected. A human being who weighs 150 pounds will have less lean mass than another person weighing 250 pounds. Lean mass scales with body weight. This does not mean that there is a muscle deficit in the person weighing 150 pounds. That is the reason why lean mass is also generally presented as % lean mass (after normalizing to body weight). This normalization can tell us whether the amount of lean mass is appropriate (or normal) for a given weight. The % lean mass is either not different between TcMAC21 and euploid mice fed a control chow (Fig. 2B) or significantly higher in TcMAC21 mice fed a high-fat diet (Fig. 3B). This tells us that there is no developmental deficit in the skeletal muscle (the biggest contributor to lean mass) of TcMAC21 mice. The amount of lean mass seen in TcMAC21 mice scales appropriately with their lower body weight. Our food intake and energy absorption data were correctly collected and analyzed (addressed below). In fact, TcMAC21 mice have the same or slightly higher food intake (absolute amount without normalization) despite weighing much less than the euploid controls (Fig. 2C and Fig. 3A, and Supplementary File 2 and Supplementary File 5). A sick or runted mouse generally consumes much less food and is physically much less active. The TcMAC21 mice are actually hyperactive (Fig. 2D-F and Fig. 4D-F). All our data argue against the notion of “runting” or “developmental defects” in TcMAC21 mice, and instead support our conclusion that TcMAC21 mice are lean due to elevated activity and hypermetabolism.

      Specific comments:

      1) It is incorrect to normalize EE to lean mass if this parameter is different between groups. Normalizing the EE data to lean mass makes it appear as though TcMAC21 mice exhibited increased EE when in fact this is a mathematical artefact. EE data should simply be plotted as ml/h (or kcal/h) per mouse. Alternatively, ANCOVA can be applied using lean mass as a covariate. Excellent reviews on this topic have been written (PMID: 20103710; PMID: 22205519).

      Energy expenditure (EE) data should not be plotted as kcal/h per mouse, as indicated in the review article that the reviewer alluded to (PMID: 22205519). It is a given that EE increases as a function of body weight, as larger body mass requires greater energy to maintain. Plotting EE data per mouse (i.e., kcal/h) would lead to the erroneous conclusion that a fat mouse would have a higher EE compared to a lean mouse. Because lean mass is metabolically much more active than fat mass, normalizing EE data to lean mass is an acceptable way to plot EE data, although not ideal, as indicated by the review article the reviewer alluded to (PMID: 20103710). Oftentimes, normalizing EE to lean mass gives similar results to the ANCOVA, as pointed out by the authors (PMID: 22205519). However, both review articles recommend ANCOVA (using body mass as a covariate of EE) as the preferred method to plot and evaluate EE data. Alongside the EE data (normalized to lean mass), we have now also included the ANCOVA data (Fig. 2D-F and Fig. 4D-F) where we used body weight as a covariate as recommended (PMID: 22205519). The results clearly indicate that the TcMAC21 mice have significantly higher EE compared to the euploid controls.
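
For readers unfamiliar with the ANCOVA approach discussed in this response, the following is a minimal sketch of the idea: EE is modeled as a linear function of genotype and body weight, and the genotype coefficient is the weight-adjusted group difference. Everything here is an assumption made for illustration. The function names, the hand-rolled least-squares solver, and all numbers are invented, and this is not the authors' analysis pipeline or their data.

```python
# Minimal ANCOVA sketch: EE = b0 + b1 * genotype + b2 * body_weight.
# All values are made-up illustration numbers, not data from the study.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ancova_fit(ee, genotype, body_weight):
    """Ordinary least squares via the normal equations (X'X) beta = X'y.

    Returns [intercept, genotype_effect, body_weight_slope].
    """
    rows = [[1.0, float(g), float(w)] for g, w in zip(genotype, body_weight)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ee)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical mice: genotype 0 = control group, 1 = test group; weights in g.
genotype = [0, 0, 0, 0, 1, 1, 1, 1]
body_weight = [40, 42, 44, 46, 28, 30, 32, 34]
# Synthetic EE (kcal/h) built from known coefficients so the fit can be checked.
ee = [0.2 + 0.05 * g + 0.01 * w for g, w in zip(genotype, body_weight)]

coef = ancova_fit(ee, genotype, body_weight)
print("intercept, genotype effect, weight slope:", coef)
```

Because the synthetic EE values are generated from known coefficients, the fit should recover them closely. A real analysis would also report standard errors and test the homogeneity-of-slopes assumption, which this sketch omits.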

      2) It makes no sense to normalize food intake to weight, as it makes no sense to divide metabolic rate by weight as well (see above). If food intake is not normalized, this will clearly show that TcMAC21 mice eat much less than controls, and if plotted as cumulative food intake will show that TcMAC21 are smaller and gain less weight on a high-fat diet because they simply eat less. This further indicates that the major tenet of this paper is not correct.

      It is expected that a smaller mouse will eat less food than a bigger mouse. Normalizing food intake to body weight can tell you whether the amount of food intake is appropriate (or normal) for a given weight. Remarkably, despite a much lower body weight, ad libitum fed TcMAC21 mice consumed the same or a slightly higher absolute amount of food, without normalizing the data to body weight (Fig. 2C and Fig. 4A and Supplementary File 2 for the chow-fed group and Supplementary File 5 for the HFD-fed group). In fact, the absolute food intake (without normalization) in the refeeding period, after a fast, was significantly higher in the TcMAC21 mice relative to euploid controls (17.7 ± 0.082 vs. 13 ± 0.87 kcal, P = 0.002; Supplementary File 5). Thus, relative to their body weight, ad libitum fed TcMAC21 mice consumed a significantly higher amount of calories (Fig. 2C and Fig. 4A). For transparency, we chose to show side-by-side both the absolute and relative food intake data. These results, along with the rest of the data, provide compelling evidence that hypermetabolism, and not reduced food intake, underlies the lean phenotype of the TcMAC21 mice.

      3) The authors have tried to address the smaller weight of TcMAC21 mice by including weight-matched wild-type mice. However, they only focus on analyzing surface temperature, which is not an indicator of thermogenesis. Moreover, there is no information on whether these weight-matched wild-type mice are similar in age or body composition to the TcMAC21 mice. Nevertheless, the increased surface temperature can also indicate increased heat conservation, which is opposite to thermogenesis. It would make sense that TcMAC21 mice with massive reductions in lean mass would activate compensatory mechanisms of heat conservation to offset increased heat dissipation to the environment. This does seem to be the case, based on the data shown in Fig. 6D (see below).

      Skin temperature has been widely and extensively used as a proxy for thermogenesis, often in association with thermogenesis of brown adipose tissue (BAT), which is located just deep to the skin over the shoulder blades of the mouse. Mice fed a high-fat diet lose the “brownness” of their brown adipose tissue as excessive circulating lipid is stored in this depot. This is a well-known phenomenon. One can see this clearly in Figure 4K, where the euploid BAT has accumulated a significant amount of lipid while the TcMAC21 BAT has not. The addition of weight-matched mice was solely to help indicate whether or not the BAT was a major contributor to the TcMAC21 hypermetabolic phenotype.

      We did not conduct body composition analysis on the weight-matched mice. With a body weight of less than 30 grams, these wild-type mice represent a similarly lean and healthy adult mouse. They are not age-matched (the control mice are younger) because this is not possible. A wild-type mouse of the same age of TcMAC21 (already on high-fat diet for 12 weeks or longer) will weigh significantly more than the TcMAC21, just as the age-matched euploid littermates weighed significantly more than the TcMAC21 mice.

      The idea of heat conservation is possible, but our data clearly indicate the TcMAC21 mice have elevated thermogenesis. The supporting data include: 1) increased deep colonic temperature; 2) activation of oxidative and thermogenic gene program in skeletal muscle; 3) overexpression of sarcolipin in the skeletal muscle, leading to futile SERCA pump activity and heat generation; 4) Increased skeletal muscle mitochondrial respiration; 5) elevated T3 levels; 6) increased physical activity level; 7) increased energy expenditure (EE normalized to lean mass or ANCOVA using body weight as a covariate). Taken together, these data provide compelling evidence to support our conclusion that the TcMAC21 mice are indeed hypermetabolic and have elevated thermogenesis.

      4) A more optimal method of testing whether increased heat dissipation plays a role in the EE of TcMAC21 mice, is to measure EE at thermoneutrality, where energy dissipation to the environment will be minimized. Here the authors have attempted this in Fig. 6D. Unfortunately, the authors normalized EE to lean mass, artefactually elevating TcMAC21 EE. Despite this mistake, it now looks as though the large differences in EE that were seen at room temp have been attenuated, and only significantly limited to the dark phase. This indicates that in addition to the normalization artefact, higher heat dissipation from smaller TcMAC21 mice may also contribute to the elevated EE at 22C.

      It is well known that mice markedly reduce their EE at thermoneutrality. Therefore, it is not surprising that the TcMAC21 mice housed at thermoneutrality have lower EE compared to the TcMAC21 mice housed at room temperature. This also holds true for the euploid controls. Yet, remarkably, the TcMAC21 mice still have significantly higher EE compared to the euploid controls when housed at thermoneutrality. The TcMAC21 mice never reduce their EE to the level of the euploid controls. We have now included the ANCOVA data for EE using body weight as a covariate as recommended (PMID: 22205519) (Fig. 7F). The results clearly indicate that the TcMAC21 mice have significantly higher EE compared to euploid controls even at thermoneutrality. The data obtained at thermoneutrality, as well as the body weight-matched control experiment shown in Figure 4I, argue against heat dissipation as the driver of increased EE. Instead, our data support hyperactivity and hypermetabolism as the driver of increased EE.

      5) In Fig. 6D, why is the hourly plot not shown here (like 2D and 4C)? The data clearly are not as striking as the EE data at 22C?

      Because of space limitations in Figure 7, we did not include the hourly tracing data and instead showed the overall energy expenditure (EE) during the light and dark cycle as bar graphs. Per the reviewer's request, we have now included the hourly tracing data in Fig. 7F, along with the ANCOVA data. The data clearly indicate that TcMAC21 mice, housed at thermoneutrality, have higher EE, especially in the dark cycle when they are active. This is quite remarkable. We know from many published studies that mice significantly reduce their EE when housed at thermoneutrality. And yet, the TcMAC21 mice never reduce their EE to the level of euploid controls when housed at thermoneutrality.

      6) GTT was similar between TcMAC21 and controls (Fig. 3I). However, the smaller insulin response could be due to the fact that glucose was normalized to body weight. It would be better to normalize to lean mass, since that is different as well, or simply give all mice the same amount of glucose that the control group receives since this is how it is done in humans.

      Dosing glucose by body weight in the GTT is widely and extensively practiced across the metabolic community. The TcMAC21 mice are markedly more insulin sensitive, as supported by multiple independent lines of evidence: 1) Overnight fasting blood glucose and insulin levels are significantly lower in TcMAC21 mice relative to euploid controls (Figure 3G). 2) Insulin tolerance tests clearly indicate a substantial improvement in insulin sensitivity in TcMAC21 mice even though the insulin dose injected was much smaller (i.e., insulin dose was based on body weight) (Figure 3K). 3) The insulin response during refeeding, after an overnight fast, is dramatically lower even though the refeeding blood glucose levels rise to the same levels as in the euploid controls (Fig. 3L-M). This is similar to the GTT data, where the rate of glucose clearance in TcMAC21 mice is the same as in the euploid controls despite a dramatically lower insulin response (Fig. 3I-J). Taken together, these data clearly indicate a markedly heightened insulin sensitivity in TcMAC21 mice relative to euploid controls.

      7) The fecal energy in Fig. 4B only measures the concentration of energy per gram of feces. However, this analysis has failed to take into account total fecal excretion, which should be used to multiply the energy density of the feces. Thus, these data are incomplete and not sufficient to exclude absorption differences between the groups. And it is now curious why if all other metabolic measurements (even though wrong), such as food intake and EE are normalized to body weight, why have the authors not normalized to body weight for the feces data? Is this because if this was done this would show massive elevating in fecal energy in TcMAC21 mice and thus falsify their hypothesis?

      The fecal data the reviewer requested was originally in the supplemental figure section. We have now moved these data to the main figure to ensure that this will not be missed by any reader. As indicated in the text and in Fig. 4B, TcMAC21 mice fed a HFD show no difference in fecal frequency (movements/day), fecal weight (g/movement), fecal energy composition (cal/g) and total fecal energy (kcal/day). These data clearly indicate that the fecal energy content is not different between TcMAC21 and euploid mice. These results, along with the rest of the data in the paper, provide compelling evidence that hypermetabolism, and not reduced nutrient absorption in the gut, underlies the lean phenotype and resistance of TcMAC21 mice to weight gain when fed a high-fat diet.
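
The relationship between the four fecal measurements named above can be written out as simple arithmetic: total daily fecal energy is the product of frequency, weight per movement, and energy density. The sketch below assumes that "cal" denotes the small calorie, so that dividing by 1000 gives kcal; the function name and the example numbers are hypothetical and are not values from the study.

```python
def total_fecal_energy_kcal_per_day(movements_per_day, grams_per_movement, cal_per_gram):
    """Total fecal energy (kcal/day) from its three measured components.

    Assumes cal_per_gram is in small calories, hence the division by 1000.
    """
    grams_per_day = movements_per_day * grams_per_movement
    return grams_per_day * cal_per_gram / 1000.0

# Hypothetical illustration values only.
example_total = total_fecal_energy_kcal_per_day(
    movements_per_day=20, grams_per_movement=0.05, cal_per_gram=4000
)
print(f"{example_total:.2f} kcal/day")
```

Written this way, it is clear why energy density alone is not sufficient: two groups with equal cal/g can still differ in total excretion if frequency or weight per movement differs, which is why all four quantities are reported.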

      8) I cannot find any indication of sample size in any of the EE experiments, aside from the bar graph in Fig. 6D. In any case, this experiment only an n=4 to 5 per group. This is an extremely small number for these types of experiments, so how can the authors be sure of reproducibility with such a low sample size? Are all of the other EE experiments also of similarly small sample sizes?

      Sample sizes for all EE experiments were clearly indicated in the original text, figure legends, and figures themselves, as well as in all supplemental figures and Supplementary files. In addition, for transparency, we always include individual data points, whenever possible, in all our data figures. The chow-fed and HFD-fed experiments were sufficiently powered (n = 8-9 per genotype) and the effect size was large. Sample sizes for the thermoneutral experiments were lower than those for the chow-fed and HFD-fed experiments because these mice are hard to breed and in limited supply.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      Gene expression level as a confounding factor was not well controlled throughout the study. Higher gene expression often makes genes less dispensable after gene duplication. Gene expression level is also a major determining factor of evolutionary rates (reviewed in http://www.ncbi.nlm.nih.gov/pubmed/26055156). Some proposed theories explain why gene expression level can serve as a proxy for gene importance (http://www.ncbi.nlm.nih.gov/pubmed/20884723, http://www.ncbi.nlm.nih.gov/pubmed/20485561). In that sense, the effects of many genomic/epigenomic features (such as replication timing and repressed transcriptional regulation) that were assumed "neutral" or intrinsic by the authors (or more accurately, independent of gene dispensability) cannot be easily distinguished from the effect of gene dispensability.

      We thank the reviewer for this important comment. We totally agree that transcriptomic and epigenomic features cannot be easily distinguished from gene dispensability and do not think that these features of the elusive genes can be explained solely by intrinsic properties of the genomes. Our motivation for investigating the expression profiles of the elusive gene is to understand how they lost their functional indispensability (original manuscript L285-286 in Results). We also discussed the possibility that sequence composition and genomic location of elusive genes may be associated with epigenetic features for expression depression, which may result in a decrease of functional constraints (original manuscript L470-474 in Discussion). Nevertheless, we think that the original manuscript may have contained misleading wordings, and thus we have edited them to better convey our view that gene expression and epigenomic features are related to gene function.

      (P.2, Introduction) This evolutionary fate of a gene can also be affected by factors independent of gene dispensability, including the mutability of genomic positions, but such features have not been examined well.

      (P6, Introduction) These data assisted us to understand how intrinsic genomic features may affect gene fate, leading to gene loss by decreasing the expression level and eventually relaxing the functional importance of ʻelusiveʼ genes.

      (P33, Discussion) Another factor is the spatiotemporal suppression of gene expression via epigenetic constraints. Previous studies showed that lowly expressed genes reduce their functional dispensability (Cherry, 2010; Gout et al., 2010), and so do the elusive genes.

      Additionally, in response to the advice from Reviewers 1 and 2 [Rev1minor7 and Rev2-Major4], we have added a new section, Elusive gene orthologs in the chicken microchromosomes, in which we describe the relationship between the elusive genes and chicken microchromosomes. In this section, we also argue for the relationship between the genomic features of the elusive genes and their transcriptomic and epigenomic characteristics. In the chicken genome, elusive genes showed neither reduced pleiotropy of gene expression nor the epigenetic features relevant to such a reduction, consistent with the moderation of nucleotide substitution rates. This also suggests that the relaxation of the ‘elusiveness’ is associated with the increase of functional indispensability.

      (P27, Elusive gene orthologs in the chicken microchromosomes in Results) Our analyses indicate that the genomic features of the elusive genes, such as high GC content and high nucleotide substitution rates, do not always correlate with a reduction in pleiotropy of gene expression that potentially leads to an increase in functional dispensability, although these features have been well conserved across vertebrates. In addition, the avian orthologs of the elusive genes did not show higher KA and KS values than those of the non-elusive genes (Figure 3; Figure 3–figure supplement 1), likely consistent with similar expression levels between them (Figure 5–figure supplement 1) (Cherry, 2010; Zhang and Yang, 2015). With respect to the chicken genome, the sequence features of the elusive genes themselves might have been relaxed during evolution.

      Ks was used by the authors to indicate mutation rates. However, synonymous mutations substantially affect gene expression levels (https://pubmed.ncbi.nlm.nih.gov/25768907/, https://pubmed.ncbi.nlm.nih.gov/35676473/). Thus, synonymous mutations cannot be simply assumed as neutral ones and may not be suitable for estimating local mutation rates. If introns can be aligned, they are better sequences for estimating the mutability of a genomic region.

      We appreciate the reviewer for this meaningful suggestion. As a response, we have computed the differences in intron sequences between the human and chimpanzee genomes and compared them between the elusive and non-elusive genes. As expected, we found larger sequence differences in introns for the elusive genes than for the non-elusive genes. In Figure 2c of the revised manuscript, we have included the distribution of KI, sequence differences in introns between the human and chimpanzee genomes for the elusive and non-elusive genes. Additionally, we have added the corresponding texts to Results and the procedure to Methods as shown below.

      (P11, Identification of human ‘elusive’ genes in Results) In addition, we computed nucleotide substitution rates for introns (KI) between human and chimpanzee (Pan troglodytes) orthologs and compared them between the elusive and non-elusive genes.

      (P11, Identification of human ‘elusive’ genes in Results) Our analysis further illuminated larger KS and KI values for the elusive genes than in the non-elusive genes (Figure 2b, c; Figure 2–figure supplement 1). Importantly, the higher rate of synonymous and intronic nucleotide substitutions, which may not affect changes in amino acid residues, indicates that the elusive genes are also susceptible to genomic characteristics independent of selective constraints on gene functions.

      (P39, Methods) To compute nucleotide sequence differences of the individual introns, we extracted 473 elusive and 4,626 non-elusive genes that harbored introns aligned with the chimpanzee genome assembly. The nucleotide differences were calculated via the whole genome alignments of hg38 and panTro6 retrieved from the UCSC genome browser.
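
As a rough illustration of what a per-intron nucleotide difference can look like in practice, the sketch below computes an uncorrected proportion of mismatched sites between two aligned sequences, skipping gaps and ambiguous bases. This is an assumption about the general form of such a measure, not the authors' procedure: the function name and the toy alignment are invented, and a real analysis would parse the hg38/panTro6 alignment blocks and might apply a substitution-model correction.

```python
def p_distance(aligned_a, aligned_b):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Columns containing a gap or a non-ACGT symbol in either sequence are skipped.
    Returns None when no comparable column remains.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        return None
    mismatches = sum(x != y for x, y in pairs)
    return mismatches / len(pairs)

# Toy alignment (hypothetical): one gap column, one ambiguous base, one mismatch.
toy_difference = p_distance("ACGT-ACGTN", "ACGTTACCTA")
print(toy_difference)
```

In the toy alignment, eight columns are comparable and one of them differs, so the measure is one mismatch per eight sites.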

      The term "elusive gene" is not necessarily intuitive to readers.

      We previously published a paper on the gecko genome assembly reporting the group of genes that we refer to as ‘elusive genes,’ which were lost independently in mammals and aves but retained by reptiles (Hara et al., 2018, BMC Biology). We initially gave them a more intuitive name (‘loss-prone genes’) but changed it because one of our peer reviewers did not agree with the use of this name. We have since continued to use this term in another paper (Hara et al., 2018, Nat. Ecol. Evol.). In addition, some other groups have used the word ‘elusive’ with a similar intention to ours (Prokop et al, 2014, PLOS ONE, doi: 10.1371/journal.pone.0092751; Ribas et al., 2011, BMC Genomics, doi: 10.1186/1471-2164-12-240). We would appreciate the reviewer’s understanding of this naming, which ensures consistency across our research on gene loss. In the revised manuscript, we have added sentences to provide a more intuitive guide to ‘elusive genes’:

      (P6, Introduction) We previously referred to the nature of genes prone to loss as ‘elusive’(Hara et al., 2018a, 2018b). In the present study, we define the elusive genes as those that are retained by modern humans but have been lost independently in multiple mammalian lineages. As a comparison of the elusive genes, we retrieved the genes that were retained by almost all of the mammalian species examined and defined them as ‘non-elusive’, representing those persistent in the genomes.

      Reviewer #3 (Public Review):

      Overall, the study is descriptive and adds incremental evidence to an existing body of extensive gene loss literature. The topic is specialised and will be of interest to a niche audience. The text is highly redundant, repeating the same false positive issue in the introduction, methods, and discussion sections, while no clear conclusion or interpretation of their main findings are presented.

      Major comments

      While some of the false discovery rate issues of gene loss detection were addressed in the presented pipeline, the authors fail to test one of the most severe cases of mis-annotating gene loss events: frameshift mutations which cause gene annotation pipelines to fail reporting these genes in the first place. Running a blastx or diamond blastx search of their elusive and non-elusive gene sets against all other genomes, should further enlighten the robustness of their gene loss detection approach

      For the revised manuscript, we have refined the elusive gene set as the reviewer suggested. In the genome assemblies, we have searched for the orthologs of the elusive genes for the species in which they were missing. The search has been conducted by querying amino acid sequences of the elusive genes with tblastn as well as MMSeqs2, which outperformed tblastn in sensitivity and computational speed. In addition, regarding another comment by Reviewer 3, we have searched for the orthologs by referring to existing ortholog annotations. We used the ortholog annotations implemented in RefSeq instead of those from the TOGA pipeline: both employ synteny conservation. We have reconciled the identified orthologs with our gene loss criteria–absence from all the species used in a particular taxon–and excluded 268 genes from the original elusive gene set. These include genes that were missing in the previous gene annotations used in the original manuscript but are present in the latest ones, as well as those falsely inferred as missing due to incorrect inference of gene trees. Finally, the refined set of 813 elusive genes was subjected to comparisons with the non-elusive genes. Importantly, these comparisons retained the significantly different trends of the particular genomic, transcriptomic, and epigenomic features between them except for very few cases (Table R1 included below). This indicates that both the initial and revised sets of the elusive genes reflect the nature of the ‘elusiveness,’ though the initial set contained some noise. We have modified the numbers of elusive genes in the corresponding parts of the manuscript including figures and tables. Additionally, we have added the validation procedures in Methods.

      Table R1. Difference in statistical significances across different elusive gene sets

      *The other features showed significantly different trends between the elusive and non-elusive genes for all of the elusive gene sets and thus are not included in this table.

      (P38 in Methods) The gene loss events inferred by molecular phylogeny were further assessed by synteny-based ortholog annotations implemented in RefSeq, as well as a homolog search in the genome assemblies (Table S2) with TBLASTN v2.11.0+ (Altschul et al., 1997) and MMSeqs2 (Steinegger and Söding, 2017) referring to the latest RefSeq gene annotations (last accessed on 2 Dec, 2022). This procedure resulted in the identification of 813 elusive genes that harbored three or fewer duplicates. Similarly, we extracted 8,050 human genes whose orthologs were found in all the mammalian species examined and defined them as non-elusive genes.

      The reviewer also suggested that we investigate genes falsely reported as missing due to frameshift mutations (here we assume that the reviewer had in mind genome assemblies that erroneously contain frameshift mutations). This requires us to search for the orthologs by revisiting the sequencing reads, because such frameshifts are sometimes caused by indels arising from erroneous basecalling. We selected five elusive genes and searched for fragments of their orthologs in the sequencing reads of the species in which they are missing. We retrieved sequencing reads corresponding to the genome assemblies from NCBI SRA and performed a sequence similarity search using the program Diamond against the amino acid sequences of the elusive genes, and we could not find any frameshift that potentially causes the mis-annotation of the elusive genes.

      Along this line, we noticed that when annotation files were pooled together via CD-Hit clustering, a 100% identity threshold was chosen (Methods). Since some of the pooled annotations were drawn from less high quality assemblies which yield higher likelihoods of mismatches between annotations, enforcing a 100% identity threshold will artificially remove genes due to this strict constraint. It will be paramount for this study to test the robustness of their findings when 90% and 95% identity thresholds were selected.

      cd-hit clustering with 100% sequence identity only clusters those with identical (and sometimes truncated) sequences, and, in the cluster, the sequences other than the representative are discarded. This means that the sequences remain if they are not identical to the other ones. If the similarity threshold is lowered, both identical and highly similar sequences are clustered with each other, and more sequences are discarded. Therefore, our approach that employs clustering with 100% similarity may minimize false positive gene loss.
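
The argument above can be illustrated with a toy greedy clustering in the spirit of cd-hit: sequences are processed from longest to shortest, and a sequence is discarded when it matches an existing representative at or above the identity threshold. This is a simplified sketch and not cd-hit itself. The identity measure here is an ungapped, left-anchored comparison over the shorter sequence, and the function names and peptide strings are invented for illustration.

```python
def identity(shorter, longer):
    """Fraction of positions in `shorter` that match `longer` (ungapped, left-anchored)."""
    matches = sum(a == b for a, b in zip(shorter, longer))
    return matches / len(shorter)

def greedy_cluster(seqs, threshold):
    """Return the representative sequences kept after greedy clustering.

    Sequences are visited longest first; one that reaches `threshold` identity
    to an existing representative joins that cluster and is discarded.
    """
    representatives = []
    for seq in sorted(seqs, key=len, reverse=True):
        if not any(identity(seq, rep) >= threshold for rep in representatives):
            representatives.append(seq)
    return representatives

# Hypothetical peptide sequences.
seqs = [
    "MKTAYIAKQR",  # reference
    "MKTAYIAKQR",  # exact duplicate
    "MKTAYIAK",    # truncated copy
    "MKTAYIAKQK",  # one substitution (90% identical)
    "MSTNPKPQRK",  # unrelated
]

kept_at_100 = greedy_cluster(seqs, threshold=1.0)
kept_at_90 = greedy_cluster(seqs, threshold=0.9)
print(len(kept_at_100), len(kept_at_90))
```

At the 100% threshold only the duplicate and the truncated copy are dropped, while lowering the threshold to 90% also drops the near-identical variant. Fewer sequences survive at lower thresholds, which is the point made in the response.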

      While some statistical tests were applied (although we do recommend consulting a professional statistician, since some identical distributions tend to show significantly low p-values), the authors fail to discuss the fact that their elusive gene set comprises of ~5% of all human genes (assuming 21,000 genes), while their non-elusive set represents ~40% of all genes. In other words, the authors compare their sequence and genomic features against the genomic background rather than a biological signal (nonelusiveness). An analysis whereby 1,081 genes (same number as elusive set) are randomly sampled from the 21,000 gene pool is compared against the elusive and non-elusive distributions for all presented results will reveal whether the non-elusive set follows a background distribution (noise) or not.

      Our study aims to elucidate the characteristics of genes that differentiate their fates, retention or loss. To achieve this, we framed the characterization as a comparison between the elusive and non-elusive genes. This comparison highlighted clearly different phylogenetic signals for gene loss between elusive and non-elusive genes, allowing us to extract the features associated with the loss-prone nature. The random sampling set suggested by the reviewer may largely consist of the remaining genes that were classified as neither elusive nor non-elusive. However, these remaining genes may contain a considerable number of genes with distinctive phylogenetic signatures rather than intermediates between the elusive and non-elusive genes: genes with multiple loss events in more restricted taxa than our criterion, ones with frequent duplication, etc. Therefore, we think that a comparison of the elusive genes with the random-sampling set does not achieve our objective: the comparison of the clearly different phylogenetic signals.

      We also wondered whether the authors considered testing the links between recombination rate / LD and the genomic locations of their elusive genes (again compared against randomly sampled genes)?

      We have retrieved fine-scale recombination rate data of males and females from https://www.decode.com/addendum/ (Suppl. Data of Kong, A et al., Nature, 467:1099–1103, 2010) and have compared them between the gene regions of the elusive and non-elusive genes. Both comparisons show no significant differences (average 0.829 and 0.900 recombinations/kb for the elusive and non-elusive genes, respectively, p=0.898, for males; average 0.836 and 0.846 recombinations/kb for the elusive and non-elusive genes, respectively, p=0.256, for females).

      Given the evidence presented in Figure 6b, we do not agree with the statement (l.334-336): "These observations suggest that the elusive genes are unlikely to be regulated by distant regulatory elements". Here, a data population of ~1k genes is compared against a data population of ~8k genes and the presented difference between distributions could be a sample size artefact. We strongly recommend retesting this result with the ~1k randomly sampled genes from the total ~21,000 gene pool and then compare the distributions.

      Analogous random sampling analysis should be performed for Fig 6a,d

      As described above, our study does not intend to extract signals from background. To make the comparison objectives clear, we have revised the corresponding sentence as below.

      (P22, Transcriptomic natures of elusive genes in Results) These observations suggest that the elusive genes are unlikely to be regulated by distant regulatory elements compared with the non-elusive genes (Figure 6b).

      We didn't see a clear pattern in Figure 7. Please quantify enrichments with statistical tests. Even if there are enriched regions, why did the authors choose a Shannon entropy cutoff configuration of <1 (low) and >1 (high)? What was the overall entropy value range? If the maximum entropy value was 10 or 100 or even more, then denoting <1 as low and >1 as high seems rather biased.

      To use Figure 7 in a new section in Results, we have added an ideogram showing the distribution of the genes that retain the chicken orthologs in microchromosomes. In response to the comment by Reviewer 2, we have performed statistical tests and found that the elusive genes were significantly more abundant in orthologs in microchromosomes than the non-elusive genes. Furthermore, the observation that the elusive genes prefer to be located in gene-rich regions was already statistically supported (Figure 2f).

      As shown in Figure 5, Shannon’s H' ranged from zero to approximately 4 (exact maximum value is 3.97) and 5 (5.11) for the GTEx and Descartes gene expression datasets, respectively. Although the threshold H'=1 was set arbitrarily, we think that it is reasonable for distinguishing the genes with high pleiotropy from those with low pleiotropy.
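
      For readers unfamiliar with the index, the sketch below shows one common way to compute Shannon's H' from a gene's expression values across tissues or cell types and to apply the H'=1 threshold. It assumes that expression values are converted to proportions and that the natural logarithm is used; neither detail is stated in this response. The function names are hypothetical.

```python
import math


def shannon_h(expression):
    """Shannon's H' of one gene's expression across tissues.

    Assumes p_i = expression_i / sum(expression) and the natural log.
    """
    total = sum(expression)
    if total <= 0:
        return 0.0  # no expression anywhere: treat as zero diversity
    h = 0.0
    for value in expression:
        if value > 0:  # terms with p_i = 0 contribute nothing
            p = value / total
            h -= p * math.log(p)
    return h


def pleiotropy_class(expression, threshold=1.0):
    """Label a gene 'high' if H' exceeds the threshold, otherwise 'low'."""
    return "high" if shannon_h(expression) > threshold else "low"
```

      Under these assumptions, a gene expressed in a single tissue has H' = 0, and a gene expressed evenly across n tissues reaches the maximum ln(n), so the attainable range depends on the number of tissues in each dataset.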

    1. Author Response

      Reviewer #1 (Public Review):

      1) It would be helpful to include some sort of comparison in Fig. 4, e.g. the regressions shown in Fig 3, to indicate to what extent the ICCl data corresponds to the "control range" of frequency tuning.

      Figure 4 was modified to show the frequency range typically found in the ICCls. This range is based on results from Wagner et al., 2007, which extensively surveyed ICCls responses. This modification shows that our ICCls recordings in the ruff-removed owls cover the normal frequency hearing range of the owl.

      2) A central hypothesis of the study is that the frequency preference of the high-frequency neurons is lower in ruff-removed owls because of the lowered reliability caused by a lack of the ruff. Yet, while lower, the frequency range of many neurons in juvenile and ruff-removed owls seems sufficiently high to be still responsive at 7-8 kHz. I think it would be important to know to what extent neurons are still ITD sensitive at the "unreliable high frequencies" even if the CFs are lower since the "optimization" according to reliability depends not on the best frequency of each neuron per se, but whether neurons are less ITD sensitive at the higher, less reliable frequencies.

      The concern regarding the frequency range that elicits responsivity was largely addressed above. Specifically, Figure L1, which shows frequency tuning of frontally tuned ICx neurons in ruff-removed owls, indicates that while there is some variability of tuning across neurons, there is little responsivity above 6 kHz. In contrast, the equivalent analysis in juvenile owls (Figure L3) shows that there is much more responsiveness and variability across neurons to high and low frequencies. This evidence supports our hypothesis that the juvenile owl brain is still highly plastic, which facilitates learning during development. Although the underlying data were already reported in Figure 7 of our previously submitted manuscript, we can include Figures L1 and L2, potentially as supplemental figures, if considered useful by editors and reviewers. Nevertheless, this argument was further expanded in the revised text (Line 229).

      Figure L1. Frequency tuning of frontally-tuned ICx neurons in ruff-removed owls. Tuning curves are normalized by the max response. Thick black line indicates the average tuning curve. Dashed black line indicates basal response.

      Figure L2. ITD sensitivity across frequencies in ruff-removed owl. Two example neurons shown in a and b. ITD tuning for tones (colored) and broadband (black) plotted by firing rate (non-normalized). Solid colored lines indicate responses to frequencies that are within the neuron’s preferred frequency range (i.e. above the half-height, see Methods), dashed lines indicate frequencies outside of the neuron’s frequency range.

      Figure L3. Frequency tuning of frontally-tuned ICx neurons in juvenile owls. Tuning curves are normalized by the max response. Thick black line indicates the average tuning curve. Dashed black line indicates basal response.

      3) It would be interesting to have an estimate of the time scale of experience dependency that induces tuning changes. Do the authors have any data on this question? I appreciate the authors' notion that the quantifications in Fig 7 might indicate that juvenile owls are already "beginning to be shaped by ITD reliability" (line 323 in Discussion). How many days after hearing onset would this correspond to? Does this mean that a few days will already induce changes?

      While tracking changes induced by ruff-removal over development was outside the scope of this study, many other studies have assessed experience-dependent plasticity in the barn owl. The recordings in this study were performed approximately 20 days after hearing onset, suggesting that the juveniles had ample time to begin learning. These points were expanded upon in the discussion (Lines 254, 280-283).

      Reviewer #2 (Public Review):

      1) Why is IPD variability plotted instead of ITD variability (or indeed spatial reliability)? The relationship between these measures is likely to vary across frequency, which makes it difficult to compare ITD variability across frequency when IPDs are plotted. Normalizing data across frequencies also makes it difficult to compare different locations and acoustical conditions. For example, in Fig.1a and Fig.1b, the data shown for 3 kHz at ~160 degrees seems quantitatively and visually quite different, but the difference (in Fig.1c) appears to be negligible.

      Justification for using IPD variability as an estimate of ITD variability was added to the introduction (Lines 55-60), results (Line 100) and methods (Lines 371-374) sections of the manuscript. Because ITD detection is based on phase locking by auditory nerve and ITD detector neurons tuned to narrow frequency bands, the responses that ITD detector neurons forward to downstream midbrain regions are determined by IPD variability. Additionally, ITD is calculated by dividing IPD by frequency, which makes comparisons of ITD reliability across frequency mathematically uninformative.
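
      The scaling argument can be made concrete with a small worked example. The sketch below converts an IPD into an ITD, assuming the IPD is expressed in cycles (for an IPD in radians the divisor would be 2π times the frequency). The function name and units are choices made for this illustration.

```python
def ipd_to_itd_us(ipd_cycles, frequency_hz):
    """Convert an interaural phase difference (in cycles) at a given
    frequency (in Hz) to an interaural time difference in microseconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return ipd_cycles / frequency_hz * 1e6


# The same phase spread maps to very different time spreads:
low = ipd_to_itd_us(0.1, 2000.0)   # 0.1 cycle at 2 kHz
high = ipd_to_itd_us(0.1, 8000.0)  # 0.1 cycle at 8 kHz
```

      Because the same IPD spread corresponds to a four-fold smaller ITD spread at 8 kHz than at 2 kHz, ITD variability shrinks with frequency by construction, which is why the comparison across frequencies is made in phase.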

      2) How well do the measures of ITD reliability used reflect real-world listening? For example, the model used to calculate ITD reliability appears to assume the same (flat) spectral profile for targets and distractors, which are presented simultaneously with the same temporal envelope, and a uniform spatial distribution of sounds across space. It is therefore unclear how robust the study's results are to violations of these assumptions.

      While we agree that our analysis cannot completely capture real-world listening for the barn owl, a general analysis using similar flat spectral profiles for targets and concurrent sounds provides a broad assessment of reliability of ITD cues. While a full recapitulation of real-world listening is beyond the scope of this study (i.e. recording natural scenes from the ear canals of wild barn owls), we included additional analyses of ITD reliability in Figure 1-figure supplement 1, described above.

      3) Does facial ruff removal produce an isolated effect on ITD variability or does it also produce changes in directional gain, and the relationship between spatial cues and sound location? Although the study considers this issue in some places (e.g. Fig.2, Fig.5), a clearer presentation of the acoustical effects of facial ruff removal and their implications (for all locations, not just those to the front), as well as an attempt to understand how these acoustical changes lead to the observed changes in ITD reliability, would greatly strengthen the study. In addition, Fig.1 shows average ITD reliability across owls, but it would be helpful to know how consistent these measures are across owls, given individual variability in Head-Related Transfer Functions (HRTFs). This potentially has implications for the electrophysiological experiments, if the HRTFs of those animals were not measured. One specific question that is potentially very relevant is whether the facial ruff attenuates sounds presented behind the animal and whether it does so in a frequency-dependent way. In addition, if facial ruff removal enables ILDs to be used for azimuth, then ITDs may also become less necessary at higher frequencies, even if their reliability remains unchanged.

      Additional analysis was conducted to represent the changes in directional gain induced by ruff removal, and the results were added in a new figure (Fig 5). This analysis shows that changes in gain following ruff-removal are largely frequency-independent: there is a de-attenuation of peripherally and rearwardly located sounds, but the highest gain remains for high frequencies in frontal space. Although there is an additional increase in gain for high frequencies from rearward space, these changes would not explain the changes in frequency tuning we report. As mentioned in new additions to the manuscript, the changes at the most rearward-located auditory spatial locations are unlikely to have an effect on the auditory midbrain. No studies in the barn owl have found neurons in the ICx or optic tectum tuned to >120° (Knudsen, 1982; Knudsen, 1984; Cazettes et al., 2014). In addition, variability of IPD reliability across owls was analyzed and reported in the amended Figure 1, which shows very little variation across owls. During this analysis, we realized that the file of one of the HRTFs obtained from von Campenhausen et al. 2006 was mislabeled, which explains slight differences in revised Fig 1b. Nevertheless, the added analysis of IPD reliability across owls indicates that the pattern in ITD reliability is stable across owls (Fig. 1d,e), which supports our decision to not record HRTFs from owls used in this study. Finally, we added text to the discussion clarifying that the use of ILD for azimuth would not provide the same resolution as ITD would (Lines 295-303). We also do not believe that the use of ILD for azimuth would make “ITDs… less necessary at higher frequencies”, given that the ICCls is still computing ITD at these high frequencies (Fig 4), and that ILDs also have higher resolution at higher frequencies, with and without the facial ruff (Olsen et al, 1989; Keller et al., 1998; von Campenhausen et al., 2006).

      1) It is unclear why some analyses (Fig.5, Fig.7) are focused on frontal locations and frontally-tuned neurons. It is also unclear why neurons with a best ITD of 0 are described as frontally tuned, since locations behind the animal also produce an ITD of 0. Related to this, in Fig.1, facial ruff removal appears to reduce IPD variability at low frequencies for locations to the rear (~160 degrees), where the ITD is likely to be close to 0. Neurons with a best ITD of 0 might therefore be expected to adjust their frequency tuning in opposite directions depending on whether they are tuned to frontal or rearward locations.

      An extensive explanation was added to the methods detailing why we do not believe the neurons recorded in this study are tuned to the rear. Namely, studies mapping the barn owl’s ICx and optic tectum have not reported neurons tuned to locations >120°, with the number of neurons representing a given spatial location decreasing with eccentricity (Knudsen, 1982; Knudsen, 1984; Cazettes et al., 2014). While we agree that there does seem to be a change in ITD reliability at ~160° following ruff-removal, the result is largely similar to the change that occurs in frontal space (Fig 1b), which is consistent with the ruff-removed head functioning as a sphere. Thus, we wouldn’t expect rearwardly-tuned neurons, if they could be readily found, to adjust their frequency tuning to higher frequencies. Finally, we want to clarify that we focused our analyses on frontally-tuned neurons because frontal space is where we observed the largest change in ITD reliability. Text was added to the Discussion section to clarify this point (Lines 313-321).

      2) The study suggests that information about high-frequency ITDs is not passed on to the ICX if the ICX does not contain neurons that have a high best frequency. However, neurons might be sensitive to ITDs at frequencies other than the best frequency, particularly if their frequency tuning is broader. It is also unclear whether the best frequency of a neuron always corresponds to the frequency that provides the most reliable ITD information, which the study implicitly assumes.

      The concern about ITD sensitivity at non-preferred frequencies was addressed under the essential revision #3, as well as under Reviewer 1’s concerns.

    1. Author Response

      Reviewer #1 (Public Review):

      How morphogens spread within tissues remains an important question in developmental biology. Here the authors revisit the role of glypicans in the formation of the Dpp gradient in wing imaginal discs of Drosophila. They first use sophisticated genome engineering to demonstrate that the two glypicans of Drosophila are not equivalent despite being redundant for viability. They show that Dally is the relevant glypican for Dpp gradient formation. They then provide genetic evidence that, surprisingly, the core domain of Dally suffices to trap Dpp at the cell surface (suggesting a minor role for GAGs). They conclude with a model that Dally modulates the range of Dpp signaling by interfering with Dpp's degradation by Tkv. These are important conclusions, but more independent (biochemical/cell biological) evidence is needed.

      As indicated above, the genetic evidence for the predominant role of Dally in Dpp protein/signalling gradient formation is strong. In passing, the authors could discuss why overexpressed Dlp has a negative effect on signaling, especially in the anterior compartment. The authors then move on to determine the role of GAG (=HS) chains of Dally. They find that in an overexpression assay, Dally lacking GAGs traps Dpp at the cell surface and, counterintuitively, suppresses signaling (fig 4 C, F). Both findings are unexpected and therefore require further validation and clarification, as outlined in a and b below.

      a) In loss of function experiments (dallyDeltaHS replacing endogenous dally), Dpp protein is markedly reduced (fig 4R), as much as in the KO (panel Q), suggesting that GAG chains do contribute to trapping Dpp at the cell surface. This is all the more significant that, according to the overexpression essays, DallyDeltaHS seems more stable than WT Dally (by the way, this difference should also be assessed in the knock-ins, which is possible since they are YFP-tagged). The authors acknowledge that HS chains of Dally are critical for Dpp distribution (and signaling) under physiological conditions. If this is true, one can wonder why overexpressed dally core 'binds' Dpp and whether this is a physiologically relevant activity.

      According to the overexpression assay, DallyDeltaHS seems more stable than WT Dally (Fig. 4B’, E’, 5H, I). As the reviewer suggested, we addressed the difference using the two knock-in alleles and found that DallyDeltaHS is more stable than WT Dally (Fig.4 L, M inset), further emphasizing that the core protein of Dally is insufficient for extracellular Dpp distribution.

      (While revising our figures, we found a labeling mistake in Fig. 4M, N and Fig. 4Q, R and corrected the genotypes.)

      In summary, we showed that, although Dally interacts with Dpp mainly through its core protein from the overexpression assay (Fig. 4E, I), HS chains are essential for extracellular Dpp distribution (Fig. 4R). Thus, the core protein of Dally alone is not sufficient for extracellular Dpp distribution under physiological conditions. These results raise a question about whether the interaction of core protein of Dally with Dpp is physiologically relevant. Since the increase of HS upon dally expression but not upon dlp expression resulted in the accumulation of extracellular Dpp (Fig. 2) and this accumulation was mainly through the core protein of Dally (Fig. 4E, I), we speculate that the interaction of the core protein of Dally with Dpp gives ligand specificity to Dally under physiological conditions.

      To understand the importance of the interaction of core protein of Dally with Dpp under physiological conditions, it is important to identify a region responsible for the interaction. Our preliminary results overexpressing a dally mutant lacking the majority of core protein (but keeping the HS modified region intact) showed that HS chain modification was also lost. Although this is consistent with our results that enzymes adding HS chains also interact with the core protein of Dally (Fig. 4D), the dally mutant allele lacking the core protein would prevent us from distinguishing the role of the core protein of Dally from that of HS chains.

      Nevertheless, we can infer the importance of the interaction of core protein of Dally with Dpp using dally[3xHA-dlp, attP] allele, where dlp is expressed in dally expressing cells. Since Dally-like is modified by HS chains but does not interact with Dpp (Fig. 2, 4), dally[3xHA-dlp, attP] allele mimics a dally allele where HS chains are properly added but interaction of core protein with Dpp is lost. As we showed in Fig.3O, S, the allele could not rescue dallyKO phenotypes, consistent with the idea that interaction of core protein of Dally with Dpp is essential for Dpp distribution and signaling and HS chain alone is not sufficient for Dpp distribution.

      b) The authors infer that dallycore (at least if overexpressed) can bind Dpp. This assertion needs independent validation by a biochemical assay, ideally with surface plasmon resonance or similar so that an affinity can be estimated. I understand that this will require a method that is outside the authors' core expertise but there is no reason why they could not approach a collaborator for such a common technique. In vitro binding data is, in my view, essential.

      We agree with the reviewer that a biochemical assay such as SPR would help us characterize the interaction of core protein of Dally and Dpp (if the interaction is direct), although the biochemical assay also would not demonstrate the interaction under physiological conditions.

      However, SPR has never been applied in the case of Dpp, probably because functional refolded Dpp dimer purified from bacteria has been found to be stable only at low pH and to precipitate in normal-pH buffer (Groppe J, et al., 1998)(Matsuda et al., 2021). As the reviewer suggests, collaborating with experts is an important step in the future.

      Nevertheless, SPR was applied for the interaction between BMP4 and Dally (Kirkpatrick et al., 2006), probably because BMP4 is more stable in the normal buffer. Although the binding affinity was not calculated, SPR showed that BMP4 directly binds to Dally and this interaction was only partially inhibited by molar excess of exogenous HS, suggesting that BMP4 can interact with core protein of Dally as well as its HS chains. In addition, the same study applied Co-IP experiments using S2 cell lysates and showed that Dpp and core protein of Dally are co-immunoprecipitated, although this does not demonstrate whether the interaction is direct.

      In a subsequent set of experiments, the authors assess the activity of a form of Dpp that is expected not to bind GAGs (DppDeltaN). Overexpression assays show that this protein is trapped by DallyWT but not dallyDeltaHS. This is a good first step validation of the deltaN mutation, although, as before, an in vitro binding assay would be preferable.

      Our overexpression assays actually showed that DppDeltaN is trapped by DallyWT and by dallyDeltaHS at similar levels (Fig. 5H-J), indicating that interaction of DppDeltaN and HS chains of Dally is largely lost but DppDeltaN can still interact with core protein of Dally.

      (Related to this, we found a typo in the sentence “In contrast, the relative DppΔN accumulation upon DallyΔHS expression in JAX;dppΔN was comparable to that upon DallyΔHS expression in JAX;dppΔN (Fig. 5H-J).” and corrected it as follows: “In contrast, the relative DppΔN accumulation upon Dally expression in JAX;dppΔN was comparable to that upon DallyΔHS expression in JAX;dppΔN (Fig. 5H-J).”)

      We thank the reviewer for suggesting the in vitro experiment. Although we decided not to develop biophysical experiments such as SPR for Dpp in this study due to the reasons discussed above, we would like to point out that our result is consistent with a previous Co-IP experiment using S2 cells showing that DppDeltaN loses interaction with heparin (Akiyama, 2008).

      However, in contrast to our results, the same study also proposed, based on Co-IP experiments using S2 cells, that DppDeltaN loses interaction with Dally (Akiyama, 2008). It is hard to draw a conclusion from that experiment, since the western blot was saturated and lacked loading controls and normalization (Fig. 1C in Akiyama 2008), and negative in vitro experiments do not necessarily demonstrate the lack of interaction in vivo. One explanation for why the interaction was missed in the previous study is that some factors required for the interaction of DppDeltaN with core protein of Dally are missing in S2 cells. In this case, the in vivo interaction assay used in this study has the advantage of robustly detecting the interaction.

      Nevertheless, the authors show that DppDeltaN is surprisingly active in a knock-in strain. At face value (assuming that DeltaN fully abrogates binding to GAGs), this suggests that interaction of Dpp with the GAG chains of Dally is not required for signaling activity. This leads the authors to suggest (as shown in their final model) that GAG chains could be involved in mediating the interactions of Dally with Tkv (and not with Dpp). This is an interesting idea, which would need to be reconciled with the observation that the distribution of Dpp is affected in dallyDeltaHS knock-ins (item a above). It would also be strengthened by biochemical data (although more technically challenging than the experiments suggested above). In an attempt to determine the role of Dally (GAGs in particular) in the signaling gradient, the paper next addresses its relation to Tkv. They first show that reducing Tkv leads to Dpp accumulation at the cell surface, a clear indication that Tkv normally contributes to the degradation of Dpp. From this they suggest that Tkv could be required for Dpp internalisation although this is not shown directly. The authors then show that a Dpp gradient still forms upon double knockdown (Dally and Tkv). This intriguing observation shows that Dally is not strictly required for the spread of Dpp, an important conclusion that is compatible with early work by Lander suggesting that Dpp spreads by free diffusion. These results show that Dally is required for gradient formation only when Tkv is present. They suggest therefore that Dally prevents Tkv-mediated internalisation of Dpp. Although this is a reasonable inference, internalisation assays (e.g. with anti-Ollas or anti-HA Ab) would strengthen the authors' conclusions especially because they contradict a recent paper from the Gonzalez-Gaitan lab.

      Thanks for suggesting the internalization assay. As we noted in the Discussion, our results suggest that extracellular Dpp distribution is severely reduced in dally mutants due to Tkv-mediated internalization of Dpp (Fig. 6). Thus, the extracellular Dpp available for labelling with nanobody is severely reduced in dally mutants, which can explain the reduced internalization of Dpp in dally mutants in the internalization assay. Therefore, we think that the nanobody internalization assay would not distinguish between the two contradicting possibilities.

      The paper ends with a model suggesting that HS chains have a dual function of suppressing Tkv internalisation and stimulating signaling. This constitutes a novel view of a glypican's mode of action and possibly an important contribution of this paper. As indicated above, further experiments could considerably strengthen the conclusion. Speculation on how the authors imagine that GAG chains have these activities would also be warranted.

      Thank you very much!

      Reviewer #2 (Public Review):

      The authors are trying to distinguish between four models of the role of glypicans (HSPGs) on the Dpp/BMP gradient in the Drosophila wing, schematized in Fig. 1: (1) "Restricted diffusion" (HSPGs transport Dpp via repetitive interaction of HS chains with Dpp); (2) "Hindered diffusion" (HSPGs hinder Dpp spreading via reversible interaction of HS chains with Dpp); (3) "Stabilization" (HSPGs stabilize Dpp on the cell surface via reversible interaction of HS chains with Dpp that antagonizes Tkv-mediated Dpp internalization); and (4) "Recycling" (HSPGs internalize and recycle Dpp).

      To distinguish between these models, the authors generate new alleles for the glypicans Dally and Dally-like protein (Dlp) and for Dpp: a Dally knock-out allele, a Dally YFP-tagged allele, a Dally knock-out allele with 3HA-Dlp, a Dlp knock-out allele, a Dlp allele containing 3-HA tags, and a Dpp lacking the HS-interacting domain. Additionally, they use an OLLAS-tag Dpp (OLLAS being an epitope tag against which extremely high affinity antibodies exist). They examine OLLAS-Dpp or HA-Dpp distribution, phospho-Mad staining, and adult wing size.

      They find that over-expressed Dally - but not Dlp - expands Dpp distribution in the larval wing disc. They find that the Dally[KO] allele behaves like a Dally strong hypomorph Dally[MH32]. The Dally[KO] - but not the Dlp[KO] - caused reduced pMad in both anterior and posterior domains and reduced adult wing size (particularly in the Anterior-Posterior axis). These defects can be substantially corrected by supplying an endogenously tagged YFP-tagged Dally. By contrast, they were not rescued when a 3xHA Dlp was inserted in the Dally locus. These results support their conclusion that Dpp interacts with Dally but not Dlp.

      They next wanted to determine the relative contributions of the Dally core or the HS chains to the Dpp distribution. To test this, they over-expressed UAS-Dally or UAS-Dally[deltaHS] (lacking the HS chains) in the dorsal wing. Dally[deltaHS] over-expression increased the distribution of OLLAS-Dpp but caused a reduction in pMad. Then they write that after they normalize for expression levels, they find that Dally[deltaHS] only mildly reduces pMad and this result indicates a major contribution of the Dally core protein to Dpp stability.

      Thanks for the comments. We actually showed that compared with Dally overexpression, Dally[deltaHS] overexpression only mildly reduces extracellular Dpp accumulation (Fig. 4I). This indicates a major contribution of the Dally core protein to interaction with Dpp, although the interaction is not sufficient to sustain extracellular Dpp distribution and signaling gradient.

      The "normalization" is a key part of this model, yet how the normalization was done is not mentioned. When they do the critical experiment, making the Dally[deltaHS] allele, they find that loss of the HS chains is nearly as severe as total loss of Dally (i.e., Dally[KO]). Additional experimental approaches are needed here to prove the role of the Dally core.

      Since the expression level of Dally[deltaHS] is higher than that of Dally when overexpressed, we normalized extracellular Dpp distribution (a-Ollas staining) against GFP fluorescent signal (Dally or Dally[deltaHS]). To do this, we first extracted both signals along the A-P axis from the same ROI. The ratio was calculated by dividing the intensity of a-Ollas staining by the intensity of GFP fluorescent signal at a given position x. The average profile from the normalized profiles was generated and plotted using the script described in the Methods (wingdisc_comparison.py), as for other pMad or extracellular staining profiles.
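
      The position-wise normalization described here can be sketched as follows. This is an illustration only and is not the authors' wingdisc_comparison.py script, whose contents are not shown in this response. The function names are hypothetical, and the sketch assumes that each disc's two profiles are sampled at the same positions along the A-P axis and that all discs share those positions.

```python
def ratio_profile(ollas, gfp):
    """Divide extracellular Dpp staining by GFP signal at each position x."""
    if len(ollas) != len(gfp):
        raise ValueError("profiles must cover the same positions")
    if any(g <= 0 for g in gfp):
        raise ValueError("GFP signal must be positive at every position")
    return [o / g for o, g in zip(ollas, gfp)]


def mean_profile(profiles):
    """Average several normalized profiles position by position."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]
```

      Each wing disc contributes one normalized profile from `ratio_profile`, and `mean_profile` averages these across discs to give the curve that would be plotted.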

      Although this analysis provides normalized extracellular Dpp accumulation at different positions along the A-P axis, we are more interested in the total amount of Dpp or DppDeltaN accumulation upon Dally or dallyDeltaHS expression. Therefore, we plan to analyze the total amount of Dpp normalized against GFP fluorescent signal (Dally or Dally[deltaHS]) in the revised manuscript. In this case, normalization will be performed by dividing the total signal intensity of extracellular Dpp staining (ExOllas staining) by the total GFP fluorescent signal (Dally or Dally[deltaHS]) in the ROI of each wing disc.

      We agree with the reviewer that additional experimental approaches are needed to address the role of the core protein of Dally. As we discussed in the response to Reviewer 1, to understand the importance of the interaction of core protein of Dally with Dpp, it is important to identify a region responsible for the interaction. Our preliminary results overexpressing a dally mutant lacking the majority of core protein (but keeping the HS modified region intact) showed that HS chain modification was also lost. Although this is consistent with our results that enzymes adding HS chains also interact with the core protein of Dally (Fig. 4D), the dally mutant allele lacking the core protein would prevent us from distinguishing the role of the core protein of Dally from that of HS chains.
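
      A minimal sketch of this per-disc total normalization is given below, assuming the ROI is represented as two lists of pixel or position intensities. The function name is hypothetical and the code is not taken from the study.

```python
def total_ratio(ollas, gfp):
    """Total extracellular Dpp signal divided by total GFP signal in one ROI."""
    total_gfp = sum(gfp)
    if total_gfp <= 0:
        raise ValueError("total GFP signal must be positive")
    return sum(ollas) / total_gfp
```

      Note that a ratio of totals generally differs from the mean of position-wise ratios: for staining values 1 and 9 over GFP values 1 and 3, the ratio of totals is 10/4 = 2.5, whereas the mean of the two position-wise ratios is 2.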

      Nevertheless, we can infer the importance of the interaction of core protein of Dally with Dpp using dally[3xHA-dlp, attP] allele, where dlp is expressed in dally expressing cells. Since Dally-like is modified by HS chains but does not interact with Dpp (Fig. 2, 4), dally[3xHA-dlp, attP] allele mimics a dally allele where HS chains are properly added but interaction of core protein with Dpp is lost. As we showed in Fig.3O, S, the allele could not rescue dallyKO phenotypes, consistent with the idea that interaction of core protein of Dally with Dpp is essential for Dpp distribution and signaling.

      Prior work has shown that a stretch of 7 amino acids in the Dpp N-terminal domain is required to interact with heparin but not with Dpp receptors (Akiyama, 2008). The authors generated an HA-tagged Dpp allele lacking these residues (HA-dpp[deltaN]). It is an embryonic lethal allele, but they can get some animals to survive to larval stages if they also supply a transgene called “JAX” containing dpp regulatory sequences. In the JAX; HA-dpp[deltaN] mutant background, they find that the distribution and signaling of this Dpp molecule is largely normal. While over-expressed Dally can increase the distribution of HA-dpp[deltaN], over-expression of Dally[deltaHS] cannot. These latter results support the model that the HS chains in Dally are required for Dpp function but not because of a direct interaction with Dpp.

      Our overexpression assays actually showed that both Dally and Dally[deltaHS] can accumulate Dpp upon overexpression and that the accumulation of Dpp is comparable after normalization (Fig. 5H-J), consistent with the idea that the interaction between DppdeltaN and HS chains is largely lost. As the reviewer pointed out, these results support the model that the HS chains in Dally are required for Dpp function but not because of a direct interaction with Dpp.

      In the last part of the results, they attempt to determine if the Dpp receptor Thickveins (Tkv) is required for Dally-HS chains interaction. The 2008 (Akiyama) model posits that Tkv activates pMad downstream of Dpp and also internalizes and degrades Dpp. A 2022 (Romanova-Michaelides) model proposes that Dally (not Tkv) internalizes Dpp.

      To distinguish between these models, the authors depleted Tkv from the dorsal compartment of the wing disc and found that extracellular Dpp increased and expanded in that domain. These results support the model that Tkv is required to internalize Dpp.

      They then tested the model that Dally antagonizes Tkv-mediated Dpp internalization by determining whether the defective extracellular Dpp distribution in Dally[KO] mutants could be rescued by depleting Tkv. Extracellular Dpp did increase in the D vs V compartment, potentially providing some support for their model. However, no statistical tests were performed, and these are needed for full confidence in the results. The lack of statistics is particularly problematic (1) when they state that extracellular Dpp does not rise in ap>tkv RNAi vs ap>tkv RNAi, dally[KO] wing discs (Fig. 6E) or (2) when they state that the extracellular Dpp gradient expanded in the dorsal compartment when tkv was dorsally depleted in dally[deltaHS] mutants (Fig. 6I). These last two experiments are important for their model but the differences are assessed only visually. In fact, extracellular Dpp in ap>tkv RNAi, dally[KO] (Fig. 6B) appears to be lower than extracellular Dpp in ap>tkv RNAi (Fig. 6A), and the histogram of Dpp in ap>tkv RNAi, dally[KO] is actually a bit lower than that of Dpp in ap>tkv RNAi, but the authors claim that there is no difference between the two. Their conclusion would be strengthened by statistical analyses of the two lines.

      We will provide the statistical analyses in the revised ms.

      Strengths:

      1) New genomically-engineered alleles

      A considerable strength of the study is the generation and characterization of new Dally, Dlp and Dpp alleles. These reagents will be of great use to the field.

      Thanks. We hope that these resources are indeed useful to the field.

      2) Surveying multiple phenotypes

      The authors survey numerous parameters (Dpp distribution, Dpp signaling (pMad) and adult wing phenotypes) which provides many points of analysis.

      Thanks!

      Weaknesses:

      1) Confusing discussion regarding the Dally core vs HS in Dpp stability. They don't provide any measurements or information on how they "normalize" for the level of Dally vs Dally[deltaHS]. This is an important part of their model that currently is not supported by any measurements.

      We explained how we normalized in the above section. We will update the analysis in the revised ms.

      2) Lacking quantifications and statistical analyses:

      a) Why is statistical significance for the histograms (pMad and Dpp distribution) not supplied? These histograms provide the key results supporting the authors' conclusions but no statistical tests/results are presented. This is a pervasive shortcoming in the current study.

      Thanks. We will provide statistics in the revised ms.

      b) dpp[deltaN] with JAX transgene - it would strengthen the study to supply quantitative data on the percent survival/lethal stage of dpp[deltaN] mutants with or without the JAX transgene.

      In this study, we are interested in the role of dpp[deltaN] during wing disc development. Therefore, we decided not to perform a detailed analysis of the percent survival/lethal stage of dpp[deltaN] mutants with or without the JAX transgene in the current study. Nevertheless, the fact that the dpp[deltaN] allele is maintained as a balanced stock and the JAX;dpp[deltaN] allele can be maintained as a homozygous stock indicates that the lethality of the dpp[deltaN] allele comes from the early stages. Indeed, our preliminary results showed that pMad signal is severely lost in the dpp[deltaN] embryo without JAX (data not shown), indicating that the allele is lethal at early embryonic stages.

      c) The graphs on wing size etc should start at zero.

      Thanks. We corrected this in the current ms.

      d) The sizes of histograms and graphs in each figure should be increased so that the reader can properly assess them. Currently, they are very small.

      Thanks. We changed the sizes in the current ms.

      The authors' model is that Dally (not Dlp) is required for Dpp distribution and signaling but that this is not due to a direct interaction with Dpp. Rather, they posit that Dally-HS antagonize Tkv-mediated Dpp internalization. Currently the results of the experiments could be considered consistent with their model, but as noted above, the lack of statistical analyses of some parameters is a weakness.

      Thanks. We will perform the statistical analyses in the revised ms.

      One problematic part of their result for me is the role of the Dally core protein (Fig. 7B). There is a mis-match between the over-expression results and Dally allele lacking HS (but containing the core). Finally, their results support the idea that one or more as-yet unidentified proteins interact with Dally-HS chains to control Dpp distribution and signaling in the wing disc.

      Our results simply suggest that Dpp can interact with Dally mainly through core protein but this interaction is not sufficient to sustain extracellular Dpp gradient formation under physiological conditions (dallyDeltaHS) (Fig. 4Q). We find that the mis-match is not problematic if the role of Dally is not simply mediated through interaction with Dpp. We speculate that interaction of Dpp and core protein of Dally is transient and not sufficient to sustain the Dpp gradient without HS chains of Dally stabilizing extracellular Dpp distribution by blocking Tkv-mediated Dpp internalization.

      There is much debate and controversy in the Dpp morphogen field. The generation of new, high-quality alleles in this study will be useful to the Drosophila community, and the results of this study support the concept that Tkv but not Dally regulates Dpp internalization. Thus the work could be impactful and fuel new debates among morphogen researchers.

      Thanks.

      The manuscript is currently written in a manner that really is only accessible to researchers who work on the Dpp gradient. It would be very helpful for the authors to re-write the manuscript and carefully explain in each section of the results (1) the exact question that will be asked, (2) the prior work on the topic, (3) the precise experiment that will be done, and (4) the predicted results. This would make the study more accessible to developmental biologists outside of the morphogen gradient and Drosophila communities.

      Thanks. We will modify our texts to help non-experts understand our story in the revised ms.

    1. Author Response

      Reviewer #2 (Public Review):

      Major points:

      1). This study does not provide any evidence about the cell death of the transplanted cells. The immunostaining of the Caspase-3 or TUNEL staining should be used to address this issue.

      We have conducted immunostaining of Caspase-3 at 7 days after transplantation using the human-specific STEM121 antibody to demonstrate the transplanted cells. We have added the results to Figure 3A and modified the text accordingly (Page 8, Line 156-165).

      2). The authors showed that the neurological functions (evaluated by balance beam, ladder rung, rotarod test and Modified Neurological Severity Score (mNSS) up to 8 weeks after treatment (Figure 1C)) were significantly improved in the NES+Exo group compared to the control groups. However, these cells (transplanted cells) are progenitors (Nestin+) or undifferentiated cells (Tuj1+) at this stage (Figure 3). Thus, I am curious how immature neurons can perform neurological functions. This point should be explained.

      We agree with the reviewer’s insightful comments. We have performed immunostaining using antibodies against the post-mitotic mature neuron marker RBFOX3/NeuN, post-synaptic marker PSD-95 and human-specific STEM121 at 4 weeks after transplantation. The results confirmed that NeuN+/STEM121+ and PSD-95+/STEM121+ mature neurons appeared in NSC group and increased in NSC+Exo group (Figure 3B and Figure 3 - supplement 1D). Furthermore, our additional data showed that the expression of presynaptic marker SYN1 was increased in both NSC and NSC+Exo groups at 8 weeks after treatment. Therefore, we believe that there are mature neurons and newly formed synapses involved in neurological functions.

      3). The authors used the Golgi staining to show the NES+Exo can improve dendritic density and length. How do you know these neurons are transplanted cells?

      Our data show that mature neurons and synapses are generated by the transplanted cells (please also see the response to reviewer #2, major point #2). We believe that the newly generated neurons partly contribute to the improved dendritic density and length. However, we agree that the neurons with increased dendritic density and length may include both surviving local neurons and those generated by the transplanted cells.

      4). The cell morphology of tdTomato+ cells is fuzzy and it is difficult to distinguish the cell body. It looks like these cells are out of whack.

      We have performed immunostaining using the human-specific STEM121 antibody to demonstrate the transplanted cells and additional neuronal markers such as RBFOX3/NeuN to identify NSC differentiation (Figure 3A and 3B; Figure 3 - supplement 1C and 1D).

    1. Author Response

      Reviewer #1 (Public Review):

      Lemerle et al utilize elegant imaging and molecular biology approaches to convincingly demonstrate the presence of Bin1 and caveolae containing rings capable of tubulation in developing muscle. The data is of fundamental potential significance as it advances our understanding of t-tubule biogenesis, which represents a major knowledge gap in muscle biology. The paper will be of broad interest to skeletal and cardiac muscle biologists and physiologists. The paper is well written, with a comprehensive yet concise introduction, clearly presented results, and an appropriate discussion. The imaging is spectacular, and the use of CLEM provides compelling validation of the protein constituents of ring structures identified via EM. When combined with time-lapse imaging, the combination of approaches provides powerful nanoscale structural information alongside temporal dynamics and live-cell confirmation of tubulating ability by Bin1-Cav3 containing rings. The data indicate that Bin1 is sufficient to generate circular structures that are subsequently decorated by caveolae which facilitate tubule formation at the membrane, and they support the requirement of both Bin1 and Cav3 for efficient tubule initiation and elongation. The authors also utilize myotubes from patients with cav3 mutations to explore whether altered ring formation may contribute to muscle pathology - however, this section requires additional controls and validation to confer pathological insight. Further, additional quantification of imaging data across the study is required to increase the rigor and strength of the conclusions of this work.

      We would like to thank reviewer #1 for his appreciation of our work, in particular the imaging experiments, and for deeming our overall conclusions convincing. We have now performed additional experiments on patient myotubes including a rescue of Cav3, performed rigorous quantifications of rings and tubules under our different experimental conditions, and rewritten the corresponding parts of the discussion to increase the strength of our conclusions.

      Reviewer #2 (Public Review):

      In this work Lemerle et al. provide long-awaited insight into how transverse tubules develop in skeletal muscle. Together with the sarcoplasmic reticulum transverse tubules form the triad, a specialized structure required for excitation-contraction coupling in skeletal muscle. Defects in transverse tubules or the triad can lead to problems such as muscular dystrophy. Whilst the involvement of specialist membrane structures (caveolae) and the membrane-bending protein Bin1 have long been recognized, the precise mechanism of how caveolae and Bin1 cause transverse tubules to form and extend has remained unknown. This work provides compelling evidence, correlating antibody labelling with electron microscopy, to support the concept that caveolae rings form underneath the cell membrane which is surrounded by the endo/sarcoplasmic reticulum. These rings contain caveolin-3 and Bin1 and the authors show Bin1 enriched tubes extend from multiple points on these rings. Their data suggest that Bin1 assembles to initially form these scaffolds that then recruit the caveolae to form the ring. In addition, tubules appear continuous with the extracellular environment which is necessary for their function of facilitating calcium release during excitation-contraction coupling. In patients with mutations in caveolin-3 the caveolin ring formation as well as Bin1 tubulation were defective which may play a role in the pathology. The elegant experiments including time-lapse work clearly support the conclusions of the authors.

      The ability of the authors to combine labelling studies with advanced microscopy to show the underlying structures provides very strong evidence for the proposed mechanisms. The authors suggest that the muscle-specific isoforms of BIN1 are key to tubule extension from caveolae rings but it would be interesting for them to discuss how this fits with studies suggesting that constitutive Bin1 isoforms can also form transverse tubules. It would also be interesting to understand the authors' views on whether caveolae rings are involved in the turnover of transverse tubules in adult myotubes as well as the initial formation and, additionally, if the caveolae rings are restricted to the region just under the surface membrane.

      Insight into how transverse tubules are formed sets the groundwork for future therapies. This is clearly important for skeletal muscle myopathies but should also be considered in the heart. Cardiac transverse tubule loss and disorder play an important role in dysfunction in heart failure and atrial fibrillation and as such lessons learned in skeletal muscle may be successfully applied to the heart.

      We would like to thank reviewer #2 for this appreciation of our work. We agree with the points raised and have updated our discussion section to highlight these points.

      Reviewer #3 (Public Review):

      T-tubules are an elaborate series of membrane invaginations that bring membrane voltage-activated Ca2+ channels in close apposition to the sarcoplasmic reticulum containing RyR, allowing for Ca2+-induced Ca2+ release. They serve as critical hubs of excitation-contraction coupling and play a central role in myopathies and inherited and acquired cardiomyopathies. Several membrane structures and proteins have been implicated in striated muscle t-tubule biogenesis, but the specific mechanisms of early t-tubule biogenesis are not defined. Lemerle et al here investigate the biogenesis of transverse tubules in skeletal muscle. They use skeletal myoblasts from murine and human muscle as well as sophisticated high-resolution microscopy, live cell imaging, and adenoviral targeting to forward a model of BIN1 mediated caveolae ring formation which give rise to DHPR enriched t-tubules and associate with SR. While they demonstrate that BIN1 and Cav3 enriched caveolae act together to form t-tubules, the precise pathophysiological mechanisms by which this process acts in disease remain unclear. Strengths of the study consist in the use of both murine and human skeletal muscle experiments, suggesting a conserved molecular mechanism; the innovative approach of correlative light and electron microscopy, and the use of pathological specimens. The live cell timelapse provides imaging evidence of Cav3-enriched caveolae-rings forming in centers of high BIN1 enrichment, from which t-tubules emanate. This is novel evidence in support of the biogenesis model proposed by the authors. The pathological correlation of their model is promising but limited. Specifically, while the study of Cav3 mutant specimens is used to show the Cav3 dependence of BIN1 action (in experiments using BIN1 overload), the authors have not tested the sufficiency of their proposed mechanism by rescuing the pathologic state. Moreover, the conditions of development likely have an important effect on the studied mechanism - such as mechanical loading, contractile state, neurohormonal environment, and so on. Furthermore, a more complete description of the precise molecular binding sites between BIN1 and Cav3 would be important. While exon11 is required for tubulation, BIN1 not expressing exon 11 appears sufficient to assemble caveolar rings, suggesting this is mediated by other specific BIN1 regions.

      Overall, the study provides new details on early t-tubule biogenesis in skeletal muscle (likely shared with other striated muscle) and lays the foundations for further definition of the precise molecular mechanisms.

      We would like to thank reviewer #3 for the appreciation of our work. We have now performed additional experiments on patient myotubes including rescue experiments, analysis of key excitation-contraction coupling proteins by Western blot and quantification of caveolae rings and tubules to strengthen our claims with patient myotubes.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, Mastrototaro et al. perform a series of experiments in transgenic murine models assessing the function of Palladin (PALLD) in the heart. Global PALLD KOs are embryonic lethal, precluding the assessment of the roles of this protein in adulthood. To circumvent this limitation, the authors generated a floxed Palld allele and ablated it with two cardiomyocyte-specific Cres: the constitutively active Myh6-Cre and the tamoxifen-inducible aMHC-MerCreMer. Interestingly, ablation with the constitutive Cre (cKO) did not produce any overt phenotype, but ablation in adulthood (cKOi) resulted in compromised cardiac function. These observations suggest a compensation mechanism that takes place when cardiomyocytes develop in the complete absence of this protein but not when cardiomyocytes develop in a wild-type background and are deprived of this protein after achieving full maturation. These experiments were complemented with yeast two-hybrid techniques to identify novel partners that bind to a region of PALLD for which no interactants had been previously identified. Experiments in human samples revealed an upregulation of PALLD transcripts in the hearts of patients.

      This manuscript adds important information to our understanding of sarcomeric proteins. Data are generally of good quality and well presented in figures. The numbers of animals in echocardiographic studies are also adequate for proper conclusions. Authors achieve most of their goals, including the identification of novel partners of PALLD and the identification of a requirement for PALLD in cardiomyocytes for normal heart function. However, given that all experiments performed in this study were focused on the loss-of-function of PALLD, it is not clear what is the relevance of the PALLD upregulation observed in human patients. Authors should clearly state this limitation in their results.

      Considering that authors have observed evidence for nuclear PALLD, which could hint at potential major gene expression changes when this protein is ablated, it would be interesting to perform an unbiased assessment of transcriptional alterations (RNA-seq) in cardiomyocytes isolated from control and cKOi hearts. In addition, to test if the compensation observed in the embryonic cKO involves mechanisms of transcriptional adaptation, it would be interesting to compare RNA-seq results from cKOi and cKO (genes encoding proteins similar to PALLD that are upregulated in cKO but not cKOi cardiomyocytes would be very strong candidates). However, these transcriptomic data are not essential to support current findings and can be performed in follow-up studies.

      We agree with the reviewer that it would be interesting to perform RNA-Seq on isolated cardiomyocytes from cPKOi mice and we are in fact planning to do this in a follow-up study.

      Reviewer #2 (Public Review):

      The role of the actin-binding protein palladin (PALLD) in cardiomyocyte development, growth, and function has not been defined. In order to address this question, the authors first identified that CARP and FHOD1 interact with PALLD in cardiomyocytes. They then performed cardiomyocyte selective deletion of PALLD in embryonic and adult mice and discovered that deletion of PALLD in adult mice leads to dilated cardiomyopathy (DCM) and intercalated disc ultrastructural changes. In contrast, embryonic deletion of cardiomyocyte PALLD did not cause a cardiomyopathy phenotype in neonatal or adult animals.

      1. The divergent cardiac phenotypes of the embryonic deletion of cardiomyocyte PALLD (no cardiomyopathy) versus the adult deletion of cardiomyocyte PALLD (dilated cardiomyopathy (DCM)) is an interesting result. The authors speculate that embryonic deletion of PALLD induces compensatory pathways that prevent the development of adult cardiomyopathy in these mice. However, these compensatory pathways remain unexplored.

      2. The authors discovered that mice with adult cardiomyocyte deletion of PALLD had significant changes in the cardiomyocyte intercalated disc (ICD) ultrastructure. They suggest these changes in ICD ultrastructure contribute to DCM formation in the adult PALLD deletion mice (line 270). However, it remains unclear if these changes in ICD ultrastructure are specific to mice with adult deletion of PALLD.

      3. The different transgenic Cre mouse lines may be an alternative explanation for the divergent cardiac phenotypes in the embryonic versus adult deletion of cardiomyocyte PALLD. The tamoxifen dose administered for the inducible Myh6:MerCreMer mice was 30mg/kg/day x 5 which has been reported to lead to the induction of cardiomyocyte DNA damage response pathways (Dis Model Mech. 2013 Nov; 6(6): 1459-1469, J Cardiovasc Aging 2022;2:8). The electron micrograph experiments in Figure 5 did not include a group of Myh6:MerCreMer mice administered tamoxifen. The authors only compared PALLD fl/fl and Myh6:MerCreMer/PALLD fl/fl mice.

      In the papers that the Reviewer refers to, it was shown that administration of tamoxifen to Myh6:MerCreMer mice at a dose of 30 mg/kg/day for 3 (Bersell et al., Dis Model Mech. 6, 1459-1469, 2013) or 5 days (Rouhi et al., J Cardiovasc Aging 2, 8, 2022) is not associated with apoptosis. Bersell et al. found that amounts ≥40 mg/kg/day for 3 days are associated with apoptosis, and Rouhi et al. showed that injection of 30 mg/kg/day for 5 days causes transient minor changes in gene expression with no discernible effects on cardiac function, myocardial fibrosis, apoptosis, or induction of double-stranded DNA breaks. The reason that we chose to inject tamoxifen at an amount of 30 mg/kg/day for 5 days was in fact that this amount has been shown not to be associated with severe effects and has been widely used in the literature.

      4. The apoptosis assessment was performed 24 weeks after administration of tamoxifen to the Myh6:MerCreMer/PALLD fl/fl mice. However, cardiomyocyte apoptosis may have occurred much earlier if it was secondary to Myh6:MerCreMer tamoxifen-induced cardiotoxicity (or related to PALLD deletion).

      5. The animal studies in Fig 3D show a DCM phenotype in mice with adult deletion of cardiomyocyte 200kDa PALLD which suggests a potential loss of function mechanism for DCM formation. However, the authors then report in Fig 6 that human DCM heart tissue samples have a ~2.5-fold increase in mRNA expression of the 200kDa PALLD transcript which would suggest a possible gain of function mechanism for DCM formation. How do the authors reconcile these divergent results with regard to palladin's role in cardiomyocyte homeostasis and cardiomyopathy formation?

      In the revised manuscript we demonstrate that the transcriptional changes in PALLD expression are not reflected at the protein level.

      Reviewer #3 (Public Review):

      This study shows for the first time changes in palladin expression under disease conditions and mRNA alterations in human samples. The authors have identified novel binding partners for the protein as a first step toward determining how palladin mediates its effects in the heart. Finally, through the use of mouse models to decrease palladin expression they identify a crucial role for palladin in the cardiac response to pathological stress, with some interesting findings that show the effects of palladin depend on when the protein is altered.

      We appreciate that the Reviewer finds our study interesting. However, we did not show a role of PALLD in the cardiac response to pathological stress. On the contrary, we demonstrated that mice with constitutive knockout of PALLD in the heart (cPKO mice) show no pathological cardiac phenotype either under basal conditions or in response to mechanical pressure overload by transaortic constriction. On the other hand, deletion of PALLD in adult mice resulted in DCM under basal conditions within 8 weeks after tamoxifen induction.

      The novel findings of the study are supported by the data presented, but there are several instances where clarification is needed or where the conclusions drawn from the data reach beyond what is presented in the Results section.

      The focus on only male mice is a significant limitation of the paper, as it is well known that there are profound sex differences in the response to pathological stressors. While the ability to obtain sufficient heart samples from male and female patients may be a reasonable justification for focusing on males, the preclinical mouse model should have been examined in both sexes and the limitation of this choice should be clearly noted in the paper.

      Due to the three Rs and the high costs associated with breeding the large number of mice required for the project, we chose to focus only on male mice.

      In lines 537-539, we stated: “All experiments were performed on male mice as females often develop a less severe cardiac phenotype due to the cardioprotective role of estrogen (Brower, Gardner, & Janicki, 2003; Du, 2004).”

      The changes in myopalladin expression were not measured in the disease model (TAC), which limits the ability to determine if myopalladin was altered in the disease state. This addition would strengthen the study.

      We have previously demonstrated that myopalladin protein levels are significantly reduced after TAC in wildtype mice (Figure 6K, L in Filomena et al., eLife 10:e58313, 2021). We did not measure myopalladin levels in cPKO mice subjected to TAC and unfortunately do not have tissue from cPKO mice to perform the measurements.

      Finally, the myofilament data are presented as evidence that changes in the contractile apparatus are contributors to the observed contractile dysfunction at the organ level. But these studies were conducted using levels of calcium that far exceed what is seen in vivo and, therefore, do not support the conclusion drawn.

      The reviewer is right that the myofibril experiments were conducted at Ca2+ concentrations that cannot be reached under the physiological conditions of cardiac contraction. However, the result clearly demonstrates that the intrinsic force generating capacity of the cardiac sarcomeres of cPKOi mice is impaired 8 weeks after TAM independently from any changes in myofilament Ca2+ sensitivity and cardiomyocyte Ca2+ handling. Experiments at lower (more physiological) Ca2+ concentrations would have produced less clear results in the absence of a full investigation of the relation between force and [Ca2+]. Since data demonstrate that cross bridge mechanics and kinetics are not affected, the reported finding supports the idea that a myofibril structural defect is responsible for the lower maximal force of the KO sarcomeres.

    1. Author Response:

      Reviewer #1 (Public Review):

      This study presents a resource aiming to unify language and rules used in the literature to describe, curate and assess biology experiments, published or not. Focusing on host-pathogen interactions, the work presents a new ontology and controlled vocabulary, as well as rules to describe 'metagenotypes', a term coined for the joint description of interacting host-pathogen genotypes. 'PHI-Canto' extends a previous resource by also enabling using UniProtKB IDs to curate proteins. Among other important by-products, PHI-Canto could contribute to damping proliferating names and acronyms for genes, processes, and interactions; a chronic annoyance in the biosciences.

      The tool does give the impression that, with sufficient time and usage, it could become a rich and robust resource. Just addressing the Uniprot IDs issue is a nice move.

      We thank the reviewer for their positive comments and acknowledgement of the importance of using unified language in literature curation. We are pleased to see that our effort to improve interoperability and use existing resources has been recognized. We are also pleased that this reviewer recognizes the additional benefits of choosing to use UniProtKB accession numbers. 

      Reviewer #2 (Public Review):

      In this paper, the authors propose a system for annotating and curating scientific publications in the context of interspecies host-pathogen interactions. This system, called PHI-Canto (the Pathogen-Host Interaction Community Annotation Tool), is an extension of an existing tool (called Canto). In addition, they present the development of new concepts, controlled vocabularies, and an ontology for annotating relevant aspects in this domain, called PHIPO (Pathogen-Host Interaction Phenotype Ontology).

      The approach has been empirically validated by annotating ten publications. The application's source code is available, as well as the associated ontologies and vocabularies and an example of the data resulting from the annotation process.

      We thank the reviewer for their positive comments on our framework for curating interspecies interactions literature. We are pleased that the reviewer has recognized that the source code, associated ontologies and curated data are freely available for others to use. We are delighted that the reviewer found the curation of ten trial publications in PHI-Canto informative and benefited from the worked curation examples.

      Reviewer #3 (Public Review):

      In this work, the authors have built a framework for the annotation of interactions between species. The framework includes ontologies, methodologies, and an annotation tool called PHI-Canto. The framework makes use of multiple existing ontologies that are in wide use in the biocuration community. In addition, the authors have built their own project-specific controlled vocabularies and ontologies for the capture of pathogen-host interaction phenotypes (PHIPO), diseases (PHIDO), and environmental conditions (PHI-ECO). Their work builds on and extends methods that have been developed within the Gene Ontology Consortium and model organism databases. The tool PHI-Canto is an extension of the tool Canto developed by PomBase for curation. The authors used this framework to annotate pathogen-host interactions within the Pathogen-Host Interactions Database.

      Strengths: The manuscript is well-written and includes significant detail regarding curation policies/methods and the use of the actual PHI-Canto tool. The appendices are very detailed and provide useful illustrations of the annotation practices and tool interface. The work has built upon and extended well-established standards and methods that have proven their utility over many years of use in the biocuration community. The authors have rigorously tested their framework with the curation of a variety of publications providing a diverse assortment of annotation challenges. The concept of a "metagenotype" is important and providing such a structured system for the capture of this information is useful. All of the materials produced by the work are completely freely available for use by the wider community.

      Weaknesses: There are some areas of the manuscript and appendices which are a bit confusing and could be improved. The authors have developed their own set of disease terms (PHIDO) but do not comment on why existing disease terminologies (such as Mondo or DO) were not used or if the PHIDO terms relate to those other vocabularies. There is no discussion of the possible use of a graph representation for the capture of this complex information (which is being done in many settings including the Gene Ontology with GO Causal Activity Models (GO-CAMs)) or why such a structure was not used. Although the abstract talks about the use of the framework within the PHI database as a test case for broader use regarding interspecies interactions, there is no mention of extending the use of the tool to other species interaction communities beyond pathogen-host interactions.

      We thank the reviewer for their detailed response. We are pleased that the reviewer found the manuscript to be well-written and informative with useful examples. We thank the reviewer for their helpful suggestions to improve the appendices and manuscript text.

      We would like to clarify that PHIDO is not intended to compete with existing disease ontologies: it is instead being used as a placeholder, until the time when its terms can be replaced with terms from existing disease ontologies. PHIDO was an expedient solution, in the sense that it provided the fastest way for us to test the process of curating diseases with PHI-Canto. This is because we only had to convert the existing list of disease names already in PHI-base into a controlled vocabulary, thus removing the need to wait for maintainers of other ontologies to add terms for us (as reported in Urban et al., 2022).

      Additionally, we were required to use terms from PHIDO due to the lack of representation for plant and animal diseases in existing ontologies or vocabularies. Plant disease, in particular, is very underrepresented, with the ontologies we surveyed having either inappropriate semantics (e.g. the Plant Trait Ontology focusing on traits related to disease, rather than the diseases themselves) or still being in development (e.g. the Plant Stress Ontology). The majority of source ontologies used by MONDO are human-centric, and DO is exclusively for human disease, yet human disease represents only part of the focus of PHI-base (~35%). Furthermore, our choice of vocabularies is limited by the fact that Canto currently only supports ontologies in OBO format (for historical reasons).

      We have begun the process of harmonizing disease names in PHI-base with terms from existing disease ontologies – such as MONDO, DO, and the National Cancer Institute Thesaurus – with the ultimate aim of using terms from those ontologies in curation, instead of terms from PHIDO. As general vocabularies for animal and plant disease emerge or are identified, we will extend this procedure to those diseases.

      With regard to a graph representation of the data, we are aware of the examples the reviewer described, and we agree that this type of representation could be preferable. However, our data model is currently constrained by the developers of Canto, who use a relational data model and currently have no plans to implement a graph data model or a graph representation. We acknowledge that query languages like GraphQL can provide a graph-based interface to an existing relational data model, but we believe this would require a significant technological investment. For PHI-base, we plan to enable a graph representation of the data by integrating with existing knowledge graph tools, such as KnetMiner (www.knetminer.com; doi.org/10.1111/pbi.13583), which will provide graph-based queries on PHI-base (albeit only on select species for which knowledge graphs will be provided, i.e. Arabidopsis, rice, wheat, eight plant and human infecting fungal ascomycete pathogens, and two non-pathogenic yeast species). We will also use KnetMiner integration to embed subgraphs of the complete knowledge graph into the gene-centric pages on the PHI-base 5 website.

      We acknowledge the lack of discussion about extending the tool for broader interspecies interactions. These examples may have been omitted from a previous draft due to journal word count limits. Possible future uses of the PHI-Canto schema could include insect–plant interactions (both beneficial and detrimental), endosymbiotic relationships such as mycorrhiza–plant rhizosphere interactions, nodulating bacteria–plant rhizosphere interactions, fungi–fungi interactions, plant–plant interactions or bacteria–insect interactions, and non-pathogenic relationships in natural environments, such as bulk soil, rhizosphere, phyllosphere, air, freshwater, estuarine water or seawater, and tissues or organs (e.g. the gut, lungs, and skin of humans, birds, or other animals). The schema could also be extended to situations where phenotype relations to genes or genotypes have been established for predator–prey relationships, or where there is competition in herbivore–herbivore, predator–predator, or prey–prey relationships in the air, on land or in the water. Customizing Canto to use other ontologies and controlled vocabularies is as simple as editing a configuration file within the source code.

    1. Author Response:

      We appreciate the Reviewers’ feedback. The manuscript was extensively revised and ultimately accepted for publication (Petrican and Fornito, 2023, Developmental Cognitive Neuroscience). The revisions address the Reviewers’ key concerns, including the theoretical basis of the link between MDD and AD, the rationale for studying this link in adolescence, clear references to significant genetic associations between the two, detailed assessment of CCA and PLS model generalisability and reliability, quantification of resilience, residualization of confounders, and corrections for multiple comparisons. We also note that the details concerning the receptor density maps we use in our analysis have now been published (Hansen et al., 2022, Nature Neuroscience; Markello et al., 2022, Nature Methods).

    1. Author Response

      Reviewer #1 (Public Review):

      By performing immunopeptidomics of macrophages infected with virulent M. tuberculosis, the authors were able to appropriately address whether Mtb proteins are able to enter the MHC-I antigen processing pathway. Their interrogation provides convincing evidence that substrates of Mtb's type VII secretion systems (T7SS) are a significant contributor to the Mtb-derived peptides presented on MHC-I. Compelling data are provided to demonstrate that ESX-1 activity is required for the MHC-I presentation of these newly identified peptides.

      Strength

      Employing a virulent strain of Mtb for infection of human monocyte-derived macrophages to identify Mtb proteins that access the MHC-I antigen processing pathways and the associated mechanisms.

      Weakness

      The immunogenicity of at least some of the identified peptides should have been evaluated.

      Although obtaining T cells from a cohort of TB-exposed patients was not within the scope of this study, we are also eager to assess the immunogenicity of the epitopes we identified in future work. In addition to the references we made in our initial submission to prior work showing that many of the proteins from which the epitopes we identified derive elicit T cell responses in Mtb-exposed humans, we’ve added references to prior studies that show that a few of the specific epitopes we identified are immunogenic, providing at least a preliminary indication that MHC-I peptides identified by MS can be immunogenic T cell epitopes (lines 420-423): “Individual peptides we identified by MS have also been previously shown to be recognized by human T cells, including EsxJ24-34 (Grotzke et al., 2010; Lewinsohn et al., 2013) and EsxA28-36 (Tully et al., 2005), providing a proof of concept that particular epitopes identified by MS can be immunogenic.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have performed scATACseq on multiple timepoints during mouse male gonadogenesis and germ cell maturation during the fetal to neonatal transition (E18.5 and postnatal days 1,2,5). Clustering of thousands of cells revealed striking cellular diversity and led to the identification of cell populations that were not known before. This work may have far reaching implications, but additional validation is needed.

      We would like to start by expressing our appreciation for the reviewer’s valuable comments and feedback on our manuscript. We would also like to express our sincere apologies for the delay in submitting our revised manuscript. The COVID-19 pandemic has had a significant impact on academic research and publication, and we encountered several challenges during this time. Both co-first authors of this manuscript were promoted to new roles, which required additional time and effort to transition into these new positions. Furthermore, we experienced significant delays in obtaining the necessary research materials due to longer shipment times for antibodies and other reagents during the pandemic, which further contributed to the delay. We understand that our delay may have caused inconvenience, but we want to assure you that we have carefully addressed all of the reviewer comments, and we deeply appreciate your understanding and patience during these challenging times.

      The identification of a novel transitional spermatogonia population in Figure 4D is intriguing. Independent validation by flow cytometry or in testis cross sections, to better allow the colocalization of Nr5a1 and Oct4 and other germ cell markers, would be important. Additional validation is needed to ensure that populations 1 and 2 in Figure 4D are not due to doublets. Providing violin plots for both soma and germ cell markers will be helpful. Is SF1 the only gene expressed in this unique germ cell population, or are many other somatic markers expressed in the population? Do these cells express well-recognized SPG markers like Oct4, PLZF, GFRA?

      We have performed immunostaining of NR5A1 in testicular sections and showed that NR5A1+ germ cells (TRA98+ cells) exist in P5.5 testis (Figure 4D). We appreciate the reviewer's comment and understand the concern regarding potential doublets in figure 4d. We examined the expression of various markers in both scATAC-seq (gene score) and scRNA-seq (mRNA) datasets and provided violin plots. Sertoli cell markers and germ cell markers showed variable levels in unknown 1 and 2 populations while the Leydig cell marker did not (Supplementary figure S6D).

      As additional evidence supporting our finding that a subset of somatic markers is expressed in the unique germ cell population we identified, we reference a study where cells in the spermatogonial signature 3 cluster showed high levels of mRNAs characteristic of Sertoli cells, including Nr5a1, Sox9, and Wt1 (PMID: 25568304). This indicates that cells with germ cell identity can express somatic cell genes, which is consistent with our findings. Additionally, another study reported the expression of the somatic cell marker WT1 in some germ cells through immunostaining (Figure 3B, PMID: 34815802). We have included this information in the revised manuscript to further support our conclusion (line 301). In addition, as we have isolated nuclei rather than whole cells, it is less likely that germ cells and Sertoli cells are sticking together during single cell capture. We hope that the additional evidence and analysis provided will help to ease the reviewer's concerns and further support the conclusions drawn from our data.

      The IF validation in 5F is not as convincing that these cells are potentially Sertoli stem cells. IF in cross-sections will be easier to interpret, especially when co-stained with several germ, somatic, or novel markers of that population. Purification of these cells and further characterization is needed. A hallmark of fetal Sertoli cells is to mediate the migration of endothelial cells to the seminiferous tubules during testicular cord formation. Is it possible to purify these cells to determine whether they have functional Sertoli cell properties in vitro using human umbilical vein endothelial cells (HUVECs)? Do these cells have immune privilege properties, i.e. can they suppress proliferation of Jurkat E6 cells?

      Following the reviewer’s suggestions, we conducted further immunostaining of MBD3 and AMH in Sertoli cells (Figure 5F). The observed staining results not only confirm the properties of MBD3+ cells (MBD3-high/AMH-high) but also highlight the heterogeneity of Sertoli cells, as evidenced by the presence of various expression patterns such as MBD3-low/AMH-high (cluster SC3 in Figure 5A) and MBD3-low/AMH-low (cluster SC2/4/5/6 in Figure 5A). This further emphasizes the complexity and diversity within the Sertoli cell population.

      However, we understand that it is premature to definitively conclude that MBD3-high cells are Sertoli stem cells without functional studies. We appreciate the suggestion of using additional functional assays such as in vitro co-culture with HUVECs and immune privilege assays to further characterize the potential Sertoli stem cell population. These are valuable experiments to consider for future research in order to gain a deeper understanding of the properties and functions of these cells. To more accurately reflect the scope of our study and avoid potential misinterpretation, we have revised the language to reflect that we have identified subpopulations of Sertoli cells with unique characteristics, rather than using the term "stem cell". We hope that our revised data adequately addresses the reviewer’s concerns.

      Reviewer #2 (Public Review):

      Liao et al. performed single cell ATAC sequencing to reveal chromatin status in various cell types in the perinatal mouse testes. The chromatin status was then used to define cell types and identify potential transcription factors that control the progress of differentiation. This work could provide new insights into how various cell types acquire their fate in early testis development and establish a genomic framework that can be used to correlate with human data for infertility. The strength lies in the novelty of single cell analyses. The weaknesses include a lack of statistical power, the uncertainty on the correlation between chromatin status, gene expression, and transcription factor activity, and insufficient information and confirmation on some of the experiments and results.

      We would like to start by expressing our appreciation for the reviewer’s valuable comments and feedback on our manuscript. We would also like to express our sincere apologies for the delay in submitting our revised manuscript. The COVID-19 pandemic has had a significant impact on academic research and publication, and we encountered several challenges during this time. Both co-first authors of this manuscript were promoted to new roles, which required additional time and effort to transition into these new positions. Furthermore, we experienced significant delays in obtaining the necessary research materials due to longer shipment times for antibodies and other reagents during the pandemic, which further contributed to the delay. We understand that our delay may have caused inconvenience, but we want to assure you that we have carefully addressed all of the reviewer comments, and we deeply appreciate your understanding and patience during these challenging times.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Lujan and colleagues describes a series of cellular phenotypes associated with the depletion of TANGO2, a poorly characterized gene product but relevant to neurological and muscular disorders. The authors report that TANGO2 associates with membrane-bound organelles, mainly mitochondria, impacting lipid metabolism and the accumulation of reactive oxygen species. Based on these observations, the authors speculate that TANGO2 functions in acyl-CoA metabolism.

      The observations are generally convincing and most of the conclusions appear logical. While the function of TANGO2 remains unclear, the finding that it interferes with lipid metabolism is novel and important. This observation was not developed to a great extent and based on the data presented, the link between TANGO2 and acyl-CoA, as proposed by the authors, appears rather speculative.

      We thank you for your advice and now include additional data that lends support to the role of TANGO2 in lipid metabolism. We have changed the title accordingly.

      1) The data with overexpressed TANGO2 looks convincing but I wonder if the authors analyzed the localization of endogenous TANGO2 by immunofluorescence using the antibody described in Figure S2. The idea that TANGO2 localizes to membrane contact sites between mitochondria and the ER and LDs would also be strengthened by experiments including multiple organelle markers.

      We agree that most of the data on TANGO2 localization are based on the overexpression of the protein. As suggested by the reviewer and despite the lack of commercial antibodies for immunofluorescence-based evaluation, see the following chart, we tested the commercial antibody described in Figure 2 on HepG2 and U2OS cells. Moreover, we used Förster resonance energy transfer (FRET) technology to analyze the proximity of TANGO2 and Tom20, a specific outer mitochondrial membrane protein. In addition, we visualized cells expressing tagged TANGO2 and tagged VAP-B, an integral ER protein in the mitochondria-associated membranes (doi:10.1093/hmg/ddr559) or tagged TANGO2 and tagged GPAT4-Hairpin, an integral LD protein (doi:10.1016/j.devcel.2013.01.013). These data strengthen our proposal and are presented in the revised manuscript.

      As suggested by the reviewer, we have also visualized two additional cell lines (HepG2 and U2OS) with the anti-TANGO2 antibody (from Novus Biologicals) that has been used for western blot (see chart above). As shown in the following figure, the commercial antibody shows a lot of staining in addition to mitochondria, especially in U2OS cells, where it also appears to label the nucleus.

      2) The changes in LD size in TANGO2-depleted cells are very interesting and consistent with the role of TANGO2 in lipid metabolism. From the lipidomics analysis, it seems that the relative levels of the main neutral lipids in TANGO2-depleted cells remain unaltered (TAG) or even decrease (CE). Therefore, it would be interesting to explore the increase in LD size further, for example by analyzing and displaying the absolute levels of neutral lipids in the various conditions.

      We agree with the reviewer and now present the absolute levels of lipids of interest in the various conditions of the lipidomics analyses (Figure S3).

      3) Most of the lipidomics changes in TANGO2-depleted cells are observed in lipid species present in very low amounts, while the relative abundance of major phospholipids (PC, PE, PI) remains mostly unchanged. It would be good to also display the absolute levels of the various lipids analyzed. This is an important point to clarify, as it would be unlikely that these major phospholipids are unaffected by an overall defect in acyl-CoA metabolism, as proposed by the authors.

      As stated above, we have now included the absolute levels of lipids of interest in the various conditions of the lipidomics analyses (Figure S3).

    1. Author Response

      Reviewer #1 (Public Review):

      This is a well-performed and carefully executed and quantified study. There is however a point that needs clarification:

      We thank the reviewer for these motivating comments and appreciate the careful reflection of our work.

      The authors state that acute regeneration occurs between 5-10dpt. However, the graphs in Fig 1D, F, and 2F indicate that most PC generation occurs from 20-30 days. What happens in this period? Does proliferation increase? Can the authors perform BrdU incorporation between 6 days and 1 month?

      The reviewer is right that PC regeneration seems to be more intense from 20-30 days. Yet during this stage, wildtype larvae also add a number of PCs to their PC population. We therefore consider only PCs added in surplus to the number of regularly added PCs as a contribution to regeneration, and by this measure the quantified samples show the largest increase in regenerating PCs during 8-10 days post-treatment, with 20.9 and 23.2 additional (surplus) PCs on average, respectively.
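
      The surplus measure described above can be illustrated with a minimal sketch. The function name and all counts below are hypothetical and chosen only to show the arithmetic; they are not data from the study, and the sketch assumes that the surplus is the difference between the mean numbers of PCs added in ablated and in control larvae over the same interval.

```python
from statistics import mean


def surplus_pcs(ablated_added, control_added):
    """Mean PCs added in ablated larvae beyond the mean added in controls.

    Both arguments are per-larva counts of newly added PCs over the same
    time window (hypothetical inputs, for illustration only).
    """
    return mean(ablated_added) - mean(control_added)


# Hypothetical per-larva counts, not measurements from the manuscript.
ablated = [30, 34]
control = [10, 12]
print(surplus_pcs(ablated, control))
```

      Only a positive difference would count toward regeneration under this definition, since PCs added at the normal developmental rate are subtracted out.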

      This question also relates to the first comment of reviewer 3, who asked for a combined BrdU and EdU labeling approach to address the cell cycle length of PC progenitors. We have therefore performed this experiment with the first pulse of BrdU labeling at 18 days after PC ablation, to include the request stated here for BrdU labeling at later stages of regeneration. Again, no significant difference between BrdU-positive PC progenitors was found at this later stage of PC regeneration, but a small number of PC progenitors underwent additional rounds of proliferation compared to controls, which provides an explanation of how the entire PC population is replenished and why complete PC regeneration requires several months. Please see also our answer to question 1 of reviewer 3. These new findings are now presented in an additional Supplementary Figure (Figure 1-figure supplement 3) and have been added to the last paragraph of the section reporting the findings presented in Figure 1.

      Related to this, as the authors indicate in lines 129-131, the regeneration of new PCs overlaps with normal development. Are other neuronal cell types generated in appropriate numbers?

      This is an interesting question raised by the reviewer. However, it is very general, relating to all cerebellar neuronal cell types, which is beyond what we are able to address. We considered eurydendroid cells the most likely cell population to be affected in number by PC ablation and regeneration, because eurydendroid cells share the same ptf1a+-expressing progenitor cells with Purkinje cells. Eurydendroid cells – the zebrafish equivalents to deep nuclei neurons in mammals – can be identified by their expression of olig2. We have therefore quantified the number of eurydendroid cells in the cerebellum of double transgenic PC-ATTAC/olig2:GFP larvae 15 days after PC ablation. No significant difference in olig2:GFP-positive cells could be observed between PC-regenerating and control zebrafish, suggesting that eurydendroid cells are not affected in their quantity and are generated in appropriate numbers in PC-regenerating larvae. These findings are presented in a new Supplementary Figure (Figure 3-figure supplement 3) and are described together with findings about eurydendroid cells presented in the main Figure 3.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Gonzalez et al investigated the dynamics of dopamine signals, measured with optophysiological methods in the lateral shell of the nucleus accumbens (LNAc), in response to different types of visual stimuli. Contrary to most current theories of dopamine signaling, the authors found that LNAc dopamine transients tracked sensory transitions in visual stimulation rather than any immediately apparent motivational variable. This unorthodox finding is of potential interest to the field, as it suggests that dopamine in this particular area of the striatum supports a very different, albeit unclear behavioral function than what has been previously attributed to this neuromodulator. Many of the approaches used by the authors were very elegant, like the careful selection of visual stimuli parameters and the use of Gnat1/2 KO mice to demonstrate that the dopamine responses were directly dependent on the visual stimulation of rods and cones. That said, the authors did not discuss how their findings relate to much previously published work, many of which offer potential alternative explanations for their results. It is also not clear from the manuscript text which mice were used for which experiments, and how testing history might affect the results.

      We would like to thank the reviewer for their careful review of our manuscript. In our revised manuscript, we reworked our Materials and Methods to better detail the experimental workflow, which is highlighted in yellow. We have also added new data in stimulus-naïve animals to better examine the effect of exposure history on the dopaminergic response to light. To provide validation of our recording sites, we have included a new figure (Figure 1-Figure Supplement 1) that contains a representative histological image showing the location of the optical fiber/virus expression, as well as a schematic demonstrating optical fiber placements. Finally, the reviewer’s point about discussing the current results in the context of previous literature is well taken, and we have added three new paragraphs of text in the Discussion to highlight these findings.

      Reviewer #2 (Public Review):

      In this elegant work, the authors investigated dopamine release (measured by dLight sensor fiber photometry) in the nucleus accumbens shell, in response to salient luminance change. They show that abrupt visual stimuli - including stimuli not detectable by the human eye - can evoke robust dopamine release in the accumbens shell.

      The fact that dopamine signals can be evoked by salient sensory stimuli is not itself novel, but the paper manages to make several important and new findings:

      1) The authors show that the dopamine signal is not related to the level of threat evoked by the visual stimuli.

      2) They provide important detail about the stimuli parameters relevant to dopamine release. For instance, they show that the rate of luminance change (or abruptness) is a key factor in evoking dopamine responses.

      3) They show that robust dopamine responses can be evoked by visual stimuli of low intensity, including stimuli not perceptible by the human eye.

      4) They show that these dopamine responses can be evoked by all wavelengths in the visible spectrum (with some higher sensitivity at certain wavelengths).

      5) Finally, by recording dopamine responses in two knockout mice strains, the authors show that the light-evoked dopamine release critically relies on rod and cone photoreceptors, but not melanopsin phototransduction.

      These results add to a series of recent findings showing that dopamine signals are not restricted to the encoding of reward prediction error, but instead contribute to signaling environmental changes more broadly. The study has been skillfully executed, the results are clear and appropriately analyzed, and the manuscript is very well written. Although the work did not include control mice lacking the dLight sensor, the fact that light-evoked dopamine responses were not observed in mice lacking cone + rod phototransduction is strong evidence that the fiber photometry signals were not due to direct light artifacts.

      We would like to thank the reviewer for taking their valuable time over the holidays to review our manuscript. We appreciate their feedback and have responded to their concerns below.

      Comment/concerns are minor:

      1) The authors show that the dopamine response evoked by a brief visual stimulus is drastically reduced when the visual stimulus is repeated in rapid succession (stimulus train). The authors interpret this as evidence for the HABITUATION of this light-evoked dopamine release. An alternative explanation is that it is the prediction of the stimulus that is responsible for canceling the dopamine response (i.e. sensory prediction error). The authors should discuss this alternative explanation for this finding.

      This is a valid point, which we have now addressed in the revised Discussion section (Paragraph 3).

      2) Although the study largely focuses on dopamine responses to visual stimuli, the results are largely consistent with previous studies showing dopamine signals encoding value-neutral changes in sensory inputs (i.e. sensory prediction errors) in different modalities (taste or odors; cf. Takahashi et al., 2017, Neuron; Howard & Kahnt, 2018, Nat. Comm.). The authors might want to cite those papers (note that I am not affiliated with those papers).

      This is similar to the point brought up by Reviewer 1, namely that several key pieces of literature were not discussed in the original manuscript. We agree that this was an oversight and hope we have remedied it in the revised Discussion, as detailed in the response to Reviewer 1. We have included both citations in the new text.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes efforts to understand how independence from ribonucleotide reduction might evolve in obligate intracellular bacterial pathogens using E. coli as a model for this process. The authors successfully deleted the three ribonucleotide reductase (RNR) operons present in E. coli and showed that growth of this knockout strain can be achieved with deoxyribonucleotide supplementation. They also performed evolutionary experiments and analysis of cell growth and morphology under conditions of low nucleotide availability. In this work, they established that certain genes are consistently mutated to compensate for the loss of RNR activity and the low availability of deoxynucleotides. Comparison to genomes of intracellular pathogens that lack RNR genes shows that these patterns are largely conserved.

      While the experimental results support the conclusions of the study, the authors do report changes in cell morphology upon the growth of the RNR knockout strains with low concentrations of nucleotides. It would be ideal to note this complication earlier in the manuscript and to clarify how the possibility of cell elongation might affect the OD measurements in Figure 3 describing the experiments to establish that dC is necessary for growth in the knockout strain. It would also be ideal to provide a more detailed explanation for that observation in the discussion.

      Thank you for the feedback. We have now added mention of cell morphology in the final paragraph of the introduction, where we summarise key findings.

      For establishing if there is either growth or no growth under various conditions, as we have done, a qualitative assessment such as the one presented in Figure 3 is sufficient. The issue of whether OD is impacted by cell elongation has been documented by Stevenson et al. (https://www.nature.com/articles/srep38828), and becomes a problem if trying to quantify parameters such as doubling time or when trying to estimate cell counts. We do not do either of these, as calculation of both requires an assumption of normal cell morphology in E. coli. We have added a note to clarify this in the first paragraph of the Discussion section, as per the suggestion from Reviewer #1.

      Reviewer #2 (Public Review):

      Ribonucleotide reductase (RNR) is crucial for de novo synthesis of the dNTP building blocks needed for DNA synthesis and is essential in nearly all organisms. In the current study, all three E. coli RNRs have been removed and the essential function of the enzyme is bypassed by the introduction of an exogenous deoxyribonucleoside kinase that enables dNTP production via salvage synthesis. This leads to a complete dependency on exogenously supplied deoxyribonucleosides (dNs), loss of control of dNTP regulation, and a highly increased mutation rate. The bacteria could also grow with only supplied deoxycytidine (and no other dNs), indicating that all dNTPs could be synthesized from deoxycytidine. An evolutionary analysis of the recombinant E. coli strain grown in multiple generations showed that mutations accumulated in genes involved in the catabolism of deoxycytidine and deoxyribose-1-P, supporting a model that all the other deoxyribonucleosides can be produced by a phosphorylase using nucleobases and deoxyribose-1-P as substrates and that the deoxycytidine (besides being a precursor of dCTP) could be a substrate to produce the deoxyribose-1-P needed by the phosphorylase working in the opposite direction.

      The story is very interesting with novel findings, and the experiments are well performed. There are a few missing pieces of information, but on the other hand, there are many steps to cover if everything is to be shown in a single paper, and I came to the conclusion that the data are sufficient at this stage. One of the missing points for future research is to check what happens with the dNTP pools. RNR is a very important enzyme for controlling dNTP levels, and it is likely that unbalanced dNTP pools lead to the increased mutation rates. However, it would be interesting to actually measure the dNTP pools and connect them to the mutations reported. Another missing piece is to identify which nucleoside phosphorylase is involved and investigate its substrate specificity to better understand why the cells can live on deoxycytidine but not other dNs.

      We thank the reviewer for these comments. It is certainly possible that the mutational biases we observe across the genomes of our evolved lines are related to skewed pools. We hope to examine this in a follow-up study. Likewise, it will be interesting to investigate the biochemical basis for our lines being able to grow solely on deoxycytidine, and to ascertain how this might also impact mutation.

      Reviewer #3 (Public Review):

      The study addresses a compelling question about a largely indispensable mechanism, ribonucleotide reduction. The authors generate a unique bacterial strain in which the ribonucleotide reductase operon is entirely deleted. They grow the mutant strain in environments with various levels of the necessary deoxyribonucleosides; further, they perform evolution experiments to see whether and how the evolved lines are able to adapt to limited deoxyribonucleosides. Finally, the researchers identify key mutations and generate key isogenic genetic constructs where target mutants are deleted. A summary postulation based on the evolutionary trajectory of ribonucleotide reduction by bacteria is presented. Overall, the study is well presented, well-justified, and builds on fairly classic genetic and evolution experiments. The selected question and hypotheses and the overall framing of the story are fairly novel for the respective communities. The results should be interesting to evolutionary biology researchers, especially those interested in RNA>DNA directional evolution, as well as molecular microbiologists interested in the ribonucleotide reception dependence and selection by the environment. A discussion on the limitations of the laboratory study for the broader understanding of the host dependence during endosymbiosis and parasitism would be a good addition given the emphasis on this phenomenon as a part of the broader impacts of the study.

      We thank the reviewer for the suggestion that we consider the broader implications of our work. We have now added a final paragraph which addresses the question of why loss of ribonucleotide reduction appears so rare.

    1. Author Response:

      What is novel here is that we calculated the time-varying retinal motion patterns generated during the gait cycle using a 3D reconstruction of the terrain. This allows calculation of the actual statistics of retinal motion experienced by walkers over a broad range of normal experience. We certainly do not mean to claim that stabilizing gaze is novel, and agree that the general patterns follow directly from the geometry as worked out very elegantly by Koenderink and others.  We spend time describing the terrain-linked gaze behavior because it is essential for understanding the paper. We do not claim that the basic saccade/stabilize/saccade behavior is novel and now make this clearer.

      The other novel aspect is that the motion patterns vary with gaze location which in turn varies with terrain in a way that depends on behavioral goals. So while some aspects of the general patterns are not unexpected, the quantitative values depend on the statistics of the behavior.  The actual statistics require these in situ measurements, and this has not previously been done, as stated in the abstract.

      The measured statistics provide a well-defined set of hypotheses about the pattern of direction and speed tuning across the visual field in humans. Points of comparison in the existing literature are hard to find because the stimuli have not been closely matched to actual retinal flow patterns, and the statistics will vary with the species in question. However, recent advances allow for neurophysiological measurements and eye tracking during experiments with head-fixed running, head-free, and freely moving animals. These emerging paradigms will allow the study of retinal optic flow processing in contexts that do not require simulated locomotion. While the exact relation between the retinal motion statistics we have measured and the response properties of motion-sensitive cells remains unresolved, the emerging tools in neurophysiology and computation make similar approaches with different species more feasible.

      A more detailed description of the methods including the photogrammetry and the reference frames for the measurements has been added primarily to the Methods section.

      Reviewer #1 (Public Review):

      Much experimental work on understanding how the visual system processes optic flow during navigation has involved the use of artificial visual stimuli that do not recapitulate the complexity of optic flow patterns generated by actual walking through a natural environment. The paper by Muller and colleagues aims to carefully document "retinal" optic flow patterns generated by human participants walking a straight path in real terrains that differ in "smoothness". By doing so, they gain unique insights into an aspect of natural behavior that should move the field forward and allow for the development of new, more principled, computational models that may better explain the visual processing taking place during walking in humans.

      Strengths:

      Appropriate, state-of-the-art technology was used to obtain a simultaneous assessment of eye movements, head movements, and gait, together with an analysis of the scene, so as to estimate retinal motion maps across the central 90 deg of the visual field. This allowed the team to show that walkers stabilize gaze, causing low velocities to be concentrated around the fovea and faster velocities at the visual periphery (albeit more the periphery of the camera used than the actual visual field). The study concluded that the pattern of optic flow observed around the visual field was most likely related to the translation of the eye and body in space, and the rotations and counter-rotations this entailed to maintain stability. The authors were able to specify what aspects of the retinal motion flow pattern were impacted by terrain roughness, and why (concentration of gaze closer to the body, to control foot placement), and to differentiate this from the impact of lateral eye movements. They were also able to identify generalizable aspects of the pattern of retinal flow across terrains by subsampling identical behaviors in different conditions.

      Weaknesses:

      While the study has much to commend, it could benefit from additional methodological information about the computations performed to generate the data shown. In addition, an estimation of inter-individual variability, and the role of sex, age, and optical correction would increase our understanding of factors that could impact these results, thus providing a clearer estimate of how generalizable they are outside the confines of the present experiments.

      Properties of gait depend on the passive dynamics of the body and on factors such as leg length and subject-specific cost functions, which are influenced by image quality and therefore by optical correction. In this experiment all subjects had normal acuity or were corrected to normal (with no information regarding their uncorrected vision). This is now noted in the Methods. The goal of the present work was to calculate average statistics over a range of observers and conditions in order to constrain the experience-dependent properties one might see in neurophysiology. We have added between-subjects error bars to Figure 2 and added gaze angle distributions as a function of terrain for individual observers in the Supplementary materials. Figure 4 b and d now show standard errors across subjects. Individual subject plots are shown in the Supplementary materials. For Figure 2, most variability between subjects occurs in the Flat and Bark terrains, where one might expect individual choices of energetic costs versus speed, stability, etc. to come into play. This is supported by our subsequent unpublished work on factors influencing foothold choice. We have also found that leg length determines path choices and thus will influence the retinal motion. Differences between observers are now noted in the text. These individual subject differences should indicate the range of variability that might be expected in the underlying neural properties and perhaps in behavioral sensitivity. Because of the size of our dataset (n=11), it is not feasible to make comparisons of sex or age. There were equal numbers of males and females, and age ranged from 24 to 54. This is now noted in the Methods section.

      Reviewer #2 (Public Review):

      The goal of this study was to provide in situ measurements of how combined eye and body movements interact with real 3D environments to shape the statistics of retinal motion signals. To achieve this, they had human walkers navigate different natural terrains while they measured information about eyes, body, and the 3D environment. They found average flow fields that resemble the Gibsonian view of optic flow, an asymmetry between upper and lower visual fields, low velocities at the fovea, a compression of directions near the horizontal meridian, and a preponderance of vertical directions modulated by lateral gaze positions.

      Strengths of the work include the methodological rigor with which the measurements were obtained. The 3D capture and motion capture systems, which have been tested and published before, are state-of-the-art. In addition, the authors used computer vision to reconstruct the 3D terrain structure from the recorded video.

      Together this setup makes for an exciting rig that should enable state-of-the-art measurements of eye and body movements during locomotion. The results are presented clearly and convincingly and reveal a number of interesting statistical properties (summarized above) that are a direct result of human walking behavior.

      A weakness of the article concerns tying the behavioral results and statistical descriptions to insights about neural organization. Although the authors relate their findings about the statistics of retinal motion to previous literature, the implications of their findings for neural organization remain somewhat speculative and inconclusive. An efficient coding theory of visual motion would indeed suggest that some of the statistics of retinal motion patterns should be reflected in the tuning of neural populations in the visual cortex, but as it stands, the present findings could not be convincingly tied to known findings about the neural code of vision. Thus, the behavioral results remain strong, but the link to neural organization principles appears somewhat weak.

      We agree, but we think that strengthening the neural links requires future studies. As mentioned above, it is very difficult to relate the measured statistics to existing neurophysiological literature, and we have tried to make this clearer in the Discussion (p14, 15, 16). This is because the stimuli chosen are typically arbitrary and not chosen to be realistic examples of patterns consistent with natural motion across a ground plane. Other stimuli are simply inconsistent with self-motion together with gaze stabilization (e.g., not zero velocity at the fovea). It has also been technically difficult to map cell properties across the visual field. We have made the comparisons we thought were useful. The point of the paper is to provide a hypothesis about the pattern of direction and speed tuning across the visual field. So the challenge for neurophysiology is to show how the observed cell properties vary across the visual field. Note also that the motion patterns will be influenced by the body motion of the animal in question, and because of this we are now collaborating with a group who are attempting to record from monkey MT/MST during locomotion while tracking eyes and body. Similarly, we are training neural networks to learn the patterns generated by human gait to develop more specific hypotheses about receptive field properties.

      Reviewer #3 (Public Review):

      Gaze-stabilizing motor coordination and the resulting patterns of retinal image flow are computed from empirically recorded eye movement and motion capture data. These patterns are assessed in terms of the information that would be potentially useful for guiding locomotion that the retinal signals actually yield. (As opposed to the "ecological" information in the optic array, defined as independent of a particular sensor and sampling strategy).

      While the question posed is fundamental, and the concept of the methodology shows promise, there are some methodological details to resolve. Also, some terminological ambiguities remain, which are the legacy of the field not having settled on a standardized meaning for several technical terms that would be consistent across laboratory setups and field experiments.

      Technical limits and potential error sources should be discussed more. Additional ideas about how to extend/scale up the approach to tasks with more complex scenes, higher speed or other additional task demands and what that might reveal beyond the present results could be discussed.

      This issue is addressed in more detail in the Discussion, second paragraph, and also the second last paragraph.

    1. Author Response

      Reviewer #1 (Public Review):

      This work presents a unification model (of sorts) for explaining how the flow of evidence through networks can be controlled during decision-making. The authors combine two general frameworks previously used as neural models of cortical decision-making, dynamic normalization (which implements value encoding via firing activity) and recurrent network models (which capture winner-take-all selection processes), into a unified model called the local disinhibition-based decision model (LDDM). The simple motif of the LDDM allows for the disinhibition of excitatory cells that represent the engagement of individual actions that happens through a recurrent inhibitory loop (i.e., a leaky competing accumulator). The authors show how the LDDM works well at explaining both decision dynamics and the properties of cortical cells during perceptual decision-making tasks.

      All in all, I thought this was an interesting study with an ambitious goal. But like any good study, there are some open issues worth noting and correcting.

      MAJOR CONCERNS

      1. Big picture

      This was a comprehensive and extremely well-vetted set of theoretical experiments. However, the scope and complexity also made the take-home message hard to discern. The abstract and most of the introduction focus on the framing of LDDM as a hybrid of dynamic normalization models (DNM) and recurrent network models (RNMs). This is sold as a unification of value normalization and selection into a novel unified framework. Then the focus shifts to the role of disinhibition in decision-making. Then in the Discussion, the goal is stated as determining whether the LDDM generates persistent activity and whether this activity differs from that of RNMs. As a reader, it seems like the paper jumps between two high-level goals: 1) the unification of DNM and RNM architectures, and 2) the role of disinhibition. This constant changing makes it hard to focus as the reader goes on. So what is the big picture goal specifically?

      Also, the framing of value normalization and WTA as a novel computational goal is a bit odd as this is a major focus of the field of reinforcement learning (both abstractly at the computational level and more concretely in models of the circuits that regulate it). I know that the authors do not think they are the first to unify value judgements with selection criteria. The writing just comes across that way and should be clarified.

      We thank the Reviewer for their thoughtful consideration of the overall framing of the big picture goals of the paper. Upon reflection, we agree that the paper really centers on the importance of incorporating disinhibition into computational circuit-based models of decision-making. Thus, we have significantly revised the Introduction and Discussion to focus on the theoretical and empirical importance of incorporating disinhibition into computational models of decision-making, and use the integration of value normalization and WTA selection as an example of how disinhibition increases the richness of circuit decision models. Please see the response to recommendations below for more detail on the changes.

      2. Link to other models

      The LDDM is described as a novel unification of value normalization and winner-take-all (WTA) selection, combining value processing and selection. While the authors do an excellent job of referencing a significant chunk of the decision neuroscience literature (160 references!) the motif they end up designing has a highly similar structure to a well-known neural circuit linked to decision-making: the cortico-basal ganglia pathways. Extensive work over the past 20+ years has highlighted how cortical-basal ganglia loops work via disinhibition of cortical decision units in a similar way as the LDDM (see the work by Michael Frank, Wei Wei, Jonathan Rubin, Fred Hamker, Rafal Bogacz, and many others). It was surprising to not see this link brought up in the paper as most of the framing was on the possibility of the LDDM representing cortical motifs, yet as far as I know, there does not exist evidence for such architectures in the cortex, but there is in these cortical-basal ganglia systems.

      We thank the Reviewer for the suggestion to link the LDDM to disinhibition in cortico-basal ganglia (CBG) models; this is indeed an important body of empirical and computational work that we overlooked in the original manuscript. We have now added text to the Discussion to highlight the link between LDDM and these CBG disinhibition models, focusing on how they are conceptually similar and how they differ. Please see our response to recommendations below for a more detailed discussion of the revisions.

      3. Model evaluations

      The authors do a great job of extensively probing the LDDM under different conditions and against some empirical data. However, most of the time there is no "control" model or current state-of-the-art model that the LDDM is being compared against. In a few of the simulation experiments, the LDDM is compared against the DNM and RNM alone, so as to show how the two components of the LDDM motif compare against the holistic model itself. But this component model comparison is inconsistently used across simulation experiments.

      Also, it is worth asking whether the DNM and RNM are appropriate comparison models to vet the LDDM against for two reasons. First, these are the components of the full LDDM. So these tests show us how the two underlying architectural systems that go into LDDM perform independently, but not necessarily how the LDDM compares against other architectures without these features. Second, as pointed out in my previous comment, the LDDM is a more complex model, with more parameters, than either the DNM or RNM. The field of decision neuroscience is awash in competing decision models (including probabilistic attractor models, non-recurrent integrators, etc.). If we really want to understand the utility of the LDDM, it would be good to know how it performs against similarly complex models, as opposed to its two underlying component models.

      We greatly appreciate the Reviewer’s comments on model comparison, which point out that our original manuscript failed to clearly convey a very important difference between the LDDM and the existing RNM(s). In the revision, we now make it clearer that the fundamental difference between the LDDM and the RNMs is the architecture of disinhibition (see the revised Introduction, especially p. 8 lines 164-168). The LDDM is not simply a combination of the DNM model with RNM architecture (a point we may have mistakenly conveyed in the original manuscript): the introduction of disinhibition separates LDDM inhibition into option-selective subpopulations, as opposed to the single pooled inhibition of RNM models. Given this fact, the LDDM predicts unique selective inhibition dynamics shown in recent optogenetic and calcium imaging results, a finding inconsistent with the common-pooled and non-selective inhibition assumed in the existing RNMs and many of their variants. Thus, we believe that a comparison between the LDDM and the RNM, which share a similar level of complexity and similar numbers of parameters, is important.

      We also appreciated the Reviewer’s concern about testing the LDDM against alternative models. In order to better connect to the existing literature, we now compare the LDDM to another standard circuit model of decision-making - the leaky competing accumulator (LCA) model. The LCA is a circuit model that captures many of the aspects of perceptual decision-making seen in the mathematical drift diffusion model (DDM), but with a construction that allows for fitting to behavioral data and comparison of underlying unit activities. Please see our response to recommendations below for further detail.

      4. Comparison to physiological data

      I quite enjoyed the comparisons of the excitatory cell activity to empirical data from the Shadlen lab experiments. However, these were largely qualitative in nature. In conjunction with my prior point on the models that the LDDM is being compared against, it would be ideal to have a direct measure of model fits that can be used to compare the performance of different competing "control" models. These measures would have to account for differences in model complexity (e.g., AIC or BIC), but such an analysis would help the reader understand the utility of the LDDM in connecting with empirical data much better.

      We agree with the Reviewer that a quantitative comparison of the match between model neural predictions and empirical neurophysiological data is important. First, we wish to clarify that the model neural predictions are simulated from models fit to the behavioral data (choice and RT), not from fits to the neural activity traces – a point we now clarify in the text. While directly fitting dynamic models (LDDM, RNM, or LCA) to the neurophysiological data is appealing, there are currently several obstacles to this approach. The first problem is the complexity of the dynamic neural traces. Despite the long history of the random-dot motion paradigm, detailed features of the dynamics are still not understood. For example, the stereotyped initial dip after stimulus onset may reflect a reset of the network state to improve the signal-to-noise ratio (Conen and Padoa-Schioppa, 2015) or simply reflect a surround suppression-like lateral inhibition in visual processing. A second problem is that the primary difference between the models is the activity of inhibitory (and disinhibitory) neurons, which are typically not recorded in neurophysiological experiments; thus, there is a lack of empirical data to which to fit the models. In the revision, we clarified that the model fitting to the Roitman & Shadlen data is for behavioral data only, and model unit activity traces are derived from models fit to behavioral data.

      That being said, we agree that a quantitative comparison of model activity predictions is helpful. Because the models are fit not to the neural data but to the behavioral data, rather than using likelihood-based measures like AIC and BIC we used a simple RMSE measure to compare the match between predicted and neural activity patterns (revised Fig. 6E, Fig 6-S4E, Fig 6-S5E). Please see response to recommendations below for details.
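
      As a rough illustration of the kind of RMSE comparison described above (a minimal sketch, not the analysis code used in the study), the match between a model's predicted activity trace and a recorded trace could be scored as follows. The function name, the model labels, and the toy traces are hypothetical, and the sketch assumes both traces are sampled at the same time points.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equally sampled activity traces."""
    if len(predicted) != len(observed):
        raise ValueError("traces must have the same number of samples")
    if not predicted:
        raise ValueError("traces must not be empty")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy traces (arbitrary firing-rate units); not data from the study.
observed = [10.0, 12.0, 15.0, 20.0]
model_a = [10.0, 12.0, 15.0, 20.0]  # matches the observed trace exactly
model_b = [11.0, 13.0, 16.0, 21.0]  # offset by one unit at every sample

# Lower RMSE means a closer match between predicted and observed activity.
scores = {"model_a": rmse(model_a, observed), "model_b": rmse(model_b, observed)}
best = min(scores, key=scores.get)
```

      Unlike AIC or BIC, this score carries no penalty for model complexity, which is reasonable only when, as here, the models were not fit to the neural traces being compared.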

      Reviewer #2 (Public Review):

      The aim of this article was to create a biologically plausible model of decision-making that can both represent a choice's value and reproduce winner-take-all ramping behavior that determines the choice, two fundamental components of value-based decision-making. Both of these aspects have been studied and modeled independently, but empirical studies have found that single neurons can switch between both of the aspects (i.e., from representing value to winner-take-all ramping behavior) in ways that are not well described by current biologically plausible models of decision making.

      The current article provides a thorough investigation of a new model (the local disinhibition decision model; LDDM) that has the goal of combining value representations and winner-takes-all ramping dynamics related to choice. Their model uses biologically plausible disinhibition to control the levels of inhibition in a local network of simulated neurons. Through a careful series of simulation experiments, they demonstrate that their network can first represent the value of different options, then switch to winner-takes-all ramping dynamics when a choice needs to be made. They further demonstrate that their single model reproduces key components of value-based and winner-takes-all dynamics found in both neural and behavioral data. They additionally conduct simulation studies to demonstrate that recurrent excitatory properties in their network produce value-persistence behavior that could be related to memory. They end by conducting a careful simulation study of the influence of GABA agonists that provide clear and testable predictions of their proposed role of inhibition in the neural processes that underlie decision-making. This last piece is especially important as it provides a clear set of predictions and experiments to help support or falsify their model.

      There are overall many strengths to this paper. As the authors note, current network models do not explain both value-based and ramping-like decision-making properties. Their thorough simulation studies and their validation against empirical neural and behavioral data will be of strong interest to neuroscientists and psychologists interested in value-based decision-making. The simulations related to persistence and the GABA-agonist experiments they propose also provide very clear guidelines for future research that would help advance the field of decision-making research.

      Although the methods and model were generally clear, there was a fair amount of emphasis on the role of recurrence in the LDDM, but very little evidence that recurrence was important or necessary for any of the empirical data examined. The authors do demonstrate the importance of recurrence in some of their simulation studies (particularly in their studies of persistence), but these would need to be compared against empirical data to be validated. Nevertheless, the model and thorough simulation investigations will likely help develop more precise theories of value-based decision-making.

      We appreciate the Reviewer’s thoughtful comments. These comments - especially about anatomic recurrence and its relationship to the parameter 𝛼 - inspired us to think more about what distinguishes the current circuit from others, especially the implications related to the parameters 𝛼 (i.e., self-excitation) and 𝛽 (i.e., local disinhibition). Recurrence is required to drive winner-take-all competition in the standard RNM of decision-making. However, we show here with both analytical and numerical approaches that recurrence helps WTA competition but is not necessary in our model. Instead, the key feature of the LDDM is to utilize disinhibition in conjunction with lateral inhibition to realize winner-take-all competition. As a result, the current model makes many predictions that differ from those of existing models, such as selective inhibition and flexible control of dynamics.

      In response to the Reviewer’s points and after careful consideration of the differential equations, we realized that in our model fitting, the 𝛼 parameter fitting to zero does not necessarily mean recurrence should be zero. The 𝛼 parameter shares a lot of similarity to the baseline gain control (parameter BG in our revision), and thus is unidentifiable in the current dataset. In the interest of parsimony, we did not include the parameter BG in the original manuscript, but now include it because it reveals the difficulty of interpreting fit 𝛼 values as simply the level of recurrence.

      Overall, disinhibition (𝛽) in the LDDM is required for WTA activity while recurrence (𝛼) can contribute but is not necessary; however, 𝛼 is theoretically important for generating persistent activity, with the caveat that in the current framework there is an unclear relationship between fit 𝛼 and recurrence. Regardless, we agree that the contribution of 𝛼 to the LDDM framework is worth further testing and examining with future empirical data.

      Reviewer #3 (Public Review):

      Shen et al. attempt to reconcile two distinct features of neural responses in frontoparietal areas during perceptual and value-guided decision-making into a single biologically realistic circuit model. First, previous work has demonstrated that value coding in the parietal cortex is relative (dependent on the value of all available choice options) and that this feature can be explained by divisive normalization, implemented using adaptive gain control in a recurrently connected circuit model (Louie et al, 2011). Second, a wealth of previous studies on perceptual decision-making (Gold & Shadlen 2007) have provided strong evidence that competitive winner-take-all dynamics implemented through recurrent dynamics characterized by mutual inhibition (Wang 2008) can account for categorical choice coding. The authors propose a circuit model whose key feature is the flexible gating of 'disinhibition', which captures both types of computation - divisive normalization and winner-take-all competition. The model is qualitatively able to explain the 'early' transients in parietal neural responses, which show signatures of divisive normalization indicating a relative value code, persistent activity during delay periods, and 'late' accumulation-to-bound type categorical responses prior to the report of choice/action onset.

      The attempt to integrate these two sets of findings by a unified circuit model is certainly interesting and would be useful to those who seek a tighter link between biologically realistic recurrent neural network models and neural recordings. I also appreciate the effort undertaken by the authors in using analytical tools to gain an understanding of the underlying dynamical mechanism of the proposed model. However, I have two major concerns. First, the manuscript in its current form lacks sufficient clarity, specifically in how some of the key parameters of the model are supposed to be interpreted (see point 1 below). Second, the authors overlook important previous work that is closely related to the ideas that are being presented in this paper (see point 2 below).

      1) The behavior of the proposed model is critically dependent on a single parameter 'beta' whose value, the authors claim, controls the switch from value-coding to choice-coding. However, the precise definition/interpretation of 'beta' seems inconsistent in different parts of the text. I elaborate on this issue in sub-points (1a-b) below:

      1a). For instance, in the equations of the main text (Equations 1-3), 'beta' is used to denote the coupling from the excitatory units (R) to the disinhibitory units (D). However, in the main figures (Fig 2) and in the methods (Equations 5-8), 'beta' is instead used to refer to the coupling between the disinhibitory (D) and the inhibitory gain control units (G). Based on my reading of the text (and the predominant definition used by the authors themselves in the main figures and the methods), it seems that 'beta' should be the coupling between the D and G units.

      1b). A more general and critical issue is the failure to clearly specify whether this coupling of D-G units (parameterized by 'beta') should be interpreted as a 'functional' one, or an 'anatomical' one. A straightforward interpretation of the model equations (Equations 5-8) suggests that 'beta' is the synaptic weight (anatomical coupling) between the D and G units/populations. However, significant portions of the text seem to indicate otherwise (i.e., a 'functional' coupling). I elaborate on this in subpoints (i-iii) below:

      (1b-i). One of the main claims of the paper is that the value of 'beta' is under 'external' top-down control (Figure 2 caption, lines 124-126). When 'beta' equals zero, the model is consistent with the previous DNM model (dynamic normalization, Louie et al 2011), but for moderate/large non-zero values of 'beta', the network exhibits WTA dynamics. If 'beta' is indeed the anatomical coupling between D and G (as suggested by the equations of the model), then, are we to interpret that the synaptic weight between D-G is changed by the top-down control signal within a trial? My understanding of the text suggests that this is not in fact the case. Instead, the authors seem to want to convey that top-down input "functionally" gates the activity of D units. When the top-down control signal is "off", the disinhibitory units (D) are "effectively absent" (i.e., their activity is clamped at zero as in the schematic in Fig 2B), and therefore do not drive the G units. This would in turn be equivalent to there being no "anatomical coupling" between D and G. However, when the top-down signal is "on", D units have non-zero activity (schematic in Fig 2B), and therefore drive the G units, ultimately resulting in WTA-like dynamics.

      (1b-ii). Therefore, it seems like when the authors say that beta equals zero during the value coding phase they are almost certainly referring to a functional coupling from D to G, or else it would be inconsistent with their other claim that the proposed model flexibly reconfigures dynamics only through a single top-down input but without a change to the circuit architecture (reiterated in lines 398-399, 442-444, 544-546, 557-558, 579-590). However, such a 'functional' definition of 'beta' would seem inconsistent with how it should actually be interpreted based on the model equations, and also somewhat misleading considering the claim that the proposed network is a biologically realistic circuit model.

      (1b-iii). The only way to reconcile the results with an 'anatomical' interpretation of 'beta' is if there is a way to clamp the values of the 'D' units to zero when the top-down control signal is 'off'. Considering that the D units also integrate feed-forward inputs from the excitatory R units (Fig 2, Equations 1-3 or 5-8), this can be achieved either via a non-linearity, or if the top-down control input multiplicatively gates the synapse (consistent with the argument made in lines 115-116 and 585-586 that this top-down control signal is 'neuromodulatory' in nature). Neither of these two scenarios seems to be consistent with the basic definition of the model (Equations 1-3), which therefore confirms my suspicion that the interpretation of 'beta' being used in the text is more consistent with a 'functional' coupling from D to G.

      We thank the reviewer for pointing out this confusion. We apologize that the original illustrations (Fig. 2A) and the differential equations in Methods (Eqs. 5-8) did not convey our ideas well. 𝛽 is intended to refer to the coupling from R to D, not to a change in the weights between D and G units. We realize that this point was confusing because of inconsistencies between our original figures, text, and supplementary material.

      Given the lack of clarity in the previous version as well as the Reviewer’s questions, we now emphasize that 𝛽 represents a functional coupling between the R and D neurons. The biological assumption of the disinhibitory architecture is based on recent findings that VIP neurons in the cortex always inhibit other neighboring inhibitory cells, such as SST and PV neurons, and consequently disinhibit the neighboring primary neurons (e.g., Fu et al., 2014; Karnani et al., 2014, 2016). We did not see evidence in the literature of fast-changing (anatomic) connections between VIP and SST/PV. However, there is evidence that the responsiveness of VIP neurons to excitatory neurons can be modulated by changing the concentrations of neuromodulators, such as acetylcholine and serotonin (Prönneke et al., 2020). While neuromodulator action is stereotypically thought to have slow dynamics, recent findings show that, for example, basal forebrain cholinergic neurons respond to reward and punishment with surprising speed and precision (18 ± 3 ms) (Hangya et al., 2015) to modulate arousal, attention, and learning in the neocortex. Given the large number of studies that identify long-term projections and neuromodulatory inputs to VIP neurons (e.g., Pfeffer et al., 2013; Pi et al., 2013; Alitto & Dan, 2013; Tremblay et al., 2016), we believe that it is more plausible to assume that the connection weights between R and D in our case are quickly modulated within a trial.

      To clarify this issue in the revised manuscript, we made the following corrections:

      1. We repositioned the 𝛽 parameter in Fig. 2A onto the connection from R to D, to align with the description in the main text of 𝛽 modulating the R to D coupling.

      2. We modified the differential equations 5-8 (now numbered as Eqs. 28-32) in Methods (p. 61) to include the disinhibitory unit D as an independent control from the inhibitory unit I, in order to be consistent with the disinhibitory D units in LDDM. This change makes only small differences in the model predictions (please see dynamics simulated after the change in Fig. 2-figure supplement 1B).

      3. We updated the neural circuit motif in Fig. 2-figure supplement 1A accordingly.

      2) The main contribution of the manuscript is to integrate the characteristics of the dynamic normalization model (Louie et al, 2011) and the winner-take-all behavior of recurrent circuit models that employ mutual inhibition (Wang, 2008), into a circuit motif that can flexibly switch between these two computations. The main ingredient for achieving this seems to be the dynamical 'gating' of the disinhibition, which produces a switch in the dynamics, from point-attractor-like 'stable' dynamics during value coding to saddle-point-like 'unstable' dynamics during categorical choice coding. While the specific use of disinhibition to switch between these two computations is new, the authors fail to cite previous work that has explored similar ideas that are closely related to the results being presented in their study. It would be very useful if the authors can elaborate on the relationship between their work and some of these previous studies. I elaborate on this point in (a-b) below:

      2a) While the authors may be correct in claiming that RNM models based on mutual inhibition are incapable of relative value coding, it has already been shown previously that RNM models characterized by mutual inhibition can be flexibly reconfigured to produce dynamical regimes other than those that just support WTA competition (Machens, Romo & Brody, 2005). Similar to the behavior of the proposed model (Fig 9), the model by Machens and colleagues can flexibly switch between point-attractor dynamics (during stimulus encoding), line-attractor dynamics (during working memory), and saddle-point dynamics (during categorical choice) depending on the task epoch. It achieves this via a flexible reconfiguration of the external inputs to the RNM. Therefore, the authors should acknowledge that the mechanism they propose may just be one of many potential ways in which a single circuit motif is reconfigured to produce different task dynamics. This also brings into question their claim that the type of persistent activity produced by the model is "novel", which I don't believe it is (see Machens et al 2005 for the same line-attractor-based mechanism for working memory).

      We thank the Reviewer for pointing out the conceptual similarities between the LDDM and the Machens Romo Brody model, and now include a discussion of the link between the two early in the revised Discussion (p. 38, lines 826-837). Please see response to recommendations below for a more detailed discussion of this point.

      2b) The authors also fail to cite or describe their work in relation to previous work that has used disinhibition-based circuit motifs to achieve all 3 proposed functions of their model - (i) divisive normalization (Litwin-Kumar et al, 2016), (ii) flexible gating/decision making (Yang et al, 2016), and (iii) working memory maintenance (Kim & Sejnowski, 2021).

      The Reviewer notes several relevant papers, and we now discuss them and their relationship to the LDDM in a revised Discussion section (pp. 35-36). Please see response to recommendations below for more details.

    1. Author Response

      Reviewer #2 (Public Review):

      The two new micropeptides are well characterized in the manuscript and appear to be functionally important with some chromatin-level consequences of their loss (which can be either direct or indirect), but the finding that lincRNA sequences encode micropeptides is not novel, and the two described in the paper appear to be zebrafish-specific and their function was tested only in zebrafish, which limits the interest in these genes. The use of ribosome profiling data along with behavioral screening to identify micropeptides is interesting and important, but the scope of the screen, the candidates selected for testing, etc. are not clear enough as presented. The ChIP-seq analysis of the new proteins is very interesting but is not described in any detail. Overall, the experimental part is well designed and the phenotypes reported by the authors appear to be strong and convincing, but the mechanistic understanding of what the two new proteins do and how, and the general interest in the results given the current scope of understanding of micropeptides, are limited.

      We apologize for the misunderstanding that these genes are zebrafish-specific. In this revision, we have clarified throughout the text and with additional data that these genes are not zebrafish-specific, but that linc-mipep and linc-wrb are homologous to human Hmgn1.

    1. Author Response

      Reviewer #1 (Public Review):

      Francou et al. examine the dynamics of cell ingression at the primitive streak during mouse gastrulation and correlate this with the localization of elements of the apical Crumbs complex and the actomyosin cytoskeleton. Using time-lapse live imaging, they show that cells at the primitive streak ingress in a stochastic manner, by constricting their apical surface through a ratcheting shrinkage of individual junctions. Meticulous evaluation of immunofluorescent staining for many elements of the actomyosin contractile process as well as junctional and apical domain elements reveals anisotropic localization of Crumbs2, ZO1, and ppMLC. In addition, the localization of two groups of proteins showed a close correlation - actomyosin regulators and apical and junctional components - but there was a lack of correlation of localization of these two groups of proteins to each other. The localization of actomyosin and its activity was altered and more homogeneous in Crumbs2-/- embryos, and there was a significant decrease in aPKC and Rock1. The authors conclude from these observations that Crumbs2 regulates anisotropic actomyosin contractility to promote apical constriction and cell ingression.

      The strengths of this manuscript are the very detailed observations on the process of apical constriction and the meticulous evaluation of the localization of the many proteins likely to be involved in the process. While many of the general observations are not new, Francou et al. provide a much richer understanding of this process, as well as a paradigm with which to evaluate the effects of mutations on the gastrulation process. The figures are beautiful, clear, and informative, and support the conclusions made by the authors. The data provide a very compelling picture of both the dynamics of cell behavior and the anisotropies in protein localization associated with it.

      However, much of the Crumbs2 mutant phenotype is not sufficiently explained by the authors' data or conclusions. First, the loss of Crumbs2 does not prevent ingression, as there are mesoderm cells evident between the epiblast and endoderm (Ramkumar et al., 2016, Xiao et al., 2011). There are certainly fewer, and the biggest effect appears to be during the elongation of the axis from E7.75 onward and not during the earlier migratory period (E6.5-E7.75) according to data from both previously published work (Xiao et al., 2011; Ramkumar et al., 2015, 2016) and the data presented here.

      • The reviewer makes a good point regarding the defects observed in Crumbs2 mutant embryos. It is true that in this mutant, a first wave of gastrulation EMT, taking place around E6.5, does not appear to be affected. We interpret this to mean that the gastrulation EMT is a sequential process under differential regulation, and that Crumbs2 is not required for the first wave of cell ingression through the primitive streak, at the onset of gastrulation. Consequently, a small number of early mesodermal cells are produced in Crumbs2 mutants. However, within 24 hours of the onset of gastrulation, corresponding to around E7.75, ingression defects are evident in Crumbs2 mutant embryos.

      • For simplicity, these distinct sequential phases of gastrulation regulation, initially independent of Crumbs2, but subsequently dependent, were not initially discussed in our manuscript. We have now elaborated these details in the revised manuscript.

      Nor does the loss of Crumbs2 prevent apical constriction. Ramkumar et al. in their 2016 paper show by live imaging that the major effect of the Crumbs2 mutation is to prevent the cells from detaching from the epithelium, but that the apical domain does undergo constriction, leading to many elongated flask-shaped cells still attached at the apical end. These observations do not fit well with the model proposed by the authors of Crumbs2 regulating anisotropic actomyosin contractility to promote apical constriction and suggest a more complicated story.

      • We thank the reviewer for bringing this up, as it is an important point that we now discuss in greater detail and clarify in the revised manuscript.

      • Importantly, we do not believe our data are in disagreement with the previous study of Ramkumar et al. The precise details of the defect observed in Crumbs2 mutants are still not totally clear. However, we would like to point out that in Ramkumar et al., the time-lapse imaging data did not depict cells constricting their surfaces; rather, these data revealed that cells having small apical surfaces failed to detach and delaminate out of the epiblast layer. Thus, this previous study focused on a step in the process of ingression (delamination) that is subsequent to the step addressed in the present work.

      • Furthermore, epiblast cells outside the domain occupied by the primitive streak, and even some cells positioned on the lateral sides of the embryo, were reported by Ramkumar and colleagues to exhibit abnormally small apical surfaces in Crumbs2 mutants. These cells, at a distance from the primitive streak, will not normally constrict their apical surfaces, since they are not going to undergo the gastrulation EMT, a behavior restricted to the region of the primitive streak. Thus, these previous data do not directly address nor demonstrate that epiblast cells in Crumbs2 mutants undergo apical constriction.

      • Moreover, in Crumbs2 mutants a large number of cells were reported to fail to ingress at the primitive streak, and consequently they were seen to accumulate within the epiblast epithelial layer. Indeed, we believe that the small apical surfaces first reported in Crumbs2 mutants by Ramkumar and colleagues most likely result from the crowding/jamming of cells within the epiblast layer, and that this causes changes in the shape and volume of cells because they are spatially constrained. Thus, increased crowding of epithelial cells within a spatially constrained tissue likely drives a reduction in apical surface area and extensive apico-basal elongation, as observed in Crumbs2 mutants.

      However, the complications of the Crumbs2 mutant do not detract from the value of the basic observations presented in this manuscript, which are solid and well-documented, and will be a valuable resource for the field.

      Reviewer #2 (Public Review):

      In their manuscript, Francou and colleagues study the delamination of epiblast cells into the mesodermal layers using live imaging of mouse embryos cultured ex vivo. By segmenting the apical area of delaminating cells, they quantify extensively the dynamic behavior of delaminating cells. Using immunostaining and crumbs2 mutants, they propose that apical constriction of cells results from pulsed contractions, which could be guided by crumbs2 signals.

      The manuscript is interesting and provides extremely valuable data for our understanding of mouse gastrulation. Occasionally, the manuscript can be a bit confusing and contains a few inaccuracies.

      However, the main issues I have are with some of the interpretations from the authors, which may be incorrect due to limited time resolution (with a 5 min time resolution that was used, it might be difficult to distinguish pulses from measurement noise) and the analysis of immunostaining data, which would require more rigorous quantification.

      • We acknowledge the reviewer’s comments and agree that a shorter time resolution would be ideal to facilitate the detection of constriction pulses of apical surfaces. However, we need to consider that imaging the apical surface of cells within the epiblast layer, which constitutes the most internal surface inside the embryo, is technically challenging in a gastrulating mouse embryo.

      • As suggested by the reviewer, we attempted to image with a time interval shorter than 5 min on several different microscope systems and modalities available at our institution (including two different laser point scanning confocals, a spinning disc system, as well as light-sheet microscopes with both upright and inverted configurations) and were not successful in acquiring usable images (with a shorter time resolution) with the ZO1GFP knock-in reporter. We also need to consider that single-copy GFP knock-in reporters are often dim, thereby exacerbating the issue. In our hands, a high-speed resonant scanning confocal (Nikon A1RHD25) was the system that gave us the best signal-to-noise ratio, spatial resolution and temporal resolution, and was the set-up we used for our most recent live imaging experiments. Using this system, we were able to acquire a limited number of time-lapses with a time resolution of 2 min, but none with a shorter time interval. From our analyses, we determined that movies with a 2 min time interval did not yield enough additional detail over movies with 5 min time intervals to warrant a detailed reanalysis. We have provided additional detail relating to these technical issues within the revised manuscript and edited some of the conclusions.

      • We acknowledge that immunostaining is not the most quantitative method, but we were unable to come up with alternative methods that can be used with our samples. We believe the junctional reduction of Myosin, aPKC and Rock1 is generally due to a lack of recruitment or activation of these proteins at junctions, and does not reflect their reduced expression at the gene or protein level. We do not believe that methods such as RT-qPCR or Western blotting would be informative in the context in which we are looking, especially since they do not yield spatial resolution. Furthermore, we would need to isolate primitive streak cells to consider applying these methods, and we do not believe they would provide a sufficient improvement over immunostaining.

      • In contrast to the live imaging, which was performed by placing the objective at the posterior side of the embryo in closest proximity to the outer visceral endoderm layer, for fixed tissue imaging, embryos were microdissected to recover the posterior side containing the primitive streak. Microdissected posterior regions were imaged on the side of the cavity by placing the objective in closest proximity to the inner epiblast layer, which permitted direct access to the apical surface of epiblast cells at the primitive streak. In this fixed tissue imaging configuration, the apical surfaces of cells in WT and Crumbs2 mutants were in closest proximity to the imaging objective and thus directly accessible. Thus, any difference in tissue thickness on the other side of the epithelium did not interfere with light penetration. We have edited the figures and included schematics to clarify how the objective positions are flipped with respect to the primitive streak regions at the embryo’s posterior for live vs. fixed tissue imaging.

      • We have now measured the signal intensity in the cytoplasmic region of WT and Crumbs2 mutant embryos, and junctional intensity measurements have been normalized to cytoplasmic intensities.
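
      The normalization described in the last point can be illustrated with a minimal sketch. It assumes that "normalized to cytoplasmic intensities" means a per-measurement ratio of junctional to cytoplasmic signal; that reading, the function name, and the example values are assumptions for illustration only and are not taken from the manuscript.

```python
def normalize_junctional_intensity(junctional, cytoplasmic):
    """Divide each junctional intensity by its paired cytoplasmic intensity.

    Assumption: normalization is a simple ratio, with one cytoplasmic
    measurement paired to each junctional measurement.
    """
    if len(junctional) != len(cytoplasmic):
        raise ValueError("junctional and cytoplasmic lists must have equal length")
    if any(c <= 0 for c in cytoplasmic):
        raise ValueError("cytoplasmic intensities must be positive")
    return [j / c for j, c in zip(junctional, cytoplasmic)]


# Hypothetical intensity values (arbitrary units), not data from the study.
ratios = normalize_junctional_intensity([200.0, 150.0], [100.0, 50.0])
print(ratios)
```

      Under this reading, a ratio above 1 indicates enrichment at the junction relative to the cytoplasm, which allows comparison between WT and mutant embryos even if overall staining intensity differs.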

      Reviewer #3 (Public Review):

      The manuscript by Francou et al investigated cellular mechanisms of epiblast ingression during mouse gastrulation. The authors wanted to know whether/how epiblast cell-cell junctional dynamics correlate with apical constriction and subsequent ingression. Because mouse gastrula adopts an inverted-cup morphology (as a result of differential invasive behavior of polar and mural trophoblast cells), epiblast cells are located in the innermost position and are difficult to image. This is more so when one wants to perform live imaging of epiblast cells' apical surface. The authors tackled such problems/limitations by using a combination of ZO-1 GFP line, confocal time-lapse microscopy, fixed embryo immunostaining, and Crumbs2 mutant embryos. The authors observed that apical constriction was associated with cell ingression, that this constriction occurred in a pulsed fashion (i.e., 2-4 cycles with phases of contraction and expansion, eventually leading to reduction of apical surface and ingression), that this constriction took place asynchronously (i.e., neighboring epiblast cells did not exhibit coordinated behavior) and that junctional shrinkage during apical constriction also occurred in a pulsed and asynchronous manner. The authors also investigated localization/co-localization of several apical proteins (Crumbs2, Myosin2B, pMLC, ppMLC, Rock1, F-actin, PatJ, and aPKC) in fixed samples, uncovering somewhat reciprocal distribution of two groups of proteins (represented by Myosin2B in one group, and Crumbs2 in the other). Finally, the authors showed that Crumbs2 -/- embryos had disturbed actomyosin distribution/levels without affecting junctional integrity (partially explaining the ingression defect reported in Crumbs2 -/- mutant embryos). Overall, this manuscript offers high-quality live imaging data on the dynamic remodeling of epiblast apical junctions during mouse gastrulation.

      It would be interesting to see whether phenomena reported in this manuscript can be extended to the entire primitive streak (or are they specific only to a subset of mesoderm precursors) and to the entire period of mesendoderm formation. More importantly, it would be interesting to see whether the ingression behavior seen here is representative of all eutherian mammals regardless of their gastrular topography.

      • The reviewer raises a very interesting and important point. We focused our data analysis on a middle region in the proximo-distal axis of the embryo, because this is the most optically accessible and the flattest region of the posterior of the embryo to analyze. We also focused on the E7.5 stage of development when the primitive streak is fully elongated, so as to capture as many ingression events within a single time-lapse experiment as possible. Due to the difficulties associated with live imaging the apical epiblast layer of embryos at these stages, we chose to focus our analysis on a defined region of the embryo and a defined period of time. We acknowledge that it will be important to analyze different regions of the primitive streak and at different stages of gastrulation to glean any general versus more distinct modes of epiblast cell ingression, but given the technical difficulties discussed we believe that any extended analysis is beyond the scope of the current study.

      • We also agree that it would be interesting to know if the ingression behavior we observe in the mouse embryo is representative of all mammals, and even more generally of amniotes, but this is beyond the scope of our study.

    1. Author Response

      Reviewer #2 (Public Review):

      Throughout the manuscript, the authors aim to distinguish signal from the lack of it. All conclusions depend on the success of this process. In such an endeavor, the sensitivity of the applied methods is critical. Thus, the authors must use the most sensitive tools to draw meaningful conclusions. The latest iGluSnFR has amazing sensitivity allowing the detection of single AP-evoked responses. This is not the case for vGpH, which requires a hundred APs to get a meaningful signal. Similarly, synthetic Ca2+ dyes have much better dynamic range, linearity and sensitivity compared to GCaMP6f.

      The rate of silent boutons at 2 mM [Ca2+]e is lower for a single AP compared to 20 or 200 APs. The overall failure rate cannot be increased with increasing the number of APs. This clearly indicates a technical issue (e.g. insufficient sensitivity of vGpH and GCaMP6f).

      We thank the reviewer for raising this concern. We attribute the relatively lower rate of silencing with 1 AP in [Ca2+]e 2.0 mM in neurons expressing iGluSnFR to its sensitivity to glutamate exocytosed from neighboring, possibly non-transfected terminals. This limitation is described in the manuscript (page 7, line 26 – page 8, line 5). The overall agreement in the proportion of silencing with iGluSnFR compared to physin-GCaMP or vGpH at lower [Ca2+]e, where the contribution from neighboring terminals is likely greatly diminished, supports this interpretation.

      The authors used three different measuring tools and used three different stimulation protocols, making the interpretation of the data challenging. It is impossible to tell how the failure rate changes from 1 to 20 APs without knowing the release probability, the pool size, depletion, recovery of SVs, and facilitation. These are all unknown.

      In an ideal world, a measure of release probability during a train of stimuli at varied [Ca2+]e would provide the most insight, but this is difficult to achieve with any of the existing methods, including the remarkable new iGluSnFR. The challenge we face is that, with our approach, it is impossible to exclude signals from neighboring axons that are closely packed near the axon harboring the indicator. This limitation is described in the manuscript (page 7, line 26 – page 8, line 5). Given this, we felt that showing that silencing can be revealed with all the different techniques was the most conservative approach to address the issue. Because we have focused on this phenomenon, the number of APs is experimentally important only to ensure that an adequate response could be detected. We have also included, in the discussion, an acknowledgement of the possibility that we are failing to detect minimal Ca2+ entry (see response to #8 from the synthesized review).

      The last experiment with the GABAB agonist has little novelty in its present form. The authors demonstrate that GABAB agonism increases the rate of silent terminals. The interesting issue would be to reveal how the effect of GABAB activation depends on the [Ca2+]e. This information is essential to see whether there is indeed a shoulder in its effectiveness curve.

      We are grateful to the reviewer for this recommendation and we have performed additional experiments (see response to #7 from the synthesized review).

      The authors refer to a theoretical set-point in [Ca2+]e below which the function of the terminals is fundamentally different. From the presented experiments, the reviewer does not see any data that is inconsistent with a continuum. 'Thus, as with Ca2+ influx, SV recycling is modulated in an all-or-none manner by modest changes in [Ca2+]e around the physiological set point.' This statement is not supported by the data. The reviewer cannot see a set point.

      We appreciate the reviewer’s criticism and wish to clarify that we mean the normal physiologic [Ca2+]e in the CSF. We have changed the text to clarify this point (page 7, line 20).

    1. Author Response

      Reviewer #1 (Public Review):

      While the mechanisms underlying arms races between plants and specialist herbivores, such as the detoxification of specific secondary metabolites, have been studied, the mechanisms underlying the wider diet breadth of so-called generalist herbivores have been less studied. Because of the heterogeneity of host plant species, experimental validation of the phylogenetic generalism of herbivores has seemed hard to conduct. The authors set out the two major hypotheses about large diet breadth ("metabolic generalism" and "multi-host metabolic specialism"), and carefully designed the experiment using Drosophila suzukii as a model herbivore species.

      By an untargeted metabolomics approach using UHPLC-MS, authors attempted to falsify the hypotheses both in qualitative- and quantitative metabolomic profiles. Intersections of four fruit (puree) samples and each diet-based fly individual samples from the qualitative data revealed that there were few ions that occur as the specific metabolite in each diet-based fly group, which could reject the "multi-host metabolic specialism" hypothesis. Quantitative data also showed results that could support the "metabolic generalism" hypothesis. Therefore, the wide diet breadth of D. suzukii seemed to be derived from the general metabolism rather than the adaptive traits of the diverse host plant species. On the other hand, the reduction of the metabolites (ions) set using GLM seemed logical and 2-D clustering from the reduced ions set showed that quantitative aspects of diet-associated ions could classify "what the flies ate". These interesting results could enhance the understanding of the diet breadth (niche) of herbivorous insects.

      The authors' approach seemed clear to falsify the hypotheses based on the appropriate data processing. The intersection of shared ions from the qualitative dataset could distinguish the diet-specific metabolites in flies and commonly occurring metabolites among flies and/or fruits. Also, filtering on the diet-specific ions seemed to be a logical and appropriate way. Meanwhile, the discussion about the results seemed to be focused on different points regarding the research hypotheses which were raised in the introduction part. Discussion about the results mainly focused on the metabolism of D. suzukii itself, rather than the research hypotheses and questions that were raised from the evolution of the wide diet breadth of generalist herbivores. In particular, the conclusion seems to be far from the main context of the authors' research; e.g. frugivory. It makes the implication of the study weaker.

      We wish to thank Reviewer #1 for their appreciation of our study. As recommended, we now focus our discussion more on the general aspect of our findings (relevant to insects, herbivores, or frugivores), and less on the peculiarities of the metabolism of D. suzukii itself. Specifically, we now only mention D. suzukii in one section (two sentences) of our Discussion, to serve as an example (l.387-396). Thanks to this comment, the Discussion may interest a broader readership, on the evolution of diet breadth in generalist herbivorous species and offers a better understanding of the general implications of our findings.

      Reviewer #2 (Public Review):

      The manuscript: "Metabolic consequences of various fruit-based diets in a generalist insect species" by Olazcuaga et al., addresses an interesting question. Using an untargeted metabolomics approach, the authors study how diet generalism may have evolved versus diet specialization which is generally more commonly observed, at least in drosophila species. Using the phytophagous species Drosophila suzukii, and by directly comparing the metabolomes of fruit purees and the flies that fed on them, the authors found evidence for "metabolic generalism". Metabolic generalism means that individuals of a generalist species process all types of diet in a similar way, which is in contrast to "multi-host metabolic specialism" which entails the use of specific pathways to metabolize unique compounds of different diets. The authors find strong evidence for the first hypothesis, as they could easily detect the signature of each fruit diet in the flies. The authors then go on to speculate on the evolutionary ramifications of this for how potentially diet specializations may have evolved from diet generalism. Overall, the paper is well written, the experiments well documented, and the conclusions convincing.

      We thank Reviewer #2 for their comments and appreciation of our work.

      Reviewer #3 (Public Review):

      Laure Olazcuaga et al. investigated the metabolomes of four fruit-based diets and corresponding individuals of Drosophila suzukii that were reared on them using comparative metabolomics analysis. They observed that the four fruit-based diets are metabolically dissimilar. On the contrary, flies that fed on them are mostly similar in their metabolic response. From a quantitative point of view, they find that part of the fly metabolomes correlates well with that of the corresponding diet metabolomes, which is indicative of insect ingestive history. By further focusing on 71 metabolites derived from diet-specific fly ions and highly abundant fruit ions, the authors show that D. suzukii differentially accumulates diet metabolism in a compound-specific manner. The authors claim that the data support the metabolic generalism hypothesis while rejecting the multi-host metabolic specialism hypothesis. This study provides a valuable global chemical comparison of how diverse diet metabolites are processed by a generalist insect species.

      Strengths:

      The rapid advances in high-resolution mass spectrometry have recently accelerated the discovery of many novel post-ingestive compounds through comparative metabolomics analysis of insect/frass and plant samples. Untargeted metabolomics is thus a very powerful approach for the systematic comparison of global chemical shifts when diverse plant-derived specialized metabolites are further modified or quantitatively metabolized after ingestion by insects. The technique can be readily extended to a larger micro- or macro-evolutionary context for both generalist and specialist insects to systematically investigate how plant chemical diversity contributes to dietary generalism and specialism.

      We would like to thank Reviewer #3 for their insightful comments on the power of untargeted metabolomics to evaluate the fate of plant metabolites and their use by herbivores. We also agree that these techniques can be used to tackle eco-evolutionary issues, such as the origin and maintenance of dietary generalism and specialism here. We hope that our study will inspire other researchers to explore such techniques and experiments to gain a global overview of biochemistry fluxes and their evolution. We now mention it in the conclusion (L454-459).

      Weaknesses:

      The authors claim that their data support the hypothesis of metabolic generalism, however, a total analysis of insect metabolism may not generate a clean dataset for direct comparison of fruit-derived metabolites with those metabolized by D. suzukii, given that much of these metabolites would be "diluted" proportionally by insect-derived metabolites. If the insect-derived metabolites predominate, then, as the authors observed, a tight clustering of D. suzukii metabolomes in the PCA plot would be expected. It is therefore very difficult to interpret these patterns.

      We agree with Reviewer #3 that a careful examination of the different possible origins of metabolites should take place to distinguish between our two competing hypotheses.

      The only source of metabolites for insects in our experimental setup is a mixture of (i) a large proportion of fruit purees and (ii) a minor proportion of artificial medium consisting mainly of yeast. Our goal is thus to understand the fate of (i) “fruit-derived” metabolites (transformed and untransformed), while controlling for (ii) “artificial media-derived” metabolites, that constitute a nuisance signal but are necessary for a complete development in our system.

      By “fruit-derived” and “insect-derived” metabolites, it is our understanding that Reviewer #3 means “fruit” metabolites (when in insects, untransformed “fruit-derived” metabolites) and “artificial medium-derived” metabolites. It is true that we do wish to avoid a predominance of “artificial medium-derived” metabolites and focus on “fruit-derived” metabolites in insects. We also want to note that it is of primary importance in our study to distinguish between “fruit” metabolites that are carried as is (“fruit” metabolites present in insects, i.e., untransformed “fruit-derived” metabolites), and “fruit” metabolites that are used after transformation by the insect (i.e., transformed “fruit-derived” metabolites).

      We agree with Reviewer #3 that the presence of “artificial medium-derived” metabolites could be problematic in direct comparisons of fruits and insects (though not in comparisons among fruits or among insects).

      However, we took some steps to avoid such problems:

      1. We included control fly samples in our experiment: at each experimental generation, flies developed only on artificial medium (without fruit puree) were collected and processed simultaneously with flies that developed on fruit media. Results using these artificial medium-reared flies as controls (by subtracting their ion levels and removing ions that were similar, matched by generation) were similar to results using raw data, and conclusions were identical (see below).

      2. We lowered the proportion of artificial medium in our fruit media so that it was kept to a minimum, compatible with larval development and adult survival.
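
      The control procedure described in step 1 can be illustrated with a minimal sketch. It is not the authors' R code: the data layout, the function name `control_correct`, the 20% similarity tolerance (`rel_tol`), and the clipping of negative values at zero are all assumptions made for illustration, since the response does not state how "similar" ions were defined.

```python
from statistics import mean

def control_correct(fruit_flies, control_flies, rel_tol=0.2):
    """Subtract generation-matched control levels from fruit-reared fly ions.

    fruit_flies / control_flies: dict generation -> dict ion -> list of intensities.
    rel_tol is a HYPOTHETICAL similarity tolerance (not given in the source):
    an ion is dropped when its mean level lies within rel_tol of the control mean.
    """
    corrected = {}
    for generation, ions in fruit_flies.items():
        controls = control_flies[generation]  # controls from the same generation
        kept = {}
        for ion, values in ions.items():
            # Ions never seen in the controls get a baseline of zero.
            baseline = mean(controls.get(ion, [0.0]))
            sample_mean = mean(values)
            # Assumed rule: remove ions whose level is similar to the control.
            if baseline > 0 and abs(sample_mean - baseline) <= rel_tol * baseline:
                continue
            # Subtract the control level; clipping at zero is also an assumption.
            kept[ion] = [max(v - baseline, 0.0) for v in values]
        corrected[generation] = kept
    return corrected

# Toy data for one generation (values are invented for illustration only).
fruit = {"G1": {"ion_X": [1200.0, 1300.0], "ion_Y": [500.0, 520.0], "ion_Z": [300.0]}}
control = {"G1": {"ion_X": [100.0, 100.0], "ion_Y": [490.0, 510.0]}}
corrected = control_correct(fruit, control)
```

      In this toy example, `ion_Y` is removed because its level matches the control, `ion_X` is kept after subtraction, and `ion_Z`, absent from the controls, is kept unchanged.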

      Consistent with the low impact of this “artificial medium” component on our conclusions, we also wish to point out the presence pattern of metabolites found only in flies and never in fruits when using raw data (Figure 3, yellow stack). Even in the most conservative hypothesis of 100% of these metabolites originating from our artificial medium (which is probably not the case), we observe that it constitutes only a minor proportion of metabolites common to all flies (15.7%).

      For your consideration, we include below the main Figures, using both raw data and artificial medium-controlled:

      Figure 2, left = raw data; right = artificial-media controlled:

      Figure 3, left = raw data; right = artificial-media controlled:

      Figure 3S1, left = raw data; right = artificial-media controlled:

      Figure 4, above = raw data; below = artificial-media controlled:

      We hope that we have convinced the Editor/Reviewers that raw data and artificial-medium controlled data provide one and the same answer in all our analyses. We chose to present only raw data, to simplify the Materials & Methods section.

      We have, however, modified the current version of the manuscript to inform the reader that proper controls were done and that their inclusion does not modify any of our conclusions (l.110-113 and l.583-589).

      We also wish to point out three additional comments:

      • As Reviewer #1 also recommended, we modified the expectations drawn in Fig1G to better consider the general comment of “insect derived” metabolites being fundamentally different from plant metabolites (even if we do show in our study that only approx. 9% of metabolites are private to flies).

      • We interpret this global PCA analysis with care, chiefly because it follows two other analyses (global intersection and comparison of intersections among fruits and among flies) and precedes another one (fly-focused PCA). We hope that all these analyses help the readers get a comprehensive overview of the dataset and associated results, avoiding reliance on a single analysis.

      • We also help readers to explore and visualize all analyses presented in our manuscript by setting up a shiny application (in addition to our available dataset and R code), at https://fruitfliesmetabo.shinyapps.io/shiny/. This is now mentioned in the main text (l.588-589).

      We thank the Reviewer for their comment that greatly improved the manuscript.

      The authors generated a qualitative dataset using the peak list produced by XCMS which contains quantitative peak areas, it is unclear how the threshold was selected to determine if a peak is present or absent in a given sample. The qualitative dataset would influence the output of their data analysis.

      The referee is right in pointing out that the threshold used to determine if a peak is present or absent in a given sample was not clearly specified. This has now been corrected in the “Host use” section of the Materials & Methods (l.513-516). Briefly, a given replicate of a compound was considered present if the corresponding peak area following XCMS quantification was > 1000. This threshold was chosen to be close to the practical quantification threshold of the Thermo Exactive mass spectrometer used in this study, so that low-abundance compounds could be quantified, as many plant-derived diet compounds were expected to be present in trace amounts in flies. We additionally applied a stringent rule for the presence of any given compound: it had to be present in at least 3 biological replicates.
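
      The presence rule described above can be written as a short sketch. Only the two numbers (peak area > 1000, at least 3 biological replicates) come from the response; the function name `call_presence`, the nested-dictionary layout, and the example values are assumptions for illustration and do not reproduce the authors' R code.

```python
AREA_THRESHOLD = 1000   # peak area above which a replicate counts as present (from the text)
MIN_REPLICATES = 3      # minimum number of biological replicates (from the text)

def call_presence(peak_areas, area_threshold=AREA_THRESHOLD,
                  min_replicates=MIN_REPLICATES):
    """Turn quantitative peak areas into presence/absence calls.

    peak_areas: dict ion -> dict sample group -> list of replicate peak areas
    (an assumed layout). Returns dict ion -> dict sample group -> bool.
    """
    calls = {}
    for ion, groups in peak_areas.items():
        calls[ion] = {
            # Count replicates strictly above the threshold, then apply the
            # replicate rule to decide presence for the whole group.
            group: sum(area > area_threshold for area in areas) >= min_replicates
            for group, areas in groups.items()
        }
    return calls

# Invented example values; group names are hypothetical.
example = {
    "ion_A": {"cherry_flies": [1500.0, 2300.0, 1200.0, 0.0, 900.0],
              "cherry_fruit": [5000.0, 4800.0, 5100.0]},
    "ion_B": {"cherry_flies": [1100.0, 950.0, 1000.0, 400.0, 2000.0],
              "cherry_fruit": [0.0, 0.0, 0.0]},
}
presence = call_presence(example)
```

      Note that the comparison is strict, so a replicate with a peak area of exactly 1000 does not count as present; in the example, `ion_B` has only two qualifying fly replicates and is therefore called absent.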

      The authors rely on in-source fragmentation for peak annotation when authentic standards are not available. The accuracy of the annotation thus requires further validation.

      Supplementary Table 1 was unfortunately omitted from the first submission of the manuscript. This oversight has now been corrected, and Supplementary Table 1 details all information used for metabolite annotation. In particular, comparisons of MS/MS data with mass spectral databases and with published literature have been added to substantiate metabolite identifications. These MS/MS data were produced in response to the Reviewer's comment. We also provide four more annotations from standards, reaching 30 / 71 identifications validated through chemical standards.

    1. Author Response

      Reviewer #1 (Public Review):

      Part 1: Type 2 deiodinase

      Table I is supposed to clarify and summarize the results but brings confusion. The text says that table I supports the claim that "in the cerebellum, Luc-mRNA was lower in the Ala92-Dio2 mice" whereas figure 1G does not show any difference. It is unclear whether Table I and figure 1 report the same data, and what the statistical tests are actually addressing (effect of genotype vs effect of treatment, whereas what matters here is only the interaction between genotype and treatment). Overall, it is not acceptable to present quantitative data without giving numbers, standard deviation, p-value, etc. as in Table I.

      Thank you. We agree with the reviewer. We intended to minimize the amount of data presented, which was already very large, and therefore presented only the ratios of Thr/Ala-Dio2, which created confusion. This part was removed from the new version of the MS.

      Also, evaluating T3 signaling by only looking at the luc reporter and the Hprt housekeeping gene is not always sufficient (many T3 responsive genes can be found in the literature and more than one housekeeping gene should be used as a reference).

      Thank you. The advantage of using the THAI mouse is that the Luciferase reporter gene is driven by a promoter that is only sensitive to T3, which is not the case for any other T3-responsive gene. The Hprt housekeeping signal was stable among the samples, and the differences observed were not caused by differences in the housekeeping gene expression. This part was removed from the new version of the MS.

      Another important weakness is that the wild-type mice have a proline at position 92. Why not include them? In absence of structural prediction, one wonders whether the mouse models are relevant to the human situation and whether the absence of the proline reduces the enzymatic activity when substituted for an Ala or Thr. This might have been addressed in previous work, but the authors should explain.

      The position 92 in DIO2 is occupied by Thr in humans. Its Km(T4) is indistinguishable from mouse Dio2 which has a Pro in the position 92 (4nM vs. 3.1nM) [PMID 8754756; PMID: 10655523]. Humans also carry an Ala in position 92. Comparing the two human alleles is the purpose of the study.

      Experiment 2: Ala92-Dio2 Astrocytes Have Limited Ability to Activate T4 to T3

      Here, the authors use primary cell cultures from different areas of the brain to measure the in vitro conversion of T4 to T3 by Dio2. They find that hippocampus astrocytes are less active, notably if they come from Ala92-Dio2 mice.

      This part has the following weaknesses:

      • This result correlates with the results from Fig 1F however the difference between Ala92-Dio2 and Thr92-Dio2 is significant in vitro, but not in vivo.

      From a deiodinase perspective, TH signaling in vivo depends on the presence of D2 (expressed in glial cells) and D3 (expressed in neurons), whereas in vitro it only depends on D2. In fact, D2 and D3 are known for a reciprocal regulation to preserve TH signaling [PMID: 33123655]. Thus, it is conceivable that the differences observed between the two models are explained by the intrinsic differences in the models.

      What matters is not the activity/astrocytes, but the total activity of the brain area, which depends on the number of astrocytes x individual activity. This is not measured.

      We respectfully disagree with the reviewer. The total D2 activity in a brain area depends fundamentally on the number of astrocytes in that area and on the intrinsic activity of the enzyme. The reviewer is suggesting that having an area denser in astrocytes expressing a catalytically less active D2 preserves a normal local T3 production. This is unlikely to be the case because we have no evidence that the density of astrocytes is different in Ala92-Dio2 mice. Please keep in mind that the intimate relationship between astrocytes and neurons is what defines the microenvironment that surrounds the neuron. By separating astrocytes from neurons we are able to measure T3 production that is occurring in the neuronal microenvironment and show that cells obtained from Ala92-Dio2 mice produce less T3.

      • What the authors called 'primary astrocytes' is an undefined mixed population of glial cells, (including radial glial cells, stem cells, ependymal cells, progenitor cells, etc...) that proliferated differentially for more than a week in culture, among which an unknown ratio expresses Dio2. The cellular model is thus poorly characterized, and the interpretation must be prudent.

      • Again, wild-type mice are not included.

      Thank you. We now include a reference to illustrate the types and percentages of cells present in our cultures. Given that the study is to compare the Thr92 and the Ala92 alleles, which are both present in humans, we did not believe it was necessary to include them here. Please note (as explained above) the Km(T4) for Thr92 and Pro92-Dio2 is indistinguishable.

      Part 2: Neuronal response to T3 Involves MCT8 and Retrograde TH transport

      The authors next move to primary neuronal cultures, prepared from the fetal cortex which they grow in the microfluidic chamber to study axonal transport. This is a surprising move: the focus is not on Dio2 anymore, but on the MCT8 transporter, which is known in humans to play an important role to transfer TH into the brain. It is expressed mainly in glia, but also in neurons. They study the influence of endosomes and type 3 deiodinase on the trafficking and metabolism of TH.

      Thank you.

      It would be useful to perform an experiment, in which radioactive T3 is introduced in the "wrong" side of the chamber, in an attempt to detect a possible anterograde transport. This would address the possibility that Mct8 also promotes efflux and control so that the chamber is not leaking.

      Thank you. To satisfy the reviewer, we have conducted three new experiments adding 125I-T3 in the MC-CS. The first experiment verified that T3 transport in cortical neurons also occurs anterogradely. The second experiment showed that the anterograde transport depends on mct8. The third experiment showed that D3 activity in the neuronal soma limits the amount of T3 transported along axons. We have included a new paragraph in the Results section describing these experiments (lines 154 to 167) and a new supplementary figure (Figure 3—figure supplement 3). We have also discussed these new findings (lines 383 to 386). In every experiment, we controlled for the possibility of leaking using one device without neurons that received radioactive T3. After 24 and 72 h, samples from the opposite side were obtained but did not contain any radioactive T3. We refer the reviewer to figure 1, where this is explained.

      The authors use silychristin as an inhibitor of Mct8, to demonstrate that transport is Mct8 dependent. They do not provide indications or references that would clearly indicate that this drug is a fully selective antagonist of Mct8 (but not of Oatp1c1, Mct10, Lat1, Lat2, etc., the other TH transporters). A good alternative would be to use Mct8 KO mice as controls.

      Thank you. We refer the reviewer to reference 27 [J. Johannes et al., Silychristin, a Flavonolignan Derived from the Milk Thistle, Is a Potent Inhibitor of the Thyroid Hormone Transporter MCT8. Endocrinology 157, 1694-1701 (2016)], which clearly indicates that silychristin has a remarkable specificity toward MCT8. While using mct8 KO is interesting, it would have prevented us from testing some of our hypotheses. Being able to selectively inhibit Mct8 either in the MC-CS or in the MC-AS was a clear advantage. For example, please see the experiment in which we add T3 in the MC-AS and the silychristin in the MC-CS (Fig. 3F). Here, we discovered new roles of mct8, such as its involvement in the release of T3 from the endosomes (lines 228 to 231).

      The B27 used in primary neuronal culture might contain TH. This is not easy to know, but at least some batches do.

      Thank you. While the neurons were cultured in B27, all experiments were performed in cells incubated with neurobasal only (B27 was removed 24 h earlier). This was not clear in the initial version, where there was only a vague reference in the legend of figure 3F. Now, this has been explained in the footnote of figure 3 and in line 207.

      The presence of astrocytes, probably expressing Mct8 and Dio2 is inevitable in primary neuronal cultures, and is not mentioned, but might interfere with TH metabolism.

      Thank you. We were aware that, under normal conditions, primary neuronal culture contains 25% of astrocytes. This was however minimized/eliminated by 2-day culture with the anti-mitotic cytosine arabinoside, which restricts astrocytes and microglia to <0.01 in this type of culture. This was explained in the initial version of the manuscript in the material and methods section (lines x to x) and supported with reference 53 (reference 57 in the previous version).

      Part 3: T3 Transport Triggers Localized TH Signaling in the Mouse Brain

      The authors return to in vivo experiments, implanting T3 crystals, labeled or not with radioactive iodine. They do so in the hypothalamus, where they address the retrograde transport of TH in TRH neurons, and in the cortex, looking for contralateral transport. These data are the most difficult to interpret.

      • First, T3 is hydrosoluble and would probably migrate without active transport.

      Thank you. Please note that at no point did we characterize the T3 transport as “active transport”, which by definition is an ATP-dependent process. To address the issue raised by the reviewer (“migrate without active transport”), we included controls in both experimental approaches to assess the random diffusion of T3.

      In the hypothalamic studies, we used (i) the cerebral cortex and (ii) the lateral hypothalamus, a region that is immediately adjacent to the PVN, as controls. Neither region exhibits an axonal connection to the median eminence. The results, in both cases, show that the presence of radioactive T3 in the control areas was minimal when compared to the PVN (Fig. 5C).

      In the cerebral cortical studies, we included ipsi- and contra-lateral hypothalamic measurements that served as controls given the absence of a connection between the cortex and the hypothalamus. Accordingly, T3 signaling was not detected in any of the control regions (Fig. 6C previous version; now figure 5). Thus, these controls indicate that it is unlikely that the results could be explained by “migrate without active transport” of T3.

      • The authors do not demonstrate that these specific neuronal populations contain Mct8, and that these observations are connected to the previous in vitro observation (which used cortical neurons prepared from the fetus).

      Thank you. In the previous version, we did not make it abundantly clear that the EM pictures in Fig. 3D-G (previous version; now figure 2 D-G) were from neurons in the mouse motor cortex (this information is now explained in lines 149 to 151), which is where we inserted the T3 crystals. In addition, we have done more histological work on the brain M1 (cortex) of adult mice and found that many neurons in the M1 express D3 and Mct8 (lines 433-434 and Figure 5 G-K), along with histological studies showing the specificity of the antibody against D3 (Fig. S6).

      The possibility that astrocytes are involved, as reported in the literature, is not considered.

      • Here again, using Mct8KO mice would greatly help to interpret the data. In particular, the experiments with cold T3 involve a 48h delay which is very long in comparison to the 30 minutes required for long-distance transfer of radioactive T3.

      Thank you. We are unsure about the question posed by the reviewer. We wonder how astrocytes would play a role in inter-hemispheric transport of T3. Given that astrocytes are not known to project across long distances, we have not considered this possibility. We agree that using the Mct8KO mouse could have provided supporting evidence of the role played by Mct8 in this process, but please keep in mind that the Mct8KO mouse has no, or only a very mild, brain phenotype, indicating that compensatory mechanisms arise during development that obviate the function of the transporter. This compensatory mechanism most likely involves Oatp1c1, given that only the double Mct8 and Oatp1c1 KO mouse develops a significant phenotype. This consideration directed us to the use of silychristin, the highly selective Mct8 inhibitor, which disrupts the Mct8 pathway in a mouse that developed normally.

      The two approaches used to demonstrate neuronal T3 transport in vivo are fundamentally different. The hypothalamus experiments employed radioactive T3, whereas T3 crystals were used in the cerebral cortex. The first approach studied T3 transport whereas the second studied downstream T3 effects, logically requiring more time. The solid T3 implant requires time to release T3 and activate gene expression. In the original paper that utilized T3 implants in the rodent brain, samples were processed after 4 days. (Dyess et al. 1988 Endo; PMID 3139393)

      Discussion

      Considering the diversity of questions that are addressed in the study, it is not surprising that the discussion is not covering all aspects. The authors implicitly consider that their conclusions can be extended to all neurons, while they use in their experiments a variety of different populations coming from either the fetal cortex, hippocampus, adult cortex, or hypothalamus. The claim that they discovered a mechanism applying to all neurons is not supported by the data.

      Thank you. We agree with the reviewer: the high number of neuronal subtypes might include different mechanisms in T3 transport. Our studies involved cortical (central) and dorsal root ganglia (peripheral) neurons in vitro and cortical and hypothalamic neurons in vivo. Thus we think that the described mechanism is not confined to specific neuronal subtypes. The discussion has been modified accordingly (lines 402 to 411).

      Moreover, we have done immunofluorescence studies to better characterize the neurons present in the MC-CS. We have found that all the neurons residing in the MC-CS are excitatory, expressing the vesicular glutamate transporter 1 (Vglut1), whereas no neurons expressed GAD67, a marker for inhibitory neurons (Figure 5—figure supplement 5). This is supported by the fact that, during mouse brain development, embryonic days 14.5 to 17.5 are the birth dates of layer 4 and 2/3 excitatory neurons (PMID: 34163074). These neurons are migrating and have not extended their cellular processes, making them more likely to survive the isolation protocol from the cortex. On the other hand, the neurons (mostly excitatory) already residing in the cortex may have expanded their processes and changed their morphology, making them less capable of surviving the isolation process.

      Some highly relevant literature is not cited. In particular:

      • Mct8 KO mice do not have marked brain hypothyroidism (PMID: 24691440) which at least suggests that the pathway discovered by the authors can be efficiently compensated by alternative pathways.

      We agree with the reviewer. As mentioned above, a compensatory mechanism triggered during development “compensates” for the inactivation of Mct8. That, however, does not mean that mct8 is not critically important. We have added that limitation to the discussion (line 342; ref 46).

      • Dio3 KO only increases T3 signaling in a few brain areas and only in the long term (PMID: 20719855).

      Thank you. That is now included in the ms; ref 25.

      • Anterograde transport of T3 has been reported for some brainstem neurons (PMID: 10473259).

      Thank you. This was our mistake, indeed. We had worked on several versions of the manuscript that included references to her seminal work but unfortunately deleted it from the final version. This is now included in refs 48 and 49.

      Reviewer #2 (Public Review):

      Salas-Lucia et al. investigated two main questions: whether the Thr92Ala-DIO2 mutation impairs brain responsiveness to T4 therapy under hypothyroidism induction and the mechanisms of neuronal retrograde transport of T3. They find that the Thr92Ala-DIO2 mutation reduces T4-initiated T3 signaling in the hippocampus, but not in other brain regions. Using neurons cultured in microfluidic chambers, they further describe a novel mechanism for retrograde transport of T3 that depends on MCT8 and endosomal loading (possibly protecting T3 from D3-mediated cytosolic degradation) and microtubule retrotransport. Finally, they present evidence of retrograde transport of T3 through hypothalamic projections and interhemispheric connections in vivo. The main novelty of this study is the delineation of the mechanism of T3 retrograde transport in neurons. This is interesting from the cell biology perspective. The notion of impaired hippocampal T3 signaling is relevant for the cognitive outcomes of hypothyroidism and its associated therapy.

      Thank you.

      Although the data are exciting and relevant for the community, some issues need to be addressed so that conclusions are more clearly justified by data:

      1) The title and the abstract mean that dissecting this novel mechanism of T3 retrograde transport may help improve cognition or brain responsiveness in patients taking T4 or L-T3 therapy. However, how initial results (Figs 1 and 2) connect to later data is not essentially clear. For example, do Thr92Ala-DIO2 mice present altered retrograde transport of T3? Would stimulation of retrograde transport in Thr92Ala-DIO2 mice rescue neurological phenotypes? Can the authors address this experimentally?

      Thank you. These are all interesting points raised by the reviewer. However, the three reviewers felt that a connection between the studies in astrocytes and the studies in neurons was missing, and complained about the disjoint nature of the manuscript. To satisfy the reviewers we removed from the MS the experiments with astrocytes and DIO2 polymorphism, and focused on the neuronal transport of T3.

      2) Although the authors present in vivo evidence of retrograde T3 transport in the hypothalamus and motor cortex, given the select susceptibility of the hippocampus to hypothyroidism, it would be especially interesting to test whether this mechanism also happens in a hippocampal circuit (CA3-CA1 Schaffer collaterals, mossy fibers or perforant pathway).

      Thank you. We agree that this would be interesting, but technically challenging. Nonetheless, we intend to study this in the future.

      3) Table 1 should present the raw values for Ala92-DIO2 mice and treatments instead of only displaying the direction of change and statistical significance. From Panels 1E-J, it is unclear if Thr92Ala-DIO2 mice or treatments caused any real change in brain regions other than the hippocampus.

      Thank you. These experiments were removed from the new version of the MS.

      4) The authors put forward the notion that a rapid nondegradative endosome/lysosome incorporation protects T3 from D3 degradation in the cytosol. Their experiments with pharmacological modulation of MCT8, lysosomes, and microtubules are in this direction. However, they do not represent an unequivocal demonstration of this mechanism. Therefore, the authors should be more cautious in their interpretation and discuss the limitations of their approaches.

      Thank you. The manuscript was edited to reflect these important points.

      Reviewer #3 (Public Review):

      Initially, Salas-Lucia et al examined the effect of deiodinase polymorphism on thyroid hormone-mediated transcription using a transgenic animal model and found that the hippocampus may be the region responsible for altered behavior. Then, changing the topic completely, they examined T3 transport through the axon using a compartmentalized microfluidic device. By using various techniques including electron microscopy, they identified that T3 is taken up into clathrin-dependent, endosomal/non-degradative lysosomes (NDLs), transported in the axon to reach the nucleus and activate thyroid hormone receptor-mediated transcription.

      Although both topics are interesting, it may not be appropriate to deal with two completely different topics in one paper. By deleting the topic shown in Table 1, Figure 1, and Figure 2, the scope of the manuscript can be more clear.

      Thank you. We did as suggested by the reviewer. These studies were removed from the present version of the ms.

      Their finding showing that triiodothyronine is retrogradely transported through axon without degradation by type 3 deiodinase provides a novel pathway of thyroid hormone transport to the cell nucleus and thus can contribute greatly to increasing our understanding of the mechanisms of thyroid hormone action in the brain.

      Thank you.

    1. Author Response

      Reviewer #2 (Public Review):

      In their study the authors aimed to investigate the dissemination of Enterobacterales plasmids between geographically and temporally restricted isolates recovered from different niches, such as human blood stream infections, livestock, and wastewater treatment works. By using a very strict similarity threshold (Mash distance < 0.0001) the authors identified so-called groups of near-identical plasmids in which plasmids from different genera, species, and clonal background co-clustered. Also, 8% of these groups contained plasmids from different niches (e.g., human BSI and livestock) while in 35% of these cross-niche groups plasmids carried antimicrobial resistance (AMR) genes suggesting recent transfer of AMR plasmids between these ecological niches.

      Next, the authors set out to examine the wider plasmid population structure by clustering plasmids based on 21-mer distributions capturing both coding and non-coding plasmid regions, using a data-driven threshold to build plasmid networks and the Louvain algorithm to detect the plasmid clusters. This yielded 247 clusters, of which almost half contained BSI plasmids and plasmids from at least one other niche, while 21% contained plasmids carrying AMR genes. To further assess cross-niche plasmid similarities, the authors performed an additional plasmid pangenome-like analysis. This highlighted patterns of gain and loss of accessory plasmid functions in the background of a conserved plasmid backbone.

      By comparing plasmid core gene or plasmid backbone phylogenies with chromosome core gene phylogenies, the authors assessed in more detail the dissemination of plasmids between humans and livestock. This indicated that, at least for E. coli, AMR dissemination between human and livestock-associated niches is most likely not the result of clonal spread but that plasmid movement plays an important role in cross-niche dissemination of AMR.

      Based on these data the authors conclude that in Enterobacterales plasmid spread between different ecological niches could be relatively common, and might even be occurring at greater rates than estimated, as signatures of near-identity could be transient once plasmids occupy and adapt to a different niche. After such a host jump, subsequent acquisition and loss of parts of the accessory plasmid gene content, as a result of plasmid evolution after inter-host transfer, may obscure this near-identity signature. As stated by the authors, this will raise challenges for future One Health-based genomic studies.

      Strengths

      The article is well written with a clear structure. The authors have used for their analysis a comprehensive collection of more than 1500 whole genome sequenced and fully assembled isolates, yielding a dataset of more than 3600 fully assembled plasmids across different bacterial genera, species, clonal backgrounds, and ecological niches. A strong asset of the collection, especially when analyzing dissemination of AMR contained on plasmids, is that isolates were geographically and temporally restricted. Bioinformatic analyses used to discern plasmid similarity are beyond state-of-the-art. The conclusions about dissemination of plasmids between genera, species, clonal background and across ecological niches are well supported by the data. Although conclusions about inter-host plasmid dissemination patterns may have been drawn before, this is to my knowledge the first time that patterns of dissemination of plasmids have been studied at such a high-level of detail in such a well selected dataset using so many fully assembled genomes.

      Weaknesses

      One conclusion that is not entirely supported by the data is the general statement in the discussion that "cross-niche plasmid in not driven by clonal lineages". From the tanglegram, displaying the low congruence between the plasmid and chromosome core gene phylogeny in E. coli, this conclusion is probably valid for E. coli, but this does not necessarily mean that it is also the case for the other Enterobacterales genera and species included in this study. For these other genera, the data supporting this conclusion are not given, probably because the total number of isolates for certain genera was low, or because certain niches were clearly underrepresented in certain genera.

      Thank you for reviewing our manuscript.

      We agree that this statement in the conclusion was too general, and have adapted it (lines 407-409):

      “By examining plasmid relatedness compared to bacterial host relatedness in E. coli, we demonstrated that plasmids seen across different niches are not necessarily associated with clonal lineages”

      In the limitations section of the Discussion, we have also referenced this specifically as a limitation (lines 422-424):

      “Although we evaluated four bacterial genera, 72% (1,044/1,458) of our sequenced isolates were E. coli, and so our analyses and findings are particularly focused on this species.”

      Furthermore, the BSI as well as the livestock niches were analyzed as single niches while the BSI niche included both nosocomial and community-derived BSI isolates and the Livestock niche included samples from different livestock-related hosts. Given the fact that a substantial number of plasmids were available from cattle, sheep, pigs, and poultry, it would be interesting to see whether particular livestock hosts were more frequently found in the cross-niche plasmid clusters than other livestock hosts and whether the BSI plasmids in these cross-niche clusters were predominantly of community or nosocomial origin.

      We agree that analyses which distinguish between nosocomial/community acquired BSI isolates would be interesting further work, but are beyond the scope of this study. Our analysis of the BSI/livestock cross-niche near-identical plasmid groups details the livestock hosts involved (lines 144-154). Briefly, of the n=8 BSI/livestock cross-niche groups, these involved

      • pig/poultry (1/8)

      • poultry (1/8)

      • pig (2/8)

      • sheep (3/8)

      • cattle/pig/poultry (1/8)

      We have added a note of explanation in the methods to explain how the distance threshold we use for near-identical clustering is maximally conservative at small plasmid sizes (a single SNP produces a new plasmid cluster) but remains highly conservative (tens of SNPs) at large plasmid sizes.
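      The size dependence described above can be illustrated with a rough calculation. The sketch below assumes that a Mash distance approximates the per-site substitution rate, so that the number of SNPs tolerated under the near-identity threshold (Mash distance < 0.0001, as quoted in the review) scales with plasmid length. This is a simplification of how Mash estimates distance from k-mer sketches, and the plasmid sizes and function name are hypothetical, chosen only for illustration.

```python
# Rough illustration only: treats Mash distance as SNPs per site, which is
# an assumption and ignores how k-mer sketching actually estimates distance.

THRESHOLD_INVERSE = 10_000  # threshold 0.0001 written as 1 / 10_000


def max_snps_within_threshold(plasmid_length_bp: int) -> int:
    """Largest whole number of SNPs k with k / length strictly below 0.0001.

    Integer arithmetic is used so that no floating-point rounding occurs:
    k / length < 1 / 10_000  is equivalent to  k * 10_000 < length.
    """
    if plasmid_length_bp <= 0:
        raise ValueError("plasmid length must be positive")
    return (plasmid_length_bp - 1) // THRESHOLD_INVERSE


if __name__ == "__main__":
    # Hypothetical plasmid sizes (bp), not values taken from the study.
    for length in (5_000, 50_000, 200_000):
        print(length, max_snps_within_threshold(length))
```

      Under these assumptions, a plasmid of 10 kb or smaller tolerates no SNPs at all, so a single SNP already places it in a new near-identical group, whereas a plasmid of a few hundred kb tolerates on the order of tens of SNPs.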

      We have carefully considered the point about whether particular hosts were more frequently found in cross-niche plasmid clusters. However, we do not think it is easy to infer whether a particular livestock host is represented more frequently in these cross-niche events than would be expected from chance, given the low density of the sampling.

      We have reorganised the paragraph in lines 144-154 to provide more clarity on the groups’ niches.

      “Sharing between BSI and livestock-associated isolates was supported by 8/17 cross-niche groups (n=45 plasmids). Of these, n=3/8 groups contained BSI/sheep plasmids: one group contained mobilisable Col-type plasmids, the remaining two groups contained conjugative FIB-type plasmids. Of these, one group contained plasmids carrying the AMR genes aph(3'')-Ib, aph(6)-Id, blaTEM-1, dfrA5, sul2, and the other group contained plasmids carrying the MDR efflux pump protein robA (see Materials and Methods). A further n=2/8 groups contained BSI/pig mobilisable Col-type plasmids, of which one group carried the AMR genes aph(3'')-Ib, aph(6)-Id, dfrA14, and sul2. Lastly, n=1/8 groups contained BSI/poultry non-mobilisable Col-type plasmids, n=1/8 contained BSI/pig/poultry/influent non-mobilisable Col-type plasmids, and n=1/8 contained BSI/cattle/pig/poultry/influent mobilisable Col-type plasmids.”

      We have also added this as a limitation in the discussion (lines 424-426):

      “Additionally, we did not sample livestock-associated niches densely enough to explore individual livestock types (cattle/pigs/poultry/sheep) sharing plasmids with BSI isolates (see Appendix 1 Fig. 9).”

      We have already recognised that our culture methods may have affected our sensitivity to detect Klebsiella spp. isolates in the livestock/environmental samples – we have expanded on this to explicitly highlight that this may have affected our capacity to evaluate Klebsiella-associated plasmids (lines 443-444):

      “This limited our ability to study the epidemiology of livestock Klebsiella plasmids.”

    1. Author Response

      Reviewer #1 (Public Review):

      Although the authors have identified some properties/molecular markers of canine H3N2 influenza viruses that highlight the potential for infecting humans, caution is needed when emphasizing the threat of these viruses to public health. One fact is that, despite the increasing prevalence of these viruses in dogs and the close proximity between dogs and humans, there is so far no report of human infection with canine H3N2 influenza viruses. The authors should discuss this in their manuscript so that readers can have a more comprehensive understanding of their findings and the public health importance of canine influenza viruses.

      We agree with the reviewer. We added the related discussion and revised some words to not emphasize the threat of these viruses to public health (lines 342-346).

      Reviewer #3 (Public Review):

      1) The investigators should run neuraminidase inhibition assays to establish the level of cross-reactivity of human sera to the canine-origin NA (one of the reasons proposed for the lower impact of the H3N2 pandemic was the presence of anti-N2 antibodies in the human population).

      We performed neuraminidase inhibition assays as suggested for both ferret sera against human H3N2 virus and human sera. The results showed that the NI titers of ferret sera against human H3N2 virus to canine H3N2 viruses were <10 (lines 147-148, Supplementary file 2). Additionally, 2.0%–3.0% of the children's serum samples, 1.0%–2.0% of the adults' serum samples, and 1.0%–2.0% of the elderly adults' serum samples had NI antibody titers of ≥10 to canine-origin NA (lines 158-161, Table 1, and lines 435-445).

      2) Please tone down the significance of ferret-to-ferret transmission as a predictor of human-to-human transmission. Although flu viruses that transmit among humans do show the same capacity in ferrets, the opposite is NOT always true.

      We agree with the reviewer. To tone down the significance of ferret-to-ferret transmission as a predictor of human-to-human transmission, we added the related discussion and deleted or revised some words (lines 342-346, line 37, line 302, line 308, line 322, and line 341).

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Vias and co-authors develop HGSOC PDOs and characterized their genomes, transcriptomes, drug sensitivity, and intra-tumoural heterogeneity. They show that PDOs represent the high variability in copy number genotypes observed in HGSOC patients. Drug sensitivity was reproducible compared to parental tissues and the ability of these models to grow in vivo.

      Overall, the manuscript lacks sufficient novelty. Several pieces of information and a number of conclusions that are presented here have been previously published by other groups (Nina Maenhoudt, Stem cell reports, 2020; Shuang Zhang, Cancer Discov, 2021).

      We agree that several important papers on HGSOC organoids have been published. However, we disagree with the assessment that our work “lacks sufficient novelty”. Our MS addresses critical questions about conservation of mechanisms of chromosomal instability, how PDOs can be selected as clinically relevant models based on patterns of CIN, and their comparative drug response. These questions are vital to using PDOs for therapeutic development and have not been explored before. By contrast, Maenhoudt et al. performed many analyses on several organoids (whole-genome sequencing, whole-exome sequencing) but did not analyse the relationships between copy number profiles, mutational signatures or drug sensitivity between donor tissues and derived organoids, and did not perform transcriptomic or scDNA analyses. A major novelty of our approach is to provide robust clinical validation of individual HGSOC PDOs by analysing how our PDOs are statistically representative of the various CN subclasses of HGSOC. Maenhoudt et al. and Zhang et al. classify their models only using infrequent recurrent mutations in driver genes. We do not understand how the Zhang MS overlaps with our MS, as it describes the CRISPR-engineering of mouse cells to model HGSOC and investigates drivers of the mouse tumour microenvironment.

      Reviewer #3 (Public Review):

      1) The manuscript adequately demonstrates that genomic instability is maintained in HGSOC tumourspheres. The use of 3-dimensional HGSOC models to more greatly resemble the in vivo environment has been used for more than a decade, but this is the first demonstration using a variety of genomic assessment tools to show genomic instability in the HGSOC tumoursphere model. It is clearly demonstrated that these HGSOC tumourspheres represent copy number variations similar to information in public datasets (TCGA, PAWG, BriTROC-1) and that cellular heterogeneity is present in these tumourspheres. The simple steps outlined to establish and passage tumourspheres will benefit the field to further study mechanisms of genomic instability in HGSOC.

      Thank you for these positive comments.

      2) A weakness of the manuscript is the lack of operational definitions for what constitutes an organoid and an appropriate definition to distinguish genomic instability from chromosomal instability (a distinct type of genomic instability). Line 147 states "As PDOs consist of 100% tumour cells...", although this does not appear to have been established by any assessment. This limited characterization of the 3D model is a weakness since no data is provided on whether the tumourspheres constitute only a single cell type (as indicated on line 147) or multiple cell types (e.g., HGSOC cell, mesothelial cells) using markers beyond p53 expression. Based on this information, this model cannot be called a PDO, rather it should be referred to as a tumoursphere.

      We define continuous PDO models on page 3, stating our criteria based on passage > 5 and successful reculture after thawing (previous publications have not defined whether their models are continuous or finite). As shown in our targeted-gene mutation analysis, all our PDOs contain a TP53 mutation at an allele fraction of 80–95%. Moreover, in our single-cell DNA-Seq data we do not observe any normal copy number profiles that would indicate normal cells. This information is now included in the text for clarification. Our reasons not to use the term spheroids or tumourspheres are:

      1. The word spheroid comes from the in vitro spheroid formation assay which was originally designed to overcome the difficulties found in functional in vivo serial transplantations. This method generates colony-forming units in suspension. Our patient-derived cells are not growing in suspension but within an extra-cellular matrix.

      2. Spheroids are clonally expanded from a single-cell as part of the colony-forming assay; our patient-derived organoids were not clonally expanded in any way.

      3. Organoids derived from patient tumours have been named PDOs in multiple publications where pure tumour cellularity was stated for the PDOs [Vlachogiannis et al. Science (2018) 359, 920; Li et al. Nat. Comm. (2018) 9, 2983; Lee et al. Cell (2018) 173, 515; Kopper et al. Nat Med (2019) 25, 838]. Use of other terms will cause confusion for readers and prevent important comparisons between PDOs from different researchers.

      3) Chromosome instability (CIN) is a type of genomic instability that is broadly defined as an increased rate of chromosome gains or losses and is best identified through analysis of single cells (e.g., karyotype analysis), something that bulk whole genome sequencing cannot determine since it is a reflection of cell populations and not individual cells. While the data demonstrate genomic instability is retained in the tumourspheres, and chromosome losses or copy-number amplifications were observed using single-cell whole genome sequencing, samples from the same patient over time were not evaluated. While there is evidence to support CIN in these samples, in agreement with other published work that has demonstrated CIN in >95% of HGSOC samples analyzed at the single-cell level, this work is not conclusive. The title of the manuscript should be modified to more accurately represent what the evidence supports.

      We have discussed the ambiguity of CIN in our recent publication “A pan-cancer compendium of chromosomal instability” Drews et al Nature 2022.

      “CIN has complex consequences, including loss or amplification of driver genes, focal rearrangements, extrachromosomal DNA, micronuclei formation and activation of innate immune signalling. This leads to associations with disease stage, metastasis, poor prognosis and therapeutic resistance. The causes of CIN are also diverse and include mitotic errors, replication stress, homologous recombination deficiency (HRD), telomere crisis and breakage fusion bridge cycles, among others.

      Because of the diversity of these causes and consequences, CIN is generally used as an umbrella term. Measures of CIN either divide tumours into broad categories of high or low CIN, are restricted to a single aetiology such as HRD, are limited to a particular genomic feature such as whole-chromosome-arm changes, or can only be quantified in specific cancer types. As a result, there is no systematic framework to comprehensively characterize the diversity, extent and origins of CIN pan-cancer, or to define how different types of CIN within a tumour relate to clinical phenotypes. Here we present a robust analysis framework to quantitatively measure different types of CIN across cancer types.”

      Many authors use CIN to include the consequences of CIN, and others specifically use CIN to indicate ongoing numerical and structural change. We do not think our usage of CIN in the title and text is controversial; it is consistent with previous peer-reviewed publications, including our own.

      4) An additional weakness is missing information (e.g., Figure 1d, Supplementary Figure 3b, and Supplementary Table 4 were not included in the manuscript; the 13 anticancer compounds used to test drug sensitivity are not indicated) making an assessment of the data impossible, and assessment of some conclusions difficult.

      We apologise for this misunderstanding as a typo suggested that there was a Figure 1d (it should have referred to Figure 1c) or Figure 1-Figure supplement 3B (the label of which was missing); we also apologise for the omission of Supplementary Table 4. These errors have been corrected and the list of compounds is now included in the Methods section.

    1. Author Response

      Reviewer #1 (Public Review):

      We would like to thank reviewer #1 for her helpful comments and would like to respond to these as follows:

      1) “Editing efficiencies were variable (99% to 0%) depending on the species, being worst for L. major.”

      It is true that the editing efficiency was different in each species and worst for L. major. However, it is important to note that these efficiencies varied not only between species but also between genes and especially between the chosen sgRNA sequences. Variation in efficiency across sgRNAs targeting the same gene and locus is a common problem in any CRISPR approach. We made this clearer in our revised manuscript (line 670 – 673).

      2) “The use of premature termination codons also clearly raises issues for false positives and negatives, especially as there is no evidence for nonsense-mediated mRNA decay in Leishmania.”

      We have now included in our revised manuscript that it is currently unclear whether a classical nonsense-mediated decay pathway is present in Leishmania or not. If such a pathway were present, mutant mRNAs in which a termination codon is present within the normal open reading frame would be removed (Clayton, Open Biology 2019; Delhi et al., PLoS One 2011). If not, the remaining N-terminal protein parts could be functional and may lead to false positive and false negative results. However, as reviewer #2 pointed out, this may also provide extra information about functional domains of the targeted protein and highlights that our tool can be used not only to create functional null mutants by inserting premature STOP codons but also to pursue targeted mutagenesis screens (line 674 - 683).

      3) “There are already two genome-wide screening options for Leishmania, so the advantages and disadvantages of the method proposed here need to be discussed in a much more detailed and balanced way.”

      We have revised our manuscript to include in our introduction (line 36 - 73) and discussion (line 658 - 697) a better comparison of all potential tools for genome-wide screening in Leishmania, including RNAi, bar-seq and base editing screening. We highlight why we think that base editing has unique advantages.

      4) “In the "LeishGEM" project (http://www.leishgem.org) all Leishmania mexicana genes will be knocked out and each KO will be bar-coded. At the end, 170 pooled populations of 48 bar-coded mutants will be publicly available. The only real reason the authors of the current paper give for not using this approach is that it is labour-intensive. However, LeishGEM is funded and underway, with several centres involved, so that argument is weak.”

      In our original manuscript we gave multiple reasons why we think that the LeishGEdit method, which is being used for the LeishGEM screen and was developed by the lead author of the present study, has clear disadvantages compared to base editing.

      As written in our original manuscript (line 709 – 716): “However, for a bar-seq screen, each barcoded mutant needs to be created individually by replacing target genes with drug selectable marker cassettes (20,21), making them extremely labour intensive and most likely “one-offs” on a genome-wide scale. Furthermore, aneuploidy in some Leishmania species can be a major challenge for gene replacement strategies as multiple rounds of transfection or isolation of clones may be required to target genes on multi-copy chromosomes. Using gene replacement approaches it is also not feasible to study multi-copy genes that have copies on multiple chromosomes. These are major disadvantages of bar-seq screening.”

      Therefore, we still think that the main disadvantage of bar-seq screening is that it is labour-intensive, as each mutant needs to be created individually. The fact that LeishGEM requires five years and several research centres to knock out all genes in just one Leishmania species supports this argument.

      However, to clarify our position further, we have listed other disadvantages of the LeishGEM screen, including difficulties in sharing mutant pools between labs, possible problems in expanding mutant pools without losing uniformity, no ability to change the composition of generated pools and limited ability to distinguish between technical failures and essentiality. If any of these problems were to occur, it would require a de novo generation of barcoded mutants, and therefore this is an extremely labour-intensive method for large-scale screening. We also added that bar-seq screens are not feasible in Leishmania species that display extreme cases of aneuploidy, such as L. donovani (line 59 – 73).

      Despite all these disadvantages of the LeishGEdit approach for the LeishGEM project, there are of course also clear advantages, which we also point out in our introduction (line 52 – 55).

      5) “There is also a preprint describing RNAi for functional analysis in Leishmania braziliensis.”

      Although our original manuscript already included the pre-print about RNAi screening in Leishmania braziliensis (line 706-709), we understand that this deserves a stronger discussion. We have therefore now highlighted RNAi as a possible tool for genome-wide screening in selected Leishmania species in our revised introduction (line 36 - 43). However, we also argue that RNAi approaches are at the moment only available for Leishmania of the Viannia subgenus and that RNAi activity varies greatly between species (line 36 – 43 and 665 - 669). In addition, we discuss that RNAi genome-wide screens are much less specific, as randomly sheared genomic DNA is usually used to generate RNAi libraries (line 687 - 689). Since the pre-print is now published, we have replaced the pre-print citation with the peer-reviewed one.

      Reviewer #2 (Public Review):

      We would like to thank reviewer #2 for helpful comments and would like to respond to those as follows:

      1) “Line 482 - the authors wrote 'As expected, the proportion of cells showing a motility phenotype in the IFT88 targeted L. infantum population decreased further' Why is this result expected? Presumably, this is due to the fact that cells without a functional IFT system lack flagella and grow slower so can be outcompeted by faster-growing mutants. This speaks to the major caveat highlighted by the authors in the discussion and the final small-scale screen. In a population of cells, those with deleterious mutations in an essential gene or one whose disruption results in slower growth will be outcompeted by cells in which a non-deleterious mutation has occurred, which feeds into the issue of timing.”

      As the reviewer highlighted, cells with deleterious mutations that result in slower growth will be outcompeted by cells in which a non-deleterious mutation has occurred. We have stated that the complete deletion of IFT88 in Leishmania mexicana has been shown to slow growth (Beneke et al., PLoS Pathogens 2019) and that such mutants are therefore most likely outcompeted from the pool (line 529 – 532 and 767 - 769).

      2) “The authors show with CRK3 this process of non-deleterious mutants outcompeting deleterious mutants does result in a detectable drop in the number of parasites with specific CRK3 guides but not in those with IFT88. Is this due to the fact that the outgrowth of the non-deleterious IFT88 mutants occurs rapidly or that the mutation of the targets in IFT88 was ineffective? The data presented in Figure 5 shows that for some species at least a mutation of the IFT88 gene was possible. This might mean that for certain genes the outgrowth occurs within the first 12 days after transfections so will not be seen using this approach, without a wider study, which is beyond the scope of this manuscript it will be difficult to know.”

      As we stated in our discussion, we did not test IFT88 guides individually in L. mexicana. Therefore, the editing rate observed for the IFT88 guides in L. major and L. infantum (Fig. 5) may differ from the editing rate in L. mexicana, which is the species we used for the pooled transfection screen. It is therefore difficult to conclude why IFT88 was not depleted from the pool. This may be due to lower guide activity in L. mexicana or rapid selection of non-deleterious mutations (line 769 - 774). We are therefore planning to further optimize our system by streamlining the editing efficiency and eliminating species-specific effects (line 735 - 745). As the reviewer highlighted, this is beyond the scope of this study.

      However, the reviewer raises a fair point about the exact timing of isolating DNA from pools, which might influence when exactly parasites with a deleterious mutation are depleted from the pool. This may differ between guides and may even be gene-specific. We have added this point to our discussion (lines 776 - 780).

      3) “The authors highlight that this base editing approach will leave potentially functional regions of the NT of proteins, which is true and may mean genes are missed. However, this may also provide extra information about the protein's function/domain structure if STOP codons in certain positions showed an effect on function whereas those in others don't.”

      We thank reviewer #2 for pointing out that functional parts of truncated proteins following base editing may allow additional conclusions to be drawn. We have included this in the manuscript (lines 681 - 683).

    1. Author Response

      Reviewer #1 (Public Review):

      This umbrella review aims to synthesize the results of systematic reviews of the impact of the COVID-19 pandemic on various dimensions of cancer care from prevention to treatment. This is a challenging endeavor given the diversity of outcomes that can be assessed in cancer care.

      Search and review methods are good and are in line with recommendations for umbrella reviews. Perhaps one weakness of the search strategy was that only one database (Pubmed) was searched. The search strategy appears adequate, though perhaps some more search terms related to reviews and cancer could have been included. It is therefore possible that some reviews may have been missed by the search strategy.

      It is challenging to perform a good umbrella review that yields novel insights, as it is difficult to combine results from different reviews which themselves combine results from different studies with different methodologies. However, I think perhaps one of the main weaknesses of this study is that it is not clear to me what is the core objective of the umbrella review, and how analyses relate to that core objective. In other words, I do not understand based on the introduction what new information the authors are hoping to learn from their umbrella review that could not be learned from reading the individual systematic reviews, beyond a vague objective of "synthesizing" the literature. Because of this, it is not very clear to me how the data extracted and the analysis fits into the larger objectives, and what the new knowledge generated by this review is. Based on the reported results, it would appear that one of the main goals is to assess the quality of systematic reviews and of the underlying studies in the reviews, but it is hard to tell. I think there are potentially important insights this review could tell us, but the message and implications of current evidence remain for me a little confused in the current manuscript.

      We thank the reviewer for the encouraging remarks on our work, and for the useful feedback. We have now addressed all concerns as outlined below.

      Reviewer #2 (Public Review):

      This umbrella review summarizes the results of systematic reviews about the impact of the COVID-19 pandemic on cancer care. PRISMA checklist is used for reporting. The literature search was performed in PubMed and systematic reviews published until November 29th, 2022 were included. The quality of included systematic reviews was appraised using the AMSTAR-2 tool and data were reported descriptively due to the high heterogeneity of 45 included studies. Based on the results of this paper, regardless of the low quality of included evidence, COVID-19 affected cancer care in many ways including delay and postponement of cancer screening, diagnosis, and treatment. Also, patients with cancer had been affected psychologically, socially, and financially during the COVID-19 pandemic.

      The main limitation of the current study is that the authors have searched only one database, which might have missed some relevant systematic reviews. Also, most of the included reviews in this paper had low and medium methodological quality.

      We thank the reviewer for this excellent remark. Guidelines on umbrella reviews suggest PubMed, reference screening and an additional bibliographic database as an optimal database combination for searching systematic reviews (Goossen K et al. 2020). To follow the guidelines, and considering the specialized focus on COVID-19, in addition to PubMed and reference screening, we also performed a search in the WHO COVID-19 Database. Furthermore, we revised the search strategy in PubMed to include MeSH terms. The search was performed by a specialized librarian with experience in systematic review searches. Overall, we retrieved 485 new references, and found 6 new studies that met our inclusion criteria to be included in the final analysis. We have now revised the manuscript to reflect the above changes, and also highlighted this as a strength of our work. In addition, we added the new detailed search strategy in the supplemental material.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors describe in the nematode C. elegans the effects of perturbed organization of Intermediate filaments (IFs), which form the cytoskeleton of animal cells together with actin filaments. They focus on a previously identified mutant of the kinase SMA-5, which when mutated leads to disorganized IF structure in intestinal cells of C. elegans. The authors found that the phenotypes caused by the mutated SMA-5 kinase concerning gut morphology and animal health can be reversed by removing IF network components such as the protein IFB-2. This finding is extended to other components of the IF network, which also display a certain degree of sma-5 phenotype alleviation when depleted.

      Strength:

      The finding that the intestinal phenotypes of sma-5 mutants can be suppressed by removing functional IF components is an interesting observation. It confirms a previous study showing that bbln-1 mutation-caused IF phenotypes can be suppressed by depleting IFB-2.

      Weakness:

      1) The finding that the intestinal phenotypes of sma-5 mutants can be suppressed can be considered a minor conceptual advancement. However, the study falls short of providing insight into the molecular processes by which deranged IF networks and their consequences can be rescued/suppressed by removing e.g. the IFB-2 filaments. Many statements concerning the relationship between SMA-5 and the IFs are based on assumptions. The study requires protein biochemical analysis to show whether SMA-5 phosphorylates the IF proteins - mainly the IFB-2 polypeptide. The relationship between SMA-5 / IFB-2 is a central aspect of this study but the main conclusions are based on the notion that IFB-2 and other IF proteins may be phosphorylated by SMA-5. Mutating putative phosphorylation sites of IFB-2 without having shown any proof that the modification occurs by SMA-5 is futile. This important open question needs to be addressed, and doing so will allow statements on whether the shorter IFB-2 protein encoded by the ifb-2(kc20) mutant allele lacks phosphorylation or not.

      We have addressed the major concern of the Reviewer by performing phosphorylation analyses of IFB-2 showing that loss of SMA-5 induces phosphorylation of multiple sites throughout the IFB-2 molecule. The results are presented in new Figs. 5 and S5.

      2) No quantification of the morphological defects such as using fluorescent-labeled IF proteins as in previous studies is provided in the manuscript. The EM pictures are not sufficient to provide information on how often the IF network perturbations and morphology defects occur. Also, the rescue of the actual morphological gut defects was not quantified. The assessment of development time and arrest, body length, lifespan, oxidative stress resistance, and others should be related to intestinal tube defects. They are useful and important but are an indirect measure of intestine defects and rescue.

      We provide the requested data on IF localization and intestinal morphology in new Figs. S2 and S3, respectively.

      3) It is not clear how exactly the mutant ifb-2 allele kc20 was identified. In the Materials and methods section, the authors provide information on the specific primers for the ifb-2 locus. But how did they know that the mutation lies within this region? Was there mutation mapping or whole-genome sequencing applied?

      The requested information is included in the revised Results section (first paragraph).

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, the authors use an embedding of human olfactory perceptual data within a graph neural network (which they term principal odor map, or POM). This embedding is a better predictor of a diverse set of olfactory neural and behavior data than methods that use chemical features as a starting point to create embeddings. The embedding is also seen to be better for comparison of pairwise similarities (distances of various sorts) - the claim is that proximity of pairs of odors in the POM is predictive of their similarity in neural data from olfactory receptor neurons.

      A major strength of the paper is the conceptualization of the problem. The authors have previously described a graph neural net (GNN) to predict verbal odor descriptors from molecular features (here, a 2019 preprint is cited, but a newer related one in 2022 describing the POM is not cited). They now use the embedding created by that GNN to predict similarities in large and diverse datasets in olfactory neuroscience (which the authors have curated from published work). They show that predictions from POM are better than just generic chemical features. The authors also present an interesting hypothesis that the underlying latent structure discovered by the GNN relates to metabolic pathway proximity, which they claim accounts for the success in the prediction of a wide range of data (insect sensory neuron responses to human behavior). In addition to the creativity of the project, the technical aspects are sound and thorough.

      There are some questions about the ideas, and the size of the effects observed.

      1) The authors frame the manuscript by invoking an analogy to other senses, and how natural statistics affect what's represented (and how similarity is defined). However, in vision or audition, the part of the world that different animals "look at" can be very different (different wavelengths, different textures and spatial frequencies, etc). It is still unresolved why any given animal has the particular range of reception it has. Each animal is presumably adapted for its ecological niche, which can have different salient sensory features. In audition and vision, different animals pick different sound bandwidths or EM spectra. Therefore, it is puzzling to think that all animals will somehow treat chemicals the same way.

      Our assumption (an assumption of the broader interpretation, not of the analyses themselves) that all terrestrial animals have a correlated odor environment is certainly only true for some values of “correlated”. One could imagine, for example, that some animals are able to exploit food energy sources that humans cannot (for example, plants with high cellulose content), and that they might therefore be adapted to smell metabolic signatures of such plants, whereas humans would not be so adapted. This seems quite reasonable and there are probably many such examples. In future work they might be used to test the theory directly: representations might be more likely to differ across species on tasks when the relevant ecological niches are non-overlapping. We have updated the discussion to propose such future tests. However, it is also apparent that the odor environment overall is nonetheless highly correlated across species. Recent work (Mayhew et al, PNAS) showed that nearly all molecules that pass simple mass transport requirements (that should apply to all mammals, at the least) are likely to have an odor to humans, so it seems unlikely that the “olfactory blind spots” are intrinsically large.

      2) The performance index could be made clearer, and perhaps raw numbers shown before showing the differences from the benchmark (Mordred molecular descriptor). For example, can we get a sense of how much variance in the data does it explain, what percent of the hold-out tests does it fit well, etc.?

      The performance index in Figure 1 is required to compare across different types of tasks, which are in turn dictated by the nature of the data (e.g. continuous vs categorical). Regression tasks yield an R² value and categorical tasks yield an AUROC. We normalized and placed these on a single scale in order to show all of the tasks clearly together. We have added a table to the shared code (from link in Methods section, go to predictive_performance/data/dataset_performance_index_raw.csv) that shows the original (non-normalized) values, for both the POM and the benchmark(s) across multiple seeds and various metrics with the model hyper-parameters that generate the best performance.
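
      As an illustration of how metrics with different chance levels can share one scale, the sketch below rescales each score so that chance maps to 0 and a perfect score maps to 1. This is only one plausible normalization and is not taken from the manuscript; the function name `chance_corrected` and the example scores are hypothetical.

```python
def chance_corrected(score: float, chance: float, best: float = 1.0) -> float:
    """Rescale a metric so that chance maps to 0 and a perfect score maps to 1.

    Assumption: this is an illustrative normalization, not necessarily the
    one used to build the performance index in Figure 1.
    """
    return (score - chance) / (best - chance)


# Hypothetical raw scores, for illustration only (not values from the study).
r2_index = chance_corrected(0.50, chance=0.0)     # regression task, R^2 (chance = 0)
auroc_index = chance_corrected(0.75, chance=0.5)  # categorical task, AUROC (chance = 0.5)

print(r2_index, auroc_index)
```

      Under this assumed scheme, an R² of 0.50 and an AUROC of 0.75 land on the same point of the common scale, since each sits halfway between chance and perfect performance.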

      3) The "fitting" and predictions are in line with how ML is used for classification and regression in lots of applications. The end result is a better fit (prediction), but it's not actually clear whether there are any fundamental regularities or orders identified. The metabolic angle is very intriguing, but it looks like Mordred descriptor does a very good job as well (extended figure 5 [now Figure 2-figure supplement 5]). Is it possible to show the relation between metabolic distance and Mordred distance in Figure 2c? In fact, even there, cFP distance looks very well correlated with metabolic distance (we are talking about r= 0.9 vs r = 0.8). This could simply be due to a slightly nonlinear mapping between chemical similarity and perceptual similarity (which was used to get POM distance).

      We show additional “showdown” comparisons between metabolic distance, POM distance, and alternative distance metrics in the new Figure 2-figure supplement 3 and Figure 2-figure supplement 4. Indeed, the Mordred descriptors perform well; after all, metabolic reactants and products must be at least somewhat structurally related. But POM (derived only from human perceptual data) outperforms it significantly. Visual inspection of Figure 2c also reveals that the dispersion of structural distances (at a given metabolic distance) is just much higher than the dispersion of POM distances. This won’t change if one uses a non-linear curve fit, as it is a property of the data itself.

      It’s also worth noting that while r=0.8 and r=0.9 might seem close, in terms of variance unexplained (1 - r²) they are approximately two-fold different. Reducing the unexplained variance by half seems like a meaningful difference. Alternatively, if one simulates scatter plots with correlation r=0.8 vs r=0.9, it is apparent that the latter is simply a much tighter relationship.
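
      The arithmetic behind this comparison can be checked directly. The short sketch below computes the unexplained variance, 1 - r², for both correlations, along with the residual standard deviation of a standardized linear fit, sqrt(1 - r²), which governs how tight the scatter looks. The function names are illustrative only.

```python
import math


def unexplained_variance(r: float) -> float:
    """Fraction of variance left unexplained by a linear fit with correlation r."""
    return 1.0 - r ** 2


def residual_sd(r: float) -> float:
    """Residual standard deviation when both variables are standardized."""
    return math.sqrt(unexplained_variance(r))


u_low = unexplained_variance(0.8)   # about 0.36
u_high = unexplained_variance(0.9)  # about 0.19
ratio = u_low / u_high              # about 1.9, i.e. close to two-fold

print(f"unexplained at r=0.8: {u_low:.2f}")
print(f"unexplained at r=0.9: {u_high:.2f}")
print(f"ratio: {ratio:.2f}")
print(f"residual SD: {residual_sd(0.8):.2f} vs {residual_sd(0.9):.2f}")
```

      The ratio of roughly 1.9 is what "approximately two-fold" refers to, and the drop in residual spread from about 0.60 to about 0.44 standard deviations is why the r=0.9 scatter appears visibly tighter.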

      4) How frequent are such examples shown in Fig 2d? Pentenal and pentenol are actually very similar in many ways, and it may be that Tanimoto distance is not a great descriptor of chemical similarity. cFP edit distance is quite small, just like metabolic distance. The thiol example on the right is much better. Also, even in Fig 2C POM vs metabolic distance, the lowest metabolic distances have large variations in the POM values - so there too, metabolic reactions that create very different molecules in 1 step can vary widely in POM distance as well.

      We agree that Tanimoto distance is not perfect. We were unable to find a measure of structural distance that agreed with human intuitions about “structural distance” in all cases; indeed that intuition is often generated by an understanding of odor/flavor characteristics of function in metabolic networks, which would beg the question! To answer the question about the frequency of examples like the ones shown in Figure 2d, we created a new density map (Figure 2-figure supplement 4) showing the number of one-step metabolite pairs for a given range of POM vs cFP edit/Tanimoto distance. We found >25 pairs of metabolites in the same “small POM distance” and “large structural distance” quadrant from which we found the original examples shown in Figure 2d.

      5) A major worry is that Mordred descriptors are doing fine, and POM offers only a small improvement (but statistically significant of course). Another way to ask this question is this: if you plot pairwise correlation/distance of pairs of odors from POM against that for Mordred, how correlated does this look? My suspicion is that it will be highly correlated.

      It will look highly correlated (as shown in the new Figure 2-figure supplement 3). The reason is that metabolic reactions cannot make arbitrary transformations to molecules (the reactants must have some structural relationship to the products) or similarly that olfactory receptors (in any species) cannot have arbitrary tuning – at the end of the day receptors mostly bind to similar-looking classes of molecules. As stated above, we believe that the improvement here is not just statistically significant but meaningful – a 2-fold drop in unexplained variance is large – and that it is important to identify principles by which the nervous system can be tuned, above and beyond the physical constraints imposed by basic rules of chemistry.

      Also, the metabolic distances that we constructed from available data are themselves noisy, since not all metabolic pathways and the compounds that compose them are known, which places an upper bound on the correlation that we could have obtained. Despite that, we still found a correlation of r>0.9.
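
      The upper bound mentioned here can be made concrete with the classical attenuation relation: if two measures have reliabilities rel_x and rel_y, the observed correlation equals the true correlation multiplied by sqrt(rel_x * rel_y). The sketch below applies that relation; the reliability value of 0.9 is a hypothetical number chosen for illustration, not an estimate from the study, and the function names are likewise illustrative.

```python
import math


def correlation_ceiling(reliability_x: float, reliability_y: float = 1.0) -> float:
    """Largest observable correlation between two noisy measures
    (classical attenuation relation, reached when the true correlation is 1)."""
    return math.sqrt(reliability_x * reliability_y)


def attenuated_r(true_r: float, reliability_x: float, reliability_y: float = 1.0) -> float:
    """Observed correlation after measurement noise attenuates the true one."""
    return true_r * correlation_ceiling(reliability_x, reliability_y)


# Hypothetical: suppose the metabolic distances had reliability 0.9.
ceiling = correlation_ceiling(0.9)
print(f"ceiling on observed r: {ceiling:.3f}")
print(f"observed r if true r were 0.95: {attenuated_r(0.95, 0.9):.3f}")
```

      Under these assumed numbers, even a perfect predictor could not exceed an observed correlation of about 0.95, so an observed r above 0.9 would already sit close to the ceiling.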

      6) The co-occurrence in mixtures and close POM distance may arise from the way the embedding was done - with perceptual descriptors used as a key variable. Humans may just classify molecules that occur in a mixture as similar just from experiencing them together. Can the authors show that these same molecules in Fig 4d,e have very similar representations in neural data from insects or mice?

      We have added a new Figure 4-figure supplement 1 to show this. One constraint is that the neural datasets must contain molecules that are also in the natural substance datasets used in Figure 4. In all cases where the data is sufficient to be powered to test the hypothesis (i.e. more than five co-occurring pairs of molecules in essential oil), we observe an effect in the predicted direction.

    1. Author Response

      Reviewer #1 (Public Review):

      This work focuses on the characterization of neutralizing antibodies from human survivors of SNV and ANDV hantavirus infections, including the mapping of epitopes located in the Gn and/or Gc glycoproteins, and their mechanism of viral interference blocking receptor binding or membrane fusion. It also confirms previous data on broadly neutralizing epitopes allowing inhibition of different hantavirus species. The work covers for the first time in vivo evidence of cross-protection against HTNV infection by a broadly neutralizing antibody prepared from SNV infection using a prophylaxis animal model and compares the data with protection from ANDV lethal challenge using ANDV-specific neutralizing antibodies. The work provides valuable information for the development of therapeutic measures that cross-protect against several hantavirus species, which seems a promising strategy to raise pharmaceutical interest against a group of viruses causing an orphan disease.

      The strength of the work is based on the impressive amount of work and versatility of methods to identify residues involved in the binding and/or escape from seven different neutralizing antibody clones that allow for important conclusions on species-specific antigenic regions and confirm data on a region that seems broadly conserved among different hantavirus species. At the same time, the weakness of the work is that the data processing does not allow readers to analyze the data themselves (Figs. 1b, 2a, 2c, Ext. Data Fig. 4).

      The authors clearly achieve their aim of characterizing the antigenic sites of neutralizing antibodies. Yet, the presented data on binding to ANDV mutant constructs and negative-staining EM do not allow for the conclusion that the epitope of the broadly neutralizing antibodies ANDV-44 and SNV-53 involves the Gn capping loop. An alternative explanation is that the escape mutations in the Gn capping loop act through an allosteric effect on the Gc fusion loop region; a role of the capping loop in structuring the Gc fusion loop has been previously demonstrated (References 7 and 9). In addition, it is not clear why SNV-24 has no broad neutralizing activity although escape mutations occurred at the highly conserved residues K833 and D822 in Gc domain I.

      . . . it would be important to show viral RNA levels in lungs and kidneys in the lethal ANDV animal model (Fig. 7) to allow for comparison with the prophylaxis from HTNV infection (Fig. 6).

      ANDV does not necessarily cause significant viremia but this challenge model does allow detection of substantial virus load in organs. To monitor virus in organs, a separate animal study would be required with serial euthanasia. All treated animals survived and were kept until day 28. The previous study (DOI: 10.1016/j.celrep.2021.109086) demonstrated that virus was not detected in animals that survived until day 28. Here, we would have to perform another ABSL3 animal experiment with euthanasia and harvest organs at the expected peak for viral replication to confirm this finding. We do not believe repeating such a study is justified at this point, since the key endpoint for the experiment here is survival, and the study provided clear results. Increasing the number of animals in study in order to euthanize a subset in order to collect organs on a specific day makes more sense in a drug discovery effort where a candidate drug is not expected to protect the animals but might have some impact on the virologic endpoint only (e.g., reduce viremia in blood or organs). Thus, we do not believe repeated studies are justified to obtain this additional confirmatory data point.

    1. Author Response

      Reviewer #1 (Public Review):

      Collins et al use mesoscopic two-photon imaging to simultaneously record activity from basal forebrain cholinergic or noradrenergic axons in several distant regions of the dorsal cortex during spontaneous behavior in head-fixed awake mice. They find that activity in axons from both neuromodulatory systems is closely correlated with measures of behavioral state, such as whisking, locomotion and face movements. While axons were globally correlated with these behavioral state-related metrics across the dorsal cortex, they also find evidence of behavioral state independent heterogenous signals.

      The use of simultaneous multiarea optical recordings across a large extent of dorsal cortex with single axon resolution for studying the coherence of neuromodulatory afferents across cortical areas is novel and addresses important questions regarding neuromodulation in the neocortex. The manuscript is clearly written, the data is well presented and, for the most part, carefully analyzed. Parts of the manuscript confirm previous results on the influence of behavioral state on norepinephrine and acetylcholine cortical afferents. However, the observation that these modulations are globally broadcasted to the dorsal cortex while behavioral state independent heterogenous signals are also present in these axons is novel and important for the field.

      While the evidence for a behavioral state driven global modulation of activity in both neuromodulatory systems is quite clear, I have concerns that the apparent heterogeneity in axonal responses might be driven by movement-induced artifacts. Moreover, even in the case that the heterogeneity in calcium activity across axons is confirmed, it might not be driven by differences in spiking activity across neuromodulatory axons as concluded, but by other mechanisms that are not explicitly discussed or considered.

      1) Motion artifacts are always a concern when imaging from small structures in behaving animals. This issue is addressed in the manuscript in Fig 2A-C by comparing axonal responses to "autofluorescent blebs that did not have calcium-dependent activity" (line 1011). Still, as calcium-dependent activity and motion artifacts can both be locked to behavioral variables, the "bleb" selection criterion seems biased and flawed by circular logic. "Blebs" presenting motion-induced changes in fluorescence that may pass as neural activity will be wrongly excluded from the "bleb" control group using this criterion. This will result in an underestimation of the extent of the contamination of the GCaMP signals by movement-induced artifacts. This potential confound might generate apparent heterogeneity across axons and regions, as some axons and some cortical areas might be more prone to movement artifacts than others.

      Thank you for the suggestion. We agree that motion artifacts are a reasonable concern. We rigorously addressed this concern by introducing non-calcium-dependent mCherry into cholinergic cortical axons and demonstrating that motion cannot explain our results (see Fig. 2F, Fig. 4H,L,P, Fig. 4 - figure supplement 1G, Video 3, and response above). These axons were chosen for analysis based solely on their ability to be imaged, in a manner identical to that of GCaMP6s-containing axons.

      We agree that the observed evidence of heterogeneity is not as clear as the evidence of a common signal. We now carefully present our evidence. Heterogeneity may arise from variations in activity between single axons that are not explained by a common signal such as behavioral state. Heterogeneity could also be signaled by variations in correlated activity between axons. We now address these two possibilities in our manuscript. Our new analysis reveals that the correlated activity between axons is as expected for axons that are variably correlated to a common signal, such as behavioral state. Although we do find some evidence of correlation outside this common signal, we are not able to discern if this is related to imaging axon segments that are part of the same axon, or if it truly represents an independent signal. This is now stated in the text. On the other hand, strong variations in axonal activity from trial to trial that appear to be separate from the common signal are also prevalent. We now point out this variation as a possible source of heterogeneity. Since we do not know the source or meaning of this heterogeneous activity, we discuss only the possibility that it may hold behaviorally relevant information in these modulatory systems.
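
      One way to formalize what is "expected for axons that are variably correlated to a common signal" is a single-factor model: if two axons correlate with the common signal at r_i and r_j and share nothing else, their expected pairwise correlation is the product r_i * r_j, and the partial correlation given the common signal is zero. The sketch below is a generic illustration of that reasoning and is not the analysis used in the manuscript; all names and numbers are hypothetical.

```python
import math


def expected_pairwise_r(r_i: float, r_j: float) -> float:
    """Correlation between two axons that share only a common signal."""
    return r_i * r_j


def partial_r(r_ij: float, r_i: float, r_j: float) -> float:
    """Correlation between two axons after removing the common signal
    (standard first-order partial correlation)."""
    return (r_ij - r_i * r_j) / math.sqrt((1 - r_i ** 2) * (1 - r_j ** 2))


# Hypothetical correlations of two axons with a behavioral-state signal.
r_i, r_j = 0.8, 0.6
baseline = expected_pairwise_r(r_i, r_j)

print(f"expected pairwise r from common signal alone: {baseline:.2f}")
print(f"partial r if observed r_ij equals baseline: {partial_r(baseline, r_i, r_j):.2f}")
print(f"partial r if observed r_ij is 0.70: {partial_r(0.70, r_i, r_j):.2f}")
```

      In this framing, a pairwise correlation at the baseline is fully accounted for by the common signal, whereas a clearly positive partial correlation points to shared activity beyond it, which could reflect either an independent signal or two segments of the same axon.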

      2) In the case that the heterogeneity is indeed due to differences in calcium activity, it might be not due to modularity in spiking activity within the LC or the BF as interpreted and discussed in the manuscript. As calcium signaling in axons not only relates to spiking activity but can also reflect presynaptic modulations, the observed heterogeneity might be due to local action of presynaptic modulators in a context of global identical broadcasted activity. The current dataset does not allow distinguishing which of the two different mechanisms underlies the observed signal heterogeneity.

      It is true that our data set is unable to determine whether presynaptic modulations contribute to any observed heterogeneity. We have adjusted our interpretation of heterogeneity throughout the manuscript and have specifically addressed this comment in the discussion by presenting the possibility that a global signal could be locally modulated.

      Reviewer #3 (Public Review):

      Acetylcholine and Norepinephrine are two of the most powerful neuromodulators in the CNS. Recent developments of new methods allow monitoring of the dynamic changes in the activity of these agents in the brain in vivo. Here the authors explore the relationship between the dynamic changes in behavioral states and those of ACh and NE in the cortex. Since neuromodulatory systems cover most of the cortical tissue, it is essential to be able to monitor the activity of these systems in many cortical areas simultaneously. This is a daunting task because the axons releasing NE and ACh are very thin. To my knowledge, this study is the first to use mesoscopic imaging over a wide range of the cortex at the single axon resolution in awake animals. They find that almost any observable change in behavioral state is accompanied by a transient change in the activity of cortical ACh and NE axonal segments. Whisking is significantly correlated with ACh and NE. The authors also explore the spatial pattern of activity of ACh and NE axons over the dorsal cortex and find that most of the dynamics are synchronous over a wide spatial scale. They look for deviation from this pattern (which I will discuss later). Lastly, the authors monitor the activity of cortical interneurons capable of releasing ACh.

      Comments:

      1) On a broad overview, I find the discussion of behavioral states, brain states, and neuromodulation states quite confusing. To begin with, I am not convinced by the statement that "brain states or behavioral states change on a moment-to-moment basis." I find that the division of brain activity into microstates (e.g., microarousal) is counterproductive. After all, at the extreme, going along this path, we might eventually have an extremely high dimensional space of all neuronal activity, and any change in any neuron would define a new brain state. Similarly, mice can walk without whisking, can whisk without walking, can walk and whisk; are all these different behavioral states? And if so, are they all associated with different brain states? Most importantly, in the context of this manuscript, one would expect that different states (brain, behavior) would be associated with at least four potential states of the ACh x NE system (high ACh and high NE, high ACh and low NE, etc.). However, the reported findings indicate that the two systems are highly synchronized (or at least correlated), and both transiently go on with any change from a passive state to an active state. Therefore, the manuscript describes a rather confined relationship of the neuromodulation systems with the rather rich potential of brain and behavioral states. Of course, this is only my viewpoint, and the authors are not obliged to accept it, but they should recognize that the viewpoint they take for granted is not shared by all and consider acknowledging it in the manuscript.

      We thank this reviewer for this thoughtful comment. While it is clear that animals do in fact exhibit distinct and clear brain and behavioral states (e.g. sleep, waking, grooming, still, walking, etc.), it is beyond the scope of the present manuscript to attempt to tackle this complex field - rather, we refer the reader to a recent review that we have published on this important topic (McCormick, Nestvogel, and He 2020). We agree that properly delineating brain and behavioral states is of great importance, as it could significantly impact experimental design and interpretation of results. Since all of the relevant substates that a mouse may exhibit have not yet been determined, we decided to use changes in whisking and walking behaviors to differentiate between distinct behavioral states owing to: 1) historical use of these measures in behavioral and neural states in head-fixed mice, 2) relative ease of measurement of these variables, 3) a clearly observable relationship with cholinergic and noradrenergic activity with these measures of behavior, and, arguably most importantly, 4) assumed relevance to the animal (Musall et al. 2019; Reimer et al. 2016; Salkoff et al. 2020; Stringer et al. 2019).

      Our manuscript seeks to simply relate the activity of cholinergic and noradrenergic axons across the dorsal surface of the cortex in comparison to these commonly used measures of spontaneous behavior in head-fixed mice to discern to what relative degree there are common, global signals in these two modulatory systems and how they relate to changes in the measured behaviors. Somewhat surprisingly, previous studies have found that neural activity throughout the dorsal cortex of mice is strongly related to movements of the face and body as well as behavioral arousal (Stringer et al. 2019; Musall et al. 2019; Salkoff et al. 2020). Here we determine to what degree these commonly used measures of “state” are already reflected in the GCaMP6s activity of cholinergic and noradrenergic axons (and local cortical interneurons).

      We agree with the interpretation that our results suggest a confined relationship between spontaneous cholinergic and noradrenergic activity in the cortex within the spontaneous behaviors that we observe. We, by no means, mean to suggest that this confined relationship is the only relationship cholinergic and noradrenergic systems exhibit to each other or to behavior. It seems very likely that in the wide variety of behavior exhibited by freely moving mice in their lifetime, there are times in which the activity of cholinergic and noradrenergic systems exhibit a radically different relationship to each other and to behavior. We simply cannot know this without experimental examination. We now mention this possibility in the discussion and give a few appropriate references.

      2) Most of the manuscript (bar one case) reports nearly identical dynamics of ACh and NE. Is that a principle? What makes these systems behave so similarly? Why have two systems that act nearly the same? Still, if there is a difference, it is the time scale of the ACh compared to the NE. Can the authors explain this difference or speculate what drives it?

      Perhaps one of the most striking findings in recent years from examination of mouse brain activity is the prominence and prevalence of a general signal in nearly all neural systems that relates to movement and arousal of the animal (Stringer et al. 2019; Salkoff et al. 2020). Here we report that this signal is also strongly present within the cholinergic and noradrenergic systems. Perhaps this is unsurprising, since everywhere one looks, one finds this global signal. However, we feel that understanding the presence and nature of this large signal is critical to deciphering behavior-related signals in these systems in the future. We discuss this point in the discussion. The one difference we did find is in the more transient nature of NE axonal activity versus both behavior and cholinergic axon activity. We now speculate on this difference in the discussion.

      3) Whisker activity explains the neuromodulator dynamics most strongly, but pupil dilation barely does (in contrast to many previous reports, including reports by the same authors). If I am not mistaken, this was nearly ignored in the presentation of the results and the discussion section. Could the authors elaborate on the reason for this discrepancy?

      We apologize for the misleading presentation of our results. In Fig. 3C and D it is clear that pupil diameter is highly coherent with both cholinergic and noradrenergic axon activity, as published previously. In the present study, this coherence peaks at 0.4 to 0.5 for both. In our previous study (Reimer et al. 2016), the cholinergic activity also peaked in coherence at low frequencies at around 0.4 to 0.5 (Reimer et al., Fig. 1H) while the noradrenergic activity coherence peaked at 0.6 to 0.7. The present study was not optimized for pupil diameter examination, since we kept the light levels as low as possible (resulting in low dynamic range of pupil dilations since they were nearly always enlarged to near maximum) in order to increase the S/N of cortical axon activity. We now mention these similarities and differences and caveats in the manuscript. An additional important point is that the kinetics of pupil diameter changes are slow in comparison to whisker movements, reducing the ability of pupil dilation to accurately track changes in axonal activity at frequencies greater than approximately 0.2 Hz (Fig. 2 - figure supplement 2). This is now mentioned in the text.

      4) I find the question of homogeneous vs. heterogeneous signaling of both the ACh and NE systems quite important. It is one thing if the two systems just broadcast "one bit" information to the whole brain or if there are neuromodulation signals that are confined in space and are uncorrelated with the global signal. However, the way the analysis of this question is presented in the manuscript is very difficult to follow, and eventually, the take-home message is unclear. The discussion section indicates that the results support that beyond a global synchronized signal, there is a significant amount of heterogeneous activity. I think this question could benefit from further analysis. I suggest trying to demonstrate more specific examples of axonal ROIs where their activity is decorrelated with the global signal, test how consistent this property is (for those ROIs), and find a behavioral parameter that it predicts.

      Also, in the discussion part, I am missing a discussion of the potential mechanism that allows this heterogeneity. On the one hand, an area may receive NE/ACh innervation from different BF/LC neurons, which are not completely synchronized. But those neurons also innervate other areas, so what is the expected eventual pattern? Also, do the results support neuromodulation control by local interneuron circuits targeting the axons (as is the case with dopaminergic axons in the Basal Ganglia)?

      Our results clearly demonstrate a robust global signal that is common across cholinergic and noradrenergic axons and is related to behavioral state. We have weaker, but still present, evidence for a heterogeneous signal in addition to this global signal. This evidence is based largely upon the large variation in activities in different axon segments during behavioral events that appear similar. This result suggests that the axon segments we monitored do not all act as if they are members of the same axon. We now discuss the strong evidence for the global signal present in our data, and leave open the possibility of a heterogeneous signal whose mechanisms and importance remain to be determined.

      5) The axonal signal seems to be very similar across the cortex. I am not sure this is technically possible, but given that NE axons are thin and non-myelinated and taking advantage of the mesoscopic scale, could the author find any clue for the propagation of the signal on the rostral to caudal axis?

      We were unable to detect propagation across the cortical sheet and believe this is beyond the scope of the present study.

      6) While the section about local VCIN is consistent with the story, it is somewhat of a sidetrack and ends the manuscript on the wrong note. I leave it to the authors to decide but recommend that they reconsider if and where to include it. Unfortunately, the attached figure was of very poor resolution, and I could not look into the details, so I am afraid that I could not review this section properly.

      We believe this adds to the manuscript and therefore have decided to include this data.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the authors aim to identify the cell state dynamics and molecular mechanisms underlying melanocyte regeneration in zebrafish. By analyzing thousands of single-cell transcriptomes over regeneration in both wild-type and Kit mutant animals, they provide thorough and convincing evidence of (1) two paths to melanocyte regeneration and (2) that Kit signaling, via the RAS/MAPK pathway, is a key regulator of this process. Finally, the authors suggest that another proliferative subpopulation of cells, expressing markers of a separate pigment cell type, constitutes an additional population of progenitors with the ability to contribute to melanocytes. The data supporting this claim are not as convincing, and the authors failed to show that these cells did indeed differentiate into melanocytes. Despite the challenges of describing this third cell state, this study offers compelling new findings on the mechanisms of melanocyte regeneration and provides paths forward to understanding why some animals lack this capacity.

      The majority of the main conclusions are well supported by the data, but one claim, in particular, should be revisited by the authors.

      (1) Provided evidence that the aox5(hi)mitfa(lo) population of cells contributes to melanocyte regeneration is inconclusive and somewhat circumstantial. First, the transcriptional profiles of these cells are much more consistent with the xanthophore lineage. Indeed, xanthophores have been shown to express mitfa (in embryos in Parichy, et al. 2003 (PMID: 10862741), and in post-embryonic cells in Saunders, et al. 2019). Second, while the authors address this possibility in Supplemental figure 7 by showing that interstripe xanthophores fail to divide following melanocyte ablation, they fail to account for the stripe-resident xanthophores/xanthoblasts. The presence and dynamics of aox5+ stripe-resident xanthophores/xanthoblasts are detailed in McMenamin, et al., 2014 (PMID: 25170046) and Eom, et al., 2015 (PMID: 26701906). Without direct evidence that the symmetrically-dividing, aox5+ cells measured in this study do indeed differentiate into melanocytes, it is more likely that these cells are a dividing population of xanthophores/xanthoblasts. The authors should revise their claims accordingly.

      We agree with the editor and reviewers that the identities of the mitfa+aox5hi cells and the interplay between these cells and the mitfa+aox5lo cells is a fascinating, and originally unexpected, aspect of this manuscript. The issue, as we see it, is whether mitfa+aox5hi cells that arise via cell division during regeneration are multipotent pigment cell progenitors or ‘cryptic’ xanthophores. The experiments we have performed to address this ambiguity have not worked for technical reasons, so we have tempered text in the relevant Results and Discussion sections to leave both options open. We have backed off from calling these cells progenitors but have included additional data showing that they (i.e. the mitfa+aox5hi subpopulation of cells that we believe are daughters of mitfa+aox5hi cycling cells) express multiple markers associated with multipotent pigment cell progenitors that have been characterized in developing zebrafish. Our expanded Discussion is as follows:

      “Heterogeneity may also be evident by the additional mitfa+aox5hi G2/M adj subpopulation that likely arises via cell divisions during regeneration. There are reasons to think that this could be a progenitor subpopulation. Firstly, these cells arose in response to specific ablation of melanocytes. Secondly, this subpopulation expresses markers that are associated with multipotent pigment progenitor cells found during development (Budi, et al., 2011; Saunders, et al., 2019). Thirdly, although this subpopulation expresses aox5 and some other markers associated with xanthophores, we showed that differentiated xanthophores are not ablated by the melanocyte-ablating drug neocuproine and this mitfa+aox5hi subpopulation does not make new pigmented xanthophores following neocuproine treatment. However, current observations cannot definitively determine the potency and fates adopted by these cells. One possibility is that these cells are indeed progenitors that arise through cell divisions, are in an as yet undefined way lineally related to MP-0 and MP-1 subpopulations, and ultimately give rise to new melanocytes during additional rounds of regeneration. Given their expression of markers associated with multipotent pigment cell progenitors, these cells could be multipotent but fated toward the melanocyte lineage following melanocyte-specific ablation. However, we cannot exclude the possibility that these cells are another cell type. For example, there is a type of partially differentiated xanthophores that populate adult melanocyte stripes (McMenamin, et al., 2014). At least some of these cells arise from embryonic xanthophores that transitioned through a cryptic and proliferative state (McMenamin, et al., 2014). That the descendants remain partially differentiated could indicate that they are in more of a xanthoblast state and maintain proliferative capacity (Eom, et al., 2015).
      It is possible that some or all of the cells in question are melanocyte stripe-resident, partially-differentiated xanthophores that arise: a) from cell divisions that are triggered by loss of interactions with melanocytes or, b) simply to fill space that is vacated due to melanocyte death. Such causes for partially-differentiated xanthophore divisions have not been documented, but nonetheless this possibility must be considered given the mitfa and aox5 expression and proliferative potential of these cells. Transcriptional profiles of ‘cryptic’ xanthophores are not available to help clarify the nature of these cells. Lastly, the relationship between adult progenitor populations – MP-0, MP-1 and, potentially, mitfa+aox5hi G2/M adj – and other progenitors present at earlier developmental stages is unclear and could be defined through additional long-term lineage tracing studies. In particular, previous examinations of pigment cell progenitors in developing zebrafish have identified dorsal root ganglion-associated pigment cell progenitors in larvae that contribute to adult pigmentation patterns (Singh, et al., 2016; Dooley, et al., 2013; Budi, et al., 2011). It is possible that these cells give rise to the adult progenitors we have identified. The further alignment of cell types that have been observed in vivo and cell subpopulations defined through expression profiling is a necessary route for understanding the complex relationship between stem and progenitor cells in development, homeostasis, and regeneration.”

      (1) At line 140, it is noted that xanthophores are pteridine-producing, but they also get their yellow color from carotenoids (especially in adults). This should be noted as well, especially since the authors display the xanthophore marker, scarb1, which plays a key role in xanthophore carotenoid coloration.

      [Mapping expression levels onto UMAP space for scarb1 and perhaps other markers of xan, irid, or proliferation would be helpful as a supplement to the dot plot in Fig 1 and could help to clarify the transcriptomic signature of mitfa+ aox5-hi cells and plausibility of the model that they are an McSC population. -Parichy]

      We thank the reviewer for the suggestion. We have changed the text to note that xanthophores also produce carotenoids, as follows:

      “aox5 is expressed in differentiated xanthophores, a pteridine- and carotenoid-producing pigment cell type of zebrafish, and in some undifferentiated pigment progenitor cells”

      We have also added a new Figure Supplement to Figure 1 (Figure 1 – figure supplement 3) with feature plots demonstrating the expression of xanthophore markers scarb1 and bco2b, iridophore markers lypc and cdh11, and proliferation markers pcna and mki67. As noted above, there is some heterogeneity within the large grouping of mitfa+aox5hi cells. Whereas some markers associated with xanthophores are broadly expressed in this grouping (e.g. scarb1), others have more restricted expression (e.g. bco2b). The heterogeneity could reflect multiple differentiation states of xanthophores, multiple types of differentiated xanthophores, xanthophore progenitors and/or less fate-restricted pigment cell progenitors that cluster in this grouping.

      (2) The authors should provide the list of genes that comprise their cluster signatures (line 252) as part of the supplementary tables.

      We have now included a table of genes in the cluster signatures. The Supplementary Table is called “Supplementary File 2.”

      (3) The authors should more clearly describe how they performed lineage tracing (line 339). Additionally, for the corresponding figure 4E, the authors should list the number of cells traced. The source data only contains calculated percentages rather than counts for each type of differentiation. My understanding is that the number listed in the figure legend is the number of fish (i.e. n = 4), but this should be clarified as well.

      [A supplementary figure of labeled cells is important here with enough context to show that cells can be re-identified unambiguously. Additionally note that "lineage tracing" will typically be assumed to mean single-cell labeling and tracking, so if that is not the case for these experiments it would be preferable to use an alternative descriptor. -Parichy]

      We have included additional detail in our revised manuscript. In Figure 4E we now include the number of cells imaged and have included a breakdown of the raw numbers in the Source Data. We have also included Supplementary Animations as examples of the single-cell tracing that we perform through serial imaging.

      Additionally, the point about using ‘lineage tracing’ is well taken. We have replaced this with ‘serial imaging’ throughout the text.

      (4) At line 321, the authors list the mean regeneration percentages for the kita and kitlga(lf) mutants, but these values are not significantly different according to Figure 4B. By listing the means (which should be noted as such), the authors seem to be highlighting the differences but then do not comment on them. The description and integration of this result into the main text should be clarified.

      We have changed the wording in the text to clarify that the mean percentage is being listed. We have also reworded the text to de-emphasize the mean percentage difference between kita(lf) and kitlga(lf) mutants, instead highlighting that their defects are similar. In the figure legend we have clarified that the mean percentage regeneration is being shown.

      (5) In Figure 6E, the RNA-velocity result is not particularly consistent with the authors' claims. Visually, the arrows seem fairly randomly directed. The data in 6B, showing gene expression associated with the S phase and G2/M phase much more clearly convey the directionality of the loop (S phase, followed by G2/M). I suggest that the authors weaken their claim about the RNA-velocity result or remove it altogether and focus on the cell cycle-related gene expression signatures.

      We thank the reviewer for their careful eye here. We have decided to remove the RNA-velocity result previously displayed in Figure 6E. As the reviewer points out, the results are more clearly demonstrated by Figure 6B.

    1. Author Response

      Reviewer #1 (Public Review):

      This study addresses the role of the general transcription factor TBP (TATA-binding protein), a subunit of the TFIID complex, in RNA polymerase II transcription. While TBP has been described as a key component of protein complexes involved in transcription by all three RNA polymerases, several previous studies on TBP loss of function and on the function of its TRF2 and TRF3 paralogues have questioned its essential role in RNA polymerase II transcription. This new study uses auxin induced TBP degradation in mouse ES cells to provide strong evidence that its loss does not affect ongoing polymerase II transcription or heat-shock and retinoic acid-induced transcription activation, but severely inhibits polymerase III transcription. The authors coupled TBP degradation with TRF2 knock out to show that it does not account for the residual TBP-independent transcription. Rather the study provides evidence that TFIID can assemble and is recruited to promoters in the absence of TBP.

      Altogether, the study provides compelling evidence for TBP-independent polymerase II transcription, but a better characterization of the residual TFIID complex and recruitment of other general transcription factors to promoters would strengthen the conclusions.

      We thank the reviewer for their accurate summary of our findings and the public assessment of our manuscript.

      Reviewer #2 (Public Review):

      The paper is intriguing, but to me, a main weakness is that the imaging experiments are done with overexpressed protein. Another is that the different results for the different subunits of TFIID would indicate that there are multiple forms of TFIID in the nucleus, which no one has observed/proposed before. Otherwise, the experimental data would have to be interpreted in a more nuanced way. Additionally, there is no real model of how a TBP-depleted TFIID would recruit Pol II. Do the authors suggest that when TBP is present, it is not playing a role in Pol II transcription, despite being at all promoters? Or that in its absence an alternative mechanism takes over? In the latter case, are they proposing that it is just based on the rest of TFIID? How? The authors do not provide a mechanistic explanation of what is actually happening and how Pol II is being recruited to promoters.

      We thank the reviewer for their public review of our manuscript. The reviewer poses many interesting questions raised by our findings, which would be a great focus for future work.

      We agree that our imaging experiments using over-expressed constructs have limitations. Though they provide insight that is unique and orthogonal to the genomics analyses, we agree that they are still preliminary, and therefore we have removed them from the manuscript, with the hope of further developing these experiments into a follow-up manuscript.

      While we cannot exclude different forms of TFIID in the cell, previous studies have identified different TAF-containing complexes. Indeed, we referenced several of these studies in our manuscript, including TFTC/SAGA. Furthermore, in our Discussion section, we speculated how a large multi-subunit complex like TFIID may not behave as a monolith but rather have distinct dynamics/behavior among the subunits. Some studies are now revealing that biochemically defined complexes behave more as a hub, with subunits having distinct dynamics coming in and out of the complex, but in a way such that a snapshot at any given time would show a stably formed complex.

      What TBP does for Pol II is an intriguing question, and one that we had thought we could answer with our rapid depletion system. One possibility is that Pol II initiation has evolved to have so many redundant mechanisms such that removal of one factor (TBP) would not disrupt the whole system. And yet, TBP remains a highly essential gene (perhaps mostly for its essential role in Pol III transcription), and therefore, its binding to Pol II gene promoters has been maintained, almost in a vestigial way. Of course, this is speculative, and our rapid depletion system only shows us that TBP is not required for Pol II transcription, not what it does when it binds to promoters.

      Lastly, we believe that our study tested 3 potential mechanisms that could explain TBP-independence for Pol II transcription. 1) We tested the possibility that TBP is only needed for induction and not for subsequent re-initiation. We provide evidence using two orthogonal induction systems that this is not the case. 2) We tested whether the TRF2 paralog could functionally replace TBP, and show that this is also not the case. 3) We show that TFIID can form in the absence of TBP. While we agree that there are more mechanisms to test, addressing all of them would require a re-examination of over 50 years of research that would not be feasible to report in a single manuscript, especially for a system as complex as Pol II initiation.

      Reviewer #3 (Public Review):

      In this study, the authors set out to study the requirement of the TATA binding protein (TBP) in transcription initiation in mESCs. To this end they used an auxin inducible degradation (AID) system. They report that by using the AID-TBP system after auxin degradation, 10-20% of TBP protein is remaining in mESCs. The authors claim that, as the observed 80-90% decrease in TBP levels is not accompanied by global changes in RNA polymerase II (Pol II) chromatin occupancy or nascent mRNA levels, TBP is not required for Pol II transcription. In contrast, they find that under similar TBP-depletion conditions tRNA transcription and Pol III chromatin occupancy were impaired. The authors also asked whether the mouse TBP paralogue, TBPL1 (also called TRF2) could functionally replace TBP, but they find that it does not. From these and additional experiments the authors conclude that redundant mechanisms may exist in which TBP-independent TFIID-like complexes may function in Pol II transcription.

      The major strengths of this manuscript are the numerous genome-wide investigations, such as many different CUT&Tag experiments, and NET-seq experiments under control and +auxin conditions and their analyses. Weaknesses lie in some experimental setups (i.e. overexpression of Halo-tagged TAFs), mainly in the overinterpretation (or misinterpretation) of the data and in the lack of a fair discussion of the obtained data in comparison to observations described in the literature. As a result, very often the interpretation of data does not fully support the conclusions. Nevertheless, the findings that 80-90% decrease in cellular TBP levels do not have a major effect on Pol II transcription are interesting, but the manuscript needs some tuning down of many of the authors' very strong conclusions, correcting several weaker points and with a more careful and eventually more interesting Discussion.

      We thank the reviewer for their public review of our manuscript. We would like to add that, in addition to testing the TBP paralog for redundancy, we also tested a mechanism in which TBP would be required for the initial round of transcription but not for subsequent ones. We show with data from orthogonal experiments that this is not the case. As in our response to Reviewer 2, we agree that our over-expression imaging experiments are still somewhat preliminary, and therefore we have removed these experiments and potential over/misinterpretation of these results from the manuscript.

    1. Author Response

      Reviewer #3 (Public Review):

      This manuscript by Pendse et al. aimed to identify the role of the complement component C1q in intestinal homeostasis, expecting to find a role in mucosal immunity. Instead, however, they discovered an unexpected role for C1qa in regulating gut motility. First, using RNA-Seq and qPCR of cell populations isolated either by mechanical separation or flow cytometry, the authors found that the genes encoding the subunits of C1q are expressed predominantly in a sub-epithelial population of cells in the gut that are CD11b+MHCII+F4/80high, presumably macrophages. They support this conclusion by analyzing mice in which intestinal macrophages are depleted with anti-CSF1R antibody treatment and show substantial loss of C1qa, b and c transcripts. Then, they generate Lyz2Cre-C1qaflx/flx mice to genetically deplete C1qa in macrophages and assess the consequences on the fecal microbiome, transcript levels of cytokines, macromolecular permeability of the epithelial barrier, and immune cell populations, finding no major effects. Furthermore, provoking intestinal injury with chemical colitis or infection (Citrobacter) did not reveal macrophage C1qa-dependent changes in body weight or pathogen burden.

      Then, they analyzed C1q expression by IHC of cross-sections of small and large intestine and find that C1q immunoreactivity is detectable adjacent to, but not colocalizing with, TUBB3+ nerve fibers and CD169+ cells in the submucosa. Interestingly, they find little C1q immunoreactivity in the muscularis externa. Nevertheless, they perform RNA-sequencing of LMMP preparations (longitudinal muscle with adherent myenteric plexus) and find a number of changes in gene ontology pathways associated with neuronal function. Finally, they perform GI motility testing on the conditional knockout mice and find that they have accelerated GI transit times manifesting with subtle changes in small intestinal transit and more profound changes in measures of colonic motility.

      Overall, the manuscript is very well-written and the observation that macrophages are the major source of C1q in the intestine is well supported by the data, derived from multiple approaches. The observations on C1q localization in tissue and the strength of the conclusions that can be drawn from their conditional genetic model of C1qa depletion, however, would benefit from more rigorous validation.

      1) Interpretation of the majority of the findings in the paper rest on the specificity of the Lyz2 Cre for macrophages. While the specificity of this Cre to macrophages and some dendritic cells has been characterized in the literature in circulating immune cells, it is not clear if this has been characterized at the tissue level in the gut. Evidence demonstrating the selectivity of Cre activity in the gut would strengthen the conclusions that can be drawn.

      As indicated by the reviewer, Cre expression driven by the Lyz2 promoter is restricted to macrophages and some myeloid cells in the circulation (Clausen et al., 1999). To better understand intestinal Lyz2 expression at a cellular level, we analyzed Lyz2 transcripts from a published single cell RNAseq analysis of intestinal cells (Xu et al., 2019; see Figure below). These data show that intestinal Lyz2 is also predominantly expressed in gut macrophages with limited expression in dendritic cells and neutrophils.

      Figure. Lyz2 expression from single cell RNAseq analysis of mouse intestinal cells. Data are from Xu et al., Immunity 51, 696-708 (2019). Analysis was done through the Single Cell Portal, a repository of scRNAseq data at the Broad Institute.

      Additionally, our study shows that intestinal C1q expression is restricted to macrophages (CD11b+MHCII+F4/80hi) and is absent from other gut myeloid cell lineages (Figure 1E-H). This conclusion is supported by our finding that macrophage depletion via anti-CSF1R treatment also depletes most intestinal C1q (Figure 2A-C). Importantly, we found that the C1qaΔMφ mice retain C1q expression in the central nervous system (Figure 2 – figure supplement 1). Thus, the C1qaΔMφ mice allow us to assess the function of macrophage C1q in the gut and uncouple the functions of macrophage C1q from those of C1q in the central nervous system.

      2) Infectious and inflammatory colitis models were used to suggest that C1qa depletion in Lyz2+ lineage cells does not alter gut mucosal inflammation or immune response. However, the phenotyping of the mice in these models was somewhat cursory. For example, in DSS only body weight was shown without other typical and informative read-outs including colon length, histological changes, and disease activity scoring. Similarly, in Citrobacter only fecal cfu were measured. Especially if GI motility is accelerated in the KO mice, pathogen burden may not reflect efficiency of immune-mediated clearance alone.

      We have added additional results which support our conclusion that C1qaΔMφ mice do not show a heightened sensitivity to acute chemically induced colitis. In Figure 3 – figure supplement 1 we now show a histological analysis of the small intestines of DSS-treated C1qafl/fl and C1qaΔMφ mice. This analysis shows that C1qaΔMφ mice have similar histopathology, colon lengths, and histopathology scores following DSS treatment. Likewise, our revised manuscript includes histological images of the colons of Citrobacter rodentium-infected C1qafl/fl and C1qaΔMφ mice showing similar pathology (Figure 3 – figure supplement 2).

      3) The evidence for C1q expression being restricted to nerve-associated macrophages in the submucosal plexus was insufficient. Localization was shown at low magnification on merged single-planar images taken from cross-sections. The data shown in Figure 4C is not of sufficient resolution to support the claims made - C1q immunoreactivity, for example, is very difficult to even see. Furthermore, nerve fibers closely approximate virtually every type of macrophage in the gut, from those in the lamina propria to those in the muscularis…. Finally, the resolution is too low to rule out C1q immunoreactivity in the muscularis externa.

      Similar points were raised by Reviewer 2. Our original manuscript claimed that C1q-expressing macrophages were mostly located near enteric neurons in the submucosal plexus but were largely absent from the myenteric plexus. However, as both Reviewers have pointed out, this conclusion was based solely on our immunofluorescence analysis of tissue cross-sections.

      To address this concern, we further characterized C1q+ macrophage localization by performing a flow cytometry analysis on macrophages isolated from the mucosa (encompassing both the lamina propria and submucosa) and the muscularis, finding similar levels of C1q expression in macrophages from both tissues (Figure 4 – figure supplement 1 in the revised manuscript). Although the mucosal macrophage fraction encompasses both lamina propria and submucosal macrophages, our immunofluorescence analysis (Figure 4 B and C) suggests that the mucosal C1q-expressing macrophages are mostly from the submucosal plexus. This observation is consistent with the immunofluorescence studies of CD169+ macrophages shown in Asano et al., which suggest that most CD169+ macrophages are located in or near the submucosal region, with fewer near the villus tips (Fig. 1e, Nat. Commun. 6, 7802).

      Most importantly, our flow cytometry analysis indicates that the muscularis/myenteric plexus harbors C1q-expressing macrophages. To further characterize C1q expression in the muscularis, we performed RNAscope analysis by confocal microscopy of the myenteric plexus from mouse small intestine and colon (Figure 4D). The results show numerous C1q-expressing macrophages positioned close to myenteric plexus neurons, thus supporting the flow cytometry analysis. We note that although the majority of C1q immunofluorescence in our tissue cross-sections was observed in the submucosal plexus, we did observe some C1q expression in the muscularis by immunofluorescence (Figure 4B and C). We have rewritten the Results section to take these new findings into account.

      Is the 5 µm average on the proximity analysis any different for other macrophage populations to support the idea of a special relationship between C1q-expressing macrophages and neurons?

      We agree that the proximity analysis lacks context and have therefore removed it from the figure. The other data in the figure better support the idea that C1q+ macrophages are found predominantly in the submucosal and myenteric plexuses and that they are closely associated with neurons at these tissue sites.

      There are many vessels in the submucosa and many associated perivascular nerve fibers - could the proximity simply reflect that both cell types are near vessels containing C1q in circulation?

      Our revised manuscript includes RNAscope analysis showing C1q transcript expression by macrophages that are closely associated with enteric neurons (Figure 4D). These findings support the idea that the C1q close to enteric neurons is derived from macrophages rather than from the circulation.

      4) A major disconnect was between the observation that C1q expression is in the submucosa and the performance of RNA-seq studies on LMMP preparations. This makes it challenging to draw conclusions from the RNA-Seq data, and makes it particularly important to clarify the specificity of Lyz2-Cre activity.

      Our revised manuscript provides flow cytometry data (Figure 4 – figure supplement 1) and RNAscope analysis (Figure 4D) showing that C1q is expressed in macrophages localized to the myenteric plexus. This accords with the results of our RNAseq analysis, which indicates altered LMMP neuronal function in C1qa∆Mφ mice (Figure 6A and B). Since neurons in the myenteric plexus are known to govern gut motility, it also helps to explain our finding that gut motility is accelerated in C1qa∆Mφ mice.

      Finally, the pathways identified could reflect a loss of neurons or nerve fibers. No assessment of ENS health in terms of neuronal number or nerve fiber density is provided in either plexus.

      Reviewers 1 and 2 also raised this point. Our revised manuscript includes a comparison of the numbers of enteric neurons in C1qafl/fl and C1qaΔMφ mice. There were no marked differences in neuron numbers in C1qaΔMφ mice when compared to C1qafl/fl controls (Figure 5A and B). There were also similar numbers of inhibitory (nitrergic) and excitatory (cholinergic) neuronal subsets and a similar enteric glial network (Figure 5C-E). Thus, our data suggest that the altered gut motility in the C1qaΔMφ mice arises from altered neuronal function rather than from an overt loss of neurons or nerve fibers. This conclusion is further supported by increased neurogenic activity of peristalsis (Figure 6H and I), and the expression of the C1q receptor BAI1 on enteric neurons (Figure 6 – figure supplement 4).

      5) To my knowledge, there is limited evidence that the submucosal plexus has an effect on GI motility. A recent publication suggests that even when mice lack 90% of their submucosal neurons, they are well-appearing without overt deficits (PMID: 29666241). Submucosal neurons, however, are well known to be involved in the secretomotor reflex and fluid flux across the epithelium. Assessment of these ENS functions in the knockout mice would be important and valuable.

      Our revised manuscript provides new data showing C1q expression by muscularis macrophages in the myenteric plexus. We analyzed muscularis macrophages by flow cytometry and found that they express C1q (Figure 4 – figure supplement 1). These findings are further supported by RNAscope analysis of C1q expression in wholemounts of LMMP from small intestine and colon (Figure 4D and E). These results are thus consistent with the increased CMMC activity and accelerated gut motility in the C1qaΔMφ mice. As suggested by the reviewer, our finding of C1q+ macrophages in the submucosal plexus indicates that C1q may also have a role in controlling the function of submucosal plexus neurons. We are further exploring this idea through extensive additional experimentation. Given the expanded scope of these studies, we are planning to include them in a follow-up manuscript.

      6) Immune function and GI motility can be highly sex-dependent - in all experiments mice of both sexes were reportedly used but it is not clear if sex effects were assessed.

      This is a great point, and as the reviewer suggested, we did encounter differences between male and female mice in our preliminary assays of gut motility. We therefore conducted our quantitative comparisons of gut motility between C1qafl/fl and C1qaΔMφ mice in male mice and now clearly indicate this point in the Materials and Methods.

    1. Author Response

      Reviewer #3 (Public Review):

      Dominant pathogenic variants of the Aac2/Ant1 ATP transporter cause disease by an unknown mechanism. In this manuscript the authors aim to reveal how these gain-of-function mutants impair cellular and mitochondrial health. To characterize the phenotype of Aac2 mutants in yeast, the authors use a series of single and double Aac2 mutations, within the 2nd and 3rd transmembrane domains, that are associated with human diseases. The Aac2A128P,A137D mutant, which caused high toxicity and damaged the mitochondrial DNA, was selected for further analysis. This mutant was not imported efficiently into mitochondria and exhibited an increased association with TOM, suggesting that it clogs the TOM translocase. As a result, expression of Aac2A128P,A137D led to impaired import of other mitochondrial proteins. Several findings suggested that the single mutant Aac2A128P impaired mitochondrial import in a similar manner: (1) mass spec analysis revealed its increased association with cytosolic chaperones, TOM and TIM22 subunits; and (2) Aac2A128P overexpression led to global mitochondrial protein import deficiency, demonstrated by HSP60 precursor accumulation and activation of stress responses (transcription of chaperones, proteasome induction, and CIS1). Parallel mutants of human Ant1 (Ant1A114P and Ant1A114P,A123D) were ectopically expressed in HeLa cells. The mutants were demonstrated to clog TOM and cause a global defect in mitochondrial protein import. This was confirmed in tissues from Ant1A114P,A123D/+ knock-in mice. The Ant1A114P,A123D/+ mice exhibited decreased maximal mitochondrial respiration in muscles. Examination of the skeletal muscle myofiber diameter and COX and SDH activity revealed that Ant1A114P,A123D expression in heterozygous mice acts dominantly and causes a myopathic phenotype and in some cases neurodegeneration.

      Major strengths -

      The ability of proteins to clog TOM and sequentially disrupt protein import into mitochondria was demonstrated in recent years. However, until now this was achieved using chemicals, artificial cloggers and overexpression of mitochondrial proteins. This study reveals, for the first time, that disease-associated variants of native mitochondrial proteins can clog the entry into the organelle. Thus, this work demonstrates that TOM clogging is a physiologically relevant phenomenon that is involved in human diseases.

      The manuscript is well-written and the experiments are well-designed, presenting convincing data that mostly support the conclusions. The methods used are well-established and suitable techniques that are often used in the field. This work took advantage of three different biological systems/model organisms (yeast, cell culture, and mouse tissues) to validate the results, show conservation, and exploit the strengths of each system.

      Overall, this study is impactful, greatly contributes to the field and should be of interest to the general scientific community. The work sheds light on the mechanisms by which Ant1 pathogenic mutants impact cellular health and provides evidence for the involvement of translocase clogging and impaired protein import in human diseases. The gain-of-function Aac2/Ant1 mutants will provide a new and powerful tool for future studies of mitochondrial quality control and repair mechanisms.

      Major weaknesses -

      1) The evidence for clogging of mitochondrial translocases and for a general defect in protein import is solid. However, there is not enough evidence to conclude that all phenotypes seen in mice and yeast are directly connected to clogging.

      We completely agree with the reviewer that it is unreasonable to ascribe all phenotypes seen in mice and yeast directly to clogging. We are very open to the possibility that other unknown mechanisms contribute as well. The language in the manuscript has been modified to reflect this.

      2) This work implies that Aac2/Ant1 variants can clog TOM, TIM22, or both. Clogging of TIM22 is novel and interesting but is not fully discussed in the manuscript, as well as the possibility that clogging of different translocases can result in different defects.

      We thank the reviewer for this comment, and have directly addressed this in the revised manuscript. We added some speculation but overall, we prefer to keep this brief because the precise mechanism of carrier protein import and IMM insertion by the TIM22 complex remains unresolved, making an extensive discussion on its clogging premature.

    1. Author Responses

      Reviewer #1 (Public Review):

      This work aimed to investigate how BMI decoding performance is impacted by changing the conditions under which a motor task is performed. The authors recorded motor cortical activity using multielectrode arrays in two monkeys executing a finger flexion and extension task in four conditions: normal (no load, neutral wrist position), loaded (manipulandum attached to springs or rubber bands to resist flexion), wrist (no load, flexed wrist position) or both (loaded and flexed wrist). They found, as expected, that BMI decoders trained and tested on data sets collected during the same conditions performed better at predicting kinematics and muscle activity than others trained and tested across conditions. They also report that the performance of monkeys in a BMI task involving the online control of a virtual hand was almost unaffected by either changing the actual manipulandum conditions as above or switching between decoders trained from data collected under different conditions. As for the neuronal activity, they found a mix of changes across task contexts. Interestingly, a principal component analysis revealed that activity in each context falls within well-aligned manifolds, and that the context-dependent variance in neuronal activity strongly correlated with the amplitude of muscle activity.

      Strengths

      The current study expands on previous findings about BMI decoder generalizability and contributes scientifically in at least three important ways.

      First, their results are obtained from monkeys performing a fine finger control task with up to two degrees of freedom. This provides a powerful setting to investigate fine motor control of the hand in primates. The authors use the accuracy of BMI decoders between data sets as a measure of stationarity in the neurons-to-fingers mapping, which provides a reliable assessment. They show that changes in wrist angle or finger load affect the relationship between cortical neurons and otherwise identical movements. Interestingly, this result holds up for both kinematics and muscle activity predictions, albeit being stronger for the latter.

      Second, their results confirming that neuronal activity recorded during different task conditions lies effectively within a common manifold is interesting. It supports prior observations, but in the specific context of finger movements.

      Third, the dPCA results provide interesting and perhaps unexpected information about the fact that the amplitude of muscle activity (or force) is clearly present in the motor cortical activity. This is possibly one of the most interesting findings because extracting a component from neural activity that can be related robustly to muscle activity across contexts would provide great benefits to the development of BMIs for functional electrical stimulation.

      Overall, the analyses are well designed and the interpretation of the results is sound.

      Weaknesses

      I found the discussion about the possible reasons why offline decoders are more sensitive to context than online decoders very interesting. Nonetheless, as the authors recognize, the possibility that the BMI itself causes a change in context, "in the plant", limits their interpretation. It could mean that the monkeys switch from one suboptimal decoder to another, causing a ceiling effect that occludes generalization errors.

      Overall, several new and original results were obtained through these experiments and analyses. Nonetheless, I found it difficult to extract a clear, unique and strong take-home message. The study falls short of proposing a new way to improve BMI generalizability or precisely identifying factors that influence decoder generalizability.

      We thank the reviewer for the positive comments. Relating these results to BMI design and interpreting the adaptation to contexts during online trials comprised a bulk of the essential revisions from the eLife editorial staff. More details can be found in common response #2 and essential revisions #1-3. To summarize, we added an analysis of neural activity during online trials to provide insight into how the monkeys were adapting. We have expanded the discussion of online adaptation, as detailed in essential revision #2. We also expanded discussion of how both the online and offline results might affect BMI design, as detailed in essential revision #3.

      Reviewer #2 (Public Review):

      The authors motivate this study by the medical need to develop brain-machine interfaces (BMIs) to restore lost arm and hand function, for example through functional electrical stimulation. More specifically, they are interested in developing BMI decoding algorithms that work across a variety of "contexts" that a BMI user would encounter out in the real world, for example having their hand in different postures and manipulating a variety of objects. They note that in different contexts, the motor cortex neural activity patterns that produce the desired muscle outputs may change (including neurons' specific relationship to different muscles' activations), which could render a static decoder trained in a different context inaccurate.

      To test whether this potential challenge is indeed the case, this study tested BMI control of virtual (onscreen) fingers by two rhesus macaques trained to perform 1 or 2 degree-of-freedom non-grasping tasks either by moving their fingers, or just controlling the virtual finger kinematics with neural activity. The key experimental manipulations were context shifts in the form of springs on the fingers or flexion of the wrist (or both). BMI performance was then evaluated when these context changes were present, which builds on this group's previous demonstration of accurate finger BMI without any context shifts.

      The study convincingly shows the aforementioned context shifts do cause large changes in measured firing rates. When neural decoding accuracy (for both muscle and position/velocity) is evaluated across these context changes, reconstruction accuracy is substantially impaired. The headline finding, however, is that despite this, BMI performance is, on aggregate, not substantially reduced. It is noteworthy, however, that in a second experiment paradigm where the decoder was trained on the spring or wrist-manipulated context and tested in a normal context, there were quite large performance reductions in several datasets as quantified by multiple performance measures; this asymmetry in the results is not really explored much further. The changes in neural activity due to context shifts appear to be relatively modest in magnitude and can be fit well as simple linear shifts (in the neural state space), and the authors posit that this would make it feasible (in future work) to find context-invariant neural readouts that would result in more robust muscle activity decoders.

      An additional novel contribution of this study is showing that these motor cortical signals support quite accurate decoding of muscle activations during non-prehensile finger movements (and also that the EMG decoding was more negatively affected by context shifts than kinematics decoding); previous work decoded finger kinematics but not these kinetics. Note that this was demonstrated with just one of the two monkeys (the second did not have muscle recordings).

      This is a rigorous study, its main results are well-supported, and it does not make major claims beyond what the data support.

      One of its limitations is that while the eventual motivating goal is to show that decoders are robust across a variety of tasks of daily living, only two specific types of context shifts are tested here, and they are relatively simple and potentially do not result in as strong a neural change as could be encountered in real-world context shifts. This is by no means a major flaw (simplifying experimental preparations are a standard and prudent way to make progress). But the study could point out a bit more prominently that its results do not preclude more challenging context shifts being encountered by BMI users, and that the study in its current form does not indicate how strong a perturbation the tested context shifts are relative to the full possible range of hand movement context shifts that would be encountered during human daily living activities.

      A second limitation is that while the discrepancy between large offline decoding performance reduction and small online performance reduction are attributed to rapid sensorimotor adaptation, this process is not directly examined in any detail.

      Third, the assessment of how neural dynamics change in a way that preserves the overall shape of the dynamics is qualitative rather than quantitative, and the implementation of a more context-agnostic finger BMI is left for future work.

      We thank the reviewer for the positive comments. We agree that the paper could discuss how this work impacts a wider range of movements and we now include more discussion to that point as detailed in the responses to feedback below. We also acknowledge that the paper did not directly examine online adaptation and we have now included an analysis aimed at answering how the monkeys adapted to the context changes during online tasks.

      Reviewer #3 (Public Review):

      In this manuscript the authors ask whether finger movements in non-human primates can be predicted from neural activity recorded from the primary motor cortex. This question is driven by an ultimate goal of using neural decoding to create brain-computer interfaces that can restore upper limb function using prosthetics or functional electrical stimulation systems. More specifically, since functional use of the hand (real or prosthetic) will ultimately require generating very different grasp forces for different objects, these experiments use a constant set of finger kinematics, but introduce different force requirements for the finger muscles using several different techniques. Under these different conditions (contexts), the study examines how population neural activity changed and uses decoder analyses to look at how these different contexts affect offline predictions of muscle forces and finger kinematics, as well as the animals' ability to use different decoders to control 1 or 2-DOF online. In general, the study found that when linear models were trained on one context from offline data, they did not generalize well to the other context. However, when performance was tested online (monkeys controlling a virtual hand in real time using neural activity related to movement of their own hands) with a ReFIT Kalman filter, the animals were able to complete the task effectively, even with a decoder trained without the springs or wrist perturbation. The authors show data to support the idea that neural activity was constrained to the same manifold in the different contexts, which enabled the animals to rapidly change their behavior to achieve the task goals, compared to the more complex requirement of having to learn entirely new patterns of neural activity. This work takes studies that have been conducted for upper-limb movements and extends them to include hand grasp, which is important for creating decoders for brain-computer interfaces. 
      Finally, the authors show that dPCA can extract features during changes in context that may be related to the activity of specific muscles and that would allow for improved decoders.
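
      The offline generalization result summarized above (a linear model trained in one context predicting poorly in another) can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: the data are synthetic, the decoder is a plain ridge regression, and every name and parameter (`simulate`, `fit_ridge`, the 1.5x gain, the baseline shift) is an assumption chosen only for illustration.

```python
import numpy as np


def simulate(rng, tuning, baseline, n_samples, noise_sd=0.5):
    """Synthetic firing rates (samples x channels) driven by a 1-D finger velocity."""
    velocity = rng.normal(size=n_samples)
    noise = noise_sd * rng.normal(size=(n_samples, tuning.size))
    rates = np.outer(velocity, tuning) + baseline + noise
    return rates, velocity


def fit_ridge(X, y, alpha=1.0):
    """Ridge regression with an unpenalized intercept; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def mse(X, y, w, b):
    """Mean squared error of the linear prediction X @ w + b."""
    return float(np.mean((X @ w + b - y) ** 2))


rng = np.random.default_rng(1)
n_channels = 20
tuning = rng.normal(size=n_channels)       # hypothetical channel tuning to velocity
baseline = np.zeros(n_channels)

# Hypothetical "spring" context: stronger modulation plus a baseline shift.
tuning_spring = 1.5 * tuning
baseline_spring = baseline + 0.5 * rng.normal(size=n_channels)

X_train, y_train = simulate(rng, tuning, baseline, 2000)
X_within, y_within = simulate(rng, tuning, baseline, 500)
X_cross, y_cross = simulate(rng, tuning_spring, baseline_spring, 500)

w, b = fit_ridge(X_train, y_train)
mse_within = mse(X_within, y_within, w, b)  # same context as training
mse_cross = mse(X_cross, y_cross, w, b)     # decoder applied to the new context

print(f"within-context MSE: {mse_within:.3f}, cross-context MSE: {mse_cross:.3f}")
```

      In this toy setting the cross-context error is larger simply because the mapping from neural activity to velocity has changed between contexts; it says nothing about how large the effect is in the real recordings.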

      Strengths

      The issue of hand control, and how it compares to arm control, is an important question to tackle in sensorimotor control and in the development of brain-computer interfaces. Interestingly, the experiments use two very different ways of changing the muscle force requirements for achieving the same finger movements: springs attached to a manipulandum and changes in wrist posture. Using both paradigms, the decoder analysis clearly shows that linear models trained without any manipulation do not predict muscle forces or finger kinematics well, clearly illustrating the limitations of common linear decoders to generalize to scenarios that might encompass real grasping activities that require forceful interactions. Using a well-described real-time decoder (ReFIT Kalman Filter), the authors show that this performance decrease observed offline is easily overcome in online testing. The metrics used to make these claims are well-described, and the likely explanations for these findings are described well. A particular strength of this manuscript is that, at least for these relatively simple movements and contexts, a component of neural activity (identified using dPCA) is identified that is significantly modulated by the task context in a way that sensibly represents the changes in muscle activity that would be required to complete the task in the new contexts.

      We thank the reviewer for the positive comments.

      Weaknesses

      The differences between exemplar data sets and comprehensively tested contexts were difficult to follow. There are many references to how many datasets or trials were used for a particular experiment, but overall, this is fragmented across the manuscript. As a result, it is difficult to assess how generalizable the results of the manuscript were across time or animal, or whether day-to-day variations or the different data collection schedules had an effect.

      Thank you for the comment. We have added the number of sessions in multiple places throughout the Results. For example, starting at line 274 of the Results:

      "During these 10 sessions the context changes were tested 15 times: four times for the wrist context, seven times for the spring context, and four times for the combined wrist and spring context."

      The introduction allocates a lot of space to discussing the concepts of generating (computing) movements as opposed to representing movements and relates this to ideas of neural dynamics. The distinction between these as described in the introduction is not very clear, nor is it clear what specific hypothesis this leads to for these experiments. Further, this line of thinking is not returned to in the discussion, so the contribution of these experiments to ideas raised in the introduction are unclear.

      Thank you for the comment. We have written a new paragraph relating these results to the concept of generating movement, starting at line 452 of the Discussion:

      "During the offline tasks, many channels changed neural activity with context, with 20.9% to 61.7% of tuned SBP channels modulating activity with context (Table I). The magnitude of these shifts were relatively small, especially when compared to the large changes in required muscle activation (Figure 2D-E), with weak trends to require greater activation for resisted flexion and lesser for assisted extension (Figure 7B-C). Additionally, the neural manifolds underlying movements in each context were well-aligned (Figure 7D). Using dPCA we found that while a large proportion of neural variance was explained by dPCA components that did not change with context, a significant proportion of the neural variance is associated with components that are context-dependent (Figure 8B). Visually, the context components are shifting the trajectories without changing the overall shape and the shift in neural activity is strongly correlated with muscle activations in new contexts (Figure 8C). This agrees with other studies which found lower variance activity may be related to the actual motor commands (Gallego et al., 2018; Russo et al., 2018; Saxena et al., 2022)."
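
      For readers unfamiliar with how manifold alignment is typically quantified, the following minimal sketch computes principal angles between the leading PCA subspaces of two data sets. It runs on synthetic data and is not the analysis from the manuscript; the helper names (`pca_basis`, `principal_angles`) and all sizes are illustrative assumptions. The second data set shares the same low-dimensional structure and adds a constant offset, mimicking a pure shift of the trajectories.

```python
import numpy as np


def pca_basis(X, n_components):
    """Orthonormal basis (features x n_components) of the top principal axes of X (samples x features)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_components].T


def principal_angles(A, B):
    """Principal angles in radians (ascending) between the column spaces of orthonormal A and B."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))


rng = np.random.default_rng(0)
n_channels, n_latent, n_samples = 30, 3, 500
mixing = rng.normal(size=(n_latent, n_channels))   # shared latent-to-channel map
context_offset = rng.normal(size=n_channels)       # hypothetical context-dependent shift

rates_normal = (rng.normal(size=(n_samples, n_latent)) @ mixing
                + 0.01 * rng.normal(size=(n_samples, n_channels)))
rates_context = (rng.normal(size=(n_samples, n_latent)) @ mixing
                 + context_offset
                 + 0.01 * rng.normal(size=(n_samples, n_channels)))

angles_deg = np.degrees(
    principal_angles(pca_basis(rates_normal, n_latent),
                     pca_basis(rates_context, n_latent))
)
print("principal angles (degrees):", np.round(angles_deg, 2))
```

      Because PCA centers each data set, a constant offset between contexts leaves the subspace unchanged, so the angles here stay close to zero; angles near 90 degrees would instead indicate poorly aligned manifolds.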

      The complexity of the control that was possible in this task (1 or 2 DOF finger flexion/extension) was low. Further, the manipulations that were used to control context were simple and static. Both these factors likely contribute to the finding that there was little change in the principal angles of the high-variance principal components. While this is not a criticism of the specific results presented here, the simplicity of the task and contexts, contrasted with the complexity of hand control more generally, especially for even moderately dexterous movements, makes it unclear how well the finding of stable manifolds will scale. On a related point, it is unclear whether the feature, identified using dPCA, that could account for changes in muscle activity, could be robustly captured in more realistic behaviors. It is stated that future work is needed, but at this point, the value of identifying this feature is highly speculative.

      Thank you for the comment. We have included more discussion relating these results to decoder development in general, as described in essential revision #3 from the editor.

      The maintained control in online BMI trials could also be explained by another factor, which I don't think was explicitly described by either of the two suggestions. Prism goggle experiments introduce a visual shift that can be learned quickly, and some BCI experiments have introduced simple rotations in the decoder output (e.g. Chase et al. 2012, J Neurophys). This latter case is likely similar in concept to in-manifold perturbations. Regardless, the performance can be rapidly rescued by simply re-aiming, which is a simple behavioral adaptation. In a 1DOF or 2DOF control case like that used in these experiments, with constant visual feedback on performance, the change in context could likely be rapidly learned by the animals, maybe even within a single trial. In other words, the high performance in the online case may be a consequence of the relatively simple task demands, and the simple biomechanical solution to this problem (push harder). What is the expectation that the results seen in these experiments would be relevant to more realistic situations that require grasp and interaction?

      Thank you for the suggestion. We agree that the quick adaptation is likely related to re-aiming. To this end, we have included a re-aiming analysis, as described in essential revisions #1 and #2 from the editor and common response #2, to examine the quick adjustment.

      Some of the figures were difficult to read and the captions contained some minor incorrect information. The primary purpose of some of the figures was not immediately clear from the caption. For example, the bar plots in Figures 5 and 6 were very small and difficult to read. This also made distinguishing the data from the two different animals challenging.

      Thank you for the comments. Multiple figures have been edited to increase legibility, and the text has been reviewed to fix errors and improve interpretability.

      There is no specific quantification of the data in Figures 4D and 5D. In Figure 4D it seems apparent that the vast majority of the points are below the unity line. But, it remains unclear, particularly in Figure 5D whether the correlations between the two contexts truly are different or not in a way that would allow conclusive statements.

      Thank you for the comments. Figure 4D has been moved to the supplement, and Figure 5D has been replaced by figures analyzing the neural activity patterns during the online task.

    1. Author Response

      Reviewer #1 (Public Review):

      This is thorough, quantitative microbial ecology research on one of the most important problems of species coexistence in infection biology. The intermediate disturbance hypothesis is supported once again, and they show unsurprisingly that nutrition matters for their ratio of coexistence, but more specifically as a novel function of the ratio of metabolic fueling to reproductive rate, which the authors term absolute growth. I like this study for its care and completeness even though the results are fairly intuitive to those in the field of cystic fibrosis microbial ecology.

      We would like to thank the reviewer for acknowledging the importance, care, and completeness of our original manuscript. We have continued to employ our standards of rigor for this revision.

      Reviewer #2 (Public Review):

      The authors present a manuscript that addresses an important topic of bacterial co-existence. Specifically modeling infection-relevant scenarios to determine how two highly antibiotic-resistant pathogens will develop over time. Understanding how such organisms can persist and tolerate therapeutic interventions has important consequences for the design of future treatment strategies.

      We would like to thank the reviewer for acknowledging the importance of our work.

      A major strength of this paper is the methodical approach taken to assess the dynamics between the two bacterial species. Using carbon sources to regulate growth to test different community structures provides a level of control to be able to directly assess the impact of one dominant pathogen over another.

      The modeling aspect of this manuscript provides a basis for testing other disturbances and/or the impact of additional incoming pathogens. This could easily be applied to other infection settings where multiple microbes are observed (for example viral/bacterial interactions in the lung).

      Thank you for acknowledging the rigor in our experimental and modeling approaches.

      The authors clearly show that by altering the growth rate and metabolism of various carbon sources, population structure can be modified, with one out-competing the other. Both modeling and experimental approaches support this.

      The exploration of the role of virulence factors is less clear, for example how strains unable to produce virulence factors are impacted in regard to their overall growth and whether S. aureus is able to sense virulence factors without transcriptional assays here. Although the hypothesis is strong, the experimental data does not fully support this conclusion.

      In addressing your comments below, we hope that we have increased your confidence in our hypotheses presented in our manuscript as it pertains to the involvement of virulence factors.

      Spatial disturbance has a significant impact on community structure. Although using one approach to assess this, it is not clear if the spatial structure is impacted without the comparable microscopy evaluation.

      We have indeed acknowledged this shortcoming in our revised manuscript. In the discussion, we write:

      “While we did not explicitly quantify spatial organization experimentally owing to technical limitations of our microplate reader and microscope setups, in theory, co-culture in an undisturbed condition should facilitate the creation of spatial organization.”

      In fact, we would really like to be able to track the position of each bacterium during shaking events. However, the plate reader cannot accommodate a microscope setup. While we could remove the plate from the plate reader and transport it to the microscope (two floors down), we cannot be certain that the positions of the bacteria would not be altered during transport. We have thought about fixing the bacteria in place prior to transport. However, the injection of liquid for the purposes of fixation would likely alter the positioning of the bacteria. Thus, we chose a modeling approach using an agent-based model that is parametrized based on our experimental approach. Accordingly, we agree that this is a limitation of our current study. We hope that acknowledging this limitation in the discussion sits well with the reviewer.

      Overall this paper highlights the use of modeling approaches in combination with wet lab experiments to predict microbial interactions in changing environments.

      Reviewer #3 (Public Review):

      This is an intriguing manuscript with a rigorous experimental and computational methodology looking at the interaction of Pseudomonas aeruginosa (Pa) and Staphylococcus aureus (Sa). These two pathogens frequently co-habit infections but in standard liquid media often show a winner-take-all outcome. This study seeks to be mechanistically predictive as to the outcome of the co-culture based on the addition of specific carbon sources as filtered through the lens of metabolic efficiency or, as the authors term - absolute growth. Overall, the study is sound, but there are some specific caveats that I would like to present:

      We would like to thank the reviewer for acknowledging the rigor of our work.

      1) The study undersells the knowledge in the literature of what allows or prohibits the stability of the Pa and Sa co-cultures. While most of the correct papers are cited, the outcomes of those studies are downplayed in favor of the current predictive study. While the current study is indeed more "predictive", it strays exceedingly far from an infection-relevant media, whereas other studies show reasonable co-existence in host-relevant media.

      We have addressed this comment in two different ways. First, we have included an entire paragraph in the discussion that acknowledges previous work and how our results fit into previous findings. We write:

      “Given the clinical importance of co-infection with both P. aeruginosa and S. aureus, multiple previous studies have identified mechanisms of co-existence. Indeed, long term co-existence of both species can result in physiological changes that reduce their competitive interactions. Strains of P. aeruginosa isolated from patients that enter into a mucoid state show reduced production of siderophores, pyocyanin, rhamnolipids and HQNO, which facilitates the survival of S. aureus [23, 24]. These strains can also overproduce the polysaccharide alginate, which in itself is sufficient to decrease the production of these virulence factors. Moreover, exogenously supplied alginate can reduce the production of pyoverdine and expression from the PQS quorum sensing system, which is responsible for the production of HQNO [25]. Changes in the physiology of S. aureus can also facilitate co-existence. Strains of S. aureus isolated from patients with cystic fibrosis show multiple changes in the abundance of proteins including superoxide dismutase, the GroEL chaperone protein, and multiple surface associated proteins [26]. Interestingly, the majority of proteins that show changes in abundance in S. aureus are related to central metabolism, which is consistent with our findings demonstrating that metabolism can influence the co-existence of both species. While it is unclear as to how long-term co-culture would affect the ratio of absolute growth, our findings provide an additional mechanism that can determine the co-existence of these bacterial species.”

      Second, as noted in our response in the ‘essential revisions’ section, we have tested the relationship between the final density ratio and the absolute growth ratio in SCFM medium, which we believe is host relevant. Our findings were fully consistent with the trends that we saw in our original submission. This data is presented in Fig. 3 and Figure 5 – figure supplement 3.

      2) The major weakness in the ability of this study to be extrapolatable to infection conditions is the basal media selected for this analysis. The authors choose TSB, which is an incredibly rich media from the start, and proceed to alter only 11% of the available carbon (per mass) with their carbon source manipulations. This suggests an underappreciation for the amino acid metabolism routes of these two pathogens that are taking advantage of the roughly 89% of carbon as amino acid content in the TSB components of tryptone and soytone (17g and 3g, respectively vs the 2.5g carbon source). There are a few major issues with this basal formulation:

      a) Comparison to all extant literature on Pa - The media historically used to assess Pa include (rich) LB, BHI, MH; (minimal) MOPS, M63, M9; (host-associated) ASM, SCFM, SCFM2, Serum, and DMEM. TSB is not a historically evaluated formulation for Pa (though it is often for non-mammalian pathogenic Pseudomonads and environmental species). Thus, this study is not inherently integrated into the Pa literature and presents an offshoot study for which a direct connection to extant literature is difficult. Explicitly testing these predictions in the most minimal media possible and then in a host-relevant model would be optimal.

      We would truly like to thank the reviewer for their rigor in reviewing our manuscript. We, admittedly, overlooked how amino acids might be influencing the growth of P. aeruginosa in TSB medium. We originally chose TSB medium as previous studies that have examined the co-culture of S. aureus and P. aeruginosa, or their mechanisms of interaction, have used this medium (e.g., [29-34]).

      To address this comment directly, we grew co-cultures in AMM minimal medium. This medium, to our knowledge, is the only minimal medium that allows growth of S. aureus. We, and others, have not reported growth of S. aureus in M9 or MOPS minimal medium despite the addition of components such as casamino acids and increases in the concentration of thiamine.

      While AMM as reported is quite complex relative to media such as MOPS and M9, we removed several vitamins (nicotinic acid, thiamine, calcium pantothenate, biotin), decreased the concentration of some salts, used a low concentration of casamino acids (0.01%), and used a higher concentration of carbon source (0.04%). In doing so, we hoped to reduce any ‘background effect’ of media components and thus absolute growth could be driven more by carbon source.

      Importantly, in using AMM medium, we continue to find a strong and significant relationship between the final density ratio and the absolute growth ratio. This data is presented in Figure 3 and is described in a standalone paragraph in the results, along with our findings using SCFM.

      b) TSB is not remotely host-relevant. The Whiteley lab has done monumental work evaluating in vitro models that mimic human infection (scrupulously matching transcriptomes) and TSB is about as far as you can get. Thus, the ability to extrapolate from the current work to infection without testing in host-relevant media is limited.

      As noted above, we repeated our core experimental analysis in SCFM. The results are fully consistent with our original submission. This data is presented in Figure 3 and in Figure 5 - figure supplement 3.

      c) The experimental situation has a component that is both good and bad- O2 tension. By overlaying with mineral oil, the authors immediately bias Staph (a more versatile fermenter) to success, whereas Pa deals with most of these carbon sources better at body level or higher O2 levels. The benefit of this is that many of the infection sites in which these two species co-occur are low in O2.

      This was an interesting observation that we have partially addressed experimentally and acknowledged in the discussion.

      First, we acknowledged the limitations of our experimental approach as it pertains to O2 levels in the discussion as follows:

      “We note that our findings may be relevant to infections occurring in both high and low O2 environments. While P. aeruginosa is limited in its ability to perform fermentation [35], we have provided evidence that the absolute growth ratio can affect community composition in both aerobic (Figures 2-5) and more anaerobic environments (Figure 2 - figure supplement 1, panel H). The limited ability of P. aeruginosa to grow in anaerobic environments was apparent in SCFM as we could not obtain reliable or robustly quantifiable growth of this bacterium when succinate or α-ketoglutarate was provided as a carbon source.”

      Second, we tested the effect of placing mineral oil over top of the co-culture experiments, thus increasing the anaerobic nature of the environment. We found that, in general, as the ratio of absolute growth increased, so did the dominance of P. aeruginosa in the growth medium. This new data is presented in Figure 2 - figure supplement 1, panel H.

      Taken together, we hope that these two modifications meet the Reviewer’s expectations.

      d) Some of the tested metabolites are osmotically active (sucrose), while others are not (acetate), confounding the interpretation of what absolute metabolism means in the context of this study since the concentrations of all tested metabolites vary from above to below physiologic-dependent on the metabolite. A much better approach would have been to vary a single metabolite or combination to alter 'absolute metabolism' and test whether the stability of the co-culture held.

      e) The manuscript never goes into the fact that for some of these "the carbon source" sources, they are catabolite repressed compared to the basal TSB amino acids (or not). Both organisms show exquisite catabolite repression control, yet this is not addressed at all within the text of the manuscript. Since this response in both organisms is sensitive to relative proportions of the various C-sources, failure to vary C-sources or compare utilization compared to the massive excess tryptone and soytone in the media makes the 'absolute metabolism' difficult to interpret.

      To address comments d and e, and to acknowledge the potential limitations of our findings, we have included the following in the discussion. In this paragraph, we acknowledge the osmotic activity of the different carbon sources and preferential consumption of amino acids in TSB medium.

      “One drawback of our approach in using different carbon sources to manipulate absolute growth is that some carbon sources are osmotically active, whereas others are not, which could have additional physiological effects on the bacteria outside of changing growth and metabolism. Moreover, both species of bacteria have different carbon source preferences; as above S. aureus tends to prefer carbon sources such as glucose [36] whereas P. aeruginosa prefers organic and amino acids [37]. Given the carbon source preferences of each species, in complex medium such as TSB, there is the potential that P. aeruginosa consumes amino acids prior to consuming the supplied carbon source. This is perhaps less of a concern in AMM medium or SCFM where the concentration of amino acids and additional nutrient components is reduced as compared to TSB medium. Along this line, it is certainly worth investigating how each nutrient component and its ordered utilization by both species contributes to changes in absolute growth. Minor or transient changes in absolute growth owing to preferential nutrient consumption may provide windows of opportunity for one species to increase its relative density to the other.”

      f) The authors left out the 'favorite' sources of Pa that are known to be relevant in vivo - the TCA intermediates: citrate, succinate, fumarate (and directly relevant to host-pathogen interactions, itaconate)

      We have included the analysis of succinate as a carbon source in both TSB medium (Figs. 1 and 2) and AMM medium (Fig. 3). However, we could not achieve a reliable or quantifiable growth rate of P. aeruginosa in SCFM medium supplemented with succinate in our experimental setup. Accordingly, this carbon source was not used in SCFM.

      3) Statistics: Most of the experiments presented are comparisons in which there are more than two experimental groups and the t-tests employed therefore need to be corrected for multiple comparisons. The standard way to do this is to employ an ANOVA with the appropriate multiple-comparison-corrected post-test. These appear to be appropriate for Dunnett's post-testing but the comparator group is not directly defined within the figure legends. Multiple comparison testing is critical for this analysis, as the H0 is that all are the same - the more samples potentially pulled from the same distribution will result in a higher likelihood that one or more will appear as from a distinct population (i.e. H0 rejected). Multiple comparisons correct for this and are absolutely critical for the evaluation of the data presented in this manuscript.

      We have addressed this comment in two different ways.

      First, where there was a clear control group, we performed either Dunnett’s test (for normally distributed data) or Dunn’s test (for non-parametric data sets) following an ANOVA or Kruskal-Wallis test, respectively. These tests were applied to the data presented in Figure 2B, 5H (top and bottom panels) and in Figure 2 - figure supplement 1, panels K-L.

      Second, we did not broadly perform multiple comparisons across all data sets, because this approach would test the significance of relationships that are not relevant to the central premise of the manuscript. For example, a multiple comparison for Figure 1B would test the growth rate of all carbon sources against all carbon sources, whereas we are only interested in whether S. aureus or P. aeruginosa grows faster than the other. Nevertheless, we understand the need for a corrected P value to reduce the occurrence of Type I errors. To accomplish this, we applied a Benjamini-Hochberg procedure [38] with an 8.5% false discovery rate to all P values in the manuscript, including those that tested the distribution of data. This lowered the threshold for significance to P < 0.0472. We have updated all claims and indications of significance in the figures based on this adjusted P value.
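
      For readers unfamiliar with the procedure, the step-up logic of a Benjamini-Hochberg correction can be sketched as follows. This is a minimal illustration only: the function name and the example P values are hypothetical and are not taken from the manuscript, so the resulting threshold will not reproduce the 0.0472 value reported above, which depends on the full set of P values in the study.

```python
def benjamini_hochberg(p_values, q=0.085):
    """Benjamini-Hochberg step-up procedure.

    Returns (threshold, rejected), where `rejected[i]` is True when the
    i-th hypothesis is declared significant at false discovery rate `q`.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * q.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank

    # Every hypothesis ranked at or below k is rejected, even if its own
    # P value exceeded its individual critical value (the "step-up" part).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True

    threshold = largest_k / m * q
    return threshold, rejected


# Hypothetical P values, for illustration only.
example_p = [0.001, 0.025, 0.030, 0.040, 0.060, 0.200, 0.450, 0.800]
threshold, rejected = benjamini_hochberg(example_p, q=0.085)
print(threshold, rejected)
```

      In this toy example the second P value (0.025) exceeds its own critical value but is still declared significant because a larger-ranked P value passes, which is why the correction is usually summarized as a single adjusted significance threshold.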

      4) The authors missed including Alves et Maddocks 2018 in relation to priority effects and other contributing factors to stable Pa/Sa co-culture.

      We have indeed included this manuscript and its findings in the introduction where we write:

      “While S. aureus can initially aid in the establishment of the P. aeruginosa population [8], production of N-acetylglucosamine from S. aureus augments…..”

    1. Author Response

      Reviewer #3 (Public Review):

      The authors examine the role of secreted BAFF in senescence phenotypes in THP1 AML cells and primary human fibroblasts. In the former, BAFF is found to potentiate the inflammatory phenotype (SASP) and in the latter to potentiate cell cycle arrest. This is an important study because the SASP is still largely considered in generic and monolithic terms, and it is necessary to deconvolute the SASP and examine its many components individually and in different contexts.

      Although the results show differences for BAFF in the two cell models, there are many places where key results are missing and the results over-interpreted and/or missing controls.

      1) Figure 1. Test whether the upregulation of BAFF is specific to senescence, or also in reversible quiescence arrest.

      We appreciate the Reviewer’s requests. We performed the experiments in fibroblasts and THP-1 cells to assess BAFF levels in quiescence. As shown below in the figure for Reviewers, we induced quiescence in fibroblasts by serum starvation (0.1%) for 96 h and confirmed the quiescent state by measuring two markers of quiescence established previously (PMID 25483060): reduction of CCND1 mRNA and reduction of phospho-S6 when compared to cycling cells (panel A). In this case, the level of BAFF mRNA was increased upon quiescence (panel B).

      In THP-1 cells, we tried to induce quiescence by serum starvation and glutamine depletion for 96 h. Unfortunately, however, inducing quiescence in THP-1 cells was rather challenging, likely because they are cancer cells. Thus, we observed a reduction of cell proliferation in both conditions, but we observed a reduction in phospho-S6 only in the samples without glutamine (panel C). We failed to see increased BAFF mRNA levels in quiescent THP-1 cells after either serum starvation or glutamine depletion (panel D).

      In summary, further studies will be necessary to fully understand if the increased expression of BAFF seen in senescent cells is also observed in other conditions of growth suppression (such as quiescence or differentiation), as well as whether this effect is specific to different cell types.

      2) Figure 1, Supplement 1G. Show negative control IgG for immunofluorescence.

      We thank the Reviewer for this suggestion. Along with other changes during the revision, we decided to remove the immunofluorescence data in order to include more informative data.

      3) All results with siRNA should be validated with at least 2 individual siRNAs to eliminate the possibility of off-target effects.

      We agree with the Reviewer on the importance of testing individual siRNAs. For BAFF, we originally tested two independent siRNAs (BAFF#1 and BAFF#2) individually, but we also pooled them for additional analysis (the pool is referred to simply as “BAFFsi” throughout the manuscript). In the revised version of our manuscript, we included the key experiments performed with these two individual BAFF siRNAs. Upon BAFF silencing in THP-1 cells, we observed a reduction of SASP factors and SA-β-Gal activity levels with each individual siRNA (Figure 4-Figure Supplement 1D-F) and with the pooled siRNAs (Figure 4C). For WI-38 cells, we observed a reduction of p53 levels with individual and pooled siRNAs (Figure 7-Figure Supplement 1A), as well as a reduction in IL6 levels and SA-β-Gal activity (Figure 6-Figure Supplement 1D,E). After IRF1 silencing, we observed a reduction in BAFF pre-mRNA with two different pairs of CTRLsi and IRF1si pools (Figure 2I and supplementary Figure 2E). For the data on BAFF receptors, we used SMARTpools from Dharmacon, which are combinations of 4 siRNAs designed by the company to minimize off-target effects. These additions and clarifications are indicated in the revised manuscript.

      4) To confirm a role for IRF1 in the activation of BAFF, the authors should confirm the binding of IRF1 to the BAFF promoter by ChIP or ChIP-seq.

      We thank the Reviewer for this suggestion. We performed ChIP-qPCR analysis in THP-1 cells that were either proliferating or rendered senescent after exposure to IR (Figure 2H, Materials and methods section), and we confirmed the binding of IRF1 to the proximal promoter region of BAFF. As anticipated, this interaction was stronger after inducing senescence.

      5) Key antibodies should be validated by siRNA knockdown of their targets, for example, TACI, BCMA, and BAFF-R in Figure 5. Note that there is an apparent discrepancy between BCMA data in Figure 5B vs 5C.

      We fully agree with the Reviewer on this point and we thank him/her for helping us to improve this part of our manuscript. To address the discrepancy regarding BCMA western blot analysis and flow cytometry data, we silenced BCMA in THP-1 cells and tested two different antibodies advertised to recognize BCMA. This experiment allowed us to identify the correct band for BCMA by western blot analysis. We then confirmed that BCMA is upregulated in senescence, as observed by both western blot and flow cytometry analyses. We have modified the manuscript to reflect these changes. Please find these data in Figure 5A,B and Figure 5-Figure Supplement 1A of the revised manuscript.

      6) Figure 5E. Negative/specificity controls for this assay should be shown.

      We thank the reviewer for this comment and regret that we were unable to provide a negative control. The kit only provides a competitive wild-type oligomer used to test the specificity of the binding. For each sample (CTRLsi, BAFFsi, CTRLsi IR, BAFFsi IR) and each antibody tested (p65, p50, p52, RelB and c-Rel), we evaluated the reductions in signal upon addition of excess competitive oligomer per well (20 pmol/well) compared to wells with an inactive oligomer. However, this specificity control was performed only as a single replicate, due to the limited quantity of nuclear extracts and the high number of samples and antibodies analyzed. We therefore considered this control as being ‘qualitative’ rather than fully ‘quantitative’.

      7) Hybridization arrays such as Figure 5H, Figure 6 - Supplement 1I, and Figure 6H should be shown as quantitated, normalized data with statistics from replicates.

      We appreciate this request. We have included the quantification and statistics for the phosphoarrays used for THP-1 and WI-38 cells, which had been performed in triplicate (Figure 7A, Figure 5-Figure Supplement 1D). The original arrays are shown in the respective Source Data Files. In the interest of space, we removed the cytokine array performed on IMR-90 cells and left instead the quantitative ELISA for IL6 (Figure 6-Figure Supplement 1F). The data obtained from the cytokine array analysis in Figure 4F and Figure 4-Supplemental Figure 1C are supported by quantitative multiplex ELISA measurements (Figure 4E and Figure 4C).

      8) Figure 6B - Supplement 1. Controls to confirm fractionation (i.e., non-contamination by cytosolic and nuclear proteins) should be shown.

      We thank the Reviewer for this suggestion. We tested the efficiency of fractionation and we did in fact observe some degree of contamination from cytosolic proteins using the earlier version of the kit (Pierce, cat. 89881). We therefore purchased an improved version of the kit (Pierce, cat. A44390) and repeated the surface fractionation assay, which this time showed improved fractionation (Figure 7-Figure Supplement 1B). Interestingly, with the improved fractionation strategy, we observed that BAFF receptors in fibroblasts were almost exclusively localized inside the cell and not on the surface, as we found in THP-1 cells. Further validation of BAFF receptor antibodies has been provided in Figure 5-Figure Supplement 1A. As described in the text, the intracellular localization of BAFF receptors was previously reported in other cell types and conditions (PMID 31137630, PMID 19258594, PMID 30333819, PMID 10903733), and thus it is possible that BAFF may act through non-canonical mechanisms in WI-38 cells. Nonetheless, we did detect a small amount of BAFFR on the cell surface, and furthermore, BAFFR silencing reduced the level of p53 in fibroblasts. Therefore, we propose that BAFFR may be the primary receptor involved in p53 regulation in fibroblasts (Figure 7-Figure Supplement 1B,C). Our data on BAFF receptors deserve deeper characterization in a future study of the functions of BAFF receptors in senescence.

      9) Figure 6A. Knockdown of BAFF should be shown by western blot.

      Yes, definitely. We appreciate this comment and have included BAFF knockdown data in fibroblasts by western blot analysis (Figure 7B).

      10) Figure 6G. Although BAFF knockdown decreases the expression of p53, p21 increases. How do the authors explain this?

      We thank the Reviewer for the interesting question. We too were surprised to observe that the p53-dependent transcripts regulated by BAFF did not include CDKN1A (p21) mRNA, as confirmed by western blot analysis. The accumulation of p21 in senescence can also be regulated by p53-independent pathways, including in p53-/- cells, for example by p90RSK, SP1, and ZNF84 (PMID 24136223, PMID 25051367, PMID 33925586). Ultimately, we removed the data relating to p21 and γ-H2AX in favor of other data and to streamline the content of this manuscript for the reader.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors present data identifying the role of the bacterial enhancer binding protein (bEBP) SypG in the regulation of the Qrr1 small RNA, which is known to be a key regulator of Vibrio fischeri bioluminescence production and squid colonization. Previously, only the bEBP LuxO was known to activate Qrr1 expression. LuxO and Qrr1 are conserved in the Vibrionaceae, and the authors show that SypG is conserved in ~half of the Vibrio family, suggesting that this Qrr1 regulatory OR gate controlled by LuxO or SypG may play important roles in physiological processes in other species.

      Successful squid colonization by Vibrio fischeri is a complex process, known to be influenced by several factors, including the formation of and dispersal from cellular aggregates prior to entering squid pores, and inoculation of the light organ crypts, and biofilm formation within the crypts. Previously, it was shown that strains lacking qrr1 were at a deficit for crypt colonization in the presence of wild-type V. fischeri. Conversely, cells lacking binK, which encodes a hybrid histidine kinase, were at an advantage for crypt colonization in the presence of wild-type cells. However, the authors identified BinK as a negative regulator of Qrr1 expression in a transposon screen. The authors used genetic epistasis experiments and found that Qrr1 transcription can be activated by either phosphorylated LuxO at low cell densities (in the absence of quorum sensing signals) or by SypG, presumably by binding to the two upstream activation sequences in the promoter of qrr1 to activate transcription by the required alternative sigma factor sigma-54. The competition between these bEBPs has not been tested. The model proposed is an OR gate through which quorum sensing and aggregation signals control Qrr1. However, there are several untested aspects of this model. First, the role of phosphorylation in SypG activity, and the connection to BinK, are not addressed in this manuscript, which may confound the effects observed on qrr1 transcription. Further, the authors did not test whether SypG directly binds to the qrr1 promoter, nor did they assess the individual role of LuxO binding to the two LuxO binding sites in the absence of SypG. The study is lacking an in vivo assessment of SypG and LuxO binding/competition at the Qrr1 promoter based on the authors' model of the OR gate.

      Major comments:

      • What is known about the connection between BinK and SypG? BinK is a hybrid HK (intro states this). Does BinK phosphorylate/dephosphorylate SypG - directly or indirectly? I saw a published paper (Ludvik et al 2021) with a diagram suggesting BinK does inhibit SypG, but the connection is unclear. This diagram also suggested that SypG needs to be phosphorylated. Can the authors comment - does SypG need to be phosphorylated to be active? Because SypG has the same sequence as the LuxO linker (Fig. S2), then I presume that SypG would also need to be phosphorylated to be active (like LuxO)? The authors utilize a phosphomimic of LuxO to test function under constitutive activity (Fig. S3), but they do not use a phosphomimic of SypG (Fig 4). If the authors used a constitutive allele, would those assays reveal more about the competition between SypG and LuxO, in the presence of phosphorylated LuxO at low cell density? The authors should include a putative cartoon model for how BinK HK activity connects to SypG, based on what is already in the literature, to aid the reader.

      We have added information and a corresponding cartoon model in the results section about the signaling pathway involving BinK and SypG, including that SypG must be phosphorylated to be active and that BinK acts as a phosphatase towards SypG. We have also generated a SypGD53E mutant and found increased Pqrr1 activity, which suggests that phosphorylation of SypG has a major impact on SypG-dependent activation of Pqrr1.

      • Line 246: Figure S3: nucleotide substitutions in both UAS regions showed loss of Pqrr1-gfp, but this could be due to binding/activation by SypG or LuxO. This should be tested in a sypG- strain to determine the sole effect of LuxO binding to these two UASs. In Figures 4G and 7, the luxO- sypG- Ptrc-sypG strain backgrounds allow the independent analysis of the two bEBPs. It is important to test which of these two sites is critical for LuxO-dependent activation of Pqrr1, given the conservation of the LuxO-Qrr1 region in other Vibrios (line 327, Fig. S5). Thus, the authors could also discuss whether these two proteins would compete at both sites. Further, the authors should comment that they have not shown biochemical evidence that SypG binds to the two UASs in the Qrr1 promoter. The regulation of this locus by SypG is only shown by genetic assays in this manuscript.

      We have added a paragraph in the discussion highlighting how useful protein-DNA assays would be to address competition, along with the barriers we encountered in our attempts to purify SypG. Regarding the contribution of each UAS to LuxO-dependent activation, we refer to the LuxO phosphomimic data in the supplement (Fig. S4), which show that G-131 and G-97 do not affect LuxO-dependent activation (as pointed out by reviewer #2); this informed our test of a G-131T mutant in the co-colonization experiment.

      • Examination of the binding of LuxO and SypG (e.g., ChIP-seq) in combination with their transcriptional reporter under varying conditions (low cell density vs high cell density, with or without rscS* overexpression) would be extremely beneficial in testing the model proposed.

      We agree but have not had success in our attempts to perform ChIP due to protein instability. For example, we have tried SypG with a C-terminal TAP tag, which my colleague Dr. Lu Bai at Penn State has used extensively for ChIP, ChIP-seq, and ChIP-exo, but we could not observe a signal even when the RscS* allele was included in the strain.

      Reviewer #2 (Public Review):

      The study by Surrett et al. uncovers a novel regulatory axis in Vibrio fischeri that controls the expression of the qrr1 small RNA, which post-transcriptionally controls various quorum-dependent outputs. This study is timely and addresses a major question about the physiology of this important model symbiosis and potentially other Vibrio species. The results should be of broad interest within the field of microbiology.

      While it was previously believed that qrr1 expression is under the strict control of the LuxO-dependent quorum sensing cascade, the authors demonstrate that qrr1 expression can be induced by another bEBP, SypG, in a manner that is quorum-independent. It was previously shown that qrr1 is important for colonization, and the authors recapitulate and extend this finding here. However, bacteria are likely at high cell density prior to entry into the crypts, which would repress qrr1 expression. Thus, despite the importance of qrr1 expression for crypt colonization, it would counterintuitively be repressed. The discovery of the SypG quorum-independent induction of qrr1 in this study may help resolve this conundrum. The authors take a largely genetic approach to characterize this novel regulatory pathway in combination with a squid colonization model. The experiments performed are generally well controlled and the data are clearly presented. The authors, however, fail to provide experimental evidence to support the physiological relevance of SypG-dependent control of qrr1 expression during host colonization.

      Fig. 2 - It is unclear why there is a disconnect between qrr1 expression and qrr1-dependent effects. Data in 2B indicate that qrr1 is induced in the ∆binK mutant according to the Pqrr1-gfp reporter, but this expressed qrr1 does not have any effect on phenotypes like bioluminescence according to the data presented in 2C. While the authors reveal an effect of the binK deletion when rscS is overexpressed, it is unclear why this is necessary since simple deletion of binK without rscS is sufficient to induce qrr1 in 2B. Could this discrepancy be due to the fact that experiments in 2B are done on solid media while the experiments in 2C are performed in liquid media? Do cells in liquid not express qrr1? Or conversely, perhaps testing the bioluminescence of cells scraped off of plates could reveal a phenotype for the binK mutant similar to those seen in the rscS background in liquid. Or alternatively, if cells in a liquid culture still express qrr1, perhaps there is a posttranscriptional mechanism that prevents qrr1 from exerting an effect on bioluminescence? The latter possibility would alter the proposed model.

      To help explain why we chose to overexpress RscS, we have added the cartoon in Fig. 2C, which highlights how BinK dephosphorylates SypG. We believe that the conditions used in the bioluminescence assay do not phosphorylate SypG, which prevents an effect by BinK. However, overexpression of RscS permits phosphorylation of SypG, which enables a phenotype to emerge in a binK mutant. We have tested the bioluminescence of cells within spots but did not detect a difference.

      The authors propose a model in which sypG-dependent activation of qrr1 is required for appropriate temporal regulation of this small RNA and contributes to optimal fitness of V. fischeri during colonization. However, this was not directly tested, and experimental evidence to support a physiological role for sypG-dependent regulation of qrr1 remains lacking. Data in Fig. S3 and Fig. 4G-H suggest that the Gs at -131 and -97 in Pqrr1 are largely dispensable for LuxO-dependent activation, but are important for SypG-dependent activation of Pqrr1. Also, the Pqrr1 mutations at C -130 and -96 completely prevent sypG-dependent activation while only partially reducing LuxO-dependent activation. If SypG-dependent activation of qrr1 is critical for the fitness of V. fischeri, a strain harboring these Pqrr1 promoter mutations should be attenuated in a manner that resembles the qrr1 deletion mutant as shown in Fig. 3C.

      We thank the reviewer for this suggestion, which led us to generate and test a G-131T mutant in vivo.

      Fig. S4 - these data suggest that LuxO cannot enhance transcription of PsypA and PsypP at native expression levels. But sypG-dependent induction of qrr1 was largely tested with Ptrc-dependent overexpression of SypG. Would overexpression of LuxO induce PsypA and PsypP? The authors should at least acknowledge this possibility in the text.

      As requested, we have added text that acknowledges this possibility.

      The authors adopt three distinct strategies to induce sypG-dependent activation of qrr1 in distinct figures throughout the manuscript: deletion of binK, overexpression of rscS (rscS*), and direct overexpression of sypG. It is not entirely clear why distinct approaches are used in different figures. This is particularly true for Fig. 5 since the authors already demonstrated that the direct overexpression of sypG can be used, which is a more direct way of addressing this question. Similarly, sypG overexpression should inhibit bioluminescence in Fig. 2 based on the proposed model, which would have tested the claims made more directly. Additional text to clarify this would be helpful.

      As requested, we have added Fig. 2C and text to describe how SypG is regulated, which provides ways to test SypG-dependent activation of qrr1.

      The Fig. 5D legend indicates that the strains harbor a Ptrc-GFP reporter. However, the text would suggest that these strains should harbor a Pqrr1-GFP reporter to test the question posed.

      This has been corrected.

      The conclusion that SypG and LuxO share UASs in the qrr1 promoter is based on fairly limited genetic evidence where point mutations were introduced into 3 bp of the predicted LuxO UASs within the qrr1 promoter. This conclusion needs to be qualified in the text or additional experimental evidence is needed to support this claim. For example, in vivo ChIP-exo could be used to map the SypG and LuxO binding sites. Or SypG and LuxO could be purified to assess binding to the qrr promoter in vitro (to map binding sites or test competitive interactions of these proteins to the qrr promoter).

      As described above and in the text, we have not been able to construct a functional tagged SypG that would enable these types of studies.

      On a related note, SypG binding to the qrr1 promoter is speculated based on indirect genetic evidence. But the authors do not directly demonstrate this. This should be acknowledged in the text or additional experimental evidence should be provided to support this claim.

      As requested, we have added text in the discussion that highlights this problem.

      Reviewer #3 (Public Review):

      In this manuscript, Surrett and coworkers aimed to identify the mechanism that regulates the transcription of Qrr1 sRNA in the squid symbiont Vibrio fischeri. In many Vibrio species, Qrr1 transcription is regulated by quorum sensing (QS) and activated only at low cell density. Qrr1 is important for V. fischeri to colonize the squid host. In the QS systems that have been studied so far, LuxO is the only known response regulator that activates Qrr sRNA transcription. However, the authors argued that since V. fischeri forms aggregates before entering into the light organ of the squid, Qrr1 would not be made as a high cell density QS state is likely induced within the aggregates. Therefore, they hypothesized that additional regulatory systems must exist to allow Qrr1 expression in V. fischeri to initiate colonization of the light organ. In turn, the authors identified that disruption of the function of the sensor kinase BinK allowed Qrr1 expression even at high cell density. Through a series of cell-based reporter assays and an in vivo squid colonization assay, they concluded that BinK is also involved in Qrr1 regulation within the squid light organ. They went on to show that another sigma54-dependent response regulator, SypG, is also involved in controlling Qrr1 expression. The authors propose that dual regulation of Qrr by LuxO and SypG could be a common regulatory mechanism of Qrr expression in a subset of Vibrio species.

      Overall, the experiments were carefully performed and the findings that BinK and SypG are involved in Qrr1 regulation are interesting. This paper is of potential interest to an audience in the field of QS and Vibrio-host interaction. However, experimental deficiencies and alternative explanations of the results have been identified in the manuscript that prevents a thorough mechanistic understanding of the interplay between QS and these new regulators.

      1) The premise that Qrr1 expression in the light organ has to be regulated by systems other than QS is unclear. In lines 108-109, it was stated that "...prior to entering the light organ, bacterial cells are collected from the environment and form aggregates that are densely packed", however, in lines 184-185, it was stated that "The majority of crypt spaces each contained only one strain type (Fig. 3B), which is consistent with most populations arising from only 1-2 cells that enter the corresponding crypt spaces". So, if the latter case is true (i.e., 1-2 cells/crypt), why Qrr1 could not be made at that time point as predicted by a QS regulation model?

      We have not changed this section because if Qrr1 is expressed only after the cells have already entered the crypt space, then the Δqrr1 mutant would colonize a number of crypt spaces comparable to that of wild type cells.

      2) The involvement of the rscS allele for the ∆binK mutant to show an altered bioluminescence phenotype is confusing. It is unclear why a WT genetic background was sufficient to show the high Qrr1 phenotype in the original genetic screen that identified BinK (Fig. 2A-B), while the rscS allele is now required for the rest of the experiments to show the involvement of BinK in bioluminescence regulation (Fig. 2C). Is the decreased bioluminescence phenotype observed in the rscS* ∆binK mutant (Fig. 2C) dependent on LuxU/LuxO/Qrr1/LitR? Could it be through another indirect mechanism (e.g., SypK as discussed in line 403)? A better explanation of the connection between RscS/Syp and BinK and perhaps additional mutant characterization are necessary to interpret the observed phenotypes.

      As described above, we have added a cartoon that illustrates the pathway involving BinK (Fig. 2C) and additional justification in the results section, which better explains why RscS overexpression was used.

      3) In squid colonization competition assays (Fig. 3), it was concluded that the ∆qrr1 allele is epistatic to the ∆binK allele (line 204), and the enhanced colonization of the ∆binK mutant is dependent on Qrr1 (section title, line 162). This conclusion is hard to interpret. The results can be interpreted as ∆qrr1 mutation lowers the colonization efficiency of the ∆binK mutant which could imply BinK regulates Qrr1 in vivo. Alternatively, it could be interpreted that the ∆binK mutation increases the colonization efficiency of the ∆qrr1 mutant. Direct competition between single and double mutants in the same animals may resolve the complexity. And direct comparison of Qrr1 expression of WT and ∆binK mutants inside the animals, if possible, will also help interpret these results.

      We thank the reviewer for the suggestion and were able to test the ΔbinK and ΔbinK Δqrr1 mutants directly (Fig. S2). We were unable to interpret the data using the Pqrr1 reporter due to unexpected heterogeneity in Pqrr1 activity throughout the crypt spaces.

      4) Similar concern to above (#2), in Fig. 4, the link between BinK and Qrr1 regulation is not fully explored. What connects BinK and Qrr1 expression? Does BinK function via LuxU (or other HPT) to control SypG like the other QS kinases? And what is the role of other known kinases (e.g., SypF) in the signaling pathway? And did the authors test other bEBPs found in V. fischeri for their role in Qrr1 regulation?

      We have added to the discussion content that highlights examining LuxU as a direction worthwhile to pursue to understand how BinK affects signaling that activates Qrr1.

      5) In addition to the genetic analysis, additional characterization of SypG is required to demonstrate the proposed regulatory mechanism: a. What is the expression level (and phosphorylation state) of SypG and LuxO at different cell densities? b. Does purified SypG directly bind to the qrr1 promoter region? c. How do these two bEBPs compete with each other if they are both made and active?

      We agree that these are interesting questions, but as described above, we were unable to purify SypG to address the biochemistry.

      6) The molecular OR logic gate is used to describe the relationship between LuxO and SypG, but this logic relationship is not always true in all conditions (if at all). In WT, deletion of luxO completely abolished Qrr1 expression (Fig. 4C). Even in the binK mutant, LuxO still seems to be the more prominent regulator (Fig. 4D) as deletion of luxO already caused a smaller but significant drop in Qrr1 expression. The authors may need to use this term more precisely.

      We note that in wild-type cells, SypG is not active under the conditions tested, so SypG would not contribute to activating Qrr1 expression. The level of Pqrr1 activity by the SypG(D53E) variant surpasses the basal level of LuxO, which suggests that LuxO does not always serve as the prominent regulator. We have added content to the discussion to highlight how LuxO may contribute more to the regulation.
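
      The OR-gate reasoning in this exchange can be made concrete with a deliberately simplified Boolean sketch. It is a toy model for illustration only, not the authors' model: the function name and its arguments are hypothetical, and it ignores the quantitative differences in Pqrr1 activity that the reviewer points out.

```python
def pqrr1_active(luxo_present, luxo_phosphorylated, sypg_present, sypg_phosphorylated):
    """Toy OR gate: Pqrr1 is 'on' if either bEBP is present and phosphorylated.

    Assumption: each regulator is treated as fully on or fully off, which
    hides the graded contributions seen in the reporter data.
    """
    luxo_on = luxo_present and luxo_phosphorylated
    sypg_on = sypg_present and sypg_phosphorylated
    return luxo_on or sypg_on


# Wild type under the tested conditions: LuxO active, SypG not phosphorylated.
wild_type = pqrr1_active(True, True, True, False)
# Deleting luxO in that background removes the only active input.
delta_luxo = pqrr1_active(False, False, True, False)
# With SypG phosphorylated (e.g. no BinK activity), luxO is no longer required.
delta_luxo_sypg_active = pqrr1_active(False, False, True, True)
```

      In this sketch, loss of Pqrr1 activity in a luxO deletion is compatible with an OR gate whenever the second input (SypG) is inactive, which is the point made in the response above.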

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Berryer et al describe a fully automated, scalable approach to quantify the number of synaptic inputs formed onto human iPSC-derived neurons (hNs) in 2D culture. They validate the sensitivity of their approach by synapsin1 knock-down and test almost 400 small molecules for their effect on synapses, and the role of astrocytes. They identify BET inhibitors as strong modifiers of synapse numbers in hNs and performed follow-up experiments to confirm the finding, characterize the effect further and demonstrate the critical role of astrocytes.

      Every step of the protocol is automated to achieve high reproducibility and homogeneity throughout the experiments. This automated approach has great potential for scaling up drug screening, genetic perturbations, and disease modeling experiments related to synapses.

      The authors successfully identified, in two independent hNs lines, three small-molecule inhibitors of transcription modifiers of the BET family as the strongest positive modifiers of synaptic inputs. The initial study performed with immunofluorescence was then validated by Western blot analysis and mRNA-seq analysis, which showed an increase in the expression of trans-synaptic signaling genes.

      While accessing the molecular mechanisms of BET inhibitors, the authors observed that the increased synaptic inputs occurred only in cocultures of astrocytes and neurons, and not in hNs monoculture. Finally, the authors report that the presence of astrocytes alone is a major driving force to promote synaptic inputs.

      Overall, the experiments are well conducted, and the conclusions are supported by the data. The new approach reaches beyond the current state of the field, especially in the first steps of automation and the identified modulators (BET inhibitors) are interesting and novel, and the subsequent validation is convincing.

      On the other hand, the manuscript does not yet define the exact resolution and power of the new methods and does not convincingly show that the observed synapsin puncta are synapses; the data of the validation experiments can also be improved.

      MAJOR POINTS:

      1) Although the manuscript contains a lot of quantitative data on variance, the current manuscript stops short of an exact definition of the resolution of the assay and its statistical power. With the real (measured) variance of the assay, the power to detect certain effects can be computed. To be relevant for other applications than the current (e.g. genetic perturbations and disease modelling), it is relevant to define this for smaller effects too: can this assay detect a 25% effect with reasonable numbers of observations? Such assessments can also provide important recommendations on when it makes sense to add more repeated measures of the same specimens (wells, ROIs) and when more independent inductions are required (and how much this adds to overall power). The manuscript would also benefit from a short discussion on how to optimize future study designs (repeated measures, independent inductions, number of subjects).

      As mentioned above, we have now calculated Cohen’s d for: (1) the primary screen overall as well as for the compounds included in the primary screen, (2) validation experiments performed in neuron monocultures and (3) validation experiments performed in neuron + astrocyte co-cultures, and these data have been added to Figure 5, Figure 5-figure supplement 1 and Supplementary File 2. For the validation experiments, we have also added a discussion of study design, given the observed effect sizes. These analyses are discussed in depth on pages 19-20 of the Results section and page 26 of the Discussion section in the PDF. In brief, we obtained a Cohen’s d of -0.18 for the primary screen, where individual small molecules increased as well as decreased synaptic density. Also from the primary screen, we obtained a Cohen’s d of 2.914 for JQ1 and 3.710 for I-BET151, indicating large effects for the BET inhibitors. We also noted large effects for BET inhibitors in the co-culture validation experiments, where we could have scaled down the number of fields and wells analyzed. While we were reasonably powered to detect changes in the monoculture validation experiments, effect sizes there were much smaller and required the 50+ wells that we analyzed in order to achieve 95% power. An example from Figure 5 shows well-level data for the co-culture and monoculture validation experiments.
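
      The link between effect size and the number of wells needed can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes Cohen's d with a pooled standard deviation and a two-sided, two-sample comparison under a normal approximation, and the function names are hypothetical. The response does not state which test or software was used for the power calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(treated, control):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treated) - mean(control)) / pooled


def wells_per_group(d, alpha=0.05, power=0.95):
    """Approximate wells per group needed to detect effect size d.

    Assumption: normal approximation for a two-sided, two-sample test,
    n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2. A t-based calculation
    gives slightly larger numbers for small n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / abs(d)) ** 2)
```

      Under these assumptions the required number of wells falls with the square of the effect size, which is consistent with the statement that large effects (such as those reported for the BET inhibitors) could be detected with fewer wells, whereas the smaller monoculture effects needed many more.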

      2) It is widely recognized that synapses formed in networks of NGN2-induced excitatory neurons only, may not model synapses in the real human brain very well (yet), especially not at DIV21. First, the authors can be more open/precise about this, e.g., in line 156 the authors indicate they use hNs at DIV21 because they are "electrophysiologically active" based on three references. However, (a) these references indicate that hNs cultures start to mature from DIV21 onwards but are not really mature yet, and (b) being "electrophysiologically active" seems not the most relevant criterion. Synaptic parameters like initial release probability, rise/decay time, and synchronicity are more relevant (none of which indicate synapses are mature at DIV21). Second, especially in the light of the claims the authors make regarding the effects of compounds on "synaptic connectivity" it seems essential to test, at least in a set of validation experiments, the distribution of postsynaptic markers. Synapsin-positive puncta may not be accompanied by a postsynaptic specialization and rather represent (mobile) vesicle clusters and/or release sites without postsynaptic partners. In addition, the authors claim synapsin1 is a pan-neuronal synapse marker. This is not yet validated for human neurons. A few control stainings with synaptic vesicle and active zone markers will secure this claim.

      We thank the reviewer for this comment and have now updated the text to indicate and expand on the fact that we are looking at immature synapses at day 21 in vitro (e.g., please see pages 8 and 12 of the Results section in the PDF).

      As mentioned above, we also tested conditions for four additional postsynaptic antibodies, drawing from those used in published studies of human cellular models (and species that would not cross-react with antibodies used for Synapsin1 and MAP2). Specifically, we tested antibodies against PSD-95, NLGN4, Homer1 and BAIAP2 at a range of concentrations in co-cultures generated from two independent cell lines. Of these antibodies, we only obtained quantifiable signal for PSD-95, while NLGN4, Homer1 and BAIAP2 appeared to be of poor quality in our culture systems (e.g., nonspecific signal, high signal in astrocytes, etc.). As shown below and in Figure 1-figure supplement 1, analysis of PSD-95 revealed that 43.1% of PSD-95 puncta on MAP2 also colocalized with synapsin1, and 28.8% of synapsin1 puncta on MAP2 also colocalized with PSD-95. Discussions of these data and limitations have been significantly elaborated upon on pages 10-11 of the Results section and pages 24 and 29 of the Discussion section in the PDF. For example, we discuss how the partial colocalization could be due both to the relative immaturity of the synapses discussed above (presynaptic assembly preceding postsynaptic assembly at this early stage of neuronal development) as well as the overall poorer quality of the PSD-95 signal in human cellular material (PSD-95 signal was of insufficient quality and consistency for screening applications and was generally quite difficult to resolve as compared to Synapsin1).
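
      The two colocalization percentages differ because each uses a different denominator (PSD-95 puncta in one case, synapsin1 puncta in the other). A minimal sketch of that calculation is given here, assuming puncta are reduced to centroid coordinates and counted as colocalized when a partner punctum lies within a fixed distance. The coordinates, the distance threshold and the function name are hypothetical and do not reproduce the authors' image-analysis pipeline.

```python
from math import hypot


def fraction_colocalized(reference, partner, max_distance):
    """Fraction of reference puncta with at least one partner punctum nearby."""
    if not reference:
        return 0.0
    hits = sum(
        1
        for (x, y) in reference
        if any(hypot(x - px, y - py) <= max_distance for (px, py) in partner)
    )
    return hits / len(reference)


# Hypothetical centroid coordinates (arbitrary units), for illustration only.
psd95 = [(0.0, 0.0), (5.0, 5.0)]
synapsin1 = [(0.1, 0.0), (9.0, 9.0), (12.0, 1.0), (3.0, 8.0)]

# Same overlapping pair, different denominators.
psd95_with_synapsin1 = fraction_colocalized(psd95, synapsin1, max_distance=0.5)
synapsin1_with_psd95 = fraction_colocalized(synapsin1, psd95, max_distance=0.5)
```

      Because synapsin1 puncta outnumber PSD-95 puncta in this toy example, the same overlapping pair yields a higher fraction when PSD-95 is the reference than when synapsin1 is.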

      Additionally, we tested two additional presynaptic antibodies, including synaptophysin and SV2A. Of these antibodies, we obtained reasonable quality signal for synaptophysin, which we have quantified in Figure 1-figure supplement 1. While SV2A also gave some signal, it was of poorer quality and difficult to reliably quantify. We observed roughly half of the Synapsin1 signal on MAP2 colocalizing with synaptophysin, and vice versa. Lack of complete colocalization could be due to reports that synapsin1 expression precedes synaptophysin expression in the cortex (e.g., Pinto et al 2013), reports that synaptophysin is also expressed at extra synaptic sites (e.g., Micheva et al 2010), or the reduced quality of staining for synaptophysin that we obtained compared with synapsin1. These data are now elaborated upon on pages 10-11 of the Results section and page 24 of the Discussion section in the PDF.

      We have also expanded our discussion of Synapsin1 as a presynaptic marker including additional references on the use of Synapsin1 to label cortical glutamatergic synapses in rodent (e.g., Micheva 2010) and the use of Synapsin1 on MAP2 as a pan-synaptic marker in human neurons (e.g., Chanda et al 2019, Pak et al 2015, Yi et al 2016; page 10). We have also included the use of Synapsin1 on MAP2 as a specific Limitation on page 29 where we discuss that reliance on this system in developing neurons may be capturing sites which do not then develop into fully functional synapses with postsynaptic partners.

      3) The analysis of the transcriptional effects of BET inhibitors is rather basic, especially given the rather strong claim: "BET inhibitors enhance synaptic gene expression programs". Which programs? Differentially expressed transcripts can at least be analysed further in terms of subcellular localization (pre/post) or synaptic functions, e.g. using SYNGO, also to address point 2 above.

      We thank the reviewer for this comment and have now incorporated SynGO analysis into Figure 6 to examine the synaptic ontology terms. As shown below, Figure 6g now includes the top 5 significantly enriched terms and Figure 6h shows the gene counts by cellular component. Here, we focused on genes upregulated after both JQ1 and Birabresib treatment compared with a background list of expressed genes. The most enriched synaptic ontology terms related to the post-synaptic membrane, so we also validated protein level changes in two postsynaptic proteins (Homer1 and BAIAP2) by Western blot analysis in Figure 6. In addition to Figure 6, these data are now included in Supplementary File 5 and discussed on page 22 of the Results section.
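
      Enrichment of an ontology term among upregulated genes, relative to a background list of expressed genes, is commonly assessed with a one-sided hypergeometric (Fisher-type) test. The sketch below illustrates that general idea only. It is not the SynGO implementation, the function name is hypothetical, and multiple-testing correction is omitted.

```python
from math import comb


def enrichment_p_value(hits, selected, annotated, background):
    """One-sided hypergeometric p-value, P(X >= hits).

    hits:       upregulated genes carrying the ontology term
    selected:   total upregulated genes tested
    annotated:  background genes carrying the term
    background: all expressed genes used as the background list
    """
    max_hits = min(selected, annotated)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total
```

      Restricting the background to expressed genes, as described above, matters because it sets `background` and `annotated`: a brain-enriched term can look artificially significant if tested against the whole genome.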

    1. Author Response:

      Reviewer #1 (Public Review):

      Roberts et al have developed a tool called "XTABLE" for the analysis of publicly available transcriptomic datasets of premalignant lesions (PML) of lung squamous cell carcinoma (LUSC). Detection of PMLs has clinical implications and can aid in the prevention of deaths by LUSC. Hence efforts such as this will be of benefit to the scientific community in better understanding the biology of PMLs.

      The authors have curated four studies that have profiled the transcriptomes of PMLs at different stages. While three of them are microarray-based studies, one study has profiled the transcriptome with RNA-seq. XTABLE fetches these datasets and performs analysis in an R shiny app (a graphical user interface). The tool has multiple functionalities to cover a wide range of transcriptomic analyses, including differential expression, signature identification, and immune cell type deconvolution.

      The authors have also included three chromosomal instability (CIN) signatures from the literature based on gene expression profiles. They showed one of the CIN signatures to be a good predictor of progression. However, this signature performed well only in one study. The authors have further utilised XTABLE to identify the signalling pathways in LUSC that are important for its developmental stages. They found that activation of the squamous differentiation and PI3K/Akt pathways plays a role in the transition from low- to high-grade PMLs.

      The authors have developed user-friendly software to analyse publicly available gene expression data from premalignant lesions of lung cancer. This would help researchers to quickly analyse the data and improve our understanding of such lesions. This would pave the way to improve early detection of PMLs to prevent lung cancer.

      Strengths:

      1. XTABLE is a nicely packaged application that can be used by researchers with very little computational knowledge.
      2. The tool is easy to download and execute. The documentation is extensive both in the article and on the GitLab page.
      3. The tool is user-friendly, and the tabs are intuitively designed for successive steps of analysis of the transcriptome data.
      4. The authors have properly elaborated on the biological interest in investigating PMLs and their clinical significance.

      Weaknesses:

      The article is focused on the development and the utility of the tool XTABLE. While the tool is nicely developed, the need for a tool focussing only on the investigation of PMLs is not justified. Several shiny apps and online tools exist to perform transcriptomic analysis of published datasets. To list a few examples - i) http://ge-lab.org/idep/ ; ii) http://www.uusmb.unam.mx/ideamex/ ; iii) RNfuzzyApp (Haering et al., 2021); iv) DEGenR (https://doi.org/10.5281/zenodo.4815134); v) TCC-GUI (Su et al., 2019). While some of these are specific to RNA-seq, there are plenty of such shiny apps to perform both RNA-seq and microarray data analysis. Any of these tools could also be used easily for the analysis of the four curated datasets presented in this article. The authors could have elaborated on the availability of other tools for such analysis and provided an explanation of the necessity of XTABLE. Since 3 of the 4 datasets they curated are from microarray technology, another good example of a user-friendly tool is NCBI GEO2R. This is integrated with the NCBI GEO database, and the user doesn't need to download the data or run any tools. iDEP-READS (http://bioinformatics.sdstate.edu/reads/) provide an online user-friendly tool to download and analyse data from publicly available datasets. Another such example is GEO2Enrichr (https://maayanlab.cloud/g2e/). These tools have been designed for non-bioinformatic researchers that don't involve downloading datasets or installing/running other tools.

      Two of these tools (IDEP and TCC-GUI) were covered in a literature review of 20 Shiny apps that we performed two years ago, before work on XTABLE started. Three of the suggested tools (IDEP, RNFuzzyApp, TCC-GUI) are for processing only RNA-seq datasets. IDEAMEX appears to be for RNA-seq data only and is severely limited in its downstream analysis capabilities. DEGenR appears to handle microarray datasets and features an option to retrieve data directly from GEO. However, it appears to be based on GEO2R (with additional downstream analyses): it automatically log-transforms already log-transformed data and, unlike GEO2R, offers no option to skip the log-transformation. A refreshed literature search focusing on microarray datasets highlighted three additional tools: iGEAK, which has not been updated in three years and seems to have compatibility issues on new Windows and Mac machines; sMAP, an upcoming Shiny app for microarray data posted on bioRxiv on 29 May 2022; and MAAP, which has the same issue of log-transforming already log-transformed data. iDEP-READS does not list the datasets used in XTABLE. GEO2Enrichr appears to require the counts table and experimental design in one file, performs a “characteristic direction” DEG test and outputs enriched pathways. These apps require not just downloading of datasets but also reformatting and renaming of expression data files and creation of additional files for setting up the DEG analysis, which is not practical for the number of samples we have (122, 63, 33, 448) even if these apps handled microarray data. XTABLE also incorporates AUC metrics, which are appropriate given the number of samples in each dataset, and tools known for adequately controlling FDR, neither of which is seen in other apps, as well as an emphasis on individual gene results and interrogation.
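
      The double log-transformation problem described above can be avoided by checking the value range before transforming. The sketch below uses a quantile heuristic of the kind found in GEO2R-style scripts. The thresholds and function names are assumptions made for illustration, and it is not a description of how XTABLE or any of the listed apps handle this step.

```python
from math import log2


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def looks_unlogged(values):
    """Heuristic: raw intensities have a very large top end or a wide range.

    Assumed thresholds: 99th percentile above 100, or a range above 50
    with a positive lower quartile.
    """
    ordered = sorted(values)
    q0, q25, q99, q100 = (quantile(ordered, q) for q in (0.0, 0.25, 0.99, 1.0))
    return q99 > 100 or (q100 - q0 > 50 and q25 > 0)


def log2_if_needed(values):
    """Apply log2 only when the data do not already look log-transformed."""
    if not looks_unlogged(values):
        return list(values)
    return [log2(v) if v > 0 else float("nan") for v in values]
```

      With such a guard, applying the step twice leaves already log-transformed values unchanged, which is the behaviour the response reports as missing from some of the apps.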

      A new paragraph in the Discussion section (lines 361-370) addresses the potential use of existing applications instead of XTABLE.

      Secondly, XTABLE doesn't provide a solution to integrate the four datasets incorporated in the tool. One can only analyse one dataset at a time with XTABLE. The differences in terms of methodology and study design within these four datasets have been elaborated on in the article. However, attempts to integrate them were lacking.

      We repeatedly considered different strategies of integrating the analysis of the four datasets and we always reached the conclusion that it was hardly going to offer any advantage, or that it might be counterproductive.

      Integration can occur at multiple levels. One possibility is to carry out the same analysis (e.g. expression of a given gene in two groups of samples) in all datasets. Since the design and methodologies of the four studies differ substantially (different stages, different definitions of progression status, etc), a unique stratification for all datasets is not possible. Moreover, interrogating the four datasets simultaneously would slow the analysis, with no significant advantage in terms of speed. Another possibility is the integration of results in the same output, for instance, a single chart with the expression of a given gene in multiple subgroups of the four datasets. We think that, because of the differences in design, the results from each cohort should be kept separate and then compared with a similar analysis from the other datasets. Scientifically, this is the best way to proceed as it avoids confusion.

      Nevertheless, XTABLE allows the export of data for further analysis. The user can use this option to integrate data using other applications or statistical packages.

      We do understand the attractiveness of integrating the four datasets, and we seriously considered it. But there is a fine balance between user-friendliness, flexibility, and scientific rigour. We think that XTABLE achieves this balance. Increasing integration of datasets might lead to errors and wrong conclusions due to biological and methodological differences between studies. We believe that comparing analyses obtained independently from the four cohorts is the most sensible way to proceed.

      We propose to discuss these aspects accordingly.

      The integrative analysis of two or more datasets has been discussed in a new paragraph (lines 382-391).

      The tool also lacks the flexibility for users to add more datasets. This would be helpful when there are more datasets of PMLs available publicly.

      This was also a permanent topic for discussion while designing XTABLE. Creating a tool that could be used to analyse other cohorts of precancerous lesions, while maintaining the ease of use was certainly a challenge. We had to adapt XTABLE to the characteristics of each one of the four databases: specific stratification criteria, different nomenclatures for the different sample types, etc. Designing a shiny app that can be adapted to other present or future datasets without the need of changing the code is simply not practical.

      The flexibility that these other Shiny apps incorporate to analyse any RNA-seq dataset requires the contrasts used for the differentially expressed gene analysis be manually defined. IDEP requires an experimental design file where sample names in the counts file must match exactly the sample names in this experimental design file and pre-processing visualisation is limited to the first 100 samples. RNFuzzyApp is similar but we could not format the experimental design file in a way that did not result in the app crashing upon upload. TCC-GUI requires all the sample names to be renamed to the contrast group with the addition of the replicate number. Apps that allow datasets to be uploaded do not have a practical or easy way to set up the DEG analysis of more than a couple dozen samples.

      Future versions of XTABLE can be updated to include additional curated PML datasets that would enhance hypothesis generation upon request. Importantly, the code is freely available and can be modified by other scientists to add their cohorts of interest, although we agree that a high level of expertise in coding will be needed. We propose to add these considerations to the text.

      The possibilities of expansion of XTABLE to new databases are discussed in lines 392-398

      Understanding the biology of PML progression would require a multi-omics approach. XTABLE analyses transcriptome data and lacks integration of other omics data. The authors mention the availability of data from whole exome, methylation, etc from the four studies they have selected. However, apart from the CIN scores, they haven't integrated any of the other layers of omics data available.

      Only one dataset (GSE108104) contains whole-exome sequencing and methylation data. We considered that a multi-omics approach in XTABLE would result in an overcomplicated application. As far as early detection and biomarker discovery are concerned, transcriptomic data is the most interesting parameter.

      Also discussed in lines 382-391

      Lastly, the authors could have elaborated on the limitations of the tool and their analysis in the discussion.

      We propose to raise these limitations accordingly in the discussion.

      See above.

      Reviewer #2 (Public Review):

      In this manuscript, Roberts et al. present XTABLE, a tool to integrate, visualise and extract new insights from published datasets in the field of preinvasive lung cancer lesions. This approach is critical and to be highly commended; whilst the Cancer Genome Atlas provided many insights into cancer biology it was the development of accessible visualisation tools such as cbioportal that democratised this knowledge and allowed researchers around the world to interrogate their genes and pathways of interest. XTABLE is trying to do this in the preinvasive space and should certainly be commended as such. We are also very impressed by the transparency of the approach; it is quite simple to download and run XTABLE from their Gitlab account, in which all data acquisition and analysis code can be easily interrogated.

      We would however strongly advocate deploying XTABLE to a web-accessible server so that researchers without experience in R and git can utilise it. We found it a little buggy running locally and cannot be sure whether this is due to our setup or the code itself. Some issues clearly need development; Progeny analysis brings up a warning "Not working for GSE109743 on the server and not sure why". GSEA analysis does not seem to work at all, raising an error "Length information for genome hg38 and gene ID ensGene is not available". In such relatively complex software, some such errors can be overlooked, as long as the authors have a clear process for responding to them, for example using Gitlab issue reporting. Some acknowledgement that this is an ongoing development would be helpful.

      We thank the reviewer for these comments. We will inspect the code to address those warnings, implement a system for issue reporting, and add the acknowledgements suggested by the reviewer. Regarding the deployment of XTABLE to a web-accessible server, this could present a challenge in the long term, as computing resources would need to be allocated for years, with the economic cost that this involves.

      The code has been inspected to remove the warnings and errors pointed out by the reviewer.

      The authors discuss some very important differences between the datasets in the text. Most notably they differ in endpoints and in the presence of laser capture. We would advocate including some warning text within the XTABLE application to explain these. For example, the "persistent/progressive" endpoint used in Beane et al (next biopsy is the same or higher grade) is not the same as the "progressive" endpoint in Teixeira et al (next biopsy is cancer); samples defined as "persistent/progressive" may never progress to cancer. This may not be immediately obvious to a user of XTABLE who wishes to compare progressive and regressive lesions. Similarly, the use of laser capture is important; the authors state that not using laser capture has the advantage of capturing microenvironment signals, but differentiating between intra-lesional and stromal signals is important, as shown in the Mascaux and Pennycuick papers. The authors cannot do much about the different study designs, but as the goal is to make these data more accessible, we think some brief description of these issues within the app would help to prevent non-expert users from drawing incorrect conclusions.

      The authors themselves illustrate this clearly in their analysis of CIN signatures in progression potential. They observe that there is a much clearer progressive/regressive signal in GSE108124 compared to GSE114489 and GSE109743. This does not seem at all surprising, since the first study used a much stricter definition of progression - these samples are all about to become cancer whereas "progressive" samples in GSE109743 may never become cancer - and are much enriched for CIN signals due to laser capture. Their discussion states "CIN scores as a predictor of progression might be limited to microdissected samples and CIS lesions"; you cannot really claim this when "progression" in the two cohorts has such a different meaning. To their credit, the authors do explain these issues but they really should be clearly spelled out within the app.

      This is a very good point. We will add warning text about the differences between studies regarding the definition of progression potential and the differences in sample processing (LCM or not) so that the user is permanently aware of the differences between cohorts.

      A new tab (Dataset) has been added with a table of the methodologies used in each study and the differences in progression status definitions. Additionally, we emphasized these differences in the main text of the manuscript (lines 296-300 and 403-409).

      We are not sure we agree with their analysis of CDK4/Cyclin-D1 and E2F expression in early lesions. The authors claim these are inhibited by CDKN2A and therefore are markers of CDKN2A loss of function. But these genes are markers of proliferation and can be driven by a range of proliferative processes. Histologically, low-grade metaplasias and dysplasias all represent proliferative epithelium when compared to normal control, but most never become cancer. It is too much of a leap to say that these are influenced by CDKN2A because that gene is inactivated in LUSC; do the authors have any evidence that this gene is altered at the genomic level in low-grade lesions?

      We are grateful for this comment. There is currently no evidence that CDKN2A mutations occur in low-grade lesions and therefore we cannot argue that the CDK4/Cyclin-D1 and E2F expression signatures are the result of CDKN2A inactivation in low-grade lesions. We propose to modify the text to introduce these caveats to our conclusion and make our interpretations more accurate.

      We have modified the discussion (lines 443-454) to address the interpretation of our results regarding the connection between CDKN2A inactivation and the CDK4/cyclin-D1 and E2F signatures. We now focus our conclusions on the pathway itself and we mention Cyclin-D1 and CDKN2A alterations as potential modulators of the changes in the pathway, while leaving the discussion open to other drivers.

      Overall this tool is an important step forwards in the field. Whilst we are a little unconvinced by some of their biological interpretations, and the tool itself has a few bugs, this effort to make complex data more accessible will be greatly enabling for researchers and so should be commended. In the future, we would like to see additional molecular data integrated into this app, for example, the whole genome and methylation data mentioned in line 153. However, we think this is an excellent start to combining these datasets.

    1. Author Response

      Reviewer #1 (Public Review):

      Determination of the biomechanical forces and downstream pathways that direct heart valve morphogenesis is an important area of research. In the current study, potential functions of localized Yap signaling in cardiac valve morphogenesis were examined. Extensive immunostainings were performed for Yap expression, but Yap activation status as indicated by nuclear versus cytoplasmic localization, Yap dephosphorylation, or expression of downstream target genes was not examined.

      We thank the reviewer for appreciating the significance of this work, and we also thank the reviewer for the constructive suggestions. Following these suggestions, we have improved the analysis of YAP activation status and used nuclear versus cytoplasmic localization to quantify YAP activation. To address the reviewer’s concerns, we have conducted additional qPCR analysis of YAP downstream target genes and of YAP upstream genes in the Hippo pathway. Please find the detailed revisions in our responses to the Recommendations for authors.
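      As an illustration of how nuclear versus cytoplasmic localization can be turned into a number, the sketch below computes a nuclear-to-cytoplasmic intensity ratio for a single cell. It is a generic example, not the analysis pipeline used in the manuscript: the function name, the flat per-pixel representation, and the example intensities are all assumptions, and segmentation of the nucleus and cytoplasm is taken as already done.

```python
def nuclear_cytoplasmic_ratio(intensity, nuclear_mask, cytoplasm_mask):
    """Mean nuclear intensity divided by mean cytoplasmic intensity.

    All three arguments are flat sequences of equal length, one entry per
    pixel; the masks hold truthy values for pixels inside the compartment.
    """
    nuclear = [v for v, m in zip(intensity, nuclear_mask) if m]
    cytoplasm = [v for v, m in zip(intensity, cytoplasm_mask) if m]
    if not nuclear or not cytoplasm:
        raise ValueError("both masks must select at least one pixel")
    return (sum(nuclear) / len(nuclear)) / (sum(cytoplasm) / len(cytoplasm))


# Hypothetical four-pixel cell: two bright nuclear pixels, two dim cytoplasmic ones.
ratio = nuclear_cytoplasmic_ratio([10, 10, 2, 2], [1, 1, 0, 0], [0, 0, 1, 1])
print(ratio)
```

      A ratio above 1 indicates nuclear enrichment. Any cut-off for calling a cell YAP-active would have to be chosen and justified separately.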

      The goal of the work was to determine Yap activation status relative to different mechanical environments, but no biomechanical data on developing heart valves were provided in the study.

      We thank the reviewer for raising this concern. We have previously published the biomechanical data of developing chick embryonic heart valves in the following study:

      Buskohl PR, Gould RA, Butcher JT. Quantification of embryonic atrioventricular valve biomechanics during morphogenesis. Journal of Biomechanics. 2012;45(5):895-902.

      In that study, we used micropipette aspiration to measure the nonlinear biomechanics (strain energy) of chick embryonic heart valves at different developmental stages. Here in this study, we used the same method to measure the strain energy of YAP activated/inhibited cushion explants and compared it to the data from our previous study. Our findings were summarized in the Results: “YAP inhibition elevated valve stiffness”, and the detailed measurements, including images and data, are presented in Figure S4.

      There are several major weaknesses that diminish enthusiasm for the study.

      1) The Hippo/Yap pathway activation leads to dephosphorylation of Yap, nuclear localization, and induced expression of downstream target genes. However, there are no data included in the study on Yap nuclear/cytoplasmic ratios, phosphorylation status, or activation of other Hippo pathway mediators. Analysis of Yap expression alone is insufficient to determine activation status since it is widely expressed in multiple cells throughout the valves. The specificity for activated Yap signaling is not apparent from the immunostainings.

      We thank the reviewer for pointing out this weakness. We have now implemented nuclear versus cytoplasmic localization as recommended to quantify YAP activation. We have also conducted additional qPCR experiments to analyze YAP downstream target genes and YAP upstream genes in the Hippo pathway. Please see the detailed revisions in our responses to the Recommendations for authors.

      2) The specific regionalized biomechanical forces acting on different regions of the valves were not measured directly or clearly compared with Yap activation status. In some cases, it seems that Yap is not present in the nuclei of endothelial cells surrounding the valve leaflets that are subject to different flow forces (Fig 1B) and the main expression is in valve interstitial subpopulations. Thus the data presented do not support differential Yap activation in endothelial cells subject to different fluid forces. There is extensive discussion of different forces acting on the valve leaflets, but the relationship to Yap signaling is not entirely clear.

      We thank the reviewer for these important questions. The region-specific biomechanics have been well mapped and studied, thanks to the help from Computational Fluid Dynamics supported by ultrasound velocity and pressure measurements. For example:

      Yalcin, H.C., Shekhar, A., McQuinn, T.C. and Butcher, J.T. (2011), Hemodynamic patterning of the avian atrioventricular valve. Dev. Dyn., 240: 23-35.

      Bharadwaj KN, Spitz C, Shekhar A, Yalcin HC, Butcher JT. Computational fluid dynamics of developing avian outflow tract heart valves. Ann Biomed Eng. 2012 Oct;40(10):2212-27. doi: 10.1007/s10439-012-0574-8.

      Ayoub S, Ferrari G, Gorman RC, Gorman JH, Schoen FJ, Sacks MS. Heart Valve Biomechanics and Underlying Mechanobiology. Compr Physiol. 2016 Sep 15;6(4):1743-1780.

      Salman HE, Alser M, Shekhar A, Gould RA, Benslimane FM, Butcher JT, et al. Effect of left atrial ligation-driven altered inflow hemodynamics on embryonic heart development: clues for prenatal progression of hypoplastic left heart syndrome. Biomechanics and Modeling in Mechanobiology. 2021;20(2):733-50.

      Ho S, Chan WX, Yap CH. Fluid mechanics of the left atrial ligation chick embryonic model of hypoplastic left heart syndrome. Biomechanics and Modeling in Mechanobiology. 2021;20(4):1337-51.

      Those studies have shown that unidirectional shear stress (USS) develops on the inflow surface of valves while oscillatory shear stress (OSS) develops on the outflow surface, and that compressive stress (CS) develops in the tip region of valves while tensile stress (TS) develops in the regions of elongation and compaction. In this study, we mimic those forces in our in-vitro and ex-vivo models. This allows us to study the direct effect of each specific force on YAP activity in different cell lineages. The results showed that OSS promoted YAP activation in VECs while USS inhibited it, and that CS promoted YAP activation in VICs while TS inhibited it. These results explain the spatiotemporal distribution of YAP activation in Figure 1. For example, nuclear YAP was mostly found in VECs on the fibrosa side, where OSS develops, and YAP was not expressed in the nuclei of VECs on the atrialis/ventricularis side, where USS develops. It is also worth noting that formation of OSS on the outflow side is slower, and thus the side-specific YAP activation in VECs was not in effect at the early stage, from E11.5 to E14.5.

      3) The requirement for Yap signaling in heart valve remodeling as described in the title was not demonstrated through manipulation of Yap activity.

      With respect, it is unclear what the reviewer is asking for, given that no experiments are suggested and no alternative interpretations of our results that argue against a requirement for YAP are elaborated. It has been previously shown that YAP signaling is required for early EMT stages of valvulogenesis using conditional YAP deletion in mice:

      Zhang H, von Gise A, Liu Q, Hu T, Tian X, He L, et al. Yap1 Is Required for Endothelial to Mesenchymal Transition of the Atrioventricular Cushion. Journal of Biological Chemistry. 2014;289(27):18681-92.

      Signaling roles for early regulators at these later fetal stages are different from, and sometimes opposite to, their roles at early EndMT stages, thus contraindicating reliance on these early data to explain later events:

      Bassen D, Wang M, Pham D, Sun S, Rao R, Singh R, et al. Hydrostatic mechanical stress regulates growth and maturation of the atrioventricular valve. Development. 2021;148(13).

      However, embryos with YAP deletion failed to form endocardial cushions and could not survive long enough for the study of its roles in later cushion growth and remodeling into valve leaflets. In this work, we first showed the localization of YAP activity and its direct link with local shear or pressure domains. Then we explicitly applied controlled gain and loss of function of YAP via specific molecules. We also applied critical mechanical gain or loss of function studies to demonstrate YAP mechanoactivation necessity and sufficiency to achieve growth and remodeling.

      Reviewer #2 (Public Review)

      This study by Wang et al. examines changes in YAP expression in embryonic avian cultured explants in response to high and low shear stress, as well as tensile and compressive stress. The authors show that YAP expression is increased in response to low, oscillatory shear stress, as well as high compressive stress conditions. Inhibition of YAP signaling prevents compressive stress-induced increases in circularity, decreased pHH3 expression, and increases VE-cadherin expression. On the other hand, YAP gain of function prevents tensile stress-induced decreases in pHH3 expression and VE-cadherin expansion. It also decreases the strain energy density of embryonic avian cushion explants. Finally, using an avian model of left atrial ligation, the authors demonstrate that unloaded regions within the primitive valve structures are associated with increased YAP expression, compared to regions of restricted flow where YAP expression is low. Overall, this study sheds light on the biomechanical regulation of YAP expression in developing valves.

      We thank the reviewer for the accurate summary and their enthusiasm for this work.

      Strengths of the manuscript include:

      • Novel insights into the dynamic expression pattern of YAP in valve cell populations during post-EMT stages of embryonic valvulogenesis.

      • Identify the positive regulation of YAP expression in response to low, oscillatory shear stress, as well as high compressive stress conditions.

      • Identify a link between YAP signaling in regulating stress-induced cell proliferation and valve morphogenesis.

      • The inclusion of the left atrial ligation model is innovative, and the data showing distinguishable YAP expression levels between restricted and non-restricted flow regions is insightful.

      We thank the reviewer for appreciating the strengths of this work.

      This is a descriptive study that focuses on changes in YAP expression following exposure to diverse stress conditions in embryonic avian cushion explants. Overall, the study currently lacks mechanistic insights, and conclusions based on data are highly over-interpreted, particularly given that the majority of experimental protocols rely on one method of readout.

      We thank the reviewer for constructive suggestions.

      Reviewer #3 (Public Review)

      In this manuscript, Wang et al. assess the role of wall shear stress and hydrostatic pressure during valve morphogenesis at stages where the valve elongates and takes shape. The authors elegantly demonstrate that shear and pressure have different effects on cell proliferation by modulating YAP signaling. The authors use a combination of in vitro and in vivo approaches to show that YAP signaling is activated by hydrostatic pressure changes and inhibited by wall shear stress.

      We thank the reviewer for their enthusiasm for the impact of our work.

      There are a few elements that would require clarification:

      1) The impact of YAP on valve stiffness was unclear to me. How is YAP signaling affecting stiffness? Is it through cell proliferation changes? I was unclear about the model put forward:

      • Is it cell proliferation (does cell proliferation fluidize tissue, while non-proliferating tissue is stiffer)?

      • Is it through differential gene expression?

      This needs clarification.

      We thank the reviewer for raising this important question. Cell proliferation can affect valve stiffness but is a minor factor compared with ECM deposition and cell contractility. Our micropipette aspiration data showed that the higher cell proliferation rate induced by YAP activation did lead to stiffer valves when compared to the controls. This may be because at the early stages, cells are more elastic than the viscous ECM. However, the stiffness of YAP activated valves was only about half of that of YAP inhibited valves, showing that the transcriptional level factor plays a more important role. This also suggests that YAP inhibited valves exhibited a more mature phenotype. An analogous role of YAP has also been found in cardiomyocytes. Many theories propose that in cardiomyocytes when YAP is activated the proliferation programs are turned on, while when YAP is inhibited the proliferation programs are turned off and maturation programs are released. Similarly, here we hypothesize that YAP works like a mechanobiological switch, converting mechanical signaling into the decision between growth and maturation. We have revised the Discussion to include this hypothesis.

      2) The model proposes an early asymmetric growth of the cushion leading to different shear forces (oscillatory vs unidirectional shear stress). What triggers the initial asymmetry of the cushion shape? Is YAP involved?

      Although the initial geometry of the cushion model is symmetric, the force acting on it is asymmetric. The detailed numerical simulation of how the initial forces trigger the asymmetric morphogenesis can be found in our previous publication:

      Buskohl PR, Jenkins JT, Butcher JT. Computational simulation of hemodynamic-driven growth and remodeling of embryonic atrioventricular valves. Biomechanics and Modeling in Mechanobiology. 2012;11(8):1205-17.

      The color maps represent the dilatation rates when a) only pressure is applied, b) only shear stress is applied, and c) both pressure and shear stress are applied. It is such load that initiates an asymmetric morphological change, as shown in d). In addition, we believe YAP is involved during the initiation because it is directly nuclear activated by CS and OSS or cytoplasmically activated by TS and LSS.

      3) The differential expression of YAP and its correlation to cell proliferation is a little hard to see in the data presented. Drawings highlighting the main areas would help the reader to visualise the results better.

      We thank the reviewer for this helpful suggestion. We have improved the visualization of Figure 3C and Figure 4C with insets of higher magnification.

      4) The origin of osmotic/hydrostatic pressure in vivo. While shear is clearly dependent upon blood flow, it is less clear that hydrostatic pressure is solely dependent upon blood flow. For example, it has been proposed that ECM accumulation such as hyaluronic acid could modify osmotic pressure (see for example Vignes et al., PMID: 35245444). Could the authors clarify the following questions:

      • How does blood flow affect osmotic pressure in vivo?

      • Is ECM a factor that could affect osmotic pressure in this system?

      We thank the reviewer for sharing this interesting study. Osmotic pressure plays a critical role in mechanotransduction and the development of many tissues including cardiovascular tissues and cartilage. As proposed in the reference, osmotic pressure is an interstitial force generated by cardiac contractility. The hydrostatic pressure in our study is different: it is an external force applied by flowing blood. According to Bernoulli's law, when an incompressible fluid flows around a solid, the static pressure it applies on the solid is equal to its total pressure minus its dynamic pressure.
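      The Bernoulli relation quoted above can be made concrete with a short calculation. This is a sketch only: it uses the standard expression for dynamic pressure, one half of the fluid density times the square of the velocity, and the numerical values (total pressure, density, velocity) are illustrative assumptions, not measurements from our study.

```python
def static_pressure(total_pressure, density, velocity):
    """Static pressure (Pa) = total pressure - dynamic pressure (0.5 * rho * v^2)."""
    dynamic_pressure = 0.5 * density * velocity ** 2
    return total_pressure - dynamic_pressure


# Illustrative values only (assumed, not measured):
# total pressure 1000 Pa, fluid density 1060 kg/m^3, flow velocity 0.5 m/s.
p_static = static_pressure(1000.0, 1060.0, 0.5)
print(p_static)
```

      For a fixed total pressure, faster flow leaves less static pressure acting on the tissue surface, which is why the hydrostatic load on a valve depends on the local flow velocity.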

      Despite the difference, the osmotic pressure can mimic the effect of hydrostatic pressure in-vitro. The in-vitro osmotic pressure model has been widely used in cartilage research, for example:

      P. J. Basser, R. Schneiderman, R. A. Bank, E. Wachtel, and A. Maroudas, “Mechanical properties of the collagen network in human articular cartilage as measured by osmotic stress technique.,” Arch. Biochem. Biophys., vol. 351, no. 2, pp. 207–19, 1998.

      D. a. Narmoneva, J. Y. Wang, and L. a. Setton, “Nonuniform swelling-induced residual strains in articular cartilage,” J. Biomech., vol. 32, no. 4, pp. 401–408, 1999.

      C. L. Jablonski, S. Ferguson, A. Pozzi, and A. L. Clark, “Integrin α1β1 participates in chondrocyte transduction of osmotic stress,” Biochem. Biophys. Res. Commun., vol. 445, no. 1, pp. 184–190, 2014.

      Z. I. Johnson, I. M. Shapiro, and M. V. Risbud, “Extracellular osmolarity regulates matrix homeostasis in the intervertebral disc and articular cartilage: Evolving role of TonEBP,” Matrix Biol., vol. 40, pp. 10–16, 2014.

      When maturing cushions shift from GAG-dominated ECM to collagen-dominated ECM, the water and ion retention capacity of the tissue changes greatly, which reduces the osmotic pressure. This could in turn accelerate the maturation of cushions. By contrast, the ECM of growing cushions remains GAG-dominated, which would delay maturation and prolong growth.

      The revised second section of Results is as follows:

      Shear and hydrostatic stress regulate YAP activity

      In addition to being a co-effector of the Hippo pathway, YAP is also a key mediator in mechanotransduction. Indeed, the spatiotemporal activation of YAP correlated with the changes in the mechanical environment. During valve remodeling, unidirectional shear stress (USS) develops on the inflow surface of valves, where YAP is rarely expressed in the nuclei of VECs (Figure 2A). On the other side, oscillatory shear stress (OSS) develops on the outflow surface, where VECs with nuclear YAP localized. The YAP activation in VICs also correlated with hydrostatic pressure. The pressure generated compressive stress (CS) in the tips of valves, where VICs with nuclear YAP localized (Figure 2B), whereas tensile stress (TS) was created in the elongated regions, where YAP was absent in VIC nuclei.

      To study the effect of shear stress on the YAP activity in VECs, we applied USS and OSS directly onto a monolayer of freshly isolated VECs. The VECs were obtained from AV cushions of chick embryonic hearts at HH25. The cushions were placed on collagen gels with endocardium adherent to the collagen and incubated to enable the VECs to migrate onto the gel. We then removed the cushions and immediately applied the shear flow to the monolayer for 24 hours. The low stress OSS (2 dyn/cm²) promoted YAP nuclear translocation in VECs (Figure 2C, E), while high stress USS (20 dyn/cm²) restrained YAP in the cytoplasm.

      To study the effect of hydrostatic stress on the YAP activation in VICs, we used media with different osmolarities to mimic the CS and TS. CS was induced by hypertonic condition while TS was created by hypotonic condition, and the Unloaded (U) condition refers to the osmotically balanced media. Notably, in-vivo hydrostatic pressure is generated by flowing blood, while in-vivo osmotic pressure is generated by cardiac contractility and plays a critical role in the mechanotransduction during valve development (30). Despite their different in-vivo origins, the osmotic pressure provides a reliable model to mimic the hydrostatic pressure in-vitro (31). We cultured HH34 AV cushion explants under different loading conditions for 24 hours and found that the trapezoidal cushions adopted a spherical shape (Figure 2D). TS loaded cushions significantly compacted, and the YAP activation in VICs of TS loaded cushions was significantly lower than that in CS loaded VICs (Figure 2F).

    1. Author Response

      Reviewer #2 (Public Review):

      The idea of using fluorescently labeled tandem SH2 domains to target tagged RTKs is brilliant and could potentially provide a powerful new way to assess the activation of RTKs in situ and in multiple physiological contexts. Thus, it was disappointing that there was insufficient characterization of the system to be able to interpret the data it generates. Although the paper shows that tagging the EGFR appears to have minimal impact on its biological activity, the readout for receptor kinase activity is % clearance of the fluorescent reporter tag from the cytosol. Such clearance is likely to depend on a variety of different factors, including the ratio of tagged receptors to probe, the number of functional pools in which the probe exists, the exchange rate between these pools, and the affinity of the probes for the tagged receptor. Without determining how each of these factors impacts % clearance, it is difficult to interpret either the dose-response curves or response kinetics.

      We appreciate the reviewer’s point that the paper would be improved by a thorough analysis of how membrane translocation depends on our biosensor’s expression levels. We have attempted to address this thoroughly in our response to the Editor’s summary comments above. Briefly, we have now added 3 new supplementary figures (Figures S2-S4) in which we quantify ZtSH2 translocation as a function of expression levels. We find that the ratio of EGFR/ZtSH2 expression predicts the extent of ZtSH2 translocation in both NIH3T3 and HEK293T cells, matching results from our computational model. We have also added a new section to the main text to clearly explain these results (Lines 190-235). We hope that these data clarify the design constraints for two-component biosensors of this type.
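      To give an intuition for why the receptor-to-reporter ratio matters in a two-component biosensor, the sketch below uses a textbook 1:1 equilibrium binding model. It is not the computational model used in the manuscript: the function name, the single binding site per receptor, and the parameter values are all assumptions made for illustration.

```python
import math


def fraction_reporter_bound(receptor_total, reporter_total, kd):
    """Fraction of reporter bound at equilibrium for 1:1 binding.

    Solves C^2 - (R + S + Kd) * C + R * S = 0 for the complex C and
    returns C / S. Concentrations and Kd share the same arbitrary units.
    """
    b = receptor_total + reporter_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * receptor_total * reporter_total)) / 2.0
    return complex_conc / reporter_total


# Hypothetical values: reporter fixed at 1.0, Kd = 0.1, receptor level varied.
for receptor in (0.25, 0.5, 1.0, 2.0):
    print(receptor, round(fraction_reporter_bound(receptor, 1.0, 0.1), 3))
```

      In this simplified picture, the bound fraction cannot exceed the receptor-to-reporter ratio when the reporter is in excess, so a cell expressing far more reporter than receptor shows little cytosolic clearance even when every receptor is occupied.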

      For example, the difference in activation kinetics between EGFR and ErbB2 is very interesting, but the almost instantaneous rise (Fig S4B) is very surprising. The kinetics of activation of the EGFR have been extensively studied by mass-spectrometry and are generally limited by ligand binding, which has a characteristic time of several minutes, not seconds (pmid: 26929352; pmid: 1975591). Thus, such a response is suggestive of a freely exchanging ZtSH2 reporter pool that is mostly depleted in seconds with the slow secondary kinetics reflecting a slowly exchanging ZtSH2 reporter pool. Alternately, the cells could be accumulating an intracellular pool of activated receptors over time. That the authors are using concentrations of EGF >100-fold physiological levels (pmid: 29268862) further complicates the interpretation of these experiments.

      We thank the reviewer for bringing these papers to our attention. However, we strongly disagree with their interpretation of the results. In a paper cited by the reviewer (PMID:26929352), phosphotyrosine responses are extremely fast, with phosphorylation occurring within tens of seconds even in response to 20 nM EGF (see Figure 2 from Reddy et al PNAS 2016). Reddy et al further claim in their abstract “Significant changes were observed on proteins far downstream in the network as early as 10 s after stimulation.” While the timescale of EGFR phosphorylation may be of some debate, the response timescale we observe is consistent with previously published observations.

      It is also important to point out that the secondary gradual rise of ZtSH2 recruitment is only observed upon treatment with EGF, not EREG or EPGN (Figure 3A). The gradual rise can also be observed upon treatment with EREG in the presence of a GBM-associated EGFR mutation that alters receptor dimerization (Figure 3E). These data indicate that the secondary rise is not an intrinsic feature of the ZtSH2 reporter, and instead represents a feature of ligand-receptor activation itself.

      The reviewer suggests that perhaps there is some internal pool of ZtSH2 or EGF, but we find no evidence for such a pool in our microscopy imaging. To clarify this point to the reader, we have now added a new supplementary figure (Figure S6) showing representative cells for all stimulation conditions used in Figure 3A, showing consistent, high levels of EGFR and ZtSH2 enrichment at the plasma membrane and uniform cytosolic intensity for at least 30 min after stimulation across all ligands.

      Finally, while the reviewer mentions the use of high EGF doses in our paper, we would like to point out that we performed extensive experiments at other doses in the manuscript, testing 14 total doses of three EGFR ligands in Figure 3, and present additional data at 20 ng/mL EGF throughout Figures 2, S2, and S7. It is also very important to test high input doses for our negative controls to ensure that the ZtSH2 biosensor retains specificity for ITAM sequences and fails to show recruitment to untagged EGFR even under saturating conditions. It is also quite customary in the field: for example, the Erk KTR paper that the reviewer mentions in a later comment (Regot et al, Cell 2014) exclusively tests their biosensors using saturating doses of 50 ng/mL anisomycin, 100 ng/mL FGF, and 10 μM forskolin to characterize p38, Erk and PKA biosensor responses.

      There is also insufficient attention paid to either controlling or measuring important parameters, such as expression levels of tagged receptors or levels of endogenous receptors. 3T3 cells, contrary to the statement of the authors, do not have "negligible" numbers of EGFR: they have ~40K, which is typical for mouse fibroblasts. This is much higher than MCF7 cells, which are frequently used as a model system to study EGFR responses. Yet they do not see transactivation of their ErbB2 construct in 3T3 cells without expressing additional EGFR (Fig. 4C), suggesting low sensitivity of the assay. Conversely, they show a significant response mediated by endogenously tagged EGFR in HEK 293 cells, which are frequently used as an EGFR-negative cell line (PMID: 26368334). This indicates that their assay is extremely sensitive. Which is it? As mentioned above, it likely depends on the expression level and affinity of the different components of their system.

      After extensive searching we have not found any publications with an estimate as high as 40K EGFR receptors/cell in NIH3T3 cells. Livneh et al 1986 report that NIH3T3 cells express as few as 500 EGFR receptors per cell and do not respond mitogenically to EGF, and subsequent Schlessinger lab papers use NIH3T3 cells as an EGFR-null background for introduction of receptor variants. Eierhoff et al PLOS Pathogens 2010 use NIH3T3s as an EGFR-null control, showing immunoblot data of undetectable pEGFR responses. The paper we found with the highest stated EGFR expression per cell in NIH3T3 cells is Verbeek et al, FEBS Lett 1998, which reports a value of 3,000 receptors per cell, but does so without any literature citation or measurement. These references are consistent with our experience: over nearly a decade of MAPK signaling experiments in the lab, we have only seen weak or undetectable EGF-stimulated responses in unmodified NIH3T3s, depending on the assay. We are quite confident that more potent responses are elicited in HEK293T cells, where we observe EGFR expression by fluorescence imaging of CRISPR-tagged cells, immunofluorescence staining, and immunoblotting, and where we observe robust signaling responses using biosensors. We also now cite some of these references to support our claim (Line 144).

      The reviewer makes an excellent point in the last sentence of their comment: indeed, it is essential to match the expression level of our SH2-based biosensor to the expression level of EGFR in any system in order to observe potent membrane translocation! This was imperative for visualizing any translocation in our CRISPR-tagged HEK293Ts: we had to switch to an exceptionally bright fluorophore and select cells with very low ZtSH2 expression to observe translocation. The ZtSH2/EGFR ratio is a crucial design parameter, which we now present extensive data and modeling to support (Figure S2-S4; Lines 190-235). Our data suggest that quite sensitive biosensor responses are possible with an appropriate balance between ZtSH2 and EGFR expression levels (Figure 6) and, in general, biosensor responses can be matched to a dynamic range of interest by scaling ZtSH2 expression with EGFR levels.

      A great advantage of using the EGFR system as a test case for the new system is that thousands of investigations have been performed over the last four decades. This provides a strong foundation for determining whether the new technology is working correctly. For example, the dynamics of EGFR activation and trafficking at the single cell level have been documented in many studies, which show a remarkable consistency (e.g. see pmid: 24259669; pmid: 11408594; pmid: 25650738). Unfortunately, instead of using differences between the new results and previously reported data as a basis for refining their technique, the authors attempt to apply their raw data to address complex questions of EGFR dynamics, with less than satisfactory results.

      For example, they attempt to use their technique to understand the basis of different signaling dynamics between EGFR ligands. Rather than being a relatively recent observation, differences in EGFR ligand signaling have been explored for over 30 years (pmcid: PMC361851), and are generally ascribed to differences in trafficking (pmid: 7876195). Based on these observations and resulting mathematical models, novel EGFR ligands have been designed with enhanced potency (pmid: 8195228, pmid: 9634854). All this work was done over 20 years ago. Since then, new natural ligands for the EGFR have been discovered from sequence analysis and differences in their potency have similarly been ascribed to differences in their intracellular trafficking patterns (pmid: 19531065 - cited by the authors). An alternate hypothesis was proposed more recently by Freed et al (2017) as described by the authors, but that is what it is: an alternative hypothesis.

      We thank the reviewer for pointing out many excellent, classic studies on EGFR endocytosis and trafficking. We agree that this is a well-established field and that EGFR is certainly internalized, recycled, and degraded in a manner that depends on ligand affinity on the cell surface and in endosomes. These seminal studies lead the reviewer to propose an alternative hypothesis to explain our kinetic data in Figure 3: that differences in trafficking and maintenance of EGFR levels at the plasma membrane are the source of the altered kinetics between high- and low-affinity ligands. To address this question, we have now included new supplementary data examining endocytosis and trafficking in multiple contexts.

      First, we examine membrane EGFR levels in 3T3 cells overexpressing our EGFR-pYtag system (or ITAM-less EGFR as a control) after EGF stimulation (Figure S5A-C). We find that EGFR membrane intensity is virtually unchanged after 60 min of saturating EGF stimulation, a response that does not depend on whether ITAMs are appended to the receptor. We also now include still images of cells at every concentration examined in our dose-response experiments for all 3 ligands (Figure S6), which do not show clear differences in the subcellular distribution of EGFR before and after stimulation as a function of ligand identity. We also remind the reviewer that our interpretation is not simply an untested hypothesis – we experimentally tested a GBM-associated EGFR variant whose effect on receptor dimerization has been quantified, and observe EGF-like response kinetics even after EREG stimulation, a result predicted by our model (Figure 3D-E).

      We believe that the sustained membrane-localized signaling we observe might be ascribed to two factors: our choice of cell line and its expression level of EGFR. This conjecture is supported by some data: in contrast to our EGFR-overexpressing NIH3T3 cells, HEK293Ts harboring endogenous or low EGFR levels exhibit a dramatic redistribution of EGFR after EGF stimulation (Figure S3, Figure 6). This is clearly a context where transient versus sustained signaling might depend on the choice of ligand and its consequences on internalization.

      We also note that our data identify ligand-specific signaling differences that are distinct from prior studies, which focused on transient vs sustained signaling downstream of different EGFR ligands. In contrast, we identify a biphasic increase in EGFR activity after stimulation with EGF versus a rapid approach to steady state after stimulation with EREG or EPGN, despite the continued presence of high levels of membrane-localized EGFR in each case.

      Unfortunately, the model that the authors use to test this hypothesis does not even include endocytosis or receptor trafficking but instead uses variable "scaling" factors to see if the data can fit the dimerization hypothesis. In the supplement, they state that "Since our simulations were run on relatively short time scales (~30 min post-stimulation), we did not consider trafficking and degradation of receptors." However, the half-life of EGFR internalization is generally ~3-4min (pmid: 1975591) and degradation ~1hr, so most of the signal shown in Figure 3 is likely to come from internalized rather than surface-associated ligand-EGFR complexes. A further complication is that internalization rates are strongly influenced by receptor expression levels (pmid: 3262110), which are not controlled for here. Thus, the omission of trafficking in their model is not appropriate. This does not mean that the authors are wrong, it simply means that without validation or calibration, their new technology is not ready to resolve current problems in the field.

      We thank the reviewer for pointing out ways to improve our modeling (endocytosis) and discussion of its parameterization (scaling factors). We address both points below:

      Scaling factors: We thank the reviewer for their comments and agree that our discussion of model parameterization was lacking. To clarify: our base-case model for EGF includes 9 parameters, 6 of which are obtained from literature and 3 of which reflect lumped kinetic processes of EGFR dimerization and activation, which we set to match our data. We then used experimentally determined values to change the base-case model to simulate low-affinity ligand stimulation: a fold-change in ligand affinity and a fold-change in receptor dimerization. This is why we simulate EREG with β=50 and γ=100, reflecting the 10-to-100-fold differences in binding affinity and receptor dimerization that have been experimentally measured for this low-affinity ligand. Similar experimentally defined values constrain β and γ in the case of GBM-associated mutations. A more thorough explanation of our model and these scaling parameters is now included in Lines 334-362.
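
      The scaling approach described above can be pictured with a minimal sketch. It is not the model from the manuscript: the two-field parameter set, the placeholder values, and the helper name `low_affinity_variant` are all hypothetical, and it assumes that β scales ligand affinity and γ scales receptor dimerization (the order in which the response lists them), with a weaker interaction represented as a larger dissociation constant.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelParams:
    """Hypothetical subset of a base-case (EGF) parameter set; values are placeholders."""
    ligand_kd: float  # ligand-receptor dissociation constant (arbitrary units)
    dimer_kd: float   # receptor dimer dissociation constant (arbitrary units)


def low_affinity_variant(base: ModelParams, beta: float, gamma: float) -> ModelParams:
    """Return a copy of `base` with binding weakened beta-fold and dimerization gamma-fold.

    Assumption: a weaker interaction is modeled as a proportionally larger
    dissociation constant. All other base-case parameters stay untouched.
    """
    if beta <= 0 or gamma <= 0:
        raise ValueError("fold-change factors must be positive")
    return replace(
        base,
        ligand_kd=base.ligand_kd * beta,
        dimer_kd=base.dimer_kd * gamma,
    )


# Base case normalized to 1.0 so that the variant shows the fold-changes directly.
egf = ModelParams(ligand_kd=1.0, dimer_kd=1.0)
ereg = low_affinity_variant(egf, beta=50, gamma=100)
print(ereg)
```

      Keeping the base-case parameters fixed and changing only the two fold-change factors means that any difference in simulated kinetics can be attributed to those two quantities alone.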

      Endocytosis: We wholeheartedly agree that our model is quite simplified, and a thorough treatment of endocytosis and trafficking would be essential for capturing nuances associated with these steps of the cascade. However, while we appreciate the 3-4 min rule of thumb for EGFR internalization that the reviewer mentions, it is simply not reflective of the membrane-associated EGFR levels we observe in our cells. Examples can be observed in Figure 1C, Figure 2A, Figure 5F, Figure S1B, Figure S2A-B, Figure S5A, and Figure S6, as well as in the quantification of membrane-associated EGFR at 0 and 60 min in Figure S5B. It is quite likely that endocytosis and trafficking are operating throughout this time course, but are balanced to maintain a similarly high level of EGFR at the cell surface. We wholeheartedly agree with the reviewer’s helpful note that EGFR expression levels heavily influence internalization, which our data also support, and may explain our results. For example, we also see rapid EGFR membrane clearance in HEK293T CRISPR cells (Figure 6) and in HEK293Ts that express low levels of EGFR but not high levels of EGFR (Figure S3A).
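
      One simple way to picture how internalization with a short half-life can coexist with sustained surface levels is a two-compartment sketch in which internalized receptors return to the surface. This is an illustration only, not the model used in the manuscript: it ignores synthesis and degradation, the function name `surface_fraction` is hypothetical, and the recycling rate is an arbitrary assumed value rather than a measurement.

```python
import math


def surface_fraction(t_min: float,
                     internalization_half_life_min: float,
                     recycling_rate_per_min: float = 0.0) -> float:
    """Fraction of receptors at the cell surface at time t_min.

    Two-compartment sketch: dS/dt = -k_e * S + k_r * (1 - S), with S(0) = 1,
    where k_e is the internalization rate and k_r the recycling rate.
    Synthesis and degradation are deliberately ignored.
    """
    if internalization_half_life_min <= 0 or recycling_rate_per_min < 0:
        raise ValueError("half-life must be positive and recycling rate non-negative")
    k_e = math.log(2) / internalization_half_life_min
    k_r = recycling_rate_per_min
    total = k_e + k_r
    steady_state = k_r / total  # surface fraction once the two fluxes balance
    return steady_state + (1.0 - steady_state) * math.exp(-total * t_min)


HALF_LIFE_MIN = 3.5  # midpoint of the ~3-4 min figure quoted by the reviewer
k_e = math.log(2) / HALF_LIFE_MIN

# Without recycling, almost nothing remains at the surface after 30 min.
no_recycling = surface_fraction(30.0, HALF_LIFE_MIN)

# With an assumed recycling rate ten times the internalization rate,
# the surface fraction settles near k_r / (k_e + k_r) = 10/11.
fast_recycling = surface_fraction(60.0, HALF_LIFE_MIN, recycling_rate_per_min=10.0 * k_e)

print(f"no recycling, 30 min:   {no_recycling:.4f}")
print(f"fast recycling, 60 min: {fast_recycling:.4f}")
```

      In this picture the same internalization half-life is compatible with either near-complete clearance or a largely maintained surface pool, depending on how quickly receptors are returned to the membrane.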

      In sum, we argue that our inclusion of additional data showing sustained EGFR protein levels and ZtSH2 recruitment at the plasma membrane should help justify our assumption of membrane-associated signaling in our model. However, we happily concede that this is a highly simplified model, and that endocytosis is a very important process that should be accounted for in future studies (e.g., Line 344-346: “However, we expect that internalization and trafficking can play a crucial role in EGFR dynamics in many contexts, which would need to be included in future models to adequately assess those scenarios”).

    1. Author Response

      Reviewer #3 (Public Review):

      Over the past decade, Cryo-EM analysis of assembling ribosomes has mapped the major intermediates of the pathway. Our understanding of the mechanisms by which ATPases drive the transitions between states has been slower to develop because of the transient nature of these events. Here, the authors use cryo-EM and biochemical and molecular genetic approaches to examine the function of the DEAD-box ATPase Spb4 and the AAA-ATPase Rea1 in RNP remodeling. Spb4 works on the pre-60S in an early nucleolar state. The authors find that Spb4 acts to remodel the three-way junction of H62/H63/H63a at the base of expansion segment ES27. Interestingly, Spb4 appears to interact stably with a folding intermediate in the ADP rather than ATP-bound form. This work represents one of the few cases in which an RNA helicase of ribosome biogenesis has been captured and engaged with its substrate. The authors then show that the addition of the AAA-ATPase Rea1 to Spb4-purified particles results in the release of Ytm1, a known target of Rea1. However, they did not observe an efficient release of Ytm1 when particles were affinity purified via Ytm1, suggesting that the recruitment of Spb4 is important for this step. Cryo-EM analysis of Spb4-particles treated with Rea1 revealed the previously characterized state NE particles but no additional intermediates. Consequently, this analysis of Rea1 is less informative about its function than is their work on Spb4 helicase activity. In general, the data support the authors' conclusions and the data are well presented.

      Major points

      1) The Erzberger group has recently published work regarding the function of Spb4. They similarly found that Spb4 is necessary for remodeling the 3-way junction at the base of ES27. Although it was posted to Biorxiv in Feb 2022, it was not formally published until Dec 2022. The authors should cite this work and include a brief discussion comparing conclusions.

      We are now citing this study in the introduction and discussion and are briefly comparing the conclusions.

      2) L311. The heading "Coupled pre-60S dissociation of the Ytm1-Erb1 complex and RNA helicase Has1" should be changed. Coupling implies a mechanistic interplay. Although the release of Ytm1 and Has1 both depend on Rea1, the data do not support the conclusion of mechanistic coupling. In fact, the authors write in lines 328-329 "Thus, the Rea1-dependent pre-60S release of the Ytm1-Erb1 complex occurs before and independently of Has1..." Independently cannot also imply coupling.

      We have changed the heading to “Ytm1–Erb1 release promotes the dissociation of the RNA helicase Has1”.

      3) L339-342 Combining data sets for uniform processing was a great idea! This approach should be used more often in cryo-EM analyses of in vitro maturation reactions.

      We agree with the reviewer that this approach is appropriate to analyse such reactions.

      4) L428 The authors need to amend their comment that this is the first structure of Spb4-bound to the substrate as this has recently been published by the Erzberger group and was first posted as a preprint in early 2022.

      We have removed the statement regarding the first structure of Spb4 and added a citation of the study published by Cruz et al.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript builds on data from the same group showing that Lphn2 functions cell-autonomously as a receptor in CA1 pyramidal axons and cell-non-autonomously as a ligand in the neurons of the subiculum. In either case, binding of teneurin-3 to Lphn2 mediates repulsive events, and since different populations of neurons within each region express differing levels of both proteins, this mechanism allows proximal CA1 pyramidal axons to preferentially project to distal subiculum and distal CA1 pyramidal axons to project to proximal subiculum. The authors now ask mechanistic questions about the role of Lphn2 signaling in these wiring processes.

      The authors demonstrate that G-protein signaling downstream of Lphn2, which is mediated by the tethered agonist, is necessary for the ability of ectopically expressed Lphn2 to redirect proximal CA1 axons from distal to proximal subiculum. Moreover, the authors show that while autoproteolytic activity of Lphn2 facilitates G-protein signaling, possibly by making the tethered agonist more available for signaling, it is not necessary for axonal mistargeting. Thus, the authors conclude that tethered agonist-dependent G-protein signaling is required for Lphn2-mediated hippocampal neural circuit assembly. Most of the data shown in support of these conclusions are convincing, though I have some concerns about the expression levels and/or effects of the tethered agonist mutants in CA1, which is important since the analyses assume that any defects are in the repulsive interactions described above.

      We thank Reviewer 1 for their suggestion to incorporate data on the expression levels of the tethered agonist mutants in CA1. We have now performed additional experiments and included a new Figure 1—figure supplement 2A-B to address this concern.

      The authors also use heterologous cells to determine that Lphn2 couples to Gα12/13, but not other heteromeric G-protein α-subunits. Within the context of heterologous cells, these experiments are well controlled and exhaustive, as every mutant used in vivo is carefully analyzed. One potential criticism of this work, however, is that perhaps the authors assume too much in simply translating their results in heterologous cells to neurons, especially when one of the most interesting conclusions of this paper (see below) is that Lphn2 signaling is context-dependent. Without further data to confirm the results of these experiments in the neuronal populations studied, these data primarily illustrate possibilities, but don't exclude other possibilities.

      We are grateful to Reviewer 1 for bringing this potential criticism to our attention. We have now included clarification of this point in the text and discussion of the manuscript, as noted in our response to Essential Revision #3 above.

      Finally, the authors test the role of Lphn2 functioning as a ligand in the subiculum by driving its expression in the normally Lphn2-low dorsal subiculum. As they reported before, this alteration decreases the ability of proximal CA1 axons to project to this area. Interestingly, and in contrast to the role of Lphn2 as a receptor above, neither Lphn2 autoproteolysis nor tethered agonist function are required for this effect. This finding is very interesting and will merit follow-up, though I agree with the authors that this manuscript does not require this for publication.

      In summary, this is an interesting paper that addresses timely and pressing issues in the adhesion-GPCR field.

      Reviewer #2 (Public Review):

      This is an intriguing study investigating the molecular mechanisms by which the adhesion G-protein coupled receptor latrophilin-2 controls the developmental organization of neural circuits. Using the model CA1 to subiculum hippocampal circuit with its spatially segregated axon targeting, the authors' experiments find that ectopic Lphn2 expression in CA1 neurons that normally do not express it leads to axon mistargeting. The authors detail these circuitry alterations with Lphn2 genetic manipulations, finding that axon targeting is dependent on its GPCR signaling, likely through Galpha12/13 coupling.

      Strengths: Building off the authors' previous studies, the experiments are well designed and analyzed. The advance in this study is finding that Lphn2 expression in CA1 cells that normally do not express it impacts their axon targeting. They go on to show compelling data indicating that this mistargeting is dependent on Lphn2 GPCR signaling properties, identified as likely Galpha12/13 dependent.

      Weaknesses: The system used is a "misexpression system". By forcing cells with ordinarily low levels to overexpress Lphn2, circuitry alterations are observed. While this gain-of-function assay demonstrates the importance of why Lphn2 is not expressed in certain cell types, it isn't a physiologically relevant system to investigate Lphn2-dependent circuit development.

      We thank Reviewer 2 for the appreciation of our study. We wish to clarify, in response to the critiques of the artificial nature of misexpression system, that experiments involving loss-of-function of endogenous Lphn2 have been described in our previous study (Pederick et al., 2021). When we conditionally deleted Lphn2 in CA1, Lphn2+ mid-CA1 axons spread to distal, Ten3+ subiculum. Thus, both the gain-of-function experiment described in this study and the loss-of-function experiment described in Pederick et al., 2021 support the notion that Lphn2 acts in axons as a repulsive receptor for the Ten3 ligand.

      To strengthen this study, the following specific points could use addressing:

      1) While the data are strong, some of the terminology used is unclear, including use of the terms "repulsive receptor" and "repulsive ligand". The authors use "repulsive receptor" to describe Lphn2 action for axon targeting, but repulsion and attraction processes are simultaneous. Is Lphn2 really acting as a repulsive receptor, or alternatively, acting to shift axon attraction to Lphn2-expressing subiculum neurons?

      We apologize for the lack of clarity. The terms “receptor” and “ligand” are used to refer to a molecule’s role in axons or target neurons, respectively, a common usage in the axon guidance field (Kolodkin and Tessier-Lavigne, 2011; PMID 21123392). Using a series of loss and gain of function manipulations, our previous data support a role for Lphn2 both as a repulsive receptor in axons and repulsive ligand in target neurons. When Lphn2 is deleted in CA1 axons they invade Ten3 subiculum target neurons. Similarly, deletion of Ten3 in the subiculum results in Lphn2-positive axons invading the Ten3 KO area. Unlike its partner Ten3, which can serve as an attractive receptor when the ligand is Ten3 and repulsive receptor when the ligand is Lphn2, Lphn2 only serves as a repulsive receptor to the Ten3 ligand. We (and others) have shown that Lphn2 does not bind homotypically (Boucard et al., 2014 and Pederick et al., 2021). We have clarified these points in the revised manuscript (2nd paragraph of Introduction).

      2) For their proposed axon guidance model to work, Lphn2 has to be signaling through Gα12/13 proteins near the axon growth cone to induce its collapse and retraction. By using Flag-tagged Lphn2 constructs in their assays, this should be visible. Clear Flag-Lphn2 signal is observed in the dendrites of infected cells (Figure 1-figure supplement 1; Figure 5-figure supplement 1). But does Flag-Lphn2 also localize to the pCA1 axons that are projecting to the subiculum?

      Thank you for this important question. We have added new data to show that FLAG-tagged Lphn2 is indeed found in CA1 axons. Please see our response in “Essential Revision #2” above.

      3) According to their previous work, pCA1 to dSub circuit patterning is dependent on the Ten3+ to Ten3+ homophilic attraction that exists between the two regions. It's unclear how ectopic Lphn2 is able to override this Ten3+ to Ten3+ connection patterning. Does ectopic Lphn2 outcompete Ten3 function in these neurons? Or alternatively, is Ten3 expression/localization impacted by the presence of ectopic Lphn2?

      We believe it is the former. Regarding the latter, please see our response in “Essential Revision #1” above.

    1. Author Response

      Reviewer #1 (Public Review):

      Idiosyncratic drug-induced liver injury is a disease that appears to be linked to mitochondrial DNA (mtDNA), but there is a lack of model cell lines for the study of this link. To help address this problem, the authors developed ten cybrid HepG2 cell lines that have had their mitochondrial DNA replaced with the mitochondrial DNA of ten human donors. Analysis of single nucleotide polymorphisms in all of the patients' mtDNA allowed the authors to assign the donors to two haplogroups (H and J) with five patients each. The authors also present the results of several assays (e.g. oxygen consumption, ATP production) performed on all ten cell lines in the absence and presence of five clinically-relevant drugs (or drug metabolites). Significant attention was paid to differences observed between the cell lines in the H and J haplogroups. The work is methodologically and scientifically rigorous, ethically conducted, and objectively presented according to the appropriate community standards.

      While I feel that the manuscript will be useful to the research field and is an important step towards improving patient outcomes, I feel that the work lacks a broad interest. Much of the paper is spent discussing small and/or statistically insignificant differences between haplogroups H and J. While some interesting interpretations and suggestions are presented in the discussion, the authors didn't perform follow-up experiments to try to nail down any particular mechanistic insights that would be useful to the broader community. I also didn't feel a strong sense that the paper produced any specific suggestions for how clinical outcomes could be improved. Accordingly, any clear insights that would be interesting to a broad scientific community would probably require follow-up studies.

      Again, we strongly believe that the subject is of broad interest to researchers in both academia and the pharmaceutical industry. Evidence of the level of interest in this subject can be quantified by the access metrics of the 3 publications we have recently published on this topic (Biochem Soc Trans, 2020, PMID: 32453388; Arch Toxicol, 2021, PMID: 33585966; Front Genetics, 2021, PMID: 34484295), which have been accessed >6000 times.

      The structure of the paper is also not friendly to a broad audience; the results are presented without interspersed commentary that could help the reader understand the meaning or utility of the results as they are being presented. Accordingly, I often felt unsure about how the results being presented were relevant to solving the broader problem established nicely in the introduction.

      We thank the reviewer for this comment and have revised the manuscript to now contain a combined results and discussion section.

      Finally, it wasn't clear that the generated cell lines were made available for anyone to purchase through a cell bank (perhaps the authors did do this, but I don't recall seeing a mention of it). As these cell lines appear to be the primary output of this work, it seems important to better highlight the extent to which they are being made accessible to the scientific community.

      The cells are currently in the process of being deposited under licence with XimBio. This will allow other researchers to easily access them. They are also available upon request from me. This has been conveyed in the revised manuscript (pg 18, lines 1-2).

      Reviewer #2 (Public Review):

      In this work, Ball et al. investigated the possibility to generate a novel set of HepG2 liver cell lines to generate "mitochondrial DNA-personalized" models as novel tools to study idiosyncratic drug-induced liver injury related to mitochondrial variation. This work represents the generation of a comprehensive collection of n=10 HepG2 lines, half reflecting haplogroup H and half reflecting haplogroup J. The authors then assessed their impact on basic mitochondrial function in liver cells. Interestingly, they find a greater respiratory complex activity driven by complex I and II of the haplogroup J lines relative to haplogroup H. Finally, the authors make an attempt at using this novel set of lines to probe the consequential effects of mitochondrial genotype on drug-induced liver toxicity. This work provides an interesting proof-of-concept study and is a starting point towards studying and predicting idiosyncratic drug-induced liver injury in a personalized manner. This technique may be broadly extrapolated to other commonly used liver cell models within the toxicology field.

      Strengths:

      1) This work presents an exciting initiative to study interindividual variability in idiosyncratic drug-induced liver injury focusing on mitochondrial haplotypes. In further follow-ups, this work could be extended to also represent other different haplogroups to establish a thorough "biobank". The established lines allow for future in-depth characterization and testing of many putative hepatotoxic compounds through a variety of toxicity measures that could shed further light on the impact of mitochondrial DNA variation on (idiosyncratic) drug-induced liver injury.

      2) This technique may be broadly extrapolated to other commonly used liver cell lines within the toxicology field (e.g. HepaRG cells or iPSC-derived cells) that are potentially also more metabolically competent. A short discussion on this could be added to the current manuscript.

      We thank the reviewers for this comment, which we agree with. We have now incorporated this into the conclusion (pg 18, lines 23 - 27).

      Weaknesses:

      1) The major weakness of the current manuscript is the rather large variation across sample measurements in the proof-of-concept experiments to study drug effects (fig. 3-6). This makes much of the data rather hard to interpret and makes it difficult to infer conclusions. As an example, proton leak (fig. 3f/4f) seems to increase 2-fold in the J group even under basal conditions (0 uM flutamide/metabolite), while this is not observed in fig. 2a and this effect seems to be also absent under 0 uM tolcapone (fig. 5f). Unfortunately, the current data do not allow us to draw confident conclusions about whether the tested drugs have effects on the mitochondrial respiration of the different haplogroups. This may well be linked to the methods used for measuring mitochondrial activity, but since this is the predominant method needed in the current paper, either increasing the number of experiments (across more lines) or identifying a more rigorous methodological manner to obtain consistencies of experiments would help the authors to make more confident claims about their data.

      The reviewers have noted the inherent variability in the respiratory measurements from plate to plate. To counter this, experiments were designed so that for each cybrid cell line the control and treated cells were always positioned on the same plate. However, we believe that the reporting of such data, and their limitations, is a fundamental aspect of unbiased science reporting feeding into the principles of data reproducibility. In this resubmission, we have updated the methodology of our data analysis, which better accounts for this variability. The new figures plot each cybrid as a distinct point to easily visualise the variation across haplogroups dependent upon each cybrid within the group. We have included this limitation in the conclusion (pg 18, lines 15 – 19).

      2) The data on the effects of inhibition of complex I/II activity are not sufficiently convincing to support the claim that haplogroup J is more susceptible to flutamide/metabolite (fig. 6). Both seem to respond rather identically to flutamide or its metabolite, i.e. at higher concentrations complex I/II activity decreases, with the sole difference being that the haplogroups have different basal activity (not influenced by the drug). Estimating fold changes, for example, for both haplogroups, complex I and II activity decreases ca. 2-fold at the highest concentration of the metabolite (fig. 6c-d), suggesting that there is no difference in susceptibility between haplogroups, contrary to what the authors claim. It is furthermore unclear what the statistical significance currently represents: it should represent whether at different/increasing concentrations the activity of the complexes significantly differs vs. the previous/basal conditions from the same haplogroup. If it represents (which it seems to) the significance of haplogroup J vs. haplogroup H, it is non-informative as it is obvious that haplogroup J presents with a higher baseline.

      Thank you for this comment. We agree with the shortcomings of the statistical analysis in fig 6 and have reanalysed the dataset using a more appropriate statistical methodology; see response 2.2.

      3) It would help to mention how many lines per haplogroup H/J were used in the analyses across all figures. This should be clarified, as the error bars for most experiments are rather high and therefore statistical significance is lacking, making data interpretation complex. It could be helpful if the authors present at least for some analyses single plots of data obtained across different lines from the same haplogroup to evaluate the consistency of the effects of the genotypes as supplementary figures. If only 1-2 lines were used per group, it would help to perform additional experiments to assess consistencies across groups.

      We apologise that the number of lines per haplogroup that were employed in the analyses is unclear. In every case, we included 5 cybrid lines per haplogroup. We have further clarified this point in the methods and results. Furthermore, in the new figures, each cybrid is now represented as a single data point.

    1. Author Response

      Reviewer #2 (Public Review):

      1) A major point of the manuscript is the description of Hrc+ fibroblasts (Fibroblast 3) as profibrogenic in diabetes. However, fibroblast 3 expresses several cardiomyocyte markers Nppa, Ryr2, Ttn alongside Hrc which is described to play a role in Ca2+ handling at the sarcoplasmic reticulum in cardiomyocytes (Fig. 4C) and shows a low correlation with other fibroblast clusters (Fig. 4B). A possible explanation is technical, e.g. if two nuclei (one fibroblast, one cardiomyocyte) were captured together in one droplet (barcode collisions or doublets). Unfortunately, this uncertainty makes interpretation of all following snRNA-seq analyses based on this fibroblast subpopulation impossible.

      We thank the reviewer for these valuable comments. We re-examined the scRNA-seq results carefully. First, for cell quality, we used relatively stringent thresholds to filter out most barcodes associated with empty partitions or doublets: we quantified the number of genes and UMIs per cell and kept high-quality cells with 500-2,500 detected genes and 600-8,000 UMIs. Cells with an unusually high proportion of mitochondrial gene expression (≥10%) were then excluded. To address the possibility of doublets raised by the reviewer, we applied DoubletFinder (v2.0.3), which generates artificial doublets, uses distances in PC space to compute each cell’s proportion of artificial k nearest neighbors (pANN), and ranks cells by pANN according to the expected number of doublets. We found that 3.20% of the cells (19 cells) in fibroblast-3 (594 cells) were labeled as doublets. After these 19 doublets were removed, the trends in cell proportion and in Hrc expression in fibroblast-3 were the same as before. Removing the doublets therefore does not affect the conclusions of this study, which were also validated by Hrc and vimentin double immunostaining (Figure 4E). We thank the reviewer again for these helpful comments.
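      The filtering thresholds quoted above can be written down as a short sketch. This is only an illustration under stated assumptions: the function names (`qc_mask`, `doublet_fraction`) and the toy input values are hypothetical, and the doublet calls themselves would come from DoubletFinder (an R package), which is not reproduced here. Only the thresholds (500-2,500 genes, 600-8,000 UMIs, mitochondrial fraction below 10%) and the counts (19 of 594 cells) are taken from the response.

```python
import numpy as np

def qc_mask(n_genes, n_umis, pct_mito,
            gene_range=(500, 2500), umi_range=(600, 8000), max_mito=10.0):
    """Return a boolean mask of cells that pass the quoted QC thresholds.

    Hypothetical helper: cells are kept when gene and UMI counts fall inside
    the inclusive ranges and the mitochondrial percentage is below max_mito
    (cells at or above 10% are excluded, as stated in the response).
    """
    n_genes = np.asarray(n_genes)
    n_umis = np.asarray(n_umis)
    pct_mito = np.asarray(pct_mito, dtype=float)
    return (
        (n_genes >= gene_range[0]) & (n_genes <= gene_range[1])
        & (n_umis >= umi_range[0]) & (n_umis <= umi_range[1])
        & (pct_mito < max_mito)
    )

def doublet_fraction(n_doublets, n_cells):
    """Fraction of cells in a cluster that were flagged as doublets."""
    return n_doublets / n_cells

# Toy cells (made-up values): only the second one passes every threshold.
mask = qc_mask([400, 1000, 2600, 1500], [700, 3000, 5000, 9000], [2, 5, 3, 4])

# Counts reported for fibroblast-3 in the response: 19 doublets among 594 cells.
pct_doublets = 100 * doublet_fraction(19, 594)
remaining_cells = 594 - 19
```

      With the reported counts, the flagged fraction rounds to the 3.20% given in the response, and 575 cells remain in fibroblast-3 after removal.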

      2) To follow the study and be able to appreciate the data quality, individual sample metadata and UMAPs colored based on a sample and/or condition (diabetes or control) would be helpful. The paper would benefit from an analysis to show if the differences in the number of detected genes are due to the number of nuclei per cluster or if the bigger clusters are really also the ones with the most dramatic changes. Instead of showing expression levels of differentially regulated genes in distinct clusters (Fig1 S2), the differential expression could be displayed with violin plots or heatmaps that illustrate values for both conditions. Clusters that did not reveal any differential expressed genes, e.g. Adipo can be removed. Fig 1F these KEGG enrichments are hard to interpret since they can be confounded by highly expressed cardiomyocyte genes that are detected in all clusters (1B) and thus drive the GO enrichment of e.g. "cardiac muscle contraction" in T cells.

      We thank the reviewer for these comments. Fig1 S2 shows the top 10 upregulated genes in the different cell populations and the expression characteristics of these genes in a concise way. More detailed expression levels of differentially regulated genes in distinct clusters can be found in supplemental files 2-5. If we used violin plots or heatmaps to show the differential expression of the top 10 upregulated genes, we would need too many supplementary figures, which would take up too much space. On the other hand, cell populations without differentially expressed genes in Figure 1E have been removed, as the reviewer suggested.

      3) The study looks into the pathogenesis of cardiac fibrosis in diabetic mice. The authors show that downregulation of Itgb1 with siRNA (Fig 6I) leads to less fibrosis in diabetic mice. This effect might be expected since Itgb1 is an extracellular matrix-linked gene and might indicate that downregulation could be beneficial. Given this, it is confusing to see the following analysis which links several genetic variants associated with Type 2 Diabetes to Itgb1 (one leading to premature stop) and its ligand. This analysis seems out of place in relation to the remainder of the study which focuses to identify the downstream effects of diabetes on cardiac fibrosis.

      We thank the reviewer for these valuable comments. We have deleted the results on the association of Itgb1 variants with diabetic cardiac fibrosis from the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Han et al use sophisticated genetic approaches to investigate leptin-responsive neural circuits. Overall, this is an impressive series of studies that provide fairly convincing evidence for a key inhibitory pathway downstream of AGRP neurons. A few data sets require additional validation or explanation.

      We appreciate the reviewer’s strong interest in and support of this manuscript, as well as the valuable comments below. We have revised the manuscript to incorporate the reviewer’s suggestions and critiques.

      Reviewer #2 (Public Review):

      Using a novel genetic system to conditionally ablate Lepr from Agrp neurons in adults, the authors discovered that leptin-AgRP neuron signaling strongly modulates the DMH and sought to understand the DMH targets and mechanisms of action in the response to AgRP neuron signaling. GABA signaling likely underlies the effects of AgRP neuron-mediated hyperphagia (etc). DMH Mc4R neurons appear to lie downstream of Agrp neurons. GABA in the DMH appears to mediate many of the effects of AgRP neurons on feeding and body weight. Furthermore, deletion of Lepr from AgRP neurons increases DMH GABA-ARa3, and modulation of this receptor in the DMH alters food intake and the response to leptin.

      Unfortunately, there is little quantification or other validation data from many of the systems deployed, and the analysis jumps around a fair amount, without really uniting the results in a way that paints a convincing picture of the final model that they build.

      Thank you for these positive comments on our studies. In the revised manuscript, we have added a substantial amount of new experimental data, more controls, and data validation that significantly strengthen our proposed model.

      Reviewer #3 (Public Review):

      The manuscript by Han et al characterizes a pathway from AgRP(LepR) neurons to DMH(MC4R) neurons that is involved in energy balance control. They use a conditional knockout strategy to show that AgRP(LepR) knockout increases body weight and this effect was reversible by blocking GABA signaling. They also showed that activation of AgRP-DMH projection increases food intake, and highlighted a role for alpha3-GABAA receptor signaling in the DMH for regulating feeding behavior. While these data highlight a potential circuit that modulates feeding, there are concerns about the paper in its current form that diminish enthusiasm. The lack of proper controls in many of the experiments raises doubts about the findings.

      Strengths: The authors use new tools to characterize a new circuit for leptin-mediated energy balance control. The conditional knockout has several advantages over previous techniques that are described within the manuscript. Further, the authors use combinations of different techniques (gene knockout, optogenetic manipulation, in vivo activity monitoring) to make observations at multiple levels of analysis.

      Weaknesses: Several experiments within the paper have worrisome caveats or lack proper controls, raising concerns about the overall conclusions made.

      We appreciate the reviewer’s positive comments. We have added more control and validation data to the updated manuscript to support our conclusions.

    1. Author Response

      Reviewer #1 (Public Review):

      Demographic inference is a notoriously difficult problem in population genetics, especially for non-model systems in which key population genetic parameters are often unknown and where the reality is always a lot more complex than the model. In this study, Rose et al. provided an elegant solution to these challenges in their analysis of the evolutionary history of human specialization in Ae. aegypti mosquitoes. They first applied state-of-the-art statistical phasing methods to obtain haplotype information in previously published mosquito sequences. Using this phased data, they conducted cross-coalescent and isolation-with-migration analyses, and they innovatively took advantage of a known historical event, i.e., the spread of Ae. aegypti to South America, to infer the key model parameters of generation time and mutation rate. With these parameters, they were able to confirm a previous hypothesis, which suggests that human specialists evolved at the end of the African Humid Period around 5,000 years ago when Ae. aegypti mosquitoes in the Sahel region had to adapt to human-derived water storage as their breeding sites during intense dry seasons. The authors further carried out an ancestry tract length analysis, showing that human specialists have recently introgressed into Ae. aegypti population in West African cities in the past 20-40 years, likely driven by rapid urbanization in these cities.

      Given all the complexities and uncertainties in the system, the authors have done outstanding jobs coming up with well-informed research questions and hypotheses, carrying out analyses that are most appropriate to their questions, and presenting their findings in a clear and compelling fashion. Their results reveal the deep connections between mosquito evolution and past climate change as well as human history and demonstrate that future mosquito control strategies should take these important interactions into account, especially in the face of ongoing climate change and urbanization. Methodologically, the analytical approach presented in this paper will be of broad interest to population geneticists working on demographic inference in a diversity of non-model organisms.

      In my opinion, the only major aspect that this paper can still benefit from is more explicit and in-depth communication and discussion about the assumptions made in the analyses and the uncertainties of the results. There is currently one short paragraph on this in the discussion section, but I think several other assumptions and sources of uncertainties could be included, and a few of them may benefit from some quantitative sensitivity analyses. To be clear, I don't think that most of these will have a huge impact on the main results, but some explicit clarification from the authors would be useful.

      Below are some examples:

      Thank you very much for your kind words and your feedback! We have expanded our discussion of assumptions and uncertainties – we have responded to each point below:

      1) Phasing accuracy: statistical phasing is a relatively new tool for non-model species, and it is unclear from the manuscript how accurate it is given the sample size, sequencing depth, population structure, genetic diversity, and levels of linkage disequilibrium in the study system. If authors would like to inspire broader adoption of this workflow, it would be very helpful if they could also briefly discuss the key characteristics of a study system that could make phasing successful/difficult, and how sensitive cross-coalescent analyses are to phasing accuracy.

      We agree that this is an important topic to expand on. We have clarified as follows:

      Results, Page 4, last paragraph: “Over 95% of prephase calls had maximal HAPCUT2 phred-scaled quality scores of 100 and prephase blocks (i.e. local haplotypes) were 728bp long on average (interquartile range 199-1009bp). We then used SHAPEIT4.2 to assemble the prephase blocks into chromosome-level haplotypes, using statistical linkage patterns present across our panel of 389 individuals (25).”

      Discussion, Page 8, last paragraph: “Overall linkage disequilibrium is relatively low in Ae. aegypti, dropping off quickly over a few kilobases and reaching half its maximum value within about 50kb (37); this is likely sufficient for assembling shorter, high-confidence prephase blocks into longer haplotypes in many cases. However, phase-switch errors may be common across longer distances – potentially affecting inferences in the most recent time windows. Nevertheless, the similar results we obtain using different proxy populations (and thus different input haplotype structures) for human-specialist and generalist lineages (see Figure S1) suggest that our results are robust to potential mistakes in long-range haplotype phasing.”

      Discussion, Page 9, paragraph 2: “Here, we take advantage of a continent-wide set of genomes, combined with read-based prephasing and population-wide statistical phasing to develop a phasing panel that should enable future studies in Ae. aegypti with a lower barrier to entry. The same approach may work for other study organisms with similar population genomic properties; high levels of diversity are helpful for prephasing and at least moderate levels of linkage disequilibrium are important for the assembly of prephase blocks.”

      2) Estimation of mutation rate and generation time: the estimation of these important parameters is made based on the assumption that they should maximize the overlap between the distribution of estimated migration rate and the number of enslaved people crossing the Atlantic, but how reasonable is this assumption, and how much would the violation of this assumption affect the main result? Particularly, in the MSMC-IM paper (Wang et al. 2020, Fig 2A), even with a simulated clean split scenario, the estimated migration rate would have a wide distribution with a lot of uncertainty on both sides, so I believe that the exact meaning and limitations of such estimated migration rate over time should be clarified. This discussion would also be very helpful to readers who are thinking about using similar methods in their studies. Furthermore, the authors have taken 15 generations per year as their chosen generation time and based their mutation rate estimates on this assumption, but how much will the violation of this assumption affect the result?

      This is a great point. We have expanded our discussion of how this assumption affects our conclusions (see Discussion page 9, first paragraph): “Furthermore, we chose a scaling factor that maximized overlap between the peak of estimated Ae. aegypti migration and the peak of the Atlantic Slave Trade (Fig. 2B). If we instead consider alternative scenarios where peak migration occurred at the very beginning of the slave trade era, around 1500, then our inferred mutation rate would be lower (about 2.4e-9, assuming 15 generations per year), pushing back the split of human-specialist lineages to about 10,000 years before present. This scenario seems less plausible, in part because our isolation-with-migration analyses suggest a gradual onset of migration between continents rather than a single, early-pulse model. It would also make it harder to explain the timing of the bottleneck we see in invasive populations; the first signs of this bottleneck occur at the beginning of the slave trade (~500 years ago) with our current calibration (Fig. S1A), but would be pushed to a pre-trade date in this alternative scenario. We can also consider a scenario in which peak Ae. aegypti migration occurred more recently, perhaps around 1850, corresponding to increased global shipping traffic outside the slave trade alone. In this case, our inferred mutation rate would be higher (or generation time lower), and the split of human-specialist lineages would be placed at about 3,000 years ago. Overall, the best match between the existing literature and our data corresponds to our main estimates, but alternative scenarios could gain support if future research finds evidence for a different time course of invasion than is suggested by the epidemiological literature.”

      We have slightly expanded our description of calibration in Results, page 5, last paragraph: “The fact that we see good overlap between the two distributions (yellow–white color) across a wide range of reasonable mutation rates and generation times for Ae. aegypti is consistent with our understanding of the species’ recent history and supports our approach. For example, if we take the common literature value of 15 generations per year (0.067 years per generation) (17, 20), the de novo mutation rate that maximizes correspondence between the two datasets is 4.85 × 10⁻⁹ (black dot in Figure 2A, used in Figure 2B), which is on the order of values documented in other insects. We chose to carry forward this calibrated scaling factor (corresponding to any combination of mutation rate and generation time found along the line in Figure 2A) into subsequent analyses.”
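      The way the calibrated scaling factor ties the mutation rate, the generation time, and the inferred dates together can be sketched in a few lines. This is a minimal illustration, assuming that coalescent time is reported in mutation-scaled units (mutation rate per generation times number of generations) and that the split of human-specialist lineages sits at about 5,000 years under the main calibration, as summarised by the reviewer; the function names are hypothetical and are not part of MSMC or MSMC-IM.

```python
def years_from_scaled_time(scaled_time, mu_per_gen, gens_per_year):
    """Convert mutation-scaled time to calendar years.

    Assumes scaled_time = mu_per_gen * generations, so only the product
    mu_per_gen * gens_per_year (mutations per site per year) matters.
    """
    return scaled_time / (mu_per_gen * gens_per_year)

def rescale_date(years, mu_old, mu_new):
    """Re-date an event after changing the per-generation mutation rate
    while holding the number of generations per year fixed."""
    return years * mu_old / mu_new

# Main calibration quoted in the response: 4.85e-9 at 15 generations per year.
per_year_rate = 4.85e-9 * 15

# Any pair with the same product gives the same calendar time, which is why
# the calibration corresponds to a line rather than a single point.
scaled = 5000 * per_year_rate
same_line = years_from_scaled_time(scaled, 9.7e-9, 7.5)

# Alternative scenario from the response: mutation rate of about 2.4e-9.
split_alt = rescale_date(5000, 4.85e-9, 2.4e-9)
```

      Under these assumptions, roughly halving the mutation rate roughly doubles every inferred date, which matches the statement that the split would move from about 5,000 to about 10,000 years before present.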

      We have also expanded on the uncertainty of our analyses (see Discussion page 8, last paragraph): “First, the temporal resolution of our inferences is relatively low, and both previously published simulations (39) and our own bootstrap replicates (Figure 2B–D, grey lines) suggest relatively wide bounds for the precise timing of events.”

      3) The effect of selection: all analyses in this paper assume that no selection is at play, and the authors have excluded loci previously found to be under selection from these analyses, but how effective is this? In the ancestry tract length analysis, in particular, the authors have found that the human-specialist ancestry tends to concentrate in key genomic regions and suggested that selection could explain this, but doesn't this mean that excluding known loci under selection was insufficient? If the selection has indeed played an important role at a genome-wide level, how would it affect the main results (qualitatively)?

      We have clarified that we excluded those loci from our timing estimates for both MSMC and ancestry tract analyses, but then re-ran the ancestry tract analysis with all regions included to visualize and assess how tracts were distributed along chromosomes. See Methods, page 12, paragraph 2: “Since selection associated with adaptation to urban habitats could shape lengths of admixture tracts, we masked regions previously identified as under selection between human-specialists and generalists when estimating admixture timing—namely, the outlier regions in (2). However, we used an unmasked analysis to determine and visualize the genome-wide distribution of ancestries (Fig. 3).”

      We have also added additional discussion of the expected effects of selection on our analyses (see Discussion, page 9, last paragraph): “Positive selection during adaptive introgression can increase tract lengths and make admixture appear to be more recent than it actually is. For this reason, we masked regions of the genome thought to underlie adaptation to human habitats before running our analysis. Nevertheless, if selection has acted outside these regions, admixture may be somewhat older than we estimate.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have tried to correlate changes in the cellular environment by means of altering temperature, the expression of key cellular factors involved in the viral replication cycle, and small molecules known to affect key viral protein-protein interactions with some physical properties of the liquid condensates of viral origin. The ideas and experiments are extremely interesting as they provide a framework to study viral replication and assembly from a thermodynamic point of view in live cells.

      The major strengths of this article are the extremely thoughtful and detailed experimental approach; although this data collection and analysis are most likely extremely time-consuming, the techniques used here are so simple that the main goal and idea of the article become elegant. A second major strength is that, in order to understand some of the physicochemical properties of the viral liquid inclusion, they used stimuli that have been very well studied, and thus one can really focus on a relatively easy interpretation of most of the data presented here.

      There are three major weaknesses in this article. The way it is written, especially at the beginning, is extremely confusing. First, I would suggest the authors check and extensively revise the use of English. In particular, the abstract and introduction are extremely hard to understand. Second, in the abstract and introduction, the authors use terms such as "hardening", "perturbing the type/strength of interactions", "stabilization", and "material properties", to cite just a few. It is clear that the authors do know exactly what they are referring to, but the definitions come so late in the text that it all becomes confusing. The second major weakness is that there is a lack of deep discussion of the physical meaning of some of the measured parameters like "C dense vs inclusion", and "nuclear density and supersaturation". There is a need to explain further the physical consequences of all the graphs. Most of them are discussed in a very superficial manner. The third major weakness is a lack of analysis of phase separations. Some of their data suggest phase transition and/or phase separation; thus, a more in-depth analysis is required. For example, could they calculate the change of entropy and enthalpy of some of these processes? Could they find some boundaries for these transitions between the "hard" (whatever that means) and the liquid?

      The authors have achieved almost all their goals, with the caveat of the third weakness I mentioned before. Their work presented in this article is of significant interest and can become extremely important if a more detailed analysis of the thermodynamics parameters is assessed and a better description of the physical phenomenon is provided.

      We thank you for the comments and, in particular, for being so positive regarding the strengths of our manuscript and for raising concerns that will surely improve it. We have taken the following actions to address your concerns:

      1) Extensive revisions have been made to the use of English, particularly in the abstract and introduction. Key terms are defined as they are introduced in the text to enhance the clarity of the argument. This is a significant revision that is highlighted within the text, but it is too extensive to detail here.

      2) In the results section, we improved and extended the discussion of our graphs to the extent possible. However, we found that attempting to explain the graphs' meanings more thoroughly would detract from our manuscript's main focus: identifying thermodynamic changes that could potentially lead to alterations in material properties, specifically aspect ratio, size, and Gibbs free energy. As a result, we introduced the type of information we could obtain from our analyses in the introduction (Lines 112-125) and briefly commented on it in the ‘results’ section (Lines 304-306, sentences below).

      From introduction – lines 112-125:

      “In addition, other parameters like nucleation density determine how many viral condensates are formed per area of cytosol. Overall, the data will inform us if changing one parameter, e.g. the concentration, drives the system towards larger condensates with the same or more stable properties, or more abundant condensates that are forced to maintain the initial or a different size on account of available nucleation centres (Riback et al., 2020; Snead, 2022). It will also inform us if liquid viral inclusions behave like a binary or a multi-component system. In a binary mixture, Cdilute is constant (Klosin et al., 2020). However, in multi-component systems, Cdilute increases with bulk concentration (Riback et al., 2020). This type of information could have direct implications about the condensates formed during influenza infection. As the 8 different genomic vRNPs have a similar overall structure, they could, in theory, behave as a binary system between units of vRNPs and Rab11a. However, a change in Cdilute with concentration would mean that the system behaves as a multi-component system. This could raise the hypothesis that the differences in length, RNA sequence and valency that each vRNP has may be relevant for the integrity and behaviour of condensates.”

      From results lines 304-306:

      This indicates that the liquid inclusions behave as a multi-component system and allows us to speculate that the differences in length, RNA sequence and valency that each vRNP has may be key for the integrity and behaviour of condensates.
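      For readers less familiar with these quantities, the relation between Cdense, Cdilute and a free energy of transfer can be sketched as follows. This is an illustration only, and it rests on an assumption: that the Gibbs free energy is computed from the partition coefficient K = Cdense / Cdilute as ΔG = -RT ln K, the form used in the composition-dependent thermodynamics literature the response cites (Riback et al., 2020). The response itself does not spell out the formula, and the function name and example concentrations below are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant in J / (mol K)

def transfer_free_energy(c_dense, c_dilute, temp_kelvin=310.15):
    """Free energy of transfer (J/mol) from the dilute to the dense phase.

    Assumed form: dG = -R * T * ln(c_dense / c_dilute). Both concentrations
    must be positive and in the same (arbitrary) units, because only their
    ratio enters the formula.
    """
    if c_dense <= 0 or c_dilute <= 0:
        raise ValueError("concentrations must be positive")
    return -R * temp_kelvin * math.log(c_dense / c_dilute)

# Hypothetical intensities: stronger enrichment in the dense phase gives a
# more negative (more stabilising) free energy at the same temperature.
dg_weak = transfer_free_energy(2.0, 1.0)
dg_strong = transfer_free_energy(20.0, 1.0)
```

      In this picture, a binary system keeps Cdilute fixed as the bulk concentration rises, so the free energy stays constant, whereas a Cdilute that drifts with bulk concentration changes the free energy and points to multi-component behaviour.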

      3) The reviewer has drawn our attention to the absence of phase separation analysis in our study. We believe that the formation of influenza A virus condensates is governed by phase separation (or percolation coupled to phase separation). However, we must exercise caution at this point because the condensates we are studying are highly complex, and the physics of our cellular system may not be adequate to claim phase separation without being validated by an in vitro reconstitution system. IAV inclusions contain a variety of cellular membranes, different vRNPs, and Rab11a. While we have robust data to propose a model in which the liquid-like properties of IAV inclusions arise from a network of interacting vRNPs that bridge multiple cognate vRNP-Rab11 units on flexible membranes, similar to what occurs in phase-separated vesicles in neurological synapses, our model for this system still lacks formal experimental validation. As a note, the data supporting our model include the demonstration of the liquid properties of our liquid inclusions (Alenquer et al. 2019, Nature Communications, 10, 1629) and the impairment of recycling endocytic activity during IAV infection (Bhagwat et al. 2020, Nat Commun, 11, 23; Kawaguchi et al. 2012, J Virol, 86, 11086-95; Vale-Costa et al. 2016, J Cell Sci, 129, 1697-710). This leads to aggregated vesicles seen by correlative light and electron microscopy (Vale-Costa et al. 2016, J Cell Sci, 129, 1697-710) and by immunofluorescence and FISH (Amorim et al. 2011, J Virol 85, 4143-4156; Avilov et al. 2012, Vaccine 30, 7411-7417; Chou et al. 2013, PLoS Pathog 9, e1003358; Eisfeld et al. 2011, J Virol 85, 6117-6126; Lakdawala et al. 2014, PLoS Pathog 10, e1003971).

      To explore the significance of the liquid material properties of IAV inclusions, we used the strategy described in the current work. By developing an effective method to manipulate the material properties of IAV inclusions, we provide evidence that controlled phase transitions can be induced, resulting in decreased vRNP dynamics in cells and a negative impact on progeny virion production. This suggests that the liquid character of liquid inclusions is important for their function in IAV infection. We have improved our explanation addressing this concern in the limitations of our study (as outlined in the box below and in the manuscript, lines 857-872).

      We are currently establishing an in vitro reconstitution system to formally demonstrate, in an independent publication, that IAV inclusions are formed by phase separation (or percolation coupled to phase separation). For this future work, we have teamed up with Pablo Sartori, a theoretical physicist, to derive an in-depth analysis of the thermodynamics of the viral liquid condensates in the in vitro reconstituted system and compare it to results obtained in the cell. This will provide a means to establish comparisons. We think that cells have too many variables to derive meaningful physical parameters (such as entropy and enthalpy) and models, and that cell-based data need to be complemented by in vitro systems. For example, increasing the concentration inside a cell is not a simple endeavour, as it relies on cellular pathways to deliver material to a specific place. At the same time, the 8 vRNPs, as mentioned above, differ in size, valency and RNA sequence and can behave very differently in the formation of condensates and the maintenance of their material properties. Ideally, they should be analysed individually or in selected combinations. In the future, we will combine data from in vitro reconstitution systems and cells to address this very important point raised by the reviewer.

      From the section ‘Limitations of the study’ in the paper:

      “Understanding condensate biology in living cells is physiologically relevant but complex because the systems are heterotypic and away from equilibria. This is especially challenging for influenza A liquid inclusions that are formed by 8 different vRNP complexes, which, although sharing the same structure, vary in length, valency, and RNA sequence. In addition, liquid inclusions result from an incompletely understood interactome where vRNPs engage in multiple and distinct intersegment interactions bridging cognate vRNP-Rab11 units on flexible membranes (Chou et al., 2013, Gavazzi et al., 2013, Sugita et al., 2013, Shafiuddin and Boon, 2019, Haralampiev et al., 2020, Le Sage et al., 2020). At present, we lack an in vitro reconstitution system to understand the underlying mechanism governing demixing of vRNP-Rab11a-host membranes from the cytosol. This in vitro system would be useful to explore how the different segments independently modulate the material properties of inclusions, explore if condensates are sites of IAV genome assembly, determine thermodynamic values and thresholds accurately, perform rheological measurements for viscosity and elasticity, and validate our findings. The results could be compared to those obtained in cell systems to derive thermodynamic principles happening in a complex system away from equilibrium. Using cells to map how liquid inclusions respond to different perturbations provides an answer to how the system adapts in vivo, but has limitations.”

      Reviewer #2 (Public Review):

      During influenza virus infection, newly synthesized viral ribonucleoproteins (vRNPs) form cytosolic condensates, postulated as viral genome assembly sites and having liquid properties. vRNP accumulation in liquid viral inclusions requires its association with the cellular protein Rab11a directly via the viral polymerase subunit PB2. Etibor et al. investigate and compare the contributions of entropy, concentration, and valency/strength/type of interactions, on the properties of the vRNP condensates. For this, they subjected infected cells to the following perturbations: temperature variation (4, 37, and 42 °C), the concentration of viral inclusion drivers (vRNPs and Rab11a), and the number or strength of interactions between vRNPs using nucleozin, a well-characterized vRNP sticker. Lowering the temperature (i.e. decreasing the entropic contribution) leads to a mild growth of condensates that does not significantly impact their stability. Altering the concentration of drivers of IAV inclusions impacts their size but not their material properties. The most spectacular effect on condensates was observed using nucleozin. The drug dramatically stabilizes vRNP inclusions, acting as a condensate hardener. Using a mouse model of influenza infection, the authors provide evidence that the activity of nucleozin is retained in vivo. Finally, using a mass spectrometry approach, they show that the drug affects vRNP solubility in a Rab11a-dependent manner without altering the host proteome profile.

      The data are compelling and support the idea that drugs that affect the material properties of viral condensates could constitute a new family of antiviral molecules, as already described for the respiratory syncytial virus (Risso Ballester et al. Nature. 2021).

      Nevertheless, there are some limitations in the study. Several of them are mentioned in a dedicated paragraph at the end of a discussion. This includes the heterogeneity of the system (vRNP of different sizes, interactions between viral and cellular partners far from being understood), which is far from equilibrium, and the absence of minimal in vitro systems that would be useful to further characterize the thermodynamic and the material properties of the condensates.

      There are other ones.

      We thank Reviewer 2 for highlighting specific details that need improving and for raising such interesting questions to validate our findings. We have addressed the comments of Reviewer 2 and performed the experiments as described (in blue) below each point raised.

      1) The concentrations are mostly evaluated using antibodies. This may be correct for Cdilute. However, measurement of Cdense should be viewed with caution as the antibodies may have some difficulty accessing the inner of the condensates (as already shown in other systems), and this access may depend on some condensate properties (which may evolve along the infection). This might induce artifactual trends in some graphs (as seen in panel 2c), which could, in turn, affect the calculation of some thermodynamic parameters.

      The concern about using antibodies to calculate Cdense is valid, and we thought it was very important. We addressed this concern by performing the same analyses using a fluorescently tagged virus that has mNeonGreen fused to the viral polymerase PA (PA-mNeonGreen PR8 virus). Like NP, PA is a component of vRNPs and labels viral inclusions, colocalising with Rab11 when vRNPs are in the cytosol. However, per vRNP there is only one molecule of PA, whilst of NP there are 37-96 depending on the size of vRNPs. As predicted, we did observe changes in the Cdilute, Cdense and nucleation density. However, the measurements and values obtained for Gibbs free energy, size, and aspect ratio when detecting viral inclusions with fluorescently tagged vRNPs or with antibody staining followed the same trend and allowed us to validate our conclusion that major changes in Gibbs free energy occur solely when there is a change in the valency/strength of interactions but not in temperature or concentration (Figure 1 below). Given the extent of these data, we show the results here but, in the manuscript, we will describe the limitations of using antibodies in our study within the section ‘Limitations of the study’ from lines 881-894. Given the importance of the question regarding the pros and cons of the different systems for analysing thermodynamic parameters, we have decided to systematically assess and explore these differences in detail in a future manuscript.

      For more information: this reviewer may be asking why we did not use the PA-fluorescent virus in the first place to evaluate inclusion thermodynamics and avoid the accessibility problems that antibodies may have in getting deep into large inclusions. Our answer is that no system is perfect. In the case of the PA-fluorescent virus, the caveats revolve around the fact that the virus is attenuated (Figure 1a below), exhibiting a delayed infection as demonstrated by reduced levels of viral proteins (Figure 1b below). Consistent with this, it shows differences in the accumulation of vRNPs in the cytosol: viral inclusions form later in infection, and the amount of vRNPs in the cytosol does not reach the levels observed with the PR8-WT virus. After their emergence, inclusions behave as in the wild-type virus (PR8-WT), fusing and dividing (Figure 1c below) and displaying liquid properties.

      As the overarching goal of this manuscript is to evaluate the best strategies to harden liquid IAV inclusions and given that one of the parameters we were testing is concentration, we reasoned that using PR8-WT virus for our analyses would be reasonable.

      In conclusion, both systems have caveats that are important to systematically assess, and these differences may shift or alter thermodynamic parameters such as nucleation density, inclusion maturation rate, Cdense and Cdilute, in particular by varying the total concentration. As a note, to validate all our results using the PA-mNeonGreen PR8 virus, we considered the delayed kinetics and applied our thermodynamic analyses up to 20 hpi rather than 16 hpi.

      However, because of the question raised by this reviewer about the best solution for mitigating errors induced by using antibodies, we re-checked all our data. Not only have we compared the data originating from the attenuated fluorescently tagged virus with our data, but we have also compared images acquired from Z stacks (as used for concentration and for type/strength of interactions) with those acquired from 2D images. Our analysis revealed that there is a very good match between antibody staining and the fluorescent vRNP virus when using images acquired as Z stacks and analysed as Z projections. Therefore, we re-analysed all our temperature-related thermodynamic data using images acquired from Z stacks and entirely altered Figure 2. We believe that all these comparisons and analyses have greatly improved the manuscript and hence we thank all reviewers for their input.
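      As an illustration of how the Gibbs free energy discussed above relates to Cdense and Cdilute, the sketch below assumes the standard two-phase partitioning relation, ΔG = -RT ln(K) with K = Cdense/Cdilute. This relation and the function names are assumptions made for illustration; the response itself does not spell out the formula, and fluorescence intensities are used here only as a proxy for concentration.

```python
import math

# Molar gas constant in J/(mol*K).
R = 8.314462618


def partition_coefficient(c_dense, c_dilute):
    """Ratio of concentration inside inclusions to the surrounding cytosol.

    Both inputs are assumed to be positive intensity-derived proxies for
    concentration (hypothetical units; only the ratio matters).
    """
    if c_dense <= 0 or c_dilute <= 0:
        raise ValueError("concentrations must be positive")
    return c_dense / c_dilute


def gibbs_free_energy(c_dense, c_dilute, temperature_c=37.0):
    """Apparent free energy of partitioning in J/mol, assuming dG = -R*T*ln(K)."""
    t_kelvin = temperature_c + 273.15
    return -R * t_kelvin * math.log(partition_coefficient(c_dense, c_dilute))


if __name__ == "__main__":
    # Hypothetical intensities, not values from the study.
    for label, dense, dilute in [("condition A", 120.0, 30.0),
                                 ("condition B", 300.0, 15.0)]:
        dg = gibbs_free_energy(dense, dilute)
        print(f"{label}: K = {dense / dilute:.1f}, dG = {dg:.0f} J/mol")
```

      Under this assumption, a larger Cdense/Cdilute ratio gives a more negative free energy, meaning a more stabilised inclusion. Any systematic underestimate of Cdense, for example from limited antibody access, would shift the absolute value while leaving trends across conditions comparable.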

      Figure 1 – The PA-mNeonGreen virus is attenuated in comparison to the WT virus, and the data obtained for Gibbs free energy are consistent with analyses done on images of antibody-stained vRNPs. A. Representation of the PA-mNeonGreen virus (PA-mNG; Abbreviations: NCR: non coding region). B. Cells (A549) were transfected with a plasmid encoding mCherry-NP and co-infected with PA-mNeonGreen virus for 16h, at an MOI of 10. Cells were imaged under time-lapse conditions starting at 16 hpi. White boxes highlight vRNPs/viral inclusions in the cytoplasm in the individual frames. The dashed white and yellow lines mark the cell nucleus and the cell periphery, respectively. The yellow arrows indicate the fission/fusion events and movement of vRNPs/viral inclusions. Bar = 10 µm. Bar in insets = 2 µm. C-D. Cells (A549) were infected or mock-infected with PR8 WT or PA-mNG viruses, at a multiplicity of infection (MOI) of 3, for the indicated times. C. Viral production was determined by plaque assay and plotted as plaque forming units (PFU) per milliliter (mL) ± standard error of the mean (SEM). Data are a pool from 2 independent experiments. D. The levels of viral PA, NP and M2 proteins and actin in cell lysates at the indicated time points were determined by western blotting. (E-G) Biophysical calculations in cells infected with the PA-mNeonGreen virus upon altering temperature (at 10 hpi), evaluating the concentration of vRNPs (over a time course) in conditions expressing native amounts of Rab11a or overexpressing low levels of Rab11a, and upon altering the type/strength of vRNP interactions by adding nucleozin at 10 hpi during the indicated time periods. All data (Ccytoplasm/Cnucleus, Cdense, Cdilute, area, aspect ratio and Gibbs free energy) are represented as boxplots. Above each boxplot, same letters indicate no significant difference between them, while different letters indicate a statistical significance at α = 0.05 using one-way ANOVA, followed by Tukey multiple comparisons of means for parametric analysis, or Kruskal-Wallis Bonferroni treatment for non-parametric analysis.

      2) Although the authors have demonstrated that vRNP condensates exhibit several key characteristics of liquid condensates (they fuse and divide, they dissolve upon hypotonic shock or upon incubation with 1,6-hexanediol, FRAP experiments are consistent with a liquid nature), their aspect ratio (with a median above 1.4) is much higher than the aspect ratio observed for other cellular or viral liquid compartments. This is intriguing and might be discussed.

      IAV inclusions have been shown to interact with microtubules and the endoplasmic reticulum, which confer movement, and to undergo fusion and fission events. We propose that these interactions and movement exert forces that deform inclusions, making them less spherical. To validate this assumption, we compared the aspect ratio of viral inclusions in the absence and presence of nocodazole (which abrogates microtubule-based movement). The data in Figure 2 show that in the presence of nocodazole, the aspect ratio decreases from 1.42 ± 0.36 to 1.26 ± 0.17, supporting our assumption.

      Figure 2 – Treatment with nocodazole reduces the aspect ratio of influenza A virus inclusions. Cells (A549) were infected with PR8 WT for 8 h and treated with nocodazole (10 µg/mL) for 2 h, after which the movement of influenza A virus inclusions was captured by live cell imaging. Viral inclusions were segmented, and the aspect ratio was measured in ImageJ, then analysed and plotted in R.

      3) Similarly, the fusion event presented at the bottom of figure 3I is dubious. It might as well be an aggregation of condensates without fusion.

      We have changed this (check Fig 5A and B in the manuscript), thank you for the suggestion.

      4) The authors could have more systematically performed FRAP/FLAPh experiments on cells expressing fluorescent versions of both NP and Rab11a to investigate the influence of condensate size, time after infection, or global concentrations of Rab11a in the cell (using the total fluorescence of overexpressed GFP-Rab11a as a proxy) on condensate properties.

      We have included a new figure, figure 5 with the suggested data.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors present evidence from studies of biopsies from human subjects and muscles from young and older mice that the enzyme glutathione peroxidase 4 (GPx4) is expressed at reduced levels in older organisms, associated with elevated levels of lipid peroxides. A series of studies in mice established that genetic reduction of GPx4 and hindlimb unloading each elevated lipid peroxide levels and reduced muscle contractility in young animals. Overexpression of GPx4 or N-acetylcarnosine blocked atrophy and loss of force generating capacity resulting from hindlimb unloading in young mice. Cell culture experiments in C2C12 myotubes were used to develop evidence linking elevated lipid peroxide levels to atrophy using genetic and pharmacologic approaches. Links between autophagy and atrophy were suggested.

      Experiments on GPx4 expression levels, lipid peroxide levels, muscle mass and muscle force generating capacity were internally consistent and convincing. I thought the experiments supporting the view that autophagy contributed to atrophy were convincing. The hypothesis that altered lipidation of autophagy factors contributed was tested or supported in my view. Evidence for muscle atrophy in response to genetic or pharmacologic manipulations is a bit inconsistent throughout the paper, possibly because the small N of some experiments does not provide sufficient power to detect observed numeric differences in the means. The pattern of muscle fiber atrophy by fiber type is consistent throughout the paper but there is variability in which comparisons reached the threshold for significance, again, possibly because of the small N of the experiments. I agree with the authors that altered activity of enzymes in the contractile apparatus provides one explanation for the observed weakness but respectfully wish to point out there are others such as impaired excitation-contraction coupling which is well known to occur in aging.

      We thank Dr. Cardozo for taking the time to carefully review our manuscript, and for providing enthusiastic feedback on the significance of our work. We are grateful for the additional suggestions and have modified our manuscript accordingly.

      Reviewer #2 (Public Review):

      This is a well-written paper that reports that the accumulation of LOOH with age and disuse contributes to the loss of skeletal muscle mass and strength. Moreover, the authors report that LOOH neutralization attenuates muscle atrophy and weakness. The mechanism via which LOOH contributes to these phenotypes remains unclear but seems to be mediated by the autophagy-lysosomal axis. In addition, the paper also reports the efficacy of N-acetylcarnosine treatment in ameliorating muscle atrophy in mice.

      We thank Reviewer 2 for their positive response to our manuscript. Very much appreciated! Below please find our response to your specific comments.

      The authors should consider the following points to improve the manuscript:

      • The authors showed that inhibition of the autophagy-lysosome axis by ATG3 deletion or BafA1 was sufficient to reduce LOOH levels induced by GPx4 deletion, erastin, or RSL3. Moreover, they found that 4-HNE co-localizes with LAMP2. However, it remains unclear the precise mechanism via which LOOH contributes to muscle atrophy and how it is amplified by the autophagy-lysosomal axis. The authors could further test the functional interaction of 4-HNE with LAMP2 with additional experiments such as immunoprecipitation.

      Thank you for these comments. We agree with the reviewer that our observations on the autophagy-lysosomal axis are not yet backed by a tangible mechanism. To clarify, we only show 4-HNE and LAMP2 colocalization to show that they are proximate to each other. We do not necessarily claim that LAMP2 is the protein that becomes 4-HNE-ylated. We are currently developing a proteomic platform to detect 4-HNE conjugations on peptides, and this should hopefully shed light on the nature of the interaction between LOOH and the autophagy-lysosomal axis. We now include additional discussion on the autophagy-lysosomal axis and LOOH in lines 280-291.

      • A weak point of the paper is not having performed the experiments on 24-month-old-mice. At 20 months of age, the mice do not display any muscle wasting and myofiber atrophy compared to young mice that have completed postnatal muscle growth (=6-month-old-mice). It would be interesting to see the levels of 4-HNE in 24- or 30-month-old mice, and if N-acetylcarnosine treatment in older mice is able to rescue muscle atrophy induced by aging.

      This is a nuanced but very important point. We initially set out to study 24-month-old mice, but these mice did not tolerate the hindlimb unloading procedure well, so we ended up using 20-month-old mice instead. While mice at this age tolerated our HU procedure well, they did not manifest a significant reduction in muscle mass compared to young mice. We included additional discussions in lines 298-300 and 310-314. To address this point, we are currently performing a 6-month N-acetylcarnosine intervention in 24-month-old mice, and examining the effect that this compound has on aging (without HU) in multiple organ systems. We have thus far completed 2 cohorts for this preclinical trial. Results on the effects of long-term N-acetylcarnosine treatment on muscle will be included in a separate manuscript.

      Previous studies have shown that inhibition of autophagy accelerates (rather than protects from) sarcopenia, and that autophagy is required to maintain muscle mass (Masiero 2009, PMID: 19945408; Castets 2013, PMID: 23602450; Carnio 2014, PMID: 25176656). On this basis, the authors should test whether their findings are valid only in the context of disuse atrophy or also in the context of sarcopenia (=24-30-month-old mice).

      We agree with the reviewer that the relationship between autophagy and muscle mass is likely complex. In the current study, we only showed that a SHORT-TERM inhibition of autophagy by ATG3 deletion prevents muscle atrophy induced by a SHORT-TERM disuse intervention. Long-term inhibition of the autophagic machinery will likely be detrimental and, as shown in the references provided by the reviewer, accelerates sarcopenia. We now include these discussions in lines 280-287. We respectfully request that the experiments in 24-30-month-old ATG3-MKO mice be considered beyond the scope of the study. As discussed above, there is much more to study regarding the nature of the interaction between the autophagy-lysosomal axis and LOOH.

      • In Fig.2 the authors report that GPx4 KD, erastin, and RSL3 reduce the diameter of myotubes. For how long and when was the treatment done? Looking at the images, it seems that there are some myoblasts in the cultures treated with GPx4 KD, erastin, and RSL3. Is it possible that these compounds reduce myotube size by inhibiting myoblast fusion rather than by inducing myotube atrophy?

      Thank you for pointing this out. We now provide further details in the Methods section (lines 439-443). For KD experiments, we treat myoblasts with virus at the onset of differentiation, due to lower infection efficiency in myotubes. This is certainly a caveat. However, erastin and RSL3 experiments were done on fully differentiated myotubes. It is common to have non-differentiated myoblasts under differentiated myotubes.

      • MDA quantification was done in the gastrocnemius although all the experiments in this paper were performed in the soleus and EDL. It would be good if the authors could explain the reason for this.

      MDA and 4-HNE WB were done on gastroc for all mouse models because some soleus and EDL muscles weigh less than 7 mg and provide insufficient material for the MDA or 4-HNE assays. Soleus and EDL were used for contractile experiments (gastroc cannot be used for this experiment) and for histological analyses.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Jigo et al. measured the entire contrast sensitivity function and manipulated eccentricity and stimulus size to assess changes in contrast sensitivity and acuity for different eccentricities and polar angles. They found that CSFs decreased with eccentricity, but to a lesser extent after M scaling, whereas compensating for striate-cortical magnification around the polar angle of the visual field did not equate contrast sensitivity.

      In this article, the authors used classic psychophysical tests and a simple experimental design to answer the question of whether cortical magnification underlies polar angle asymmetries of contrast sensitivity. Contrast sensitivity is considered to be the most fundamental aspect of spatial vision and is important for both normal individuals and clinical patients in ophthalmology. The parametric contrast sensitivity model and the extraction of key CSF attributes help to compare the effect of M scaling at different angles. This work can provide a new reference for the study of normal and abnormal spatial vision.

      The conclusions of this paper are mostly well supported by data, but some aspects of data collection and analysis need to be clarified and extended.

      1) In addition to the key CSF attributes used in this paper, the area under the CSF curve is a common, global parameter to figure out how contrast sensitivity changes under different conditions. An analysis of the area under the CSF curve is recommended.

      – We have added the area under the CSF (AULCSF) [lines 305-319, Fig 5 E-F; lines 339-343, Fig 6 E-F]. Differences for non-magnified and magnified stimuli are not eliminated.
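      For readers unfamiliar with this summary measure, the sketch below shows one common way to compute an area under the log CSF: trapezoidal integration of log10 sensitivity over log10 spatial frequency. The function name, the clipping of sensitivities below 1, and the example values are assumptions made for illustration, not a description of the authors' analysis code.

```python
import math


def aulcsf(spatial_freqs_cpd, sensitivities):
    """Area under the log contrast sensitivity function.

    Integrates log10(sensitivity) over log10(spatial frequency) with the
    trapezoidal rule. Log sensitivity is clipped at 0 (sensitivity of 1),
    an assumed convention so that sub-threshold points add no negative area.
    """
    if len(spatial_freqs_cpd) != len(sensitivities):
        raise ValueError("inputs must have the same length")
    if len(spatial_freqs_cpd) < 2:
        raise ValueError("need at least two spatial frequencies")
    if any(f <= 0 for f in spatial_freqs_cpd) or any(s <= 0 for s in sensitivities):
        raise ValueError("frequencies and sensitivities must be positive")

    # Sort by spatial frequency so the integration runs left to right.
    pairs = sorted(zip(spatial_freqs_cpd, sensitivities))
    xs = [math.log10(f) for f, _ in pairs]
    ys = [max(math.log10(s), 0.0) for _, s in pairs]

    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


if __name__ == "__main__":
    # Hypothetical sensitivities at a set of spatial frequencies (cpd).
    freqs = [0.5, 1.0, 2.0, 4.0, 8.0, 11.3]
    horizontal = [40.0, 60.0, 55.0, 30.0, 8.0, 2.0]
    vertical = [30.0, 45.0, 38.0, 18.0, 4.0, 1.5]
    print("AULCSF horizontal:", round(aulcsf(freqs, horizontal), 3))
    print("AULCSF vertical:  ", round(aulcsf(freqs, vertical), 3))
```

      Because the measure collapses the whole curve into one number, it complements rather than replaces attribute-level comparisons such as peak sensitivity and cutoff spatial frequency.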

      2) In Figure 2, CRFs are given for several SFs, but were the CRFs at the cutoff-SF well-fitted? The authors should have provided the CRF results and corresponding fits to make their results more solid.

      – As reported in Fig 4A,C,E, the group data fits were very high (≥.98).

      3) The authors suggested that the apparent decrease in HVA extent at high SF may be due to the lower cutoff-SF of the perifoveal VM. Analysis of the correlation between the change in HVA and cutoff SF after M scaling may help to draw more comprehensive conclusions.

      – We have rephrased our explanation [lines 453-460]. As per your suggestion, we correlated the change in HVA and the cutoff SF after M scaling and found these correlations to be non-significant.

      4) In Figure 6, it would be desirable to add panels of exact values of HVA and VMA effects for key CSF attributes at different eccentricities, as shown in Figures 4B, D, and F, to make the results more intuitive.

      – We have added these panels [FIG 6] and the corresponding analysis in the text [lines 321-343]

      5) More discussions are needed to interpret the results. 1) Due to the different testing distances in VM and HM, their retinae will be in a different adaptation state, making any comparison between VM and HM tricky. The author should have added a discussion on this issue.

      – Note that the mean luminance of the display (from retina to monitor) was 23 cd/m2 at 57cm and 19 cd/m2 at 115 cm. The pupil size difference for these two conditions is relatively small (< 0.5 mm) and should not significantly affect contrast sensitivity (Rahimi-Nasrabadi et al., 2021) [lines 483-491]. Moreover, the differences we get here are consistent with the asymmetries we (e.g., Carrasco, Talgar & Cameron, 2001; Cameron, Tai & Carrasco, 2002; Fuller, Park & Carrasco, 2009; Abrams, Nizam & Carrasco, 2012; Corbett & Carrasco, 2012; Himmelberg, Winawer & Carrasco, 2020) and many others (e.g., Baldwin et al., 2012; Pointer & Hess, 1989; Regan and Beverley, 1983; Rijsdijk et al., 1980; Robson and Graham, 1981; Rosén et al., 2014; Silva et al., 2008) have observed for contrast sensitivity when the vertical and horizontal meridian are tested simultaneously at the same distance.

      6) In Figure 4, the HVA extent appears to change after M-scaling, although the analysis shows that M-scaling only affects the HVA extent at high SF. In contrast, the range of VMA was almost unchanged. The authors could have discussed more how the HVA and VMA effects behave differently after M-scaling.

      – We had commented on this pattern and have further clarified it [lines 436-451]

      7) The results in Figure 4 also show that at 11.3 cpd, the measurement may be inaccurate. This might lead to an inaccurate estimate of the M scaling effect at 11.3 cpd. The authors should discuss this issue more.

      – We have explained why this data point is at chance [FIG 4 caption]

      8) The different neural image-processing capabilities among locations, which is referred to as the "Qualitative hypothesis", is the main hypothesis explaining the differences around the polar angle of the visual field. To help the reader better understand this concept, the author should provide further discussions.

      – We have expanded the discussion of the qualitative hypothesis of differences in polar angle (lines 86-92; lines 476-481).

      9) The authors should also provide more details about their measures. For example, high grayscale is crucial in contrast sensitivity measurements, and the authors should clarify whether the monitor was calibrated with high grayscale or only with 8-bit. Since the main experiment was measuring CS at different locations, it should also be clarified whether the global uniformity of the display was calibrated.

      – The monitor was calibrated with 8-bit at the center of the display [lines 607].

      – Regarding global uniformity, although we only calibrated at the center of the display, please note that the asymmetries are not due to the particular monitor we used. We have obtained these asymmetries in contrast sensitivity in numerous studies using multiple monitors over 20 years (e.g., Carrasco, Talgar & Cameron, 2001; Cameron, Tai & Carrasco, 2002; Fuller, Park & Carrasco, 2009; Abrams, Nizam & Carrasco, 2012; Corbett & Carrasco, 2012; Hanning et al., 2022a; Himmelberg et al., 2020) and other groups have reported these visual asymmetries as well (Baldwin et al., 2012; Pointer and Hess, 1989; Rosén et al., 2014). Also important, as we had mentioned in the Introduction [lines 55-59], the HVA and VMA asymmetries shift in-line with egocentric referents, corresponding to the retinal location of the stimulus, not with the allocentric location (Corbett & Carrasco, 2011).

      10) In addition, their method of data analysis relies on parametric contrast sensitivity model fitting. One of the concerns is whether there are enough trials for each SF to measure the threshold. The authors should have included in their method the number of trials corresponding to each SF in each CSF curve.

      – We have specified the number of trials [lines 637-644]

      Reviewer #2 (Public Review):

      This is an interesting manuscript that explores the hypothesis that inhomogeneities in visual sensitivity across the visual field are not solely driven by cortical magnification factors. Specifically, they examine the possibility that polar angle asymmetries are subserved by differences not necessarily related to the neural density of representation. Indeed, when stimuli were cortically magnified, pure eccentricity-related differences were minimized, whereas applying that same cortical magnification factor had less of an effect on mitigating polar angle visual field anisotropies. The authors interpret this as evidence for qualitatively distinct neural underpinnings. The question is interesting, the manuscript is well written, and the methods are well executed.

      1) The crux of the manuscript appears to lean heavily on M-scaling constants, to determine how much to magnify the stimuli. While this does appear to do a modest job compensating for eccentricity effects across some spatial frequencies within their subject pool, it of course isn't perfect. But what I am concerned about is the degree to which the M-scaling that is then done to adjust for presumed cortical magnification across meridians is precise enough to rely on entirely to test their hypothesis. That is, do the authors know whether the measures of cortical magnification across a polar angle that are used to magnify these stimuli are as reliable across subjects as they tend to be for eccentricity alone? If not, then to what degree can we trust the M-scaled manipulation here? In an ideal world, the authors could have empirically measured cortical surface area for their participants, using a combination of retinotopy and surface-based measures, and precisely compensated for cortical magnification, per subject. It would be helpful if the authors better unpacked the stability across subjects for their cortical magnification regime across polar angles.

      –– We note that the equations by Rovamo and Virsu are commonly used to cortically magnify stimulus size. This paper has many citations, and the conclusions of many studies are based on those calculations [lines 115-128].

      –– In response to Reviewer 3’s comment, “In lieu of carrying out new measurements, it could also suffice to compare individual cortical magnification factors to the performance to quantify the contribution to the psychophysical performance”, we found a significant correlation between the surface area and contrast sensitivity measures at the horizontal, upper-vertical and lower-vertical meridians. However, we found no significant correlation between cortical surface area and the difference in contrast sensitivity for fixed-size and magnified stimuli at 6 deg at each meridian. These findings suggest that surface area plays a role but that individual magnification is unlikely to equalize contrast sensitivity [lines 366-380; Fig 7; lines 511-529].

      2) Related to this previous point, the description of the cortical magnification component of the methods, which is quite important, could be expanded on a bit more, or even placed in the body of the main text, given its importance. Incidentally, it was difficult to figure out what the references were in the Methods because they were indexed using a numbering system (formatted for perhaps a different journal), so I could only make best guesses as to what was being referred to in the Methods. This was particularly relevant for model assumptions and motivation.

      –– We now detail M-scaling in the Introduction [lines 115-135], and we have fixed the references in the Methods section.

      3) Another methodological aspect of the study that was unclear was how the fitting worked. The authors do a commendably thorough job incorporating numerous candidate CSF models. However, my read on the methods description of the fitting procedure was that each participant was fitted with all the models, and the best model was then used to test the various anisotropy models afterwards. What was the motivation for letting each individual have their own qualitatively distinct CSF model? That seems rather unusual.

      Related to this, while the peak of the CSF is nicely sampled, there was a lack of much data in the cutoff at higher spatial frequencies, which at least in the single subject data that was shown made the cutoff frequency measure seem like it would be unreliable. Did the authors find that to be an issue in fitting the data?

      –– We have further clarified that we fit all 9 models to the grouped data [lines 177-178] and in Methods [lines 693, 716, 725], and that the fit in Figure 3 corresponds to the grouped data [Fig 3 caption]. As reported in Fig 4A,C,E, the group data fits were very high (≥.98). Please note that the cutoff spatial frequency is reliable. The data point (11.3 cpd) in the differences which does not follow the same function (Fig 4D,F) reflects the fact that for both magnified and not-magnified stimuli, performance was at chance, consistent with the fact that high SF are harder to discriminate at peripheral locations [Fig 4 caption].

      4) The manuscript concludes that cortical magnification is insufficient to explain the polar angle inhomogeneities in perceptual sensitivity. However, there is little discussion of what the authors believe may actually underlie these effects then. It would be productive if they could offer some possible explanation.

      –– We have expanded the discussion of qualitative hypothesis of differences in polar angle [lines 86-92; lines 476-481].

      –– We have expanded the discussion of possible mechanisms [lines 496-529].

      –– We have explained why having assessed the VM and HM at different distances does not significantly influence our measures [lines 483-491].

      –– We have expanded the discussion of how the HVA and VMA effects behave differently after M-scaling [lines 435-450].

      –– We have clarified that the fits are reliable and made explicit that the highest SF data point is at chance in both conditions [FIG 4 caption].

      Reviewer #3 (Public Review):

      Jigo, Tavdy & Carrasco used visual psychophysics to measure contrast sensitivity functions across the visual field, varying not only the distance from fixation (eccentricity) but also the angular position (meridian). Both parameters have been shown to affect visual sensitivity: spatial visual acuities generally fall off with eccentricity, it is now widely accepted that it is superior along the horizontal than the vertical meridian, and there may also be differences between the upper and lower visual field, although this anisotropy is typically less pronounced. The eccentricity-dependent decrease in performance is thought to be due to reduced cortical magnification in peripheral compared to central vision; that is, the amount of brain tissue devoted to mapping a fixed amount of visual space. The authors, therefore, include a crucial experimental condition in which they scale the size of their stimuli to account for reduced cortical magnification. They find that while this corrects for reduced performance related to stimulus eccentricity, it does not fully explain the variation in performance at different visual field meridians. They argue that this suggests other neural mechanisms than cortical magnification alone underlie this intra-individual variability in visual perception.

      The experiments are done to an extremely high technical standard, the analysis is sound, and the writing is very clear. The main weakness is that as it stands the argument against cortical magnification as the factor driving this meridional variability in visual performance is not entirely convincing. The scaling of stimulus size is based on estimates in previous studies. There are two issues with this: First, these studies are all quite old and therefore used methods that cannot be considered state-of-the-art anymore. In turn, the estimates of cortical magnification may be a poor approximation of actual differences in cortical magnification between meridians.

      –– We note that the equations by Rovamo and Virsu are commonly used to cortically magnify stimulus size. This paper has many citations, and the conclusions of many studies are based on those calculations [lines 115-128].

      –– In response to Reviewer 3’s comment, “In lieu of carrying out new measurements, it could also suffice to compare individual cortical magnification factors to the performance to quantify the contribution to the psychophysical performance”, we found a significant correlation between the surface area and contrast sensitivity measures at the horizontal, upper-vertical and lower-vertical meridians. However, we found no significant correlation between cortical surface area and the difference in contrast sensitivity for fixed-size and magnified stimuli at 6 deg at each meridian. These findings suggest that surface area plays a role but that individual magnification is unlikely to equalize contrast sensitivity [lines 366-380; Fig 7; lines 511-529].

      Second, we now know that this intra-individual variability is rather idiosyncratic (and there could be a wider discussion of previous literature on this topic). Since these meridional differences, especially between upper and lower hemifields, are relatively weak compared to the variance, a scaling factor based on previous data may simply not adequately correct these differences. In fact, the difference in scaling used for the upper and lower vertical meridian is minute, 7.7 vs 7.68 degrees of visual angle, respectively. This raises the question of whether such a small difference could really have affected performance.

      That said, there have been reports of meridional differences in the spatial selectivity of the human visual cortex (Moutsiana et al., 2016; Silva et al., 2017) that may not correspond one-to-one with cortical magnification. This could be a neural substrate for the differences reported here. This possibility could also be tested with their already existing neurophysiological data. Or perhaps, there could be as-yet undiscovered differences in the visual system, e.g., in terms of the distribution of cells between the ventral and dorsal retina. As such, the data shown here are undoubtedly significant and these possibilities are worth considering. If the authors can address this critique either by additional experiments, analyses, or by an explanation of why this cannot account for their results, this would strengthen their current claims; alternatively, the findings would underline the importance of these idiosyncrasies in the visual cortex.

      We now include discussion of the different points that the reviewer raised here in our new section 'What mechanism might underlie perceptual polar angle asymmetries' [lines 497-530].

    1. Author Response

      Reviewer #1 (Public Review):

      • The statistical procedures used are not completely described and may not be appropriate.

      We revised the text in Methods and Results sections to give more details about the methods used.

      • As only two levels of delay were tested, it is not possible to directly test whether the subjective discounting function is hyperbolic or exponential and hence whether the delay is encoded subjectively or objectively.

      We agree with the reviewer. A higher number of task parameters may offer a better resolution to evaluate the discounting functions. Fortunately, this does not affect our main results.

      • The task has several variable interval lengths (hold in: 1.2-2.8 s, short delay: 1.8-2.3 s, long delay: 3.5-4s) that frustrate interpretation. The distribution of these delays is not described, for example as it reads it seems possible that some long delay rewards are delivered with shorter latency between cue and reward than some short delay rewards (1.2 + 3.5 = 4.7s vs. 2.8+2.3 = 5.1 s).

      We revised the text to address that ambiguity. In the new version of the manuscript, we describe short versus long delays considering the total delay intervals between instruction cue onset and reward delivery [short delay (3.5-5.6s) and long delay (5.2-7.3s)]. Within each delay category, individual delays followed a Gaussian distribution such that the two delay ranges overlapped for 9% of trials. These details are now described in the revised Methods section (pg. 22).
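      As a minimal sketch of the overlap mentioned above, the snippet below intersects the two total-delay ranges quoted in the response. The helper name `range_overlap` is hypothetical, and the sketch only locates the overlapping interval; it does not attempt to reproduce the 9% figure, which depends on distribution parameters not given here.

```python
def range_overlap(a, b):
    """Return the (start, end) of the overlap between two closed ranges, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None

# Total cue-to-reward delay ranges (seconds) as quoted in the response.
SHORT_DELAY = (3.5, 5.6)
LONG_DELAY = (5.2, 7.3)

overlap = range_overlap(SHORT_DELAY, LONG_DELAY)
print("Overlapping delay interval (s):", overlap)
```

      Under these ranges, a trial with a total delay inside the overlapping interval could belong to either delay category.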

      • The authors have not considered that if the delay value is encoding, then the value, both objectively and subjectively, may be changing as the delay elapses. The variation of these task intervals may have an effect on the value of delay.

      In the present study, we report a dynamic integration between the desirability of the expected reward and the imposed delay to reward delivery across the waiting period. Our results (e.g. see Fig. 6) do not fit with simple linear (or logarithmic) effects corresponding to continuous regular changes as the delay elapses. We found different types of interactions (Discounting± and Compounding±) at different periods of the hold period and in different single units. We did not find a way to model all these types of interactions with this type of approach.

      Reviewer #2 (Public Review):

      • Plots of "rejection rate" (trials where the monkeys failed to wait until the rewards) as a function of delay and reward size seem to indicate that the monkeys understood the visual cue. The rejection rates were very low (less than 4% for almost all conditions) which indicates that the monkeys did not have a hard time inhibiting their behavior. It also meant that the authors could not compare trials where the monkeys successfully waited with trials where they failed to wait. This missing comparison weakens the link between the neurophysiological observations and the conclusions the authors made about the signals they observed.

      Here, our main goal was to describe the dynamic STN signals engaged during the waiting period without studying action-related activities. In the discussion (pg. 20), we clearly wrote ‘Further research is needed to determine whether the neural signals identified here causally drive animals’ behavior or rather just participate to reflect or evaluate the current situation.’ Consequently, our conclusions were already tempered by that point.

      In addition, we address the same limitation by writing (pg. 20): “An important avenue for future research will be to determine how STN signals, such as those described here, change when animals run out of patience and finally decide to stop waiting. To do this, however, smaller reward sizes and longer delays might be used to promote more escape behaviors during the delay interval.”

      • The authors examined the STN activity aligned to the start of the delay and also aligned to the reward. Most of the "delay encoding" in the STN activity was observed near the end of the waiting period. The trouble with the analysis is that a neuron that responded with exactly the same response on short and long trials could appear to be modulated by delay. This is easiest to see with a diagram, but it should be easy to imagine a neural response that quickly rose at the time of instruction and then decayed slowly over the course of 2 seconds. For long trials, the neuron's activity would have returned to baseline, but for short trials, the activity would still be above baseline. As such, it is not clear how much the STN neurons were truly modulated by delay.

      We agree with the reviewers. Our original analyses using two time windows had the potential to introduce biases in the detection of neuronal activities modulated by the delay. To overcome this issue, we modified the time frame of all of our analyses (neuronal activity, eye position, EMG). Now, the revised version of the manuscript only reports activities across one time window aligned to the time of instruction cue delivery (i.e., -1 to 3.5s relative to instruction cue onset). This time frame corresponds to the minimum possible interval between instruction cues and reward delivery. We have revised all of the figures and we re-calculated all of the statistics using that one analysis window. Despite these major modifications, our key findings were not changed substantially. We found the same pattern in STN activities, with a strong encoding of reward (48% of neurons) preceding a late encoding of delay (39% of neurons). We also updated the text in Methods and Results sections to reflect the revised analyses.
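      The alignment confound raised by the reviewer can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the analysis used in the manuscript: the exponential response shape, the decay constant `TAU`, and the two cue-to-reward intervals are hypothetical values chosen for illustration. The simulated neuron responds identically on every trial, yet a window aligned to reward delivery samples different parts of that response on short and long trials.

```python
import math

TAU = 1.0  # hypothetical decay constant of the cue-locked response (s)

def rate(t):
    """Identical response on every trial: zero before the cue, exponential decay after it."""
    return math.exp(-t / TAU) if t >= 0 else 0.0

def mean_rate(t_start, t_end, step=0.001):
    """Mean rate over [t_start, t_end], with time measured from cue onset (midpoint rule)."""
    n = int(round((t_end - t_start) / step))
    return sum(rate(t_start + (i + 0.5) * step) for i in range(n)) / n

# Hypothetical cue-to-reward intervals (s) for a short and a long trial.
SHORT_T, LONG_T = 4.0, 6.0

# Window aligned to reward: the last second before reward delivery.
reward_aligned_short = mean_rate(SHORT_T - 1.0, SHORT_T)
reward_aligned_long = mean_rate(LONG_T - 1.0, LONG_T)

# Window aligned to the cue: the same 0 to 3.5 s span for every trial type.
cue_aligned_short = mean_rate(0.0, 3.5)
cue_aligned_long = mean_rate(0.0, 3.5)

print("reward-aligned:", reward_aligned_short, reward_aligned_long)
print("cue-aligned:   ", cue_aligned_short, cue_aligned_long)
```

      In this toy case the reward-aligned means differ between trial types even though the response does not depend on delay, whereas the single cue-aligned window gives the same value for both.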

      • Another concern is the presence of eye movement variables in the regressions that determine whether a neuron is reward or delay encoding. If the task variables modulated eye movements (which would not be surprising) and if the STN activity also modulated eye movements, then, even if task variables did not directly modulate STN activity, the regression would indicate that it did. This is commonly known as "collider bias". This is, unfortunately, a common flaw in neuroscience papers.

      Because the presence of eye variables did not influence how neurons were selected by the GLM, we do not think it likely that our analysis was susceptible to “collider bias”. Nonetheless, to control for that possibility directly, we have now repeated the GLM analyses with eye movement variables excluded. Results are shown in a new figure (Fig.4 – supplementary 1). Exclusion of eye parameters produced results that are very similar to those from the GLM that included eye parameters (differences <3 degrees). We have added text to the manuscript describing this added control analysis.
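      For readers unfamiliar with the concern, the following toy simulation sketches how collider bias can arise in a regression. It is not the authors' GLM: the variable names, the unit-variance Gaussian noise, and the additive model for the eye variable are all assumptions made for illustration. Simulated firing is independent of the task variable, but both drive the eye variable; adding the eye variable as a regressor then produces a nonzero task coefficient.

```python
import random

def slope_one_predictor(y, x):
    """Least-squares slope of y on x (with intercept)."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slopes_two_predictors(y, x1, x2):
    """Least-squares slopes of y on x1 and x2 (with intercept), via centered normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(0)  # fixed seed so the sketch is reproducible
N = 20000
task = [rng.gauss(0, 1) for _ in range(N)]
firing = [rng.gauss(0, 1) for _ in range(N)]  # independent of task by construction
# Collider: the eye variable is driven by both task and firing (assumed additive).
eye = [t + f + rng.gauss(0, 1) for t, f in zip(task, firing)]

slope_task_only = slope_one_predictor(firing, task)
slope_task_with_eye, slope_eye = slopes_two_predictors(firing, task, eye)

print("task slope, eye excluded:", round(slope_task_only, 3))
print("task slope, eye included:", round(slope_task_with_eye, 3))
```

      In this construction the task slope is close to zero when the eye variable is excluded and clearly negative when it is included, which is why repeating a regression without the suspected collider, as in the control analysis described above, is a direct check.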

    1. Author Response

      Reviewer #2 (Public Review):

      The work integrated genomic and transcriptomic data to reconstruct the origin of the svPDE gene from the ancestral ENPP3 gene. The authors also analyzed the expression of svPDE along different snake lineages and different tissues in three species of venomous snakes. Finally, they purified an svPDE from the venom of Naja atra and analyzed its crystallographic structure and enzymatic function. The experiments are adequately designed and carefully planned and the conclusions made by the authors are well supported by evidence.

      I have the following suggestions:

      1) I could not find a section where the authors provided information regarding the origin of the analyzed venom and tissues. i.e. muscle tissue from Naja atra and venom for purification of svPDE. It is important to include this information.

      We thank the reviewer for mentioning this.

      The information for the venom purification has been described in Results (LINE 116) as “This svPDE was directly purified from the crude venom of Naja atra captured in Taiwan”. The information for the tissues of sequencing data has been included in Results (LINE 117) as “… with publicly available RNA-Seq data and compared them with the corresponding genomes available in the NCBI Assembly database (SI Appendix, Table S1)”, and Material and Methods (Line 403) as “DNA was extracted from the muscle tissue of a male Naja atra …”.

      Also, the SI Appendix Table S1 summarized all samples used for sequence analysis with their tissue origins.

      We are still grateful for this comment and have updated the text to make it clearer as follows:

      “The target genomes included the draft one of Naja atra sequenced from a muscle tissue (ongoing internal project, see Material and Methods for detail) and the complete one of its sister species, Naja naja, from the public data (Suryamohan et al., 2020).”

      We have also updated the text where the comparative genomic and transcriptomic analysis is first mentioned, to indicate where this information is described.

      “To test our hypothesis, we comprehensively de novo assembled transcriptomes from the species across 13 clades of Toxicofera (Fig. 1B) with publicly available RNA-Seq data and compared them with the corresponding genomes available in the NCBI Assembly database (see SI Appendix, Table S1 for sample details).”

      2) The authors mention (Line 156) that "the genomic sequences of svPDE-E1a were present in all species of Serpentes but not in the species of Dactyloidae, Varanidae, and Typhlopidae.". As I understand it, the family Typhlopidae is included in the Suborder Serpentes. The conclusions stand of course, but I believe it is worth revising, for accuracy.

      We thank the reviewer for noticing this issue.

      We have updated the text as follows to avoid confusion:

      From “the genomic sequences of svPDE-E1a were present in all species of Serpentes but not in the species of Dactyloidae, Varanidae, and Typhlopidae. This suggests an early emergence of svPDE-E1a in the common ancestor of Serpentes and became …”

      To

      “the genomic sequences of svPDE-E1a were present in all species of Serpentes except for the earliest diverged Typhlopidae. This suggest an early emergence of svPDE-E1a in the Serpentes evolution and became …”

      3) During the discussion (Line 315), it is stated that the expression of svPDE in Lamprophiidae is probably associated with the adaptation of prey selection as a dietary generalist compared to Viperidae and Elapidae. Provided that both of these clades have several species considered dietary generalists, I believe this statement is not strongly supported.

      We agree with the reviewer that we overstated this point without solid support. However, we believe it is worth mentioning, as a hint for future studies, that Lamprophiidae, a less-known clade, expresses svPDE at levels not lower than those of several species of Elapidae. Therefore, we have revised this paragraph to include the finding without further speculation.

      “Comparative transcriptomics is a powerful tool to reveal species-specific or tissue-specific novel transcripts, providing new insights for further studies. For example, the svPDE expression of Lamprophiidae, even higher than several species of Elapidae, indicates the worth of further study for this less-known clade to fill the knowledge gap.”

      4) Also in the discussion (Line 320), the authors mention that Colubridae is traditionally regarded as a non-venomous clade. This statement is far from accurate given that Colubridae is a very diverse clade and several species within it have been shown to be at least moderately venomous. Various species have been shown to produce secretions comparable to those of front-fanged snakes. Furthermore, despite their difference in morphology, I believe there is little to no evidence that suggests Duvernoy's glands in colubrids have any functions differing from the venom glands of front-fanged snakes.

      We thank the reviewer for this comment, which helped us revise the interpretation. This paragraph has been rewritten as follows:

      “Interestingly, the svPDE expression in Duvernoy’s glands of Colubridae, although low, several species within the diverse Colubridae clade have been shown to be moderately venomous. The expression of svPDE in the Duvernoy’s glands also highlights its potential function despite that Duvernoy’s glands exhibit morphological difference from the venom glands of front-fanged snakes”

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript "Interplay between PML NBs and HIRA for H3.3 dynamics following type I interferon stimulus" by Kleijwegt and colleagues describes a study that's set out to explore the details of the PML-HIRA axis in H3.3 deposition at ISGs upon IFN-I stimulation. First, the authors establish that HIRA colocalized at PML NBs upon TNFa and TNFb treatment. This process is SUMO-dependent and facilitated by at least one of the identified SIM domains of HIRA. Next, the authors set out to determine whether interferon responsive genes (ISGs) are dependent on HIRA or PML. By knocking-down either HIRA or PML, only an effect on ISGs was observed when PML was knocked down. In fact, immune-FISH showed that PML NBs are in close proximity of ISGs upon TNFb treatment. To address the histone chaperone function of HIRA, the deposition of the replication-independent H3.3 on ISGs is tested. In specific, the enrichment of H3.3 across the ISG gene body. ChIP-seq data (Fig 5B) showed an enrichment around the TES, whereas qPCR (Fig 5A) showed less convincing enrichment (for details see below). When either HIRA or PML are knocked down, a mild loss of H3.3 enrichment was observed (Fig 5E). Interestingly, when HIRA is sequestered away from PML NBs by Sp100, an increased enrichment of H3.3 was observed. To understand the interplay between H3.3 deposition and HIRA's role in this process in the presence of PML NBs, H3.3 was overexpressed. Two population of cells were observed: low or high levels of H3.3. In the former, HIRA formed foci and the latter, HIRA did not form foci. Surprisingly, when HIRA is overexpressed, PML NBs form in the absence of TNFb. Finally, a two-sided model is proposed, where PML NBs is required for ISG transcription promoting H3.3 loading. The second side is that PML NBs function as a "storage center" for HIRA to regulate its availability.

      Overall, the model is intriguing, but the data presented seems insufficient to support the current claims.

      We thank the reviewer for his/her constructive comments. We want to point out that there is a confusion in the reviewer's statement (highlighted in red here above) between TNFb and IFNb, because it is IFNb that was mostly used in our study. We suppose it is a typo. The sentence "when HIRA is overexpressed, PML NBs form in the absence of TNFb" is inaccurate: PML NBs are present in our cells with or without IFNb treatment. Overexpression of HIRA triggers accumulation of the ectopic HIRA in the PML NBs in the absence of IFNb, probably as part of a buffering mechanism.

      Major concerns:

      • The suggested function of HIRA at the PML NBs as storage is interesting. Ideally, this would be tested by real-time single molecule tracking.

      While surely interesting, we believe that the real-time single molecule tracking is beyond the scope of our article. In addition, under our hypothesis that PML NBs act as buffering places for HIRA, HIRA might move in and out of PML NBs depending on its concentration and/or the availability of free binding sites, so single molecule tracking might not be informative about possible long-term storage functions of PML NBs.

      • The link between PML NBs containing HIRA and H3.3 deposition is very intriguing and indeed the ChIP-seq data shown in Figure 5B shows a clear increase in the H3.3 signal around the TES. This distribution is very intriguing as recent work (Fang et al 2018 Nat Comm) showed that H3.3 deposition across the gene body was diverse and dynamic. Ideally, the qPCR of some select ISGs would confirm the ChIP-seq data. Here a more complex picture emerges. Just as with the ChIP-seq, a modest decrease of H3.3 at the TSS was observed, but only in 2 of the 3 genes shown is H3.3 enriched at the TES and only in 1 gene (ISG54) is H3.3 enriched at the gene body. As qPCR is later used in the manuscript (Fig 5E and 5G), it is essential that the results of two different techniques give similar results. With regards to Fig 5E and 5G, it is unclear why certain gene regions are shown, but not others.

      We agree with the reviewer that distribution of H3.3 on active genes follows a diverse and dynamic pattern. H3.3 is enriched on gene bodies but several papers have shown an important increase of H3.3 loading on the TES region of actively transcribed genes (Tamura et al. 2009; Sarai et al. 2013). Our ChIP-qPCR data (Figure 6A) and our ChIP-Seq data (Figure 6B) are consistent and show a moderate increase of H3.3 on gene bodies, e.g., on MX1 mid or ISG54 mid regions shown by qPCR on Figure 6A (this enrichment is reproducible but not necessarily statistically significant) and on gene bodies of the 48 core ISGs as shown in our ChIP-Seq data (see the light blue line between TSS and TES on figure 6B). In addition, our ChIP-qPCR and ChIP-Seq data also consistently show a higher enrichment of H3.3 on the TES regions of ISGs (see the significant enrichment found in ChIP-qPCR in the TES regions of MX1, OAS1 and ISG54, as well as the strong increase in H3.3 deposition with IFN seen by the light blue line for ChIP-Seq data on figure 6B).

      Since the strongest enrichment for H3.3 was found on the TES region, we focused on this region to evaluate the impact of HIRA or PML knock-down. Our ChIP-Seq data (now added in main Figure 6F for the whole ISG region, or with a zoom on the TES region in Figure 6G) show that the strongest effect of HIRA or PML knock-down is indeed visible in the TES region of ISGs. Our ChIP-qPCR data presented in Figure 6E fully support this effect.

      Overall, the link between HIRA and PML in H3.3 loading is only mildly affected (Fig 5E and 5F). The conclusion that HIRA and PML are essential (Page 12, line 8) is not represented by the presented data. The authors propose that DAXX could play a role. Indeed, work on another H3 variant, CENP-A, showed that non-centromeric localization is dependent on both HIRA and DAXX (Nye et al 2018 PLoS ONE). It would be interesting to learn if a double knock-down of HIRA and DAXX can prevent the enrichment of H3.3 at TES of ISGs upon TNFb treatment.

      To address the first part of the comment, we have now added three things:

      (1) we have toned down our conclusion by saying that HIRA and PML are 'important' for the long-lasting deposition of H3.3 on ISGs,

      (2) we provide new data of time-ChIP qPCR experiments suggesting that HIRA is important for H3.3 recycling during transcription of ISGs. We believe that these results strengthen the importance of HIRA for the global H3.3 enrichment on ISGs (by acting both in the de novo deposition and/or recycling of H3.3).

      We agree with the reviewer that it could be interesting to study the impact of the double knock-down of DAXX and HIRA on H3.3 enrichment at ISGs. However, we decided to focus our attention on SP100 since it could help us to better tease apart the role of HIRA localization in PML NBs, versus its role in H3.3 deposition at ISGs. In addition, since SP100 knock-down unleashes ISGs transcription, it also provided us with the opportunity to study the impact of an elevated ISGs transcription on H3.3 deposition and whether this is also mediated by HIRA.

      (3) we thus now also provide data of the double knock-down of SP100 and HIRA showing that the increase in H3.3 loading on ISGs seen upon SP100 knock-down is mediated by HIRA. This new result also strengthens the importance of HIRA for H3.3 enrichment on ISGs upon transcription.

      • In Figure 6B, two versions of HIRA are overexpressed and the authors conclude that the number of PML NBs goes up. Earlier in the manuscript, the authors showed that PML NB formation upon IFNb exposure brings HIRA into the PML NBs via a SUMO-dependent mechanism. Is overexpression of HIRA and its accumulation in PML NBs also SUMO-dependent or SUMO-independent? Overexpressing the SIM mutants from Figure 3F would address this question. In addition, the link between the proposed HIRA being stored at PML NBs could be strengthened by overexpressing HIRA and see at both short and late time points whether H3.3 is enriched on ISG genes.

      We want to clarify the first point: we do not conclude that the number of PML NBs goes up upon overexpression of HIRA. The number of PML NBs seems stable, although we have not quantified it. The aim of Figure 4A (previously Figure 6B) is to show that upon overexpression, ectopic forms of HIRA localize in PML NBs without IFN-I treatment, as part of a buffering mechanism.

      The SIM mutant of HIRA from Figure 3F is indeed overexpressed and does not localize in PML NBs upon IFN-I treatment. We have now added an IF (Figure 3 - figure supplement 1C) showing that it does not localize in PML NBs in non-treated cells either. Thus, this underscores that accumulation of ectopic HIRA in PML NBs is SUMO-SIM-dependent regardless of the IFN-I treatment.

      • BJ cells are known to senesce rather easily. Did the authors double-check what fraction of their cells were in senescence and whether this correlated with the high or low expression of ectopic H3.3?

      BJ cells can indeed enter into senescence, but they are less prone to senesce than other human primary cells such as IMR90 for example. Nevertheless, we checked EdU incorporation both in BJ cells (Figure 1 - Figure supplement 1F) and BJ eH3.3i cells with expression of ectopic H3.3, with or without IFN-I treatment (Figure R2 for reviewer). We could clearly see that in our conditions (Dox addition for 24h maximum, IFNb at 1000U/mL for 24h), there is no significant difference in the number of EdU+ cells (i.e., proliferating cells), thus excluding effects due to senescence entry. As positive control, we have treated BJ cells with etoposide, a known senescence-inducing drug (Kosar et al., 2013; Tasdemir et al., 2016) which indeed reduces the number of EdU positive cells. We have now added a sentence in the main text as well to underscore that cells are not senescent.

      • In Figure 6 - figure supplement D, it appears that the levels of HIRA go up upon TSA and IFNb treatment. Rather than relying on visual inspection, ideally, all Western blots should be quantified to confirm the assessment that protein levels are not affected by different experimental procedures.

      We now provide quantification of all WBs below each WB. In addition, we have removed data on TSA since it could appear too preliminary.

      Reviewer #2 (Public Review):

      HIRA chaperone complex has been previously shown to localize at PML Nuclear Bodies upon various stress or stimuli (senescence, viral infections, interferon treatment). The authors have previously unraveled an anti-viral role of PML NBs through the chromatinization of HSV-1 viral genome by H3.3 chaperones. Here, the authors identify SUMOylation, as well as a SIM-like sequence in HIRA, as drivers for HIRA recruitment at PML Nuclear Bodies upon interferon-I treatment. These HIRA-containing PML NBs localize close to interferon-stimulated gene (ISG) loci. Although the functional role of HIRA/PML interaction is yet not solved, HIRA and PML regulate H3.3 loading at transcriptional end sites of ISGs upon Interferon-I treatment. The authors propose that PML NBs play a buffering role for HIRA, regulating its availability depending on H3.3 level or chromatin dynamics.

      Strength:

      The authors used primary human diploid BJ fibroblasts, a relevant cell line for studying physiological regulation upon inflammatory cytokines. The role of SUMO/SIM on HIRA localization upon interferon beta treatment was assessed using interesting - but already described - tools, such as SUMO-specific affimers. The authors provide convincing results on the requirement of PML SUMOylation and a putative SIM sequence on HIRA for its localization at PML Nuclear Bodies. Other interesting observations are described, such as some PML or HIRA-dependent long-lasting H3.3 loading at transcription end site of ISGs upon interferon beta treatment, as shown by ChIP analyses of ISG loci, but also by endogenous H3.3 ChIPseq analysis.

      Weakness:

      The authors claim HIRA partitioning at PML NBs correlates with increase in "PML valency" upon interferon-I. The "valency" refers to the number of interaction domains, but the number of SUMOs conjugated on PML is not explored here (nor the number of SIMs on HIRA). Although the authors have proposed interesting hypotheses and discussion, the inhibitory role of H3.3 overexpression or acetylation inhibition on HIRA localization at PML Nuclear Bodies clearly deserves further investigations.

      More generally, the manuscript explores many directions, but the links between the various observations remain unclear and limit firm conclusions.

      We thank the reviewer for his/her constructive comments.

      We have now addressed these 3 weaknesses pointed out by the reviewer.

      • Our claims on PML valency have been removed. We have now underscored the link between HIRA accumulation in PML NBs and the increase in PML and SP100 protein levels, without lingering on the valency aspects which was not the focus of our paper.

      • The role of H3.3 overexpression in inhibition of HIRA localization in PML NBs has been moved to the first part of the paper, which describes the mechanism of HIRA accumulation in PML NBs. We feel that these data are still of importance and support the role of PML NBs as a buffering place for HIRA depending on DAXX levels (new data) as well as H3.3 levels.

      We agree that the acetylation inhibition would deserve further investigations and we have thus removed the part on TSA treatment.

      • Thanks to the reviewer's comments, we have now remodeled the article to better convey two main conclusions: (1) PML NBs serve as a buffering site for HIRA. Accumulation of HIRA in PML NBs depends both on PML and SP100 concentration (and on PML SUMOylation) as well as DAXX and H3.3 levels and (2) upon IFN-I treatment, PML regulates ISGs transcription and thus indirectly regulates HIRA loading on ISGs, which controls H3.3 deposition and recycling during transcription. HIRA-mediated H3.3 deposition/recycling is highly dependent on ISGs transcription levels and is thus increased upon SP100 knock-down which unleashes ISGs transcription.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript provides the first cellular analysis of how neuronal activity in axons (in this case the optic nerve) regulates the diameter of nearby blood vessels and hence the energy supply to neuronal axons and their associated cells. This is an important subject because, in a variety of neurological disorders, there is damage to the white matter that may result from a lack of sufficient energy supply, and this paper will stimulate work on this important subject.

      Axonal spiking is suggested to release glutamate which activates NMDA receptors on myelin-making oligodendrocytes wrapped around the axons: the oligodendrocytes - either directly or indirectly via astrocytes - then generate prostaglandin E2 which relaxes pericytes on capillaries, thus decreasing the resistance of the vascular bed and (presumably) increasing blood flow in the nerve.

      Strengths of the paper

      The paper identifies some important characteristics of axon-vascular coupling, notably its slow temporal development and long-lasting nature, the involvement of PgE2 in an oxygen-dependent manner, and a role for NMDARs. Rigorous criteria (constriction and dilation of capillaries by pharmacological agents) are used to select functioning pericytes for analysis.

      Weaknesses of the paper

      The study focuses exclusively on pericytes. It would have been interesting to assess whether arteriolar SMCs also contribute to regulating blood flow.

      We thank reviewer #1 for his/her positive comment on our manuscript. We also share the future interest in the optic nerve’s arteriole (there is only one main arteriole covered by SMCs). However, it is not always visible in the preparation because of the orientation of the nerve: if it is not on the surface and directly under the microscope, it cannot be imaged.

      Reviewer #2 (Public Review):

      This paper describes a new concept of "axo-vascular coupling" whereby action potential traffic along white matter axons induces vasodilation in the mouse optic nerve. This is an initial report dissecting some of the mechanisms that are undoubtedly complex as in gray matter NVC. I like the novel AVC concept.

      We really appreciate the reviewer’s positive comments.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports a systematic study of the cortical propagation patterns of human beta bursts (~13-35Hz) generated around simple finger movements (index and middle finger button presses).

      The authors deployed a sophisticated and original methodology to measure the anatomical and dynamical characteristics of the cortical propagation of these transient events. MEG data from another study (visual discrimination task) was repurposed for the present investigation. The data sample is small (8 participants). However, beta bursts were extracted over a +/- 2s time window about each button press, from single trials, yielding the detection and analysis of hundreds of such events of interest. The main finding consists of the demonstration that the cortical activity at the source of movement related beta bursts follows two main propagation patterns: one along an anteroposterior direction (predominantly originating from pre central motor regions), and the other along a medio-lateral (i.e., dorso lateral) direction (predominantly originating from post central sensory regions). Some differences are reported, post-hoc, in terms of amplitude/cortical spread/propagation velocity between pre and post-movement beta bursts. Several control tests are conducted to ascertain the veracity of those findings, accounting for expected variations of signal-to-noise ratio across participants and sessions, cortical mesh characteristics and signal leakage expected from MEG source imaging.

      One major perceived weakness is the purely descriptive nature of the reported findings: no meaningful difference was found between bursts traveling along the two different principal modes of propagation, and importantly, no relation with behavior (response time) was found. The same stands for pre vs. post motor bursts, except for the expected finding that post-motor bursts are more frequent and tend to be of greater amplitude (yielding the observation of a so-called beta rebound, on average across trials).

      Overall, and despite substantial methodological explorations and the description of two modes of propagation, the study falls short of advancing our understanding of the functional role of movement related beta bursts.

      For these reasons, the expected impact of the study on the field may be limited. The data is also relatively limited (simple button presses), in terms of behavioral features that could be related to the neurophysiological observations. One missed opportunity to explain the functional role of the distinct propagation patterns reported would have been, for instance, to measure the cortical "destination" of their respective trajectories.

      In response to this comment, we would like to highlight two important points.

      First, our work constitutes the first non-invasive human confirmation of invasive work in animals (Balasubramanian et al., 2020; Best et al., 2016; Roberts et al., 2019; Rubino et al., 2006; Rule et al., 2018; Takahashi et al., 2011, 2015) and patients (Takahashi et al., 2011). Thus, these results bridge between recordings limited to the size of multielectrode arrays (roughly 0.16 cm2; Balasubramanian et al., 2020; Best et al., 2016; Rubino et al., 2006; Takahashi et al., 2011, 2015) and human EEG recordings spanning large areas of the cortex and several functionally distinct regions (Alexander et al., 2016; Stolk et al., 2019). The ability to access these neural signatures non-invasively is important for cross-species comparison. It further enables us to provide an in-depth analysis of the spatiotemporal diversity of human MEG signals and a detailed characterisation of the two propagation directions, which significantly extends previous reports. We note that their functional role also remains undetermined in these animal studies, but being able to identify these signals now in humans can provide a stepping stone for identifying their role.

      Second, and related, the reviewers are correct that we did not observe distinct propagation directions between pre- and post-movement bursts, nor a relationship with reaction time. However, such a null result would be relevant, in our view, towards understanding what the functional relevance of these signals, if any, might be. Recent work in macaques indicates that the spatiotemporal patterns of high-gamma activity carry kinematic information about the upcoming movement (Liang et al 2023). The functional role of beta may therefore be more complex and not relate to reaction times or kinematics in a straightforward manner. We believe this is a relevant observation, and in keeping with the continued efforts to identify how sensorimotor beta relates to behaviour. It is increasingly clear that spatiotemporal diversity in animal recordings and human E/MEG and intracranial recordings can constitute a substantial proportion of the measured dynamics. As such, our report is relevant in narrowing down what these signals may reflect.

      Together, we think that our work provides new insights into the multidimensional and propagating features of burst activity. This is important for the entire electrophysiology community, as it transforms how we commonly analyse and interpret these important brain signals. We anticipate that our work will guide and inspire future work on the mechanistic underpinnings of these dominant neural signals. We are confident that our article has the scope to reach out to the diverse readership of eLife.

      Reviewer #2 (Public Review):

      The authors devised novel and interesting experiments using high precision human MEG to demonstrate the propagation of beta oscillation events along two axes in the brain. Using careful analysis, they show different properties of beta events pre- and post-movement, including changes in amplitude. Due to beta's prominent role in motor system dynamics, these changes are therefore linked to behavior and offer insights into the mechanisms leading to movement. The linking of wave-like phenomena and transient dynamics in the brain offers new insight into two paradigms about neural dynamics, offering new ways to think about each phenomenon on its own.

      Although there is a substantial, and recent, body of literature supporting the conclusions that beta and other neural oscillations are transient, care must be taken when analyzing the data and the resulting conclusions about beta properties in both time and space. For example, modifying the threshold at which beta events are detected could alter their reported properties and expression in space and time. The authors should therefore perform parameter sweeps on e.g. the thresholds for detection of oscillation bursts to determine whether their conclusions on beta properties and propagation hold. If this additional analysis does not change their story, it would lend confidence in the results/conclusions.

      We thank the reviewing team for this comment. As suggested, we evaluated the effect of different burst thresholds on the burst parameters.

      The threshold in the main analysis was determined empirically from the data, as in previous work (Little et al., 2019). Specifically, trial-wise power was correlated with the burst probability across a range of different threshold values (from median to median plus seven standard deviations (std), in steps of 0.25, see Figure 6-figure supplement 1). The threshold value that retained the highest correlation between trial-wise power and burst probability was used to binarize the data.
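
      As an illustration only (this is not the study's MATLAB code), the threshold selection described above can be sketched in Python with NumPy. The function name, the use of the squared amplitude envelope as "power", the use of a global median and standard deviation, and the synthetic data are all assumptions made for this sketch.

```python
import numpy as np


def select_burst_threshold(amplitude, max_std=7.0, step=0.25):
    """Pick the burst threshold that best links trial power to burst probability.

    amplitude: array of shape (n_trials, n_times) holding a beta amplitude
    envelope (hypothetical input; the paper's preprocessing is not reproduced).
    """
    med = np.median(amplitude)
    sd = np.std(amplitude)
    # Candidate thresholds: median + k * std for k = 0, 0.25, ..., max_std.
    factors = np.arange(0.0, max_std + step / 2, step)
    # Assumption: trial-wise power is the mean squared amplitude per trial.
    trial_power = np.mean(amplitude ** 2, axis=1)

    corrs = np.full(factors.shape, np.nan)
    for i, k in enumerate(factors):
        above = amplitude > (med + k * sd)   # binarised data
        burst_prob = above.mean(axis=1)      # fraction of samples in a burst
        if burst_prob.std() > 0:             # correlation undefined otherwise
            corrs[i] = np.corrcoef(trial_power, burst_prob)[0, 1]

    best = int(np.nanargmax(corrs))
    return factors[best], med + factors[best] * sd, corrs


# Synthetic example: 50 trials whose overall amplitude varies from trial to trial.
rng = np.random.default_rng(0)
amplitude = np.abs(rng.normal(size=(50, 400))) * rng.uniform(0.5, 2.0, size=(50, 1))
factor, threshold, corrs = select_burst_threshold(amplitude)
binary = amplitude > threshold  # samples labelled as belonging to a burst
```

      The sketch returns the whole correlation curve as well as the selected threshold, so the shape of the curve can be inspected in the same way as in Figure 6-figure supplement 1.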

      We repeated our original analysis using four additional thresholds, i.e., original threshold - 0.5 std, -0.25 std, +0.25 std, +0.5 std. As one would expect, burst threshold is negatively related to the number of bursts (i.e., higher thresholds yield fewer bursts, Figure R4a [top]), and positively related to burst amplitude (i.e., higher thresholds yield higher burst amplitudes, Figure R4a [bottom]).

      Similarly, the temporal duration of bursts and apparent spatial width are modulated by the burst threshold: lowering the threshold leads to longer temporal duration and larger apparent spatial width, while increasing the threshold leads to shorter temporal duration and smaller apparent spatial width (Figure R4b). Note that for the temporal and spectral burst characteristics, the difference to the original threshold can be numerically zero, i.e., changing the burst threshold did not lead to changes exceeding the temporal and spectral resolution of the applied time-frequency transformation (i.e., 200ms and 1Hz respectively).

      Importantly, across these threshold values, the propagation direction and propagation speed remain comparable.
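
      A threshold sweep of this kind can be sketched as follows. This is a minimal Python illustration on a one-dimensional synthetic signal, not the 3D burst analysis used in the study; the helper names, the base threshold of median plus two standard deviations, and the signal itself are assumptions.

```python
import numpy as np


def find_bursts(signal, threshold):
    """Return (start, stop, peak) for each contiguous run above threshold."""
    above = signal > threshold
    padded = np.concatenate(([False], above, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)    # first sample of each run
    stops = np.flatnonzero(edges == -1)    # one past the last sample of each run
    return [(s, e, signal[s:e].max()) for s, e in zip(starts, stops)]


def sweep_thresholds(signal, base_threshold, offsets_std):
    """Re-detect bursts at base_threshold + offset * std for each offset."""
    sd = np.std(signal)
    results = []
    for off in offsets_std:
        thr = base_threshold + off * sd
        bursts = find_bursts(signal, thr)
        peaks = [p for _, _, p in bursts]
        results.append({
            "offset_std": off,
            "threshold": thr,
            "n_bursts": len(bursts),
            "mean_peak": float(np.mean(peaks)) if peaks else float("nan"),
            "mean_duration": float(np.mean([e - s for s, e, _ in bursts])) if bursts else float("nan"),
            "fraction_above": float(np.mean(signal > thr)),
        })
    return results


# Synthetic amplitude envelope: a smoothed, rectified noise trace.
rng = np.random.default_rng(1)
signal = np.abs(np.convolve(rng.normal(size=5000), np.ones(25) / 25, mode="same"))
base = np.median(signal) + 2.0 * np.std(signal)   # hypothetical base threshold
results = sweep_thresholds(signal, base, [-0.5, -0.25, 0.0, 0.25, 0.5])
```

      Comparing `n_bursts`, `mean_peak` and `mean_duration` across the five entries of `results` gives the kind of sensitivity check the response describes; only the fraction of samples above threshold is guaranteed to shrink as the threshold rises, so the other quantities have to be read off the data.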

      We now include this result as Figure 6-figure supplement 2 and refer to this analysis in the manuscript (page 28 line 717).

      “To explore the robustness of the results analyses were repeated using a range of thresholds (Figure 6-figure supplement 2).”

      Determining the generators of beta events at different locations is a tricky issue. The authors mentioned a single generator that is responsible for propagating beta along the two axes described. However, it is not clear through what mechanism the beta events could travel along the neural substrate without additional local generators along the way. Previous work on beta events examined how a sequence of synaptic inputs to supra and infragranular layers would contribute to a typical beta event waveform. Although it is possible other mechanisms exist, how might this work as the beta events propagate through space? Some further explanation/investigation on these issues is therefore warranted.

      Based on this and other comments (i.e., comments 7 and 8) we re-evaluated the use of the term ‘generator’ in this manuscript.

      While the term generator can be used across scales, from micro- to macroscale, for the purpose of the present paper, we believe one should differentiate at least two concepts: a) generator of beta bursts, and b) generator of travelling waves.

      We realised that in the previous version of the manuscript the term ‘generator’ was at times used without context. We removed the term where no longer necessary.

      Further, the previous version of the manuscript discussed putative generators of travelling waves (page 19f.) but not generators of beta bursts. We now address this as follows:

      “Studies using biophysical modelling have proposed that beta bursts are generated by a broad infragranular excitatory synaptic drive temporally aligned with a strong supragranular synaptic drive (Law et al., 2022; Neymotin et al., 2020; Sherman et al., 2016; Shin et al., 2017) whereby layer specific inhibition acts to stabilise beta bursts in the temporal domain (West et al., 2023). The supragranular drive is thought to originate in the thalamus (E. G. Jones, 1998, 2001; Mo & Sherman, 2019; Seedat et al., 2020), indicating thalamocortical mechanisms (page 22f).”

      Once the mechanisms have been better understood, a question remains of how much the results generalize to other oscillation frequencies and other brain areas. On the first question of other oscillation frequencies, the authors could easily test whether nearby frequency bands (alpha and low gamma) have similar properties. This would help to determine whether the observations/conclusions are unique to beta, or more generally applicable to transient bursts/waves in the brain. On the second issue of applicability to other brain areas, the authors could relate their work to transient bursts and waves recorded using ECoG and/or iEEG. Some recent work on traveling waves at the brain-wide level would be relevant for such comparisons.

      We appreciate the enthusiasm and the suggestions. To comment on the frequency specificity of the observed effects we conducted the same analysis focusing on the gamma frequency range (60-90 Hz). For computational reasons, we limited this analysis to one subject. Figure R1 shows the polar probability histogram for the beta frequency range (left) and the gamma frequency range (right). In contrast to the beta frequency range, no dominant directions were observed for the gamma range and von Mises functions did not converge. These preliminary results suggest some frequency specificity of the spatiotemporal pattern in sensorimotor beta activity. We believe this paves the way for future analysis mapping propagation direction across frequency and space.
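
      For readers unfamiliar with the representation, a polar probability histogram of propagation directions can be computed as in the Python sketch below. The function name, the bin count, and the synthetic directions (two opposite clusters drawn from von Mises distributions) are illustrative assumptions; the von Mises fitting mentioned above is not reproduced.

```python
import numpy as np


def polar_probability_histogram(directions_rad, n_bins=36):
    """Bin propagation directions (radians) into probabilities per angular bin."""
    wrapped = np.mod(directions_rad, 2 * np.pi)       # map angles to [0, 2*pi)
    counts, edges = np.histogram(wrapped, bins=n_bins, range=(0.0, 2 * np.pi))
    probs = counts / counts.sum()                     # probabilities sum to 1
    centers = 0.5 * (edges[:-1] + edges[1:])          # bin centres for plotting
    return centers, probs


# Synthetic directions: two clusters half a turn apart, i.e. one propagation axis.
rng = np.random.default_rng(2)
mode = 0.5                                            # hypothetical preferred direction
directions = np.concatenate([
    rng.vonmises(mode, 8.0, size=500),
    rng.vonmises(mode + np.pi, 8.0, size=500),
])
centers, probs = polar_probability_histogram(directions)
peak_direction = centers[np.argmax(probs)]
```

      A flat `probs` vector, as reported for the gamma range, would indicate no dominant direction, whereas clear peaks correspond to preferred propagation directions.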

      Here we did not investigate the spatial specificity of the effects, as the beta frequency range is dominant in sensorimotor areas. Investigating beta bursts in other cortical areas would have likely resulted in very few bursts. We discuss our results across spatial scales in the section: Distinct anatomical propagation axes of sensorimotor beta activity. However, please note that most of the previous literature operates on a different spatial scale (roughly 4mm; Balasubramanian et al., 2020; Best et al., 2016; Rubino et al., 2006; Rule et al., 2018; Takahashi et al., 2011, 2015) and different species (e.g., non-human primates). Non-invasive recordings in humans capture temporospatial patterns of a very different scale, i.e., often across the whole cortex (Alexander et al., 2016; Roberts et al., 2019). Comparing spatiotemporal patterns across different spatial scales is inherently difficult. Work investigating different spatial scales simultaneously, such as Sreekumar et al. 2020, is required to fully unpack the relationship between mesoscopic and macroscopic spatiotemporal patterns.

      Figure R1: Spatiotemporal organisation for the beta (β, 13-30 Hz) and gamma (γ, 60-90 Hz) frequency range for one exemplar subject. Same as Figure 4a, but for one exemplar subject.

      If the source code could be provided on github along with documentation and a standard "notebook" on use other researchers would benefit greatly.

      All analyses are performed using freely available tools in MATLAB. The code carrying out the analysis in this paper can be found here: [link provided upon acceptance]. The 3D burst analyses can be very computationally intensive even on a modern computer system. The analyses in this paper were computed on a MacBook Pro with a 2.6 GHz 6-Core Intel Core i7 and 32 GB of RAM. Details on the installation and setup of the dependencies can be found in the README.md file in the main study repository.

      This information has been added to the paper in the methods section on page 35.

    1. Author Response

      Reviewer #2 (Public Review):

      Understanding the molecular mechanism of obesity-associated OA is in high clinical demand. Overall, the current study is well-designed and illustrates that down-regulated GAS6 impairs synovial macrophage efferocytosis and promotes obesity-associated osteoarthritis. Based on the patient samples, the data indicate that synovial tissues are highly hyperplastic in obese OA patients and infiltrated with more polarized M1 macrophages than in non-obese OA patients. Further, the authors proved that obesity promotes synovial M1 macrophage accumulation and that GAS6 was inhibited in synovitis during OA development in mouse models. The sample size, data collection, and quality of the IHC and immunofluorescent histological sections are outstanding. The results were well presented with appropriate interpretation. But the following major questions should be addressed.

      Major:

      1) Animal model: Ten-week-old animals received DMM surgery and were fed a standard diet or HFD for 4 or 8 weeks prior to specimen harvest. Wang J and other studies have shown that, when male ApoE(-/-) and C57BL/6J wild-type (WT) mice were fed a high-fat diet for 12 or 24 weeks, the ApoE(-/-) mice gained less body weight and had less fat mass, lower triglyceride levels, better insulin sensitivity and lower levels of inflammatory markers in skeletal muscle than WT mice (Wang J, et al. Atherosclerosis. 2012 Aug;223(2):342-9. PMID: 22770993; Hofmann SM, et al. Diabetes. 2008 Jan;57(1):5-12. PMID: 17914034; Kypreos KE et al. J Biomed Res. 2017 Nov 1;32(3):183-90. PMID: 29770778). Thus, it is very important to provide the data on the final body weight gained in your groups and to provide relevant background on the chosen animal model in the introduction or discussion. Please explain why the ApoE-/- mouse model was chosen and how this animal model is clinically relevant. Is high-fat-diet-induced obese OA available in C57BL/6 WT mice?

      Thank you for your valuable comment. We have added the body weight change data for each group of mice in Revised Figure 2-figure supplement 3. We also provided a relative background of the animal model in paragraph 2 of the Discussion section, which reads, “ApoE plays an important role in maintaining the normal levels of cholesterol and triglycerides in serum by transporting lipids in the blood. Mice lacking ApoE function develop hypercholesterolemia, increased very low-density lipoprotein (VLDL) and decreased high-density lipoprotein (HDL), exhibiting chronic inflammation in vascular disease and nonalcoholic steatohepatitis.”.

      Epidemiological studies suggest that obesity is an independent risk factor for OA pathological progression. Gierman et al. found that increased plasma cholesterol levels play a vital role in the development of OA [1,2]. ApoE-deficient mice showed naturally high levels of LDL cholesterol independent of gender and age, which could be further increased by a cholesterol-rich diet [3,4]. Moreover, recent studies found that ApoE-/- mice fed an HFD gained more body weight than those fed a standard chow diet [5–7]. We have re-analyzed the body weight statistics and found that ApoE-/- mice fed an HFD (19.81±1.33 g) gained more body weight than the controls (16.89±0.75 g). These studies indicate that feeding an HFD to ApoE-/- mice for a short period could accelerate the increase in LDL cholesterol levels and cause more body weight gain. ApoE-/- mice may be partially clinically relevant to pathological progression in obese osteoarthritis patients with elevated plasma LDL cholesterol levels. As Reviewer #2 mentioned, HFD-induced obesity is available in C57BL/6 WT mice according to our weight gain data. However, the effect of obesity on OA progression in these two kinds of animals deserves further study.

      References:

      1. Gierman LM, Kühnast S, Koudijs A, et al. Osteoarthritis development is induced by increased dietary cholesterol and can be inhibited by atorvastatin in APOE*3Leiden.CETP mice—a translational model for atherosclerosis. Ann Rheum Dis. 2014;73(5):921-927.

      2. Gierman LM, van der Ham F, Koudijs A, et al. Metabolic stress-induced inflammation plays a major role in the development of osteoarthritis in mice. Arthritis Rheum. 2012;64(4):1172-1181.

      3. Wu D, Sharan C, Yang H, et al. Apolipoprotein E-deficient lipoproteins induce foam cell formation by downregulation of lysosomal hydrolases in macrophages. J Lipid Res. 2007;48(12):2571-2578.

      4. Naura AS, Hans CP, Zerfaoui M, et al. induces lung remodeling in ApoE-deficient mice: an association with an increase in circulatory and lung inflammatory factors. Lab Invest. 2009;89(11):1243-1251.

      5. Tung MC, Lan YW, Li HH, et al. Kefir peptides alleviate high-fat diet-induced atherosclerosis by attenuating macrophage accumulation and oxidative stress in ApoE knockout mice. Sci Rep. 2020;10(1):8802.

      6. Bao M hua, Luo H qing, Chen L hua, et al. Impact of high fat diet on long non-coding RNAs and messenger RNAs expression in the aortas of ApoE(−/−) mice. Sci Rep. 2016;6(1):34161.

      7. Cao X, Guo Y, Wang Y, et al. Effects of high-fat diet and Apoe deficiency on retinal structure and function in mice. Sci Rep. 2020;10(1):18601.

      2) Control group: The DMM surgery was performed on the right leg, and the contralateral knee joint should be used as a baseline to show the level of M1 macrophage infiltration under the obese microenvironment.

      Thank you for this insightful comment. We used the right lower limb as the control group in our experiment mainly because we considered the impact of right knee surgery on the left lower limb. A book published in 2014 described a series of methods for inducing osteoarthritis in mice; the authors noted that sham-operated left knee joints would develop OA-like symptoms after right knee joints received DMM. Thus, Lorenz et al. strongly recommend using a separate control group for sham surgeries.

      References:

      1. Lorenz, J., Grässel, S. (2014). Experimental Osteoarthritis Models in Mice. In: Singh, S., Coppola, V. (eds) Mouse Genetics. Methods in Molecular Biology, vol 1194. Humana Press, New York, NY.
    1. Author Response

      Reviewer #1 (Public Review):

      The goal of this study was to investigate the mechanisms that lead to the release of photosynthetically fixed carbon from symbiotic dinoflagellate algae to their coral host. The experimental approach involved culturing free-living Breviolum sp. dinoflagellates under "Normal" and "Low pH" conditions (respective target pH of 7.8 and 5.50) and measuring the following parameters: (Fig.1) cell growth rate over ~28 days, photosynthetic activity, glucose and galactose secretion at day 1; (Fig. 2) Cell clustering, external morphology (using SEM), and internal morphology (using TEM) after 3 weeks; (Fig. 3) Transcriptomic analyses at days 0 and 1; and (Fig. 4) glucose and galactose concentration in Normal culturing medium after 24h incubation with a putative cellulase inhibitor (PSG).

      The paper reports decreased growth at Low pH coupled with decreased photosynthetic rates and increased glucose and galactose release in 1-day Breviolum sp. cultures. At this same time point, genes related to cellulase were upregulated, and after 3 weeks morphological changes on the cell wall were reported. The addition of the cellulase inhibitor PSG to cells in pH 7.8 media decreased the release of glucose and galactose.

      The paper concludes that acidic conditions mimicking those reported for the coral symbiosome -the intracellular organelle that hosts the symbiotic algae- upregulate algal cellulases, which in turn degrade the algal cell wall releasing glucose and galactose that can be used as a source of food by the coral host. However, there are some methodological issues that hamper the interpretation of results and conclusions.

      We appreciate your helpful comments and apologize for the confusion caused by insufficient descriptions in the previous manuscript. In the revised manuscript we clarify what we originally intended to demonstrate, including the following:

      (1) Most analyses including SEM and TEM were done at day 0 and 1, except for a few, i.e. growth rate over 28 days and cell clumping assay done 3 weeks after the inoculation, which is summarized as a schematic panel and clarified in the revised manuscript.

      (2) The inhibitor assay for secreted cellulases was done at pH 5.5.

      (3) We do not intend to suggest that low-pH medium mimics symbiosomes, as these organelles are far more complex than simple culture media, and how symbiosomes are maintained and what their interior environment is like are not fully understood. Based on previous studies, they are presumably characterized by low pH, high CO2 and host-derived nutrients. Among these, we focus on low pH, a stressor that dinoflagellates experience not only in symbiosomes but also in natural environments, e.g. the animal gut.

      In this study, we clarified how algae respond to low pH as an environmental stressor, which can also provide insights into how they interact with the host inside the guts as well as symbiosomes.

      Reviewer #2 (Public Review):

      Ishii and colleagues investigated the process of monosaccharide release from algae in low-pH environmental conditions, mimicking the acidic lysosomal-like intracellular compartment where the algae reside symbiotically and transfer nutrients to their hosts, namely corals and other animals. Upon exposure of cultured algae to low pH, subsequent physiological changes as well as the increased presence of glucose and galactose were measured in the surrounding media. Concurrently, photosynthetic activity was decreased, and further experiments employing the photosynthetic inhibitor DCMU to cultures also replicated the increased monosaccharide release. Transcriptomic comparison of algae in low pH to controls showed differential expression in glycolytic pathways and, interestingly, a strong upregulation of signal-peptide-containing isoforms of cellulases. Finally, the elegant use of a cellulase inhibitor on the cultured algae revealed a decrease in monosaccharides in the media. This led the authors to propose a pathway of sugar release in which acidic conditions trigger a cellulase-driven cascade of cell wall degradation in the algae and their consequent release of monosaccharides. These results have interesting implications on the molecular mechanisms of coral-algae symbiosis, contributing to the understanding of how these important symbioses function on the cellular level.

      Overall the conclusions of this manuscript are supported by the data presented, but clarification and elaboration are needed to fully justify the proposed mechanisms and better situate the results in a broader context of the field.

      We thank the reviewer for the positive comments. In the revised manuscript we show that the results could be better explained with the proposed mechanisms in a broader context.

    1. Author Response

      Reviewer #2 (Public Review):

      1) Mechanistic details of how FCA regulates FLC have been extensively studied, and both transcriptional and co-transcriptional regulations occur. I understand that FCA affects the 3'end processing of antisense COOLAIR RNAs, which regulate FLC. FCA also physically interacts with COOLAIR RNAs and other proteins, including chromatin-modifying complexes, which establish epigenetic repression of FLC regardless of vernalisation. In addition, FCA appears to function to resolve R-loop at the 3' end FLC, and FLC preferentially interacts with m6A-modified COOLAIR by forming liquid condensates. FCA is also alternatively spliced in an autoregulatory manner, and fca-1 mutant was reported to be a null allele as fca-1 cannot produce the functional form of FCA transcripts (r-form).

      However, I could not find any information on the fca-3 allele, which was reported to exhibit a weaker phenotype in terms of flowering time (Koornneef et al., 1991). In this manuscript, the authors showed that the level of FLC expression is lower than fca-1 and higher than Ler WT, but I could not find any other relevant information on the nature of the fca-3 allele. Given the known details on the function of FCA, the authors should explain how fca-3 shows an "intermediate" phenotype, which is highly relevant to the argument for an "analog" mode of regulation in fca-3. Therefore, the nature of the fca-3 mutant should be described in detail.

      We thank the reviewers for pointing out this omission. We have added much more information on the genotypes in the methods of the manuscript. We emphasise, however, that the rationale for selecting fca-3 as an intermediate mutant was empirical: namely, it generates an intermediate level of FLC expression (Fig. 1C and Fig. 1S1).

      2) The authors used a transgene (FLC-venus) in which an FLC fragment from ColFRI was used. Both fca-1 and fca-3 are in the Ler background, where FLC sequence variations are known. I understand that the authors introgressed the transgene into the Ler background to avoid the transgene effect, but it is not known whether fca-1 or fca-3 mutations have the same impact on Col-FLC.

      We tested the expression of both endogenous (Ler) and FLC-Venus (Col-FLC) copies in these mutants by qPCR and found similar results (Fig. 1S1C,D), indicating that the fca-1 and fca-3 mutations have similar effects in both cases.

      3) Fig. 3A: I understand that Fig 3A is the qRT-PCR data using whole seedlings, and the gradual reduction of FLC from 7 DAG to 21 DAG was used to test the "analog" vs. "digital" mode of gene regulation in fca-1 and fca-3. I am not sure whether this is biologically relevant.

      Indeed, Ler is the only line that has transitioned to flowering during the experiment, with both fca lines being late flowering mutants. We totally agree that for Ler, later timepoints may be biologically irrelevant. It is used in this case as a negative control for the imaging, since FLC in Ler was already mostly OFF from the first timepoint and no biological conclusions are drawn from the later times. We have added a comment to this effect in the results section, also clarifying in the discussion that our focus is on the early regulation of FLC. Therefore, by looking at the young seedling in wildtype Ler, as we and others have previously, we are already looking too late to capture the switching of FLC to OFF. However, we expect that this combination of analog and digital regulation will be highly relevant to FLC regulation in wild-type plants in different accessions, partly leading to the differences in autumn FLC levels that were shown to be so important in the wild (Hepworth et al. 2020).

      3-a) The authors wrote that "This experiment revealed a decreasing trend in fca-3 and Ler (Fig. 3A)". But, I do also see a "decreasing trend" in fca-1 as well (although I understand that they may not be statistically significant). I also noticed that the level of FLC in fca-1 at 7 day has a greater variation. Is there any explanation?

      The level of FLC in fca-1 at 7 days is indeed more variable in these experiments. However, in a new second experiment, this is not the case (Fig. 3S2). In addition, a similar effect has not been observed in the ColFRI genotype (Fig. S9F of Antoniou-Kourounioti et al. 2018). Therefore, we believe this greater variation in one data set may simply be due to random fluctuations.

      For the decreasing trend in fca-1 in Fig. 3A, as the reviewer says, this is not significant. However, in the second experiment, we again see a decrease, which is now slow but significant. The decrease could be due to a subset of fca-1 ON cells switching off (in tissue that we have not imaged) and we comment on this slow decrease in the text.

      3-b) The decreasing trend observed in Ler (although the expression of FLC is already relatively low in Ler) may be the basis for the biological relevance. But Fig. 3D shows that the FLC-venus intensity in Ler root is not "decreasing". The authors interpreted that "root tip cells in Ler could switch off early, while ON cells still remain at the whole plant level that continue to switch off, thereby explaining the decrease in the qPCR experiment." Does this mean that the root tip system with FLC-venus cannot recapitulate other parts of plants (especially at the shoot tip where FLC function is more relevant)?

      The authors utilize the root system with transgenes in mutant backgrounds to observe and model the gene repression (transgene repression, to be exact). If the root tip cells behave differently from other parts of plants, how could the authors use data obtained from the root tip system?

      We now show that FLC-Venus in Ler, fca-3 and fca-1 has similar expression patterns in young leaves and in roots, thus validating the root system as an appropriate one to study the switching dynamics; see response to Essential comment 3. Nevertheless, in Fig. 3A, we show that FLC expression declines even in Ler. However, the levels here are low, so if it is indeed a subfraction of late-switching cells that are responsible, these cells cannot form a large proportion of the plant. We now make this clear in the text.

      4) I do see both fca-1 and fca-3 can express FCA at a comparable level (Fig. 3B); thus, I guess that the authors are measuring total FCA transcripts and that fca-3 may result in different levels of "functional form" of FCA. But this is not clearly discussed.

      We have now added yellow boxes in Fig. 2S3 to show additional examples of short files of ON cells in fca-3 and fca-4. To further improve the interpretation of this image (and all others in the manuscript) we have changed the presentation of the imaging using a different colourmap to enhance clarity.

      5) Quantification based on image intensity needs to be carefully controlled. Ideally, a threshold to call "ON" or "OFF" state should be based on the comparison to internal control and it is not clear to me how the authors determined which cells are ON or OFF based on image intensity (especially in fca-3).

      For the wild-type and fca-1 situations there is no switching in the model, and hence no dynamical changes in the FLC protein levels. As the FLC levels in the ON or OFF states are simply fit to the data using log-normal distributions, this would simply be a fitting exercise for fca-1 and Ler, and little would be learnt. Hence, we have not pursued this line of analysis.

      6) In many parts, I had to guess how the experiments were performed with what kind of tissues/samples. The methods section can benefit from a more thorough description.

      We have now gone through and added the missing information.

      Related to Public review #2. What is the phenotype (flowering time) of FLC-venus in fca-1 and fca-3? In addition, how many independent lines were used? Do they behave similarly?

      It was observed that with the additional FLC gene (in the form of the FLC-Venus), flowering is delayed as expected. However, this was not quantified in this work. Instead, we validated that the expression of the transgene was equivalent to endogenous expression between genotypes, as shown in Fig. 1S1, supporting that this is an appropriate readout for FLC expression. One line for each genotype was selected and used in this work. In addition, we also now use fca-4, which has similar expression to fca-3, and where FLC-Venus also behaves similarly to the fca-3 case (Fig. 1S1, 2S3).

      Reviewer #3 (Public Review):

      1) The way the authors define ON and OFF cells sounds a bit arbitrary to me and, in my understanding, can affect a lot the outcomes and derived conclusions. The authors define ON cells to those cells having more than one transcript, or when they are above the value of 0.5 of the Venus intensity measure - what would it happen if the thresholds are slightly above these levels? And why such thresholds should be the same for the studied lines Ler, fca-3 and fca-1? By looking at the distributions of mRNAs and Venus intensities in Ler and fca-3 plants, one could argue that all cells are in an OFF, 'silent' state, and that what is measured is some 'leakage', noise or simply cell heterogeneity in the expression levels. If there is a digital regulation, I would expect to see this bimodality more clearly at some point, as it was captured in Berry et al (2015) - perhaps cells in fca-1 show at a certain level of bimodality? When seeing bimodality, one could separate ON and OFF states by unmixing gaussians, or something in these lines that makes the definition less arbitrary and more robust.

      As explained in Essential comment 5, we have removed arbitrary thresholding from the manuscript and only used absolute thresholds from smFISH (now changed to >3, and shown that our results are robust to varying these thresholds, Fig. 2S2). If all cells are in the OFF state and fca-3 just has higher noise/heterogeneity, then this does not explain the reduction in expression over time. Nor can such heterogeneity explain the short files of ON cells and longer files of OFF cells in Fig. 2S3: the cells should just be a random mix of varying FLC levels. Our results are much more compatible with switching into a heritable silenced state. Finally, with bimodality, this is difficult to see as clearly as before due to the wide levels of expression in fca-3, but we believe it is present: a well-defined OFF state together with a broad ON state. This broadness makes extracting the ON cells quite difficult as a completely rigorous unmixing of the two states is just not possible.
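      The robustness check described above (an absolute smFISH threshold of >3 transcripts, varied to confirm that the result does not hinge on the exact cut-off) can be sketched as follows. This is only an illustrative sketch: the function names and the per-cell counts are hypothetical and are not taken from the manuscript or its analysis code.

```python
from typing import Dict, Iterable, Sequence


def on_fraction(counts: Iterable[int], threshold: int) -> float:
    """Fraction of cells whose transcript count is strictly above `threshold`."""
    cells = list(counts)
    if not cells:
        raise ValueError("no cells supplied")
    return sum(c > threshold for c in cells) / len(cells)


def threshold_sweep(counts: Sequence[int], thresholds: Iterable[int]) -> Dict[int, float]:
    """ON fraction for each candidate threshold, to check robustness."""
    return {t: on_fraction(counts, t) for t in thresholds}


# Hypothetical per-cell transcript counts at two time points (made-up numbers).
early = [0, 0, 1, 12, 15, 0, 9, 2, 20, 0]
late = [0, 0, 0, 1, 14, 0, 0, 2, 0, 0]

for label, cells in (("early", early), ("late", late)):
    # Sweep thresholds around the >3 cut-off mentioned in the response.
    print(label, threshold_sweep(cells, (2, 3, 4)))
```

      In this toy data the ON fraction is the same for thresholds of 2, 3 and 4 at each time point, which is the kind of insensitivity a threshold-robustness check looks for; real data would of course need to be examined case by case.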

      2) The authors use means in all their plots for histograms and data, and perform tests that rely on these means. However, many of these plots are skewed right distributions, meaning that mean is not a good measure of center. I think using median would be more appropriate, and statistical tests should be rather done on medians instead of means. If tests using medians were performed, I believe that some of the pointed results will be less significant, and this will affect the conclusions of this work.

      Highly expressing FLC lines and mutants, such as ColFRI and fca-9, often used for vernalization studies, are late flowering, but do eventually flower even with no decrease in FLC levels (and so no switching). This is not an artifact of using roots versus shoots, and presumably arises from there being multiple inputs into the flowering decision which can allow the FLC-mediated flowering inhibition to eventually be overcome.

      3) Some data might require more repeats, together with its quantification. For instance, the expression levels for fca-1 in Fig 2E and Fig 3D at 7 days after sowing look qualitatively different to me - not just the mean looks different, but also the distribution; fca-1 in Fig 3D looks more monomodal, while in Fig 2E it looks it shows more a bimodal distribution. Having these two different behaviours in these two repeats indicates that, more ideally, three repeats might be needed, together with their quantification. Fig. 2C would also need some repeats. In Fig 1S1 C and D, it would be good to clarify in which cases there are 2 or more repeats -3 repeats might be needed for those cases in Fig 1S1 C-D that have large error bars.

      The data in Figs. 2C and 2E are both based on two independent experiments, with the results combined. The data in Fig. 3D are almost entirely based on three independent experiments. We have now stated this in the legend. The Venus imaging was performed on separate microscopes for Fig. 2 and Fig. 3, and this possibly accounts for some of the observed differences. However, we do not think that the data in Fig 2E for fca-1 support a bimodal distribution: the slight peak at higher levels is, we believe, much more likely to be a statistical fluctuation. For Fig. 1S1 C and D, we now clarify in the legend that n=2 biological replicates for fca-3 and n=3 for others.

      Also, when doing the time courses, I find it would be very beneficial to capture an earlier time point for all the lines, to see whether it is easier to capture the digital nature of the regulation. Note that the authors have already pointed that 7 days after sowing might be too late for Ler line to capture the switch.

      We agree that capturing earlier time points for Ler in particular is interesting and important. However, we have found that this requires specialist imaging in the embryo and we feel that this is really beyond the scope of this manuscript and will instead form the basis of a future publication.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use what is potentially a novel method for bootstrapping sequence data to evaluate the extent to which SARS-CoV-2 transmissions occurred between regions of the world, between France and other European countries, and between some distinct regions within France. Data from the first two waves of SARS-CoV-2 in Europe were considered, from 2020 into January 2021. The paper provides more detail about the specific spread of the virus around Europe, specifically within France, than other work in this area of which I am aware.

      First of all, we would like to thank reviewer #1 for their evaluation and their various comments which, in our opinion, have allowed us to considerably improve the manuscript.

      An interesting facet of the methodology used is the downsampling of sequence data, generating multiple bootstraps each of around 500-1000 sequences and conducting analysis on each one. This has the strength of sampling, in total, a large number of sequences, while reducing the overall computational cost of analysis on a database that contains in total several hundred thousand sequences. A question I had about the results concerns the extent of downsampling versus the rate of viral migration: If between-country movements are rapid, a reduced sample could be misleading, for example characterising a transmission path from A to B to C as being from A to C by virtue of missing data. I acknowledge that this would be a problem with any phylogeographic analysis relying on limited data. However, in this case, how does the rate of migration between locations compare to the length of time between samples in the reduced trees? Along these lines, I was unclear to what extent the reported proportions of intra- versus inter-regional transmissions (e.g. line 223) would be vulnerable to sampling effects.

      This question is indeed a very important one. The between-country movement rate can be high, but the contagious period for a SARS-CoV-2-infected individual is short (a bit less than two weeks on average). In our subsamples, the dated trees have a median branch length of around 20 days. To ensure that our subsampling did not introduce errors in estimating the exchange events between locations, we conducted a simulation. Briefly, we generated a tree of 1,000,000 tips with a five-state discrete trait. We then took 100 subsampled trees of 1,000 leaves each, reconstructed the ancestry for the discrete trait and assessed transitions between states. The error rate is less than 3% on average: it comprises the missing data, as you pointed out, and the errors in reconstructing the ancestry for the trait deeper in the tree.

      We think that overall, less than 3% is a satisfying error rate.

      The results of this specific simulation were added to the paper (lines 150-157) and as Figure 2—figure supplement 1.

      A further question around the methodology was the use of an artificially high fixed clock rate in the phylogenetic analysis so as to date the tree in an unbiased way. Although I understood that the stated action led to the required results, given the time available for review I was unable to figure out why this should be so. Is this an artefact of under-sampling, or of approximations made in the phylogenetic inference? Is this a well-known phenomenon in phylogenetic inference?

      We thank reviewer #1, who, like reviewer #2 and the editor, was disturbed by the use of an artificially fast and fixed molecular clock. It was a workaround for a mistake in our code, which has since been fixed. See the answer to point (3) of the editor.

      The value of this kind of research is highlighted in the paper, in that genomic data can be used to assess and guide public health measures (line 64). This work elucidates several facts about the geographical spread of SARS-CoV-2 within France and between European countries. The more clearly these facts can be translated into improved or more considered public health action, through the evaluation of previous policy actions, or through the explication of how future actions could lead to improved outcomes, the more this work will have a profound and ongoing impact.

      This is indeed a very interesting point to emphasize. We are currently discussing with public health specialists at our institution how to assess past public health actions using phylodynamics data in a statistically valid manner.

      Reviewer #2 (Public Review):

      This study represents an important contribution to our understanding of SARS-CoV-2 transmission dynamics in France, Europe and globally during the early pandemic in 2020 and the authors should be congratulated for tackling this important question. Through evaluation of the contributions of intra- and inter-regional transmission at global, continental, and domestic levels, the authors provided compelling, although as of yet correlative and incomplete, evidence towards how international travel restrictions reduced inter-regional transmission while permitting increased transmission intra-regionally. Unfortunately, however this work suffers from a number of serious analytical shortcomings, all of which can be overcome in a major revision and re-analysis.

      We would like to thank reviewer #2 for their evaluation and their various comments. We want to point out that reviewer #2 was contacted for advice on the strategy for the molecular clock, since she performed a study on a similar topic describing SARS-CoV-2 epidemics in Canada during 2020. We strongly believe that all of reviewer #2's comments contributed substantially to improving the quality of this work.

      With this genomic epidemiology analysis, the authors disentangled the relative contributions of different geographic levels to transmission events in France and in Europe in the first two COVID-19 waves of 2020. By partitioning the analysis into three complementary, but distinct, geographic levels, the migration flows in and out of continents, countries in Europe, and regions in France were inferred using maximum likelihood ancestral state reconstruction. The major strengths of this paper were the inclusion of multiple geographic levels, the comparison of different rate symmetries in the ancestral character estimation, and the comprehensive qualitative descriptions of comparisons over time and geographies. However, there were also major weaknesses that need to be addressed and are described in more detail below. They include summing across replicates that were drawn with replacement and were not independent; inadequate justification for excluding underrepresented geographies; the assertion that positive correlation between intra-regional transmission and deaths validates the accuracy of the analysis; considering the framework the authors have chosen for this analysis the analysis would accommodate and benefit strongly from increasing the size of the sequence sets selected for analysis in each replicate; and the sparsity of quantitative (over qualitative or exploratory) comparisons and statistics in the reporting of results. In particular, it would greatly strengthen the paper if the authors could better evaluate the effect of travel restrictions on importations and exportations by testing hypotheses, quantifying changes in the presence of restrictions, or estimating inflection points in importation rates.

      We are grateful for this comprehensive listing of the strengths and weaknesses of our study. The limitations of this study will be addressed specifically under each dedicated remark of the reviewer. We would like to emphasize that all the remarks and limitations reported here by reviewer #2 are in our opinion fully justified. We have therefore added further analyses (study of the Pango lineages, averaging of the subsamples, simulation study to justify the size of the sampling), modified the methodology (in particular concerning the molecular clock) and thoroughly rewritten the “Results” section.

      General comments on the Background: Need to elaborate on how this study fits into the big picture in the first paragraph. Should discuss how phylodynamics contributes to understanding of viral outbreaks, SARS-CoV-2 epidemiology and viral evolution.

      We have added in the “Introduction” section some elements to better understand why phylodynamics is an important field in the epidemiology of SARS-CoV-2 and its evolution.

      The authors should consider a hypothesis driven framework for their analyses, for example considering the geographically central position of France what hypotheses stem from this considering sources of viral importations and destinations of exportations from/to Europe vs other international? Or other a priori expectations.

      We agree with reviewer #2 about this remark. Indeed, given the central position of France, we can hypothesize that it has strongly participated in the dissemination of the virus within Europe. This hypothesis has been included in the "Introduction" section of the revised version (lines 102-105).

      To address the computational limits of phylogenetic reconstruction, 100 replicates of fewer than 1000 sequences each were sampled for each epidemic wave at each level. The inter- and intra-regional transmissions were averaged and then summed across replicates in order to compare the relative roles played by each geography towards transmission. While we see the logic in using the sum across replicates, this is highly likely to bias results, especially since in the methods, this is described as sampling with replacement between replicates (LX). The validity of summing replicates needs to be discussed and are likely most appropriately presented as mean or median. Also, these samples are quite small considering the computational capacity of the maximum likelihood tools being used. We recommend repeating the analysis with a substantially larger number of sequences per sample.

      We thank reviewer #2 for this relevant remark. We initially summed the subsamples, a strategy that may bias the results. In the new version of the manuscript, we averaged the subsamples by region and by week as recommended (and stated in the methods, lines 536-537).
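      The difference between summing and averaging across subsamples can be illustrated with a short sketch. The data layout (one dictionary per subsample, keyed by region and week), the function name and the counts below are all hypothetical and are not taken from the manuscript's pipeline.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (region, week), a hypothetical layout


def average_replicates(replicates: List[Dict[Key, int]]) -> Dict[Key, float]:
    """Mean number of inferred events per (region, week) across subsamples.

    A (region, week) missing from a subsample is counted as zero events.
    """
    keys = set().union(*replicates)
    n = len(replicates)
    return {k: sum(r.get(k, 0) for r in replicates) / n for k in sorted(keys)}


# Three made-up subsamples of inferred transmission events.
r1 = {("France", 12): 4, ("Italy", 12): 1}
r2 = {("France", 12): 6}
r3 = {("France", 12): 5, ("Italy", 12): 2}

averaged = average_replicates([r1, r2, r3])
summed = {k: v * 3 for k, v in averaged.items()}  # what summing would report

print(averaged)
print(summed)
```

      The summed values grow with the number of subsamples drawn, whereas the averaged values do not, which is one reason the mean (or median) across replicates is the easier quantity to interpret.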

      Regarding the size of our subsamples, it made no difference whether we used 1,000, 2,000 or 5,000 genomes in each subsample. To get a more definitive and scientifically sound answer, we performed a simulation assay that has been included in the manuscript and is shown in what is now Figure 2 (and Figure 2—figure supplement 1). These simulations show that our subsampling strategy allows for an accurate estimate of transition rates for a discrete parameter (lines 107-160).

    1. Author Response

      Reviewer #1 (Public Review):

      The paper addresses an interesting question - how genetic changes in Y. pestis have led to phenotypic divergence from Y. pseudotuberculosis - and provides strong evidence that the frameshift mutation in rcsD is involved. Overall, I found the data to be clearly presented, and most of the conclusions well supported by the data. The authors convincingly show that (i) the frameshift mutation in rcsD alters the regulation of biofilm formation, (ii) this effect depends upon expression of a small protein that corresponds to the C-terminal portion of RcsD, and (iii) the frameshift mutation in rcsD prevents loss of the pgm locus. I felt that the discussion/conclusions about what phosphorylates/dephosphorylates RcsB and how this impacts biofilm formation are overstated, as there are no experiments that directly address this question. I also felt that the authors' model for what phosphorylates/dephosphorylates RcsB in Y. pestis should be more clearly articulated, even if it is only presented as speculation. Lastly, the authors propose that full-length RcsD is made in Y. pestis and contributes to phosphorylation of RcsB, but the evidence for this is weak (faint band in Figure 2d). It may be that the N-terminal domain of RcsD is functional. I recommend either softening this conclusion or testing this hypothesis further, e.g., by introducing an in-frame stop codon early in rcsD after the frame-shift.

      Thanks for your comments. We have provided a model and revised the discussion about phosphorylation/dephosphorylation of RcsB and how this impacts biofilm formation (Figure 8 and Supplementary Figure 4). In addition, we have introduced an in-frame stop codon in rcsD before the frameshift and showed that full-length RcsD is only made in wildtype Y. pestis but not in the rcsDpe-stop mutant (Supplementary Figure 1g).

      Reviewer #2 (Public Review):

      Guo et al. have investigated the consequences of a frameshift mutation in the rcsD gene in the Yersinia pseudotuberculosis progenitor that is conserved in modern Y. pestis strains. Interestingly, they identify a start codon with a ribosome binding site that enables production of an Hpt-domain protein from the C-terminus in Y. pestis. Targeted deletion of this Hpt-domain increased biofilm production in Y. pestis. They find that the ancestral RcsDpstb (full length) is a positive regulator of biofilm in Y. pestis while the Hpt-domain version (RcsDYP) represses biofilm in vitro. When fleas were infected with Y. pestis expressing the ancestral RcsDPSTB protein, there was no difference in bacterial survival or rate of proventricular blockage. This strain also killed mice the same rate (in a different Y. pestis strain background). However, replacing RcsDYP with RcsYPTB dramatically increases the frequency of pgm locus deletion (containing Hms ECM and yersiniabactin genes) during flea infection. The authors predict that this would reduce the invasiveness of the bacteria in mammals and/or flea blockage in subsequent flea-rodent-flea transmission cycles. They also measured global gene expression differences between RcsDPSTB compared to the wild-type strain. They argue that the frameshift of RcsD maintaining the Hpt-domain (RcsDYP) was needed to regulate biofilm while limiting loss of the pgm locus.

      Loss of the pgm locus was not tested in the Y. pestis rcsD mutant strain (lacking the entire gene or just the C-terminal Hpt domain). Therefore, the claim that maintaining the Hpt-domain protein was important lacks convincing evidence. Additionally, it is possible that the population of rcsDpe::rcsDpstb after in vitro growth for 6 days would still be proficient at infecting and blocking fleas, even though many of the bacteria would have lost the pgm locus. Production of Hms polysaccharide by pgm+ could trans-complement those that are pgm-. The nature of the pgm locus loss is assumed to be due to recombination between IS elements. This is certainly the likeliest explanation but not the only one. The authors checked for pgm loss by phenotype (CR binding) and by two sets of primers, one targeting the hmsS gene and another set that is unspecified. Loss of the entire pgm (especially yersiniabactin genes) should be clarified.

      Thanks for your comments. We have now provided the data to show that deletion of RcsD-Hpt resulted in increased loss of the pgm locus (Figure 5d) to strengthen the claim that maintenance of the Hpt-domain is significant for retention of the pgm locus. We also agree that 6-day old cultures of a mixture of pgm+ and pgm- rcsDpe::rcsDpstb will still be capable of infecting and blocking fleas. However, these strains will be less efficient at causing disease in the vertebrate host in the absence of the pgm locus. We agree that recombination between IS elements might not be the only cause of loss of the pgm locus. To verify the loss of the pgm locus, we have used two sets of primers. One set targets the hmsS gene and another set targets the upstream and downstream sequences of the pgm locus (Supplementary Table 3). We have clarified this in the revised manuscript (Line 610-613).

      Reviewer #3 (Public Review):

      The Rcs phosphorelay plays an important role in regulating gene expression in bacteria; most of the current knowledge about the Rcs proteins is from E. coli. Yersinia pestis, carrying mutations in two central components of the Rcs machinery, provides an interesting example of how evolution has shaped this system to fit the life cycle of this bacteria. In bacteria other than Y. pestis, most Rcs activating signals are sensed via the outer membrane lipoprotein RcsF; from there, signalling depends on inner membrane protein IgaA, a negative regulator of RcsD. Histidine kinase RcsC is the source of the phosphorylation cascade that goes from the histidine kinase domain of RcsC to the response regulator domain of RcsC, from there to the histidine phosphotransfer (Hpt) domain of RcsD, and finally to the response regulator RcsB. RcsB, alone or with other proteins, regulates transcription of many genes, both positively and negatively. These authors have previously shown that RcsA, a co-regulator that acts with RcsB at some promoters, is functional in Y. pseudotuberculosis but mutant in Y. pestis, and that this leads to increased biofilm in the flea. The authors also noted that rcsD in Y. pestis contains a frameshift after codon 642 in this 897 aa protein; in theory that should eliminate the Hpt domain from the expressed protein. However, they found evidence that the frame-shifted gene had a role in regulation. This paper investigates this in more depth, providing clear evidence for expression of the Hpt domain (without the N-terminal domain), and demonstrating a critical role for this domain in repressing biofilm formation. The Y. pseudotuberculosis RcsD does not express a detectable amount of the Hpt domain nor does it repress biofilm formation. The ability of the Hpt domain protein to keep biofilm formation low explains most of what is observed for the full-length frame-shifted protein.

      1) The authors provide a substantial amount of data supporting the expression of the C-terminus of RcsD is sufficient and necessary for low biofilm levels, and that this is dependent upon the active site His in the RcsD Hpt domain (H844A) as well as other components of the basic phosphorelay (RcsC and RcsB). However, it is only possible to see this protein by Western blot in 100-fold "Enriched" lysates (Figure 2). No small protein was detected in the RcsDpstb strain, although the enriched lysate was not shown for this. Without that experiment, it is not possible to evaluate whether the small protein is also made from the rcsDpstb gene. Either answer would be interesting, and would allow other conclusions to be drawn. Is the RBS and start codon the same for the HPT region of this rcsD gene (it could be added to Supplementary Table 6). If the small protein is made, is its ability to function blocked by the excess full length protein in terms of interactions with RcsC? Or is the expression of the small protein dependent upon loss of overlapping translation from the upstream start?

      The small Hpt protein may be produced from expression of the epitope-tagged rcsDpstb gene, as it can be detected in an enriched isolation of this sample (Supplementary Figure 1f). Because only a small amount of RcsD-Hpt is produced from the rcsDpstb substitution, it might only function at low levels in the presence of large amounts of RcsDpstb. The RBS and start codon are the same for RcsD-Hpt in Y. pestis and Y. pseudotuberculosis, and we have added them to Supplementary Table 6. In addition, we have provided a model to show the function and regulation of RcsD and Hpt (Supplementary Figure 4).

      2) In many phosphorelays, the protein kinase also acts as a phosphatase, and which direction P flows is critical for regulation. It is often difficult to follow what the model for this is in this paper, and that is important to understand for evaluating the results. Most of this paper uses two assays, biofilm formation and crystal violet staining (also related to biofilm formation) to assess the functioning of the Rcs phosphorelay. Based on the behavior of the rcsB mutant, it would seem that functional Yersinia pestis Rcs (RcsDpe) represses this behavior, and this correlates with RcsB phosphorylation (Figure4). What is the basis (Line 443-44) for saying that RcsD phosphorylates RcsB while RcsDHpt dephosphorylates? Yersinia pseudotuberculosis RcsD(pstb) shows no difference with the rcsB mutant. Doesn't that suggest that RcsDpstb is no longer repressing (phosphorylating)? In the presence of the RcsDpstb as well as multicopy RcsF, an activating signal in other organisms, RcsDpstb seems able to phosphorylate. This all suggests that the full-length protein, like the Hpt domain, is capable of phosphorylating, but that it may be doing nothing in the absence of signal (or dephosphorylating). Given these results, saying that RcsDpstb is positively regulating biofilm formation (Fig.1 title, and elsewhere) is somewhat misleading. What it presumably does is prevent the Hpt domain, expressed from the chromosomal locus in Figure1b, from signalling to RcsB. By itself, it is not clear it is doing anything. Understanding this clearly is important for interpreting this system and the tested mutants. A clear model and how phosphate is flowing in the various situations would help a lot. Currently Supplementary Figure3 seems to reflect the appropriate directional arrows, but the text does not. Moving the rcsB data earlier in the paper (after Figure1, 2, or maybe earlier, before Figure3) would certainly help.

      RcsD dephosphorylates RcsB while RcsD-Hpt phosphorylates RcsB. Expression of RcsDpstb in the wild type strain and the N-term deletion mutant resulted in increased biofilm, indicating that RcsB is less phosphorylated (Figure 1b and 1c). Conversely, over-expression of RcsD-Hpt resulted in decreased biofilm formation, indicating that RcsB is more phosphorylated. In addition, the Phos-tag experiments showed that the RcsDpstb strain has a lower level of phosphorylated RcsB (Figure 4b). Expression of RcsDpstb in the wild type strain showed similar results to an rcsB mutant, indicating a lower level of phosphorylated RcsB in the presence of RcsDpstb.

      It is possible that RcsDpstb interferes with the ability of RcsD-Hpt to phosphorylate RcsB. However, plasmid expression of the rcsDpstb-H844A mutant in the Y. pestis rcsDN-term deletion mutant formed significantly less biofilm than wild type rcsDpstb, indicating that H844 might be important for RcsD to dephosphorylate RcsB (Supplementary Figure 2b and Line 180-183). In addition, it is known that RcsD plays a dual role in phosphorylation and dephosphorylation of RcsB in other organisms (Majdalani N, et al., 2005, J. Bacteriol., https://doi.org/10.1128/JB.187.19.6770-6778.2005; Wall EA, et al., 2020, Plos Genetics, https://doi.org/10.1371/journal.pgen.1008610; Takeda S., et al., 2001, Mol. Microbiol., https://doi.org/10.1046/j.1365-2958.2001.02393.x). We therefore think it is safe to say that the full length RcsD might function to dephosphorylate RcsB. We have modified the model in the revised manuscript (Supplementary Figure 4 and Figure 8). Regulation of RcsB has been investigated previously. The main finding of our manuscript is regulation of RcsB by the mutated RcsD (RcsD-Hpt). Thus, we have moved the known rcsB deletion mutant data to Figure 1 in the revised manuscript as suggested. We kept the rest of the data in Figure 4 the same. We think it might be better to first show that the mutation of rcsD alters Rcs signaling and then show how this occurs (by affecting RcsB phosphorylation).

      3) The authors show (in their pull-down) that there is a bit of full-length RcsD even in the frame-shifted protein. Is there any clear evidence this does anything here? Does the N-terminus (truncated after the frame-shift) have a function?

      We have introduced a stop codon in rcsDpe and showed that full-length RcsD is made by rcsDpe but not by rcsDpe with the stop codon (Supplementary Figure 1g). RcsDN-term does not seem to have a function under our tested conditions (Figure 1e).

      4) While the RNA seq data is useful addition here, it is difficult to interpret without a bit more data on the strain used for the RNA seq, including the biofilm phenotypes of the WT and mutant derivatives, as well as the relevant rcsD sequences, and maybe expression of a few genes or proteins (Hms or hmsT). Are these similar in the parallel strains used earlier in the paper and the one for RNA seq, in WT, rcsB- and the RcsDpstb derivative? It would appear that rcsB- and rcsDpstb have opposite effects, at least at 25°C, while in Figure4, these two derivatives have similar effects on biofilm. Is this due to temperature, strains, or biofilm genes that are not shown here? It is certainly possible that the ability of the full-length RcsD changes its kinase/phosphatase balance as a function of temperature, or dependent on other differences in these Y. pestis strains.

      The strain used for RNA seq is a derivative of the biovar Microtus strain 201, which has a similar in vitro phenotype to the strain KIM6+ (Lines 297-298). We used this strain for RNA seq because it has the virulence plasmid pCD1 and we wanted to analyze the gene expression of this plasmid, which is required for virulence, as well. RNAseq data showed that rcsB- and rcsDpstb have opposite effects on mRNA levels of some genes. However, no significant change in expression of biofilm genes was noted in the RNAseq data set. In fact, our previous data have shown that the biofilm related (hmsT and hmsD) genes are only moderately (less than 2-fold change between wild type and rcsB mutant) regulated by RcsB based on RT-PCR and β-gal analysis (Sun YC, et al., 2012, J. Bacteriol. https://doi.org/10.1128/JB.06243-11 and Guo XP, et al., 2015, Sci. Rep. https://doi.org/10.1038/srep08412 and Figure 4c).

    1. Author Response

      Reviewer #1 (Public Review):

      Sex determination and dosage compensation are two fundamental mechanisms in organisms with distinct sexes. These mechanisms vary greatly across the various model organisms in which they have been studied. Comparisons across more closely related members of the same genus have already proven productive in the past, to understand how these essential mechanisms evolve. In this study, the authors compare some aspects of the dosage compensation and sex determination mechanisms across two Caenorhabditis species that diverged ~15-30 MYA.

      Previously, the authors have studied dosage compensation and sex determination extensively in C. elegans. Here, they first identify the homologs of some key factors in C. briggsae, a species that independently evolved hermaphroditism. The authors show that some of the key players in these processes play the same roles in C. briggsae as they do in C. elegans. Namely, they show that the nematode-specific SDC-2 protein plays a role in both dosage compensation and sex determination also in C. briggsae, they find the homologs of some of the SMC protein complex that performs dosage compensation also in C. elegans and they study the binding specificity on the X chromosome.

      Overall, the work is thorough and compelling and is very clearly presented. The authors generate a number of genetic tools in C. briggsae and the careful genetic analyses together with a number of binding assays in vivo and in vitro, support the authors' main conclusions: that the main players and genetic regulatory hierarchy are conserved between these two nematodes, but the binding sites for the DCC on the X chromosome have diverged and the mode of binding has changed as well. Whereas in C. elegans the DCC binds sites in the X chromosome that contain multiple sequence motifs in a synergistic manner, in briggsae they seem to do so additively. This latter point is supported by the data, but it could be explored a bit more deeply using the available ChIP-seq data that the authors have generated. In addition, it would be interesting to discuss the possible implications of this difference.

      One minor weakness of this work is that it could be better put in the context of other related comparisons of these mechanisms. For example, the comparison of sex determination pathway by Haag et al. in Genetics 2008, and the comparison of dosage compensation across Drosophila species (Ellison and Bachtrog, Plos Genetics, 2019), and possibly others. The other point that the authors could provide deeper insight into, is the rate of divergence of proteins like SDC-2 (which is thought to be the protein that contacts DNA), versus some other proteins in the DCC and in general other proteins not involved in sex determination or dosage compensation (this doesn't need to be limited to comparing elegans and briggsae as there are numerous Caenorhabditis genomes available). This would provide a more complete view of the evolution of these processes.

      Regarding the comparison of our studies to those of the C. briggsae sex determination pathway described by Haag and others, we have included the following in our revised manuscript:

      Pages 8-9. "Within the Caenorhabditis genus, similarities and differences occur in the genetic pathways governing the later stages of sex determination and differentiation (Haag, 2005). For example, three sex-determination genes required for C. elegans hermaphrodite sexual differentiation but not dosage compensation, the transformer genes tra-1, tra-2, and tra-3, are conserved between C. elegans and C. briggsae and play very similar roles. Mutation of any one gene causes virtually identical masculinizing somatic and germline phenotypes in both species (Kelleher et al., 2008). Moreover, the DNA binding motif for both Cel and Cbr TRA-1 (Berkseth et al., 2013), a Ci/GLI zinc-finger transcription factor that acts as the terminal regulator of somatic sexual differentiation (Zarkower and Hodgkin, 1992), is conserved between the two species.

      At the opposite extreme, the mode of sexual reproduction, hermaphroditic versus male/female, dictated the genome size and reproductive fertility of Caenorhabditis species diverged by only 3.5 million years (Yin et al., 2018; Cutter et al., 2019). Species that evolved self-fertilization (e.g. C. briggsae or C. elegans) lost 30% of their DNA content compared to male/female species (e.g. C. nigoni or C. remanei), with a disproportionate loss of male-biased genes, particularly the male secreted short (mss) gene family of sperm surface glycoproteins (Yin et al., 2018). The mss genes are necessary for sperm competitiveness in male/female species and are sufficient to enhance it in hermaphroditic species. Thus, sex has a pervasive influence on genome content. In contrast to these later stages of sex determination and differentiation, the earlier stages of sex determination and differentiation had not been analyzed in C. briggsae."

      Regarding the comparison to Drosophila dosage compensation, including the work of Ellison and Bachtrog (2019), we included the following in the Discussion of our revised manuscript (page 22) and included related remarks in the abstract.

      "In contrast to the divergence of X-chromosome target specificity between Caenorhabditis species, X-chromosome target specificity has been conserved among Drosophila species. A 21-bp GA-rich sequence motif on X is utilized across Drosophila species to recruit the dosage compensation machinery, although it may not be the sole source of X target specificity (Alekseyenko, 2008; Kuzu, 2016; Ellison, 2008; Alekseyenko, 2013)."

      Regarding a comparison of our work to that of other rapidly evolving processes, we have made the following revision to our Discussion (page 22):

      "Conservation of DNA target specificity among species is also a common theme among developmental regulatory proteins that participate in multiple, unrelated developmental processes, such as Drosophila Dorsal in body-plan specification (Schloop et al., 2020) or Caenorhabditis TRA-1 in hermaphrodite sexual differentiation and male neuronal differentiation (Berkseth et al., 2013; Bayer et al., 2020). Typically, for such multi-purpose proteins, target-site specificity is evolutionarily constrained: protein function is changed far more by changes in the number and location of conserved cis-acting target sequences than by changes in the target sequences themselves (Carroll, 2008; Nitta et al., 2015). Hence, the divergence in X-chromosome target specificity across the Caenorhabditis genus is atypical among developmental regulatory complexes with highly diverse target genes and could have been an important factor for establishing reproductive isolation between species. Our finding is reminiscent of the discovery that centromeric sequences and their corresponding centromere-binding proteins have co-evolved rapidly as a consequence of hybrid incompatibilities (Malik and Henikoff, 2001; Henikoff et al., 2001; Talbert and Henikoff, 2022). Occurrence of rapidly changing DNA targets and their corresponding DNA-binding proteins (see also Lienard et al., 2016; Ting et al., 1998; Ting et al., 2004; Sun et al., 2004) is an increasingly dominant theme contributing to reproductive isolation."

      A brief comment about all three comparisons is also made in the beginning of the Discussion on page 18.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to extend modeling of bispecific engager pharmacology through explicit modelling of the search of T cells for tumour cells, the formation of an immunological synapse and the dissociation of the immunological synapse to enable serial killing. These features have not been included in prior models and their incorporation may improve the predictive value of the model.

      Thank you for the positive feedback.

      The model provides a number of predictions that are of potential interest- that loss of CD19, the target antigen, to 1/20th of its initial expression will lead to escape and that the bone marrow is a site where the tumour cells may have the best opportunity to develop loss variants due to the limited pressure from T cells.

      Thank you for the positive feedback.

      A limitation of the model is that adhesion is only treated as a 2D implementation of the blinatumomab mediated bridge between T cell and B cells- there is no distinct parameter related to the distinct adhesion systems that are critical for immunological synapse formation. For example, CD58 loss from tumours is correlated with escape, but it is not related to the target, CD19. While they begin to consider the immunological synapse, they don't incorporate adhesion as distinct from the engager, which is almost certainly important.

      We agree that adhesion molecules play critical roles in cell-cell interaction. In our model, we assumed these adhesion molecules are constant (that is, they do not differ across cell populations). This assumption allowed us to focus on the BiTE-mediated interactions.

      Revision: To clarify this point, we added a couple of sentences in the manuscript.

      “Adhesion molecules such as CD2-CD58, integrins and selectins, are critical for cell-cell interaction. The model did not consider specific roles played by these adhesion molecules, which were assumed constant across cell populations. The model performed well under this simplifying assumption”.

      In addition, we acknowledged the fact that “synapse formation is a set of precisely orchestrated molecular and cellular interactions. Our model merely investigated the components relevant to BiTE pharmacologic action and can only serve as a simplified representation of this process”.

      While the random search is a good first approximation, T cell behaviour is actually guided by stroma and extracellular matrix, which are non-isotropic. In a lymphoid tissue the stroma is optimised for a search that can be approximated as Brownian, or more accurately, a correlated random walk, but in other tissues, particularly tumours, the Brownian search is not a good approximation and other models have been applied. It would be interesting to look at observations from bone marrow or other sites to determine the best approximation for the search related to BiTE targets.

      We agree that the tissue stromal factors greatly influence the patterns of T cell searching strategy. Our current model considered Brownian motion as a good first approximation for two reasons: 1) we define tissues as homogeneous compartments to attain unbiased evaluations of factors that influence BiTE-mediated cell-cell interaction, such as T cell infiltration, T:B ratio, and target expression. The stromal factors were not considered in the model, as they require spatially resolved tissue compartments to represent the gradients of stromal factors; 2) our model was primarily calibrated against in vitro data obtained from a “well-mixed” system that does not recapitulate specific considerations of tissue stromal factors. We did not obtain tissue-specific data to support the prediction of T cell movement. This is under current investigation in our lab. Therefore, we are cautious about assuming different patterns of T cell movement in the model when translating into in vivo settings. We acknowledged this limitation of our model, namely that it does not consider the more physiologically relevant T-cell searching strategies.

      Revision: In the Discussion, we added a limitation of our model: “We assumed Brownian motion in the model as a good first approximation of T cell movement. However, T cells often take other more physiologically relevant searching strategies closely associated with many stromal factors. Because of these stromal factors, the cell-cell encounter probabilities would differ across anatomical sites.”
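
      The distinction raised above between a Brownian search and a correlated random walk can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' agent-based model: the function name, the fixed unit step length, the Gaussian turning angle, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def mean_squared_displacement(n_walkers, n_steps, turn_sigma=None, step=1.0, seed=0):
    """Average squared end-to-end distance of 2D walkers with fixed step length.

    turn_sigma=None : every step picks a fresh uniform direction
                      (an uncorrelated, Brownian-like walk).
    turn_sigma=s    : each step turns by a Gaussian angle with std s radians
                      (a correlated random walk with directional persistence).
    All names and values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walkers):
        x = y = 0.0
        heading = rng.uniform(0.0, 2.0 * math.pi)
        for _ in range(n_steps):
            if turn_sigma is None:
                heading = rng.uniform(0.0, 2.0 * math.pi)
            else:
                heading += rng.gauss(0.0, turn_sigma)
            x += step * math.cos(heading)
            y += step * math.sin(heading)
        total += x * x + y * y
    return total / n_walkers


# Same number of steps, with and without directional persistence.
brownian_msd = mean_squared_displacement(500, 200, turn_sigma=None, seed=1)
persistent_msd = mean_squared_displacement(500, 200, turn_sigma=0.3, seed=1)
print(f"uncorrelated walk MSD: {brownian_msd:.1f}")
print(f"correlated walk MSD:   {persistent_msd:.1f}")
```

      In this toy setting the persistent walkers cover far more ground in the same number of steps, which is one way to see why cell-cell encounter probabilities could differ across anatomical sites when the search pattern changes.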

      Reviewer #3 (Public Review):

      Liu et al. combined mechanistic modeling with in vitro experiments and data from a clinical trial to develop an in silico model to describe response of T cells against tumor cells when bi-specific T cell engager (BiTE) antigens, a standard immunotherapeutic drug, are introduced into the system. The model predicted responses of T cell and target cell populations in vitro and in vivo in the presence of BiTEs where the model linked molecular level interactions between BiTE molecules, CD3 receptors, and CD19 receptors to the population kinetics of the tumor and the T cells. Furthermore, the model predicted tumor killing kinetics in patients and offered suggestions for optimal dosing strategies in patients undergoing BiTE immunotherapy. The conclusions drawn from this combined approach are interesting and are supported by experiments and modeling reasonably well. However, the conclusions can be tightened further by making some moderate to minor changes in their approach. In addition, there are several limitations in the model which deserve some discussion.

      Strengths

      A major strength of this work is the ability of the model to integrate processes from the molecular scales to the populations of T cells, target cells, and the BiTE antibodies across different organs. A model of this scope has to contain many approximations and thus the model should be validated with experiments. The authors did an excellent job in comparing the basic and the in vitro aspects of their approach with in vitro data, where they compared the numbers of engaged target cells with T cells as the numbers of the BiTE molecules, the ratio of effector and target cells, and the expressions of the CD3 and CD19 receptors were varied. The agreement of the model with the data was excellent in most cases, which led to several mechanistic conclusions. In particular, the study found that target cells with lower CD19 expressions escape the T cell killing.

      The in vivo extension of the model showed reasonable agreements with the kinetics of B cell populations in patients where the data were obtained from a published clinical trial. The model explained differences in B cell population kinetics between responders and non-responders and found that the differences were driven by the differences in the T cell numbers between the groups. The ability of the model to describe the in vivo kinetics is promising. In addition, the model leads to some interesting conclusions, e.g., the model shows that the bone marrow harbors tumor growth during the BiTE treatment. The authors then used the model to propose an alternate dosage scheme for BiTEs that needed a smaller dose of the drug.

      Thank you for the positive comments.

      Weaknesses

      There are several weaknesses in the development of the model. Multiscale models of this nature contain parameters that need to be estimated by fitting the model with data. Some of these parameters are associated with model approximations or not measured in experiments. Thus, a common practice is to estimate parameters with some 'training data' and then test model predictions using 'test data'. Though Supplementary file 1 provides values for some of the parameters that appeared to be estimated, it was not clear which datasets were used for training and which for testing. The confidence intervals of the estimated parameters and the sensitivity of the proposed in vivo dosage schemes to parameter variations were unclear.

      We agree with the reviewer on the model validation.

      Revision: To ensure reproducibility, we summarized model assumptions and parameter values/sources in the supplementary file 1. To mimic tumor heterogeneity and the evolution process, we applied stochastic agent-based models, which are challenging to globally optimize against the data. The majority of key parameters were obtained or derived from the literature. Details have been provided in the response to Reviewer 3 - Question 1. In our modeling process, we manually optimized the sensitive coefficient (β) for the base model using pilot in-vitro data, and the sensitive coefficient (β) for the in-vivo model by re-calibrating against the in-vitro data at a low BiTE concentration. BiTE concentrations in patients (mostly < 2 ng/ml) are only relevant to the lower bound of the concentration range we investigated in vitro (0.65-2000 ng/ml). We have added some clarification/limitation of this approach in the text (details are provided in the following question). We understand the concerns, but the nature of agent-based modeling prevents us from performing global optimization.

      The model appears to show a few unreasonable behaviors and does not agree with experiments in several cases which could point to missing mechanisms in the model. Here are some examples. The model shows a surprising decrease in the T cell-target cell synapse formation when the affinity of the BiTEs to CD3 was increased; the opposite should have been more intuitive. The authors suggest degradation of CD3 could be a reason for this behavior. However, this probably could be easily tested by removing CD3 degradation in the model. Another example is the increase in the % of engaged effector cells in the model with increasing CD3 expressions does not agree well with experiments (Fig. 3d), however, a similar fold increase in the % of engaged effector cells in the model agrees better with experiments for increasing CD19 expressions (Fig. 3e). It is unclear how this can be explained given CD3 and CD19 appear to be present in similar copy numbers per cell (~10^4 molecules/cell), and both receptors bind the BiTE with high affinities (e.g., koff < 10^-4 s^-1).

      Thank you for pointing this out. The bidirectional effect of CD3 affinity on IS formation is counterintuitive. In a hypothetical situation when there is no CD3 downregulation, the bidirectional effect disappears (as shown below), consistent with our view that CD3 downregulation accounts for the counterintuitive behavior. We have included the simulation to support our point. From a conceptual standpoint, the inclusion of CD3 degradation means the way to maximize synapse formation is for the BiTE to first bind tumor antigen, after which the tumor-BiTE complex “recruits” a T cell through the CD3 arm.

      We agree that the model did not adequately capture the effect of CD3 expression at the highest BiTE concentration, 100 ng/ml, while the effects at other BiTE concentrations were well captured (as shown below, left). The model predicted a much more moderate effect of CD3 expression on IS formation at the highest concentration. This is partly because the model assumed rapid CD3 downregulation upon antibody engagement. We did a similar simulation as above, with moderate CD3 downregulation (as shown below, right). This increases the effect of CD3 expression at the highest BiTE concentration, consistent with experiments. Interestingly, a rapid CD3 downregulation rate, as we concluded, is required to capture data profiles at all other conditions. Considering that a BiTE concentration of 100 ng/ml is much higher than the therapeutically relevant level in circulation (< 2 ng/ml), we did not investigate the mechanism underlying this inconsistent model prediction, but we acknowledged the fact that the model under-predicted IS formation in Figure 3d. Notably, this discrepancy may rarely appear in our clinical predictions, as CD3 expression is at a low level and the blood BiTE concentration is very low (< 2 ng/ml).

      Revision: We have adjusted the text to increase clarity on these points. In addition, we added: “The base model underpredicted the effect of CD3 expression on IS formation at 100 ng/ml BiTE concentration, which is partially because of the rapid CD3 downregulation upon BiTE engagement and assay variation across experimental conditions.”

      The model does not include signaling and activation of T cells as they form the immunological synapse (IS) with target cells. The formation of the IS leads to aggregation of different receptors, adhesion molecules, and kinases which modulate signaling and activation. Thus, it is likely the variations of the copy numbers of CD3, and the CD19-BiTE-CD3 will lead to variations in the cytotoxic responses and presumably to CD3 degradation as well. Perhaps some of these missing processes are responsible for the disagreements between the model and the data shown in Fig. 3. In addition, the in vivo model does not contain any development of the T cells as they are stimulated by the BiTEs. The differences in development of T cells, such as generation of dysfunctional/exhausted T cells could lead to the differences in responses to BiTEs in patients. In particular, the in vivo model does not agree with the kinetics of B cells after day 29 in non-responders (Fig. 6d); could the kinetics of T cell development play a role in this?

      We agree that intracellular signaling is critical to T cell activation and cytotoxic effects. IS formation, T cell activation, and cytotoxicity are a cascade of events with highly coordinated molecular and cellular interactions. Compared to the events of T cell activation and cytotoxicity, IS formation occurs at a relatively earlier time. As shown in our study, IS formation can occur at 2-5 min, while the other events often need hours to be observed. We found that IS formation is primarily driven by two intercellular processes: cell-cell encounter and cell-cell adhesion. The intracellular signaling would be initiated in the process of cell-cell adhesion or at the late stage of IS formation. We think these intracellular events are relevant but may not be the reason why our model did not adequately capture the profiles in Figure 3d at the highest BiTE concentrations. Therefore, we did not include intracellular signaling in the models. Another reason was that we simulated our models at an agent level to mimic the process of tumor evolution, which is computationally demanding. Intracellular events for each cell may make it more challenging computationally.

      T cell activation and exhaustion throughout the BiTE treatment is very complicated, time-variant and impacted by multiple factors like T cell status, tumor burden, BiTE concentration, immune checkpoints, and tumor environment. T cell proliferation and death rates are challenging to estimate, as the quantitative relationship with those factors is unknown. Therefore, T cell abundance (expansion) was considered as an independent variable in our model. T cell counts are measured in BiTE clinical trials. We included these data in our model to reveal expanded T cell population. Patients with high T cell expansion are often those with better clinical response. Notably, the T cell decline due to rapid redistribution after administration was excluded in the model. T cell abundance was included in the simulations in Figure 6 but not proof of concept simulations in Figure 7.

      In Figure 6d, the kinetics of T cell abundance had been included in the simulations for responders and non-responders in the MT103-211 study. Thus, the kinetics of T cell development cannot be used to explain the disagreement between model prediction and observation after day 29 in non-responders. The observed data are actually median values of B-cell kinetics in non-responders (N = 27) with very large inter-subject variation (baseline from 10-10000/μL), which makes them very challenging for the model to capture perfectly. Many non-responders with severe progression dropped out of the treatment at the end of cycle 1, which resulted in a “more potent” efficacy in the 2nd cycle. This might be the main reason for the disagreement.

      Variation in cytotoxic response was not included in our models. Tumor cells were assumed to be eradicated after engagement with effector cells; no killing rate or killing probability was implemented. This assumption reduced the model complexity and aligned well with our in-vitro and clinical data. Cytotoxic response in vivo is impacted by multiple factors like copy number of CD3, cytokine/chemokine release, tumor microenvironment and T cell activation/exhaustion. For example, the cytotoxic response and killing rate mediated by the 1:1 synapse (ET) and other variants (ETE, TET, ETEE, etc.) are supposed to be different as well. Our model did not differentiate the killing rate of these synapse variants, but the model has quantified these synapse variants, providing a framework for us to address these questions in the future. We agree that differentiating the cytotoxic responses under different scenarios may improve model prediction and that more exploration needs to be done in the future.

      Revision: We added a discussion of the limitations which we believe is informative to future studies.

      “Our models did not include intracellular signaling processes, which are critical for T cell activation and cytotoxicity. However, our data suggest that encounter and adhesion are more relevant to initial IS formation. To make more clinically relevant predictions, the models should consider these intracellular signaling events that drive T cell activation and cytotoxic effects. Of note, we did consider the T cell expansion dynamics in organs as an independent variable during treatment for the simulations in Figure 6. T cell expansion in our model is case-specific and time-varying.”
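
      The synapse variants named in the response above (ET, ETE, TET, ETEE) can be tallied in a few lines. This is a minimal sketch, not the authors' implementation: it assumes each cluster is written as a string in which "E" stands for one effector cell and "T" for one target cell, and the list of clusters is a made-up placeholder rather than data from the study.

```python
from collections import Counter


def summarize_synapses(clusters):
    """Count each synapse variant and the cells engaged in them.

    `clusters` is a list of labels such as "ET" or "ETE", where "E" is an
    effector cell and "T" a target cell (a labeling convention assumed here).
    """
    variant_counts = Counter(clusters)
    engaged_effectors = sum(label.count("E") for label in clusters)
    engaged_targets = sum(label.count("T") for label in clusters)
    return variant_counts, engaged_effectors, engaged_targets


# Placeholder clusters for illustration only; not experimental data.
observed = ["ET", "ET", "ETE", "TET", "ETEE"]
variants, n_effectors, n_targets = summarize_synapses(observed)
print(dict(variants))
print("engaged effectors:", n_effectors, "engaged targets:", n_targets)
```

      A variant-specific killing rate, if one were ever estimated, could then be attached to the keys of the variant counter; the text above states that the current model does not do this.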

    1. Author Response

      Reviewer #1 (Public Review):

      Following previous publications showing that NR2F2 controls atrial identity in the mouse and human iPS cells, the authors address in the fish the role of the transcription factor Nr2f1a, which is specific to the atrial chamber. This had been initiated in a previous publication (Duong et al, 2018) and is extended in this manuscript. In mutant fish, the atrial chamber is smaller and mispatterned. Markers of the atrioventricular canal and of the pacemaker are expanded. Transcriptomic analyses and electrophysiological measures further support this observation. A putative enhancer of nkx2.5 is identified by ATAC-seq and shown to be repressed in nr2f1a mutants, suggesting that Nkx2.5, a known repressor of pacemaker identity, may be a mediator of Nr2f1a. Overexpression of nkx2.5 delays the appearance of pacemaker cells, and is proposed to partially rescue the absence of nr2f1a.

      Overall, this work provides novel insight into the mechanism of atrial chamber patterning in the fish and discusses the conservation of the role of nr2f1a. However, the claim that atrial cells switch their identity into ventricular and pacemaker cells is currently not demonstrated. Alternative hypotheses of mispatterning, cell number changes by proliferation, survival, or ingression are not ruled out by the data presented. The claim that "Nr2f1a maintains atrial nkx2.5 expression" or of a "progressive loss of Nkx2.5 within the ACs" needs to be further supported. The definition of "atrial cells (AC)" varies between figures.

      Major comments:

      1) The definition of "AC" varies from figure to figure: amhc+ in Fig 1A, amhc+vmhc- in Fig.1S1A, amhc+fgf13a- in Fig. 2 and 5, morphological area in Fig. 3. Please clarify how the atrial chamber is delineated in mutants in Fig. 3 since the avc constriction is not obvious.

      a. As stated in the response to Essential Revisions comment 1.B, we have tried to clarify the definitions of the cardiomyocyte populations in the revised text by indicating the specific markers used in the text and the figures. We then provide our interpretation for what this means regarding the different cardiomyocyte populations.

      b. Since the analysis of the electrophysiology cannot be performed with markers or the transgenic zebrafish embryos using GFP, we chose areas for analysis closer to the middle of the morphological atrium in the nr2f1a mutant and WT sibling control embryo hearts that would be consistent with having Amhc+ expression and fgf13a:EGFP+ transgenic and Isl1 markers that were found from the analysis with immunohistochemistry. This strategy was schematized in Figure 3A and is now explicitly stated on lines 266 and 267 of the revised manuscript.

      2) The claim of a switch in cell identity or transdifferentiation is not demonstrated. This would require cell tracking or single-cell transcriptomics. I don't see how "AVC (..) [is] resolving to ventricular identity", since amhc seems to be maintained throughout the atrial chamber at all stages. The claim that "the number of vmhc+ only cardiomyocytes progressively increased" is not supported by Fig1S1. The expansion of pacemaker cells may result from cell ingression at the arterial pole. This hypothesis is in keeping with the expression of nr2f1a outside the heart tube in putative atrial progenitors (Duong, 2018). The phenotype upon nkx2.5 overexpression may also be interpreted along this line: ingression of pacemaker cells is delayed. The claim that "PC identity progressively expands throughout nr2f1a mutant atria" is not supported by the quantifications of a mean of 12 fgf13a+amhc+ cells at 96hpf (Fig. 2H), which is as many as fgf13a-amhc+ cells (Fig. 2G) and a quarter of the total amhc+ cells in Fig. 1J. The schema in Fig 6 does not reflect quantifications at 96hpf, which indicate the persistence of amhc+vmhc+ cells, amhc+ only, or amhc+fgf13a- in Fig 1S1 and 2G.

      "We did not observe effects on cell death or proliferation in the hearts of nr2f1a mutants": please provide the data, since proliferation was shown to be affected in mouse mutants (Wu, 2013).

      a. As indicated above in our response to the Essential Revisions comment 1.D, our quantification of cardiomyocytes indicates there are progressively fewer Amhc+/Vmhc+ cardiomyocytes in the nr2f1a mutant hearts (Figure 1J-L). The total number of Vmhc+ cardiomyocytes (Amhc+/Vmhc+ and Amhc-/Vmhc+) is increased in the nr2f1a mutant hearts relative to the WT sibling hearts. However, the number of Vmhc+-only (Amhc-/Vmhc+) cardiomyocytes, which reflect the ventricles, does not increase significantly in the nr2f1a mutants and is not statistically different from that of their WT siblings at each of the stages, despite trending that way (Figure 1 – figure supplement 2C). The total number of cardiomyocytes in the nr2f1a mutant hearts also does not increase during these stages (Figure 1L). Along with the lack of cardiomyocyte death or proliferation (Figure 1 – figure supplements 3 and 4), this suggests that these hearts have more total Vmhc+ cardiomyocytes and that the addition of Vmhc+-only cardiomyocytes comes primarily from the cardiomyocytes in the Vmhc+/Amhc+ atrioventricular canal progressively losing Amhc expression. As indicated in the response to Essential Revisions comment 1.D, we have provided the individual image channels in a revised Figure 1 – figure supplement 1 and proportions of Vmhc+ cardiomyocytes in Figure 1 – figure supplement 2D to help clarify this issue.

      b. The question of transdifferentiation versus ingression of newly differentiating cardiomyocytes as the explanation for the expansion of pacemaker markers was addressed in the response to Essential Revisions comment 2. Please see that response for how we addressed this concern.

      3) The claim that "Nr2f1a maintains atrial nkx2.5 expression" or of a "progressive loss of Nkx2.5 within the ACs" needs to be further supported by quantification of the number of nkx2.5 positive cells in nr2f1a mutants. It seems that some cells in Fig. 4 co-express nkx2.5 and pacemaker markers in the mutant, which questions the repressive role of Nkx2.5. Following the observation of an nkx2.5 enhancer active next to pacemaker cells in control heart but absent in nr2f1a mutants, shouldn't we expect a gap of nkx2.5 expression next to pacemaker cells in mutants? It is unclear why pacemaker cells express nr2f1a (Fig. 6S1) but not nkx2.5. This needs clarification.

      a. The repressive role of Nkx2.5 with respect to pacemaker identity has been well documented in zebrafish and mice (Colombo et al., 2018). Nkx2.5 and Isl1 expression at the venous pole of zebrafish hearts are predominantly mutually exclusive, although there are a few cardiomyocytes at their borders that express both Nkx2.5 and pacemaker markers. We recognize that there are still some Nkx2.5-expressing cardiomyocytes that overlap with the pacemaker marker cardiomyocytes in the nr2f1a mutant hearts, as shown in Figure 4F. However, the majority of these cardiomyocytes have lower expression than the adjacent cardiomyocytes that form a border and do not have overlapping expression. Furthermore, as shown in Figure 4D-F and Figure 4 – figure supplement 2, the overall effect appears to be a regression of Nkx2.5+ expression in cardiomyocytes and a corresponding expansion of pacemaker markers from the venous pole from 48 through 96 hpf in the nr2f1a mutant hearts, consistent with the established role of Nkx2.5 in repressing pacemaker identity. In the revised manuscript, we have provided each of the individual channels for the images in Figure 4 to better allow visualization of the different cardiomyocyte markers and a new supplemental figure showing the predominantly mutually exclusive expression of Nkx2.5 and Isl1 at the venous pole of zebrafish embryo hearts (Figure 4 – figure supplement 1).

      b. The expression of Nkx2.5 within the heart, like any gene, is likely controlled by multiple different regulatory elements. It is not clear to us why Reviewer #1 feels one would expect to see a gap in expression between Nkx2.5+ and pacemaker cardiomyocytes in the nr2f1a mutant hearts, unless Nkx2.5 was not required to repress pacemaker identity or there was a significant delay between loss of Nkx2.5 and gain of pacemaker markers. As indicated in the response to Essential Revisions comment 3.C, in the revised manuscript, we show experiments in which we have deleted the putative nkx2.5 enhancer element and found there is a loss of Nkx2.5+ and gain of fgf13a:EGFP+ cardiomyocytes in the atrium, as one might expect if the enhancer promotes or maintains Nkx2.5 expression in atrial cardiomyocytes that border the pacemaker cardiomyocytes. In the revised manuscript, this experiment is described in the Results (lines 348-364) and included in a revised Figure 6 and new Figure 6 – figure supplement 2.

      c. Please see our response to Essential Revision comment 3.A regarding the issue of Nr2f1a expression in pacemaker cardiomyocytes.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Warren et al., presents evidence suggesting that aberrant Yap signaling plays a role in epithelial progenitor cell dysregulation in lung fibrosis. This work builds on a body of work in the literature that Hippo signaling is aberrantly regulated in idiopathic pulmonary fibrosis. They use a combination of single nuclear and spatial transcriptomics, together with in vivo conditional genetic perturbations of Hippo signaling in mice, to investigate roles for Yap/Taz signaling in alveolar epithelial homeostasis and remodeling associated with exposure to a fibrosing agent, bleomycin. They show that Taz and Tead1/4 are most abundantly expressed by alveolar type 1 (AT1) cells, but Nf2 immunoreactivity (upstream activator of Hippo) is observed predominantly within airway and AT2 cells. Bleomycin exposure was associated with reduced p-Mst in regenerating alveolar epithelium, that inactivation of Yap/Taz arrested AT2>AT1 differentiation, and inactivation of either Nf2 or Mst1/2 promoted AT1 differentiation after bleomycin exposure and reduced matrix deposition/fibrosis. They go on to show that compromised alveolar regeneration resulting from inactivation of Yap/Taz results in enhanced bronchiolization of injured alveoli. Experiments are well designed and include quantitative endpoints where appropriate, data of high quality, and results are generally supportive of conclusions. These studies provide valuable new data relating to roles for the Hippo pathway in regulation of alveolar homeostasis and epithelial regeneration/remodeling in injury/repair and fibrosis.

      We thank the reviewer for their enthusiastic and constructive comments.

      Reviewer #2 (Public Review):

      The authors explored non-redundant, and potentially contrasting, roles of the Hippo effector transcription factors, YAP and TAZ, in the epithelial regenerative response to non-infectious lung injury. The strength of the work is the use of genetic mouse models that explored inducible loss of function of YAP and/or TAZ in an alveolar epithelial type 2 (AT2) specific manner. The main weakness of the work is that gene inactivation was performed prior to lung injury and, therefore, does not take into account the contextual and dynamic nature of YAP/TAZ signaling; for example, work by other groups has shown that YAP/TAZ is activated early following injury followed by a decrease in activity, thus balancing proliferation and differentiation of AT2 cells (for review, see PMID: 34671628).

      We thank the reviewer for their enthusiastic and constructive comments.

      We agree that knocking out genes prior to injury might not take into account the contextual and dynamic nature of YAP/TAZ signaling. However, the Hippo pathway allows cells to sense changes in their environment. We have published that in the airway epithelium the Hippo pathway becomes inactivated upon naphthalene injury in surviving airway epithelial cells sensing the loss of their neighbors, to induce Wnt7b expression which then induces Fgf10 expression in airway smooth muscle cells to drive airway epithelial regeneration. Normally when regeneration is complete and cell density is restored the Hippo pathway reactivates and the repair cascade is inactivated. Knocking out Mst1/2 in airway epithelium chronically activates this cascade and leads to overproliferation of the airway epithelium. Interestingly, upon inactivation of Mst1/2 in the airway epithelium some airway epithelial cells also turn into AT1 cells.

      However, AT1 cells do not proliferate. As such we believe that inactivation of Mst1/2 or Nf2 in AT2 cells will not result in overproliferation but mainly promote AT1 cell differentiation. That being said there are other pathways and molecules that affect Yap/Taz nuclear localization. So inactivation of Mst1/2 or Nf2 in AT2 cells most likely primes/activates AT2 cells to regenerate AT1 cells but this decision is likely not binary.

      Reviewer #3 (Public Review):

      The manuscript entitled "Hippo signaling impairs alveolar epithelial regeneration in pulmonary fibrosis" is a rigorous and timely report detailing the significance of Hippo signaling, Taz and Yap in AT2/AT1 differentiation and the subsequent impact on the progression of lung fibrosis versus repair/regeneration. The authors' experimental design and results support their conclusions. The identification of the distinct effects of Taz and Yap in these processes highlights the pathway and specific molecules as potential therapeutic targets.

      The major strengths of these studies lie in the rigor of the elegant transgenic developmental/adult injury-repair mouse models combined with spatial transcriptomics and analyses. The weaknesses include a lack of detail presented in the methods, some legends and discussion.

      We thank the reviewer for their enthusiastic and constructive comments and have addressed the issues raised.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper showing that during amino acid starvation of Neurospora, the general amino acid control factors CPC-1 and CPC-3 are crucial to maintaining circadian rhythm at the levels of rhythmic growth and transcription of the FRQ gene. They show that deleting both genes leads to reduced and arrhythmic cell growth and FRQ transcription that can be accounted for by severely reduced occupancy of the FRQ promoter by the key transcription factor WCC. This defect in turn appears to result from diminished H3 acetylation of the FRQ promoter that was observed at least in the cpc-1 mutant, which is mediated by Gcn5. Thus, they show that Gcn5 occupancy at FRQ is rhythmic and impaired by cpc-1 knock-out, that CPC-1 occupies the FRQ promoter, and provide coIP evidence that Cpc-1 interacts with Gcn5 and Ada2 and, hence, could act directly to recruit these cofactors to the FRQ promoter. Importantly, they show that knock out of GCN5 eliminates rhythmic cell growth and FRQ expression (although surprisingly not FRQ mRNA abundance), as well as reducing H3ac levels and WCC binding at FRQ. They further show that TSA treatment can reverse the effects of histidine starvation on the circadian period in WT cells, and can partially restore rhythmic growth to histidine-starved cpc-3 cells, and that elimination of HDAC Hda1 increases H3ac at FRQ in WT cells. They provide some evidence that transcriptional activation of certain aa biosynthetic genes by CPC-1 is also rhythmic, although the evidence for this is not strong and it's unclear whether CPC-1 occupancy or its activation function would be periodic. They also did not address whether CPC-1 occupancy at FRQ is rhythmic.

      This work is important in providing convincing evidence that CPC-1-mediated induction of transcription factor CPC-3 in starved Neurospora cells mediates CPC-1-mediated recruitment of Gcn5 and acetylation of the FRQ promoter, which counteracts the function of histone deacetylase HDA1 to maintain high occupancy of the transcription factor WCC and attendant circadian rhythm of FRQ transcription. The work does not identify new regulatory circuits: rhythmic transcription of FRQ, the role of Gcn5, Hda1, and promoter histone acetylation in supporting transcriptional activation, and the general amino acid control response to amino acid starvation are all well-established mechanisms. It is nonetheless significant in showing how these pathways and mechanisms are integrated to maintain circadian rhythm in the face of amino acid limitation.

      There is an abundance of convincing experimental evidence provided to support the key claims just summarized above. However, there are a few instances in which additional experiments might be required to resolve a discrepancy in the data or provide stronger evidence to support a claim.

      Thanks for the comments. We have revised the manuscript as suggested.

      Reviewer #2 (Public Review):

      This study by Liu et al. investigates the mechanism that enables the Neurospora circadian clock to maintain robust molecular and physiological rhythms under conditions of nutrient stress. The authors showed that the nutrient-sensing GCN2 signaling pathway is required to maintain robust circadian clock function and output rhythms under amino acid starvation in the filamentous fungus Neurospora. Specifically, they observed that under amino acid starvation conditions, knocking out GCN2 pathway components GCN4 (CPC-1) and GCN2 (CPC-3) severely disrupts rhythmic transcription of core clock gene frequency (frq) and clock-regulated conidiation rhythm. They provided data to indicate that the observed disruptions are due to reduced binding of the White Collar (WC) complex to the frq promoter stemming from lower histone H3 acetylation levels. This prompted the authors to propose a model in which GCN2 (CPC-3) and GCN4 (CPC-1) are activated upon sensing amino acid starvation, recruit GCN-5 containing SAGA acetyltransferase complex to maintain robust histone acetylation rhythm at the frq promoter. They then performed a battery of assays to show that both GCN-5 and ADA-2 are necessary for maintaining robust H3ac, frq mRNA, and conidiation rhythms under normal conditions. To support that low H3ac level at the frq promoter is the cause for impaired WC binding and frq transcription, they demonstrated they can partially rescue the observed rhythm defects of the knockout mutants under amino acid starvation using an HDAC inhibitor. Finally, the authors used RNA-seq to identify genes and pathways that are differentially activated by GCN4 (CPC-1) under amino acid starvation conditions. Many of these genes are involved in amino acid metabolism and they showed that 3 of them exhibit rhythmic expression in WT but low and non-rhythmic expression in the CPC-1 KO strain.

      Strength: The 24-hour period length of the circadian clock is known to be stable over a range of environmental and metabolic conditions because of circadian compensation mechanisms. Whereas temperature compensation (maintenance of circadian period length over a physiological range of temperature) has been studied extensively in multiple model organisms, the phenomenon of nutritional compensation and its underlying mechanisms are poorly understood. This study provides new insights into this important yet understudied area of research in chronobiology. In addition to advancing our understanding of fundamental mechanisms governing clock compensation mechanisms, this study also adds to our understanding of metabolic regulation of rhythmic biology and the relationship between nutrition and healthy biological rhythms. Given that the GCN2 nutrient-sensing pathway is broadly conserved beyond Neurospora, findings from this study will likely be relevant to other eukaryotic systems.

      The authors provided strong evidence supporting their claims that the GCN2 signaling pathway is important for maintaining the robustness of the Neurospora clock under conditions of amino acid starvation. The authors performed parallel experiments in normal (no 3-AT) vs amino acid-starved conditions (+3-AT). Their observations of relatively minor disruptions of molecular and conidiation rhythms in cpc-3 and cpc-1 KO strains in normal nutrient conditions compared to starvation conditions support their model that sensing of amino acid starvation by GCN2 pathway-induced changes at the chromatin and transcriptional level that are necessary to maintain a robust frq oscillator. Without the comparison between normal vs amino acid starved conditions, this part of their model will not be as strong.

      Previously Karki et al. (2020) showed that rhythmic activation of GCN2 kinase is regulated by the clock, resulting in clock-controlled rhythmic translation initiation. This study uncovers an additional mechanism through which the GCN2 pathway modulates circadian rhythms by regulating histone acetylation of rhythmic genes. RNA-seq as described in Figure 7 provides some potential targets.

      Thanks for the comments and suggestions. We have revised the manuscript as suggested.

      Weakness:

      (1) The authors propose a model (Figure 8) in which the GCN2 pathway is activated by amino acid starvation and recruits the SAGA complex to promote histone acetylation at the frq promoter. There is however no data in this study showing that the GCN2 pathway is activated in amino acid-starved conditions, only that it is required to maintain robust frq and conidiation rhythms. The authors should clarify how they are defining "activation of the GCN2 pathway" in this study. For example, is it recruitment of GCN-5 and SAGA complex to frq promoter?

      Thanks for the question. CPC-3, the GCN2 homolog in Neurospora, is the only eIF2α kinase responsible for eIF2α phosphorylation at serine 51 (Karki S et al. 2020, PMID: 32355000). As shown in the revised Figure 1-figure supplement 1A, eIF2α phosphorylation and CPC-1 were induced by 3-AT treatment in the WT but not in the cpc-3KO strain. These results demonstrate that the GCN2 pathway is activated by amino acid starvation, and as a result, CPC-1 expression is activated to recruit the SAGA complex to the frq promoter.

      (2) The experiments to examine the involvement of GCN-5 and ADA-2 were performed in normal conditions (no amino acid starvation). Unlike cpc-1 and cpc-3 KO strains, gcn-5 and ada-2 KO strains showed severely disrupted frq rhythms in normal nutrient conditions, suggesting they are normally required for robust circadian rhythms. If GCN-5 and the SAGA complex are normally involved in regulating H3ac rhythms in the frq loci, how does the GCN2 pathway modulate the activity of GCN-5 and SAGA complex in conditions of amino acid starvation? Are the interactions between GCN2/4 with GCN-5 and SAGA complex different in normal vs amino acid starved conditions? The authors should clarify their model.

      As mentioned above, our data suggested that GCN-5 and ADA-2 are required for robust circadian rhythms under normal conditions. As suggested, we did detect dampened rhythmic expression of frq in the gcn-5KO and ada-2KO strains under amino acid starvation (Figure 5D and 5E and Figure 5–figure supplement 1E and 1F). We also performed Co-IP to compare the interactions of CPC-1 with ADA-2 and of GCN-5 with ADA-2 under normal and amino acid-starved conditions. The results showed that although the Myc.GCN-5, Myc.CPC-1 or Flag.ADA-2 protein level was repressed by 3 mM 3-AT treatment (likely due to global translational inhibition by induced eIF2α phosphorylation; Karki S et al. 2020, PMID: 32355000), the interactions of CPC-1 with ADA-2 and of GCN-5 with ADA-2 were almost the same under normal and amino acid-starved conditions (IP was normalized to Input) (Figure 4B and 4C). These results indicated that amino acid starvation had little impact on the protein interactions of CPC-1 with GCN-5 and the SAGA complex.

      In our model, we proposed that amino acid starvation resulted in a compact chromatin structure (due to decreased H3ac) at the frq promoter in the WT strain (Figure 3B), likely due to activation of histone deacetylases or inhibition of histone acetyltransferases. Amino acid starvation activates the GCN2 pathway and induces CPC-1 expression. The induced CPC-1 can recruit the GCN-5-containing SAGA complex to the frq promoter to loosen the chromatin structure, promoting rhythmic frq transcription under starvation conditions. However, in the cpc-3KO mutants, CPC-1 could not effectively recruit the GCN-5-containing SAGA complex to the frq promoter, resulting in arrhythmic frq transcription. We have now clarified our model in the revised discussion.

      (3) Given that the GCN2 pathway is important for nutrient sensing, the authors should not disregard the alternative hypothesis that the GCN2 pathway may be important for nutrient compensation and plays a role in maintaining the robustness of rhythms in a range of nutrient conditions.

      Thanks for the suggestion. We now discuss the alternative hypothesis in the revised manuscript: “Because GCN2 signaling pathway is important for nutrient sensing, it may be important for nutrient compensation and plays a role in maintaining the robustness of rhythms in a range of nutrient conditions”.

      (4) The authors should use circadian statistics to compute the phase and amplitude of the mRNA, DNA binding of the WC complex, and H3Ac rhythms. This will allow them to compare between rhythms and provide statistical significance values, rather than just providing qualitative descriptions. This will be valuable when comparing rhythms between strains and between nutrient conditions.

      As suggested, we used CircaCompare to analyze our data.
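
      For readers unfamiliar with this kind of analysis, the following is a minimal sketch of a single-component cosinor fit, which estimates the mesor, amplitude, and phase of a rhythm from a time series. It is an illustration only, not the CircaCompare implementation (CircaCompare is an R package and additionally tests for differences between two groups). The function names, the fixed 24 h period, and the synthetic data are assumptions made for this example and do not come from the manuscript.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system a.x = b by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def cosinor_fit(times_h, values, period_h=24.0):
    """Least-squares fit of y = mesor + amplitude * cos(w*t - phi), with w = 2*pi/period.

    The model is linear in (mesor, amplitude*cos(phi), amplitude*sin(phi)),
    so it can be solved through the normal equations.
    Returns (mesor, amplitude, acrophase in hours).
    """
    w = 2.0 * math.pi / period_h
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, b_cos, b_sin = solve3(xtx, xty)
    amplitude = math.hypot(b_cos, b_sin)
    acrophase_h = (math.atan2(b_sin, b_cos) / w) % period_h
    return mesor, amplitude, acrophase_h


# Synthetic example (hypothetical values): sampled every 4 h for 48 h,
# mesor 10, amplitude 3, peak at hour 6.
times = [4.0 * i for i in range(12)]
signal = [10.0 + 3.0 * math.cos(2.0 * math.pi / 24.0 * (t - 6.0)) for t in times]
print(cosinor_fit(times, signal))
```

      On noise-free synthetic data such as this, the fit should recover the mesor, amplitude, and peak time used to generate the signal. Comparing strains or nutrient conditions statistically, as the reviewer requests, requires confidence intervals or a formal test on the fitted parameters, which this sketch does not provide.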

      Reviewer #3 (Public Review):

      This is an important paper anchored by the observation that cultures of Neurospora undergoing amino acid starvation lose circadian rhythmicity if orthologs in the classic GCN2/CPC-3 cross-pathway control system are absent. Data convincingly show that Neurospora orthologs of Saccharomyces GCN2 and GCN4 (CPC-3 and CPC-1 respectively) are needed to promote histone acetylation at the core clock gene frequency to facilitate rhythmicity. While the binding of CPC-1 and thereby GCN-5 are plainly rhythmic, the explanation of exactly where rhythmicity enters the pathway is incomplete.

      Figure 1 shows that inhibition of the HIS-3 activity affected by 3-AT, which should trigger cross-pathway control, is correlated with a graded reduction in the amplitude of the rhythm, and eventually to arrhythmicity at 3 mM 3-AT. While normalized data are shown in Figure 1B, raw data should also be provided in the Supplement as sometimes normalization hides aspects of the data. Ideally, this would be on the same scale in wt and in mutant strains.

      We revised as suggested and added the raw data. The results are now shown in Figure 1–figure supplement 2A and 2B and Figure 5–figure supplement 1B and 1C.

      Figure 2. The logical conclusion from Fig 1 is that circadian frq expression driven by the WCC has been impacted by amino acid starvation in the mutants. If so, either WC-1/WC-2 levels might be low, or else they might not be able to bind to DNA. When this was assessed, ChIP assays showed a loss of DNA binding. Although documented, an interesting result is that WCC protein amounts are sharply increased, especially for WC-1. The authors could comment on possible causes for this.

      Line 176, "hypophosphorylation of WC-1 and WC-2 (which is normally associated with WC activation . . . )". While the authors are correct that this is often the case it is not always the case and this introduces a potentially interesting caveat. That is, the overall phosphorylation status of WCC does not always reflect its activity in driving frq transcription. This was first noticed by Zhou et al., (2018 PLOS Genetics) who reported that even though WCC is always hyperphosphorylated in ∆csp-6, the core clock maintains a normal circadian period with only minor amplitude reduction. This should be noted, cited, and discussed.

      Thanks for the suggestion. We revised the manuscript as suggested, “It should be noted that the overall phosphorylation status of WCC does not always reflect its activity in driving frq transcription, possibly due to the unknown function of multiple key phosphosites on WCC (Wang et al., 2019; X. Zhou et al., 2018)”.

      Figure 2 and Figure 2 Suppl. report different gel conditions and show that the sharply increased WC1/WC-2 levels seen in Fig 2 resulting from 3-AT treatment of the cpc pathway mutants are due to the accumulation of hypophosphorylated WC-1/2. The conclusion would be stronger if the gels in the Supplement showed the same degree of difference between wt and mutants as seen in Fig 2. In any case, these hypophosphorylated WC should be active and able to bind DNA but plainly are not based on Fig 2.

      Thanks for the comments. It’s correct that WC-1/WC-2 were hypophosphorylated and their protein levels were increased (Figure 2 and Figure 2-figure supplement 1). However, the reduced binding of WC-1/WC-2 at the frq promoter explains the reduced frq transcription in the cpc-1KO or cpc-3KO mutants under amino acid starvation.

      Figure 3 correlates the unexpected loss of DNA binding by hypophosphorylated WCC with reduced histone H3 acetylation at frq. The 3 mM 3-AT reported to result in arrhythmicity in cpc mutants in Figures 1 and 2 results in a small (~20%?) and not statistically significant reduction in H3 acetylation in wt, compatible with the sustained rhythms seen in wt in Figure 1, but in a substantial (~5 fold) loss of binding in the ∆cpc-1 background; so CPC-1 is needed for H3 acetylation at frq to sustain the rhythm during amino acid starvation. The simplest explanation here then is that the hypophosphorylated WCC cannot bind to DNA because the chromatin is closed due to decreased AcH3.

      Thanks for the comments.

      Figure 4. Title:" Figure 4. CPC-1 recruits GCN-5 to activate frq transcription in response to amino acid starvation"; the conditions of amino acid starvation should be mentioned here for the reader's benefit. (In the unlikely case that there was no amino acid starvation here then many things about the manuscript need to be reconsidered.)

      Based on the model from yeast where amino acid starvation activates GCN2 (aka CPC-3 in Neurospora) kinase which activates the transcriptional activator GCN4 (aka CPC-1) which recruits the SAGA complex containing the histone acetylase GCN5 to regulated promoters, CPC-1 was tagged and shown by ChIP to bind rhythmically at frq. Co-IP experiments establish the interaction of components of the SAGA complex in Neurospora and Neurospora GCN-5 indeed is bound to frq, likely recruited by CPC-1. This part all follows the Saccharomyces model with the interesting twist that the binding CPC-1 is weakly rhythmic and GCN-5 strongly rhythmic in a CPC-1-dependent manner. Based on the figure legend title, these cultures should always be starved for amino acids (although as noted this should be made explicit in the figure legend). In any case, given this, from where does the rhythmicity in GCN-5-binding arise? This question is developed more below.

      Line 224, "low in the cpc-1KO strain, suggesting that CPC-1 rhythmically recruit GCN-5". Because ChIP was done only for a half circadian cycle (DD10-22), it is hard to conclude "rhythmically". The statement should be modified.

      To address the concern, we performed the ChIP assay using the CPC-1 antibody instead of Myc antibody (revised Figure 4A). Analysis of the ChIP results with CircaCompare showed that CPC-1 binding at the frq promoter was rhythmic without 3-AT (Figure 4A) or with 3 mM 3-AT treatment (Figure 4-figure supplement 1A). Due to the ADA-2-GCN5 and CPC-1-ADA-2 interactions with/without 3-AT treatment (Revised Figure 4B-C), CPC-1 should be able to recruit GCN-5-containing SAGA complex to activate frq transcription in response to amino acid starvation. We have now clarified this model in the revised manuscript. Please also see response to Reviewer 2/point 5.

      It was previously reported that the CPC-3/CPC-1 signaling pathway was rhythmically controlled by the circadian clock, as indicated by CPC-3-mediated rhythmic eIF2α phosphorylation at serine 51 (Karki S et al. 2020, PMID: 32355000). Our data showed rhythmic CPC-1 and GCN-5 binding at the frq promoter in the WT strain and decreased GCN-5 binding in the cpc-1KO mutant (Figure 4A and 4D). These results suggested that the circadian clock controlled the CPC-3/CPC-1 signaling pathway rhythmically, which in turn promoted rhythmic frq transcription by recruiting the GCN-5-containing SAGA complex under amino acid starvation. We clarified the model and description in the discussion.

      As suggested by the reviewer, we modified the statement to "suggesting that CPC-1 recruits GCN-5-containing SAGA complex to the frq promoter".

      Figure 5 shows that rhythmicity in general and of frq/FRQ specifically requires GCN-5 even under conditions of normal amino acid sufficiency, and that normal levels of H3 acetylation and its rhythm at frq require GCN-5. Not surprisingly, high H3 acetylation at frq correlated with high WC-2 DNA binding, and ADA-2 is required for SAGA functions.

      As earlier, raw bioluminescence data corresponding to panel B should be provided in the figure or Supplement.

      Also, if CPC-3 and CPC-1 regulate frq transcription through GCN-5, why is the frq level extremely low in the cpc-3KO or cpc-1KO(Fig.1D) but remains normal in gcn-5KO (Fig. 5D)?

      Raw bioluminescence data are listed in Figure 5–figure supplement 1B and 1C. For frq transcription in the WT and gcn-5KO mutant, please see response to Essential Revisions point 4.

      Figure 6 documents the counter effects of TSA which inhibits histone deacetylation and shortens the period versus 3-AT which decreases (via CPC-3 to CPC-1 to GCN-5) histone acetylation and causes period lengthening or arrhythmicity. HDA-1 is necessary for histone deacetylation at frq.

      Thanks for the comments.

      Figure 7 documents extensive changes in gene expression associated with 3-AT-induced amino acid starvation and the CPC-3 to CPC-1 pathway. How do these results compare with other previously studied systems, particularly Saccharomyces, where similar experiments have been done? Are the same genes regulated to the same extent or are there some interesting differences?

      Thanks for the suggestion. We revised our manuscript to compare these genes with those reported in Saccharomyces; the GCN4/CPC-1 targets are similar. “Similar to Saccharomyces cerevisiae (Natarajan et al., 2001), genes in amino acid biosynthetic pathways, vitamin biosynthetic enzymes, peroxisomal components, and mitochondrial carrier proteins were also identified as CPC-1 targets”.

      Figure 8 provides a model consistent with the role of the CPC-3/GCN2 pathway in regulating genes in response to amino acid starvation. It seems this could be any gene responding to amino acid starvation.

      Not accounted for in the model is the data from Fig 4 which show the rhythmic binding of CPC-1 and stronger rhythmic binding of GCN-5 to frq, both under amino acid starvation. In the presence of 3-AT, amino acid starvation is constant, which should mean that CPC-3 and CPC-1 would always be "on". Why doesn't CPC-1 recruit GCN5 at the same level at all times leading to constant high H3 acetylation rather than rhythmic H3 acetylation as seen in Figure 3? Perhaps, unlike the statement in lines 345-34, it is WCC that regulates rhythmic GCN-5 binding and facilitates rhythmic histone acetylation at frq. Or perhaps the clock introduces rhythmicity upstream from GCN5. Without an answer to the question of where rhythmicity comes into the pathway, the story is only about how the CPC-3/GCN2 pathway regulates genes in response to amino acid starvation; without explaining the rhythmicity the story seems incomplete.

      It was previously reported that the CPC-3/CPC-1 signaling pathway was rhythmically controlled by the circadian clock, as indicated by CPC-3-mediated rhythmic eIF2α phosphorylation at serine 51 (Karki S et al. 2020, PMID: 32355000). Our data showed rhythmic CPC-1 and GCN-5 binding at the frq promoter in the WT strain and decreased GCN-5 binding in the cpc-1KO mutant (Figure 4A and 4D). These results suggested that the circadian clock controlled the CPC-3/CPC-1 signaling pathway rhythmically, which in turn promoted rhythmic frq transcription by recruiting the GCN-5-containing SAGA complex under amino acid starvation. We clarified the model and description in the discussion.

    1. Author Response

      Reviewer 2 (Public review):

      A quasi-experimental before and after design as the methodological intention should be stated in the article. Although there are equally powerful alternatives with arguably less-stringent requirements that are appropriate and well-tested for natural experiments such as that intervened by the COVID-19 pandemic given the simulation methods, as of now obtaining the actual stage distribution of cancer and the cancer-specific mortality rates before and after the pandemic is possible for making scientifically valid conclusions based on observed data to support the simulation study.

      We agree with the reviewer that a modelled before-and-after analysis would have been informative. However, the required mortality and cancer stage distribution data to inform this analysis is not yet available for Australia. In future, our modelled predictions can be compared to emergent observed national stage and mortality data. The current paper presents estimates that were modelled during rapid-response modelling commissioned by the Australian Government early in the pandemic. Findings therefore demonstrate what could be done with the information available at that time. We have amended, as shown in bold below, the end of the introduction as follows:

      “We demonstrate what could be estimated by a rapid response evaluation based on information available early in the pandemic, and discuss how these estimates relate to subsequent observed disruptions to screening. In future, our modelled predictions can be compared to emergent observed national stage and mortality data.”

      The screening disruption is the only concerned parameter in modelling the change of cancer progression in this study. But delayed diagnosis after screening as another concern could be possibly affected by the pandemic. This should be taken into consideration in the simulation. The authors also claimed the cancer treatment could also be affected by the pandemic, the evaluation on mortality is therefore not feasible. However, the impacts of COVID-19 pandemic on the delayed treatment and cancer treatment are important issues which should be covered by simulation study.

      We clearly state that this is a limitation of the current study. We have added the following sentence to the discussion, lines 377-379.

      ‘These effects will be incorporated in future modelled evaluations, after careful calibration and validation to observed data, with a view to extending the modelled outcomes to mortality estimates.’

      For the simulations, confidence intervals for the outcomes should be provided so that the reliability of the estimates can be judged.

      The manuscript aims to present indicative estimates for a range of scenarios, with numerous simplifying assumptions as indicated. In this context, generating meaningful uncertainty intervals is not feasible or appropriate.

    1. Author Response

      Reviewer #1 (Public Review):

      There has been a lot of work showing that multi-peaked tuning curves contain more information than single peaked ones. If that's the case, why are single-peaked tuning curves ubiquitous in early sensory areas? The answer, as shown clearly in this paper, is that multi-peaked tuning curves are more likely to produce catastrophic errors.

      This is an extremely important point, and one that should definitely be communicated to the broader community. And this paper does an OK job doing that. However, it suffers from two (relatively easily fixable) problems:

      I) Unless one is an expert, it's very hard to extract why multi-peaked tuning curves lead to catastrophic errors.

      II) It's difficult to figure out under what circumstances multi-peaked tuning curves are bad. This is important, because there are a lot of neurons in the sensory cortex, and one would like to know whether multi-peaked tuning curves are really a bad idea there.

      And here are the fixes:

      I) Fig. 1c is a missed opportunity to explain what's really going on, which is that on any particular trial the positions of the peaks of the log likelihood can shift in both phase and amplitude (with phase being more important). However Fig. 1c shows the average log likelihood, which makes it hard to understand what goes wrong. It would really help if Fig. 1c were expanded into its own large figure, with sample log likelihoods showing catastrophic errors for multi-peaked tuning curves but not for single peaked ones. You could also indicate why, when multi-peaked tuning curves do give the right answer, the error tends to be small.

      We thank the reviewer for this suggestion. We have now split the first figure into two.

      In the new Figure 1, we provide an intuitive explanation of local vs catastrophic errors and single-peaked / periodic tuning curves. We have also added smaller panels to illustrate how the distribution of errors changes with decoding time (using a simulated single-peaked population).

      The new Figure 2 shows sampled likelihoods for 3 different populations. We hope this provides some intuitive understanding of the phase shifts. Unfortunately, it proved difficult not to normalize the “height” of each module’s likelihood as they can differ by several orders of magnitude across the modules. However, due to the multiplication, the peak likelihood values can (approximately) be disregarded in the ML-decoding. Lastly, we have also added more simulation points (scale factors) compared to what we had in the earlier version of the figure (see panels d-e).
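      To make the point about multiplying module likelihoods concrete, the following is a minimal sketch and not the authors' implementation. It assumes independent Poisson neurons with von Mises-shaped tuning curves on a one-dimensional stimulus; the function names and every parameter value (`peak_rate`, `width`, `baseline`, the periods) are hypothetical and chosen only for illustration. Multiplying the module likelihoods corresponds to summing their log likelihoods, so the decoded stimulus depends on where the peaks of the modules line up rather than on the absolute height of any single module's likelihood.

```python
import numpy as np


def tuning_curves(stimulus_grid, preferred, period,
                  peak_rate=10.0, width=0.3, baseline=0.5):
    """Firing rates (Hz) with shape (n_neurons, n_grid_points).

    period=1.0 gives single-peaked curves on [0, 1); smaller periods give
    periodic (multi-peaked) curves. All default values are illustrative.
    """
    phase = 2.0 * np.pi * (stimulus_grid[None, :] - preferred[:, None]) / period
    return baseline + peak_rate * np.exp((np.cos(phase) - 1.0) / width**2)


def module_log_likelihood(counts, rates, duration):
    """Poisson log likelihood over the grid, dropping stimulus-independent terms."""
    return counts @ np.log(rates * duration) - duration * rates.sum(axis=0)


def ml_decode(stimulus_grid, module_rates, true_index, duration, rng):
    """Sample spike counts for each module and return the ML stimulus estimate."""
    total = np.zeros(stimulus_grid.shape[0])
    for rates in module_rates:
        counts = rng.poisson(rates[:, true_index] * duration)
        # Product of module likelihoods == sum of module log likelihoods.
        total += module_log_likelihood(counts, rates, duration)
    return stimulus_grid[np.argmax(total)]


grid = np.linspace(0.0, 1.0, 200, endpoint=False)
preferred = np.linspace(0.0, 1.0, 16, endpoint=False)
modules = [tuning_curves(grid, preferred, period=p) for p in (1.0, 0.5, 0.25)]
rng = np.random.default_rng(0)
estimate = ml_decode(grid, modules, true_index=57, duration=0.05, rng=rng)
print(f"true stimulus {grid[57]:.3f}, decoded {estimate:.3f}")
```

      Repeating the last call over many trials and several values of `duration` gives an error distribution; under these assumptions, errors close to the true stimulus would be local and errors near another peak of the summed log likelihood would be catastrophic.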

      II) What the reader really wants to know is: would sensory processing in real brains be more efficient if multi-peaked tuning curves were used? That's certainly hard to answer in all generality, but you could make a comparison between a code with single peaked tuning curves and a good code with multi-peaked tuning curves. My guess is that a good code would have lambda_1=1 and c around 0.5 (you could use the module ratio the grid cell people came up with -- I think 1/sqrt(2) -- although I doubt if it matters much). My guess is that it's the total number of spikes, rather than the number of neurons, that matters. Some metric of performance (see point 1 below) versus the contrast of the stimulus and the number of spikes would be invaluable.

      We thank the reviewer for this comment and the suggestions. We agree, ideally such an expression would be useful. However, as you note it is a very challenging task due to the large parameter space (number of neurons, peak amplitude, spontaneous firing rate, width of tuning, stimulus dimensionality etc) and is beyond the scope of the present study. We have instead included a new figure (see Figure 7 in the manuscript) detailing the minimal decoding times for various choices of parameter values. We believe this gives an indication of how minimal decoding time scales with various parameters.

    1. Author Response:

      Reviewer #1 (Public Review):

      […] This novel system could serve as a powerful tool for loss-of-function experiments that are often used to validate a drug target. Not only this tool can be applied in exogenous systems (like EGFRdel19 and KRASG12R in this paper), the authors successfully demonstrated that ARTi can also be used in endogenous systems by CRISPR knocking in the ARTi target sites to the 3'UTR of the gene of interest (like STAG2 in this paper).

      We thank the referee for highlighting the novelty and potential of the ARTi system.

      ARTi enables specific, efficient, and inducible suppression of these genes of interest, and can potentially improve therapeutic target validations. However, the system cannot be easily generalized as there are some limitations in this system:

      • The authors claimed in the introduction section that CRISPR/Cas9-based methods are associated with off-target effects; however, the authors' system requires the use of CRISPR/Cas9 to knock out a given endogenous gene or to knock in ARTi target sites to the 3' UTR of the gene of interest. Though the authors used a transient CRISPR/Cas9 system to minimize the potential off-target effects, the advantages of ARTi over CRISPR are likely less than claimed.

      We thank the reviewer for raising these very valid concerns about potential off-target effects related to the CRISPR/Cas9-based gene knockout or engineering of endogenous ARTi target sites. In contrast to conventional RNAi- and CRISPR-based approaches, such off-target effects can be investigated prior to loss-of-function experiments through comparison between parental and engineered cells, which in the absence of CRISPR-induced off-target events are expected to be identical. Subsequent ARTi experiments provide full control over RNAi-induced off-target activities through comparison of target-site engineered and parental cells. However, we agree that undetected CRISPR/Cas9-induced off-target events cannot be ruled out in a definitive way, which we will point out in our revised manuscript.

      • Instead of generating gene-specific loss-of-function triggers for every new candidate gene, the authors identified a universal and potent ARTi to ensure standardized and controllable knockdown efficiency. It seems this would save time and effort in validating loss-of-function siRNAs/sgRNAs for each gene. However, users will still have to design and validate the best sgRNA to knock out endogenous genes or to knock in ARTi target sites by CRISPR/Cas9. The latter is by no means trivial. Users will need to design and clone an expression construct for their cDNA replacement construct of interest, which will still be challenging for big proteins.

      We fully agree that the required design of gene-specific sgRNAs and subsequent CRISPR-engineering steps are by no means trivial. However, we believe that decisive advantages of the method, in particular the robustness of LOF perturbations and additional means for controlling off-target activities, can make ARTi an investment that pays off. In our experience, much time can be lost in the search for effective LOF reagents, and even when these are found, questions about off-target activity remain. While ARTi overcomes many of these challenges by providing a standardized experimental workflow, we do not propose to replace all other LOF approaches by this method. Instead, we would position ARTi as a unique orthogonal approach for the stringent validation and in-depth characterization of candidate target genes, as we will highlight in our revised discussion.

      • The approach of knocking-out an endogenous gene followed by replacement of a regulatable gene can also be achieved using regulated degrons, and by tet-regulated promoters included in the gene replacement cassette. The authors should include a discussion of the merits of these approaches compared with ARTi.

      We thank the reviewer for pointing out these alternative LOF methods. We had already included a brief discussion of chemical-genetic LOF methods based on degron tags. While we certainly share the current excitement about degron technologies, they inevitably require changes to the coding sequence of target proteins, which can alter their regulation and function in ways that are hard to control for. In our revised discussion, we will add a brief comparison to conventional tet-regulatable expression systems, which unlike ARTi require the use of ectopic tet-responsive promoters. Overall, we would position ARTi as an orthogonal tool that enables inducible and reversible LOF perturbations without changing the coding sequence and the endogenous transcriptional control of candidate target genes.

      Reviewer #2 (Public Review):

      […] The system is very innovative, likely easy to be established and used by the scientific community and thus very meaningful.

      We thank the reviewer for their enthusiasm about ARTi.

  3. Feb 2023
    1. Author Response

      Reviewer #1 (Public Review):

      Starrett, Gabriel et al. investigated 43 bladder cancers (primary tumors), 5 metastases and 14 normal tissues from 43 solid organ transplant recipients of 5 Transplant Cancer Match Study participating registries (US) for the presence of viral genetic signatures, their host genome integration and possible contribution in carcinogenesis. They isolated DNA and RNA from FFPE tissues to perform state of the art whole genome and transcriptome sequencing. They find that 20 of the primary tumors, 3 of the metastases and 7 of the normal tissues harbor viral signatures with BKPyV and JCPyV being the most prevalent viruses detected. The bulk of the experiments focuses on the 9 BKPyV-positive primary tumors. They report that several of the BKPyV-positive tumors show host genome integration of BKPyV with associated focal amplifications of adjacent host chromosome regions, with chromosome 1 being the most prevalent. Furthermore, BKPyV-positive tumors show a distinct transcriptomic signature with gene expression changes related to DNA damage responses, cell cycle progression, angiogenesis, chromatin organization, mitotic spindle assembly, chromosome condensation/separation and neuronal differentiation. The authors only touch on the features of other virus-positive tumors, e.g. those with JCPyV and HPV signals, without offering further detail or thought. The overall mutation signature analysis reveals no clear correlation between presence of viral sequences and tumor mutation burden suggesting that many different, virus-unrelated, factors possibly contribute to bladder cancer genesis and progression. Most striking are cases potentially linked to aristolochic acid, APOBEC3 and SBS5. Thus, while the approach is state-of-the-art, the causality of viral signatures and oncogenesis and vice versa remains unsolved.

      Strengths:

      1) The study assesses 43 primary tumors, 5 metastases and 14 normal tissues from 43 solid organ transplants of different kinds (24x kidney, 4x liver, 14x heart and/or lung, 1x pancreas) rather than just focusing on a particular organ.

      2) The study makes use of whole genome sequencing and transcriptomics and the assayed material is extracted from FFPE tissue, which shows a high level of practical, technical and computational skills and expertise.

      Weaknesses:

      1) There have been multiple inconsistencies in sample number and figure references throughout the publication. Is it 19 or 20 cases that have viral sequences detected? A comprehensive checker board table showing all cases, the available tissue samples and respective analyses would be in order.

      We would like to thank the reviewer for their detailed assessment of the manuscript. A checkerboard table of all samples, tissues, and analyses has been added as supplemental table 1 (Supplementary file 1a).

      2) The overall low coverage of the whole genome sequencing, which the authors mention, and the relatively big variation in coverage in both datasets (WGS, transcriptomics) are major limitations of the study. Possibly, this was done to increase specificity, but sorting out and discarding reads may also be problematic. Please comment.

      Besides performing quality and adapter trimming as described in the methods, we did not discard any reads. Experimental design and analysis were conducted to be as inclusive as possible considering the rarity of these specimens.

      Reviewer #2 (Public Review):

      Starrett et al performed whole genome and transcriptome sequencing of bladder cancers from 43 organ transplant recipients. They found that most of these tumors contained DNA from one of four viruses (BKPyV, JCPyV, HPV, and TTV). Viral genomes are most often integrated into the genomes of these tumor cells and the authors provide evidence that the integration utilized the POL theta-mediated end joining pathway. In most cases, viral RNA was detected in tumors with viral DNA. This suggests that the viruses are actively altering the cellular environment. Frequently, this resulted in similarities for overall gene expression patterns in the tumors that were grouped by the type of virus present in the tumor. Moreover, the changes in expression linked with viral gene expression were found in genes relevant to tumorigenesis. Immunohistochemical detection of viral proteins in these tumors also demonstrated active viral gene expression. However, the presence of viral proteins was heterogenous within the tumor, with between 1 and 100% of the tumor staining positive for BKPyV large T antigen. An analysis of mutational signatures in these tumors indicate that the viruses are also shaping the tumor genome by inducing mutations. Evidence that specific viruses are contributing to tumorigenesis in organ transplant patients has fundamental implications for preventing tumorigenesis in these patients.

      The conclusions of this paper are generally well supported by the data provided. Indeed, there is little doubt that viral infections are more likely in these tumors. However, there are aspects of the paper that could be improved and/or clarified. Most importantly, despite the strong evidence that the viruses are altering the tumor cell environment, it is unclear if these changes are necessary for tumorigenesis or, less excitingly, the result of an even more immune suppressive environment within the tumor. The heterogeneity of the LT expression suggests that the presence of the viral DNA and RNA may not be enough to assess whether it is actively contributing to the tumor. Is an increased frequency of viral protein staining linked with any evidence of an active contribution to tumorigenesis (e.g., fewer mutations in tumor suppressors or oncogenes)? This might be easiest to assess with the tumors that have oncogenic HPV DNA. If those tumors lacked p53 and RB mutations, it would support a causative role for the virus.

      We thank the reviewer for their thoughtful review. Indeed, in Figure 6 we show that no BKPyV-positive or HPV-positive tumor harbored mutations in RB1. Additionally, only one BKPyV-positive tumor and none of the HPV-positive tumors had a mutation in TP53. We have added further emphasis to this point on page 14, “None of the HPV-positive tumors with WGS harbored mutations in TP53 or RB1. Similarly, none of the polyomavirus-positive tumors harbored mutations in RB1 and only TBC08 had a frameshift mutation in TP53.”

    1. Author Response

      Reviewer #1 (Public Review):

      Buglak et al. describe a role for the nuclear envelope protein Sun1 in endothelial mechanotransduction and vascular development. The study provides a full mechanistic investigation of how Sun1 is achieving its function, which supports the concept that nuclear anchoring is important for proper mechanosensing and junctional organization. The experiments have been well designed and were quantified based on independent experiments. The experiments are convincing and of high quality and include Sun1 depletion in endothelial cell cultures, zebrafish, and in endothelial-specific inducible knockouts in mice.

      We thank the reviewer for their enthusiastic comments and for noting our use of multiple model systems.

      Reviewer #2 (Public Review):

      Endothelial cells mediate the growth of the vascular system but they also need to prevent vascular leakage, which involves interactions with neighboring endothelial cells (ECs) through junctional protein complexes. Buglak et al. report that the EC nucleus controls the function of cell-cell junctions through the nuclear envelope-associated proteins SUN1 and Nesprin-1. They argue that SUN1 controls microtubule dynamics and junctional stability through the RhoA activator GEF-H1.

      In my view, this study is interesting and addresses an important but very little-studied question, namely the link between the EC nucleus and cell junctions in the periphery. The study has also made use of different model systems, i.e. genetically modified mice, zebrafish, and cultured endothelial cells, which confirms certain findings and utilizes the specific advantages of each model system. A weakness is that some important controls are missing. In addition, the evidence for the proposed molecular mechanism should be strengthened.

      We thank the reviewer for their interest in our work and for highlighting the relative lack of information regarding connections between the EC nucleus and cell periphery, and for noting our use of multiple model systems. We thank the reviewer for suggesting additional controls and mechanistic support, and we have made the revisions described below.

      Specific comments:

      1) Data showing the efficiency of Sun1 inactivation in the murine endothelial cells is lacking. It would be best to see what is happening on the protein level, but it would already help a great deal if the authors could show a reduction of the transcript in sorted ECs. The excision of a DNA fragment shown in the lung (Fig. 1-suppl. 1C) is not quantitative at all. In addition, the gel has been run way too short so it is impossible to even estimate the size of the DNA fragment.

      We agree that the DNA excision is not sufficient to demonstrate excision efficiency. We attempted examination of SUN1 protein levels in mutant retinas via immunofluorescence, but to date we have not found a SUN1 antibody that works in mouse retinal explants. We argue that mouse EC isolation protocols enrich but don’t give 100% purity, so that RNA analysis of lung tissue also has caveats. Finally, we contend that our demonstration of a consistent vascular phenotype in Sun1iECKO mutant retinas argues that excision has occurred. To test the efficiency of our excision protocol, we bred Cdh5CreERT2 mice with the ROSAmT/mG excision reporter (cells express tdTomato absent Cre activity and express GFP upon Cre-mediated excision (Muzumdar et al., 2007)). Utilizing the same excision protocol as used for the Sun1iECKO mice, we see a significantly high level of excision in retinal vessels only in the presence of Cdh5CreERT2 (Reviewer Figure 1).

      Reviewer Figure 1: Cdh5CreERT2 efficiently excises in endothelial cells of the mouse postnatal retina. (A) Representative images of P7 mouse retinas with the indicated genotypes, stained for ERG (white, nucleus). tdTomato (magenta) is expressed in cells that have not undergone Cre-mediated excision, while GFP (green) is expressed in excised cells. Scale bar, 100μm. (B) Quantification of tdTomato fluorescence relative to GFP fluorescence as shown in A. tdTomato and GFP fluorescence of endothelial cells was measured by creating a mask of the ERG channel. n=3 mice per genotype. ***, p<0.001 by Student’s two-tailed unpaired t-test.
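      The masked quantification described in panel B could be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: it assumes the ERG mask is obtained by a simple intensity threshold (the legend does not say how the mask was made) and that the three channels are registered arrays of equal shape. The function name `tdtomato_to_gfp_ratio` and the `erg_threshold` parameter are hypothetical.

```python
import numpy as np


def tdtomato_to_gfp_ratio(erg, tdtomato, gfp, erg_threshold):
    """Mean tdTomato intensity relative to mean GFP intensity inside an ERG mask.

    erg, tdtomato, gfp: arrays of identical shape (one image per channel).
    erg_threshold: hypothetical cutoff; pixels with ERG signal above it are
    treated as endothelial nuclei.
    """
    erg = np.asarray(erg, dtype=float)
    tdtomato = np.asarray(tdtomato, dtype=float)
    gfp = np.asarray(gfp, dtype=float)
    if not (erg.shape == tdtomato.shape == gfp.shape):
        raise ValueError("all channels must have the same shape")

    mask = erg > erg_threshold  # boolean mask built from the ERG channel
    if not mask.any():
        raise ValueError("ERG mask is empty; lower the threshold")

    gfp_mean = gfp[mask].mean()
    if gfp_mean == 0:
        raise ValueError("no GFP signal inside the mask")
    return tdtomato[mask].mean() / gfp_mean
```

      Under these assumptions, a low ratio in Cre-positive retinas relative to Cre-negative controls would indicate efficient excision, since excised cells lose tdTomato and gain GFP.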

      2) The authors show an increase in vessel density in the periphery of the growing Sun1 mutant retinal vasculature. It would be important to add staining with a marker labelling EC nuclei (e.g. Erg) because higher vessel density might reflect changes in cell size/shape or number, which has also implications for the appearance of cell-cell junctions. More ECs crowded within a small area are likely to have more complicated junctions. Furthermore, it would be useful and straightforward to assess EC proliferation, which is mentioned later in the experiments with cultured ECs but has not been addressed in the in vivo part.

      We concur that ERG staining is important to show any changes in nuclear shape or cell density in the post-natal retina. We now include this data in Figure 1-figure supplement 1F-G. We do not see obvious changes in nuclear shape or number, though we do observe some crowding in Sun1iECKO retinas, consistent with increased density. However, when normalized to total vessel area, we do not observe a significant difference in the nuclear signal density in Sun1iECKO mutant retinas relative to controls.

      3) It appears that the loss of Sun1/sun1b in mice and zebrafish is compatible with major aspects of vascular growth and leads to changes in filopodia dynamics and vascular permeability (during development) without severe and lasting disruption of the EC network. It would be helpful to know whether the loss-of-function mutants can ultimately form a normal vascular network in the retina and trunk, respectively. It might be sufficient to mention this in the text.

      We thank the reviewer for pointing this out. It is true that developmental defects in the vasculature resulting from various genetic mutations are often resolved over time. We’ve made text changes to discuss viability of Sun1 global KO mice and lack of perduring effects in sun1 morphant fish, perhaps resulting from compensation by SUN2, which is partially functionally redundant with SUN1 in vivo (Lei et al., 2009; Zhang, et al., 2009) (p. 20).

      4) The only readout after the rescue of the SUN1 knockdown by GEF-H1 depletion is the appearance of VE-cadherin+ junctions (Fig. 6G and H). This is insufficient evidence for a relatively strong conclusion. The authors should at least look at microtubules. They might also want to consider the activation status of RhoA as a good biochemical readout. It is argued that RhoA activity goes up (see Fig. 7C) but there is no data supporting this conclusion. It is also not clear whether "diffuse" GEF-H1 localization translates into increased Rho A activity, as is suggested by the Rho kinase inhibition experiment. GEF-H1 levels in the Western blot in (Fig. 6- supplement 2C) have not been quantitated.

      We agree that analysis of RhoA activity and additional analysis of rescued junctions strengthens our conclusions, so we performed these experiments. New data (Figure 6I-J) shows that co-depletion of SUN1 and GEF-H1 rescues junction integrity as measured by biotin-matrix labeling. Interestingly, co-depletion of SUN1 and GEF-H1 does not rescue reduced microtubule density at the periphery (Figure 6-figure supplement 3B-C), placing GEF-H1 downstream of aberrant microtubule dynamics in SUN1 depleted cells. This is consistent with our model (Figure 8) describing how loss of SUN1 leads to increased microtubule depolymerization, resulting in release and activation of GEF-H1 that goes on to affect actomyosin contractility and junction integrity. In addition, we include images of the junctions in GEF-H1 single KD (Figure 6-figure supplement 3B-C) and quantify the western blot in Figure 6-figure supplement 3A.

      We performed RhoA activity assays and new data shows that SUN1 depletion results in increased RhoA activation, while co-depletion of SUN1 and GEF-H1 ameliorates this increase (Figure 6-figure supplement 2D). This is consistent with our model in which loss of SUN1 leads to increased RhoA activity via release of GEF-H1 from microtubules. In addition, we now cite a recent study describing that GEF-H1 is activated when unbound to microtubules, with this activation resulting in increased RhoA activity (Azoitei et al., 2019).

      5) The criticism raised for the GEF-H1 rescue also applies to the co-depletion of SUN1 and Nesprin-1. This mechanistic aspect is currently somewhat weak and should be strengthened. Again, Rho A activity might be a useful and quantitative biochemical readout.

      We respectfully point out that we showed that co-depletion of nesprin-1 and SUN1 rescues SUN1 knockdown effects via several readouts, including rescue of junction morphology, biotin labeling, microtubule localization at the periphery, and GEF-H1/microtubule localization. We’ve moved this data to the main figure (Figure 7B-C, E-F) to better highlight these mechanistic findings. These results are consistent with our model that nesprin-1 effects are upstream of GEF-H1 localization. We also added results showing that nesprin-1 knockdown alone does not affect junction integrity, microtubule density, or GEF-H1/microtubule localization (Figure 7-figure supplement 1B-G).

      Reviewer #3 (Public Review):

      Here, Buglak and coauthors describe the effect of Sun1 deficiency on endothelial junctions. Sun1 is a component of the LINC complex, connecting the inner nuclear membrane with the cytoskeleton. The authors show that in the absence of Sun1, the morphology of the endothelial adherens junction protein VE-cadherin is altered, indicative of increased internalization of VE-cadherin. The change in VE-cadherin dynamics correlates with decreased angiogenic sprouting as shown using in vivo and in vitro models. The study would benefit from a stricter presentation of the data and needs additional controls in certain analyses.

      We thank the reviewer for their insightful comments, and in response we have performed the revisions described below.

      1) The authors implicate the changes in VE-cadherin morphology to be of consequence for "barrier function" and mention barrier function frequently throughout the text, for example in the heading on page 12: "SUN1 stabilizes endothelial cell-cell junctions and regulates barrier function". The concept of "barrier" implies the ability of endothelial cells to restrict the passage of molecules and cells across the vessel wall. This is tested only marginally (Suppl Fig 1F) and these data are not quantified. Increased leakage of 10kDa dextran in a P6-7 Sun1-deficient retina as shown here probably reflects the increased immaturity of the Sun1-deficient retinal vasculature. From these data, the authors cannot state that Sun1 regulates the barrier or barrier function (unclear what exactly the authors refer to when they make a distinction between the barrier as such on the one hand and barrier function on the other). The authors can, if they do more experiments, state that loss of Sun1 leads to increased leakage in the early postnatal stages in the retina. However, if they wish to characterize the vascular barrier, there is a wide range of other tissue that should be tested, in the presence and absence of disease. Moreover, a regulatory role for Sun1 would imply that Sun1 normally, possibly through changes in its expression levels, would modulate the barrier properties to allow more or less leakage in different circumstances. However, no such data are shown. The authors would need to go through their paper and remove statements regarding the regulation of the barrier and barrier function since these are conclusions that lack foundation.

      We thank the reviewer for pointing out that the language used regarding the function and integrity of the junctions is confusing, although we suggest that the endothelial cell properties measured by our assays are typically equated with “barrier function” in the literature. However, we have edited our language to precisely describe our results as suggested by the reviewer.

      2) In Fig 6g, the authors show that "depletion of GEF-H1 in endothelial cells that were also depleted for SUN1 rescued the destabilized cell-cell junctions observed with SUN1 KD alone". However, it is quite clear that Sun1 depletion also affects cell shape and cell alignment and this is not rescued by GEF-H1 depletion (Fig 6g). This should be described and commented on. Moreover please show the effects of GEF-H1 alone.

      We thank the reviewer for pointing out the effects on cell shape. SUN1 depletion typically leads to shape changes consistent with elevated contractility, but this is considered to be downstream of the effects quantified here. We updated the panel in Figure 6G to a more representative image showing cell shape rescue by co-depletion of SUN1 and GEF-H1. We present new data panels showing that GEF-H1 depletion alone does not affect junction integrity (Figure 6I-J). We also present new data showing that co-depletion of GEF-H1 and SUN1 does not rescue microtubule density at the periphery (Figure 6-figure supplement 3B-C), consistent with our model that GEF-H1 activation is downstream of microtubule perturbations induced by SUN1 loss.

      3) In Fig. 6a, the authors show rescue of junction morphology in Sun1-depleted cells by deletion of Nesprin1. The effect of Nesprin1 KD alone is missing.

      We thank the reviewer for this comment, and we now include new panels (Figure 7-figure supplement 1B-G) demonstrating that Nesprin-1 depletion does not affect biotin-matrix labeling, peripheral microtubule density, or GEF-H1/microtubule localization absent co-depletion with SUN1. These findings are consistent with our model that Nesprin-1 loss does not affect cell junctions on its own because it is held in a non-functional complex with SUN1 that is not available in the absence of SUN1.

      References

      Azoitei, M. L., Noh, J., Marston, D. J., Roudot, P., Marshall, C. B., Daugird, T. A., Lisanza, S. L., Sandí, M., Ikura, M., Sondek, J., Rottapel, R., Hahn, K. M., & Danuser, G. (2019). Spatiotemporal dynamics of GEF-H1 activation controlled by microtubule- and Src-mediated pathways. Journal of Cell Biology, 218(9), 3077-3097. https://doi.org/10.1083/jcb.201812073

      Denis, K. B., Cabe, J. I., Danielsson, B. E., Tieu, K. V, Mayer, C. R., & Conway, D. E. (2021). The LINC complex is required for endothelial cell adhesion and adaptation to shear stress and cyclic stretch. Molecular Biology of the Cell, mbcE20110698. https://doi.org/10.1091/mbc.E20-11-0698

      King, S. J., Nowak, K., Suryavanshi, N., Holt, I., Shanahan, C. M., & Ridley, A. J. (2014). Nesprin-1 and nesprin-2 regulate endothelial cell shape and migration. Cytoskeleton (Hoboken, N.J.), 71(7), 423–434. https://doi.org/10.1002/cm.21182

      Lei, K., Zhang, X., Ding, X., Guo, X., Chen, M., Zhu, B., Xu, T., Zhuang, Y., Xu, R., & Han, M. (2009). SUN1 and SUN2 play critical but partially redundant roles in anchoring nuclei in skeletal muscle cells in mice. PNAS, 106(25), 10207–10212.

      Muzumdar, M. D., Tasic, B., Miyamichi, K., Li, L., & Luo, L. (2007). A global double-fluorescent Cre reporter mouse. Genesis, 45(9), 593-605. https://doi.org/10.1002/dvg.20335

      Ueda, N., Maekawa, M., Matsui, T. S., Deguchi, S., Takata, T., Katahira, J., Higashiyama, S., & Hieda, M. (2022). Inner Nuclear Membrane Protein, SUN1, is Required for Cytoskeletal Force Generation and Focal Adhesion Maturation. Frontiers in Cell and Developmental Biology, 10, 885859. https://doi.org/10.3389/fcell.2022.885859

      Zhang, X., Lei, K., Yuan, X., Wu, X., Zhuang, Y., Xu, T., Xu, R., & Han, M. (2009). SUN1/2 and Syne/Nesprin-1/2 complexes connect centrosome to the nucleus during neurogenesis and neuronal migration in mice. Neuron, 64(2), 173–187. https://doi.org/10.1016/j.neuron.2009.08.018

    1. Author Response

      Reviewer #1 (Public Review):

      In mammals, a small subset of genes undergoes canonical genomic imprinting, with highly biased expression as a function of the parent-of-origin allele. Recent studies, using polymorphic mouse embryos and tissues, have reevaluated the number of allele-specific expressed genes (ASE) to be 3 times more than previously thought, however with most of these novel genes showing a very low ASE (50%-60% bias toward one parental allele). Here, the authors undertake a comparison of 4 datasets and a complete bioinformatic reanalysis of 3 recent allele-specific RNAseq datasets to study potential novel imprinted genes, using the recently released ISoLDE pipeline. Very few genes have been confirmed with true ASE in the different studies and/or validated by pyrosequencing analysis. However, the authors show that most of the newly discovered ASE genes lie in close proximity to already known imprinted loci and could be co-regulated by these imprinted clusters. This is important to understand how and to what extent imprinted control regions control gene expression.

      This manuscript highlights the number of potentially falsely discovered imprinted genes in previous datasets, which could result from a lack of replicates, a weak allelic ratio, or low gene expression and lack of read depth. But the lack of overlap in the ASE-called genes (with the exception of the known imprinted genes) between the different datasets is worrying and important to discuss, as the authors did. I would have appreciated more details on the differences between the datasets that could explain the lack of reproducibility: library preparation protocol, sequencer technology, SNP calling, number of reads per SNP, bioinformatics pipeline.

      We agree and a comparison of all the studies is included in the methods section. In particular, we have now included more information on SNP calling and sequencer technology.

      Studying allele-specific expression of lowly expressed genes is difficult with technologies based on PCR amplification (library preparation, pyrosequencing) and could result in biased expression due only to the random amplification of a small pool of molecules. Could the authors compare the level of expression of their different classes of genes? Could the more robust ASE genes in their study be the more highly expressed ones? Several genes were identified in only one or two of the previous studies; were they expressed in the other studies when not defined as ASE? This would also allow defining a threshold of expression for studying allelic bias in the future. To conclude, this study is an important resource for the epigenetics field and for better understanding genomic imprinting.

      We thank you for this suggestion. We have now taken all RNAseq data that we had run through the ISoLDE pipeline and extracted the transcripts per million (TPM) expression levels for each of the genes called in the original studies. We find no overrepresentation of lowly expressed genes in the novel biased genes compared with known imprinted genes. We also looked specifically at the expression levels of the genes tested by pyrosequencing in these datasets and saw no relationship between validation and expression levels. Expression levels are consistent between studies, especially in the same tissue, indicating that the lack of reproducibility between studies is not due to differing expression. These observations have been added to the manuscript.

      Reviewer #2 (Public Review):

      This work aims to understand genomic imprinting in the mouse and provide further insight to challenges and patterns identified in previous studies.

      Firstly, genomic imprinting studies have been surrounded by controversy, especially ~10 years ago when the explosion of sequencing data but immature methods to analyze it led to highly exaggerated claims of widespread imprinting. While the methods have improved, clear standards are not set and results still have some inconsistencies between studies. The authors first do a meta-analysis of previous studies, comparing their results and doing a useful reanalysis of the data. This provides some valuable insights into the reasons for inconsistencies and guides towards better study designs. While this work does not exactly set a common standard for the field, or provide a full authoritative catalog of imprinted loci in mouse tissues, it provides a step in that direction. I find these analyses relatively simple and straightforward, but they seem solid.

      Previous studies have described a relatively common pattern of subtle expression bias towards one parental allele, rather than the classical imprinting pattern of fully monoallelic expression. This work digs deeper into this phenomenon, using first the meta-analysis data and then also targeted pyrosequencing analysis of selected loci. The analysis is generally well done, although I did not understand why gDNA amplification bias was not systematically corrected in all cases but only if it was above a given (low) threshold. I doubt this would affect the results much though. To some extent the results confirm previously observed patterns (bimodal distribution of either subtle or full bias, and effect of distance from the core of the imprinted locus). The novel insights mostly concern individual loci, with discovery and validation of some novel genes, typically with a subtle or context-specific parental bias.

      The study also provides some insights into mechanisms, especially by analysis of existing mouse models with a deletion of the ICR of specific loci. The change in the parental bias pattern was then used to infer potential methylation and chromatin-related mechanisms in these imprinted loci, including how the subtle bias further away is achieved. There are interesting novel findings here, as well as hypotheses for further research. However, this is an area where the conclusions rely quite heavily on published research especially as this study doesn't include single-cell resolution, and it's not entirely clear how much of e.g. the Figure 7 mechanisms part is based on discoveries of this study.

      We agree that Figure 7 does not illustrate models based exclusively on data generated in this study: instead, it presents hypotheses to be tested in the coming years.

      Imprinting is a fascinating phenomenon that can be informative of mechanisms of genome regulation and parental effects in general. It is a bit of a niche area though, and the target audience of this study is likely going to be limited to specialists doing research on this specific topic. As the authors point out, the functional importance of the findings is unknown.

    1. Author Response

      Reviewer #3 (Public Review):

      In this manuscript, the authors studied the erythropoiesis and hematopoietic stem/progenitor cell (HSPC) phenotypes in a ribosome gene Rps12 mutant mouse model. They found that RpS12 is required for both steady and stress hematopoiesis. Mechanistically, RpS12+/- HSCs/MPPs exhibited increased cycling, loss of quiescence, protein translation rate, and apoptosis rates, which may be attributed to ERK and Akt/mTOR hyperactivation. Overall, this is a new mouse model that sheds light into our understanding of Rps gene function in murine hematopoiesis. The phenotypic and functional analysis of the mice are largely properly controlled, robust, and analyzed.

      A major weakness of this work is its descriptive nature, without a clear mechanism that explains the phenotypes observed in RpS12+/- mice. It is possible that the counterintuitive activation of the ERK/mTOR pathway and increased protein synthesis rate is a compensatory negative feedback. The direct mechanism of Rps12 loss could be studied by the acute loss of Rps12, which is doable using their floxed mice. At a minimum, this can be done in mammalian hematopoietic cell lines.

      We thank the reviewer for pointing this out. We have addressed this question by developing a new inducible conditional knockout Rps12 mouse model (see response below to major point 1).

      Below are some specific concerns that need to be addressed.

      1) Line 226. The authors conclude that "Together, these results suggest that RpS12 plays an essential role in HSC function, including self-renewal and differentiation." The reviewer has three concerns regarding this conclusion and the corresponding Figure 3. 1) The data show that RpS12+/- mice have decreased numbers of both total BM cells and multiple subpopulations of HSPCs. The frequency of HSPC subpopulations should also be shown to clarify whether the decreased HSPC numbers arise from decreased total BM cellularity or from a proportional decrease in frequency. 2) This figure characterizes phenotypic HSPCs in BM by flow and lineage cells in PB by CBC. HSC function and differentiation are not really examined in this figure, except for the colony assay in Figure 3K. The BMT data in Figure 4 actually address HSC function and differentiation. So the conclusion here should be rephrased. 3) Since all LT- and ST-HSCs, as well as all MPPs, are decreased in number, how can the authors conclude that Rps12 is important for HSC differentiation? No experiments presented here were specifically designed to address HSC differentiation.

      We thank the reviewer for this excellent point. We think that the main defect is in HSC and progenitor maintenance, rather than in HSC differentiation. This is consistent with the decrease in multiple HSC and progenitor populations, as observed both by calculating absolute numbers and by frequency of the parent population (see new Supplementary Figures S2C-S2C). We have removed any references to altered differentiation from the text.

      We added data on population frequency to Supplementary Figure 2 and to the corresponding text (see lines 221-235).

      2) Figure 3A and 5E. The flow cytometry gating of HSC/MPP is not well performed or presented, especially HSC plot. Populations are not well separated by phenotypic markers. This concerns the validity of the quantification data.

      We chose a more representative HSC plot and included it in Figure 3A.

      3) It is very difficult to read bone marrow cytospin images in Fig 6F without annotation of cell types shown in the figure. It appears that WT and +/- looked remarkably different in terms of cell size and cell types. This mouse may have other profound phenotypes that need detailed examination, such as lineage cells in the BM and spleen, and colony assays for different types of progenitors, etc.

      The purpose of the bone marrow cytospin images in Figure 6F was to show the high number of apoptotic cells in the bone marrow of Rps12 KO/+ mice compared with controls. The differences in apoptosis in the LSK and myeloid progenitor populations are quantified in the flow cytometry data shown in Figure 6G-H. A detailed quantitative analysis of different bone marrow cell populations and their relative frequencies is also shown in Figures 2 and 3. In Rps12 KO/+ bone marrow, we observed a significant decrease in multiple stem cell and progenitor populations.

      4) For all the intracellular phospho-flow shown in Fig 7, both a negative control with the fluorescent secondary antibody only and a positive stimulus should be included. It is very concerning that the histograms show no significant changes in pAKT and pERK signaling (MFI) after SCF stimulation in WT LSKs. There are no distinct peaks that indicate non-phospho-proteins and phosphoproteins. This casts doubt on the validity of the results. It is possible, though, that Rps12+/- cells have a very high basal level of activation of the pAKT/mTOR and pERK pathways. This again may point to a negative feedback mechanism of Rps12 haploinsufficiency.

      It is true that we did not observe an increase in pAKT, p4EBP1, or pERK in control cells in every case. This is often an issue with these specific phospho-flow cytometry antibodies, as they are not very sensitive, and the response to SCF is very time-dependent. We did observe an increase in pS6 with SCF in both LSK cells and progenitors (Figure 7B, E). However, the main point of this experiment was to assess the basal level of signaling in Rps12 KO/+ vs control cells. We did not observe hypersensitivity of RpS12 cells to SCF, but we did observe significant increases in pAKT, pS6, p4EBP1, and pERK in Rps12 KO/+ LSK cells.

      To address the concern about the validity of staining, please see the requested flow histograms for unstained vs individual Phospho-antibodies (Ab): p4EBP1, pERK, pS6 and pAKT (Figure R1 for reviewers) below. Additionally, since staining with the surface antibodies can potentially change the peak, we are including an additional control of the cell surface antibodies vs the full sample with surface antibodies and Phospho-Ab: p4EBP1, pERK, pS6 and pAKT. We can include this figure in the Supplementary Data if requested.

      5) The authors performed in vitro OP-Puro assay to assess the global protein translation in different HSPC subpopulations. 1) Can the authors provide more information about the incubation media, any cytokine or serum included? The incubation media with supplements may boost the overall translation status, although cells from WT and RpS12+/- are cultured side by side. Based on this, in vivo OP-Puro assay should be performed in both genotypes. 2) Polysome profiling assay should be performed in primary HSPCs, or at least in hematopoietic cell lines. It is plausible that RpS12 haploinsufficiency may affect the content of translational polysome fractions.

      We are including these details in the methods section: for the in vitro OP-Puro assay (lines 555-565), cells were resuspended in DMEM (Corning 10-013-CV) media supplemented with 50 µM β-mercaptoethanol (Sigma) and 20 µM OPP (Thermo Scientific C10456). Cells were incubated for 45 minutes at 37°C and then washed with Ca2+- and Mg2+-free PBS. No additional cytokines were added.

      We did not perform polysome profiles. Polysome profiling of mutant stem and progenitor cells would be very challenging, as their numbers are much reduced. We now deem this of reduced interest, given the conclusion of the revised manuscript that RpS12 haploinsufficiency reduces overall translation. Also, because in the RpS12-floxed/+;SCL-CRE-ERT mouse model with acute deletion of RpS12 we observed the expected decrease in translation in HSCs using the same ex vivo OPP protocol, we did not follow up with in vivo OPP treatment.

    1. Author Response:

      Reviewer #1 (Public Review):

      1) All feeding data presented in the manuscript are from the interactions of individual flies with a source of liquid food, where interaction is defined as 'physical contact of a specific duration.' It would be helpful to approach the measurement of feeding from multiple angles to form the notion of hedonic feeding since the debate around hedonic feeding in Drosophila has been ongoing for some time and remains controversial. One possibility would be to measure food intake volumetrically in addition to food interaction patterns and durations (e.g. via the modified CAFE assay used by Ja).

      We acknowledge that our FLIC assays address only one dimension of feeding behavior, physical interaction with liquid food. However, there is clear evidence that interactions are strongly predictive of consumption, and it would be technically difficult to measure feeding durations at the resolution of milliseconds using a CAFE assay. Nevertheless, we appreciate the spirit of this comment and agree that expanding our inference to other measures of feeding, as well as feeding environments, is an important next step. To this end, we will include measures of feeding on more traditional solid food using the ConEx assay, in which we find that flies in the hedonic environment consume twice as much sucrose by volume as flies in the control environment. These will be added as supplemental data (Figure 1 – Figure Supplement 1A), and the text will be updated to reflect our findings.

      2) Some of the statistical analyses were presented in a way that may make understanding the data unnecessarily difficult for readers. Examples include:

      a) In Table I the authors present food interaction classifications based on direct observation. These are helpful. However, the classification system is updated or incompletely used as the manuscript progresses, most importantly changing from four categories with seven total subcategories to three categories and no subcategories. In subsequent data analyses, only one or two of these categories are assessed. It would be helpful, especially when moving from direct observation to automated categorization, to quantify the exact correspondences between all of the prior and new classifications, as well as elaborate on the types of data that are being excluded.

      We appreciate the feedback on our usage of the behavioral classification system and will make several adjustments to improve it. We will rename some of the behaviors to make them more intuitive (see Reviewer #2, comment #1), and update the main text and Table 1 to reflect these changes. We will update the text and figures to be more transparent about when we group subcategories into main categories for quantification and when we quantify all subcategories separately. Because these videos required manual scoring by an experimenter, after our initial characterizations we opted to score only main categories (which contain subcategories). We agree that it would be useful to quantify correspondence between subcategories and the automated FLIC signal. However, we believe this task is better suited for more advanced and automated video tracking software, and, incidentally, more sophisticated analysis of FLIC data, which has a very high-dimensional character that has yet to be properly exploited. At the moment, therefore, we are not confident in the ability to understand the data at the desired resolution.

      b) The authors switch between a variety of biological and physiological conditions with varying assays, which makes the train of reasoning nearly impossible to follow. For example, the authors introduce us to circadian aspects of feeding behavior to introduce the concept of 'meal' and 'non-meal' periods of the day. It is then not clear in which of the subsequent experiments this paradigm is used to measure food interactions. Is it the majority of the subsequent figure panels? However, the authors also use starved flies for some assays, which would be incompatible with circadian-locked meals. The somewhat random and incompletely reported use of males and females, which the authors show behave differently, also makes the results more difficult to parse. Finally, the authors are comparing within-fly for the 'control environment' and between flies for their 'hedonic environment' (Figure 3A and subsequent panels), which I believe is not a good thing to do.

      We apologize for our difficulties conveying our inference, which was also noted by Reviewer #2. We will work hard to improve this component in the revision. With respect to the confusion about circadian feeding, we introduced circadian meal-times to complement starvation as a second (perhaps more natural) way to measure behaviors associated with hunger. Importantly, we do not use circadian meal-times beyond Figure 1; all subsequent FLIC experiments were conducted during non-meal times of day for 6 hours, which avoids confounding our data with circadian-locked meals even when we use starved flies. We will clarify this point in the revision.

      The reviewer also points out that we make both within-fly and between-fly comparisons, which is a point that we note. Perhaps some concern arises, again, from the challenges that we faced in properly delineating our inferences about different types of feeding measures (and motivations). Inference about homeostatic feeding was made using within-fly measures, comparing events on sucrose vs. those on yeast. Inference about hedonic feeding was made using between-fly measures (average durations of different flies on 2% vs. 20% sucrose). Treatment comparisons to control always used measures of the same type, such that inference was not made using between-fly measures for treatment and within-fly for control (i.e., all of our figure panels were either within-fly or between-fly). We will clarify this in the revision.

      Importantly, our approach to all experiments avoided confounding by using randomized designs at multiple levels (e.g., randomizing control and hedonic environments to FLIC DFMs, alternating food choice sidedness in the DFMs), by ensuring that flies in both environments are sibling flies that came from the same vial environment before being tested, and by performing each experiment multiple times.

      c) Statistical analyses are not always used consistently. For example, in Figures 3B and C, post hoc test results are shown for sucrose vs. yeast interactions, but no such statistics are given for 3E and 3F, preventing readers from assessing if the assay design is measuring what the authors tell us it is measuring.

      We report p-values for two-way ANOVA interaction terms for all appropriate experiments. If (and only if) the interaction term is significant, we conduct post-hoc tests for more detailed statistical analysis and report the p-values. The reviewer points out that we do not perform post-hoc tests in figures 3E and 3F. These figures had a non-significant interaction term, and thus, we did not feel a post-hoc test was warranted.

      Reviewer #2 (Public Review):

      1) The dissection of feeding into distinct behavioral elements and its correlation with electrical FLIC signals that allow interpreting feeding types is a fundamental new method to dissect feeding in flies. However, the categories of micro-behaviors in Table 1 are not intuitive.

      We agree and will update the Table, figures, and main text. Please see also our response to Reviewer #1, comment #1.

      2) The details for the behavioral data analysis are not clear and should be made more obvious. For example, how many males and females were used in each experiment? Were any of the females mated or were they all virgins? If all virgins, why not use mated females? Mating status may have an effect on the feeding drive. If mated and virgin females were used, are there any differences between them? Similarly, for diurnal feeding experiments, it is not immediately clear from the graphs how many animals were used and how the frequencies were obtained (Fig. 1F, presumably averages for each category per fly but that is inconsistent with the legend in the supplement for this figure). Why does the transition heat map not include all micro-behaviors (Fig. 1E, no LQ data which are significant in diurnal feeding)?

      We will clarify the number of flies and events for each behavioral experiment in Figure 1, and we will update the figure legend appropriately. We note that these behavioral datasets are non-overlapping, and each time we mention the number of events scored in the text, that number includes only “new” videos. Female and male flies for all experiments were mated, and we will clarify this in the main text and methods.

      For the diurnal experiment in Figure 1F, we scored over 700 events from new (non-overlapping) video compilations and updated the number of flies and event number in the figure legend. The diurnal data we present in the supplement for this figure is a separate experiment conducted on 38 flies, intended only to demonstrate the circadian nature of fly feeding.

      For the transition heat map, analysis of this sort seems to require a large amount of data to have sufficient power to return a transition matrix. LQ events are relatively low in frequency, so we opted to combine them with L events for this analysis. We have updated the figure and figure legend to reflect this.

      3) The CaMPARI images do not look great, particularly in the pan-neuronal condition (Fig. 5A). It would be useful to include the movie of the stack. Did any other brain regions show activity differences, such as SEZ or PI? These regions are known to be involved in feeding so it seems surprising they show no effect.

      We find that CaMPARI imaging is subject to high levels of noise and background, especially when using a broad driver, as the reviewer has pointed out. This is why we opted to follow up our pan-neuronal CaMPARI experiment using a more specific mushroom body driver and to test our correlational findings of increased MB activity in hedonic environments with genetic approaches in the remainder of Figure 5. We will include movies of the confocal stacks for both CaMPARI experiments, as requested.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes the accrual of RSV mutations in a severely immunocompromised child with persistent infection and demonstrates that ribavirin increases the observed mutation rate with base pair changes (C to U and G to A) compatible with its known mechanism. The paper utilizes a mathematical model to explain the counterintuitive finding that viral load does not decrease despite loss of viral fitness and clinical improvement. Positive selection is observed but does not keep pace with deleterious mutations induced by ribavirin. Overall, though the data is restricted and limited to a single person, the analysis is rigorous and supports the paper's interesting conclusions.

      The paper is fascinating, but its generalizability is somewhat limited by the single study participant. Nevertheless, comparisons of therapy-induced deleterious mutations versus adaptive mutations over time is potentially important for multiple viruses.

      We thank the reviewer for their comments. Although we acknowledge that this is only a single case of infection, we believe that it is an interesting case, and we are keen to share our findings with the broader scientific community.

      Reviewer #2 (Public Review):

      In this work, Illingworth et al. investigate the effectiveness of ribavirin and favipiravir on the treatment of a paediatric patient with chronic RSV. These drugs cause mutations and the authors tested whether they could observe this effect through deep sequencing viruses from nasal aspirates over the course of treatment. They found an increase in mutations caused by ribavirin but favipiravir appeared to have no additional mutagenic effect. Despite the lack of change in viral load, the authors suggest that the ribavirin reduced viral fitness and did not lead to adaptive escape mutations. The authors modelled how generation time and fitness interacted with mutational load. They also estimated fitness for different haplotypes generated from the mutational data.

      Strengths of the paper:

      Using mutagenic drugs to treat viruses is generally accepted but results have been mixed with severe viral infections and specific evidence of the precise effects of the drugs is often lacking. This paper is especially valuable for demonstrating that despite in vitro evidence that favipiravir had some effect against RSV, there was no evidence for favipiravir having an effect in a patient. This differs from the authors' previous work showing a clear clinical benefit to favipiravir in treating influenza. This paper also appears to be the first to sequence RSV from a patient having been exposed to ribavirin, which is important for demonstrating that the drug is having a measurable effect.

      Weaknesses in the paper:

      I think there is a conceptual problem with the paper. Ribavirin is supposed to increase the mutational rate of the virus which would increase the mutational load. Mutational load has been calculated by summing up the frequencies of minor alleles. However, if a particular mutation rises in frequency, it does not mean that ribavirin has caused additional mutations at the same site but rather viruses containing the mutation have risen in frequency. If a subpopulation containing mutations rises through drift or selection to a relatively high percentage that will bias the mutational load. The authors provide ~75 mutations which were at significant percentages across multiple different timepoints. It seems that these mutations contribute significantly to the mutational load but changes in mutation percentages between samples do not reflect changes in mutational events but changes in viral haplotypes/subpopulations. In a previous study Lumby et al. 2020, the authors removed mutations at >5% from their analysis but there is no indication that they performed this step similarly here. Summing many small changes will give an indication of background mutational rate (though counting only a single mutation at each locus is perhaps the only method to remove the effect of viral clonal expansion).

      The mutational load is defined as the mean number of mutations per virus with respect to the consensus, equal to the sum of minor allele frequencies across the genome. We filter variant frequencies prior to calculating mutational load to compensate for noise arising from genome sequencing.
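
      The definition above can be illustrated with a minimal sketch. The data layout, the function name `mutational_load`, and the 1% noise cutoff are assumptions made for illustration only; the filter actually applied to the sequencing data is not specified in this response.

```python
def mutational_load(site_frequencies, min_frequency=0.01):
    """Sum minor-allele frequencies across sites after a noise filter.

    site_frequencies: one dict per genome position, mapping each
        non-consensus allele to its frequency in the sample.
    min_frequency: hypothetical noise cutoff (an assumption, not the
        threshold used in the study).
    """
    load = 0.0
    for alleles in site_frequencies:
        for freq in alleles.values():
            # Frequencies below the cutoff are treated as sequencing noise.
            if freq >= min_frequency:
                load += freq
    return load


# Toy sample: three positions, variant frequencies relative to consensus.
sample = [
    {"U": 0.20},              # one variant at 20%
    {"A": 0.05, "G": 0.004},  # the 0.4% variant falls below the cutoff
    {},                       # no variants observed at this position
]
print(mutational_load(sample))
```

      Under this definition a single variant at 20% contributes as much to the load as twenty variants at 1% each, which is why the treatment of higher-frequency variants matters for the resulting estimate.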

      We use a deterministic model of mutation-selection balance to describe the overall dynamics of mutational load, but are conscious that the dynamics of individual variants are complex. Genetic drift could contribute to these dynamics, as might hidden structure in the viral population, with stochastic observations of viruses from distinct subpopulations. As we make clear, our key assumption regarding mutational load is that all variants from the consensus are at least mildly deleterious; under this assumption calculating the sum of allele frequencies is an appropriate measurement of mutational load. Our model accounts for the possible presence of variants under stronger and weaker selection being observed at lower and higher frequencies respectively.

      We note that, in a case where distinct variants occurred in subpopulations, these variants would be observed in a mixture at lower frequencies than they existed in the subpopulations. This would lead to the observation of more variants overall, with each variant being at a reduced frequency. While stochastic effects would alter the frequencies of mutations in individual samples, if mutational load acted equally on each subpopulation, the total mutational load would be preserved across samples. The existence of subpopulations would not of itself invalidate the calculation of mutational load as we have performed it.

      Our previous study Lumby et al, 2020 considered a case where favipiravir was given for a short period of time in a case of influenza B infection. In that case we did not make an assessment of the total mutational load in a population, although we did remove mutations at >5% when considering the spectrum of mutations i.e. the proportion of mutations of each type C to T, G to A, etc. We are still working on different approaches to measuring mutational load, but we are not convinced that removing high frequency mutations is always a good idea when evaluating the total mutational load. Cutting out higher frequencies is potentially a useful means to study mutational spectra under viral mutagenesis, but in a measurement of mutational load it could exclude deleterious mutations.

      While ribavirin appears to have shown an effect, many questions remain. Why does the mutational load only increase for 3 points before plateauing? The authors would likely argue that this is the new saturation point for mutation load but they don't test it. Sequencing points from after the cessation of treatment would be expected to show lower mutational load but this data was not collected. Furthermore, questions remain over the methodology. It is thought that Ribavirin should only increase transitions and a transition/transversion ratio for the different samples would have been helpful. The absolute numbers of many mutation classes appear to have increased including transversions e.g AU. There isn't a good reason why nucleoside analogues should have caused this effect and perhaps it is an artefact.

      Ribavirin has been shown to increase C to T and G to A mutations; these are both transitions, but T to C and A to G mutations are also transitions; the proportion of these was found to decrease under treatment. We have included a new figure showing Ts/Tv ratios but we do not find a significant pattern of change in these statistics over time.
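
      For readers unfamiliar with the statistic, a transition/transversion (Ts/Tv) ratio can be computed as in the sketch below. The input format (pairs of reference and alternative bases) and the example mutations are assumptions for illustration only; they are not data from the study.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(mutations):
    """Return the transition/transversion ratio for (ref, alt) base pairs."""
    mutations = [(ref, alt) for ref, alt in mutations if ref != alt]
    n_transitions = sum(1 for m in mutations if m in TRANSITIONS)
    n_transversions = len(mutations) - n_transitions
    if n_transversions == 0:
        return float("inf")  # undefined ratio; reported as infinity here
    return n_transitions / n_transversions


# Hypothetical sample: three transitions and one transversion (A to T).
example_ratio = ts_tv_ratio([("C", "T"), ("G", "A"), ("A", "G"), ("A", "T")])
```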

      The plateauing of the observed mutational load is consistent with the theory of mutation-selection balance. Following a change in the mutation rate we would expect a shift to a new equilibrium U/s.
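
      The expected plateau can be sketched with a simplified one-variable model. This assumes that the mean load L follows dL/dt = U - sL, which has equilibrium U/s; this textbook form is an assumption made for illustration and is not necessarily the model fitted in the manuscript. All parameter values are hypothetical.

```python
def simulate_load(u_before, u_after, s, steps=2000, dt=0.01):
    """Euler integration of dL/dt = U - s*L after a step change in U."""
    load = u_before / s  # start at the pre-treatment equilibrium U/s
    trajectory = [load]
    for _ in range(steps):
        load += dt * (u_after - s * load)
        trajectory.append(load)
    return trajectory


# Hypothetical values: the mutation rate triples when the mutagen is given.
trajectory = simulate_load(u_before=0.1, u_after=0.3, s=0.5)
```

      Under these assumptions the load rises from the old equilibrium (0.1/0.5) and then levels off near the new one (0.3/0.5), so a rise followed by a plateau is what the simple model would produce.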

      Sequencing was conducted as part of an investigation that was secondary to treatment of the patient: All of the samples that were collected were sequenced. We agree that upon the cessation of mutagenic drugs we would expect to see a fall in mutational load.

      I don't think that the authors can reasonably determine how many haplotypes there are in the population from short read sequencing data. I think that the sequencing data very clearly shows subpopulations due to the large changes in mutation frequencies between different time points. The authors say that their analysis assumes a well-mixed population which is clearly not the case. Therefore, determining fitness of different haplotypes or mutations is likely not accurate.

      Although we have short read sequencing data, some of the reads we have span more than one locus, providing some information about linkage between variants. As noted in the Methods section our inference approach provides a minimal reconstruction of haplotypes: Our reconstruction details the smallest set of distinct haplotypes necessary to explain the data.

      Where we use a haplotype-based model to reconstruct the within-host evolution of the population, we neglect the potential presence of subpopulations by assuming a well-mixed population, then fully discuss the implications of this assumption for our result.

      Our basic question is whether within-host adaptation leads to a gain in viral fitness in excess of the loss of fitness imposed by an increase in mutational load. In this comparison we make a conservative (i.e. low) estimate for the extent of the loss of fitness through mutational load.

      When we look at within-host evolution our assumption of a well-mixed population attributes all of the systematic change in the viral population to the effects of selection. If some of this change arises through stochastic differences in emissions from a structured population, the influence of selection would be less than our inference. Thus, our estimate of the gain in fitness through within-host adaptation is a high estimate. As our high estimate of within-host fitness gain is less than a low estimate of the fitness lost through mutational load, our result is robust to our assumption.

      The authors construct a model to estimate viral fitness and suggest that viral fitness decreased with the drug. This is somewhat problematic to me as viral load has not changed so it would be reasonable to say that viral fitness was likely unaffected by the drug. The authors define fitness in terms of the number of mutations that each virus likely has and assumes that these mutations are deleterious. The authors then use this to claim that mutagenic drugs reduce fitness. This seems very circular to me. If the drugs reduce fitness, it should be observed as a property of the virus population. As the only measure was viral load, which didn't change, it is difficult to claim ribavirin reduced viral fitness. There are other reasons why there could be an increase in the number of mutations e.g. sequencing more subpopulations which would have nothing to do with fitness.

      We have discussed our assumption that variants in the viral population are deleterious; this lies behind the use of a model of mutation-selection balance. Under this assumption, the accumulation of a greater number of mutations following ribavirin treatment is indicative of a loss of viral fitness, although we cannot precisely quantify the magnitude of this loss. The link between an increased mutation rate and lower viral fitness is intrinsic to the concept of mutagenic drugs; our data show an increase in mutational load coincident with the therapeutic use of ribavirin.

      A change in viral fitness does not necessarily lead to a substantial and clearly observable drop in viral load; we say more about this in the response to comments below.

      At various points, the paper assumes that there is no selection taking place but immunoglobulin was being applied weekly and palivizumab monthly. The timing of when these drugs were given should be included. How did the palivizumab affect selection? The K272E mutation seems to go up and down but it is not clear if this was in response to drug infusion timing or if this mutation was present in a subpopulation.

      Our approach assumes that selection could act at two distinct levels: Firstly, we assume that the observed increase in mutational load correlates to a reduction in viral fitness; the link between viral fitness and mutational load is intrinsic to the equation of Haldane. Secondly, we use a haplotype-based model to infer how selection is acting on the level of higher-frequency mutations; under the assumption of a well-mixed model we identify a signal of within-host adaptation.

      We have added details of the timing of palivizumab treatment to Figure 1. Immunoglobulin was given throughout; details of treatment have been given in Supporting Data. As we have now clarified in the Methods, our identification of potentially selected alleles was a two stage process, with the first assessing the level of noise in the data. Our model of noise envisages nonuniformity arising from multiple sources, including a situation whereby the viral population was divided in subpopulations, and in which reads comprised stochastic samples from these subpopulations. Given our model for noise, the observation of the K272E mutation at generally higher frequencies in earlier samples and generally lower frequencies in later samples was sufficient to call this as a potentially selected variant. We did not explore more complex models of drug-dependent selection.

      I think the main impact of the paper will be that favipiravir will not be used in the future to treat RSV. Given that the EC50 of favipiravir against RSV is ~100x that of influenza, favipiravir was unlikely to reach a therapeutic level in the patient. Nucleoside analogues have a mixed record at treating serious viral infections. Hopefully, this work will spur on future studies to precisely measure the effect that ribavirin has on RSV.

      Favipiravir was used in this patient following its successful experimental use against a case of influenza B infection (Lumby et al., 2020). We would be happy if our work inspires future research in this area.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript explores how biliary epithelial cells respond to excess dietary lipids, an important area of research given the increasing prevalence of NAFLD. The authors utilize in vivo models complemented with cultured organoid systems. Interestingly, E2F transcription factors appear important for BEC glycolytic activation and proliferation.

      We thank this reviewer for his/her comments and for finding the E2F-mediated mechanism of interest.

      Much of the work utilizes the BEC-organoid model, which is complicated by the fact that liver cell organoid models often fail to maintain exclusive cell identity in culture. The method used by the authors (Broutier et al., 2016) can lead to organoids with a mixture of ductal and hepatocyte markers. It would be helpful for the authors to further demonstrate the cholangiocyte identity of the organoid cells.

      We understand the concern of this reviewer. Indeed, this method can give rise to biliary cells or more hepatocyte-like cells. However, this choice depends on the culture media used. Our experiments used BEC-organoids in an undifferentiated state with a biliary expression profile. Please see point 1 above for a detailed answer.

      The authors suggest that BECs form lipid droplets in vivo by detecting BODIPY immunofluorescence of liver cryosections. While confocal microscopy would ensure that the BODIPY fluorescence signal is within the same plane as the cell of interest, the authors use a virtual slide microscope that cannot exclude fluorescence from a different focal plane. The conclusion that BECs accumulate lipids does not seem to be fully supported by this analysis.

      We fully agree with this criticism. To address this concern, we decided to use FACS analysis, a quantitative and independent method, to further confirm our initial findings. To this end, we stained sorted EPCAM+ BECs isolated from livers of CD- or HFD-fed mice with BODIPY, quantified the number of BODIPY+/EPCAM+ BECs in each experimental condition, and confirmed that these cells accumulate more lipids after HFD feeding (New Figure 1I, page 5, lines 112-115; see also the reply to point 4 of the rebuttal).

      Several mouse experiments rely heavily on rare BEC proliferation events with the median proliferation event per bile duct being 0-1 cell. While the proliferative effect appears consistent across experiments, a more quantitative approach, such as performing Epcam+ BEC FACS and flow cytometry-based cell cycle analyses, would be helpful.

      Following this suggestion, we quantified proliferative EdU+ BEC cells by FACS in a new cohort of C57BL/6J mice fed CD or HFD. These data, now included in the revised manuscript (New Figure 2G, page 7, lines 143-147), strongly confirm that immunofluorescence quantification mirrors the FACS quantification and reinforce the initial finding that EPCAM+ BECs proliferate more in the livers of HFD-fed mice. Please see point 6 above for a detailed answer.

      Finally, it is not yet clear how relevant the findings in this study are to ductular reaction, which is a non-specific histopathologic indicator of liver injury in the context of severe liver disease. In NAFLD, the ductular reaction is uncommon in benign steatosis, and if seen at all, occurs in the setting of substantial liver inflammation and fibrosis (Gadd et al., Hepatology 2014). The authors use a dietary model containing 60 kcal% fat, which causes adipose lipid accumulation as well as subsequent liver lipid accumulation. This diet does not cause overt inflammation or fibrosis that would represent experimental NASH, which typically requires the addition of cholesterol in dietary lipid NASH models (Farrell et al., Hepatology, 2019). While the E2F-driven proliferation may be important for physiologic bile duct function in the setting of obesity, the claim that E2Fs mediate DR initiation would require an additional pathophysiologic model or human data to demonstrate relevance. The authors could clarify this point in their discussion.

      We agree with this reviewer that 15 weeks of HFD feeding in C57BL/6J mice is insufficient to trigger a ductular reaction. For this reason, we used the term “BEC activation” in our manuscript, which refers to the first mandatory step for the ductular reaction to initiate. We apologize if our initial manuscript did not sufficiently emphasize this point. However, as suggested by the reviewer, we investigated the ductular reaction in our model. In order to further characterize the livers after 15 weeks of CD or HFD feeding, we stained the bile ducts for pancytokeratin (PANCK) and osteopontin (OPN) and asked a pathologist (Dr. Christine Gopfert at EPFL) to evaluate these sections with a particular focus on the bile ducts. She concluded that the livers of HFD-fed mice showed steatosis and inflammation but no apparent fibrosis (New Figure 1 – figure supplement 1E). The shape of bile ducts was similar in the livers of CD- and HFD-fed mice (New Figure 1 – figure supplement 1I), concomitant with the absence of portal fibrosis and inflammation. In addition, we checked the expression levels of several established markers of ductular reaction in our RNA sequencing data and observed that, of all these genes, only Ncam1 was significantly upregulated with HFD feeding in EPCAM+-BEC cells (New Figure 2 – figure supplements 1D and 1E, Page 6, lines 127-131). Overall, these data support our conclusion that HFD triggers BEC activation without signs of an established ductular reaction and might suggest Ncam1 as a marker for this initial BEC activation process. Please see point 3 above for a detailed answer.

      Reviewer #2 (Public Review):

      The manuscript by Yildiz et al investigates the early response of BECs to high fatty acid treatment. To achieve this, they employ organoids derived from primary isolated BECs and treat them with a FA mix followed by viability studies and analysis of selected lipid metabolism genes, which are upregulated indicating an adjustment to lipid overload. Both organoids with lipid overload and BECs in mice exposed to a HFD show increased BEC proliferation, indicating BEC activation as seen in DR. Applying bulk RNA-sequencing analysis to sorted BECs from HFD mice identified four E2F transcription factors and target genes as upregulated. Functional analysis of knock-out mice showed a clear requirement for E2F1 in mediating HFD induced BEC proliferation. Given the known function of E2Fs the authors performed cell respiration and transcriptome analysis of organoids challenged with FA treatment and found a shift of BECs towards a glycolytic metabolism. The study is overall well-constructed, including appropriate analysis. Likewise, the manuscript is written clearly and supported by high-quality figures.

      We appreciate that this reviewer finds our study well-constructed, clear, and with high-quality figures.

      My major point is the lack of classification of the progression of DR, since the authors investigate the early stages of DR associated with lipid overload reminiscent of stages preceding late NAFLD fibrosis. How are early stages distinguished from later stages in this study? Molecularly and/or morphologically? While the presented data are very suggestive, a more substantial description would support the findings and resulting claims.

      We thank the reviewer for the suggestion. We would like to emphasize that instead of ductular reaction, we used the term “BEC activation” in our revised manuscript, referring to the first mandatory step for initiating the ductular reaction. Both reviewers criticized the poor characterization of the ductular reaction process in the first version of our study; we put substantial effort into further clarifying this point. Our response to this point can be read in our reply to the last comment of reviewer 1 and point 3 of the rebuttal.

    1. Author Response

      Reviewer #1 (Public Review):

      It is now widely accepted that the age of the brain can differ from the person's chronological age and neuroimaging methods are ideally suited to analyze the brain age and associated biomarkers. Preclinical studies of rodent models with appropriate neuroimaging do attest that lifestyle-related prevention approaches may help to slow down brain aging and the potential of BrainAGE as a predictor of age-related health outcomes. However, there is a paucity of data on this in humans. It is in this context the present manuscript receives its due attention.

      Comments:

      1) Lifestyle intervention benefits need to be analyzed using robust biomarkers which should be profiled non-invasively in a clinical setting. There is increasing evidence of the role of telomere length in brain aging. Gampawar et al (2020) have proposed a hypothesis on the effect of telomeres on brain structure and function over the life span and named it as the "Telomere Brain Axis". In this context, if the authors could measure telomere length before and after lifestyle intervention, this will give a strong biomarker utility and value addition for the lifestyle modification benefits. 2) Authors should also consider measuring BDNF levels before and after lifestyle intervention.

      Response to comments 1+2: we agree that associating both telomere length and BDNF level with brain age would be interesting and relevant. However, we did not measure these two variables. We would certainly consider adding these in future work. Regarding telomere length, we now include a short discussion of brain age in relation to other bodily ages, such as telomere length (Discussion section):

      “Studying changes in functional brain aging is part of a broader field that examines changes in various biological ages, such as telomere length1, DNA methylation2, and arterial stiffness3. Evaluating changes in these bodily systems over time allows us to capture health and lifestyle-related factors that affect overall aging and may guide the development of targeted interventions to reduce age-related decline. For example, in the CENTRAL cohort, we recently reported that reducing body weight and intrahepatic fat following a lifestyle intervention was related to methylation age attenuation4. In the current work, we used RSFC for brain age estimation, which resulted in a MAE of ~8 years, which was larger than the intervention period. Nevertheless, we found that brain age attenuation was associated with changes in multiple health factors. The precision of an age prediction model based on RSFC is typically lower than a model based on structural brain imaging5. However, a higher model precision may result in a lower sensitivity to detect clinical effects6,7. Better tools for data harmonization among datasets6 and larger training sample size5 may improve the accuracy of such models in the future. We also suggest that examining the dynamics of multiple bodily ages and their interactions would enhance our understanding of the complex aging process8,9.”

      And

      “These findings complement the growing interest in bodily aging indicated, for example, by DNA methylation4 as health biomarkers and interventions that may affect them.”

      Reviewer #2 (Public Review):

      In this study, Levakov et al. investigated brain age based on resting-state functional connectivity (RSFC) in a group of obese participants following an 18-month lifestyle intervention. The study benefits from various sophisticated measurements of overall health, including body MRI and blood biomarkers. Although the data is leveraged from a solid randomized control set-up, the lack of control groups in the current study means that the results cannot be attributed to the lifestyle intervention with certainty. However, the study does show a relationship between general weight loss and RSFC-based brain age estimations over the course of the intervention. While this may represent an important contribution to the literature, the RSFC-based brain age prediction shows low model performance, making it difficult to interpret the validity of the derived estimates and the scale of change. The study would benefit from more rigorous analyses and a more critical discussion of findings. If incorporated, the study contributes to the growing field of literature indicating that weight-reduction in obese subjects may attenuate the detrimental effect of obesity on the brain.

      The following points may be addressed to improve the study:

      Brain age / model performance:

      1) Figure 2: In the test set, the correlation between true and predicted age is 0.244. The fitted slope looks like it would be approximately 0.11 (55-50)/(80-35); change in y divided by change in x. This means that for a chronological age change of 12 months, the brain age changes by 0.11*12 = 1.3 months. I.e., due to the relatively poor model performance, an 80-year-old participant in the plot (fig 2) has a predicted age of ~55. Hence, although the age prediction step can generate a summary score for all the RSFC data, it can be difficult to interpret the meaning of these brain age estimates and the 'expected change' since the scale is in years.

      2) In Figure 2 it could also help to add the x = y line to get a better overview of the prediction variance. The estimates are likely clustered around the mean/median age of the training dataset, and age is overestimated in younger subs and underestimated in older subs (usually referred to as "age bias"). It is important to inspect the data points here to understand what the estimates represent, i.e., is variation in RSFC potentially lost by wrapping the data in this summary measure, since the age prediction is not particularly accurate, and should age bias in the predictions be accounted for by adjusting the test data for the bias observed in the training data?
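
      The arithmetic in point 1 can be reproduced with a short sketch. The axis values are the approximate readings quoted by the reviewer from Figure 2, not fitted estimates, and the function name is hypothetical.

```python
def fitted_slope(x_low, x_high, y_low, y_high):
    """Slope of a line read off a plot: change in y divided by change in x."""
    return (y_high - y_low) / (x_high - x_low)


# Approximate readings: predicted age rises from ~50 to ~55 years
# as chronological age goes from 35 to 80 years.
slope = fitted_slope(35, 80, 50, 55)

# Predicted-age change (in months) implied by 12 months of chronological aging.
predicted_change_months = slope * 12
```

      With these readings the slope is about 0.11, so one year of chronological aging corresponds to roughly 1.3 months of predicted brain aging.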

      Response to comments 1+2: we agree with the reviewer that due to the relatively moderate correlation between the predicted and observed age, a large change in the observed age corresponds to a small change in the predicted age. We now state this limitation in Results section 2.1:

      “Despite being significant and reproducible, we note that the correlations between the observed and predicted age were relatively moderate.”

      And discuss this point in the Discussion section:

      “In the current work, we used RSFC for brain age estimation, which resulted in a MAE of ~8 years, which was larger than the intervention period. Nevertheless, we found that brain age attenuation was associated with changes in multiple health factors. The precision of an age prediction model based on RSFC is typically lower than a model based on structural brain imaging5. However, a higher model precision may result in a lower sensitivity to detect clinical effects6,7. Better tools for data harmonization among datasets6 and larger training sample size5 may improve the accuracy of such models in the future.”

      Moreover, we now add the x=y line to Fig. 2, so the readers can better assess the prediction variance, as suggested by the reviewer.

      We prefer not to use different scales (year/month) on the x and y axes, to avoid misleading the readers, but the lists of observed and predicted ages are available as SI files with a precision of two decimal places (~3 days).

      We note that despite the moderate prediction accuracy, we replicated these results in three separate cohorts.

      Regarding the effect of “age bias” (also known as “regression attenuation” or “regression dilution” 10), we are aware of this phenomenon and agree that it must be accounted for. In fact, the “age bias” is one of the reasons we chose to use the difference between the expected and observed ages as the primary outcome of the study, as this measure already takes this bias into account. To demonstrate this effect we now compute brain age attenuation in two ways: 1. As described and used in the current study (Methods 4.9); and 2. By regressing out the effect of age on the predicted brain age at both times separately, then subtracting the adjusted predicted age at T18 from the adjusted predicted age at T0. The second method is the standard method to account for age bias as described in a previous work 11. Below is a scatter plot of both measures across all participants:

      The x-axis represents the first method, used in the current study, and the y-axis represents the second method, described in Smith et al., (2019). Across all subjects, we found a nearly perfect 1:1 correspondence between the two methods (r=.998, p<0.001; MAE=0.45), as the two are mathematically identical. The small gap between the two is because the brain age attenuation model also takes into account the difference in the exact time that passed between the two scans for each participant (mean=21.36m, std = 1.68m).
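
      The second (regression-based) correction described above can be sketched as follows. This is a minimal illustration, assuming a simple linear fit of predicted age on chronological age at each time point and a sign convention of T18 minus T0. The function names and the toy values are hypothetical, and the sketch does not reproduce the brain age attenuation measure of Methods 4.9.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """Predicted age with the linear effect of chronological age removed."""
    slope, intercept = linear_fit(x, y)
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


def bias_corrected_change(age_t0, pred_t0, age_t18, pred_t18):
    """Change in age-adjusted predicted age (T18 minus T0) per participant."""
    gap_t0 = residuals(age_t0, pred_t0)
    gap_t18 = residuals(age_t18, pred_t18)
    return [g18 - g0 for g0, g18 in zip(gap_t0, gap_t18)]


# Toy data for four hypothetical participants (ages in years).
age_t0 = [40.0, 50.0, 60.0, 70.0]
pred_t0 = [48.0, 50.0, 53.0, 54.0]
age_t18 = [a + 1.5 for a in age_t0]
pred_t18 = [47.0, 51.0, 53.0, 56.0]
change = bias_corrected_change(age_t0, pred_t0, age_t18, pred_t18)
```

      Because residuals from a least-squares fit average to zero at each time point, the corrected changes are centered on zero by construction; only differences between participants carry information.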

      We now note this in Methods section 4.9:

      “We note that the result of computing the difference between the bias-corrected brain age gap at both times was nearly identical to the brain age attenuation measure (r=.99, p<0.001; MAE=0.45). The difference between the two is because the brain age attenuation model takes into account the difference in the exact time that passed between the two scans for each participant (mean=21.36m, std = 1.68m).”

      3) In Figure 3, some of the changes observed between time points are very large. For example, one subject with a chronological age of 62 shows a ten-year increase in brain age over 18 months. This change is twice as large as the full range of age variation in the brain age estimates (average brain age increases from 50 to 55 across the full chronological age span). This makes it difficult to interpret RSFC change in units of brain age. E.g., is it reasonable that a person's brain ages by ten years, either up or down, in 18 months? The colour scale goes from -12 years to 14 years, so some of the observed changes are 14 / 1.5 = 9 times larger than the actual time from baseline to follow-up.

      We agree that our model precision was relatively low, especially compared to the period of the intervention, as also stated by reviewer #1. We now discuss this issue in light of the studies pointed out by the reviewer (Discussion section):

      “In the current work, we used RSFC for brain age estimation, which resulted in a MAE of ~8 years, which was larger than the intervention period. Nevertheless, we found that brain age attenuation was associated with changes in multiple health factors. The precision of an age prediction model based on RSFC is typically lower than a model based on structural brain imaging5. However, a higher model precision may result in a lower sensitivity to detect clinical effects6,7. Better tools for data harmonization among datasets6 and larger training sample size5 may improve the accuracy of such models in the future.”

      Again, we note that despite the moderate prediction accuracy, we replicated these results in three separate cohorts and found that both the correlation and the MAE between the predicted and observed age were significant in all of them.

      RSFC for age prediction:

      1) Several studies show better age prediction accuracy with structural MRI features compared to RSFC. If the focus of the study is to use an accurate estimate of brain ageing rather than specifically looking at changes in RSFC, adding structural MRI data could be helpful.

      We focused on brain structural changes in a previous work, and the focus of the current work was assessing age-related functional connectivity alterations. We now added a few sentences in the Introduction section that would hopefully better motivate our choice:

      “We previously found that weight loss, glycemic control, lowering of blood pressure, and increment in polyphenols-rich food were associated with an attenuation in brain atrophy 12. Obesity is also manifested in age-related changes in the brain’s functional organization as assessed with resting-state functional connectivity (RSFC). These changes are dynamic13 and can be observed in short time scales14 and thus of relevance when studying lifestyle intervention.”

      2) If changes in RSFC are the main focus, using brain age adds a complicated layer that is not necessarily helpful. It could be easier to simply assess RSFC change from baseline to follow up, and correlate potential changes with changes in e.g., BMI.

      We are specifically interested in age-related changes as we described a-priori in the registration of the study: https://clinicaltrials.gov/ct2/show/NCT03020186

      Moreover, age-related changes in RSFC are complex, multivariate and dependent upon the choice of theoretical network measures. We think that a data-driven brain age prediction approach might better capture these multifaceted changes and their relation to aging. We now state this in the Introduction section:

      “Studies have linked obesity with decreased connectivity within the default mode network15,16 and increased connectivity with the lateral orbitofrontal cortex17, which are also seen in normal aging18,19. Longitudinal trials have reported changes in these connectivity patterns following weight reduction20,21, indicating that they can be altered. However, findings regarding functional changes are less consistent than those related to anatomical changes due to the multiple measures22 and scales23 used to quantify RSFC. Hence, focusing on a single measure, the functional brain age, may better capture these complex, multivariate changes and their relation to aging.”

      The lack of control groups

      1) If no control group data is available, it is important to clarify this in the manuscript, and evaluate which conclusions can and cannot be drawn based on the data and study design.

      We agree that this point should be made more clear, and we now state this in the limitation section of the Discussion:

      “We also note that the lack of a no-intervention control group limits our ability to directly relate our findings to the intervention. Hence, we can only relate brain age attenuation to the observed changes in health biomarkers.”

      Also, following the comments of reviewers #2 and #3, we now refer to the weight loss following 18 months of lifestyle intervention instead of to the intervention itself. This is now made clear in the title, abstract, and the main text.

      Reviewer #3 (Public Review):

      The authors report on an interesting study that addresses the effects of a physical and dietary intervention on accelerated/decelerated brain ageing in obese individuals. More specifically, the authors examined potential associations between reductions in Body-Mass-Index (BMI) and a decrease in relative brain-predicted age after an 18-months period in N = 102 individuals. Brain age models were based on resting-state functional connectivity data. In addition to change in BMI, the authors also tested for associations between change in relative brain age and change in waist circumference, six liver markers, three glycemic markers, four lipid markers, and four MRI fat deposition measures. Moreover, change in self-reported consumption of food, stratified by categories such as 'processed food' and 'sweets and beverages', was tested for an association with change in relative brain age. Their analysis revealed no evidence for a general reduction in relative brain age in the tested sample. However, changes in BMI, as well as changes in several liver, glycemic, lipid, and fat-deposition markers showed significant covariation with changes in relative brain age. Three markers remained significant after additionally controlling for BMI, indicating an incremental contribution of these markers to change in relative brain age. Further associations were found for variables of subjective food consumption. The authors conclude that lifestyle interventions may have beneficial effects on brain aging.

      Overall, the writing is concise and straightforward, and the language and style are appropriate. A strength of the study is the longitudinal design that allows for addressing individual accelerations or decelerations in brain aging. Research on biological aging parameters has often been limited to cross-sectional analyses so inferences about intra-individual variation have frequently been drawn from inter-individual variation. The presented study allows, in fact, investigating within-person differences. Moreover, I very much appreciate that the authors seek to publish their code and materials online, although the respective GitHub project page did not appear to be set to 'public' at the time (error 404). Another strength of the study is that brain age models have been trained and validated in external samples. One further strength of this study is that it is based on a registered trial, which allows for the evaluation of the aims and motivation of the investigators and provides further insights into the primary and secondary outcome measures (see the clinical trial identification code).

      One weakness of the study is that no comparison between the active control group and the two experimental groups has been carried out, which would have enabled causal inferences on the potential effects of different types of interventions on changes in relative brain age. In this regard, it should also be noted that all groups underwent a lifestyle intervention. Hence, from an experimenter's perspective, it is problematic to conclude that lifestyle interventions may modulate brain age, given the lack of a control group without lifestyle intervention. This issue is fueled by the study title, which suggests a strong focus on the effects of lifestyle intervention. Technically, however, this study rather constitutes an investigation of the effects of successful weight loss/body fat reduction on brain age among participants who have taken part in a lifestyle intervention. In keeping with this, the provided information on the main effect of time on brain age is scarce, essentially limited to a sign test comparing the proportions of participants with an increase vs. decrease in relative brain age. Interestingly, this analysis did not suggest that the proportion of participants who benefit from the intervention (regarding brain age) significantly exceeds the proportion of participants who do not benefit. So strictly speaking, the data rather indicate that it's not the lifestyle intervention per se that contributes to changes in brain age, but successful weight loss/body fat reduction. In sum, I feel that the authors' claims on the effects of the intervention cannot be underscored very well given the lack of a control group without lifestyle intervention.

      We agree that this point, also raised by reviewer #2, should be made clear, and we now state this in the limitation section of the Discussion:

      “We also note that the lack of a no-intervention control group limits our ability to directly relate our findings to the intervention. Hence, we can only relate brain age attenuation to the observed changes in health biomarkers.”

      Also, following reviewers #2 and #3, we refer to the weight loss following 18 months of lifestyle intervention instead of to the intervention itself. This is now explicitly mentioned in the title, abstract, and within the text:

      Title: “The effect of weight loss following 18 months of lifestyle intervention on brain age assessed with resting-state functional connectivity”

      Abstract: “…, we tested the effect of weight loss following 18 months of lifestyle intervention on predicted brain age, based on MRI-assessed resting-state functional connectivity (RSFC).”

      Another major weakness is that no rationale is provided for why the authors use functional connectivity data instead of structural scans for their age estimation models. This gets even more evident in view of the relatively low prediction accuracies achieved in both the validation and test sets. My notion of the literature is that the vast majority of studies in this field implicate brain age models that were trained on structural MRI data, and these models have achieved way higher prediction accuracies. Along with the missing rationale, I feel that the low model performances require some more elaboration in the discussion section. To be clear, low prediction accuracies may be seen as a study result and, as such, they should not be considered as a quality criterion of the study. Nevertheless, the choice of functional MRI data and the relevance of the achieved model performances for subsequent association analysis needs to be addressed more thoroughly.

      We agree that age estimation from structural compared to functional imaging yields a higher prediction accuracy. In a previous publication using the same dataset12, we demonstrated that weight loss was associated with an attenuation in brain atrophy, as we describe in the introduction:

      “We previously found that weight loss, glycemic control and lowering of blood pressure, as well as increment in polyphenols rich food, were associated with an attenuation in brain atrophy 12.”

      Here we were specifically interested in age-related functional alterations that are associated with successful weight reduction. Compared to structural brain changes, the effect of aging on functional connectivity is more complex and multifaceted. Hence, we decided to utilize a data-driven or prediction-driven approach for assessing age-related changes in functional connectivity by predicting participants’ functional brain age. We now describe this rationale in the introduction section:

      “Studies have linked obesity with decreased connectivity within the default mode network15,16 and increased connectivity with the lateral orbitofrontal cortex17, which are also seen in normal aging18,19. Longitudinal trials have reported changes in these connectivity patterns following weight reduction20,21, indicating that they can be altered. However, findings regarding functional changes are less consistent than those related to anatomical changes due to the multiple measures22 and scales23 used to quantify RSFC. Hence, focusing on a single measure, the functional brain age, may better capture these complex changes and their relation to aging.”

      We address the point regarding the low model performance in response to reviewer #2, comment #2.

    1. Author Response

      Reviewer #1 (Public Review):

      IRF8 is a key transcription factor in the differentiation of hematopoietic cell lineages including dendritic cell (DC) and monocyte/macrophage lineages. The promoter and enhancer regions of Irf8 have been a focus of intense research in recent times. In the submitted study, Xu H. et al. have reported for the first time a lncRNA transcribed specifically in the pDC subtype from +32Kb, which is also the region of the enhancer for Irf8 specifically in the cDC1 subtype. The authors have employed modern-day tools for an in-depth understanding of the role of lncIrf8, its promoter region, and crosstalk with the Irf8 promoter to identify that it is not the lncIRF8 itself but its promoter region that is crucial for pDC and cDC1 differentiation, conferring feedback inhibition of Irf8 transcription. In the attempt to decipher the crosstalk between the promoter regions of IRF8 and lncIRF8 by employing various in vitro artificial systems, the study falls short of identifying the real significance of the lncIRF8, which is specifically expressed in the pDC subtype.

      We appreciate the public review made by the reviewer. We agree with the reviewer that most of the experiments on the identification of the negative feedback regulation of IRF8 via the lncIRF8 promoter element were carried out in vitro. But we would also like to point out our in vivo work: (i) transplantation of lncIRF8 promoter KO cells into mice demonstrates that pDC and cDC1 development were compromised (Figure 3); (ii) lncIRF8 is expressed in in vivo BM and spleen pDC (new Figure 1-figure supplement 3). We also would like to emphasize that (i) in vivo studies on the identification of the negative feedback regulation of IRF8 via the lncIRF8 promoter element and (ii) mechanistic studies with CRISPR activation and CRISPR interference would have been difficult to perform in vivo with current tools available in mice.

      According to our current understanding, lncIRF8 acts as an indicator of +32 kb enhancer activity, and we agree with the reviewer that further potential functions of lncIRF8 still need to be explored. We added a sentence on page 13, lines 427 and 428, on potential additional functions of lncIRF8:

      "However, lncIRF8 might have additional functions in DC biology, which are not revealed in the current study and remain to be identified."

      Reviewer #2 (Public Review):

      The manuscript of Xu and colleagues examines in detail the regulation of the important transcription factor IRF8 in dendritic cell (DC) subsets. They identify a long noncoding RNA that arises from the +32kb enhancer of IRF8 specifically in plasmacytoid DCs (pDCs) and show clearly that this lncIRF8 marks the activity of a region of this enhancer but the RNA itself does not appear to have any function. Deletion of the promoter of the lncIRF8 ablated cDC1 and pDC differentiation using an in vitro cell differentiation model. The authors propose an innovative model that the lncIRF8 promoter sequences act to limit IRF8 expression in cDC1, but are inactive in pDCs, resulting in their characteristically very high IRF8 expression.

      This is a conceptually interesting study that makes excellent use of an extensive set of genomic data for the DC subsets. There has been a lot of recent research investigating the regulation of the IRF8 gene in hematopoiesis and this study provides an important new aspect to the work. The use of an in vitro model of DC differentiation is a powerful practical approach to investigating IRF8 regulation, as is the innovative use of CRISPR technology. Perhaps the biggest limitation of this study is that the authors have not confirmed the in-cell system data by creating a mouse strain lacking the lncIRF8 element. Such approaches by others, most notably the Murphy lab, have been instrumental in pushing this field forward. Nevertheless, Xu et al. significantly add to our current knowledge of the regulation of IRF8, a critical step in forming the dendritic cell network.

      We appreciate the public review made by the reviewer and the positive assessment of our work. We agree with the reviewer that extending our in-cell system data to lncIRF8 promoter KO mice will further strengthen our data, and this will be the subject of our future work.

    1. Author Response

      Reviewer #1 (Public Review):

      Using health insurance claims data (from 8M subjects), a retrospective propensity score matched cohort study was performed (450K in both groups) to quantify associations between bisphosphonate (BP) use and COVID-19 related outcomes (COVID-19 diagnosis, testing and COVID-19 hospitalization). The observation periods were 1-1-2019 till 2-29-2020 for BP use and 3-1-2020 till 6-30-2020 for the COVID endpoints. In primary and sensitivity analyses BP use was consistently associated with lower odds for COVID-19, testing and COVID-19 hospitalization.

      The major strength of this study is the size of the study population, allowing a propensity-based matched-cohort study with 450K in both groups, with a sizeable number of COVID-19 related endpoints. Health insurance claims data were used with the intrinsic risk of some misclassification for exposure. In addition there probably is misclassification of endpoints as testing for COVID-19 was limited during the study period. Furthermore, the retrospective nature of the study includes the risk of residual confounding, which has been addressed - to some extent - by sensitivity analyses.

      In all analyses there is a consistent finding that BP exposure is associated with reduced odds for COVID-19 related outcomes. The effect size is large, with high precision.

      The authors extensively discuss the (many) potential limitations inherent to the study design and conclude that these findings warrant confirmation, preferably in intervention studies. If confirmed BP use could be a powerful adjunct in the prevention of infection and hospitalization due to COVID-19.

      We thank the reviewer for this overall very positive feedback. We appreciate the reviewer's comments regarding the potential risks associated with misclassification of exposure and other potential limitations, which we have sought to address in a number of sensitivity analyses and are also addressing in the discussion of our paper. In addition, as noted by the reviewer, the observed effect size of BP use on COVID-19 related outcomes is large, with high precision, which we feel is a strong argument to explore this class of drugs in further prospective studies.

      Reviewer #2 (Public Review):

      The authors performed a retrospective cohort study using claims data to assess the causal relationship between bisphosphonate (BP) use and COVID-19 outcomes. They used propensity score matching to adjust for measured confounders. This is an interesting study and the authors performed several sensitivity analyses to assess the robustness of their findings. The authors are properly cautious in the interpretation of their results and justly call for randomized controlled trials to confirm a causal relationship. However, there are some methodological limitations that are not properly addressed yet.

      Strengths of the paper include:

      (A) Availability of a large dataset.

      (B) Using propensity score matching to adjust for confounding.

      (C) Sensitivity analyses to challenge key assumptions (although not all of them add value in my opinion, see specific comments)

      (D) Cautious interpretation of results, the authors are aware of the limitations of the study design.

      Limitations of the paper are:

      (A) This is an observational study using register data. Therefore, the study is prone to residual confounding and information bias. The authors are well aware of that.

      (B) The authors adjusted for the Charlson comorbidity index (CCI) whereas they had individual comorbidity data available and a dataset large enough to adjust for each comorbidity separately.

      (C) The primary analysis violates the positivity assumption (a substantial part of the population had no indication for bisphosphonates; see specific comments). I feel that one of the sensitivity analyses 1 or 2 would be more suited for a primary analysis.

      (D) Some of the other sensitivity analyses have underlying assumptions that are not discussed and do not necessarily hold (see specific comments).

      In its current form the limitations hinder a good interpretation of the results and, therefore, in my opinion do not support the conclusion of the paper.

      The finding of a substantial risk reduction of (severe) COVID-19 in bisphosphonate users compared to non-users in this observational study may be of interest to other researchers considering setting up randomized controlled trials for evaluation of repurposed drugs for prevention of (severe) COVID-19.

      We thank the reviewer for the insightful comments and questions related to our manuscript. Our response to the concerns regarding limitations of our study is as follows:

      (A) We agree that there is likely residual confounding and information bias due to use of US health insurance claims datasets which do not include information on certain potentially relevant variables. Nonetheless, given the large effect size and precision of our analysis, we feel that our findings support our main conclusion that additional prospective trials appear warranted to further explore whether BPs might confer a measure of protection against severe respiratory infections, including COVID-19. We have added a sentence on the second page of our Discussion (lines 859-860) to emphasize this point: "Specifically, there is the potential that key patient characteristics impacting outcomes could not be derived from claims data."

      (B) The progression of this study mirrors the real-world performance of the analysis where we initially used the CCI in matching to control for comorbidity burden on a broader scale. This was our a priori approach. After observing large effect sizes, we performed more stringent matching for sensitivity analyses 1 and 2. Irrespective of the matching strategy chosen, effect sizes remained similar for all outcome parameters. Therefore, we elected to include both the primary analysis and the sensitivity analyses with more stringent matching in order to more transparently show what was done in entirety during our analyses, as we feel it displays all of the efforts taken to identify sources of unmeasured confounding which could have impacted our results.

      (C) We agree that the positivity assumption is a key factor to consider when building comparable treatment cohorts. We also agree that it is important to separately perform the analysis for either all patients with an indication for use of BPs and for other anti-osteoporosis medications, as we have done in our analysis of the Osteo-Dx-Rx cohort and Bone-Rx cohort, respectively. However, we did not have sufficient data, a priori, to determine whether BP users would be more similar in their risk of COVID-19 outcomes to non-users or to other users of anti-resorptive medications. In addition, we believe that this specific limitation does not negate our findings in the primary analysis for the following reasons: (1) ‘Type of Outcome’: the outcomes in this study are related to infectious disease and are not direct clinical outcomes of any known treatment benefits of BPs. The clinical benefits being assessed - impact of BP use on COVID-19-related outcomes - were essentially unknown at the time of the study data; this fact mitigates the impact of any violation of the positivity assumption; and (2) ‘Clinical Population’: after propensity score matching, both the BP user and the BP non-user group in the primary analysis mainly consisted of older females (90.1% female, 97.2% age>50), which is the main population with clinical indications for BP use. According to NCHS Data Brief No. 93 (April 2012) released by the CDC, ~75% and 95% of US women between 60-69 and 70-79 suffer from either low bone mass or osteoporosis, respectively, and essentially all women (and 70% of men) above age 80 suffer from these conditions, which often go undiagnosed (https://www.cdc.gov/nchs/data/databriefs/db93.pdf). Women aged 60 and older make up ~75% of our study population (Table 1). 
Although bone density measurements are not available for non-BP users in the matched primary cohort, there is a high probability that the incidence of osteoporosis and/or low bone mass in these patients was similar to the national average. This justifies the assumption that BP therapy was indicated for most non-BP users in the matched primary cohort. Arguably, for these patients the positivity assumption was not violated.

      (D) We will discuss in detail below the specific issues raised by the reviewer regarding our sensitivity analyses. In general we acknowledge that individual analytical and/or matching approaches may each have their own limitations, but the analyses performed herein were done to test in a systematic fashion the different critical threats to the validity of our initial results in the primary cohort analysis, which were based on a priori-defined methods and yielded a large and robust effect size. Thus, the individual sensitivity analyses should be considered in the greater context of the entire project.

      Specific comments (in order of manuscript):

      Methods:

      Line 158: it is unclear how the authors dealt with patients who died during the follow-up period. The wording suggests they were excluded which would be inappropriate.

      When this study was executed, we were unable to link the patient-level US insurance claims data with patient-level mortality data due to HIPAA concerns. Therefore, line 158 (now 177) defines continuous insurance coverage during the observation period as a verifiable eligibility criterion we used for patient inclusion. It was necessary to disqualify individuals who discontinued insurance coverage for a variety of reasons, e.g. due to loss or change of coverage, relocation etc., but our approach also eliminated patients who died. Appendix 3 (line 2449ff) describes methods we employed post hoc to assess how censoring due to death could have impacted our analyses. We discuss our conclusions from this post hoc analysis in the main text (lines 1053-1058) as follows: "An additional limitation is potential censoring of patients who died during the observation period, resulting in truncated insurance eligibility and exclusion based on the continuous insurance eligibility requirement. However, modelling the impact of censoring by using death rates observed in BP users and non-users in the first six months of 2020 and attributing all deaths as COVID-19-related did not significantly alter the decreased odds of COVID-19 diagnosis in BP users (see Appendix 3)."

      Why did the authors use CCI for propensity matching rather than the individual comorbid conditions? I presume using separate variables will improve the comparability of the cohorts. The authors discuss imbalances in comorbidities as a limitation but should rather have avoided this.

      CCI was the a priori approach defined at the study outset and was chosen due to the widespread use and understanding of this score. The general CCI score was originally planned for matching in order to have the largest possible study population since we did not know how many patients would meet all criteria as well as have an event of interest. After realizing we had adequate sample size to power matching using stricter criteria, we proceeded to perform subsequent sensitivity analyses on more stringently matched cohorts (sensitivity analysis 2).

      Line 301-10: it seems unnecessary to me to adjust for the given covariates while these were already used for propensity score matching (except comorbidities, but see previous comment). The manuscript doesn't give a rationale for why the authors chose this 'double correction'.

      The following language was added to the methods section (lines 325-327): “Demographic characteristics used in the matching procedure were also included in the final outcome regressions to control for the impact of those characteristics on outcomes modelled.”

      The following language was added to the Discussion section regarding the potential limitations of our study (lines 1078-1085): “Another limitation in the current study is related to a potential ‘double correction’ of patient characteristics that were included in both the propensity score matching procedure as well as the outcome regression modelling, which could lead to overfitting of the regression models and an overestimation of the measured treatment effect. Covariates were included in the regression models since these characteristics could have differential impacts on the outcomes themselves, and our results show that the adjusted ORs were in fact larger (showing a decreased effect size) when compared to the unadjusted ORs, which show the difference in effect sizes of the matched populations alone.”

      In causal research a very important assumption is the 'positivity assumption', which means that none of the individuals has a probability of zero or one to be exposed. Including everyone would therefore not be appropriate. My suggestion is to include either all patients with an indication (based on diagnosis) or all that use an anti-osteoporosis (AOP) drug (or one as the primary and the other as the sensitivity analysis) instead of using these cohorts as sensitivity analyses. The choice should in my opinion be based on two aspects: whether it is likely that other AOP drugs have an effect on the COVID-19 outcomes and whether BP users are deemed to be more similar (in their risk of COVID-19 outcomes) to non-users or to other AOP drug users. Or alternatively, the authors might have discussed the positivity assumption and argue why this is not applicable to their primary analysis.

      The following text has been added to the Discussion section addressing potential limitations of our study (lines 987-1009): "Another potential limitation of this study relates to the positivity assumption, which when building comparable treatment cohorts is violated when the comparator population does not have an indication for the exposure being modelled 56. This limitation is present in the primary cohort comparisons between BP users and BP non-users, as well as in the sensitivity analyses involving other preventive medications. This limitation, however, is mitigated by the fact that the outcomes in this study are related to infectious disease and are not direct clinical outcomes of known treatment benefits of BPs. The fact that the clinical benefits being assessed – the impact of BPs on COVID-related outcomes – was essentially unknown clinically at the time of the study data minimizes the impact of violation of the positivity assumption. Furthermore, our sensitivity analyses involving the “Bone-Rx” and “Osteo-Dx-Rx” cohorts did not suffer this potential violation, and the results from those analyses support those from the primary analysis cohort comparisons. Moreover, we note that the propensity score matched BP users and BP non-users in the primary analysis cohort mainly consisted of older females. According to the CDC, ~75% and 95% of US women between 60-69 and 70-79 suffer from either low bone mass or osteoporosis, respectively (https://www.cdc.gov/nchs/data/databriefs/db93.pdf). Essentially all women (and 70% of men) above age 80 suffer from these conditions, which often go undiagnosed. Women aged 60 and older represent ~75% of our study population (Table 1). 
Although bone density measurements are not available for non-BP users in the matched primary cohort, there is a high probability that the incidence of osteoporosis and/or low bone mass in these patients was similar to the national average. Thus, BP therapy would have been indicated for most non-BP users in the matched primary cohort, and arguably, for these patients the positivity assumption was not violated."

      Sensitivity Analysis 3: Association of BP-use with Exploratory Negative Control Outcomes: what is the implicit assumption in this analysis? I think the assumption here is that any residual confounding would be of the same magnitude for these outcomes. But that depends on the strength of the association between the confounder and the outcome, which need not be the same. Here, risk avoiding behavior (social distancing) is the most obvious unmeasured confounder, which may not have a strong effect on other health outcomes. Also it is unclear to me why acute cholecystitis and acute pancreatitis-related inpatient/emergency-room visits were selected as negative controls. Do the authors have convincing evidence that BPs have no effect on these outcomes? Yet, if the authors believe that this is indeed a valid approach to measure residual confounding, I think the authors might have taken a step further and presented ORs for BP → COVID-19 outcomes that are corrected for the unmeasured confounding (e.g. if OR BP → COVID-19 is ~ 0.2 and OR BP → acute cholecystitis is ~ 0.5, then the 'corrected' OR of BP → COVID-19 would be ~ 0.4).
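
      The arithmetic in the example above can be sketched as follows. This is only an illustration of the reviewer's hypothetical numbers, under two assumptions that the comment itself questions: that the true effect of BPs on the negative control outcome is null (OR = 1), and that residual confounding biases both odds ratios by the same multiplicative factor. The function name `calibrate_odds_ratio` is made up for this sketch and does not come from the manuscript.

```python
def calibrate_odds_ratio(or_outcome: float, or_negative_control: float) -> float:
    """Divide the observed OR by the negative-control OR.

    Assumes (hypothetically) that the negative control has a true OR of 1,
    so that its observed OR measures the shared confounding bias.
    """
    if or_outcome <= 0 or or_negative_control <= 0:
        raise ValueError("odds ratios must be positive")
    return or_outcome / or_negative_control


# Illustrative values taken from the reviewer's example, not from study data.
corrected = calibrate_odds_ratio(or_outcome=0.2, or_negative_control=0.5)
print(f"corrected OR: {corrected:.2f}")
```

      If the confounder acts more weakly on the negative control outcome than on COVID-19 outcomes, this correction would remove too little of the bias, which is the concern raised in the comment.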

      We appreciate the reviewer’s thoughtful comments regarding the differential strength of the association between unmeasured confounders and outcome. We had initially selected acute cholecystitis and pancreatitis-related inpatient and emergency room visits as negative controls because we deemed them to be emergent clinical scenarios that should not be impacted by risk avoiding behavior. However, upon further search, we identified several publications that suggest a potential impact of osteoporosis and/or BPs on gallbladder diseases (https://doi.org/10.1186/s12876-014-0192-z; http://dx.doi.org/10.1136/annrheumdis-2017-eular.3900), thus calling the validity of our strategy into question. We therefore agree that the designation of negative control outcomes is problematic and adds relatively little to the overall story, and we have removed these analyses from the revised manuscript.

      Sensitivity Analysis 4: Association of BP-use with Exploratory Positive Control Outcomes: this doesn't help me be convinced of the lack of bias. If previous researchers suffered from residual confounding, the same type of mechanisms apply here. (It might still be valuable to replicate the previous findings, but not as a sensitivity analysis of the current study).

      We agree that the same residual confounding in previous research papers could be present in our study. Nonetheless, it was important to assess whether our analysis would be potentially subject to additional (or different) confounding due to the nature of insurance claims data as compared to the previous electronic record-based studies. Therefore, it was relevant to see if previous findings of an association between BP use and upper respiratory infections are observable in our cohort.

      The second goal of sensitivity analysis #4 (now #3) was to see whether associations could be found on different sets of respiratory infection-based conditions, both during the time of the pandemic/study period as well as during the pre-pandemic time, i.e. before medical care in the US was significantly impacted by the pandemic. In light of these considerations, we feel that sensitivity analysis 4 adds value by showing consistency in our core findings.

      Sensitivity Analysis 5: Association of Other Preventive Drugs with COVID-19-Related Outcomes: Same here as for sensitivity analysis 3: the assumption that the association of unmeasured confounders with other drugs is equally strong as for BPs. Authors should explicitly state the assumptions of the sensitivity analyses and argue why they are reasonable.

      The following sentence was added to the Discussion section (lines 1019-1020): "These analyses were based on the assumption that the association of unmeasured confounders with other drugs is comparable in magnitude and quality as for BPs."

      Results: The data are clearly presented. The C-statistic / ROC-AUC of the propensity model is missing.

      Unfortunately, a significant amount of time has passed since execution of our original analysis of the Komodo dataset by our co-authors at Cerner Enviza. To date, our ability to perform follow-up studies with the Komodo dataset (which is exclusively housed on Komodo's secure servers) has become limited because business arrangements between these companies have been terminated, and the pertinent statistical software is no longer active. This issue prevents us from obtaining the original C-statistic and ROC-AUC information; however, we were able to extract the actual propensity scores themselves for the base cohort matching (BP-users versus non-users). The table below illustrates that the distribution of propensity scores for the base cohort match ranged from <0.01 to a max of 0.49, with 81.4% of patients having a propensity score of 10-49%, and 52.9% of patients having a propensity score of 20-49%. This distribution is unlikely to reflect patients who had a propensity score of either all 0 or all 1.
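
      For readers unfamiliar with the requested statistic, the following minimal sketch shows how a C-statistic (equivalent to the ROC-AUC) can be computed from propensity scores alone: it is the proportion of treated/untreated pairs in which the treated patient has the higher score, with ties counted as one half. The scores below are toy values chosen for illustration, not values from the Komodo dataset, and the function name `c_statistic` is hypothetical.

```python
from typing import Sequence


def c_statistic(scores_treated: Sequence[float],
                scores_untreated: Sequence[float]) -> float:
    """Concordance probability over all treated/untreated pairs."""
    if not scores_treated or not scores_untreated:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for t in scores_treated:
        for u in scores_untreated:
            if t > u:
                concordant += 1.0   # treated patient ranked higher
            elif t == u:
                concordant += 0.5   # ties count as half
    return concordant / (len(scores_treated) * len(scores_untreated))


# Toy propensity scores (hypothetical, for illustration only).
treated = [0.45, 0.30, 0.22]
untreated = [0.05, 0.25, 0.10]
print(f"C-statistic: {c_statistic(treated, untreated):.3f}")
```

      The pairwise loop is quadratic in cohort size, so for cohorts of several hundred thousand patients a rank-based formulation would be needed in practice.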

      Discussion:

      When discussing other studies the authors reduce these results to 'did' or 'did not find an association'. Although commonly practiced, this doesn't do justice to the statistical uncertainty of both positive and negative findings. Instead I encourage the authors to include effect estimates and confidence intervals. This is particularly relevant for studies that are inconclusive (i.e. lower bound of confidence interval not excluding a clinically relevant reduction while upper bound not excluding a NULL-effect).

      We appreciate the reviewer’s suggestion and have added this information on p.21/22 in the Discussion.

      Line 1145 "These retrospective findings strongly suggest that BPs should be considered for prophylactic and/or therapeutic use in individuals at risk of SARS-CoV-2 infection." I agree for prophylactic use but do not see how the study results suggest anything for therapeutic use.

      We have removed “and/or therapeutic use” from this sentence (line 1088-1090).

      The authors should discuss the acceptability of using BPs as preventive treatment (long-term use in persons without osteoporosis or other indication for BPs). This is not my expertise but I reckon there will be little experience with long-term inhibiting osteoblasts in people with healthy bones. The authors should also discuss what prospective study design would be suitable and what sample size would be needed to demonstrate a reasonable reduction. (Say 50% accounting for some residual confounding being present in the current study.)

      Although BPs are also used in pediatric populations and in patients without osteoporosis (for example, patients with malignancy), we do recognize the lack of long-term safety data in use of BPs as preventative treatments. We tried to partially address this concern in our sub-stratified analysis of COVID-19 related outcomes and time of exposure to BP. Reassuringly, we observed that patients newly prescribed alendronic acid in February 2020 also had decreased odds of COVID-19 related outcomes (Figure 3B), suggesting that the duration of BP treatment may not need to be long-term. This was further discussed in the last paragraph of our Discussion where we state that "BP use at the time of infection may not be necessary for protection against COVID-19. Rather, our results suggest that prophylactic BP therapy may be sufficient to achieve a potentially rapid and sustained immune modulation resulting in profound mitigation of the incidence and/or severity of infections by SARS-CoV-2."

      We agree that a future prospective study on the effect of BPs on COVID-19 related outcomes will require careful consideration of the study design, sample size, statistical power etc. However, we feel that a detailed discussion of these considerations is beyond the scope of the present study.

      The authors should discuss the fact that confounders were based on registry data which is prone to misclassification. This can result in residual confounding.

      Some potential sources of misclassification have been discussed on lines 932-948. In addition, the following language was added (lines 970-985): "Additionally, limitations may be present due to misclassification bias of study outcomes due to the specific procedure/diagnostic codes used as well as the potential for residual confounding occurring for patient characteristics related to study outcomes that are unable to be operationalized in claims data, which would impact all cohort comparisons. For SARS-CoV-2 testing, procedure codes were limited to those testing for active infection, and therefore observations could be missed if they were captured via antibody testing (CPT 86318, 86328). These codes were excluded a priori due to the focus on the symptomatic COVID-19 population. Furthermore, for the COVID-19 diagnosis and hospitalization outcomes, all events were identified using the ICD-10 code for lab-confirmed COVID-19 (U07.1), and therefore events with an associated diagnosis code for suspected COVID-19 (U07.2) were not included. This was done to have a more stringent algorithm when identifying COVID-19-related events, and any impact of events identified using U07.2 is considered minimal, as previous studies of the early COVID-19 outbreak have found that U07.1 alone has a positive predictive value of 94% (ref. 55), and for this study U07.1 captured 99.2%, 99.0%, and 97.5% of all COVID-19 patient-diagnoses for the primary, “Bone-Rx”, and “Osteo-Dx-Rx” cohorts, respectively."

    1. Author Response:

      We thank the reviewers and editor for their feedback, which we will carefully consider as we revise the manuscript. We aim to provide more detail on how this technique could be used with other probes, ideally showing experimental data to support this use. We will add further detail of the histology from our ex vivo ovine and porcine and in vivo porcine testing. We will also provide a more thorough comparison of our technique to other recently developed lesioning techniques. In order to provide more complete evidence that our technique perturbs local neuron populations, we will refine the action potential analysis presented before and after lesions in non-human primates. In addition to providing further clarity of the method, we will include more non-human primate data where possible.

    1. Author Response:

      We are very glad that the reviewers found our paper of broad interest to the community of population, evolutionary, and ecological genetics. We thank them for their positive feedback and insightful comments and suggestions. We are preparing a revision of the preprint that will address these points. 

      One issue raised by the reviewers was that it is important to acknowledge possible limitations of the demographic model used in simulation in capturing different aspects of genomic variation. In particular, different demographic models inferred for the same species using different methods or sets of samples may have different strengths and weaknesses, and this should be considered when selecting a demographic model for simulation. This is an important point that we intend to discuss in the revised version of our manuscript. We also plan to expand the documentation of the stdpopsim catalog to include more information about the type of data used to fit every demographic model. Below we provide an outline of our thoughts on the topic.

      First of all, it is important to acknowledge that demographic models inferred from genomic data cannot fully capture all aspects of the true demographic changes in the history of a species. As a result, these models do a good job in capturing some aspects of genetic variation, but not all of them. This is primarily determined by two factors: the method used for demographic inference, and the samples whose genomes were used in inference. Regardless of the method applied, the inferred demographic model can only reflect the genealogical ancestry of the sampled individuals, and this will typically make up a small portion of the complete genealogical ancestry of the species (albeit the genealogy of any set of sampled individuals includes many ancestors). Thus, demographic models inferred from larger sets of samples from diverse ancestry backgrounds may provide a more comprehensive depiction of genetic variation within a species, as long as a sufficiently realistic demographic model can be fit. That said, the choice of samples used for inference will mostly influence recent changes in genetic variation. This is because the genealogy of even a single individual consists of numerous ancestors in each generation in the deep past (which is the premise behind PSMC-style inference methods).

      The computational method used for inference also affects the way genetic variation is reflected by the demographic model, because different methods derive their inference from different features of genomic variation. Some methods make use of the site frequency spectrum at unlinked single sites (e.g., dadi, Stairway plot), while other methods use haplotype structure (e.g., PSMC, MSMC, IBDNe). This, in turn, may influence the accuracy of different features in the inferred demography. For example, very recent demographic changes, such as recent admixture or bottlenecks, are difficult to infer from the site frequency spectrum, but are more easily inferred by examining shared long haplotypes (as demonstrated by the demographic model inferred for Bos taurus by MacLeod et al. (2013)). There have been several studies that compare different approaches to demography inference (e.g., Beichman et al. (2017); Harris and Nielsen (2013)), but unfortunately, there is currently no succinct handbook that describes the relative strengths and weaknesses of different methods. Indeed, we hope that the standardized simulations provided by stdpopsim will facilitate systematic comparisons between methods, which will, in turn, provide valuable insights for researchers when selecting demographic models for simulation.
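      To make the distinction between the two classes of methods concrete, the sketch below computes a site frequency spectrum from a small haplotype matrix. It is an illustration only: the function name `site_frequency_spectrum` and the toy data are hypothetical and are not part of stdpopsim or of any inference tool named above. Because each site is counted independently, the spectrum is unchanged when the alleles at one site are shuffled among samples, which is why it carries no information about shared long haplotypes.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded site frequency spectrum of a 0/1 haplotype matrix.

    haplotypes: list of n equal-length lists, one per sampled chromosome,
    where 1 marks the derived allele. Returns a list `sfs` of length n + 1
    in which sfs[k] is the number of sites carrying the derived allele in
    exactly k of the n samples.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    num_sites = len(haplotypes[0])
    if any(len(h) != num_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")

    sfs = [0] * (n + 1)
    for site in range(num_sites):
        derived_count = sum(h[site] for h in haplotypes)
        sfs[derived_count] += 1
    return sfs


# Toy data: 4 sampled chromosomes, 5 sites (hypothetical values).
toy_haplotypes = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

print(site_frequency_spectrum(toy_haplotypes))

# Moving the derived allele at the last site to different samples breaks up
# the haplotypes but leaves the spectrum unchanged.
shuffled_haplotypes = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
]
print(site_frequency_spectrum(shuffled_haplotypes))
```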

      It is important to note that inclusion of a demographic model in the stdpopsim catalog does not involve any judgment as to which aspects of genetic variation it captures. Any model that is a faithful implementation of a published model inferred from genomic data can be added to the stdpopsim catalog. Thus, potential users of stdpopsim should use the implemented models with the appropriate caution, keeping in mind the limitations discussed above. Scientists contributing a new model to the catalog are required to write a brief summary, which is added to the documentation page of the catalog: https://popsim-consortium.github.io/stdpopsim-docs/latest/catalog.html. This summary includes a graphical description of the model (such as the one shown for Anopheles gambiae in Fig. 2B of the paper), as well as a description of the data and method used for inference. We will mention this in the revised manuscript to help users of stdpopsim navigate through this resource.

    1. Author Response:

      First of all, we would like to thank the reviewers for their work. We appreciate the constructive review comments and useful suggestions to further improve our article.

      The main criticism of our manuscript, from both reviewers, is that the cryo-EM structures are of low resolution and that the fit of the crystallographic structures of the PAD and the stalk domain into these low-resolution structures is questionable. We would like to point out that the cryo-EM data, and the conclusions from it, are not essential for the main conclusions of the article. All mutants that we made in this study were designed based on the structural data obtained from the high-resolution X-ray structures, with no input from the low-resolution cryo-EM docked models. We chose to include the cryo-EM data since it allowed us to speculate about the interaction between the PAD and the stalk domain of PrgB, two domains whose structures we determined separately via X-ray crystallography. We agree with the reviewers that further experiments are needed to verify this potential interaction. Therefore, we will perform additional biochemical assays to investigate the proposed interaction. We will also try to optimize the cryo-EM data to hopefully allow for a more reliable fit of our high-resolution crystallographic structures. Once that is done, we will submit a revised version of the manuscript.

      On behalf of all authors,

      Ronnie Berntsson

    1. Author Response:

      We’d like to thank the three reviewers for reviewing our work in depth and providing insightful comments and suggestions.

      Reviewer 1

      1. The in vivo efficacy of MS023 does not seem to be very great. The mice treated with MS023 display a very small reduction in ADMA levels and a small increase in SDMA levels (Fig S6A).

      REPLY: We have quantified proteins with ADMA and SDMA by Western blotting tail clippings from mice treated with vehicle (n=6) and MS023 (n=6). These were normalized for equal loading to β-actin levels. The average ADMA relative expression was 0.92 for vehicle treated mice and 0.86 for MS023 treated mice (p < 0.044). The average SDMA relative expression was 0.89 for vehicle treated mice and 0.98 for MS023 treated mice (p < 0.000019). These whole-body measurements show that MS023 decreases proteins with ADMA and increases proteins with SDMA, as observed before with inhibition of PRMT1 (Dhar et al, 2013).

      Reviewer 2

      1. Two weaknesses are noted which lie in overstatements of the findings. There are six type I PRMTs (PRMT1, 2, 3, 6, 8, and CARM1), all of which are inhibited by MS023. While the authors demonstrate that their observations are not due to the inhibition of CARM1, they do not demonstrate that it is due to the inhibition of PRMT1, as they suggest. 

      REPLY: MS023 has been shown to have in vitro activity against several type I enzymes (Eram et al, 2016), and the same goes for GSK3368715 (Fedoriw et al, 2019). The in vitro IC50 values of MS023 are 30 nM for PRMT1, 119 nM for PRMT3, 83 nM for CARM1, 4 nM for PRMT6, and 5 nM for PRMT8 (Eram et al., 2016). It was documented early that PRMT1 is the major cellular type I enzyme (Pawlak et al, 2000), and this is why loss of PRMT1 or of PRMT5, the major type II enzyme, is embryonic lethal in mice (Guccione & Richard, 2019). In vivo data obtained with MS023 are paralleled by data obtained with siPRMT1 (Gao et al, 2019; Plotnikov et al, 2020; Wu et al, 2022; Zhu et al, 2019). Thus in vivo, MS023 targets the main type I PRMT, PRMT1. Further support for our claim that MS023 targets PRMT1 in MuSCs comes from our previous observation that deleting PRMT1 stimulates MuSC proliferation. Since this effect was irreversible (Blanc et al, 2016), we pursued studies with the reversible MS023, the only compound to have significant activity towards PRMT1 in vivo. For these reasons, we are convinced that the effect of MS023 is mainly mediated by inhibiting PRMT1 in the MuSC.

      To be thorough, we should test all other type I PRMT inhibitors as they become available. CARM1 was shown to be a player in MuSC (Kawabe et al, 2012), but we excluded it using the CARM1 inhibitor TP-064 (Nakayama et al, 2018). The PRMT6-deficient mice that we generated are perfectly viable without overt phenotypes, suggesting PRMT6 is not involved (Neault et al, 2012), and PRMT8 is brain specific (Taneda et al, 2007).

      2. Furthermore, this study suggests that the switch and elevated cellular metabolism in muscle stem cells due to MS023 enhanced self-renewal and engraftment capabilities but does not demonstrate this fact directly as stated. 

      REPLY: Agreed. The link between cellular metabolism and MS023 enhanced self-renewal and engraftment capabilities is correlative and we will edit the revised text to reflect this.

      Reviewer 3

      1. However, the proposed underlying mechanism, which is claimed to rely on the expansion of MuSC and 'reprograming' of MuSCs towards a "unique and previously uncharacterized identity" is not sufficiently supported. The extent of the description of scRNA-seq data is inappropriate. Some conclusions from the scRNA-seq data appear to be overinterpreted or are rather trivial.

      REPLY: We presented the top marker genes for each subpopulation that was identified in our scRNA-seq to aid the reader in establishing a broad view of whether a given subpopulation was quiescent-like, proliferating, or differentiating. M1-M5 clusters were all enriched for cell cycle markers (Mki67, Cdk1, etc), indicating a proliferative identity. The unique finding in our data is that treatment with MS023 resulted in a shift in identity as compared to the DMSO-treated proliferating MuSCs (M1, M2 and M4), creating transcriptionally distinct M3 and M5 clusters. M3 and M5 had elevated markers for metabolism (e.g. Eno1, Atp5k, etc) and early activation (e.g. Fos, Jun), while the untreated MuSCs in clusters M1, M2 and M4 did not. Furthermore, M3 and M5 had higher baseline levels of Pax7 expression when compared to untreated cells. Together, these findings describe a transitional subpopulation of MuSCs unique to MS023 treatment which not only harbours the stem-like/early activation markers Pax7, Fos and Jun, but also elevated proliferative markers related to cell cycle and energy metabolism. This particular combination of characteristics is unique to the MS023-treated MuSCs, thus identifying a novel subtype of MuSC identity. In accordance with our scRNA-seq data, we validated experimentally that MS023-treated cells have higher energy metabolism and increased self-renewal potential, thereby confirming that the unique transcriptomic signature of these cells also leads to a different cell fate decision.

      2. It remains completely unclear whether the MS023-stimulated increase of metabolic pathway activity (OXPHOS, glycolysis) plays any role for preserving stem cell properties of MuSC during expansion and improves engraftment. Additional functional and mechanistic studies are required to explore the underlying molecular processes.

      REPLY: Agreed. The link between cellular metabolism and MS023 enhanced self-renewal and engraftment capabilities is correlative and we will edit the revised text to reflect this.

      3. Furthermore, it remains completely unclear whether the acclaimed increase in grip and tetanic strength of mdx mice after MS023 treatment relies on enhanced expansion of MuSC mediated by PRMT1 inhibition. 

      REPLY: Agreed. We cannot determine whether the effect is mediated by an expansion of the MuSC pool or by an effect on other cell types, such as a direct impact on the myofibers. The goal of this figure was to provide a therapeutic perspective for the use of type I PRMT inhibitors for the treatment of DMD. Muscle wasting/weakness in DMD is a complex and multifactorial process (e.g., myofiber fragility, MuSC defects, chronic inflammation, fibrofatty accumulation). If MS023 can target multiple aspects of the physiopathology of the disease, this would increase its therapeutic applicability. Further studies will be needed to determine the exact mechanism by which MS023 mediates its beneficial effect. The manuscript will be modified to reflect this.

      References

      • Blanc RS, Vogel G, Li X, Yu Z, Li S, Richard S (2016) Arginine methylation by PRMT1 regulates muscle stem cell fate. Mol Cell Biol 37: e00457-00416

      • Dhar S, Vemulapalli V, Patananan AN, Huang GL, Di Lorenzo A, Richard S, Comb MJ, Guo A, Clarke SG, Bedford MT (2013) Loss of the major Type I arginine methyltransferase PRMT1 causes substrate scavenging by other PRMTs. Scientific reports 3: 1311

      • Eram MS, Shen Y, Szewczyk M, Wu H, Senisterra G, Li F, Butler KV, Kaniskan HU, Speed BA, Dela Sena C et al (2016) A Potent, Selective, and Cell-Active Inhibitor of Human Type I Protein Arginine Methyltransferases. ACS Chem Biol 11: 772-781

      • Fedoriw A, Rajapurkar SR, Brien SO, Gerhart SV, Lorna H, Pappalardi B, Shah N, Laraio J, Liu Y, Butticello M et al (2019) Anti-tumor activity of the first-in-class type I PRMT inhibitor, GSK3368715, synergizes with PRMT5 inhibition through MTAP loss. Cancer cell XX: XX

      • Gao G, Zhang L, Villarreal OD, He W, Su D, Bedford E, Moh P, Shen J, Shi X, Bedford MT et al (2019) PRMT1 loss sensitizes cells to PRMT5 inhibition. Nucleic acids research 47: 5038-5048

      • Guccione E, Richard S (2019) The regulation, functions and clinical relevance of arginine methylation. Nat Rev Mol Cell Biol 20: 642-657

      • Kawabe Y, Wang YX, McKinnell IW, Bedford MT, Rudnicki MA (2012) Carm1 regulates Pax7 transcriptional activity through MLL1/2 recruitment during asymmetric satellite stem cell divisions. Cell Stem Cell 11: 333-345

      • Nakayama K, Szewczyk MM, Dela Sena C, Wu H, Dong A et al (2018) TP-064, a potent and selective small molecule inhibitor of PRMT4 for multiple myeloma. Oncotarget 9: 18480-18493

      • Neault M, Mallette FA, Vogel G, Michaud-Levesque J, Richard S (2012) Ablation of PRMT6 reveals a role as a negative transcriptional regulator of the p53 tumor suppressor. Nucleic acids research 40: 9513-9521

      • Pawlak MR, Scherer CA, Chen J, Roshon MJ, Ruley HE (2000) Arginine N-Methyltransferase 1 Is Required for Early Postimplantation Mouse Development, but Cells Deficient in the Enzyme Are Viable. Mol Cell Biol 20: 4859-4869

      • Plotnikov A, Kozer N, Cohen G, Carvalho S, Duberstein S, Almog O, Solmesky LJ, Shurrush KA, Babaev I, Benjamin S et al (2020) PRMT1 inhibition induces differentiation of colon cancer cells. Scientific reports 10: 20030

      • Taneda T, Miyata S, Kousaka A, Inoue K, Koyama Y, Mori Y, Tohyama M (2007) Specific regional distribution of protein arginine methyltransferase 8 (PRMT8) in the mouse brain. Brain Res 1155: 1-9

      • Wu Q, Nie DY, Ba-Alawi W, Ji Y, Zhang Z, Cruickshank J, Haight J, Ciamponi FE, Chen J, Duan S et al (2022) PRMT inhibition induces a viral mimicry response in triple-negative breast cancer. Nature chemical biology 18: 821-830

      • Zhu Y, He X, Lin YC, Dong H, Zhang L, Chen X, Wang Z, Shen Y, Li M, Wang H et al (2019) Targeting PRMT1-mediated FLT3 methylation disrupts maintenance of MLL-rearranged acute lymphoblastic leukemia. Blood 134: 1257-1268

    1. Author Response

      Reviewer #2 (Public Review):

      1) The main limitation of this study is that the results are primarily descriptive in nature, and thus, do not provide mechanistic insight into how Ryr1 disease mutations lead to the muscle-specific changes observed in the EDL, soleus and EOM proteomes.

      An intrinsic feature of the high-throughput proteomic analysis technology is the generation of lists of differentially expressed proteins (DEP) in different muscles from WT and mutated mice. Although the definition of mechanistic insights related to changes of dozens of proteins is very interesting, it is a difficult task to accomplish and goes beyond the goal of the high-throughput proteomic analysis presented here. Nevertheless, the analysis of DEPs may indeed provide arguments to speculate on the pathogenesis of the phenotype linked to recessive RyR1 mutations. In the unrevised manuscript, we pointed out that the fiber type I predominance observed in congenital myopathies linked to recessive Ryr1 mutations is consistent with the high expression level of heat shock proteins in slow twitch muscles. However, as suggested by Reviewer 3, we have removed "vague statements" concerning major insights into pathophysiological mechanisms from the text of the revised manuscript, since we are aware that the mechanistic information, if any, that we can extract from the data set cannot exceed the intrinsic limitations of the high-throughput proteomic technology.

      b) Results comparing fast twitch (EDL) and slow twitch (soleus) muscles from WT mice confirmed several known differences between the two muscle types. Similar analyses between EOM/EDL and EOM/soleus muscles from WT mice were not conducted.

      We agree with the point raised by the Reviewer. In the revised manuscript we have changed Figure 2. The new Figure 2 shows the analysis of differentially expressed proteins in EDL, soleus and EOMs from WT mice. We have also added 2 new Tables (new Supplementary Table 2 and 3) and have inserted our findings in the revised Results section (page 7, lines 157-176, pages 8 and 9).

      c) While a reactome pathway analysis for proteins changes observed in EDL is shown in Supplemental Figure 1, the authors do not fully discuss the nature of the proteins and corresponding pathways impacted in the other two muscle groups analyzed.

      We have now included in the revised manuscript a new Figure 2 which includes the Reactome pathway analysis comparing EDL with soleus, EDL with EOM and soleus with EOM (panels C, F and I, respectively). We have also inserted into the revised manuscript a brief description of the pathways showing the greatest changes in protein content (page 7 line 156-175, pages 8 and 9). We agree that the data showing changes in protein content between the 3 muscle groups of the WT mice are important also because they validate the results of the proteomic approach. Indeed, the present results confirm that many proteins including MyHCIIb, calsequestrin 1, SERCA1, parvalbumin etc are more abundantly expressed in fast twitch EDL muscles compared to soleus. Similarly, our results confirm that EOMs are enriched in MyHC-EO as well as cardiac isoforms of ECC proteins. This point has been clarified in the revised version of the manuscript (page 8, lines 198-213; page 9 lines 214-228). Nevertheless, we would like to point out that the main focus of our study is to compare the changes of protein content induced by the presence of recessive RyR1 mutations.

      Reviewer #3 (Public Review):

      a) it would be useful to determine whether changes in protein levels correlated with changes in mRNA levels …….

      We performed qPCR analysis of Stac3 and Cacna1s in EDL, Soleus and EOM from WT mice (see Figure 1 below). The expression of transcripts encoding Cacna1s and Stac3 is approximately 9-fold higher in EDL compared to Soleus. The fold change of Stac3 and Cacna1s transcripts in EDL muscles is higher compared to the differences we observed by mass spectrometry at the protein level between EDL and Soleus. Indeed, we found that the content of the Stac3 protein in EDL is 3-fold higher compared to that in soleus. Although there is no apparent linear correlation between mRNA and protein levels, we believe that a few plausible conclusions can be drawn, namely: (i) the expression level of both transcripts and proteins is higher in EDL compared to EOM and soleus muscles, respectively, (ii) the expression level of transcripts encoding Stac3 correlates with that of transcripts encoding Cacna1s and confirms the proteomic data. In addition, the level of Stac3 transcript does not change between WT and dHT, confirming our proteomic data which show that Stac3 protein content in muscles from dHT is similar to that found in WT littermates. Altogether these results support the concept that the differences in Stac3 content between EDL and soleus occur at both the protein and transcript levels, namely a high Stac3 mRNA level correlates with higher protein content (EDL) and a low mRNA level correlates with low Stac3 protein content in Soleus muscles (see Figure 1 below).

      Figure 2: qPCR of Cacna1s and Stac3 in muscles from WT mice. The expression levels of the transcripts encoding Cacna1s and Stac3 are the highest in EDL muscles and the lowest in soleus muscles (top panels). There are no significant changes in their relative expression levels in dHT vs WT. Each symbol represents the value from a single mouse. * p=0.028, Mann-Whitney test. qPCR was performed as described in Elbaz et al., 2019 (Hum Mol Genet 28, 2987-2999).

      ….and whether or not the protein present was functional, and whether Stac3 was in fact stoichiometrically depleted in relation to Cacna1s.

      We thought about this point but think that there are no plausible arguments to believe that Stac3 is not functional, one simple reason being that our WT mice do not have a phenotype which would be associated with the absence of Stac3 (Reinholt et al., PLoS One 8, e62760 2013, Nelson et al. Proc. Natl. Acad. Sci. USA 110:11881 2013).

      b) In the abstract, the authors stated that skeletal muscle is responsible for voluntary movement. It is also responsible for non-voluntary. The abstract needs to be refocused on the mutation and on what we learn from this study. Please avoid vague statements like "we provide important insights to the pathophysiological mechanisms..." mainly when the study is descriptive and not mechanistic.

      The abstract of the revised manuscript has been rewritten. In particular, we removed statements referring to important “pathophysiological mechanistic insight”.

      c) The author should bring up the mutation name, location and phenotype early in the introduction.

      In the revised manuscript we provide the information requested by the Reviewer (page 2 lines 36-38 and page 4, lines 98-102).

      d) This reviewer also suggests that the authors refocus the introduction on the mutation location in the 3D RyR1 structure (available cryo-EM structure), if there is any nearby ligand binding site, protomers junction or any other known interacting protein partners. This will help the reader to understand how this mutation could be important for the channel's function

      The residue Ala4329 is present inside the TMx (Auxiliary transmembrane helices) domain which spans from residue 4322 to 4370 and interposes structurally (des Georges A et al. 2016 Cell 167,145-57; Chen W, et al. 2020 EMBO Rep. 21, e49891). Although the structural resolution of the region has been improved (des Georges et al, 2016), parts of the domain still remain with no defined atomic coordinates, especially the region encompassing a.a. E4253 – F4540. Because of such undefined atomic coordinates of the region E4253-F4540, we are not able to determine the real orientation and the disposition of the amino acids in this region, including the A4329 residue. As reference, structure PDB: 5TAL of des Georges et al, 2016 was analyzed with UCSF Chimera (production version 1.16) (Pettersen et al. J. Comput. Chem. 25: 1605-1612. doi: 10.1002/jcc.20084).

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes a relatively novel approach to discovering combinations of herbal medications that may help modulate immune responses, and in turn help treat diseases such as cancer. The authors use plasma cell mastitis as a disease in which they present results from a non-blinded clinical trial with modest results. The main shortcomings are a lack of rigor around standardizing the control group given steroids versus the treatment group given the combinations of herbal medications. There needs to be a detailed statistical analysis of the comparison in tumor size, stage, invasiveness, etc. as well as consideration of confounding disease states (autoimmune disease, prior cancers, diabetes, etc.). While the results are interesting in that the use of herbal medications is often overlooked in Western medicine, the manuscript needs great detail in the clinical comparison in order to provide convincing evidence for an effect.

      Many thanks for your very kind words about our work. We are excited to hear that you think our manuscript is relatively novel with considerable translational impact to the field of herbal medications. We are grateful for your valuable time and efforts you have spent to provide your very insightful comments, which are of great help for our revision.

      Reviewer #2 (Public Review):

      The work is rather interesting and novel because for the first time, the authors employed knowledge graph, a cutting-edge technique in the domain of artificial intelligence, to identify a novel herbal drug combination for the treatment of PCM. The results of the clinical trial study clearly demonstrated that the drug combination is effective to ameliorate the symptoms of PCM patients and improve the general health status of the patients. Overall, the strategy of this manuscript may provide a paradigm for the design of drug combination towards many other human disorders.

      We are truly grateful for your very kind words about our work. It is very encouraging to know that you think our work is novel and of significance for the field. We sincerely appreciate the valuable time and kind efforts that you have spent on the thorough review of our manuscript.

      Reviewer #3 (Public Review):

      The major merit of the manuscript is that the authors introduced the concept of knowledge graph into the domain of herbal drugs or TCM. Namely, the authors designed a knowledge graph towards systematic immunity or immunotherapy based on massive data mining techniques. The authors successfully identified an herbal drug combination for PCM with the help of a scoring system. Moreover, the authors conducted a clinical trial study and the clinical data showed that the herbal drug combination holds great promise as an effective treatment for PCM. The weakness of the manuscript is that some details for the herbal drug combination and the clinical trial study are missing.

      Many thanks for your very kind words about our work. We are excited to hear that you think our work is relatively novel and holds great promise as an effective remedy for PCM. We are truly thankful for your valuable time and efforts you have spent to provide your very insightful comments, which are of great help for our revision.

    1. Author Response

      Reviewer #1 (Public Review):

      After giving a very accessible introduction to cellular processes during brain development, the authors present the computational model used in this study. It combines the kinematics of cell proliferation with the mechanic of brain tissue growth and is essentially equal to their model presented in Zarzor et al (2021), but extended for the outer subventricular zone (OSVZ), see for example Figs. 2 in the present manuscript and in Zarzor et al (2021). This zone, which is specific to humans, provides a second zone of cell proliferation. The division rate in the OSVZ is smaller and at most equal to that in the ventricular zone.

      The authors present two main findings: The distance between sulci in the cortex is decreased whereas the cell density in the ventricular zone is increased in presence of the OSVZ. Furthermore, the "folding evolution", which is the ratio between the outer perimeter at time t and the initial perimeter increases in presence of the OSVZ. The strongest effect is seen, when division rates in both proliferating zones are equal. The authors compare the cases of varying and constant cortical stiffness, which they had also done in Zarzor et al (2021). Finally, they consider the feedback of cortical folding on OSVZ thickness.

      The computational model provides a sound description of how cell proliferation and migration combined with tissue mechanics yield cortical folding patterns. However, only a few parameter values are varied in a limited range. Also, it remains unclear to me, how important the specific functional dependencies of, for example, the cell division rate on the radial coordinate are. This point seems of particular importance because the effect of the presence of the OSVZ on the folding patterns seems rather minute, see Fig. 5. The authors do not propose experiments that could be used to test their description and results. Finally, the analysis is restricted to 2 dimensions.

      Thank you very much for the valuable suggestions. We agree that we are only able to show limited parameter studies in the manuscript. Therefore, we have now implemented a user interface that can be downloaded from Github (https://github.com/SaeedZarzor/BFSimulator) and will allow interested readers to directly change the parameter values and run the simulations.

      To better emphasize the effect of the presence of the OSVZ on the folding patterns, we have edited the corresponding section and figure in the revised manuscript to include a quantification of the distance between sulci:

      “In general, the distance between neighboring sulci decreases with increasing G_osvz, as marked in Figure 7. For the displayed cases, the distance decreases from d = 8.796 mm for G_osvz = 0 to d = 8.67 mm for G_osvz = 10 and finally d = 8.2 mm for G_osvz = 20. Interestingly, the cortical thickness and effective stiffness ratio at the first instability point (denoted by w in Figure 5) are the same for all these cases. Therefore, we attribute the observed differences to the faster increase in the cell density and thus cortical growth, cortical stiffness and the effective stiffness after the instability has been initiated.”

      In addition, we have added a new figure to show that the observed trends also hold true for 3D simulations:

      “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Reviewer #2 (Public Review):

      Weaknesses

      • To account for the complexity of biological phenomena, the model relies on a large number of ad hoc choices whose consequences are difficult to predict.

      We fully agree that there are quite a number of model assumptions that we have to make. Still, we have achieved great agreement with the data from fetal brain sections, which in our opinion justifies the assumptions made.

      To better explain the choice of parameters, we have now included the following paragraph in the manuscript: “The mechanical and diffusion parameters are adapted from the literature Budday et al. (2020); de Rooij and Kuhl (2018), while the geometry parameters are estimated based on histologically stained human brain sections and magnetic resonance images. For instance, to determine the MST factor, we measured the relative distance between the ISVZ and OSVZ in histologically stained images. The final value adopted is the result of dividing the measured distance by the expected time. When determining the growth problem parameters, numerical stability and algorithm convergence were major criteria.”

      • The physical model description is highly technical and out of reach for a non-specialist.

      Thank you for making this point! We have now adapted the model description to better emphasize the main features of the model and the feedback mechanisms between the mechanical growth problem and the cell density problem:

      “...is the Cauchy stress tensor formulated in terms of the elastic deformation tensor, as only the elastic deformation induces stresses. The Cauchy stress describes the three dimensional stress state in the spatial (grown and deformed) configuration and is computed by deriving the strain energy function…”

      “Through Equation 6, the cell density problem controls the effective stiffness ratio between cortex and subcortex (as the cortical stiffness changes while the subcortical stiffness remains constant) and thus also the emerging cortical folding pattern Budday et al. 2014; Zarzor et al. 2021.”

      “Through Equation 8, the amount of growth is directly related to the cell density - the higher the cell density, the more growth.”

      “The vector n represents the normalized orientation of radial glial cell fibers in the spatial configuration and controls the migration direction of neurons. As the brain grows and folds, the fiber direction changes. Through this feedback mechanism, the mechanical growth problem affects how neurons migrate and the cell density evolves locally.”

      “By applying Equation 16 for the VZ, we ensure that the division rate decreases from its initial value G_vz to a smaller value as the maximum stretch value s in the domain increases, i.e., with increasing gestational age. This constitutes an additional feedback mechanism between the mechanical growth problem and the cell density problem: As the maximum stretch and thus the deformation increases due to constrained cortical growth, the division rate in the VZ decreases, resulting in less newborn cells” and “G^s_osvz is the division rate in the OSVZ that decreases with increasing maximum stretch s in the domain”

      • The description of neurogenesis shows three zones of cell proliferation, each inhabited by a specific cell type. Despite its realism, the proposed model does not take into account the ISVZ where the intermediate progenitors operate.

      Indeed, in our model we have focused on the two original sources of cells, namely radial glial cells and ORGCs. As far as is currently known, the intermediate progenitor cells are produced by those two cell types, so they are indirectly included in the model through the resulting cell density.

      • The experiment of comparing several regimes derived from the relative importance of proliferation in the VZ and OSVZ is not very clear. It leads to the observation of the evolution of cell density maxima over time, which seems insufficient to conclude the importance of the OSVZ for folding. One wonders whether the key parameter that leads to folding is the rate of OSVZ proliferation or simply the total quantity of neurons generated by the two or even the three zones.

      Thank you for this remark. We fully agree with the Reviewer that a key factor is the total quantity of neurons generated. However, the major question we intend to address here is where these neurons originate from and how the different proliferating zones interact. In other words, we do not question the existence of the OSVZ, but we are trying to build a computational model that can mimic all relevant cellular processes during brain development - to then study their individual effect on cortical folding. Therefore, we do not argue that the OSVZ is necessary for folding, but that it plays a crucial role in the speed of generating these folds and their complexity in the Conclusion section:

      “Our results show that the existence of the OSVZ particularly triggers the emergence of secondary mechanical instabilities leading to more complex folding patterns. Furthermore, the proliferation of outer radial glial cells (ORGCs) reduces the time required to induce the mechanical instability and thus cortical folding.”

      • The experiment on the heterogeneity of proliferation in the OSVZ is a bit frustrating. I would like to see a set-up corresponding to the mosaics found in ferrets and closely associated with folding patterns.

      This is a valuable point, thank you! We have now added new results showing a more distinct regional variation of the OSVZ and have adapted our conclusions regarding this point:

      “Also in the ferret brain, where a region close in structure to the primate's OSVZ was found, this region shows a unique mosaic-like structure Fietz et al. (2010b); Reillo and Borrell (2012). In this section, we aim to assess the effect of regional proliferation variations in the OSVZ on the emerging cortical folding pattern. We discuss two different heterogeneous patterns here, but have included more variations online through our user interface on GitHub, as described in the Data availability section. In the first case, the OSVZ division rate gradually decreases along the circumferential direction. In the second case, the division rate varies in a more random pattern. Figures 13 and 14 show how cortical folds develop in both cases for the varying cortical stiffness case, a division rate in the VZ of G_vz = 120 and an initial division rate in the OSVZ of G_osvz = 20. As expected, the evolving folding patterns slightly differ. In both cases, the first folds appear, where the cell proliferation rate is highest. Expectedly, those regions also show a higher cell density in the cortex than regions nearby. However, both cases lead to final patterns with similar distances between sulci and folding complexity (one period doubling pattern). In addition, gyri and sulci are distributed equally -- regardless of the division rate. Therefore, we may conclude that inhomogeneous cell proliferation in the OSVZ controls the location of first gyri and sulci but does not necessarily affect the distance between sulci (also referred to as folding wavelength) and the overall complexity of the emerging folding pattern. This agrees well with our previous finding that the characteristic wavelength of folding remains relatively stable for inhomogeneous cortical growth patterns Budday and Steinmann (2018). 
      The simulation results are also consistent with the previously found remarkable surface expansion above the regions with higher proliferation in the OSVZ Llinares-Benadero and Borrell (2019).”

      “Finally, our simulations reveal that inhomogeneous cell proliferation patterns in the OSVZ can control the location of first gyri and sulci but do not necessarily affect the distance between sulci and the overall complexity of the emerging folding pattern.”

      Furthermore, in our code, we have added a user interface with multiple options for different OSVZ regional variations. The link to the code with the user interface shown below is now updated in the Data availability section.

      • It would be interesting to elaborate a little on the possibility of extending the model in 3D, which seems imperative to evaluate the nature of the folding pattern generated. Comparing them to reality is an essential step in gauging the credibility of the model. For instance, it would be interesting to test to which extent the model can father the type of variability observed in the general population (Mangin et al.). It will also be particularly interesting to work on the inverse model between the real folding patterns and the heterogeneous proliferation maps that can generate them.

      We fully agree with the Reviewer. Unfortunately, to the best of the authors' knowledge, there is currently no data set providing both the 3D evolution of the folding pattern and the corresponding distribution of the cell density. Therefore, the validation of 3D results is difficult. Promisingly, our model achieved good agreement with data from histologically stained fetal brain sections regarding the local gyrification index, final cortical thickness, and cell density distribution, as presented in Zarzor et al. (2021). We have indeed initiated the collection of additional data, ideally for the 3D validation. However, this will take some time and is out of the scope of the current work. It is also a great suggestion to compare our 3D simulation results with the variability found in the general population. Indeed, we plan to do such work in the future but consider this out of the scope of the current work, which focuses more on the OSVZ.

      To still show that our model can be extended to 3D, we have now included the following results: “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Reviewer #3 (Public Review):

      Zarzor et al. developed a new multifield computational model, which couples cell proliferation and migration at the cellular level with biological growth at the organ level, to study the effect of OSVZ on cortical folding. Their approach complements the classical experimental approach in answering open questions in brain development. Their simulation results found the existence of OSVZ triggers the emergence of secondary mechanical instabilities that leads to more complex folding patterns. Also, they found that mechanical forces not only fold the cortex but also deepen subcortical zones as a result of cortical folding. Their physics-based computational modeling approach offered a novel way to predictively assess the links between cellular mechanisms and cortical folding during early human brain development, further shedding light on identifying the potential controlling parameters for reverse brain study.

      Strengths:

      The newly developed physics-based computational model has several advantages compared to previous existing computational brain models. First, it breaks the traditional double-layer computational brain model, gray matter layer and white matter layer, by introducing the outer subventricular zone. Second, it develops multiscale computational modeling by bringing the cellular level features, cell diffusion, and migration, into the macroscale biological growth model. Third, it could provide a cause-effect analysis of cortical folding and axonal fiber development. Finally, their approach could complement, but not substitute, sophisticated experimental approaches to answer some open questions in brain science.

      Weaknesses:

      The cellular diffusion and migration seem determined and controlled by a single variable, cell density, which is one-way coupled with the deformation gradient of the brain model. However, cell migration and diffusion should be potentially coupled with stress and vice versa. Also, the current computational model can be improved by extending it to a 3D model. Finally, they can further improve the study of regional proliferation variation by introducing fully-randomized heterogenous cell density and growth in their model.

      Thank you. We apologize for the lack of clarity in the original submission. There are indeed more coupling mechanisms, which we have now better emphasized when introducing the model:

      “Through Equation 6, the cell density problem controls the effective stiffness ratio between cortex and subcortex and thus also the emerging cortical folding pattern Budday et al. 2014; Zarzor et al. 2021.”

      “Through Equation 8, the amount of growth is directly related to the cell density - the higher the cell density, the more growth.”

      “The vector n represents the normalized orientation of radial glial cell fibers in the spatial configuration and controls the migration direction of neurons. As the brain grows and folds, the fiber direction changes. Through this feedback mechanism, the mechanical growth problem affects how neurons migrate and the cell density evolves locally.”

      “By applying Equation 16 for the VZ, we ensure that the division rate decreases from its initial value G_vz to a smaller value as the maximum stretch value s in the domain increases, i.e., with increasing gestational age. This constitutes an additional feedback mechanism between the mechanical growth problem and the cell density problem: As the maximum stretch and thus the deformation increases due to constrained cortical growth, the division rate in the VZ decreases, resulting in less newborn cells” and “G^s_osvz is the division rate in the OSVZ that again decreases with increasing maximum stretch s in the domain”

      In addition, we have added a new figure to show that the observed trends also hold true for 3D simulations:

      “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Finally, we have added new results showing a more distinct regional variation of the OSVZ. Furthermore, in our code, we have added a user interface with multiple options for different OSVZ regional variations. The link to the code with user interface is available in the paper:

      “Also in the ferret brain, where a region close in structure to the primate's OSVZ was found, this region shows a unique mosaic-like structure Fietz et al. (2010b); Reillo and Borrell (2012). In this section, we aim to assess the effect of regional proliferation variations in the OSVZ on the emerging cortical folding pattern. We discuss two different heterogeneous patterns here, but have included more variations online through our user interface on GitHub, as described in the Data availability section. In the first case, the OSVZ division rate gradually decreases along the circumferential direction. In the second case, the division rate varies in a more random pattern. Figures 13 and 14 show how cortical folds develop in both cases for the varying cortical stiffness case, a division rate in the VZ of G_vz = 120 and an initial division rate in the OSVZ of G_osvz = 20. As expected, the evolving folding patterns slightly differ. In both cases, the first folds appear, where the cell proliferation rate is highest. Expectedly, those regions also show a higher cell density in the cortex than regions nearby. However, both cases lead to final patterns with similar distances between sulci and folding complexity (one period doubling pattern). In addition, gyri and sulci are distributed equally -- regardless of the division rate. Therefore, we may conclude that inhomogeneous cell proliferation in the OSVZ controls the location of first gyri and sulci but does not necessarily affect the distance between sulci (also referred to as folding wavelength) and the overall complexity of the emerging folding pattern. This agrees well with our previous finding that the characteristic wavelength of folding remains relatively stable for inhomogeneous cortical growth patterns Budday and Steinmann (2018). 
      The simulation results are also consistent with the previously found remarkable surface expansion above the regions with higher proliferation in the OSVZ Llinares-Benadero and Borrell (2019).”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors push a fresh perspective with a sufficiently sophisticated and novel methodology. I have some remaining reservations that concern the actual make-up of the data basis and consistency of results between the two (N=16) samples, the statistical analysis, as well as the “travelling” part.

      I previously commented on the fact that findings from both datasets were difficult to discern and more effort should be made to highlight these. Also, a major conclusion “the directionality effect [effect of attention on forward waves] only occurs for visual stimulation” only rested on a qualitative comparison between studies. The authors have improved on this here, e.g., by toning down this conclusion. One thing that is still missing is a graphical representation of the data from Foster et al. (the second dataset analysed here) that would support the statistical results and allow the reader a visual comparison between the sets of findings.

      We are glad that the reviewer recognizes the improvement in the presentation of the conclusions. Following the suggestions, we have modified figure 2, not only by including a third dataset (see point below), but also in a way that allows a direct comparison between the three datasets. Specifically, the results from the three datasets are now shown in three columns next to each other. The first row shows the FW and BW waves in contra- and ipsilateral lines of electrodes for each dataset: our dataset and the one from Feldmann-Wüstefeld and colleagues (the first and the second column in the figure, both with visual stimulation) show a clear interaction between direction and laterality, as confirmed by the statistical analysis. The dataset from Foster and colleagues (the third column, no visual stimulation) shows a laterality effect only in the backward waves but not in the forward ones, in line with the hypothesis that FW waves are modulated only in the presence of visual stimulation. The second row shows a schematic representation of the task, and the third row illustrates the lines of electrodes used in each dataset. We hope the reviewer will be satisfied with the current data presentation.

      Also, for any naive reader, the concept of travelling waves may be hard to grasp in the way data are currently presented - only based on the results of the 2D-FFT. Can forward and backward-travelling waves be illustrated in a representative example to make this more intuitive?

      We thank the reviewer for the suggestion. We included in figure 1 an additional panel E that represents a schematic example of forward and backward waves in the temporal domain (i.e., in the EEG data). We hope this example will provide a better understanding of the data and the traveling wave concept.
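
      As a complementary illustration of why a 2D Fourier transform can separate the two directions, the sketch below builds a sinusoid travelling along a line of electrodes and evaluates two coefficients of its 2D spectrum. A wave moving in one direction concentrates its power in one pair of quadrants, and a wave moving the other way in the opposite pair. The electrode count, window length, frequencies, function names, and the convention that "forward" means travelling from electrode index 0 upward are all assumptions made for illustration; this is not the authors' analysis pipeline.

```python
import cmath
import math

N_ELECTRODES = 8     # assumed number of electrodes along one line
N_SAMPLES = 64       # assumed number of time samples in the window
TEMPORAL_CYCLES = 4  # oscillation cycles within the window
SPATIAL_CYCLES = 1   # phase cycles along the electrode line


def travelling_wave(direction):
    """Return signal[electrode][sample]; direction=+1 travels from index 0 upward."""
    return [
        [
            math.cos(2 * math.pi * (TEMPORAL_CYCLES * t / N_SAMPLES
                                    - direction * SPATIAL_CYCLES * x / N_ELECTRODES))
            for t in range(N_SAMPLES)
        ]
        for x in range(N_ELECTRODES)
    ]


def spectral_power(signal, temporal_bin, spatial_bin):
    """Power of a single 2D DFT coefficient, computed from the definition."""
    coeff = 0j
    for x, row in enumerate(signal):
        for t, value in enumerate(row):
            phase = -2 * math.pi * (temporal_bin * t / N_SAMPLES
                                    + spatial_bin * x / N_ELECTRODES)
            coeff += value * cmath.exp(1j * phase)
    return abs(coeff) ** 2


def direction_powers(signal):
    """Return (forward_power, backward_power) at the wave's frequency pair."""
    forward = spectral_power(signal, TEMPORAL_CYCLES, -SPATIAL_CYCLES)
    backward = spectral_power(signal, TEMPORAL_CYCLES, SPATIAL_CYCLES)
    return forward, backward


# A wave travelling from index 0 upward puts its power in the "forward" bin.
fw_power, bw_power = direction_powers(travelling_wave(+1))
```

      Reversing the direction argument moves the power to the other bin, which is the property that a quadrant-based comparison of the 2D spectrum relies on.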

      Finally, the way Bayes Factors from the Bayesian ANOVA are presented, especially with those close to the ‘meaningful boundaries’ ⅓ and 3, as defined in the ‘Statistical analysis’ section, requires some unification/revision. For example, here: “We found a positive correlation between contra- and ipsi- lateral backward waves, and occipital (all Pearson’s r~=0.4, all BFs 10 ~=3) and -to a smaller extent- frontal areas (all Pearson’s r~=0.3, all BFs 10 ~=2).”, where the second part should strictly be labelled as inconclusive evidence. In the same vein, there is occasional mention of “negative effects”, where it should say that evidence favours the absence of an effect.

      We agree with the reviewer and apologize for the inaccuracies in reporting the statistical analysis. We corrected as suggested (see below), replacing ‘negative effects’ with ‘evidence favors the absence of an effect’.

      From the updated manuscript :

      "We found moderate evidence of a positive correlation between contra- and ipsi- lateral backward waves, and occipital (all Pearson’s r~=0.4, all BFs10~=3) but inconclusive evidence in the frontal areas (all Pearson’s r~=0.3, all BFs10~=2)."

      From the revised ‘Results’ section, now it reads:

      […] whereas all other factors and their interactions revealed evidence in favor of the absence of an effect (BFs10<0.3).

      […] but not in the forward waves (BF10=0.231, error<0.01%, supporting evidence in favor of the absence of an effect).
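
      The labelling convention behind these corrections can be summarized in a small helper. This is a sketch that assumes the boundaries 1/3 and 3 mentioned by the reviewer; the function name and the exact label wording are illustrative and not taken from the manuscript.

```python
def label_bayes_factor(bf10, lower=1 / 3, upper=3.0):
    """Map a Bayes factor BF10 onto a verbal label using two boundaries."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 >= upper:
        return "evidence for an effect"
    if bf10 <= lower:
        return "evidence for the absence of an effect"
    return "inconclusive"


# Values quoted in the response: the forward-wave test and the frontal correlation.
forward_label = label_bayes_factor(0.231)
frontal_label = label_bayes_factor(2.0)
```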

      Reviewer #2 (Public Review):

      The present manuscript takes a new perspective and investigates the functional relevance of traveling alpha waves’ direction for visual spatial attention. While the modulation of alpha oscillatory power - and especially the lateralization of alpha power - has been associated with spatial attention in the literature, the present investigation offers a new perspective that helps understand and differentiate the functional roles of alpha oscillations in the ipsi- versus contralateral hemisphere for spatial attention.

      The present study uses a straightforward approach and provides an analysis of two EEG datasets, which are convergingly in line with the authors’ claim that two patterns of travelling alpha waves need to be differentiated in visual spatial attention. First, backward waves in the ipsilateral hemisphere, and second, forward waves in the contralateral hemisphere, which are only observed during visual stimulation. Importantly, the authors test the relation of these patterns of traveling waves to the overall power of alpha oscillations and to the hemispheric lateralization of alpha power. Furthermore, to test the functional significance, the authors demonstrate that the pattern of forward and backward waves around stimulus onset differentiates between hits and misses in task performance.

      Although the results are in line with the conclusions drawn, some questions remain. The authors investigate the relationship between traveling alpha waves and the hemispheric lateralization of alpha power, which is a well-established neural signature of spatial attention. Surprisingly, the lateralization of alpha power shown in Figure 3B appears relatively weak in the present dataset (by visual inspection), which raises the question of whether the investigation of a relation between lateralized alpha power and alpha traveling waves is warranted in the first place.

      We agree with the reviewer that the effect seems reduced compared to other studies, although the topography of alpha-band lateralization in our data is in line with the literature. In order to quantify the effect, we performed an analysis similar to that of Thut et al. (2006), defining a laterality index as:

      We computed such index for occipital electrodes and their average (in red in figure R1). The results reveal that for most electrodes, including their average, the laterality index is significantly larger than 0, confirming the presence of alpha-band lateralization. However, we also note that the amplitude of the effect (~0.04) is reduced compared to the study by Thut and colleagues, which was between 0.05 and 0.10.

      Figure R1 – Laterality index for occipital electrodes, quantifying alpha-band lateralization during attention allocation. All electrodes go in the expected direction, revealing an increase of alpha-band power in the ipsilateral occipital hemisphere.
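
      A minimal sketch of such an index, assuming the common normalized-difference form (ipsilateral minus contralateral alpha power, divided by their sum). This form is an assumption chosen so that positive values indicate more ipsilateral power, as described for Figure R1; the exact definition used in the response may differ, and the input values below are hypothetical.

```python
def laterality_index(alpha_ipsi, alpha_contra):
    """Normalized ipsi-minus-contra difference in alpha power (assumed form)."""
    total = alpha_ipsi + alpha_contra
    if total <= 0:
        raise ValueError("alpha power values must sum to a positive number")
    return (alpha_ipsi - alpha_contra) / total


# Hypothetical alpha power values (arbitrary units) for one occipital electrode.
index = laterality_index(alpha_ipsi=5.2, alpha_contra=4.8)  # positive: more ipsilateral power
```

      Because the difference is divided by the total power, the index is bounded between -1 and 1 and can be compared across electrodes and participants with different overall alpha amplitude.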

      Furthermore, the authors employ between-subject correlations (with N = 16) to test the relationship between alpha traveling waves and (lateralized) alpha power. However, as inter-individual differences in patterns of travelling waves are not the main focus here, within-subject analyses of the same relations would be able to test the authors’ hypotheses much more directly.

      As suggested, we included the recommended within-subject analysis in the revised manuscript by computing a trial-by-trial correlation between alpha power and traveling waves for each participant. First, we obtained a correlation coefficient and a p-value for each subject. Then, we tested whether the correlation coefficients had an overall positive or negative distribution (i.e., according to our previous results, we expected a positive correlation between backward waves and alpha power). Additionally, we combined the p-values to test for overall significance (using the Fisher method, see Methods section below). Our results corroborate the between-subject correlation, supporting the conclusion that alpha-band power correlates mostly with backward waves (especially contralateral to the attended location). The other correlations (i.e., forward waves and alpha power) were statistically inconclusive. We have included these new results in the revised manuscript, as shown below.

      From the Results section:

      “To further investigate the relation between alpha-band travelling waves and alpha power, we performed the same analysis focusing on the correlation within each participant. In particular, we correlated trial-by-trial forward and backward waves with alpha-band power for each subject, obtaining correlation coefficients ‘r’ and their respective p-values. As in the previous analysis, we correlated forward and backward waves with frontal and occipital electrodes in both contra- and ipsilateral hemispheres. We applied the Fisher method (Fisher, 1992, see Methods for details) to combine all subjects' p-values in every condition. Overall, we found a significant effect of all combined p-values (p<0.0001), except in the lateralization condition (contra- minus ipsilateral hemisphere), similar to our previous analysis. Additionally, we tested for a consistent positive or negative distribution of the correlation coefficients. As shown in figure 3C, the results support a significant correlation between backward waves and alpha power in the hemisphere contralateral to the attended location (BF10=10.7 and BF10=7.4 for occipital and frontal regions, respectively; all other BF10 were between 1 and 2, providing inconclusive evidence). Interestingly, this analysis also revealed a small but consistent effect in the correlation between lateralization effects, as we reported a consistently positive correlation in the contra- minus ipsilateral difference between forward waves and alpha power (BF10~5 for both frontal and occipital electrodes). However, it is important to note that the combined p-values obtained using the Fisher method did not reach the significance threshold in the lateralization condition, reducing the relevance of this specific result.”

      From the Methods section:

      “Additionally, we computed trial-by-trial correlations between waves and alpha power for all participants. First, we tested the correlation coefficient against zero in all conditions. Then, we obtained a combined p-value per condition using the log/lin regress Fisher method (Fisher, 1992), as shown in (Zoefel et al., 2019). Specifically, we computed the T value of a chi-square distribution with 2*N degrees of freedom from the p_i values of the N participants as:
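
      In its standard form, Fisher's method combines N independent p-values p_i into the statistic T = -2 * sum(ln p_i), which follows a chi-square distribution with 2N degrees of freedom under the null hypothesis. A minimal Python sketch of that computation, assuming independent per-participant p-values (the function name and the example values are illustrative and not taken from the manuscript):

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (T, combined_p), where T = -2 * sum(ln p_i) and combined_p is
    the upper-tail probability of a chi-square distribution with 2N
    degrees of freedom evaluated at T.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    n = len(p_values)
    t_stat = -2.0 * sum(math.log(p) for p in p_values)
    # For an even number of degrees of freedom (2N) the chi-square
    # upper-tail probability has a closed form:
    #   P(X > t) = exp(-t/2) * sum_{k=0}^{N-1} (t/2)^k / k!
    half = t_stat / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return t_stat, math.exp(-half) * total


# Hypothetical p-values for four participants (illustrative only).
t_stat, p_combined = fisher_combined_p([0.04, 0.20, 0.01, 0.08])
```

      With a single participant the combined p-value reduces to that participant's own p-value, which is a quick sanity check on the implementation.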

      It needs to be appreciated that the authors analyze two datasets in the present study. However, the question remains whether the absence of the forward waves effect in paradigms without visual stimulation is a general one and would replicate in other datasets. Moreover, the manuscript would benefit from a discussion of the potential implications of traveling waves for functional connectivity between posterior and anterior regions.

      We have now included a third dataset in the paper. In this dataset, from (Feldmann-Wüstefeld & Vogel, 2019), participants performed a visual working memory task by attending either the left or the right side of the screen where a stimulus was displayed. We analyzed the amount of waves during stimulus presentation, and we found the same results as in our own dataset: very strong evidence in favor of an interaction between LATERALITY (contra- and ipsilateral) and DIRECTION (FW and BW). We now included the results in figure 2 (see point above) and in the results section of the manuscript. Unfortunately, we couldn't find any other publicly available EEG dataset in which participants attend to either side of the screen without ongoing visual stimulation.

      In addition, we re-analyzed our main findings (i.e. the interaction between LATERALITY and DIRECTION) in all three datasets using a classic ANOVA to report the effect size as 𝜂2 (see point above). Unlike the Bayesian ANOVA (which -in JASP- is based on linear mixed models), the classic one does not model the slope of the random effects. Yet, we observed that the LATERALITY x DIRECTION interaction in the Foster dataset proved very significant, with a large effect size (F(1,16)=9.81, p=0.003, 𝜂2=0.13). Supposedly, modeling the slope of the random effects in the Bayesian ANOVA lowered its statistical sensitivity. For the sake of completeness, we reported both results in the manuscript.

      Concerning the potential implications of traveling waves on functional connectivity, we consider the interpretation based on the Predictive Coding scheme in the penultimate paragraph of the discussion (reported below for the reviewer’s convenience). In this framework, top-down connections have inhibitory functions, suppressing the predicted activity in lower regions. These interpretations align with our findings, relating the inhibitory role of backward travelling waves to visual attention. Similarly, in the same paragraph, we refer to the work of Spratling, which extensively investigates the relationship between selective attention and Predictive Coding.

      From the Results section:

      "To confirm our previous results, we replicated the same traveling waves analysis on two publicly available EEG datasets in which participants performed similar attentional tasks (experiment 1 of Foster et al., 2017 and experiment 1 of Feldmann-Wüstefeld and Vogel, 2019). In the first experiment from the Feldmann-Wüstefeld and Vogel dataset, participants were instructed to perform a visual working memory task in which, while keeping a central fixation, they had to memorize a set of items while ignoring a group of distracting stimuli. We focused our analysis on those trials in which the visual items to remember were placed either to the right or the left side of the screen, while the distractors were either in the upper or lower part of the screen (we pulled together the trials with either 2 or 4 distractors, as this factor was irrelevant for the purposes of our analysis). The stimuli were shown for 200ms, and we computed the amount of forward and backward waves in the 500ms following stimulus onset. As shown in figure 2 (central column), the analysis confirmed our previous results, demonstrating a strong interaction between the factors DIRECTION and LATERALITY (BF10=667, error~2%; independently, the factors DIRECTION and LATERALITY had BF10=0.2 and BF10=0.4, respectively). These results confirmed that, in the presence of visual stimulation, spatial attention modulates both forward and backward waves. Next, we analyzed another publicly available dataset from Foster et al., 2017. [...]"

      "Remarkably, as shown in figure 2 (right panel), our analysis demonstrated an effect of the lateralization (LATERALITY: BF10=3.571, error~1%), revealing more waves contralateral to the attended location, but inconclusive results regarding the interaction between DIRECTION and LATERALITY (BF10=2.056, error~1%). However, using a classical ANOVA (i.e., without modeling the slope of the random terms), the interaction between DIRECTION and LATERALITY proved significant (F(1,16)=9.81, p=0.003, 𝜂2=0.13)."

      From the Methods section:

      "We included two additional datasets in this study. In both studies, participants performed a visual attention task while keeping their fixation in the center of the screen. Regarding the Feldmann-Wüstefeld and Vogel, 2019 study, participants were asked to memorize the colors of two stimuli while ignoring a set of distractors stimuli. We analyzed uniquely those trials in which the visual stimuli were presented to the left or right side of the screen, while the distractors were placed above or below the fixation cross. After 500ms of the fixation cross, two colored 'target' stimuli were presented for 200ms. Participants were asked to memorize these stimuli, and a new 'probe’ stimulus was shown after an additional second. Participants reported whether the probe matched the target stimuli or not. We analyzed the traveling waves in the 500ms following the target stimulus onset. Participants performed a spatial attention task in the second dataset from Foster et al. 2017. First, the fixation cross cued participants to covertly attend one of eight possible spatial positions uniformly distributed around the center of the screen. After one second, a digit was displayed either in the cued location or in any other one. The remaining locations were filled with letters. Participants were instructed to report the only displayed digit. We analyzed the waves the second before the stimuli onset when participants attended to the locations cued to the left or right side of the screen (we discarded trials in which participants attended locations above or below the fixation cross). For additional details about both experimental procedures, we refer the reader to Foster et al., 2017 and Feldmann-Wüstefeld and Vogel, 2019.”

      From the discussion:

      "Our previous work proposed an alternative cause for the generation of cortical waves (Alamia and VanRullen, 2019). We demonstrated that a simple multi-level hierarchical model based on Predictive Coding (PC) principles and implementing biologically plausible constraints (temporal delays between brain areas and neural time constants) gives rise to oscillatory traveling waves propagating both forward and backward. This model is also consistent with the 2-dipoles hypothesis (Zhigalov and Jensen, 2022), considering the interaction between the parietal and occipital areas (i.e., a model of 2 hierarchical levels). However, dipoles in parietal regions are unlikely to explain the observed pattern of top-down waves, suggesting that more frontal areas may be involved in generating the feedback. This hypothesis is in line with the PC framework, in which top-down connections have an inhibitory function, suppressing the activity predicted by higher-level regions (Huang and Rao, 2011). Interestingly, Spratling proposed a simple reformulation of the terms in the PC equations that could describe it as a model of biased competition in visual attention, thus corroborating the interpretation of our finding within the PC framework (Spratling, 2008, 2012)."

    1. Author Response

      Reviewer #1 (Public Review):

      The authors developed a new concept: Skeletal age, which is chronological age + years lost due to suffering a low-energy fracture. There seem to be conceptual problems with this concept: It is not known if the years lost are lost due to the fracture or co-morbidities.

      The Reviewer raises an important point, and we are happy to discuss it as follows. While it is not possible to show the causal relationship between a fragility fracture and excess mortality, it has been shown repeatedly that a fracture is associated with an increased risk of premature mortality after accounting for comorbidities and frailty. Indeed, we and others have found that comorbidities contribute little to the increased risk10,11. Moreover, in a previous study using the ‘relative survival analysis’ technique12, we have shown that hip and proximal fractures were associated with reduced life expectancy after accounting for time-related changes in background mortality in the population, suggesting that hip and proximal fractures are an independent clinical risk factor for mortality.

      In this study, we used a multivariable Cox’s proportional hazards model to adjust for confounding effects of age and severity of comorbidities, and our result clearly indicated that a fracture is associated with years of life lost. Moreover, comorbidities were considered a factor in an individual's risk profile for estimating skeletal age. As a result, skeletal age reflects the common real-world scenario that the combination of comorbidities and proximal or lower leg fractures compounded post-fracture excess mortality, much greater than each alone13.

      Technically, there are two steps to individualise skeletal age for each individual with a specific risk profile. First, we used the statistical approach recommended for the individualisation of survival time prediction using statistical models14 to individualise specific mortality risk for each participant with a specific risk profile. Specifically, we calculated the prognostic risk index as a single-number summary of the combined effects of his/her specific risk profile of a specific fracture site and the severity of comorbidity. His/her individualised fracture-mortality association was then computed as the difference between his/her prognostic index and the mean prognostic index of “typical” people in the general population. In the second step, we used the Gompertz law of mortality and the Danish national lifetable data to transform the individualised association into life expectancy loss as a result of a fracture15.
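      The two steps above can be sketched in code. This is a minimal sketch under stated assumptions, not the authors' implementation: the Gompertz parameters `a` and `b` are illustrative placeholders (the study used Danish national lifetable data), the regression coefficients in the usage example are invented, and all function names are hypothetical. The sketch computes a prognostic index as a linear combination of risk factors, converts its difference from the population mean into a hazard ratio, and then compares remaining life expectancy with and without that hazard ratio under a Gompertz hazard.

```python
import math


def prognostic_index(coefficients, covariates):
    """Linear combination of risk factors weighted by Cox regression coefficients."""
    return sum(coefficients[name] * covariates[name] for name in coefficients)


def life_expectancy(age, hazard_ratio=1.0, a=5e-5, b=0.09, horizon=120.0, step=0.01):
    """Remaining life expectancy at `age` under the hazard hazard_ratio * a * exp(b * age).

    `a` and `b` are assumed, illustrative Gompertz parameters.
    """
    def survival(t):
        cumulative_hazard = (
            hazard_ratio * (a / b) * math.exp(b * age) * (math.exp(b * t) - 1.0)
        )
        return math.exp(-cumulative_hazard)

    # Trapezoidal integration of the survival curve over [0, horizon].
    n = int(round(horizon / step))
    total = 0.5 * (survival(0.0) + survival(n * step))
    for i in range(1, n):
        total += survival(i * step)
    return total * step


def skeletal_age(age, individual_pi, mean_pi, **gompertz):
    """Chronological age plus years of life lost implied by the individual's risk profile."""
    hazard_ratio = math.exp(individual_pi - mean_pi)
    years_lost = life_expectancy(age, 1.0, **gompertz) - life_expectancy(
        age, hazard_ratio, **gompertz
    )
    return age + years_lost


if __name__ == "__main__":
    # Hypothetical coefficients and risk profile, for illustration only.
    coefs = {"hip_fracture": 0.7, "comorbidity_score": 0.2}
    person = {"hip_fracture": 1, "comorbidity_score": 1}
    typical = {"hip_fracture": 0, "comorbidity_score": 1}
    pi_person = prognostic_index(coefs, person)
    pi_typical = prognostic_index(coefs, typical)
    print(f"skeletal age: {skeletal_age(60, pi_person, pi_typical):.1f}")
```

      Under a Gompertz hazard, multiplying the hazard by a constant is equivalent to shifting age by ln(HR)/b years, which is why a hazard ratio can be translated into an age-like quantity at all.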

      We have modified part of the description of the methodology as follows:

      “For the second aim, we determined skeletal age for each individual based on the individual’s specific risk profile. First, we calculated the prognostic risk index as a single-number summary of the combined effects of his/her specific fracture site and the severity of comorbidity51. The prognostic index is a linear combination of the risk factors with weights derived from the regression coefficients. The individualised fracture-mortality association for an individual with a specific risk profile is then the difference between the individual's prognostic index and the mean prognostic index of 'typical' people in the general population51. In the second step, we used the Gompertz law of mortality and the Danish national lifetable data to transform the excess mortality into life expectancy loss as a result of a fracture49.”.

      In addition, with the possible exception of zoledronate after hip fracture, we have no evidence that this increased risk of mortality can be changed with interventions.

      We agree that there is a lack of strong evidence from randomised controlled trials supporting the benefit of anti-resorptive therapy on post-fracture survival. As mentioned above, the mention of zoledronic acid was simply for illustrating the use of skeletal age to convey a treatment benefit. We have decided to remove the section related to the benefit of pharmacological treatment on post-fracture mortality.

      Furthermore, it is not clear why the authors think that patients and doctors will better understand the implications of older "skeletal age", on future fracture risk and the need for prevention, for example, the 10-year risk of MOF? Knowing that my bones are older than me, could make a patient feel even more fragile and afraid of being physically active. The treatment will reduce the risk of future fractures, but this study provides no information about the effect on mortality of preventing the subsequent fracture or the risk of mortality associated with recurrent fractures.

      The risk of fracture is typically conveyed to patients and the public in terms of absolute risk metric (e.g., probability) or relative risk metrics (e.g., risk ratio). However, patients and doctors often struggle to comprehend probabilistic statements such as 'Your risk of death over the next 10 years is 5% if you have suffered from a bone fracture'. The underappreciation of post-fracture mortality's gravity has caused patients to be hesitant towards treatment and prevention, contributing to the current crisis of osteoporosis treatment.

      We consider that skeletal age will make doctor-patient risk communication more intuitive and probably more effective. For example, for the same 2-fold increased mortality risk of hip fracture, telling a 60-year-old man with a hip fracture that his skeletal age would be 66 years old, equivalent to a 6-year loss of life, is much more intuitive. The patient might thus be more likely to accept the recommended pharmacological treatment, ultimately improving health benefits. However, we do not yet have RCT evidence for the effectiveness of skeletal age, and this will be one of our future research foci. We would like to point out that there is RCT evidence that effective age (such as 'Heart Age', 'Lung Age') could improve the uptake of preventive actions. For example, informing patients about their heart age, as shown by Lopez-Gonzalez et al16, was found to improve their cardiovascular risk more than informing them of the Framingham probabilistic risk score.

      Introduction:

      The statement that treatment reduces the risk of dying, needs modification as the majority of clinical trials have not demonstrated reduced mortality with treatment.

      We have modified the statement as follows: “In randomised controlled trials, treating high-risk individuals with bisphosphonates or denosumab reduces the risk of fracture4, though whether the reduction translates into reduced mortality risk remains contentious5, 6.”

      It is not clear how the skeletal age captures the risk of a future fracture. The other difference between the idea of "skeletal age" and for example "heart age" is that there are treatments available for heart disease that reduce the risk of mortality, as mentioned above this has not been shown consistently in clinical trials in osteoporosis.

      We take the Reviewer's point, but we would like to point out that there are at least two RCTs on zoledronic acid showing that treating patients with a fragility fracture reduces their risk of mortality17,18.

      Because the risk profile that is associated with a post-fracture mortality is also associated with the risk of fracture, skeletal age can be seen as a measure of the decline of the skeleton due to a fracture or exposure to risk factors that raise the risk of fracture. Thus, a 60-year-old with a skeletal age of 66 is in the same risk category as a 66-year-old with 'favourable risk factors' or at least the ones that are potentially modifiable. Hence, an older skeletal age means a greater risk of fracture.

      Neither the “Skeletal Age” nor the “Heart Age”16,19,20 has the treatment intervention incorporated into its calculator. We have added details to explain how the assessment of skeletal age would provide the conceptual risk of both fracture and post-fracture mortality as follows:

      “Unlike the current fracture risk assessment tools17 which estimate the probability of fracture over a period of time using probability-based metrics, such as relative risk and absolute risk, skeletal age quantifies the consequence of a fracture using a natural frequency metric. A natural frequency metric has been consistently shown to be easier and more friendly to doctors and patients than the probability-based metrics9 11 30. It is not straightforward to appreciate the importance of the two-fold increased risk of death (i.e., relative risk = 2.0) without knowing the background risk (i.e., a 2-fold increase from 1% differs markedly from a 2-fold increase from 10%). By contrast, for the same 2-fold mortality risk of hip fracture, telling a 60-year-old man with a hip fracture that his skeletal age would be 66 years old, equivalent to a 6-year loss of life, is more intuitive. The skeletal age can also be interpreted as the individual being in the same risk category as a 66-year-old with 'favorable risk factors' or at least the ones that are potentially modifiable. Hence, an older skeletal age means a greater risk of fracture.”.
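      The point about background risk can be made concrete with a short calculation. The sketch below is illustrative only: the function name `absolute_risk_after` is hypothetical, and the baseline risks of 1% and 10% are the two values mentioned in the quoted passage. It shows that the same relative risk of 2.0 corresponds to very different absolute increases.

```python
def absolute_risk_after(baseline_risk, relative_risk):
    """Absolute risk implied by applying a relative risk to a baseline risk."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    # A probability cannot exceed 1, so cap the result.
    return min(1.0, baseline_risk * relative_risk)


if __name__ == "__main__":
    for baseline in (0.01, 0.10):
        exposed = absolute_risk_after(baseline, 2.0)
        increase_pp = (exposed - baseline) * 100.0
        print(
            f"baseline {baseline:.0%}: risk after doubling {exposed:.0%} "
            f"(absolute increase {increase_pp:.0f} percentage points)"
        )
```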

      Discussion:

      The prevalent comorbidities; cardiovascular diseases, cancer, and diabetes, suggest that fracture patients die from their comorbidities and not their fractures.

      Please refer to the above response for more detail. Briefly, the multivariable Cox’s proportional hazards regression adjusted for the confounding effect of age and the severity of comorbidities, indicating the association between fracture and mortality was independent of aging and comorbidity severity. On the other hand, skeletal age is a measure of excess mortality related to either fracture or co-morbidities or both.

      The discussion should be more balanced as there is a number of clinical trials demonstrating reductions in vertebral and non-vertebral fractures without effect on mortality. There may be specific effects of zoledronate on mortality, but that has not been shown for the vast majority of treatments.

      Please refer to the above response for more detail. Specifically, as the study primarily aimed at introducing skeletal age as a new metric for risk communication, we have decided to omit the paragraph discussing the potential benefit of zoledronic acid on post-fracture mortality risk in order to maintain the clarity and focus of the study.

      It is not correct that FRAX does not take mortality into account? It does not tell you specifically how high the risk of dying and how high the risk of a fracture is but integrates the two. "Skeletal age" does not provide either information, it just tells you that your skeleton is older than your chronological age - most patients and doctors will not associate that with an increased risk of dying - only of frailty.

      Although it is commonly believed that FRAX accounts for the competing risk of death, it does not provide the risk of post-fracture mortality. Indeed, none of the current fracture risk assessment tools was designed to provide post-fracture mortality risk5. Skeletal age fills the gap by providing the excess mortality following a fracture for an individual with a specific risk profile.

      The statement that zoledronate reduces the "skeletal age" by 3 years, has not been demonstrated and it is not clear how this can be demonstrated by the analysis reported here. As the reduced mortality has only been shown for the Horizon RFT, this cannot be inferred for other treatments and other fracture types. The information provided by the "skeletal age" is only that the fracture you already had took x years of your remaining lifetime. With the exception of perhaps zoledronate after hip fracture, we have no indication from clinical trials that the treatment of osteoporosis will change this.

      The current study was not designed to examine the effectiveness of an intervention. The statement related to the survival benefit of zoledronate is used to illustrate how skeletal age is used to convey the treatment benefit in real-world doctor-patient risk communication. Given the hazard ratio of 0.72 for zoledronate-mortality association17, a patient might find the statement “Zoledronic acid treatment helps a patient with a hip fracture gain (back) 3 years of life” much easier to understand and probably more persuasive than the traditional statement of “Zoledronic acid treatment reduced the risk of death by 28%”.

      Reviewer #2 (Public Review):

      The paper of Tran et al. introduces the concept of 'skeletal age' as a means of conveying the combined risk of fracture and fracture-associated mortality for an individual. Skeletal age is defined as the sum of chronological age and the number of years of life lost associated with a fracture. Using the very comprehensive Danish national registry and employing Cox's proportional hazards model they estimated the hazard of mortality associated with a fracture. Skeletal age was estimated for each age and fracture site stratified by gender. The authors propose to replace the fracture probability with skeletal age for individualized fracture risk assessment.

      Strengths of the study lie in the novelty of the concept of 'skeletal age' as an informative metric to internalize the combined risks of fracture and mortality, the very large and well-described Danish National Hospital Discharge Registry, the sophisticated statistical analysis and the clear messages presented in the manuscript. The limitations of the study are acknowledged by the authors.

      We appreciate your positive remark that captures the essence of our work.

      References:

      1. Lujic S, Simpson JM, Zwar N, Hosseinzadeh H, Jorm L. Multimorbidity in Australia: Comparing estimates derived using administrative data sources and survey data. PloS one 2017; 12(8): e0183817.
      2. Andersen TF, Madsen M, Jorgensen J, Mellemkjoer L, Olsen JH. The Danish National Hospital Register. A valuable source of data for modern health sciences. Dan Med Bull 1999; 46(3): 263-8.
      3. Vestergaard P, Mosekilde L. Fracture risk in patients with celiac Disease, Crohn's disease, and ulcerative colitis: a nationwide follow-up study of 16,416 patients in Denmark. Am J Epidemiol 2002; 156(1): 1-10.
      4. Hundrup YA, Hoidrup S, Obel EB, Rasmussen NK. The validity of self-reported fractures among Danish female nurses: comparison with fractures registered in the Danish National Hospital Register. Scand J Public Health 2004; 32(2): 136-43.
      5. Beaudoin C, Moore L, Gagne M, et al. Performance of predictive tools to identify individuals at risk of non-traumatic fracture: a systematic review, meta-analysis, and meta-regression. Osteoporos Int 2019; 30(4): 721-40.
      6. Spiegelhalter D. How old are you, really? Communicating chronic risk through 'effective age' of your body and organs. BMC Med Inform Decis Mak 2016; 16: 104.
      7. Vestergaard P, Rejnmark L, Mosekilde L. Osteoporosis is markedly underdiagnosed: a nationwide study from Denmark. Osteoporos Int 2005; 16(2): 134-41.
      8. Roerholt C, Eiken P, Abrahamsen B. Initiation of anti-osteoporotic therapy in patients with recent fractures: a nationwide analysis of prescription rates and persistence. Osteoporos Int 2009; 20(2): 299-307.
      9. Cummings SR, Lui LY, Eastell R, Allen IE. Association Between Drug Treatments for Patients With Osteoporosis and Overall Mortality Rates: A Meta-analysis. JAMA Int Med 2019; 179(11): 1491-500.
      10. Chen W, Simpson JM, March LM, et al. Comorbidities Only Account for a Small Proportion of Excess Mortality After Fracture: A Record Linkage Study of Individual Fracture Types. J Bone Miner Res 2018; 33(5): 795-802.
      11. Vestergaard P, Rejnmark L, Mosekilde L. Increased mortality in patients with a hip fracture-effect of pre-morbid conditions and post-fracture complications. Osteoporos Int 2007; 18(12): 1583-93.
      12. Tran T, Bliuc D, Hansen L, et al. Persistence of Excess Mortality Following Individual Nonhip Fractures: A Relative Survival Analysis. J Clin Endocrinol Metab 2018; 103(9): 3205-14.
      13. Tran T, Bliuc D, Ho-Le T, et al. Association of Multimorbidity and Excess Mortality After Fractures Among Danish Adults. JAMA Netw Open 2022; 5(10): e2235856.
      14. Henderson R, Keiding N. Individual survival time prediction using statistical models. J Med Ethics 2005; 31(12): 703-6.
      15. Kulinskaya E, Gitsels LA, Bakbergenuly I, Wright N. Calculation of changes in life expectancy based on proportional hazards model of an intervention. Insur Math Econ 2020; 93: 27-35.
      16. Lopez-Gonzalez AA, Aguilo A, Frontera M, et al. Effectiveness of the Heart Age tool for improving modifiable cardiovascular risk factors in a Southern European population: a randomized trial. Eur J Prev Cardiol 2015; 22(3): 389-96.
      17. Lyles KW, Colon-Emeric CS, Magaziner JS, et al. Zoledronic acid and clinical fractures and mortality after hip fracture. N Engl J Med 2007; 357(18): 1799-809.
      18. Reid IR, Horne AM, Mihov B, et al. Fracture Prevention with Zoledronate in Older Women with Osteopenia. N Engl J Med 2018; 379(25): 2407-16.
      19. Bonner C, Batcup C, Cornell S, et al. Interventions Using Heart Age for Cardiovascular Disease Risk Communication: Systematic Review of Psychological, Behavioral, and Clinical Effects. JMIR Cardio 2021; 5(2): e31056.
      20. Svendsen K, Jacobs DR, Morch-Reiersen LT, et al. Evaluating the use of the heart age tool in community pharmacies: a 4-week cluster-randomized controlled trial. Eur J Public Health 2020; 30(6): 1139-45.
      21. Suissa S. Immortal time bias in pharmaco-epidemiology. Am J Epidemiol 2008; 167(4): 492-9.
    1. Author Response

      Reviewer #1 (Public Review):

      The authors use a newly developed object-space memory task comprising a "Stable" version and an "Overlapping" version where two objects are presented in two locations per trial in a square open field. Each version consists of 5 training trials of 5-min presentations of an object-space configuration, with both object locations staying constant across training trials in the Stable condition, and only one object location staying fixed in the Overlapping condition. Memory is tested in a test trial 24 hours later where the opposite configuration is presented - overlapping configuration presented for the Stable condition and stable configuration presented for the Overlapping condition - with the thesis that memory in this test trial for the Overlapping condition will depend on the accumulated memory of spatial patterns over the training trials, whereas memory for the test trial in the Stable condition can be due to episodic memory of last trial or accumulated memory. Memory is quantified using a Discrimination Index (DI), comparing the amount of time animals spend exploring the two object locations.
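      For readers unfamiliar with this kind of measure, a discrimination index can be sketched as below. The exact formula is not given in this review, so the sketch assumes a commonly used definition: the difference in exploration time between the two object locations divided by the total exploration time. The function and argument names are hypothetical.

```python
def discrimination_index(time_moved, time_stable):
    """Assumed DI: (moved - stable) / (moved + stable), ranging from -1 to 1.

    Zero means equal exploration of both locations (chance level); positive
    values mean more time at the location whose object moved.
    """
    if time_moved < 0 or time_stable < 0:
        raise ValueError("exploration times must be non-negative")
    total = time_moved + time_stable
    if total == 0:
        raise ValueError("no exploration recorded")
    return (time_moved - time_stable) / total


if __name__ == "__main__":
    # Illustrative exploration times in seconds.
    print(discrimination_index(40.0, 20.0))
```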

      Here, animals in other groups are also presented with an interference trial equivalent to the test trial, to test if the memory of the Overlapping condition can be disrupted. The behavioral data show that for RGS14 over-expressing animals, memory in the Overlapping condition is diminished compared to controls with no interference or controls where over-expression is inhibited, whereas memory in the Stable condition is enhanced. This is interpreted as interference in semantic-like memory formation, whereas one-shot episodic memory is improved. The authors speculate that increased cortical plasticity should lead to increased and larger delta waves according to the sleep homeostasis hypothesis, and observe that instead increased cortical plasticity leads to less non-REM sleep and smaller delta waves, with more prefrontal neurons with slower firing rates (presumably more plastic neurons). They further report increased hippocampal-cortical theta coherence during task and REM sleep, increased NonREM oscillatory coupling, and changes in hippocampal ripples in RGS14 over-expressing animals.

      While these results are interesting, there are several issues that need to be addressed, and the link between physiology and behavioral results is unclear.

      1) The behavioral results rely on the interpretation that the Overlapping condition corresponds to semantic-like memory and the Stable condition corresponds to episodic-like memory. While the dissociation in memory performance due to interference seen in these two conditions is intriguing, the Stable condition can correspond not just to the memory of the previous trial but also accumulated memory of a stable spatial pattern over the 5 testing trials, similar to accumulated memory of a changing spatial pattern in the Overlapping pattern.

      Yes! We completely agree on this. We do not claim the stable condition corresponds to episodic-like memory; instead, we refer to it as simple memory, since it can be solved either way (one-trial memory or cumulative memory). We have now expanded on this in the discussion to make it clearer.

      Here, it is puzzling that in the behavioral control with no interference (Figure 1D), memory in the Stable and Overlapping condition is unchanged in the test trial, with the DI statistically at 0 in the test trial. In the original description of the Object Space task by the authors in the referenced paper, the measure of memory was a Discrimination Index significantly higher than 0 in both the Stable and Overlapping conditions. This discrepancy needs to be reconciled. Is the DI for the interference trial shown in Fig. S1 significantly different than 0? No statistics or description is provided in the figure legend here.

      As mentioned above, we apologize that we oversimplified the description. The 24h interference trial would be what corresponds to the original test trial. We added a clarifying figure for comparison in S1 (bar graph in addition to the violin plot) and stats. Performance was for all groups and conditions above chance, replicating our previous results.

      2) The physiology experiments compare Home cage (HC) conditions to the Object Space task (OS) throughout the manuscript. While some differences are seen in the control and RGS14 over-expressing animals, there is no comparison of the Stable vs. Overlapping condition in the physiology experiments. This precludes making explicit links between physiological observations and behavioral effects.

      As also mentioned above, we have now added analysis exploring the detailed OS conditions. We would like to thank the reviewers for giving us the opportunity of doing so.

      3) The authors speculate that learning will result in larger and more delta waves as per the synaptic homeostasis hypothesis. It should be noted here that an alternative hypothesis is that there should also be a selective increase in synaptic plasticity for learning and consolidation. The authors do observe that control animals show more frequent and higher-amplitude delta waves, but rather than enhancing this process, RGS14 animals with increased plasticity show the opposite effect. How can this be reconciled and linked with the behavioral data in the Stable and Overlapping condition?

      In the context of the Object Space Task, we would expect all behavioural conditions (Stable and Overlapping) to induce synaptic changes, since learning also occurs in the Stable condition (see also performance on the 24h trial). Thus, we would expect homeostatic responses in particular, such as an increase in delta amplitude, for all experiences, regardless of whether subtle statistical rules are presented. In contrast, the Sleep for Active Systems Consolidation Hypothesis proposes that detailed processing, such as extracting underlying regularities, occurs during hippocampal-cortical interactions in the form of delta/ripple/spindle interactions (with different theories emphasising different types of interactions). As mentioned above, we have now added a more specific analysis in this regard, showing that the two OS conditions that involve moving objects (where statistical regularities can potentially be extracted) have a higher percentage of ripples occurring after large slow oscillations than home cage or the simple learning condition Stable. In contrast, RGS14 animals already show higher participation in both control conditions, indicating that in these animals the brain treats all experiences as significant learning conditions, which explains the behavioural effect (increased interference due to better memory for the interference). Further, we expanded the discussion of how, in RGS14 animals, we sometimes see an enhancement of learning effects but sometimes see a more complex interaction compared with what we would expect from physiological learning.

      Similarly, there is an increase in slower-firing neurons in RGS14 over-expressing animals. Slower-firing neurons have been proposed to be more plastic in the hippocampus based on their participation in learned hippocampal sequences, but appropriate references or data are needed to support the assertion that slower-firing neurons in the prefrontal cortex are more plastic.

      As described above, we have expanded the discussion to include other citations that also consider the cortex. We can show that our changes would be expected if one makes the cortex as plastic as the hippocampus.

      4) It is noted that changing cortical plasticity influences hippocampal-cortical coupling and hippocampal ripples, suggesting a cortical influence on hippocampal physiological patterns. It has been previously shown that disrupting prefrontal cortical activity does alter hippocampal ripples and hippocampal theta sequences (Schmidt et al., 2019; Schmidt and Redish, 2021). The current results should be discussed in this context.

      We would like to thank the reviewer for these suggestions; they are now incorporated in the manuscript.

      Reviewer #2 (Public Review):

      In this paper, the authors provide evidence to support the longstanding proposition that a dual-learning system/systems-level consolidation (hippocampus attains memories at a fast pace which are eventually transmitted to the slow-learning neocortex) allows rapid acquisition of new memories while protecting pre-existing memories. The authors leverage many techniques (behavior, pharmacology, electrophysiology, modelling) and report a host of behavioral and electrophysiological changes on induction of increased medial prefrontal cortex (mPFC) plasticity which are interesting and will be of significant interest to the broad readership.

      The experimental design and analyses are convincing (barring some instances which are discussed below). The following recommendations will bolster the strength/quality of the manuscript:

      1) Certain concerns regarding the interpretation and analysis of the behavioral data remain. The authors need to clarify if increased mPFC plasticity leads to only an increase in one-shot memory or 'also' interference of previous information. It seems that the behavioral results could also be explained by the more parsimonious explanation that one-shot memory is improved. Do the current controls tease apart these two scenarios?

      We agree that we cannot disentangle whether one memory is just stronger than the other or whether it is an overwriting effect. We have now added this to the discussion. Of note, we do not think it would actually be possible to distinguish these two effects behaviourally in rodents, or at least we cannot think of a fitting study design that would enable the contrast.

      Additionally, the authors need to clarify why the 'no trial' and 'anisomycin' controls for the stable task perform at chance levels on exposure to a new object-place association on test day (Fig 1D).

      Violin plots are sometimes hard to read. Here are simple bar plots, in which you can see that the animals are not at chance at the 72h test in the control conditions.

      Finally, further description of how the discrimination index ((exploration time of novel - exploration time of familiar) / sum of both) is computed is recommended, i.e., in the stable condition, which 'object' is chosen as 'novel' (as both are in the same locations) for computing the index (Fig 1)? Do negative DI values imply a neophobia to novel objects (and thus are a form of memory)? This is also crucial because the modelling results (Fig 1E) use both neophilia and neophobia, while negative discrimination indexes are considered similar to 0 for interpreting the behavioral results, as stated on page 3, lines 84-86.

      We have now added this to the methods (for Overlapping it is moved location – stable location; for Stable it is location-to-be-moved-at-test – stable location; and for Random the assignment of moved and stable is random; in each case the difference is divided by total time). We agree that neophilia/neophobia (especially changes in the distribution) can be an issue and have discussed it in detail in Schut et al. NLM 2020, where we see differences in absolute beta values (thus controlling for philia/phobia differences). There we also discuss in more detail why it is difficult to control for this in the DI. In short, one could use absolute values, but then it is difficult to determine what a group chance level would look like. Fortunately, there is no issue here, since we did not observe differences in neophilic or neophobic tendencies while running the experiments. Critically, the interference trial (which can also function as a simple test trial) confirms that, as a group, the animals show positive DI and neophilia.
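
      As an illustration only, the index described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the function name, argument names, and the example exploration times are hypothetical, and "moved" stands for whichever location is designated as moved (or to-be-moved) in the given condition.

```python
def discrimination_index(t_moved: float, t_stable: float) -> float:
    """Return (t_moved - t_stable) / (t_moved + t_stable).

    t_moved and t_stable are exploration times (e.g. in seconds) at the
    location designated as moved and at the stable location. Both names
    are hypothetical and chosen only for this sketch.
    """
    total = t_moved + t_stable
    if total <= 0:
        # Without any exploration the index is undefined.
        raise ValueError("total exploration time must be positive")
    return (t_moved - t_stable) / total


# Made-up example: 30 s at the moved location, 10 s at the stable one.
print(discrimination_index(30.0, 10.0))
```

      Under this definition, values above 0 indicate more exploration of the moved location, values near 0 indicate no preference, and values below 0 indicate more exploration of the stable location.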

      2) The authors report lower firing rates in RGS14414 animals during the task in Fig 2F. It is indeed remarkable how large the reported differences are. The authors need to rule out any differences in the behavioral state of the animals in the two groups during the task, i.e., rest vs. active exploration/movement dynamics. Are only epochs during the task while the animals interact with the objects used for computing the firing rates (same epochs as Fig 1)? If not, doing so will provide a useful comparison with Fig 1. Additionally, although the authors make the case for slow firing rate neurons being important for plasticity (based on Grosmark and Buzsaki, 2016), it is crucial to note that the firing rate dynamic (slow vs. fast) in that study for the hippocampus is defined based on the whole recorded session (predominated by sleep), indeed the firing rates of the two groups (slow vs. fast/plastic vs. rigid) during the task/maze-running do not differ in that study. Therefore, the results here seem incongruent with the Grosmark and Buzsaki paper. Since this finding is central to the main claim of the authors, it either warrants further investigation or a re-interpretation of their results.

      As mentioned in the main points, we have now added the firing rate analysis (including new group splits) for wake in the sleep box, NREM and REM separately. The same results are obtained each time. Currently, we do not yet have the tracking and video synchronization set-up; therefore, we cannot split the task for specific behaviours.

      However, we now also cite Buzsaki’s original log-normal brain review, where he first proposed the idea. There he also shows the same effects as we do, in that the general firing rate distribution is the same for task and different sleep stages, just shifted overall. The analysis from Grosmark included a more stringent subselection of neurons in order to also argue that incorporation into run/replay sequences could not have been biased by firing rate per se (instead of plasticity). However, the original proposition from Buzsaki does fit our results. He further presents hippocampus vs cortex firing rates, which also confirm the idea (the hippocampus is more plastic and has slower firing rates). We included this figure above in the general comments. Further, we have now expanded the discussion on this point.

      3) A concern remains as to how many of the electrophysiological changes they observe (firing rate differences, LFP differences including coupling, sleep state differences, Figs. 2-4) support their main hypothesis or are a by-product of injection of RGS14414 (for instance, one might argue that an increased 'capability' to learn new information/more plasticity might lead to more NREM sleep for consolidation, etc.). The authors need to carefully interpret all their data in light of their main hypothesis, which will substantially improve the quality/strength of the manuscript.

      We have now expanded the discussion, given it more structure, and also state that we cannot disentangle whether the cellular changes, the sleep oscillation changes, or an interaction of both is the cause of the result. Furthermore, we added that we cannot distinguish whether the interference memory is stronger or actually overwrites the original training memory.

      Reviewer #3 (Public Review):

      The authors set out to test the idea that memories involve a fast process (for the acquisition of new information) and a slow process (where these memories are progressively transferred/integrated into more-long term storage). The former process involves the hippocampus and the latter the cerebral cortex. This 'dual-learning' system theoretically allows for new learning without causing interference in the consolidation of older memories. They test this idea by artificially increasing plasticity in the pre-limbic cortex and measuring changes in different learning/memory tasks. They also examined electrophysiological changes in sleep, as sleep is linked to memory formation and synaptic plasticity.

      The strengths of the study include a) meticulous analyses of a variety of electrophysiological measurements b) a combination of neurobiological and computational tools c) a largely comprehensive analysis of sleep-based changes. Some weaknesses include questions about the technique for increasing cortical plasticity (is this physiological?) and the absence of some additional experiments that would strengthen the conclusions. However, overall, the findings appear to support the general idea under examination.

      This study is likely to be very impactful as it provides some really new information about these important neural processes, as well as data that challenges popular ideas about sleep and synaptic plasticity.

      We would like to thank the reviewer for these positive comments. Answers to the weaknesses are presented below in the recommendations for the authors.

    1. Author Response

      Reviewer #1 (Public Review):

      I noticed 2 weaknesses, the first related to the killing assays: considering that WT IgG less efficiently promotes complement-mediated phagocytosis of bacteria, one would assume that the ingested bacteria (to be killed) would be lower in neutrophils exposed to this IgG, to begin with - which is not accounted for in the analyses shown.

      We have now included a better explanation of our opsonophagocytic killing assay.

      A second weakness in my mind pertains to the in vivo experiment: the model used obviously requires a very high number of bacteria (the inoculum), somehow indicating that this specific bacterial strain does not lead to progressive infection (i.e. with replicating bacteria) but mice experience a severe acute inflammatory response followed by the rapid elimination of bacteria. This explains the high mortality - and indicates that mice succumb to acute inflammation, rather than the progressive replication of bacteria. To conclusively prove the therapeutic value of those modified antibodies, a clinically more relevant S. pneumoniae model would be helpful.

      The inoculum used in our mouse model was based on a dose-finding study. Although the initial starting dose was 5 × 10⁶ bacteria (based on previously published mouse infection models with S. pneumoniae serotype 6A), we needed a higher dose (1 × 10⁸ bacteria) to reach 80-100% mortality. While we agree that the final dose was relatively high, this does not mean that capsule type 6 is not a clinically relevant strain. It is well known that clinically relevant serotypes in humans are not always invasive in mice (doi: 10.1128/iai.60.1.111-116.1992). This is exactly why we chose to perform in vivo experiments with serotype 6A, which is known to be more invasive in mice (while serotype 6B is more virulent in humans). Of course, while our in vivo data provide an important proof-of-concept for the capacity of hexamer-enhancing mutations to improve protection by anti-capsular antibodies, future studies are needed to verify the potential use of mAbs against other serotypes.

      A third aspect, which should be addressed in the discussion, unless tested and not shown, is how anti-pneumococcal IgM antibodies compare to hexamerized IgGs. Is there any advantage, or do they perform similarly with regards to complement activation?

      We have now generated and tested IgM against CPS6 (Figure 2g). Although anti-CPS6 IgM can induce complement-dependent phagocytosis to some extent, IgM was less potent than IgG variants with hexamer-enhancing mutations. This suggests that complement activation via pre-assembled IgM oligomers was less effective than via IgG hexamers that are formed after target binding.

      These new data are now included in the revised manuscript as Figure 2g and Supplemental Figure 9, and are discussed in the Results section (lines 172-179).

      Reviewer #2 (Public Review):

      The results are intriguing, and one consideration is whether enhancing complement activation is beneficial or harmful for a therapeutic antibody. Based on these results is there the possibility of a natural selection against strong levels of complement activation?

      We appreciate the positive feedback on our work. Indeed, it is believed that there is natural selection against these mutations to avoid uncontrolled complement activation by naturally occurring IgGs in solution. It is important to realize that formation of IgG hexamers is a surface-dependent process. When IgG molecules bind to surface-bound antigens (via Fab), they can organize into higher-ordered hexamers via Fc-Fc interactions. The specific point mutations used in this paper increase hexamer formation after antigen binding on the cell surface. However, at high concentrations of IgG (such as those occurring in our blood, >10 mg/ml), IgG hexamers might be formed independently of target binding (van Kampen et al., Journal of Pharmaceutical Sciences, Volume 111, Issue 6, June 2022, Pages 1587-1598). If naturally occurring IgGs had hexamer-enhancing mutations, IgG hexamers could be formed in solution, resulting in massive complement activation and depletion of the complement system.

      The study clearly shows that the introduction of the hexamerisation mutations affects the ability of the antibodies to bind and activate complement. The studies in Fig 2 examining the role of Fc are particularly elegant. One issue is that it is surprising that the WT IgG1 and IgG3 monoclonals have a minimal capacity to fix and activate complement, despite IgG1/3 to other antigens being efficient isotypes at fixing complement. In the absence of data showing whether IgG1/3 from normal human sera against capsule fixes complement then it is difficult to contextualise these results or to assess if other changes, such as in glycosylation, contribute to the results presented. Related to this, there is reasonable evidence that antibodies induced to capsules can be protective yet the data in Fig 5 suggests that without the mutations then the monoclonals are not effective at all for 6B and only effective at the highest concentration for 19A.

      As mentioned in Essential revision 3, our data with S. aureus antibodies demonstrate that this is not a consequence of how these mAbs are produced or of differences in their Fc glycosylation profile. We agree that there is reasonable evidence that antibodies induced to capsules can be protective. However, not all vaccine serotypes are able to induce strong immune protection. Serotype 6B, for instance, which is covered by current vaccines, is found to be poorly immunogenic (manuscript lines 101-103). In future studies, it would be interesting to find out what causes this difference between mAbs and, specifically in our case, between anti-CPS antibodies.

      The adoptive transfer experiments demonstrate that the antibodies can moderate bacteraemia. The mechanism of this is not explored and the importance of hexamerisation and complement activation not demonstrated, especially as it is not clear if human antibodies and mouse complement are a productive combination in this context.

      We have now included additional phagocytosis assays with mouse sera (supplemental figure 15) that demonstrate that human antibodies and mouse complement are a productive combination.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Silva et al. "Evaluation of the highly conserved S2 hairpin hinge as a pan-coronavirus target" seeks to evaluate a new epitope target on the S2 domain of SARS-CoV2 Spike protein and evaluate its potential as a pan-coronavirus target. This is an impressive combination of extensive structural, HDXMS-based dynamics and antibody engineering approaches. What is missing is a detailed correlation of HDXMS with Spike dynamics. The authors have not examined the allosteric effects of 3A3 binding to the Spike trimer, specifically cooperativity in antibody binding. Does binding of one Fab positively or negatively impact the subsequent binding of antibody? In this regard, readers would benefit from HDXMS spectral envelopes in figures, at least for the epitope locus peptides. Further, what is the effect of the intrinsic ensemble behavior of the Spike protein on 3A3 interactions? In a broader sense antibody binding is assisted by intrinsic trimer ensemble behavior, as observed by the lowered binding to the omicron variant- but are there induced binding effects? It would help to better integrate HDXMS with cryo-EM and antibody engineering. It is a novel, less explored epitope target on the S2 domain. Overall, a more definitive mechanistic conclusion for how targeting the S2 hinge can advance future pan-coronavirus strategies is missing.

      1) Given that the authors have demonstrated ensemble switching behavior from 4 ℃ to 37 ℃ (Costello et al. (2021)) why is this not factored in how the HDXMS is carried out? The samples were stored, frozen at -80 ℃, thawed, and equilibrated for 20 min at 20 ℃ with or without antibody present and analyzed by HDXMS. However, the reported t1/2 for trimer tightening at 37 ℃ is t1/2 = 2.5 h (Supplementary Fig. 7). The samples should ideally be analyzed under standardized conditions with the stable conformer. Sample heterogeneity from HDXMS is likely due to any of the following contributing factors:

      i) Intrinsic ensemble heterogeneity (Costello et al. (2021)), Kinetics of RBD- up and down conformational switching

      ii) Cooperativity of Fab binding.

      iii) Partial occupancy of trimer epitopes with bivalent IgG.

      iv) Combination of cooperativity effects and partial binding effects

      For any of the above reasons, I would predict bimodal kinetics of deuterium exchange; it is intriguing that none are reported. Partial occupancy should be evident from HDXMS paratope analysis.

      2) Pan-coronavirus neutralization potential is clearly evident. It is intriguing that the antibodies were isolated after immunization with an authentic MERS S2 domain but showed better selectivity to full-length 6P-engineered Spike. How is cooperativity built into antibody binding, given that the epitope site is occluded to various extents by the S1 domain and access is contingent upon RBD up-down kinetics?

      3) I am surprised that there is no allostery described for 3A3 (Supplementary figures 5, 6).

      The HDX-MS experiments presented in this work were carried out by the D’Arcy lab and published in a preprint on bioRxiv (originally posted on February 1, 2021) prior to publication of Costello et al. (first posted to bioRxiv July 11, 2021, epub March 2, 2022). Indeed, our bioRxiv posting inspired the Marqusee lab to request 3A3 for inclusion in their work focused on the conformational heterogeneity of the spike protein. Without prior knowledge of the conformational heterogeneity, we carried out these epitope mapping experiments at 25 °C, which allowed us to successfully map the epitope without determining which conformation the antibody prefers.

      The data presented in Costello et al. further confirm the location of 3A3’s epitope presented here and provide additional information about its preference for different conformational states within the spike protein. We have included an additional comment in the methods section (lines 660-661) stating, “The location of the 3A3 epitope was confirmed in a separate experiment carried out over the temperature range of 4 to 37 °C (Costello et al. 2022).”

      This is a clear example of the value of pre-prints to stimulate timely scientific collaboration. While Costello et al. used 3A3 as a tool to probe spike dynamics, here we highlight the original work that identified the epitope.

      Spectral envelopes have been provided (Supplementary Fig. 4b and Supplementary Table 3).

      The HDX-MS data provides limited insight into possible cooperative or allosteric binding of the 3A3 antibody because of other sources of heterogeneity such as spike dynamics and partial occupancy of the spike epitopes. However, no difference in occupancy was detected when HDX-MS with 3A3 Fab was compared to the same experiment with bivalent 3A3 IgG. It should be noted that in this HDX system, the antibody is not bound so tightly that the spectra are bimodal, showing the exchange of bound and unbound populations separately. Though HDX-MS experiments were performed in slight Fab or IgG excess of 1:1 Fab:spike monomer stoichiometry, the absolute stoichiometry in the context of the spike trimer is unclear.

      Reviewer #2 (Public Review):

      The authors report a conserved spike S2 hinge epitope and two conformationally selective antibodies that help elucidate spike behavior. This work defines a third class of S2 antibody and provides insights into the potency and limitations of targeting this S2 epitope for future pan-coronavirus strategies.

      Thank you for your review of this manuscript.

      Reviewer #3 (Public Review):

      The study by Silva et al details the discovery and evaluation of a third class of broadly cross-reactive anti-Spike antibody that binds a conserved hinge region in the S2 domain. After immunizing mice with a stabilized S2 protein from MERS and generating scFv phage libraries, the authors were able to identify antibody 3A3, which showed broad cross-reactivity with SARS2 (including Omicron BA.1), SARS1, MERS, and HKU1 spike proteins. Using a combination of a low-resolution cryo-EM structure and HDX mass spectrometry, the authors were able to map amino acids in the antibody paratope and spike epitope, the latter of which is the hinge region of the Spike S2 domain (residues 980-1005) that plays a critical role in pre- to -post-fusion conformational changes. Through well-executed and comprehensive mutagenesis, binding, and functional assays, the authors further validated critical residues that lead to antibody escape, which centered around the 2P residues and diminished viral entry. While 3A3 and an affinity-enhanced engineered version, RAY53, did not show potent in vitro neutralization against the authentic virus, the antibody was shown to recruit Fc effector functions for viral clearance, in vitro.

      Overall, the conclusions of this paper are well supported by the data, but the usefulness of such antibodies is likely limited. The work can be strengthened by extending the analysis of 3A3-like antibodies in the context of human immune responses and in vivo effectiveness.

      1) Isolation of 3A3 was achieved after the generation of scFv-phage libraries following immunization with a MERS S2-domain immunogen in a mouse model. The fact that 3A3 binds well to 2P-stabilized sequences and that binding/neutralization is diminished upon reversion of 2P mutations back to the native spike sequence (Figures 3a, 4c, and 5b) suggests that such antibodies would likely not arise from natural infection. This contrasts with the isolation of fusion peptide and stem helix-directed antibodies, which were isolated from both immunized animals and convalescent individuals. To make their results more solid regarding the use of such antibodies in future vaccine strategies, the authors should provide evidence that 3A3-like antibodies can be identified in human donors. For example, they could enrich donor-derived S2-specific antibodies that bind both MERS and SARS2 S2 domains and evaluate the fraction of antibodies that recognize the hinge-epitope using competition binding assays (either ELISA or BLI), which have commonly been used to map epitope-specific sera responses. This could also be achieved with nsEMPEM of polyclonal IgGs bound to S2 proteins.

      2) The authors speculate in the discussion that strategies to enhance access to the hinge epitope, which may include ACE2-mimicking antibodies, could promote enhanced viral clearance. In addition to ACE2-mimicking antibodies, several antibodies have been described that bind the RBD and promote S1 shedding (see for instance mAb S2A4 - Piccoli et al, 2020, Cell). Several 2nd generation vaccine platforms utilize RBD-only immunogens that are likely to induce high titers of ACE2-mimicking and cross-reactive S1-shedding antibodies. Thus, adding in vitro neutralization and ADCC experiments to assess synergy between 3A3/RAY53 and such antibodies would bolster this speculative claim and be of interest to many in the field developing strategies for pan-coronavirus therapies.

      3) The authors provide in vitro evidence in Figure 5c,d for Fc-mediated viral clearance. While in vivo data to show effectiveness in animal models is ideal, additional in vitro data that utilize engineered constructs that modulate effector function (e.g., DLE (+) or LALA (-)) would boost the authors' claims regarding Fc-mediated viral clearance mechanisms by 3A3/RAY53.

      1) Though we do not plan to isolate 3A3-like antibodies from human donors, there is evidence that these antibodies are elicited in infected humans via analysis of polyclonal responses in Claireaux et al 2022. We also know of several studies on naturally occurring S2 hinge targeting antibodies from colleagues that are in preparation. Understanding the therapeutic role of this antibody class is relevant to the study of broadly-reactive S2 antibodies, even if that role is limited.

      2) We agree that synergy between S2 hinge epitope binding antibodies and ACE2 mimicking antibodies will be very interesting to investigate. We hope to pursue this in future work.

      3) We agree these are excellent controls to include, in addition to isotype controls already shown. In accordance with the eLife COVID research policy, we minimized our claims around Fc-effector functions elicited by RAY53 and stated that further experiments to confirm our preliminary findings are needed.

      The existing description of the effector function experiments states in lines 392-392 “These results indicate that RAY53 binding is compatible with ADCP and ADCC,” which is already a very limited claim.

      We also added in line 450 that S2 core-binding antibodies “require further validation” of their ability to recruit effector functions.

      We appreciate the importance of controls providing effector function modulation and will include the LALAPG mutations as a standard component of our future ADCC evaluation. However, given our focus on the relevance of the epitope and consistency of the Fc regions across the antibodies, we felt that the isotype and positive control antibodies (target binding controls) were the most relevant controls to include in this study.

    1. Author Response

      eLife assessment

      Germline inactivation of NPHP2, which encodes a protein that localizes to the transition zone at the base of the primary cilium, results in infantile kidney cysts and fibrosis. In this study, the authors provide solid evidence that increased cell proliferation and fibrosis precede cyst formation in Nphp-2 mouse models, that mutant renal epithelial cells are responsible for the phenotype, and that genetic inhibition of ciliogenesis in this model reduces disease severity. They also show that valproic acid, a drug that affects a number of cellular targets and is used to treat other human conditions, slows disease progression. One limitation of the study is that it provides limited insights into the mechanisms responsible for any of its interesting observations.

      To our knowledge, our study is the first to pinpoint defective epithelial cells as the main driver for both epithelial cysts and interstitial fibrosis in an NPHP model. The discovery that abnormal signaling from epithelial cells triggered a profibrotic response in the absence of cyst formation is also novel. Our Ift88 Nphp2 double mutant results, combined with the tissue-specific function of NPHP2, suggest that NPHP2 functions as a negative regulator of a profibrotic and pro-cystic pathway that interacts with cilia-mediated signaling in epithelial cells and that abnormal signaling from epithelial cells triggers interstitial fibrosis. Moreover, we identified the HDAC inhibitor VPA as a potential candidate drug for treating NPHP. Although the precise molecular function of NPHP2 remains undefined, our results suggest that epithelial-specific function and epithelial-stromal crosstalk underlie NPHP-like phenotypes in Nphp2 mutant kidneys. Furthermore, although whether NPHP2 interacts with polycystin-mediated signaling remains an outstanding question, our results ruled out the involvement of NPHP2 in ciliary localization of PC2.

      Reviewer #1 (Public Review):

      Nephronophthisis (Nphp) is a multigenic, recessive disorder of the kidney presenting in childhood that is characterized by cysts predominantly at the cortico-medullary junction and progressive fibrosis. An infantile form of the disease presents earlier with more diffuse cystic change. The condition is considered a ciliopathy because most of the genes linked to the condition encode proteins involved in ciliary biogenesis or function. Germline mutations in NPHP2 are associated with a particularly severe, infantile form of the disease. Given that interstitial fibrosis is a more prominent feature of Nphp compared to many other forms of polycystic kidney disease, the authors sought to determine the mutant cell types responsible for the phenotype.

      In the current study, the authors generated and characterized mouse lines with Nphp2 selectively inactivated in either renal epithelial cell or stromal cell lineages and found that inactivation in renal epithelial cells was both necessary and sufficient to cause disease. They further showed that markers of interstitial fibrosis and proliferation increase in mutants prior to the onset of histologically evident cystic disease, suggesting that aberrant epithelial-stromal cell signaling is an early and primary feature of the condition (Figures 1-4). The study design was straightforward and appropriate to address the question, and the results support their conclusions.

      They next tested whether the cilia-dependent cyst-activating pathway (CDCA) that is "unmasked" by loss of other PKD-related genes is similarly active in Nphp2 mutants by generating Nphp2/Ift88 double mutants. Their studies found that the severity of cystic disease and markers of proliferation and fibrosis was attenuated in double-mutants (Fig 5, 6). These studies were also appropriate for testing the hypothesis and the results were similarly consistent with their interpretation.

      In the last set of studies, they tested whether valproic acid (VPA), a drug that has multiple modes of action including acting as a broad inhibitor of HDACs and previously used by the investigators in other forms of polycystic kidney disease, would have similar effects in Nphp2 mutants. The authors tested daily injection from days P10 through P28 in both control and Nphp2 mutant mice with VPA or an appropriate vehicle control and found that VPA was beneficial (Fig 7). The study design was acceptable and the results generally support their conclusions. The one perplexing result is shown in Fig 7B. The Nphp2 mutants, regardless of treatment status, have body weights (BW) that are significantly lower than the controls, with treated mutants even trending lower than their untreated mutant counterparts. This is unexplained and should be addressed. In the mutants with more widespread epithelial cell knock-out of Nphp2 (Ksp-Cre, Fig 1), total body weight decreased as mice became more severely cystic with renal impairment. In the milder form of disease produced with the Pkhd1-Cre (Fig 7), total body weight is inexplicably approx. 2g lower on average despite having much more modestly elevated KBWs and BUNs. Moreover, one might have expected that mutants treated with VPA would have had BWs intermediate between untreated mutants and controls since the severity of the disease was moderately attenuated. These differences raise the question as to whether body weight differences are due to factors independent of disease status, the most likely of which would be that the controls were not littermates. This prompted a careful review of the text for descriptions of the control mice. Throughout the study, the investigators describe selecting animals from the same "cohort", but this term is imprecise.

      There is little information provided about background strains, whether any of the lines were congenic, or whether any of the studies were done using littermate controls. This must be addressed. It would help if the investigators identified the litter status in their plots. This would clearly show relationships between animals and the number of litters that had animals with these properties. If littermates were not used for each study, the authors must explain both why they didn't do so and how they then selected which animals to use. This information is especially important for interpreting the results of their genetic interaction (fig 5) and drug treatment studies (fig 7).

      We thank the reviewer for the multiple positive comments.

      To address the issue of body weight, we examined the time course of body weight change more carefully and added Figure 7-figure supplement 1 to present the results. Although Nphp2flox/flox;Pkhd1-Cre mice displayed reduced body weight at P28 in comparison to controls, this reduction was more moderate than that of Nphp2flox/flox;Ksp-Cre mice (Figure 7-figure supplement 1A). Notably, the trend of body weight difference started at around P21 in both Nphp2flox/flox;Pkhd1-Cre and Nphp2flox/flox;Ksp-Cre mice, coinciding with weaning (Figure 7-figure supplement 1B). It is possible that mutants with compromised kidney function were less able to thrive and gain weight around this transition time. In terms of VPA treatment, body weight trended down in both wild type and mutant mice subjected to the treatment, although the difference did not reach statistical significance (Fig. 7B). We cannot rule out the possibility that side effects of VPA contributed to weight loss in treated mice. In addition, VPA may affect body weight gain through HDAC inhibition: the HDAC inhibitor Trichostatin A was shown to inhibit adipogenesis (PMID: 34232916) and 4-hexylresorcinol, another HDAC inhibitor, reduced body weight in treated rats (PMID: 34445640). To include the additional data and references, we added the following in the Results section:

      "We analyzed body weight change of Nphp2flox/flox;Pkhd1-Cre mice in more detail and compared it to Nphp2flox/flox;Ksp-Cre mice. At P28, the reduction of body weight in Nphp2flox/flox;Pkhd1-Cre mice in comparison to control mice was more moderate than that in Nphp2flox/flox;Ksp-Cre mice (Figure 7-figure supplement 1)."

      "However, the reduced body weight phenotype in mutant mice was not suppressed by VPA treatment (Fig. 7B). We cannot rule out the possibility that the side effects of VPA contributed to weight loss in treated mice. In addition, VPA may reduce body weight through inhibiting HDAC during the growth period: the HDACI Trichostatin A was shown to inhibit adipogenesis (51)."

      Regarding genetic background, all mice analyzed in Figures 5 and 7 are in the same genetic background (C57/BL6J). We added a more detailed description of the genetic background in the Materials and Methods section. Littermate status is now also indicated in the figure legends.

      In Figure 5, multiple genotypes (e.g., Nphp2flox/flox;Ksp-Cre, Nphp2flox/flox;Ift88flox/flox;Ksp-Cre and Ift88flox/flox;Ksp-Cre) were analyzed. Because of the limited number of animals per litter and the low yield of desired genotypes, non-littermates had to be included in some cases. Littermate status is now highlighted by colors in the data tables of Figure 5 source data.

      In Figure 7, because of the limited number of animals per litter and the need to subject each genotype to VPA and vehicle treatment, non-littermates had to be included in some cases. Littermate status is now indicated by highlight colors in the data tables of Figure 7 source data.

      Several other considerations. The authors state that the effects of VPA are mediated through the drug's inhibition of HDACs and suggest that future studies could be directed at refining the specific HDAC. While this is certainly possible, the authors should acknowledge that VPAs have been reported to have numerous pharmacologic effects and targets and which of these is mediating the effects in their model is unknown (text). They would need mechanistic studies to show this, though it doesn't discount their possible efficacy as a therapy for PKD.

      We agree that this is an important point to clarify and added the following to the Discussion: "It is also worth noting that VPA could affect targets other than HDACs and testing newly approved HDACIs will provide useful insight."

      The authors also state in their abstract that their double knock-out studies "support a significant role of cilia in Nphp2 function in vivo." It is not clear to me how their studies show this nor how they can exclude that ciliary activity is operating in an Nphp2-independent, parallel fashion that modulates some common downstream pathways.

      We agree with the reviewer that our results do not exclude the possibility that NPHP2 and ciliary activity feed into a common downstream pathway, i.e., a cilia-dependent cyst-activating pathway could operate outside of cilia. We changed the sentence in the abstract to "supporting a significant interaction of cilia and Nphp2 function in vivo." In addition, we added "Although cilia-dependent, the downstream pathway could potentially operate outside of cilia and receive parallel signals from both ciliary activity and Nphp2." to the Discussion to clarify and reflect the results and model more precisely.

      Reviewer #2 (Public Review):

      The manuscript by Li et al demonstrates the role of Nphp2/Invs in renal epithelia in preventing NPHP-like phenotypes, such as epithelial/stromal proliferation and stromal fibrosis, in mice. Previously, mutants of the Nphp2 allele in mice, generated by insertional mutagenesis, showed severe cystic kidney disease and fibrosis in neonates.

      The authors nicely show that the NPHP-like phenotypes in mutant kidneys arise from abnormal signaling specifically within and from renal epithelial cells. Furthermore, the fibrotic response and abnormal increase of cell proliferation precede cyst formation and could be initiated independently of cyst formation. The authors also show that the removal of cilia reduces the severity of Nphp2 phenotypes. The authors suggest that similar to polycystins, NPHP2 inhibits a cilia-dependent cyst and fibrosis-activating pathway. Finally, the histone deacetylase (HDAC) inhibitor valproic acid (VPA) reduces these phenotypes and preserves kidney function in Nphp2 mutant mice, supporting HDAC inhibitors as potential candidate drugs for treating NPHP.

      Overall, understanding the mechanisms driving NPHP phenotypes is important and drugging relevant pathways in treating this disease is an important unmet need in patients. The authors have provided insights into both these aspects in this study. The manuscript is nicely written, and the assays shown are rigorous and insightful.

      We thank the reviewer for the positive comments.

      Reviewer #3 (Public Review):

      In this manuscript, Li et al. investigate whether epithelial or stromal loss of Nphp2, a gene causative of nephronophthisis (NPHP), drives polycystic kidney disease (PKD) and kidney fibrosis in a novel floxed model of Nphp2. The authors found that only epithelial and not stromal Nphp2 loss results in NPHP-like phenotypes in their mouse model. In addition, the authors show that concurrent cilia loss (via Ift88 loss) and Nphp2 loss within the kidney epithelium, as well as HDAC inhibition, result in less severe PKD/kidney fibrosis, as has been shown in mouse models of other non-syndromic forms of PKD, such as autosomal dominant PKD caused by mutations to Pkd1 or Pkd2.

      The authors aimed to understand (1) whether the published NPHP phenotype (kidney cysts and fibrosis), known from the global Nphp2 knockout mouse, is driven by the function of NPHP2 in the kidney epithelium or stromal cells; (2) if kidney fibrosis in NPHP is linked to kidney damage caused by cysts, or independent and preceding of the PKD phenotype; (3) whether cilia are required, causative, or prohibitive of NPHP cystogenesis; and (4) if a broad spectrum HDAC inhibitor is a potential therapeutic approach for NPHP.

      With the provided results, the authors established that epithelial Nphp2 loss is likely a predominant driver of PKD in their model; however, they cannot exclude that stromal NPHP2 plays a role in cyst growth post-initiation because the authors failed to directly compare their cell type-specific models to a global cre knockout (e.g. Cagg-cre).

      We agree with the reviewer that we cannot rule out the possibility that stromal NPHP2 plays a role post cyst initiation and added "However, our result does not rule out functional significance of interstitial cells once a pro-cystic and fibrotic response is triggered in mutant epithelial cells." to the Discussion section.

      A direct comparison between epithelial-specific and global knockout models is an attractive idea, but technically challenging. For an interpretable comparison, it is essential that the stage and knockout efficiency in epithelial cells are equivalent between the two models. However, Ksp-Cre is expressed in the distal nephron specifically, sparing epithelial cells in other segments, while epithelial cells in all segments would be affected by Cagg-Cre. In addition, global knockout of Nphp2 leads to perinatal lethality. Inducible Cagg-Cre could potentially be used to bypass earlier functional requirements, but matching stage and knockout efficiency in renal epithelial cells between Ksp-Cre and inducible Cagg-Cre mediated knockout remains challenging. These factors make a direct comparison problematic. Finally, our results revealed the role of defective epithelial cells in triggering the phenotypes but did not rule out a role of interstitial cells once abnormal signaling is initiated in epithelial cells. To clarify this point, we added "However, our result does not rule out functional significance of interstitial cells once a pro-cystic and fibrotic response is triggered in mutant epithelial cells." to the Discussion section.

      In addition, it is possible that cyst initiation/growth upon stromal Nphp2 loss occurs substantially slower compared to epithelial Nphp2 loss. However, it seems the authors did not look at kidney phenotypes beyond 28 days of age. Publications from the ADPKD field suggest that stromal Pkd1 loss initiates cystogenesis much slower than epithelial Pkd1 loss.

      We have expanded our analysis to 8-week-old mice. We now show that Nphp2flox/flox;Foxd1-Cre mice show normal kidney weight, kidney/body weight ratio, kidney function and histology at P56, supporting our original conclusion that deletion of Nphp2 in interstitial cells fails to trigger obvious renal phenotypes up to the young adult stage. These results are presented in Figure 4-figure supplement 1 and the Results section.

      Further, while the authors suggest that kidney fibrosis precedes cyst development, the results supporting this conclusion are limited to one time point, analyzing IF staining of a single marker that can be compared between non-cystic and cystic time points. These analyses need to be extended to make any firm conclusions.

      At the precystic kidney stage (P7), we analyzed SMA and vimentin levels via immunostaining. Their mRNA levels were additionally quantified via RT-qPCR. We have now analyzed vimentin levels at multiple timepoints (P9, 14 and 21) and results were added to Figure 2. Combined, these data support the initiation of a fibrotic response prior to cyst formation.

      The most interesting finding of the manuscript, and likely the most impactful to the field, is that loss of cilia within the setting of epithelial Nphp2 loss reduces PKD severity. This finding parallels published findings for Pkd1 and Pkd2, which are suggested to function in a cilia-dependent cyst-activation mechanism. Unfortunately, the studies shown here do not add to the mechanistic insight beyond showing the descriptive finding. Most importantly, it remains unclear whether NPHP2 functions in the same pathway as polycystin-1 or -2 (the Pkd1, Pkd2 gene products) or in a separate independent pathway.

      Our Ift88 Nphp2 double mutant results, combined with the tissue-specific function of NPHP2, which to our knowledge is completely novel in an NPHP model, suggest that NPHP2 functions as a negative regulator of a profibrotic and pro-cystic pathway that interacts with cilia-mediated signaling in epithelial cells and that abnormal signaling from epithelial cells triggers interstitial fibrosis. We agree with the reviewer that whether NPHP2 functions in the same pathway as polycystins is an interesting question. However, we feel it is outside the scope of this manuscript and will pursue this research direction in our future studies.

      With respect to the HDAC preclinical studies, the authors show supporting data that a broad-spectrum HDAC inhibitor may be suitable for slowing cyst growth in their model of NPHP. Overall, these studies are not novel to the field, as HDAC inhibition has been shown to slow PKD progression in various models of PKD, albeit not in NPHP specifically. Further, the studies seem like an add-on, which does not directly link to the prior cell type-specific studies of NPHP2, and no mechanisms linking the two concepts are provided.

      Although we and others showed that HDACIs slow cyst progression in other PKD models, this study is the first to show their impact on an NPHP model. Given the current lack of treatment for NPHP, we feel it is important to communicate the results to the research community even though the molecular mechanism remains to be defined.

    1. Author Response

      Reviewer #1 (Public Review):

      The article "Identification of a weight loss-associated causal eQTL in MTIF3 and the effects of MTIF3 deficiency on human adipocyte function" explored the functional roles of MTIF3 during adipocyte differentiation. In persons living with obesity, genetic variation at the MTIF3 locus associates with body mass index and responses to weight loss interventions. MTIF3 regulates mitochondrial protein expression and gene knockouts cause cardiomyopathy in mice. This paper provides insight into the impacts of MTIF3 knockout on adipocyte differentiation and the expression effects of the eQTL on MTIF3 levels. The authors implement a CRISPR/Cas9 gene editing approach coupled with an in vitro platform to detect influences of MTIF3 on adipocyte glucose metabolism and gene expression. This method may serve as a platform to explore knockouts in human cell lines, so it may allow the discovery of new gene x environment influences on in vitro outcomes related to differentiation, growth, and metabolism.

      The conclusions of this paper are mostly well supported by data, but some experimental conditions and data analysis need to be clarified and extended.

      1) The authors use CRISPR/Cas9 to generate the rs1885988 variant in the human white adipocyte cell line and performed a comprehensive validation analysis of gene editing (Figure 1). qPCR analysis showed reduced MTIF3 expression during human adipocyte differentiation (Figure 1E, F). To expand the importance of the rs1885988 variant, the authors should have provided target gene measurements to verify the canonical differentiation profile (e.g., FABP4, ADIPOQ) and help readers understand the overall impact of gene editing at the MTIF3 locus.

      Thank you for your suggestions. As you requested, we have quantified several adipocyte differentiation markers in the allele-edited cells after 12 days of adipogenic differentiation. The data (Figure 1-figure supplement 1) show no significant difference between cells with the different genotypes. We have added more information about this in lines 100-101, and also in another context in lines 105-116.

      Notably, the intra-group variation of the marker gene expression is large (Figure 1-figure supplement 1), which makes it difficult to clearly state how much the allele editing, as opposed to random variation resulting from single cell cloning, contributes to the differentiation outcome. However, if we also consider MTIF3 knockout cells (that do not need to be single-cell cloned), their differentiation marker expression also appears unaffected (Figure 3-figure supplement 1). Taken together then, it is unlikely the allele editing with the consequent effect on MTIF3 expression affects adipogenic differentiation in our experiments. We mention the absence of effect of MTIF3 knockout on differentiation in the paragraph starting on line 137.

      2) The direct mechanistic influences of MTIF3 on adipocyte function remain unclear. MTIF3 regulates the translation initiation of mitochondrial protein synthesis. Western blots of OXPHOS proteins do not per se underscore supercomplex formation, which is also a process mediated by MTIF3. Blue native gel electrophoresis may prove a better method to establish the effects of MTIF3 loss-of-function on supercomplex formation.

      As suggested, we have run blue native gel electrophoresis to detect the formation of OXPHOS respiration complexes. In the revised manuscript (lines: 158-168 and Figure 4 E,F), we show that MTIF3 knockout indeed interferes with complex formation, with lower abundance of complexes V/III2+IV1, III2/IV2 and IV1. Additionally, although the blot signal for complex I+III2+IVn is diffuse, it appears higher in scrambled control cells than in MTIF3 knockout cells. Interestingly, complex II content is slightly higher in MTIF3 knockouts, which may result from a compensatory regulation mechanism, as none of the subunits of complex II is encoded by mitochondrial DNA. We also found several faster-migrating bands (“undefined bands” in the figure) in the MTIF3 knockout samples, although it is hard to determine whether those are single chain proteins, or degradation or mistranslation products. Overall though, the native gel blots show impaired OXPHOS complex assembly in MTIF3 knockout samples.

      In addition, we performed western blots for other mitochondrial proteins, including COX II (subunit of OXPHOS complex IV), ND2 (subunit of OXPHOS complex I), ATP8 (subunit of OXPHOS complex V), and CYTB (subunit of OXPHOS complex III). The data (Figure 4 A,B) show decreased ND2 and COX II, a trend toward decreased CYTB, and unaffected ATP8 content in MTIF3 knockout adipocytes.

      The methods (paragraph starting at line 479), results (paragraph starting at line 145), and discussion (lines: 261-263, 274-277) were incorporated in the revised manuscript.

      3) Based on the findings, the authors argue that MTIF3 knockout alters the function of adipocytes. However, many of the experiments show fairly small effect sizes (Figure 5A, Figure 6A). How does the MTIF3 knockout explicitly perform functions related to body weight regulation? Gene editing in vivo would have helped to substantiate the authors' conclusions.

      In the paper we are looking at the consequences of MTIF3 deficiency in one cell type, over a short time, in vitro. The outcome of body weight regulation, e.g. during weight loss, would result from long-term effects of MTIF3-altered metabolism in more than one tissue. We envisage that small changes in energy metabolism not only in fat, but also in e.g. muscle, would make a substantial difference over time in vivo (this we cannot capture in in vitro models). We have added this discussion to lines 294-311.

      As for in vivo genomic editing, the alleles of interest are specific to the human genome. Ideally, a genotype-based recall study in humans would be appropriate, but due to time and resource limitation, we are not able to conduct such a study at the moment (although we certainly hope to perform such a study in the future). As for modeling the MTIF3 deficiency in mice – the MTIF3 knockout mice are not viable [1], and certainly other options (e.g. overexpression, tissue-specific knockouts) are possible and tempting to investigate. This, however, would require considerable additional work which we could only perform in a future project.

      4) In several instances, the authors refer to 'feeding' cells with glucose (line 206, line 171). Feeding experiments often imply complex nutrient interventions in animal models and people, which cannot be easily recapitulated in cell culture. The in vitro experiments simply alter levels of glucose and more precise language would state the specific challenges accurately.

      In the revised manuscript, we have replaced “feeding” with the exact glucose concentration, or with “glucose concentration” where appropriate (paragraph starting at line 215, and lines 577-578, 597, 873-879).

      Reviewer #2 (Public Review):

      Huang Mi, et al. investigated the role of MTIF3, the mitochondrial translation initiation factor 3, in the function of adipocytes. They first examined the expression of the obesity-related MTIF3 variants based on the GTEx database and found that two variants lead to an increase in MTIF3 expression. Then they knocked out MTIF3 in differentiated hWAs adipocytes and characterized the mitochondrial function. They found that loss of MTIF3 decreases mitochondrial respiration and fatty acid oxidation. They further treated cells with low glucose medium to mimic weight loss intervention and found that MTIF3 knockout adipocytes lose fewer triglycerides than control adipocytes. This paper provides new information about MTIF3 in adipocytes and the potential functional role of MTIF3 in mitochondrial function.

      1) The authors provided sufficient data to show those two genetic variants increase MTIF3 expression. Their CRISPR/Cas9 knockin cell line is also convincing. But they didn't show if the genetic variants affect adipogenesis. Adipogenesis is an important process for weight gain and fat deposition. In lines 103-107, the authors mentioned that the "allele-edited cells have some problem in differentiated state, e.g. triglyceride or mitochondrial content", so they used an inducible Cas9 system. However, the issue of differentiated allele-edited cells may be the functional effect of MTIF3 genetic variants, such as interrupting adipogenesis, decreasing triglyceride, or affecting mitochondrial number. The authors should provide that information.

      Thank you for all your suggestions. We think we were not clear regarding this issue. We did not mean that the allele-edited cells have a problem in the differentiated state, which then definitely could be (as you point out) due to the functional effect of MTIF3 genetic variants. The problem relates to the process of single-cell cloning itself, which inherently introduces random variation. As a consequence, the data on adipogenic differentiation in allele-edited cells have relatively high intra-group variation. We have added more clarifying text in lines 104-116.

      To provide the data on this, per your request, in the revised manuscript we include the results for the rs67785913-edited cells in Figure 1-figure supplement 1. As shown, we observed no differences in the expression of adipogenic markers (ADIPOQ, PPARG, CEBPA, SREBF1 and FABP4) or in mitochondrial content between the two rs67785913 genotypes. Since the intra-group variation is often high, it is hard to conclude how much the rs67785913 eQTL affects the quantified variables. Much of the variation could instead be ascribed to the effects of single cell cloning.

      The cloning per se introduces random variation, but is required to obtain homozygous allele-edited cells. Because of this dilemma, and to clarify how much MTIF3 expression can actually influence adipogenic differentiation, we have, during the revision, also used the hWAs-iCas9 cells to generate MTIF3 knockouts at the preadipocyte stage and then tested their differentiation capacity. As we show in Figure 3-figure supplement 1, we found no apparent differences in adipogenic marker gene expression between scrambled control and MTIF3 knockout cells (we mention that in lines 137-144). Taken together, our results may indicate that the rs67785913 genotype, through affecting MTIF3 expression, is unlikely to regulate adipogenic differentiation.

      2) In Figure 4, the author mentioned that MTIF3 knockout does not affect the expression of adipogenic differentiation markers. They need to provide more evidence to prove their point. Oil-red O staining is a clearer way to quantify adipocyte differentiation in cell culture. In addition, in Fig. 4B western blot, the author should include MTIF3 as a control to show the knockout efficiency. It is not clear the meaning of plus and minus in that panel. The author should also compare the total triglyceride levels in MTIF3 knockout cells and control cells.

      We have now included Oil-red O staining results and total triglyceride levels (Figure 3 F,G), which show no apparent differences between scrambled control and MTIF3 knockout cells (method: lines 427-431; results: lines 137-144). We also added the MTIF3 blots to figure 4A as a control, showing high and consistent MTIF3 knockout efficiency in independent experiments. In the original manuscript, the plus and minus referred to control and knockout, respectively. To clarify that, we have changed the expression to SC and KO in the revised manuscript.

      With regards to Oil-red O vs. quantification of adipogenic markers, we actually prefer the latter method, as it gives more accurate and less variable results than Oil-red O (at least in the cell line we use). We have, however, performed Oil-red O as well to address your question.

      3) MTIF3 is a translation initiation factor in mitochondria and is involved in the protein synthesis of mitochondrial DNA-encoding genes. The authors should check protein levels rather than the mRNA levels of mitochondrial DNA-encoding genes (Fig. 6E). It's interesting to see the increase of mRNA levels of ND1 and ND2, which might be feedback of lower translation. Since ND1 and ND2 are in OXPHOS complex I, the expression levels of complex I in MTIF3 KO cells would be worth checking. Additionally, the author should also check the mitochondria copy number.

      As suggested, we have measured several mitochondrially encoded proteins that are subunits of the mitochondrial OXPHOS complexes. As shown in Figure 4A, ND2 (subunit of OXPHOS complex I) and COX II (subunit of OXPHOS complex IV) expression were significantly reduced, CYTB (subunit of OXPHOS complex III) expression tended to decrease, and ATP8 expression was not affected in the MTIF3 knockout adipocytes. We also examined the formation of the OXPHOS respiration complexes in extracted mitochondrial proteins and found that MTIF3 perturbation affects mitochondrial complex assembly. The detailed methods (lines: 479-490), results (lines: 145-169) and discussion (lines: 260-262, 274-277) were incorporated in the revised manuscript.

      We have also added the mitochondrial copy number data (Figure 3A), showing that MTIF3 knockout cells have lower mitochondrial content (methods: lines 491-500; results: lines 156-157).

      4) That MTIF3 knockout adipocytes retain more triglycerides under glucose restriction is interesting. It may link to the previous result of lower fatty acid oxidation in MTIF3 knockout adipocytes. However, the authors then showed there is no difference in lipolysis. The authors should discuss those results in the manuscript. The authors could also check lipolysis in glucose restriction conditions. It's also necessary to include the triglyceride levels of KO cell lines in full medium.

      We have now examined glycerol release under glucose restriction and found no differences between control and MTIF3 knockouts (Figure 6-figure supplement 1). Interestingly, in 1 mM glucose, both genotypes released less glycerol than at 25 mM glucose, and this has been observed before in the SGBS cell line [2]. According to your suggestion, we have added the total triglyceride content at the 25 mM glucose condition (Figure 6C), which also was not different between control and MTIF3 knockout cells. We speculate that the higher retention of triglycerides in the knockouts could be due to higher re-esterification of lipolytically released fatty acids, since, as we observed, fatty acid oxidation is impaired in the knockouts. In the revised manuscript, we added that to the discussion (lines: 289-293).

      References

      1. Rudler, D.L., et al., Fidelity of translation initiation is required for coordinated respiratory complex assembly. Sci Adv, 2019. 5(12): p. eaay2118.
      2. Renes, J., et al., Calorie restriction-induced changes in the secretome of human adipocytes, comparison with resveratrol-induced secretome effects. Biochim Biophys Acta, 2014. 1844(9): p. 1511-22.
    1. Author Response

      Reviewer #2 (Public Review):

      The idea that decidualization is related to or evolved from wound healing, including fibroblast activation, is old, going back all the way to Creighton 1878 who pointed to the similarity between granulation tissue and decidual tissue, and is supported by the fact that embryo implantation is a compensated form of the endometrial lesion. Nevertheless, the mechanistic connection between FB activation and decidualization is an important fact necessary for understanding decidualization, a fact that is reflected in previous work, for instance, Kim et al., 1999 (Hum Reprod 14 Suppl 2), their reference 20, and Oliver et al., 1999 (Hum Reprod 14), their reference 56 a.o.m. More specifically, a recent single-cell study of in vitro decidualization has shown that a myofibroblast-like cell state is a transient state in the process of decidualization, i.e. decidual cells themselves are not so much activated fibroblasts, but rather decidual cells differentiate after endometrial stromal fibroblasts undergo a FB activation-like process, and the decidual re-programming happens from these activated FB-like states (Stadtmauer et al., 2021, Biol. of Reprod. 1-18).

      Yes, the paper from Stadtmauer DJ and Wagner GP (2022) is cited in the revised version.

      The above assessment of how the current study fits into the conceptual landscape of mammalian reproductive biology does not diminish the importance of the paper under consideration. The study contributes a large amount of observational and experimental facts to the understanding of how FB activation and decidualization are related. The authors suggest, in particular, that blastocyst-derived TNF activates the cLPA- producing Arachidonic acid (AA), activating PGI2 and PPARd signaling pathway (more about this later).

      Other major comments:

      The authors suggest that luminal epithelial cells signal through the release of arachidonic acid (AA) in response to TNF. That is interesting and supported by in vitro experiments inducing decidualization and FB activation by AA. What makes this conclusion a little problematic is that it is known that luminal epithelial cells also express COX2/PTGS2 and thus the synthesis of prostaglandins is already starting in the LE and thus LE can also signal to the stroma via PGE2, PGI2 as well as PGL2 rather than AA directly. The in vitro experiments cannot exclude the possibility that the ESF is producing some prostaglandin and then having an autocrine effect.

      Yes, we agree with you. It is possible that PGI2 and PGE2 from luminal epithelial cells may also induce fibroblast activation. Based on the data from in situ hybridization, COX-2, mPGES, PGIS and PPARδ are mainly expressed in subluminal stromal cells at the mouse implantation site on day 5 of pregnancy (Lim et al, 2000; Ni et al, 2002; Wang et al, 2004). Therefore, PGI2 from stromal cells should be the dominant source compared to that from luminal epithelial cells. In the future, we will examine the effects of AA on COX-2, mPGES and PGIS in luminal epithelial cells.

      Lim H, Dey SK. PPAR delta functions as a prostacyclin receptor in blastocyst implantation. Trends Endocrinol Metab. 2000 May-Jun;11(4):137-42.

      Ni H, Sun T, Ding NZ, Ma XH, Yang ZM. Differential expression of microsomal prostaglandin e synthase at implantation sites and in decidual cells of mouse uterus. Biol Reprod. 2002 Jul;67(1):351-8.

      Wang H, Ma WG, Tejada L, Zhang H, Morrow JD, Das SK, Dey SK. Rescue of female infertility from the loss of cyclooxygenase-2 by compensatory up-regulation of cyclooxygenase-1 is a function of genetic makeup. J Biol Chem. 2004 Mar 12;279(11):10649-58.

      344: here the authors report that PGE2 has no effect on FB activation marker expression, but the problem with that is, that (at least in human ESF), progesterone is causing a change in the expression of the PGE2 receptors from EP4 to EP2, and it is only the EP2 receptor that activates cAMP/PKA pathway.

      Yes, we agree with you. PGES is highly expressed in stromal cells at implantation site. Previous studies also show that PGE2 is important during decidualization. In our study, PGES showed no significant changes after stromal cells were treated with AA. PGE2 also had no significant effects on fibroblast activation. Therefore, we focused on PGI2-PPAR pathway. It is possible that PGE2 may regulate decidualization through an alternative way rather than fibroblast activation.

      The fact that the authors show an effect of PGI2 is interesting because PGI2 receptors are among the strongest expressed PTG receptors in mammalian ESF. Prostacyclin receptor is a GPCR rather than a nuclear receptor. So the question is really why the authors have not pursued the role of prostacyclin receptor and instead have focused on PPARd?

      Yes, we agree with you. When mouse stromal cells were treated with AA, there was no significant change in the protein level of prostacyclin receptor (Figures 4E, 4F). When mouse stromal cells were treated with the agonist SELEXIPAG of prostacyclin receptor, the markers of fibroblast activation showed lower changes compared with treatments with PPARδ (Figure 3D). Therefore, we focused on PPARδ. Although prostacyclin receptor is less responsive than PPARδ in inducing fibroblast activation, it should contribute to fibroblast activation. In the future, we will pursue the effect of prostacyclin receptor on fibroblast activation. Thank you very much for your suggestion.

      Reviewer #3 (Public Review):

      This manuscript postulates that uterine stroma cells undergo a stage of activation between the resting state and the differentiated decidual state in order to support embryo implantation. Using in vivo mouse and in vitro mouse and human stroma cells they demonstrate that during decidualization the stroma cells express the marker genes for activated stroma. They then trace an axis from the embryo-producing TNF to prostaglandin production and activin A that is required for this process. They propose data to show that activation of the stroma is altered in infertility due to fetal trisomy 16.

      The strengths of this manuscript are:

      1) This is a comprehensive study using both in vivo and in vitro studies and in both mouse and human stroma cells.

      2) The experiments use a combination of ligands, agonists, and inhibitors to map the signaling axis regulating stroma activation.

      3) The data shown support the conclusions in this manuscript.

      The weaknesses of this manuscript are:

      1) The conclusion that Activin A is the regulator of stroma activation as mentioned by this manuscript is correlative. What is needed is a knockdown of Activin A and then assess stroma activation to prove Activin A is the major regulator and not one of many TGFb family members.

      Yes, the data from Activin A knockdown were provided.

      2) The use of uterine epithelial cells is problematic. The in vitro co-culture approach is not a state-of-the-art co-culture. Removal of epithelial cells from the uterus results in loss of the epithelial phenotype. If the manuscript used an epithelial organoid stroma cell coculture approach it may better reflect the role of the epithelial cells in this process. Otherwise, it is not clear that the epithelial cells are actual participants in the signaling axis. The treatments could be directly on the stroma cells.

      Yes, we agree with you. According to your suggestions, we established a culture system for epithelial organoid. When the epithelial organoids were treated with TNF, a similar response was obtained compared with in vitro cultured mouse epithelial cells.

      3) Ishikawa cells are endometrial cancer cells. They do not really reflect uterine epithelium and it is not clear that any epithelial cell could be substituted for these cells.

      Thank you very much for your comments. It is true that Ishikawa cells should be different from in vivo endometrial epithelial cells. However, several studies showed that the Ishikawa cell line possesses apical adhesiveness to JAR trophoblast cells and expresses many of the same enzymes and structural proteins found in normal human endometrium (Castelbaum AJ et al, 1997). Because both estrogen and progesterone receptors are expressed in Ishikawa cells, Ishikawa cells show a good response to both estrogen and progesterone (Castelbaum AJ et al, 1997). Therefore, Ishikawa cells are used as a model for receptive endometrial epithelial cells (Hannan NJ et al, 2010).

      Castelbaum AJ, Ying L, Somkuti SG, Sun J, Ilesanmi AO, Lessey BA. Characterization of integrin expression in a well differentiated endometrial adenocarcinoma cell line (Ishikawa). J Clin Endocrinol Metab 1997; 82:136-142.

      Hannan NJ, Paiva P, Dimitriadis E, Salamonsen LA. Models for study of human embryo implantation: choice of cell lines? Biol Reprod. 2010; 82:235-245.

      Lessey BA, Ilesanmi AO, Castelbaum AJ, Yuan L, Somkuti SG, Chwalisz K, Satyaswaroop PG. Characterization of the functional progesterone receptor in an endometrial adenocarcinoma cell line (Ishikawa): progesterone-induced expression of the alpha1 integrin. J Steroid Biochem Mol Biol. 1996; 59:31-39.

      4) The activation of stroma cells in the fetal trisomy 16 experiments at the end is very superficial. Data should show that these cells decidualize with decidual markers. This appears to be an experiment to show the translational value of the signaling axis. This experiment, again, is not well developed, does not add much to the manuscript, and should be omitted.

      Yes, we agree with you. The description on human trisomy 16 was deleted.

      In summary, the concept of stroma cell activation as part of decidualization is nicely developed and will add to the field. Normally investigators consider decidualization a mesenchymal to epithelial transition while some consider it stromal activation. This manuscript demonstrates that stroma cell activation is a critical part of the process of decidualization.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors screen large libraries of small proteins to identify three proteins of <50 aa that rescue the growth of an auxotrophic serB deletion Escherichia coli strain. They convincingly show that the growth rescue is due to the small proteins increasing expression of the his operon by reducing transcriptional attenuation. The authors argue that the small proteins function by directly binding the his RNA 5' UTR to alter RNA secondary structure.

      The conclusion that the three small proteins reduce his operon attenuation is well supported by the data. A previous study suggested this mechanism for a somewhat larger, randomly selected protein, but the current study extends this prior work by firmly establishing that the proteins modulate attenuation. The suggestion that the small proteins function by directly binding the his RNA is less well supported by the data. The RNase T1 mapping data are not straightforward to interpret, and there is no assessment of protein-RNA interactions in vivo.

      Major comments:

      1) The RNase T1 probing data are not straightforward to interpret, and hence are insufficient to conclude that Hdp1 binding to the his 5' UTR is the mechanism by which it reduces attenuation. Specifically, G96 has reduced cleavage in the presence of Hdp1, inconsistent with the antiterminator conformation. The authors argue that G96 could be within the site of Hdp1 binding. This is certainly possible but would require additional experimental evidence to draw a confident conclusion. Also, the increased cleavage of bases around the start codon and Shine-Dalgarno sequence is inconsistent with a shift from the terminator to the antiterminator conformation. One confounding issue here is the lack of replicates and the lack of quantification. Additional probes could be tested, which would provide complementary structural information.

      We agree that the RNase T1 probing data alone does not provide sufficient resolution to fully assess changes in terminator/anti-terminator conformations. Therefore, we have clarified our interpretation of the data, addressed its limitations, and have softened the conclusions that can be drawn from it in the text (lines 419-431). We have also included two additional T1 probing experimental replicates in Supplementary Fig. S11 which are in agreement with the cleavage patterns presented in the main text Figure 3D. Based on the revised conclusions and the consistency of the cleavage patterns between the experimental replicates, we do not think that quantification of the probing data would provide any additional information.

      2) There are no experiments to test whether Hdp1 binds the his RNA in vivo. The in vitro data show that Hdp1 can bind the his RNA, but they do not show that this occurs in vivo, or that this is the mechanism by which Hdp1 regulates the expression of the his operon.

      As addressed in the Essential Revisions section, we have now performed and included data from co-immunoprecipitation assays, in which we were able to successfully detect and demonstrate enrichment of his operator-regulated RNA transcripts in HA-tagged Hdp1 pull-down samples. We were also able to demonstrate less enrichment (i.e. reduced interaction/specificity) for thr operator-regulated RNA transcripts in the Hdp1 pull-downs as well as lower enrichment for all his operator-regulated target RNA transcripts in pull-downs performed with the HA-tagged Hdp1 L27Q mutant. These data are presented in Fig. 3A and discussed in lines 313-337.

      Reviewer #2 (Public Review):

      In this work, Babina et al. address a central question in molecular evolution that is only partially answered: how does cellular novelty emerge in evolution? The authors focus here on small proteins, whose importance to various cellular functions has become more appreciated recently. Babina et al. ask if functional small proteins can emerge from random sequences, a question that is mostly unresolved with only a small number of examples in the published literature for such functions. In this study, the authors demonstrate that proteins selected from random, synthetic libraries can rescue auxotrophy in E. coli. Namely, the authors find three small, random proteins (<50 amino acids) that allow E. coli cells with a ΔserB genetic background to grow in a medium without the amino-acid serine. They then show that this rescue is based on the up-regulation of HisB, an enzyme that can compensate for the serB deletion. Finally, using different molecular biology techniques, the authors propose a model in which up-regulation of HisB is achieved by physical interactions between the random proteins and the his operator that regulates the transcription of the his operon in E. coli.

      Notably, as the authors themselves point out, a previous study has already shown that semi-random proteins can result in up-regulation of HisB levels to rescue ΔserB cells. Thus, most of the novelty comes from the attempt to figure out the molecular mechanism of the three random proteins. The idea that a random protein binds the 5' of an mRNA which results in up-regulated expression levels is interesting and can benefit the field. However, some clarification on existing data and additional control experiments are needed to support the authors' claims:

      1) Growth data are not presented in the current form of the manuscript, which makes it impossible to evaluate many of its claims. Especially, the extent of rescue and fitness gain achieved by these random proteins compared to cells harboring the serB gene.

      We thank the reviewer for pointing out this discrepancy. We have now added all relevant growth data under non-permissive conditions (Figure 1G, Supplementary Figures S2, S3, S5) and have also included data on the fitness effects exerted by Hdp expression in cells harboring serB under permissive conditions (LB medium), to allow for comparison with the empty plasmid control strain (Supplementary Figure S1).

      2) The authors have screened their library on other auxotrophic strains, however, they could only find random proteins that rescue growth in the ΔserB background. Currently, they do not address this point, but it might be relevant to the molecular mechanism of those random proteins.

      The reviewer raises an interesting point. We have added a paragraph to our Discussion addressing why we believe that the serB-model with a complementary enzyme is an ideal target for the selection of de novo genes (lines 536-543).

      3) Central to the authors' claims is the up-regulation of HisB, however, they mostly work with an alternative LacZ system to assess the effects of the random proteins on expression. The paper will benefit from some more work measuring actual HisB levels as expressed by the various constructs used along the paper. The authors did provide an important proteomic analysis to show that HisB (along with other proteins in the his operon) is up-regulated as a result of the expression of one of the random proteins. However, it is unclear if the reported ~3-fold increase in HisB levels is enough to allow the growth of ΔserB cells in a medium without serine.

      We thank the reviewer for raising this concern and allowing the opportunity to clarify. It is well established that upregulation of HisB can rescue growth of a SerB-deficient strain on minimal medium (for examples, see Patrick, et al. PMID: 17884825, Digianantonio and Hecht PMID: 26884172). We have now performed additional proteomics analyses that show a specific upregulation of the his operon upon expression of Hdp1 and Hdp3. We have also added a control experiment overexpressing HisB from our expression vector, showing that it restores growth of the auxotrophic ΔserB mutant. It is also clear that histidine starvation itself does not de-repress HisB sufficiently to allow growth of a ΔserB mutant, as this strain does not grow on minimal medium lacking histidine (such as M9 minimal medium that was used for the functional selection in our study). In addition to upregulation of HisB, we show that the rescue is dependent on the presence of HisB and provide additional experiments showing specific interactions of Hdp1 with the his operator RNA in vitro and in vivo. Our results clearly show that rescue depends on HisB and that Hdp expression upregulates HisB, and we do believe our central claim is substantiated beyond reasonable doubt. The reviewer’s main concern, that it is unclear if expression levels of HisB are high enough to allow growth, is, in our opinion, resolved by the observation that Hdp-dependent upregulation of HisB does restore growth.

      We respectfully disagree with the reviewer’s suggestion that an exact determination of the level of upregulation is relevant and needed, as outlined above. In addition, we would like to point out that it is not possible to measure HisB upregulation compared to an empty plasmid control strain under non-permissive conditions. Comparing HisB levels in a ΔserB strain expressing Hdp vs. the empty plasmid control in minimal medium is not possible, since the empty plasmid control strain is not able to grow, and the corresponding baseline of HisB expression cannot be determined in a non-growing strain. To circumvent this, we determined HisB levels in rich medium, which does not necessarily reflect the exact amount of upregulation occurring under non-permissive conditions, but still allows us to detect a physiological activity. Alternative experimental setups, such as comparing HisB levels in a strain carrying serB in minimal medium, also suffer from severe shortcomings, as they no longer reflect the cellular physiology of the auxotroph under non-permissive conditions, where growth is dependent on HisB upregulation.

      4) It is unclear how noisy and statistically significant some of the critical experiments in the manuscript are, especially the EMSA and T1-digestion experiments. The authors should try to find a different operator with a similar RNA structure and attenuation function, but a different nucleotide sequence, to the his operator, and show that this control operator is unaffected by the random proteins. Demonstrating the lack of phenotypes using the LacZ system, EMSA experiments, and T1-digestion patterns will much support the authors' claims.

      We thank the reviewer for suggesting this important control and agree that its inclusion significantly strengthens our claims. We used the threonine operon (thr) operator, which is regulated by terminator/anti-terminator formation similar to that of the his operon with the his operator. We show that Hdp1 does not cause de-repression of this operator using a lacZ reporter construct. Strongly supporting this is the fact that our whole proteome analysis showed specific upregulation of the his operon. Any other off-target de-repression would be detected in this assay. Furthermore, we now include the thr operator RNA as a control in the EMSAs, which demonstrates reduced binding with Hdp1 in comparison to the his operator RNA. We also added an in vivo pull-down experiment using tagged Hdp1, showing marked enrichment of his operator-regulated RNA transcripts, whereas the observed enrichment of the control thr RNA transcripts is substantially less.

    1. Author Response

      Reviewer #1 (Public Review):

      Thakkar et al describe the immune effects of 3rd and 4th doses of COVID-19 monovalent vaccines in a diverse cohort of immunocompromised cancer patients. They describe augmentation of anti-Spike antibodies after dose 3, especially seroconversion in 57% of patients, followed by a durable response over six months. The fourth dose was associated with increased anti-Spike antibodies in 67% of patients. T-cell responses were seen in 74% and 94% of patients after the third and fourth doses respectively. Strikingly, neutralization of Omicron was absent in all patients after the third dose but increased to 33% after the fourth dose.

      Strengths:

      Diverse cohort (34% Caucasian, 31% AA, 25% Hispanic 8% Asian) including 106 cancer patients after dose 3, of which 47 patients were longitudinally assessed for six months, as well as eighteen patients assessed after the fourth dose. Seronegative as well as seropositive patients benefit from a third dose of vaccination. Assessment of cellular (T cell) immune responses and viral neutralization against wild-type as well as Omicron variant is commendable.

      Weaknesses:

      The efficacy of the bivalent vaccine (Omicron specific) is not studied here, since the fourth dose of vaccine was a monovalent vaccine. This should be clarified in the discussion.

      We have added text in the discussion section regarding this comment, lines 470-472

      “The bivalent COVID-19 vaccine was introduced after the enrollment for our study was closed however it is reassuring to see that the bivalent vaccine has better neutralization activity against Omicron sub-variants”

      The authors describe an increase in anti-S titers after monoclonal antibodies. Were any of the patients receiving IVIG, and what was the effect, if any on Anti-S antibodies? Characteristics of breakthrough infections, particularly if they had prolonged duration, would be important to include.

      We have added text in the results section for IVIG (lines 382-383) and characteristics of breakthrough infections (lines 341-344)

      “No patients were on intravenous immunoglobulin (IVIG) at the time of study participation” “Of the 4 breakthrough infections, 1 patient had no symptoms, and 3 had mild symptoms”

      Reviewer #2 (Public Review):

      In this manuscript, Thakkar and colleagues evaluate the immunogenicity of 3rd and 4th doses of SARS-CoV2 vaccinations in patients with cancer. The authors find that additional vaccine doses are able to seroconvert a subset of patients and that antibody levels correlate with T-cell responses and viral neutralization.

      The main strengths of this manuscript are:

      1) The authors systemically performed a broad array of immunological assessments, including assessments of antibody levels, T cell activity, and neutralization assays, in a large cohort of patients with cancer receiving 3rd and 4th doses of COVID vaccines.

      2) The authors recruited an ethnically diverse cohort of patients with diverse cancer types, though enrolled participants were enriched for hematological malignancies.

      3) Prior to FDA/CDC guidance supporting a 4th vaccine dose, the authors recruited participants with no or inadequate responses into a prospective clinical trial of a 4th dose, the results of which are outlined here.

      4) The authors' findings that patients with hematologic malignancies and those receiving anti-CD20/BTK inhibitors have lower immunological responses to SARS-CoV-2 vaccines are consistent with multiple prior studies, including prior studies from these authors.

      5) The authors also find that 3rd and 4th COVID vaccine doses are able to seroconvert a subset of patients with no or "inadequate" responses, though it's unclear whether seroconversion is enough for true protection from SARS-CoV-2 infection.

      The main weaknesses of the manuscript include:

      1) The study cohorts disproportionately enrolled patients with hematological malignancies who have been previously shown to mount lower immunological responses to COVID-19 vaccines; thus, the findings may not be representative of a typical oncology patient population.

      We have clarified this in the discussion (lines 465-466)

      “However, caution should be exercised in generalizing these results to the broader immunosuppressed population given the small sample size of our cohort and the disproportionately high representation of hematologic malignancy patients”

      2) The subgroup analyses were relatively small.

      The discussion text in line 464-465 is in concordance with this observation

      “However, caution should be exercised in generalizing these results to the broader immunosuppressed population given the small sample size of our cohort and the disproportionately high representation of hematologic malignancy patients”

      3) The nomenclature used in the manuscript was confusing when it came to "baseline" assessments and boosters versus additional doses of vaccines.

      We have clarified the nomenclature throughout the manuscript

      4) Ultimately, the major limitation of this manuscript is that antibody levels/T-cell responses/neutralization are surrogates for immune protection against SARS-CoV-2, but it's unclear what defines the ideal cutoffs for protection. Simply seroconverting may still be insufficient. The authors don't provide data showing antibody levels as relates to breakthrough infection, likely because they are underpowered for this analysis.

      We have added text to expand on this further lines 475-482

      “Further efforts are also needed to better determine cut-off values at which anti-S antibody levels provide protection from symptomatic COVID-19. At the present time, this data exists only for neutralizing antibody titers[36, 44] and the commercially available anti-S antibody assays are quite heterogenous with efforts being made to improve equivalency in titer reporting[45]. Our study while providing a correlation between anti-S antibody titer and neutralizing antibody titer supports that the higher the titer, the better neutralization is expected and by extrapolation, less likelihood of symptomatic infection however this needs to be confirmed in larger, systematic studies”.

    1. Author Response

      Reviewer #3 (Public Review):

      Zhang, Q. et al. developed a two-photon fluorescence microscope (2PFM) by incorporating direct wavefront sensing adaptive optics (AO), which is optimized for mouse in vivo retinal imaging. By using the same 2PFM with the option of using or not using the incorporated AO system, this team compared the in vivo retinal images and convincingly demonstrated that AO correction acquired brighter and higher resolution images of retinal ganglion cells (RGCs) and their axons in both densely and sparsely labeled transgenic mouse lines, normal and defective capillary vasculatures, and RGC spontaneous activities detected by a genetic Ca2+ sensor. Interestingly and importantly, this team found that a global correction by removing the common aberration from the entire FOV enhances imaging signals throughout the entire large FOV, indicating a preferable AO imaging strategy for large FOVs. The potential applications of the in vivo retinal imaging techniques and strategies developed by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal vasculatures and neurons during disease progression and before and after treatments. It would be beneficial to the manuscript and the readers if the authors can elaborate on the optic design a little bit more. For example, does the incorporation of AO adversely affect the 2PFM optic design? If the 2PFM can be further optimized by uncompromised optic design without incorporating AO, will the quality of in vivo images be comparable to that of the AO-2PFM or not?

      We thank the reviewer for these thoughtful questions.

      Whether the incorporation of AO adversely affects 2PFM optical design may be a matter of perspective. As we demonstrated in the retina and elsewhere, AO substantially improves the achievable spatial resolution. Its incorporation does not reduce the temporal resolution of the system, as the ocular aberrations are temporally stable in the anesthetized mouse due to the lack of eye movement and do not require repeated aberration measurements throughout the imaging session. Signal enhancement by AO can increase the frame rate by reducing pixel dwell time required to achieve desired signal-to-noise ratio (SNR). The deformable mirror used for wavefront correction has high reflectivity, thus does not reduce the power throughput of the 2PFM. Using similar lenses for conjugation of the AO path to those employed by the 2PFM itself, we also maintain the scanning field of view size.
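
      The dwell-time argument above can be illustrated with a minimal sketch. It assumes shot-noise-limited detection, so that the SNR equals the square root of the number of detected photons; the function names and the example photon rate are hypothetical and are not taken from the manuscript.

```python
def dwell_time_for_snr(target_snr: float, photon_rate_hz: float) -> float:
    """Pixel dwell time (s) needed to reach target_snr.

    Assumption: shot-noise-limited detection, so SNR = sqrt(N) with
    N = photon_rate_hz * dwell_time the number of detected photons.
    """
    if target_snr <= 0 or photon_rate_hz <= 0:
        raise ValueError("target_snr and photon_rate_hz must be positive")
    return target_snr ** 2 / photon_rate_hz


def frame_rate_gain(signal_enhancement: float,
                    target_snr: float = 10.0,
                    photon_rate_hz: float = 1e6) -> float:
    """Factor by which the frame rate could rise at a fixed target SNR.

    Assumption: frame time is dominated by pixel dwell time, and the
    signal enhancement multiplies the detected photon rate.
    """
    baseline = dwell_time_for_snr(target_snr, photon_rate_hz)
    improved = dwell_time_for_snr(target_snr, photon_rate_hz * signal_enhancement)
    return baseline / improved


if __name__ == "__main__":
    # Hypothetical numbers: 1e6 detected photons/s per pixel, target SNR of 10.
    print(dwell_time_for_snr(10.0, 1e6))
    print(frame_rate_gain(4.0))
```

      Under these assumptions the required dwell time scales inversely with the detected signal, so the achievable frame rate scales with the signal enhancement factor; real systems may deviate when detector noise, scanner speed, or photodamage limits dominate.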

      However, the incorporation of AO, including the direct wavefront sensing module (the “L10-L11-SH-sensor” path in Fig. 1A) and the deformable mirror (together with a pair of lenses for optical conjugation), does increase the complexity of the imaging system. Maintaining the optimal performance of AO also requires advanced optical knowledge that may not be possessed by most biological users.

      For this reason, we carefully designed the 2PFM path for optimal imaging performance without AO, characterized its performance (“AO two-photon fluorescence microscope (AO-2PFM)” and “System correction” sections of Materials and Methods, Fig. S1), and optimized sample preparation including designing our own contact lens (“In vivo imaging” section of Materials and Methods, Fig. S2). Our efforts, which we believe to have led to the best possible performance of a 2PFM sans AO, allowed us to resolve retinal capillaries and cell bodies (in 2D) in vivo. Therefore, our 2PFM (sans AO) design and sample preparation procedure should benefit users who do not plan to implement AO.

      Hypothetically, if the ocular aberrations of all mouse eyes were similar, it would be possible to add a static corrective element to a conventional 2PFM to improve image resolution (in the same spirit as the non-prescription reading glasses for far-sighted human eyes). However, as shown in Fig. S6 (“Zernike decompositions and corrective wavefronts for all experiments”), ocular aberrations are variable. These variabilities may arise from alignment differences (e.g., different angles between the optical axis of the ocular optics and the optical axis of the 2PFM), which can be minimized by establishing a procedure to reproducibly position the eyes of different mice in similar ways. In this case, a static corrective element may be designed for substantial aberration reduction. However, the variations also arise from optical differences in the ages [1] or strains [2] of the mice. To have a 2PFM that always performs at the diffraction limit, an adaptive element as employed by AO is necessary to maintain optimal performance regardless of the specifics of the sample.

      References

      1. C. Cheng, J. Parreno, R. B. Nowak, S. K. Biswas, K. Wang, M. Hoshino, K. Uesugi, N. Yagi, J. A. Moncaster, W.-K. Lo, B. Pierscionek, and V. M. Fowler, "Age-related changes in eye lens biomechanics, morphology, refractive index and transparency," Aging (Albany NY) 11(24), 12497–12531 (2019).
      2. C. Tan, H. na Park, J. Light, K. Lacy, and M. Pardue, "Strain differences in mouse lens refractive indices when measured with OCT," Invest. Ophthalmol. Vis. Sci. 54(15), 1917 (2013).
    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript investigates the question of how polylysogeny impacts competition with a sensitive non-lysogen, and how this is shaped by phage resistance. This is an important and timely question, as lysogeny can be a strategy to invade new niches, and prophages are important vehicles for the acquisition of a range of virulence factors by pathogens including Klebsiella. The authors use a polylysogenic Klebsiella clone in competition with a non-lysogen that is sensitive to at least some of the prophages produced by the polylysogen. They compete these strains over a 30-day period and measure host population dynamics and evolution of phage resistance and lysogenic conversion in the (initially) sensitive competitor. Overall, the experiment shows that lysogen formation is relatively rare and short-lived. Instead, phage resistance through complete loss of the capsule is the primary mechanism evolving, but other resistant capsule mutants, with more subtle mutations affecting capsule expression, emerge as well. The authors have collected a very impressive amount of data and made some very interesting observations.

      My main problem with this paper is that the manuscript lacks a clear narrative, making it very hard to extract the key message this paper wants to convey. Related to this, (some of) the conclusions that the authors make do not appear to be well supported by the data. For example, the authors conclude that selection favours more subtle capsule mutations because they are less costly than capsule-loss mutants (lines 497-500). However, there are no data to support this conclusion, as fitness costs of the various resistance phenotypes analysed were not measured. Apart from the genotypes, the data that are presented in this show that these subtle mutants have more subtle decreases in capsule production compared to the mutants that show a complete loss of capsule. But this does not tell us their relative cost. It also doesn’t tell us how the emergence of these different mutants relates to phage pressure, because whilst bacterial population dynamics data are monitored meticulously, phage dynamics data are missing (I have not found them in the supplemental information either). This makes it impossible to directly relate the emergence of the various resistance mechanisms to phage infection pressure during the coevolution experiment, even though this appears to be a hypothesis the authors wish to test.

      Overall I think the overarching question of the manuscript is important and the model system is a very relevant one to study this question, but in my view, the current data don’t support the conclusions of the paper. Apart from these criticisms, the manuscript is very well written and the figures are overall easy to interpret.

      We thank the reviewer for the critical assessment of our work and the time invested in the process. We have modified our manuscript following the recommendations, provided new data and we are convinced that our main results are now fully supported by the data.

      Reviewer #2 (Public Review):

      This manuscript presents data on multiple experiments regarding the co-evolution of poly-lysogenic and phage-susceptible Klebsiella pneumoniae strains. In particular, the manuscript aimed to determine the mechanisms of resistance that would shape bacterial competition over co-evolutionary timescales. The major finding is that the potential for lysogenization as a phage resistance mechanism is narrow and only likely to occur given certain circumstances. Moreover, the manuscript again reinforces the importance of receptor changes -initially loss, but modification in structure or expression over longer time scales- as a major mechanism of phage resistance that influences bacterial competition.

      Strengths

      A major strength of this manuscript is the care in designing experiments and conducting follow-up experiments to isolate the essential elements to support each of the conclusions. This includes using orthogonal methods such as sequencing and modeling to support or expand the findings from culturing and experimental evolution. The study features results that were beautifully replicated (e.g. Figure 3) lending confidence to the findings.

      Weaknesses

      Two weaknesses of the manuscript in its current form are: 1) a need to discuss other studies that also have found context-dependent results and 2) more focus on delivering the key overall "message" of the paper to the reader. Finally, not a weakness, but a (necessary) limitation is the study system, but this manuscript sets a bar for other groups to test in their systems to probe the generality of the findings.

      The support for the conclusions is compelling. The findings were counter to the initial expectation (lysogenization as a major feature) and the manuscript does an admirable job of supporting the unexpected conclusion with thorough experimental work, supplemented with modeling.

      This manuscript will be of great significance in microbial evolution, both for its implications in limiting the scope of lysogenization as a viable phage resistance mechanism in the long term and for its significant experimental rigor, particularly with regard to the co-evolutionary timescale studied. The study has very important implications for the evolution of antimicrobial resistance and phage therapy.

      We thank the reviewer for the time spent and enthusiasm towards our experimental set-up.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors conducted a thorough analysis of the correlation between height and measures of cognitive abilities (what are essentially IQ test components) across four cohorts of children and adolescents in the UK measured between 1957 and 2018. The authors find the strength of the association between height and cognitive measures declined over this time frame--for example, among 10- and 11-year-olds born in 1958, height explained roughly 3% of the variation in verbal reasoning scores; this dropped to approximately 0.6% among those born in 2001. These associations were further attenuated after accounting for proxy measures of social class.

      The authors' analyses were performed carefully and their observations regarding declining height / cognitive measure associations are likely to be robust if we interpret their results with an important caveat: these results reflect measurements aimed at assessing cognition rather than cognition itself. The importance of this distinction is evidenced by the changing correlation structure of the cognitive measures over time. For example, age 11 verbal / math scores were correlated at >= 0.75 at the first two time points but dropped to 0.33 at the most recent time point. Similar patterns are present for the other cognitive measures and time points. The authors conclude that such changes are unlikely to impact their primary findings, but I'm less certain. For example, one interpretation of this finding is that older cognitive measures were simply worse at indexing distinct cognitive domains and instead reflected a combination of cognitive ability together with non-specific factors relating to opportunity, health, class, etc. Further, height was historically a stronger proxy for class and economic status than it is today (e.g., by capturing adequate nutritional intake, risk for childhood disease, etc.). Together, then, previously high height / cognitive measure correlations might reflect the fact that both phenotypes previously indexed socio-economic factors to a greater extent than they might today (which is still non-negligible).

      We agree, it is possible that our results could in principle be explained by changes to the measures. We have provided further analysis to attempt to inform the likelihood of this suggestion and have expanded our discussion of this issue (Discussion, explanation of findings section; copied below).

      First, we conducted an additional sensitivity analysis, repeating our main analysis using cognition measures in which the number of response options was set to be the same for each test (the lowest common denominator across all cohorts). This was tested using two separate approaches: 1) reducing the number of categories to the same number in each cohort; and 2) picking a random sample of question items for each category. Our main findings were unchanged (see the “Additional and sensitivity analyses” section and Figs S20-S21).
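
      The first approach (reducing each test to a common number of categories) can be illustrated with a minimal sketch. The response does not state how categories were merged, so the equal-width binning below is an assumption, and the function name `coarsen` and the example counts (51 raw scores reduced to 21 categories) are purely illustrative.

```python
def coarsen(scores, n_categories):
    """Map raw scores onto n_categories equal-width bins, labelled 0 .. n_categories - 1.

    Equal-width binning is an assumption made for illustration; the original
    analysis may have merged categories differently.
    """
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # All scores identical: everything falls into a single category.
        return [0] * len(scores)
    width = (highest - lowest) / n_categories
    # min(...) keeps the top score inside the last bin instead of opening a new one.
    return [min(int((s - lowest) / width), n_categories - 1) for s in scores]


# Hypothetical test with 51 possible scores (0..50) reduced to 21 categories.
raw_scores = list(range(51))
binned = coarsen(raw_scores, 21)
print(len(set(raw_scores)), "->", len(set(binned)), "unique values")
```

      Once every cohort's test is coarsened to the same number of categories, the height correlations can be recomputed on the binned scores and compared with the original estimates.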

      Regarding the suggestion that “high height / cognitive measure correlations might reflect the fact that both phenotypes previously indexed socio-economic factors to a greater extent than they might today” – we sought to account for this by adjustment for measured indicators of socioeconomic position, and found the trend remained after adjustment (Fig 1 panel 2). As in other observational studies, however, we cannot fully rule out the possibility of residual confounding (Discussion, Explanation of findings paragraph 2).

      “The multi-purpose and multidisciplinary cohorts used cognition tests which differed slightly in each cohort. It is therefore possible that differences in testing could have either: 1) entirely generated the pattern of results we observed, such that if identical tests were used the association between cognition and height would otherwise have been identical in each cohort; in contrast to previous findings which reported using identical tests20; or 2) biased our results, such that if identical tests were used the decline in association between cognition and height would have been less marked than we reported. While we cannot directly falsify this alternative hypothesis given our reliance on historical data sources, a number of lines of reasoning suggest that the first scenario is unlikely. First, our results were similar when using 4 different cognitive tests (spanning mathematical and verbal reasoning); any bias which generated the results we observed should be similarly present across all 4 tests. Other things being equal, one would expect that more discriminatory tests (i.e., those with a greater number of responses) would have higher accuracy and thus better index cognition. Our results were similar when the youngest cohort had similar numbers of unique scores in cognitive tests compared with the oldest cohort (Verbal @ 11 years: n=41 in 1946c, n=40 in 2001c) and fewer unique scores (Maths @ 7/11: n=51 in 1946c, n=21 in 2001c). Our results were also similar in sensitivity analyses in which the number of response options was set to be the same in each cohort. Higher random measurement error in the independent variable (cognition) would lead to weakened observed associations with the outcome (height),52 yet we do not a-priori anticipate that such error was higher in younger cohorts across all tests in such a manner that would have led to the correlation we observed. Ensuring comparability of exposure is a major challenge across such large timespans. Reassuringly, our results are consistent with those from a previous study which reported consistent tests being used (from 1939-1967).20 However, even seemingly identical tests require modification across time (e.g., for verbal reasoning/vocabulary there is typically a need to adapt question items due to societal and cultural changes over time in vocabulary and numerical use); further, changes to education such as increases in testing may have led to greater preparedness and familiarity with testing than in the past even where identical tests are used.

      Interestingly, we observed a marked reduction in the correlation between cognitive tests across time (e.g., between verbal and maths scores). This trend has been reported in previous studies53 54 and warrants future investigation; it is consistent with evidence that IQ gains across time seemingly differ by cognitive domain,45 potentially capturing differences across time in cognitive skill use and development in the population. Previous studies using three (1958-2001c) of the included cohorts have also reported changing associations between cognition (verbal test scores at 10/11 years) and other traits: a declining negative association with birth weight19 and a change in direction of association with maternal age (from negative to positive);55 each finding has plausible explanations based on changes across time in relevant societal phenomena (improved medical conditions19 and changes in parental characteristics,55 respectively), yet also cannot conclusively falsify the notion that differences in tests used influence the results obtained. In this paper, we used multiple tests and sensitivity analyses to attempt to address this.”
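
      The point in the quoted passage that random measurement error in cognition weakens its observed association with height can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a reanalysis of the cohort data: the true correlation of 0.2, the sample size, the seed, and the names `pearson` and `simulate_observed_r` are all hypothetical choices. Under classical measurement error the observed correlation is expected to shrink by roughly the square root of the reliability, here 1 / (1 + noise_sd²).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_observed_r(noise_sd, n=20000, true_r=0.2, seed=1):
    """Correlation between height and a noisy measure of cognition.

    All parameter values are illustrative assumptions, not cohort estimates.
    """
    rng = random.Random(seed)
    cognition = [rng.gauss(0, 1) for _ in range(n)]
    # Height is built to correlate with true cognition at true_r.
    height = [true_r * c + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
              for c in cognition]
    # The test score is true cognition plus independent random error.
    test_score = [c + rng.gauss(0, noise_sd) for c in cognition]
    return pearson(test_score, height)


for noise_sd in (0.0, 0.5, 1.0):
    reliability = 1 / (1 + noise_sd ** 2)
    print(f"noise_sd={noise_sd}: observed r={simulate_observed_r(noise_sd):.3f}, "
          f"expected about {0.2 * math.sqrt(reliability):.3f}")
```

      In this setup, a noisier test produces a weaker observed correlation even though the underlying relationship is unchanged, which is why differences in test reliability between cohorts matter for the comparison.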

      Additionally, their findings add an interesting data point to a collection of recent results suggesting that the relationship between cognitive and anthropometric measures is complex and difficult to interpret. For example, studies using genetic markers to examine shared genetic bases have virtually all relied on methods assuming mating is random, which is not the case empirically. Howe et al. (doi.org/10.1038/s41588-022-01062-7) recently reported that the ostensible genetic correlation of -.32 between years of education and BMI attenuates to -.05 when using direct-effect estimates, which should theoretically be immune to the effects of non-random mating and other confounding variables. Likewise, Keller et al. (doi.org/10.1371/journal.pgen.1003451) and Border et al. (doi.org/10.1101/2022.03.21.485215) used very different approaches to arrive at the same conclusion that ~50% of the nominal genetic correlation between IQ and height could be attributed to bivariate assortative mating rather than shared causal biological factors. Given that assortative mating on both IQ measures and height involves many other traits (not just two as assumed in such bivariate models), the true extent to which height / IQ correlations reflect causal factors is plausibly even lower than these estimates suggest. For these reasons, I do not entirely agree with the authors' review of previous findings in the introduction, where they write "recent studies have suggested that links between higher cognition and taller height can be largely explained by genetic factors", though it is certainly true that this claim has been made.

      We have revised our introduction to better reflect the complexity of previous findings and to note that this claim has been made.

      Reviewer #2 (Public Review):

      The authors use birth cohorts with extensive cognitive assessments and height measurements along with data on parental height and socioeconomic status. The authors estimate that the correlation between height and cognitive ability has approximately halved in the last 60 years.

      Quantile regression results suggest that this is due to a stronger association between low cognitive ability and short stature in older cohorts, potentially due to environmental factors that cause both and that have been removed by improvements in the environment in the last 60 years.

      While this is a plausible hypothesis, the evidence presented in the manuscript is unable to rule out alternative hypotheses, such as changes in assortative mating.

      The results in the manuscript will be of interest to researchers investigating how genetics and environment lead to correlations between cognitive and physical/health traits, and to researchers interested in the relationship between social and health inequalities.

      While my sense of the evidence presented is that there is fairly solid statistical evidence for a trend where the correlation between cognitive ability and height declines over time, there is no formal quantification of this trend nor measurement of the uncertainty in the trend.

      We now include additional statistical tests to compare estimates in each cohort (Fig S6). We have opted to include this in supplemental material given the large number of tests included already.

      Similarly, the quantile regression plots in Figure 2 appear to show a trend across the height deciles for the two oldest cohorts, but no quantification of how strong this is nor what uncertainty exists is calculated. Furthermore, if the apparent trend in the quantile regression plots is true, wouldn't this imply a non-linear association between height and cognitive ability for the older cohorts? Can this be seen in the scatterplots or in a non-linear regression?

      We included 95% confidence intervals in our quantile regression analyses, which provide an indication of uncertainty. We believe that, given the substantial number of analyses (across 4 historical cohorts and 4 cognition tests; 23 supplemental results), further work would be best placed to undertake additional statistical exploration of both quantile regression and non-linear associations. We would be happy to reconsider this if requested.

      I think the authors could have done more with their data to investigate the contribution of assortative mating to the observed trend. Looking at Figure S4, it looks like the correlation between mother's education and father's height in the 2001 cohort is substantially lower than for previous cohorts. While cognitive ability may not be available for parents, one could look at, for example, father's education and mother's height across the cohorts and see if there is a downward trend in correlation.

      We now include in Figure S5 a cross-cohort investigation of the correlation between parental height and maternal education. We find that the correlation is similar across 1946c, 1958c, and 1970c, yet is weaker in 2001c (Fig S5). We comment on this in the paper (see revised discussion, explanation of findings section). Interpretation of these results is complicated by measurement error in parental education (typically reported for both parents by mothers). Further, interpretation may be further complicated by reductions in the socioeconomic patterning of height across time (see https://www.thelancet.com/journals/lanpub/article/PIIS2468-2667(18)30045-8/fulltext). Future work which focuses on assortative mating could investigate these issues.

      Reviewer #3 (Public Review):

      A difficulty with the paper is the different cognitive tests used in the different cohorts; the authors address this at some length in the discussion. However, I am afraid that this matter makes the results hard or impossible to interpret along the lines of their research question. One would need to know that, if these cognitive tests were administered in a single cohort at one time, they would have the same correlation with height.

      Please see our responses to Reviewer 1 and our revised Discussion. We are reliant upon imperfect historical data to make inferences on long-run trends, in the absence of ideal data for this paper (eg, the same tests used in all cohorts born in 1946, 1958, 1970 and millennium; though even in this instance some changes would be required (eg, to the words chosen in verbal reasoning tasks; see Discussion, explanation of findings section)).

      I judge that the main limitation of the method is the fact that different cognitive tests are used in the different cohorts. The tests in themselves are valid tests of cognitive functions. However, given that the focus of the study is on the change in correlations across time, then it is a worry that the tests are different; that is, the authors have the burden of proving to us that, if the environmental/social changes had NOT been operative across time, then the height-cognitive test correlations would be the same. What can the authors do to prove to us that if, say, all of these different-cohort verbal tests had been given to a single cohort on a single occasion, then they would have the same correlations with height? The same goes for the mathematics based tests. I note the tests' somewhat different distributions in Figure 1, but that is not the only thing that could lead to different correlations with, say, height. I am aware that all cognitive tests tend to correlate positively and that they all have loadings on general intelligence; however, different tests will not necessarily have the same correlations with outside variables (e.g. height). This will depend on things such as their content, their reliability/internal consistency etc.

      In the Results the authors state: "Cognitive test scores were strongly-moderately positively correlated with each other, with the size of the correlation weakening across time." That's true, but perhaps, also a major concern for this study. One possible reason for the decline in verbal-maths test correlations across cohorts (old to recent) is that the nature of these tests has changed across time, either/both in terms of content (what capabilities are assessed) or something such as reliability/internal consistency/ceiling-or-floor effects (how well the capabilities are assessed). That is, given that the height-cognitive test correlations show a similarly declining pattern of correlations over cohorts, it could be that the tests' contents (of the different tests) is partly or wholly responsible. I raise that as a possibility only, and I appreciate that it might be correct, as the authors prefer, that there is an inherent lowering of intelligence-height correlations over time, but I do not think that one can rule out-with the present study's design-that it might have been due to the change in tests. For example, a reading-math correlation of 0.74 in 1946 lowered to a correlation of .32 in 2001, in the face of different tests. To show that this is not due to the different tests being used would require more information. If this is a true result, it is big news.

      Please see our responses to Reviewer 1. This includes additional analysis and an expanded discussion of this possible cause of bias. We hope our manuscript now provides further evidence and discussion to inform the likelihood of this possibility.

      I have a suggestion: if the authors wish to rule out the possibility that the lowering intelligence-height correlations across cohorts are due to different cognitive tests being used, they should take all the cognitive tests used here and apply them cross-sectionally to single-year-born samples (of 11- and 16-year olds) that have also been measured for height. If the cognitive tests all correlate at the same level with height within each of these two samples (they needn't do so across the 11- and 16-year olds), then one could proceed more safely with between-cohorts (1946, 1958, 1970, 2001) comparisons of the correlations.

      We thank the reviewer for this suggestion. However, we are unsure whether we have understood the suggested analysis or whether it is tractable given our data—the cohorts we used were born in either 1946, 1958, 1970, or around 2000. We do not have cross-sectional samples of 11 and 16 year olds at the same time.

    1. Author Response:

      Dear eLife Editorial Board, dear reviewers, dear readers,

      We very much thank the eLife editors and reviewers for their overall very positive review and encouraging assessment of our manuscript, and for highlighting our study’s innovation and relevance for using genomic approaches for the conservation of biodiversity.

      We very much thank the reviewers for pointing out parts of the manuscript that could be described more clearly or in more detail to make the study fully reproducible, and have therefore rewritten parts of the manuscript. We importantly follow reviewer 1’s specific recommendation to focus the main text on clearly understandable results, and therefore now only showcase the application of selective nanopore sequencing (aka adaptive sampling) to one soil sample, which we hope will make the flow of the manuscript easier to understand.

      We further agree that parts of the study could have been conducted more extensively (e.g. include more samples and thereby showcase the broad applicability of the approach), which was unfortunately not feasible since I as the lead author left New Zealand to take up another position abroad. We are, however, following up on this work with another controlled large-scale study.  

      We further agree that both qPCR and metabarcoding have their advantages and disadvantages. Metabarcoding approaches, however, importantly deliver more information about the biodiversity of a location than just the presence of a single species; this, in our case, includes other endangered species and evidence of kākāpō predators. We further show that the chosen marker gene region (12S rRNA) is species-specific enough to distinguish kākāpō from its two closest relatives. While qPCR has been shown to be more sensitive for some species, the difference is often minimal (see e.g., Harper et al., Ecol Evol. 2018 Jun; 8(12): 6330–6341), and for some species the two approaches have been shown to be equally sensitive (Schneider et al., PLoS ONE 2016, 11, e0162493). qPCR approaches further require the careful design of species-specific primers, and hence access to samples and DNA of the target species and of closely related species – none of which are necessarily at hand, especially not for conservationists who want to use these approaches regularly in the future, and in countries like New Zealand where genomic work with material from any “treasured” species has to be approved in a long and detailed process according to national regulations and the Nagoya Protocol. Given all these reasons, and the generally good performance of our metabarcoding approach (also in detecting our species of interest), we do not see the necessity of applying a qPCR approach in this study.

      To avoid any confusion, we now also describe the sampling sites in more detail and use their labels consistently throughout the manuscript. Briefly, the sites were always sampled directly at the site, and at 4m and 20m distance, and all in replicates, as described in detail in the manuscript. Specifically, the “abandoned nests” had only been abandoned ~30 days before sampling, as described in the Methods, and this is why kākāpō DNA is still present.

      We further thank reviewer 2 for suggesting that we discuss the impact of selective nanopore sequencing on pore efficiency in more depth, and have added a corresponding sentence to the Discussion. More generally, we have added more references and broader scientific context to the Discussion.

      Thank you again for this very helpful review of our work.

      With best regards,
      Lara Urban

    1. Author Response:

      We are grateful for the detailed feedback provided by the two anonymous reviewers. We provide a point-by-point response to their reviews below:

      Reviewer #1 (Public Review):

      Medwig-Kinney et al. perform the latest in a series of studies unraveling the genetic and physical mechanisms involved in the formation of the C. elegans gonad. They have paid particular attention to how two different cell fates are specified, the ventral uterine (VU) or anchor cell (AC), and the behaviors of these two cell types. This cell fate choice is interesting because the anchor cell performs an invasive migration through a basement membrane, a process that is required for correct C. elegans gonad formation and that can act as a model for other invasive processes, such as malignant cancer progression. The authors have identified a range of genes that are involved in the AC/VU fate choice, and that impart the AC with its ability to arrest the cell cycle and perform an invasive migration. Taking advantage of a range of genetic tools, the authors show that the transcription factor NHR-67 is strongly expressed in the AC. The authors also present evidence that NHR-67 could function as a transcriptional repressor through interactions with a Groucho homolog and also a TCF homolog, and they also suggest that these proteins are forming repressive condensates through phase separation.

      The authors have produced an extensive dataset to support their two primary claims: that NHR-67 expression levels determine whether a cell is invasive or proliferative, and also that NHR-67 forms a repressive complex through interactions with other proteins. The authors should be commended for clearly and honestly conveying what is already known in this area of study with exhaustive references. But absent data unambiguously linking the formation and dissolution of NHR-67 condensates with the activation of downstream genes that NHR-67 is actively repressing, the novelty of these findings is limited.

      Response 1.1: We thank the reviewer for recognizing the extensive dataset we provide in this manuscript in support of our claims that, (1) NHR-67 expression levels are important for distinguishing between AC and VU cell fates, and (2) NHR-67 interacts with transcriptional repressors in VU cells. We acknowledge that a complete mechanistic understanding of the functional significance of NHR-67 puncta is not possible without knowing direct targets of NHR-67 in the AC. Unfortunately, tools to identify transcriptional targets in individual cells or lineages in C. elegans do not exist, and generation of such tools would be beyond the scope of this work. This is evidenced by the fact that the first successful attempt to transcriptionally profile the AC was only posted as a preprint one month ago (Costa et al., doi: 10.1101/2022.12.28.522136). It is our hope that the findings we present here can be integrated with future AC- and VU-specific profiling efforts to provide a more complete picture of the functional significance of NHR-67 subnuclear organization.

      Reviewer #2 (Public Review):

      Medwig-Kinney et al. explore the role of the transcription factor NHR-67 in distinguishing between AC and VU cell identity in the C. elegans gonad. NHR-67 is expressed at high levels in AC cells where it induces G1 arrest, a requirement for the AC fate invasion program (Matus et al., 2015). NHR-67 is also present at low levels in the non-invasive VU cells and, in this new study, the authors suggest a role for this residual NHR-67 in maintaining VU cell fate. What this new role entails, however, is not clear. The model in Figure 7E shows NHR-67 switching from a transcriptional activator in ACs to a transcriptional repressor in VUs by virtue of recruiting translational repressors. In this model, NHR-67 actively suppresses AC differentiation in VU cells by binding to its normal targets and acting as a repressor rather than an activator. Elsewhere in the text, however, the authors suggest that NHR-67 is "post-translationally sequestered" (line 450) in nuclear condensates in VU cells. In that model, the low levels of NHR-67 in VU cells are not functional because they are inactivated by sequestration in condensates away from DNA. Neither model is fully supported by the data, which may explain why the authors seem to imply both possibilities. This uncertainty is confusing and prevents the paper from arriving at a compelling conclusion. What is the function, if any, of NHR-67 and so-called "repressive condensates" in VU cells?

      Response 2.1: As the reviewer correctly notes, we present two possible models in this manuscript. The interaction between NHR-67 and the Groucho/TCF complex in the VU cells could (1) switch the role of NHR-67 from a transcriptional activator to a transcriptional repressor, or (2) sequester NHR-67 away from its transcriptional targets. Indeed, we cannot definitively exclude the possibility of either model. In our resubmission, we will attempt to make this more clear in the text and by presenting both possible models in the summary figure (Fig. 7E).

      Below we list problems with data interpretation and key missing experiments:

      1) The authors report that NHR-67 forms "repressive condensates" (aka. puncta) in the nuclei of VU cells and imply that these condensates prevent VU cells from becoming ACs. Fig. 3A, however, shows an example of an AC that also assembles NHR-67 puncta (these are less obvious simply due to the higher levels of NHR-67 in ACs). The presence of NHR-67 puncta in the AC seems to directly contradict the authors' assumption that the puncta repress the AC fate program. Similarly, Figure 5-figure supplement 1A shows that UNC-37 and LSY-22 also form puncta in ACs. The authors need to analyze both AC and VU cells to demonstrate that NHR-67 puncta only form in VUs, as implied by their model.

      Response 2.2: The puncta formed by NHR-67 in the AC are different in appearance from those observed in the VU cells and furthermore do not exhibit strong colocalization with those of UNC-37 or LSY-22. The Manders’ overlap coefficient between NHR-67 and UNC-37 is 0.181 in the AC, whereas it is 0.686 in the VU cells. Likewise, the Manders’ overlap coefficient between NHR-67 and LSY-22 is 0.189 in the AC compared to 0.741 in the VU cells. We speculate that the areas of NHR-67 subnuclear enrichment in the AC may represent concentration around transcriptional targets, but testing this would require knowledge of direct targets of NHR-67.
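
      For readers unfamiliar with the metric, the Manders' overlap coefficient between two fluorescence channels is commonly defined as R = Σ(aᵢ·bᵢ) / √(Σaᵢ² · Σbᵢ²), where aᵢ and bᵢ are the intensities of pixel i in each channel. The sketch below implements that textbook definition on flattened pixel lists. The response does not state which software, thresholding, or variant of the coefficient was used, so this should not be read as the authors' pipeline, and the pixel values are hypothetical.

```python
import math


def manders_overlap(channel_a, channel_b):
    """Manders' overlap coefficient for two equal-length lists of pixel intensities.

    Returns a value between 0 (no co-occurring signal) and 1 (proportional signal)
    for non-negative intensities.
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must contain the same number of pixels")
    numerator = sum(a * b for a, b in zip(channel_a, channel_b))
    denominator = math.sqrt(sum(a * a for a in channel_a) *
                            sum(b * b for b in channel_b))
    # An empty (all-zero) channel has no signal to overlap with.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical flattened images: signal in the same pixels vs. in different pixels.
same_pixels = manders_overlap([0, 0, 10, 10], [0, 0, 5, 5])
different_pixels = manders_overlap([10, 10, 0, 0], [0, 0, 5, 5])
print(same_pixels, different_pixels)
```

      Because the coefficient is insensitive to overall intensity scaling between channels, values such as those reported above reflect how much signal occupies the same pixels rather than how bright either channel is.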

      2) While a pool of NHR-67 localizes to "repressive condensates", it appears that a substantial portion of NHR-67 also exists diffusively in the nucleoplasm. This would appear to contradict a "sequestration model" since, for such a model to work, a majority of NHR-67 should be in puncta. What proportion of NHR-67 is in puncta? Is the concentration of NHR-67 in the nucleoplasm lower in VUs compared to ACs and does this depend on the puncta?

      Response 2.3: The proportion of NHR-67 localizing to puncta versus the nucleoplasm is dynamic, as these puncta form and dissolve over the course of the cell cycle. However, we estimate that approximately 25-40% of NHR-67 protein resides in puncta based on segmentation and quantification of fluorescent intensity of sum Z-projections. We also measured NHR-67 concentration in the nucleoplasm of VU cells and found that it is only 28% of what is observed in ACs (n = 10). We disagree with the notion that the majority of NHR-67 protein should be located in puncta to support the sequestration model. As one example, previously published work examining phase separation of endogenous YAP shows that it is present in the nucleoplasm in addition to puncta (Cai et al., 2019, doi: 10.1038/s41556-019-0433-z). In our system, it is possible that the combination of transcriptional downregulation and partial sequestration away from DNA is sufficient to disrupt the normal activity of NHR-67.

      3) The authors do not report whether NHR-67, UNC-37, LSY-22, or POP-1 localization to puncta is interdependent, as implied in the model shown in Fig. 7.

      Response 2.4: It is difficult to test whether localization of these proteins to puncta is interdependent, as perturbation of UNC-37, LSY-22, and POP-1 results in ectopic ACs. Trying to determine if loss of puncta results in VU-to-AC transdifferentiation or vice versa becomes a chicken-and-egg argument. It is also possible that UNC-37 and LSY-22 are at least partially redundant in this context. We based our model, shown in Fig. 7E, on known or predicted protein-protein interactions, which we confirmed through yeast two-hybrid analyses (Fig. 7D; Fig. 7-figure supplement 1).

      4) The evidence that the "repressor condensates" suppress AC fate in VUs is presented in Fig. 4D where the authors deplete the presumed repressor LSY-22. First, the authors do not examine whether NHR-67 forms puncta under these conditions. Second, the authors rely on a single marker (cdh-3p::mCherry::moeABD) to score AC fate: this marker shows weak expression in cells flanking one bright cell (presumably the AC) which the authors interpret as a VU-to-AC transformation. The authors, however, do not identify the cells that express the marker by lineage analyses and dismiss the possibility that the marker-positive cells could arise from the division of an AC-committed cell. Finally, the authors did not test whether marker expression was dependent on NHR-67, as predicted by the model shown in Fig. 7.

      Response 2.5: For the auxin-inducible degron experiments, strains contained labeled AID-tagged proteins, a labeled TIR1 transgene, and a labeled AC marker. Thus, we were limited by the number of fluorescent channels we could co-visualize and therefore could not also visualize NHR-67 (to assess for puncta formation) or another AC marker (such as LAG-2). We could have generated an AID-tagged LSY-22 strain without a fluorescent protein, but then we would not be able to quantify its depletion, which this reviewer points out is important to measure. We did visualize NHR-67::GFP expression following RNAi-induced knockdown of POP-1 and observed consistent loss of puncta in ectopic ACs. However, this again becomes a chicken-and-egg argument as to whether cell fate change or loss of puncta causes the other.

      5) Interaction between NHR-67 and UNC-37 is shown using Y2H, but not verified in vivo. Furthermore, the functional significance of the NHR-67/UNC-37 interaction is not tested.

      Response 2.6: We attempted to remove the intrinsically disordered region found at the C-terminus of the endogenous nhr-67 locus, using CRISPR/Cas9, as this would both confirm the NHR-67/UNC-37 interaction in vivo and allow us to determine the functional significance of this interaction. However, we were unable to recover a viable line after several attempts, suggesting that this region of the protein is vital.

      6) Throughout the manuscript, the authors do not use lineage analysis to confirm fate transformation as is the standard in the field.

      Response 2.7: The timing between AC/VU cell fate specification and AC invasion (the point at which we look for differentiated ACs) is approximately 10-12 hours at 25 °C. With our imaging setup, we are limited to approximately 3-4 hours of live-cell imaging. Therefore, lineage tracing was not feasible for our experiments. Instead, we relied on visualization of established markers of AC and VU cell fate to determine how ectopic ACs arose. In Fig. 6B,C we show that the expression of two AC markers (cdh-3 and lag-2) turns on while a VU marker (lag-1) is downregulated within the same cell. In our opinion, live-imaging experiments that show changes in cell fate in real time via reporters were the most definitive way to observe the phenotype.

      There are 4 multipotential gonadal cells with the potential to differentiate into VUs or ACs. Which ones contribute to the extra ACs in the different genetic backgrounds examined was not determined, which complicates interpretation. The authors should consider and test the following possibilities: disruption of NHR-67 regulation 1) causes extra pluripotent cells to directly become ACs early in development, 2) causes VU cells to gradually trans-fate to an AC-like fate after VU fate specification (as implied by the authors), or 3) causes an AC to undergo extra cell division(s). In Fig. 1F, 5 cells are designated as ACs, which is one more than the 4 precursors depicted in Fig. 1A, implying that some of the "ACs" were derived from progenitors that divided.

      Response 2.8: When trying to determine the source of the ectopic ACs, we considered the three possibilities noted by the reviewer: (1) misspecification of AC/VU precursors, (2) VU-to-AC transdifferentiation, or (3) proliferation of the AC. We eliminated option 3 as a possibility, as the ectopic ACs we observed here were invasive and all of our previous work has shown that proliferating ACs cannot invade and that cell cycle exit is necessary for invasion (Matus et al., 2015; Medwig-Kinney & Smith et al., 2020; Smith et al., 2022). Specifically, NHR-67 is upstream of the cyclin dependent kinase CKI-1 and we found that induced expression of NHR-67 resulted in slow growth and developmental arrest, likely because of inducing cell cycle exit. For our experiment using hsp::NHR-67, we induced heat shock after AC/VU specification. For POP-1 perturbation, we explicitly acknowledged that misspecification of the AC/VU precursors could also contribute to ectopic ACs (Fig. 6A; lines 364-402). We could not achieve robust protein depletion through delayed RNAi treatment, so instead we utilized timelapse microscopy and quantification of AC and VU cell markers (Fig. 6B,C; see response 2.7 above).

      In conclusion, while the authors report on interesting observations, in particular the co-localization of NHR-67 with UNC-37/Groucho and POP-1 in nuclear puncta, the functional significance of these observations remains unclear. The authors have not demonstrated that the "repressive condensates" are functional and play a role in the suppression of AC fate in VU cells as claimed. The colocalization data suggest that NHR-67 interacts with repressors, but additional experiments are needed to demonstrate that these interactions are specific to VUs, impact VU fate, and sequester NHR-67 from its targets or transform NHR-67 into a transcriptional repressor.

      Response 2.9: We agree that, at this time, we cannot pinpoint the precise mechanism through which NHR-67 puncta function (i.e., by sequestering NHR-67 from DNA or switching the role of NHR-67 from activating to repressing). However, identification of NHR-67 puncta and their colocalization with UNC-37, LSY-22, and POP-1 in VU cells allowed us to discover an undescribed role for the Groucho/TCF complex in maintaining VU cell fate. This, combined with our evidence demonstrating that NHR-67 transcriptional regulation is important for distinguishing between AC and VU cell fate, are the main contributions of our study.

    1. Author Response:

      Reviewer #1 (Public Review):

      Vaparanta et al propose a new bioinformatic algorithm for pathway discovery from multi-omics data sources at one time point, and validate some of their algorithm's predictions using functional experiments. The authors should be commended for their detailed experimental work and comprehensive data collection around TYRO3 signaling in melanoma, which will likely be of value to that field. They also provide a mature software package that is well documented for implementing their bioinformatic methods. The reviewer's experience with the software was that it is computationally efficient/fast with well written code. The biological data (both multiomics and functional validation studies) will be of interest to melanoma research as well as scientists interested in TYRO3 signaling.

      The authors wish to thank the Reviewer for the positive comments.

      At this time, however, the bioinformatics algorithm proposed is of unclear utility to the broader multiomics community for the following reasons:

      First, the algorithm itself has numerous hyperparameters, which can make it challenging to use and potentially highly sensitive to these user inputs. Just the regulatory complex inference step has 10 hyperparameters/settings required to be selected.

      We have now reduced the number of parameters in the code by automating the choice for 2 of the parameters. The manuscript is now accompanied by a sensitivity analysis on all the key parameters in the code (new Supplementary Figures 5-11) and we have created a script to inform the choice of the key parameter S (suggest parameter S value for regulatory complex inference, new Supplementary Figure 10). We have additionally thoroughly revised the accompanying documentation to help the user choose the right settings for their datasets (available in Mendeley data: https://data.mendeley.com/datasets/m3zggn6xx9/draft?a=71c29dac-714e-497e-8109-5c324ac43ac3).

      Second, the algorithm is presented in an ad hoc manner without mathematical/statistical justifications of the many design decisions and steps in the analysis. For example, the authors write "The inference of regulatory complexes from the combined score follows the nearest neighbor principle, assuming that while a single high combined score can be random chance, the combination of combined scores between 3 cell signaling molecules would be predictive". It is mathematically unclear that this is true…

      We have now tested the effect of the design decisions of the algorithm on the ability to discover known associations in omics datasets (new Supplementary Figure 4). Adhering to the design decisions of the algorithm greatly increases the number of known associations found in real omics data.

      …and thus this reviewer attempted to test the algorithm using simulated uncorrelated Gaussian noise (see code/outputs at end of the review) in 10K genes and 10 samples using a best attempt at hyperparameter selection per the code comments and documentation. It appears that nearly 1/3 of all genes (i.e., 3205 of 10K) were erroneously grouped into complexes (assuming no mistakes in reviewer's usage of the code). In general, "unbiased" pathway analysis in multiomics that is not relying on prior knowledge will require solving the extraordinarily challenging task of estimating a very large covariance matrix from statistically small sample sizes. This puts the method at high risk of producing spurious results.

      The Reviewer raises an important topic that should be considered in de novo analyses. However, the test dataset the reviewer used is not truly representative of the omics datasets that should be used to evaluate the performance of the algorithm. First, the algorithm should be only used with positive expression values due to the way the stoichiometry score is calculated. This is now more clearly indicated in the accompanying documentation (available in Mendeley data: https://data.mendeley.com/datasets/m3zggn6xx9/draft?a=71c29dac-714e-497e-8109-5c324ac43ac3). The Gaussian noise used by the reviewer does not represent any positive expression values of any omics datasets.

      Second, the way the algorithm is constructed, it will try to find an association to all features in the dataset if so instructed by the parameters. To this end, we have now added a new parameter (parameter S) into the algorithm to better control this setting. If correctly used in the test dataset used by the reviewer, the algorithm now returns 0 complexes. The authors also wish to point out that they strongly believe that the number of features that have no real association with other features in real omics data is very low, since most intracellular molecules have common upstream regulators. This poses a problem only if the dataset has a very limited number of features.

      Third, it seems to the authors that instead of testing the limits of the algorithm with totally randomized data, it would be more valuable to assess whether the algorithm can find true positives among randomized data. To this end, we estimated the true positive and false positive rates with normally, negative binomial and beta distributed simulated data (new Supplementary Figures 7-9). Indeed, the algorithm can discover only the true positives among the false positives as long as the S parameter is not set too low. We now provide a separate script (suggest parameter S value for regulatory complex inference, new Supplementary Figure 10) that will help the user to choose the parameter S for their data so that the number of false positives in the inference is minimized.

      Fourth, data drawn from the standard normal distribution have a relatively low variance: about 68% of values fall between -1 and 1, and about 95% between -2 and 2. If one simulates 10,000 random rows with a sample size of 10 from such a low-variance distribution, there is a high chance of creating highly correlated rows that would actually be representative of true positives in the dataset, given the generally high variation within omics data. Therefore, it is exceedingly hard to interpret whether the features were erroneously assigned into complexes or not, because the chosen simulation method could have by chance created associations that represent true positives in the dataset.

      Fifth, we also analyzed the standard normal distributed simulated data with WGCNA, which is still the most widely used module discovery method. WGCNA assigned almost all the features into modules. However, we think it is clear from its wide use that the analysis can still offer valuable insight into biological processes. Therefore, the authors are not sure how concerned they should be about the results of this test.
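      The statistical point at issue in this exchange, that many feature pairs correlate strongly by chance when the sample size is small, can be sketched with a few lines of standard-library Python. This is not the algorithm under review and not the reviewer's test script. It only counts pairwise Pearson correlations above a cutoff in uncorrelated standard normal data, and the feature count, cutoff, and seed are arbitrary assumptions chosen so that the example runs quickly.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_null_pairs(n_features, n_samples, r_cutoff, seed=0):
    """Count feature pairs with |r| >= r_cutoff in independent N(0, 1) noise."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_features)]
    n_pairs = n_features * (n_features - 1) // 2
    hits = sum(
        1
        for i in range(n_features)
        for j in range(i + 1, n_features)
        if abs(pearson(data[i], data[j])) >= r_cutoff
    )
    return hits, n_pairs


# Scaled-down version of the reviewer's setting: 200 features, 10 samples.
hits, n_pairs = count_null_pairs(n_features=200, n_samples=10, r_cutoff=0.8)
print(f"{hits} of {n_pairs} null pairs have |r| >= 0.8")

# Number of distinct pairs for 10,000 features, as in the reviewer's test.
pairs_10k = 10_000 * 9_999 // 2
print(f"pairs among 10,000 features: {pairs_10k}")
```

      Because the number of pairs grows quadratically with the number of features, even a small per-pair chance of a high correlation yields many such pairs in a full-size dataset, which is why a tunable stringency setting such as parameter S matters.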

      Third, pathway analysis has long been a bioinformatic goal in the literature, with the authors citing a landmark paper for the WGCNA method from 2008. As such, there are numerous and long-standing discussions in the literature regarding challenges of pathway analysis (i.e., omics data often has dimensionality D far larger than sample size N, and correlation matrix estimation requires D^2 >> N parameters to be estimated) and its potential for spurious correlations. Some authors use sophisticated statistical tools (e.g., "Biological network inference using low order partial correlation" 2014, "Learning Large‐Scale Graphical Gaussian Models from Genomic Data" 2005, "Incorporating prior knowledge into Gene Network Study" 2013) to attempt to deal with this issue.

      The authors agree that if by spurious the Reviewer means non causal indirect associations like in the paper by Zuo et al. (Zuo et al., 2014. Biological network inference using low order partial correlation. Methods 69:266-73. doi: 10.1016/j.ymeth.2014.06.010.), then, indeed, the algorithm has not been designed to find directed networks. Instead, the algorithm has been designed to find common upstream regulators.

      Furthermore, the authors indicate that their approach is the first to attempt pathway analysis in multi-omics setting, stating "Integrative approaches combining more than one robust molecular association measure, however, have not been explored", but one can find attempts such as "An Integrative Transcriptomic and Metabolomic Study of Lung Function in Children With Asthma" to build on WGCNA for work in multiomics datasets.

      Indeed, the Reviewer is correct that correlation networks and WGCNA have been previously used with multi-omics datasets. What the authors meant to convey is that these previous approaches rely only on one measure of molecular association, which in the case of correlation networks is correlation and WGCNA covariation, while our method is the first that combines two measures of molecular association, the correlation and stoichiometry score. We have now amended the sentence in the manuscript (lines 51-52).

      The 2020 review paper "Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources" seems to identify multiple published methods dealing with pathway estimation in multiomics datasets. As the paper stands, this reviewer cannot adequately assess the impact of the proposed bioinformatic algorithm and its results against the existing body of literature for pathway inference.

      We have now benchmarked our method against existing module discovery, network and multi-omics integration methods and provide evidence that our method outperforms these methods (new Figure 4).

      Reviewer #2 (Public Review):

      The authors describe a bioinformatic platform that allows for unbiased pathway analysis from multiomics data. The concept is based on correlation, stoichiometry scores and their combination to evidence interaction between two proteins, transcripts or phosphosites in an omic dataset. This platform was developed and validated on both previously published and in house omics data. I really appreciate that the paper is well written and clear, and I would like to acknowledge the amount of work generated to produce the in-house dataset.

      The authors wish to thank the Reviewer for the encouraging words.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors' conclusions presented herein are supported by a well-established mouse genetic conditional approach and an extensive array of phenotypic analyses.

      Strengths:

      1. The authors utilized well-described genetic tools, AdipoQCre, to target preadipocyte-like progenitor cell populations in bone marrow, as well as Csf1 floxed alleles. They further sifted through the cell population by showing that mature lipid-laden adipocytes express Csf1 at a much lower level, and determined that AdipoQCre-marked progenitor cell population presents a major cellular source of M-CSF.

      2. The reanalysis of published scRNAseq datasets in Figure 1, as well as the following phenotypic analyses of the mutant mice are well-conducted. The analyses include a broad range of experiments both in vivo (3DmicroCT, histology, flow cytometry) and ex vivo (osteoclastogenesis assay in bone marrow cell culture). The confidence of the reported findings is high.

      3. The data presented in this manuscript are of very high quality.

      Weaknesses:

      1. The role of AdipoQ-lineage progenitors as a source of M-CSF is overstated. The authors claim in many instances that "mature bone adipocytes do not express M-CSF", "These cells however do not produce Csf1", "...these peripheral AdipoQ+ cells nearly do not produce M-CSF". However, the authors' qPCR experiments only show four times differences in Csf1 expression. Therefore, the claim that AdipoQ-lineage progenitors are an exclusive source of M-CSF is not well substantiated. In line with this, some of the recent literature reporting conditional deletion of M-CSF in other bone cells (JBMR Plus. 4:e10080., Nature. 590:457-462) are not included.

      We thank the reviewer for this important question. We have performed the below experiments to further clarify and support our conclusion:

      1) We increased the number of replicates in each group of cells in Fig. 3A (the old Fig. 1E) to five per group and, following reviewer 3’s recommendation on housekeeping gene usage, found that the mRNA expression of Csf1 in bone marrow AdipoQ-lineage progenitor cells is 20-30 fold higher than that in mature adipocytes. This result has been updated in Fig. 3A.

      2) We further performed immunofluorescence staining of M-CSF on bone slices, and found that the majority of bone marrow AdipoQ-expressing progenitor cells express M-CSF (Fig. 3B, 1865 cells out of 2001 cells counted, n=3 mice, 93.2%). In contrast, M-CSF expression was not detected in mature bone marrow adipocytes (Perilipin1+) (Fig. 3C, 0 cells out of 115 cells counted, n=3 mice, 0%), indicating that mature bone marrow adipocytes are unlikely a significant source of M-CSF.

      3) We performed western blot to analyze M-CSF protein expression in peripheral adipose. As shown in Fig. 3D, the stromal vascular fraction (SVF) cells in adipose, which contain multiple cell populations including adipogenic progenitors, express M-CSF. On the contrary, M-CSF was nearly undetectable in the peripheral mature adipocytes isolated from adipose (Fig. 3D).

      These data collectively support that mature adipocytes are not a significant source of M-CSF, as evidenced by nearly undetectable M-CSF expression compared to the Adipoq-lineage progenitors. The results are described on pg. 5. However, the reviewer’s comment on ‘exclusive source’ is well taken, as osteocytes and the osteo lineage also express certain levels of M-CSF. We deleted ‘exclusive source’ from the manuscript and have added relevant literature and discussion in the Results and Discussion sections on pp. 5 and 9.

      2. Some of the phenotypic analyses are still incomplete. The authors did not report whether CHet (AdipoQCre Csf1(flox/+)) showed any bone phenotype. Further, the authors did not show that Csf1 mRNA or M-CSF protein is expressed in AdipoQ-lineage progenitors using histological methods. Current evidence is only based on scRNAseq and qPCR of isolated cells. Whether there was any change in circulating bone resorption markers in CKO mice was not shown. Cortical bone parameters were not included in the 3D-microCT analyses. These missing pieces of information would be important to correctly interpret the phenotypes.

      The het mice (Csf1f/+;AdipoQ Cre) do not show abnormal bone phenotype, which is now shown in Fig. 4-figure supplement 4. We performed immunofluorescence staining of M-CSF on bone slices, and found that the majority of bone marrow AdipoQ-expressing progenitor cells express M-CSF (Fig. 3B, 1865 cells out of 2001 cells counted, n=3 mice, 93.2%). We tested serum TRAP level in mice, and found that the Csf1 deficiency in Csf1∆AdipoQ mice significantly decreased the TRAP level in serum, compared to that in the WT control mice (Fig. 5B). Csf1∆AdipoQ mice do not exhibit abnormal cortical bone phenotype. The cortical bone parameters are now included in Fig. 4G.

      3. Which bone marrow cell population(s) are marked by AdipoQCre remain largely unclear. It is possible that AdipoQCre also marks at least part of MSPC-osteo cluster in addition to MSPC-adipo. Adipo-lineage progenitors may not stay entirely as adipoprogenitors and drift toward osteoblasts or their precursor cells.

      We thank the reviewer for the insightful comment on this interesting mystery and complicated question, which is drawing more attention in the field.

      In addition to Adipoq-lineage progenitors, Adipoq Cre also labels other clusters. However, the expression levels of Adipoq and frequency of Adipoq+ cells in other cell populations are relatively low. For example, the integrated scRNAseq dataset we analyzed shows that Adipoq is expressed at a low level (with scaled mean expression at 0.68, (27)) in a small proportion of MSPC-osteo cells (Fig. 1), and small amounts (31, 37) (about 4%) of osteoblasts in 8 or 12-week-old mice are Adipoq-lineage. A recent report found that in 24-week-old mice, about 15-40% of osteoblasts are marked with Adipoq Cre (37). This raises a few important possibilities that will need to be distinguished in future work. One possibility is that the Adipoq-lineage cells (adipo-CAR cells/MALPs) have minor or latent osteogenic potential that may become more evident under specific conditions, such as in older animals. However, balanced against this is the alternative that Adipoq-cre could primarily target a population of solely adipogenic adipo-CAR cells but that its specificity is imperfect, leading to progressive low levels of deletion in a separate population expressing very low levels of Adipoq, such as osteo-CAR cells. An additional possibility is that the Adipoq-lineage cells may themselves actually be further subdivided into multiple component cell types, including a major adipogenic and a separate minor osteogenic subpopulation. Ultimately, at the root of these issues is that Adipoq cre primarily defines one or possibly more lineages of cells rather than a cell type within those lineages. Therefore, application of further markers to fractionate the adipoq-lineage into its component cell types will be needed to resolve these possibilities, focusing on whether any potential osteogenic activity present can be fractionated away from the primary adipogenic activity present.

      Of note, the Adipoq expression level and positive cell proportion are much higher in bone marrow Adipoq lineage progenitors than the levels seen in osteoblast lineage (Fig.1, Fig.2, (22, 27, 31)) or endothelial cells in bone marrow (38, 39). For example, the MSPC-Adipo cluster (Adipoq-lineage progenitors) has 6441 cells with the highest level (scaled mean expression level at 3.01 per (27) at Single Cell Portal) of Adipoq seen among bone marrow cells analyzed. In contrast, the MSPC-osteo cluster consists of 2247 cells with a very low Adipoq expression level (scaled mean expression level at 0.68 per (27) at Single Cell Portal). Taking both the average expression level and the cell numbers in each cluster into account, the relative overall contribution to Adipoq expression by MSPC-osteo vs the Adipoq-lineage progenitors is 7.8% ((2247 x 0.68)/(6441 x 3.01)). Therefore, the expression of Adipoq in MSPC-osteo cluster is marginal compared to that in the Adipoq-lineage progenitors. These data make Adipoq an important marker to identify bone marrow Adipoq lineage progenitors. Overall, our work not only validates prior research identifying adipoq-lineage cells, identified as MALPs (22, 31), as a key osteoclast regulatory population, but also further extends the scope of their functions to encompass M-CSF production and regulation of macrophages.
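      The relative-contribution estimate quoted above is a ratio of two products (cell number times scaled mean expression per cluster). The short sketch below reproduces that arithmetic with the numbers given in the response. The variable names are ours, and treating cell number times scaled mean expression as a proxy for total cluster expression is the response's simplification, not an independent measurement.

```python
# Cell counts and scaled mean Adipoq expression as quoted in the response.
osteo_cells, osteo_mean = 2247, 0.68   # MSPC-osteo cluster
adipo_cells, adipo_mean = 6441, 3.01   # MSPC-Adipo cluster (Adipoq-lineage progenitors)

# Proxy for total Adipoq signal per cluster: cell number x scaled mean expression.
osteo_total = osteo_cells * osteo_mean
adipo_total = adipo_cells * adipo_mean

relative_percent = 100.0 * osteo_total / adipo_total
print(f"MSPC-osteo relative to MSPC-Adipo: {relative_percent:.2f}%")
```

      The ratio comes out slightly under 7.9%, in line with the roughly 7.8% stated in the response.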

      These points have been added to the Discussion sections on pp. 9-10.

      4. The OVX data in Figure 5 are not very well explained. The data do not seem to support the authors' conclusion that M-CSF deficiency in AdipoQ-lineage progenitors alleviates estrogen-deficiency induced osteoporosis. The CKO mice lose bone mass almost to the same extent as WT mice upon OVX.

      To address the reviewer’s question, we calculated the changes in the μCT parameter values between the Sham and OVX groups in the WT control and Csf1∆AdipoQ mice. Significant differences were identified between the control and Csf1∆Adipoq mice in several μCT parameters. For example, a decrease in trabecular BV/TV after OVX: 35.1% in the control vs 20.9% in Csf1∆Adipoq mice; a decrease in Tb. N after OVX: 11.34% in the control vs 7.97% in Csf1∆Adipoq mice; a decrease in Conn-Dens after OVX: 39.7% in the control vs 14.56% in Csf1∆Adipoq mice; an increase in Tb. Sp after OVX: 12.51% in the control vs 1.97% in Csf1∆Adipoq mice. These results support our conclusion that M-CSF deficiency in AdipoQ-lineage progenitors alleviates estrogen-deficiency induced osteoporosis. These value changes have been included in Fig. 7C and discussed on pg. 7.

      Reviewer #3 (Public Review):

      Macrophage colony-stimulating factor (M-CSF) plays key roles in the differentiation of myeloid-lineage cells, including monocytes, macrophages and osteoclasts. The latter mediate bone resorption, which is important for physiological bone remodelling but, unrestrained, contributes to bone loss in conditions such as in post-menopausal osteoporosis. M-CSF production within the bone marrow is implicated in the maintenance of myeloid and skeletal homeostasis, but the cellular source of bone marrow M-CSF has remained elusive. In this study, Inoue et al address this issue through advanced transcriptomic and gene targeting approaches. They conclude that a population of Adipoq-expressing progenitors within the bone marrow, designated "AdipoQ-lineage progenitors", is the key cellular source of M-CSF. Consistent with this, they find that transgenic deletion of M-CSF from these cells disrupts macrophage and osteoclast development, leading to osteopetrosis and possibly preventing bone loss following ovariectomy. However, they have not adequately addressed the possibility that M-CSF production from other cell types, particularly adipocytes in peripheral adipose tissues, may also be influencing these phenotypes. Specific strengths and weaknesses are as follows:

      Strengths:

      1. The manuscript is written in a clear, succinct manner and the data are generally nicely presented. It is therefore a pleasure to read.

      2. The analysis of single-cell transcriptomic data is clear and convincing, and the skeletal phenotyping has been done to a high standard.

      Weaknesses:

      1. The authors underplay the potential contribution of M-CSF production from other cell types, particularly from adipocytes in peripheral adipose tissues. They show that M-CSF expression from these cells is lower than from the bone marrow progenitors that they focus on; however, based on this they allude to "no expression" of M-CSF from these other adipocytes. This overlooks the findings of other studies showing that peripheral adipocytes produce M-CSF and that this has biological functions. Whether their knockout model alters M-CSF expression in peripheral adipose tissue, whether for whole tissue or for isolated adipocytes, has not been tested.

      We performed western blot to analyze M-CSF protein expression in peripheral adipose. As shown in Fig. 3D, the stromal vascular fraction (SVF) cells in adipose, which contain multiple cell populations including adipogenic progenitors, express M-CSF. On the contrary, M-CSF was nearly undetectable in the peripheral mature adipocytes isolated from adipose (Fig. 3D). These data collectively support that mature adipocytes are not a significant source of M-CSF, as evidenced by nearly undetectable M-CSF expression compared to the Adipoq-lineage progenitors. However, we understand that current techniques may be limited in detecting trace amounts of M-CSF. We thus deleted descriptions such as ‘exclusive’ or ‘do not produce/express…’ in the revised manuscript.

      2. The decreases in M-CSF have been assessed at the transcript level, but not for M-CSF protein. Whether their knockout model

      We performed immunofluorescence staining of M-CSF on bone slices, and found a drastic decrease in M-CSF protein in bone marrow AdipoQ+ cells in Csf1∆AdipoQ mice compared to the WT control mice. The results are shown in Fig. 4B, and Fig. 3B-D.

      3. It is also unclear if the Adipoq-lineage progenitors consist exclusively of adipogenic cells, or if osteogenic progenitors are also part of this population.

      We thank the reviewer for the insightful comment on this interesting mystery and complicated question, which is drawing more attention in the field.

      In addition to Adipoq-lineage progenitors, Adipoq Cre also labels other clusters. However, the expression levels of Adipoq and frequency of Adipoq+ cells in other cell populations are relatively low. For example, the integrated scRNAseq dataset we analyzed shows that Adipoq is expressed at a low level (with scaled mean expression at 0.68, (27)) in a small proportion of MSPC-osteo cells (Fig. 1), and small amounts (31, 37) (about 4%) of osteoblasts in 8 or 12-week-old mice are Adipoq-lineage. A recent report found that in 24-week-old mice, about 15-40% of osteoblasts are marked with Adipoq Cre (37). This raises a few important possibilities that will need to be distinguished in future work. One possibility is that the Adipoq-lineage cells (adipo-CAR cells/MALPs) have minor or latent osteogenic potential that may become more evident under specific conditions, such as in older animals. However, balanced against this is the alternative that Adipoq-cre could primarily target a population of solely adipogenic adipo-CAR cells but that its specificity is imperfect, leading to progressive low levels of deletion in a separate population expressing very low levels of Adipoq, such as osteo-CAR cells. An additional possibility is that the Adipoq-lineage cells may themselves actually be further subdivided into multiple component cell types, including a major adipogenic and a separate minor osteogenic subpopulation. Ultimately, at the root of these issues is that Adipoq cre primarily defines one or possibly more lineages of cells rather than a cell type within those lineages. Therefore, application of further markers to fractionate the adipoq-lineage into its component cell types will be needed to resolve these possibilities, focusing on whether any potential osteogenic activity present can be fractionated away from the primary adipogenic activity present.

      Of note, the Adipoq expression level and the proportion of positive cells are much higher in bone marrow Adipoq-lineage progenitors than in the osteoblast lineage (Fig. 1, Fig. 2, (22, 27, 31)) or in bone marrow endothelial cells (38, 39). For example, the MSPC-Adipo cluster (Adipoq-lineage progenitors) has 6441 cells with the highest level of Adipoq seen among the bone marrow cells analyzed (scaled mean expression level of 3.01 per (27) at Single Cell Portal). In contrast, the MSPC-osteo cluster consists of 2247 cells with a very low Adipoq expression level (scaled mean expression level of 0.68 per (27) at Single Cell Portal). Taking both the average expression level and the number of cells in each cluster into account, the overall contribution to Adipoq expression by MSPC-osteo relative to the Adipoq-lineage progenitors is approximately 7.9% ((2247 x 0.68)/(6441 x 3.01)). Therefore, the expression of Adipoq in the MSPC-osteo cluster is marginal compared to that in the Adipoq-lineage progenitors. These data make Adipoq an important marker for identifying bone marrow Adipoq-lineage progenitors. Overall, our work not only validates prior research identifying Adipoq-lineage cells, identified as MALPs (22, 31), as a key osteoclast regulatory population, but also extends the scope of their functions to encompass M-CSF production and regulation of macrophages.
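
      The relative-contribution estimate in the preceding paragraph can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it uses the cell counts and scaled mean expression values quoted in the text, treats "cell count x scaled mean expression" as a crude proxy for a cluster's overall contribution, and uses dictionary keys and a helper name that are ours rather than anything from the dataset. The ratio evaluates to about 0.0788, i.e. just under 8%, consistent with the percentage quoted above.

```python
# Cluster sizes and scaled mean Adipoq expression, as quoted in the text above.
# The dictionary layout and key names are illustrative assumptions.
clusters = {
    "MSPC-Adipo": {"cells": 6441, "scaled_mean_adipoq": 3.01},
    "MSPC-osteo": {"cells": 2247, "scaled_mean_adipoq": 0.68},
}


def total_signal(cluster):
    """Crude proxy for a cluster's overall contribution: cells x mean expression."""
    return cluster["cells"] * cluster["scaled_mean_adipoq"]


# Contribution of MSPC-osteo relative to the Adipoq-lineage progenitors.
ratio = total_signal(clusters["MSPC-osteo"]) / total_signal(clusters["MSPC-Adipo"])
print(f"MSPC-osteo / MSPC-Adipo = {ratio:.2%}")
```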

      These points have been added to the Discussion section on pp. 9-10.

      If these weaknesses are addressed then this work has potential to yield firm conclusions and new insights into the regulation of myeloid and skeletal homeostasis, both in normal physiology and in clinically relevant conditions.

      Yes, we have addressed the above 3 major questions.

    1. Author Response

      Reviewer #1 (Public Review):

      The current study proposed a drug discovery pipeline to accelerate the process of identifying drug candidates for LCA10 patients using cells from mouse retinal organoid for initial screening, human patient iPSC-derived retinal organoid for further testing, and then mouse mutants for in vivo validation. Reserpine was identified as the top candidate, possibly through modulating proteostasis and autophagy to promote cilium assembly. The study was with high translational value. However, the rationale using dissociated cells from mouse retinal organoid for initial drug screening needs to be justified. In addition, the consistency of phenotypic characteristics in human patient iPSC-derived retinal organoid needs to be reported. It was unclear if the rescued phenotypic changes were from the drug effects or a result of phenotypic variations in organoids.

      We thank the reviewer for the comments and suggestions. Please see the response provided in the “Essential Revisions” earlier. Briefly, single-cell cultures were used for screening to compensate for the variation of the Nrl-GFP signal in rd16 organoids, so that each compound was presented to a homogeneous population of cells. In addition, we performed a large-scale screen of over 6000 compounds at 11 concentrations, each in duplicate. It was thus not feasible to perform the screening manually. We used a semi-automatic electronic dispenser to set up the screens in 1536-well plates and a liquid handling system to add the compounds. Intact mouse retinal organoids are too big to be dispensed and would be damaged during the process. They are also too big to fit into one well of a 1536-well plate or even of a 384-well plate. Therefore, single-cell cultures are better suited than intact organoids for this application. We understand the potential pitfalls, and the positive hits were therefore verified in intact organoids in the secondary assays.
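
      To give a sense of the scale that made manual handling impractical, the numbers quoted above can be turned into a rough lower bound on wells and plates. This is a back-of-the-envelope sketch only: it takes 6000 as the compound count (the text says "over 6000") and assumes every well of a 1536-well plate holds a compound condition, whereas real plate layouts reserve wells for controls, so the true plate count would be higher.

```python
import math

compounds = 6000        # lower bound; the text says "over 6000"
concentrations = 11     # concentrations tested per compound
replicates = 2          # each condition run in duplicate
wells_per_plate = 1536  # assumption: no wells set aside for controls

# Total compound-containing wells and the minimum number of plates needed.
total_wells = compounds * concentrations * replicates
min_plates = math.ceil(total_wells / wells_per_plate)

print(f"at least {total_wells} wells on at least {min_plates} plates")
```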

      We have now tested reserpine on retinal organoids derived from 2 clones each of the LCA1 and LCA2 patients (4 clones in total). As suggested by the reviewers, we quantified the fluorescence intensity of staining for the rod marker rhodopsin in multiple sections from at least two batches of differentiation (Figure 3C and Figure 3—figure supplement 2). Although the results showed variability, as expected, reserpine treatment significantly increased the fluorescence intensity of rhodopsin in retinal organoids differentiated from multiple lines (Figure 3C), further validating the rescue effect of reserpine.

      Reviewer #2 (Public Review):

      In this manuscript, a drug discovery pipeline was developed using a human iPSC derived organoid-based high-throughput screening platform to be used to identify drug candidates for maintaining photoreceptor survival in LCA10 retinopathies. Reserpine proved effective in patient organoids and in mutant mouse retina in vivo to improve photoreceptor survival and outer segment structure. Protein homeostasis was restored after reserpine treatment by increasing p62 levels, decreasing the 20S proteasome, and increasing proteasome activity. The manuscript is clearly written, contains a large amount of valuable and high-quality data and demonstrates that rebalancing proteostasis can stabilize photoreceptor overall homeostasis in the presence of a mutation that causes retinal degeneration.

      The manuscript may lack functional in vivo data on the treatment by reserpine in RD16 mice such as ERG measurements or other functional tests (the authors also refer to it as future direction). Nevertheless, in my view, the study provides a solid and convincing set of data and substantially advances our understanding on the neuroprotective effects of reserpine beyond the scope of the retina and therefore can be expected to have widespread influence on a readership interested in the principles of neuroprotection rebalancing proteostasis.

      We sincerely thank the reviewer for the positive comments and suggestions. This study has taken many years to materialize. We agree and have now performed full-field electroretinography (ERG) on untreated and reserpine-treated rd16 retinas (as stated in response to an earlier comment). The scotopic a-wave was only marginally increased, yet the scotopic b-wave displayed a significantly higher amplitude, suggesting improved rod photoreceptor function (Figure 6D).

      Reviewer #3 (Public Review):

      Chen et al. perform an innovative screen using retinal organoids derived from rd16 mice to identify small molecules to treat CEP290 hypomorphic mutations linked to ciliopathies such as LCA. The authors identify reserpine which promotes photoreceptor development and viability in retinal organoids derived from LCA patient iPSCs and rd16 mouse retinas. The authors finally propose a mechanistic model where reserpine restores proteostasis thereby improving ciliogenesis.

      The authors present a highly effective drug screen that utilizes the benefits of retinal organoids while also accounting for the inherent variability of retinal organoids by performing a screen on 2D cultures derived from the organoids. This is an innovative approach to using retinal organoids in drug screens and is of interest to the greater community. The success of the screen is reflected in the effectiveness of reserpine in the in vivo rd16 mouse retinal model, where it promotes photoreceptor survival. However, there are multiple issues with the LCA patient organoid screen that must be resolved.

      We are grateful to the reviewer for generous comments. We have incorporated the suggestions and performed additional work to resolve the issues, as mentioned earlier in this response as well as below.

      The patient-derived iPSC lines are not sufficiently controlled to support the conclusions stated in the manuscript. The authors rely on single iPSC clones from disease patients to perform experiments, and it is not clear whether karyotyping to validate normal chromosomal integrity was performed. In the case of the RNAseq experiment, one patient clone does not show any differences, calling into question the findings from the other clone. Patient-derived iPSC studies would benefit from the use of multiple independently derived iPSC clones per patient, or from rescuing the LCA10 mutation using CRISPR editing to validate the correlation of the mutation with the differences observed.

      This study could be strengthened by parallel RNAseq studies in the rd16 mouse retina and patient iPSC retinal organoids.

      Thanks for the suggestions. As mentioned earlier in “Essential Revisions” and response to other reviewers, we have performed additional experiments using multiple iPSC clones and from three patients (2 each from LCA1 and LCA2). These iPSC lines have been characterized previously (Shimada et al. 2017). We have now provided more details on iPSC derivation, iPSC maintenance, and differentiation. Karyotypes of all human and mouse iPSC lines were provided in Figure 1—figure supplement 1. Retinal organoids were generated using iPSC lines within 10 passages of test cells.

      The purpose of the RNA-seq data is to provide initial pointers to the signaling pathways modulated by reserpine treatment. The rescue effect of reserpine suggests that these pathways might be implicated in disease pathogenesis. Based on our RNA-seq data, we have validated the dysregulation of the proteostasis pathway in patient-derived retinal organoids and in the rd16 retina in vivo. Further investigations are needed to validate other pathways but are beyond the scope of this manuscript. Although RNA-seq studies have advantages, more detailed molecular and functional assays are needed to validate their findings, and we therefore argue that performing additional RNA-seq on different clones, cell lines, or mouse retina would not provide more solid information.

      According to our quantification of rhodopsin staining intensity (Figure 3C and Figure 3—figure supplement 2), LCA1 organoids are more responsive to reserpine than LCA2 organoids, which is not surprising given the variation in patient responsiveness to drug treatments in previous clinical studies. We note that reserpine is not a transcription factor; the differentially expressed genes upon reserpine treatment are thus secondary effects, and the change in gene expression profiles could vary in time and intensity, which could explain the few differentially expressed genes observed in LCA2. Nevertheless, the mechanisms of action of reserpine that we found in LCA1 could be validated in LCA2 (Figure 5—figure supplement 3), further strengthening our findings.

      The reason why we performed RNA-seq on treated organoids but not treated mice was to identify the signaling pathways modulated by reserpine in a well-controlled manner in order to catch the small changes. Compared to reserpine treatment on organoid cultures, in which the organoids have stable and constant contact with reserpine, intravitreal injection of reserpine into P7 mice is technically challenging and leads to substantial variations. In this case, some small changes might be missed and masked by the variations.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors sought to be able to examine what cellular mechanisms underlie increases in mature blood cell production upon immune challenge. To this end they devised a new in vitro organ culturing system for the lymph gland, the main hematopoietic organ of the fruit fly Drosophila melanogaster; the fly serves as an excellent model for studying fundamental questions in immunology, as it allows live imaging combined with genetic manipulation, and the molecular pathways and cellular functions of its innate immune system are highly conserved with vertebrates.

      The authors provide compelling evidence that the cultured lymph gland shows a similar time scale, dynamics, and capacity for cell division as was observed in vivo, and does not undergo undue oxidative stress in their optimized culture conditions. This technique will prove extremely useful to the large community studying the fly lymph gland, and potentially vertebrate immunologists seeking to expand the models they utilize.

      In these cultured glands, the authors identify progenitors undergoing symmetric cell divisions and provide some evidence that is consistent with, but does not prove, that these two cells maintain their proliferative capacity. They detect equivalent levels in the two equally sized daughter cells of dome-Meso-GFP, a marker for JAK-STAT activity; however, this could be due to an equal inheritance of the protein from the mother, not an equivalent maintenance of a proliferative capacity.

      This is an interesting question. A close look at our movie (Video 4) of the dome-Meso-GFP marker shows the following sequence of events: the marker is nuclear, the mother cell divides and the nuclear envelope breaks down, cell division is completed, and dome-Meso-GFP re-accumulates in the nuclei of the daughter cells. This sequence of events implies that JAK-STAT is still active in the daughter cells. But, as the reviewer points out, there is a possibility of inheritance of the signal from the mother. If one of the cells were to differentiate, we would expect two things to occur: a differentiation marker would turn on in one of the daughter cells, and the signal level of dome-Meso-GFP would likely decrease slowly in one of the cells over time. We failed to mention that we accounted for both of those possibilities in our experiments, such as the one shown in Video 5. First, we included eater-dsRed in the genetic background (see Figure 2 figure legend) in which these experiments were undertaken; if differentiation took place, the dsRed level would go up, an occurrence which we did not observe. Second, long-term tracking of dome-Meso-GFP levels for extended periods of time after completion of cell division did not show divergence or a significant decrease of protein levels in the two daughter cells (Figure 2 - figure supplement 2). In any case, to make readers directly aware of this important caveat raised by the reviewer, we added to the Results section, in lines 225-230, an explanation mentioning the possibility of inheritance of the marker and why we do not think this is the case.

      The authors develop a technique to conduct tracking of progenitor cell size over time in the cultured lymph glands and identify a switch increase in growth after division, as well as two orientations of the divisions, with the main one occurring 90% of the time.

      They show that bacterial infection results in a significant decrease in the division of Blood progenitors and the elimination of the minor orientation of division, but no obvious change in the rate of division.

      By imaging two markers, Dome-GFP for the progenitor state and Eater dsRed for the differentiated one, they examine the trajectories by which differentiation occurs in the wild-type lymph gland. They describe two main categories of fate transitions. In one that they call linear, the blood cells express high levels of the differentiation marker along with the progenitor marker before turning off the progenitor marker. The dynamics of how these progenitor cells get to the state of expressing both the differentiation and progenitor marker at high levels is not described. In the other, which they call sigmoidal, cells express only high levels of the progenitor marker, and the differentiation marker increases after or as the progenitor marker decreases. The authors show that upon infection there is a large increase in the amount of the linear type of differentiation. But how this change in the type of differentiation upon infection explains the increased amount of differentiation is not clear.

      A potential explanation comes from an aspect of their data that the authors don't comment upon. In their live analysis of lymph glands at a distinct time point in the uninfected state (Fig 7M-N), 95% of the cells they analyze traversing the sigmoidal path are in the intermediate step. This would predict that the cells on this path spend a much longer time stuck in this intermediate state before traversing to the final differentiated one, or that only a small fraction of the cells that become sigmoidal intermediate cells progress onwards to full differentiation. But this does not match the trajectories observed in the real-time analysis for uninfected cultured lymph glands (Fig 7A'-D') marker. Perhaps their algorithm discarded traces from the live imaging in which the differentiation marker did not come up quickly and was thus not analyzed in the trajectories.

      If my interpretation of the single time point analysis is true, this would argue that the linear path is actually much faster/more fruitful than the sigmoidal one, and this would explain why a higher level of total progenitor differentiation upon infection is the result of infection inducing more differentiation by the linear path. Otherwise, I don't understand how their data explains that observation.

      We understand the reviewer's concern here and would like to state categorically that we did not use an algorithm to “discard” traces. As the reviewer outlines, there is a large concentration of cells in the Dome-Meso-GFP (low expressing), eater-dsRed (low expressing) state. This is an intermediate state for the sigmoid differentiation trajectory. The reviewer suggests two scenarios to explain this. The first scenario is that this is the slowest (and thus rate-limiting) step in the sigmoid differentiation trajectory. But, as the reviewer also notes, our tracking of individual cell trajectories does not show that cells spend a lot of time in this state. This leaves the second scenario the reviewer outlines, that only a small fraction of the cells that are in the Dome-Meso-GFP (low expressing), eater-dsRed (low expressing) state go on to differentiate (at least in the larval stage). We favor this model because it is consistent with our observations, mainly that manipulating the sigmoid pathway is not a good way to induce the production of mature blood cells following infection, compared to manipulating the linear pathway. As the reviewer correctly points out, the linear pathway provides a powerful way to change the rate of production of mature blood cells: within a few hours of infection, the number of cells found in the intermediate state for this trajectory (Dome-Meso-GFP (high expressing), eater-dsRed (high expressing)) increases 5-6 times. We now mention this specifically in the Discussion in lines 532-539.

    1. Author Response

      Reviewer #1 (Public Review):

      Single-cell sequencing technologies such as 10x, in conjunction with DNA-barcoded multimeric peptide MHCs (pMHCs), have enabled high-throughput pairing of T cell receptor transcripts with antigen specificity. However, the data generated through this method often suffer from relatively high background due to ambient DNA barcodes and TCR transcripts leaking into "productive" GEMs that contain a 10X bead and a T cell decorated with antigen-specific barcoded proteins. Such contaminations can affect data analysis and interpretation and have the potential to lead to spurious results such as an incorrect assessment of antigen-TCR pairs or TCR cross-reactivity. To address this problem, Povelsen and colleagues have described a data-driven algorithm called "Accurate T cell Receptor Antigen Pairing through data-driven filtering of sequencing information from single-cells" (ATRAP) that supplies a set of filtering approaches that significantly reduces background and allows for accurate pairing of T cell clonotypes with cognate pMHC antigens.

      This paper is rigorously conducted and will be useful for the field - there are some areas where further clarifications and comparisons will benefit the reader.

      Strengths:

      1) Povelsen and colleagues have systematically evaluated the extent to which parameters in the experimental metadata can be used to assess the likelihood of a GEM to correctly identify the antigen specificity of the associated T cell clonotype.

      2) Povelsen and colleagues have provided elegant data-driven scoring metrics in the form of concordance score, specificity score, and an optimal ratio of pMHC UMI counts between different pMHCs on a GEM, which allows for easy identification of poor quality data points.

      3) Based on the experimental goals, ATRAP allows for customizable filters that could achieve appropriate data quality while maximizing data retention.

      Weakness:

      1) The authors mention that 100% of the 6,073 "productive" GEMs contained more than one sample hashing barcode, and 65% contained pMHC multiplets. While the rest of the paper elaborates on the steps taken to deal with pMHC multiplets issue, not much is said about the extent of multiplet hashing issue and how was it dealt with when assigning cells to individual donors. How is this accounted for? Even a brief explanation would be beneficial.

      We agree that the issue of multiplet hashing was only very briefly discussed in the manuscript. The reason for this is that although cell hashing multiplets exist for every GEM, it is generally a much simpler issue to solve than pMHC multiplets, because one hashing entry most often has much higher counts compared to the others (see supplementary fig. 3). Moreover, in the experimental design, only one hashing antibody is added to each sample. It is therefore given that only a single hashing signal should be associated with each GEM, i.e. this does not mirror the complex nature of the pMHC data, where cross-reactivity could result in more than one pMHC being a true binder to a given TCR. Given the simplicity associated with the hashing signal, we have here opted for utilizing an existing tool to annotate cell hashing. We have elaborated the description of this in the revised manuscript (line 384).
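
      The intuition that one hashing entry usually dominates the others can be sketched as a simple "dominant count" rule. This is a hypothetical illustration and not the existing tool used in the manuscript: the function name, the hashtag labels, and the `min_ratio` cut-off are all assumptions made for the example, and a dedicated demultiplexing tool would model the background more carefully.

```python
def assign_hashtag(counts, min_ratio=2.0):
    """Return the dominant hashtag for one GEM, or None if no entry clearly dominates.

    counts: dict mapping hashtag label -> UMI count observed in the GEM.
    min_ratio: hypothetical cut-off for how many times larger the top count
               must be than the runner-up for the call to be accepted.
    """
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # no hashing signal at all
    if len(ranked) == 1:
        return ranked[0][0]
    (top_label, top_count), (_, second_count) = ranked[0], ranked[1]
    if second_count == 0 or top_count / second_count >= min_ratio:
        return top_label
    return None  # ambiguous: two hashtags with comparable counts


# One clearly dominated GEM and one ambiguous GEM (made-up counts).
clear_gem = {"HTO_1": 250, "HTO_2": 12, "HTO_3": 5}
ambiguous_gem = {"HTO_1": 40, "HTO_2": 35}
print(assign_hashtag(clear_gem), assign_hashtag(ambiguous_gem))
```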

      2) It would be helpful for the authors to describe how experimental factors such as the quality of the input MHC protein may affect the outputted data (where different proteins may have different degrees of non-specific binding), and to what degree the ATRAP approach is robust to these changes. As an example, the authors mention that RVR/ A03 was present at high UMI counts across all GEMs and RPH/ B07 was consistently detected at low levels. Are these observations the property of the pMHCs or the barcoded dextran reagent? Furthermore, are there differences in the frequency of each of these multimers in the starting staining library which manifests in consistent high vs low read counts for the pMHC barcodes?

      We understand the reviewer's concern. We have extensive experience from staining with large libraries of different pMHCs in a bulk setting (Bentzen et al 2016), where it is part of the routine analyses to include an aliquot of the barcoded pMHC library taken prior to incubation with cells (input sample). From these data, we know that even if pMHCs are present in uneven amounts prior to cell incubation, this unevenness is not translated to the final output. That is, if a given barcode (associated with a specific pMHC) is present at levels up to 2x higher than the remaining barcodes, this does not result in that barcode also being enriched after cell incubation if T cells do not recognize the corresponding pMHC. Conversely, a barcode present at lower levels in the input can still be enriched after incubation with cells. From the same type of data, we also have experience with differences in the background associated with different MHC/HLA molecules, i.e. a generally higher level of background related to a certain MHC irrespective of the peptide bound to it. We agree that this could potentially be a confounding factor influencing our results (as it would influence any other results affected by the different background signal associated with different MHC/HLA molecules). We are currently investigating in other studies, in a broader sense, whether these differences reflect an inherent biological MHC association or are experimental artifacts. In the current work, we have opted not to define pHLA-specific UMI count thresholds, to ensure that any biological relevance remains unmasked while still allowing us to filter the data to identify the most likely true pMHC-specific interaction.

      3) It would be helpful for the authors to further explain how ATRAP handles TCRs that may be present in only one (or a small number) of GEMs, as seen in Figure 7b, and potentially for the large number of relatively small clonotypes observed for the RVR/A03 peptide in Figure 6 (it is difficult to know if the long tail of clonotypes for RVR is in the range of 1 or 10 GEMs based on the scale bar). Beyond that, is there any effect on expected (or observed) clonal expansion on these data analyses, for example, if samples are previously expanded with a peptide antigen ex vivo or not?

      ITRAP removes any GEM that does not meet the criteria of the selected filters. Small clones are only removed if all GEMs in a clone fail to meet the selected filter criteria. As ITRAP is based on combinations of user-defined filters, one can choose to filter away singlet specificities, i.e. a TCR-pMHC pair observed in only a single GEM. However, this might not be relevant in all cases. We believe that it is a strength of the method that it is flexible and adaptable to the needs of individual users. This also allows additional filters to be imposed by the user, for instance to remove clones of fewer than a certain number of GEMs. With respect to figure 6, we agree that it was difficult to estimate the number of clonotypes within a given peptide plateau, and have updated the figure to include a clonotype count on the x-axis. In relation to the effect of clonotype expansion, we would first like to refer to figure 7. Here, panels a) and b) display the observed T cell frequencies towards the individual pMHCs as obtained by the two different experimental approaches: a) conventional fluorescent multimer staining, and b) GEM counts obtained using the single-cell pipeline described here. This analysis demonstrates a very high concordance between the two approaches, reflected by the vast majority of the responses detected by fluorescent multimer staining also being captured in the single-cell screening (recall of 0.95). This result suggests that the sensitivity of the SC approach, in the context of the current pMHC epitope set, is comparable to that of conventional fluorescent multimer staining. With regard to clonotype expansion, we would next like to refer back to figure 3. Even though we have not expanded the clones in vitro, this figure shows how the specificity of a TCR clone can be more confidently assigned when more GEMs are mapped to a given TCR clone. Hence, to identify a single TCR-pMHC match, it could in many cases be valuable to expand a given clone prior to the experiments. However, since the 10x pipeline can only include a limited number of cells, we argue that it is valuable to identify pMHC-TCR pairs on unexpanded/unmanipulated material to include as many different pairs as possible.
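
      The behaviour described above (GEMs are removed one by one, a clone disappears only when none of its GEMs pass, and a clone-size filter can be layered on top by the user) can be sketched as follows. This is a minimal illustration under stated assumptions and not the ITRAP code: the record fields (`pmhc_umi`, `hla_match`), the UMI cut-off of 5, and all function names are hypothetical.

```python
from collections import Counter


def apply_filters(gems, filters):
    """Keep only the GEMs that satisfy every selected filter."""
    return [gem for gem in gems if all(check(gem) for check in filters)]


def drop_small_clones(gems, min_gems):
    """Optional user-imposed filter: drop clonotypes with fewer than min_gems GEMs."""
    sizes = Counter(gem["clonotype"] for gem in gems)
    return [gem for gem in gems if sizes[gem["clonotype"]] >= min_gems]


def enough_umis(gem):
    """Hypothetical filter: pMHC UMI count at or above an illustrative cut-off."""
    return gem["pmhc_umi"] >= 5


def hla_matches(gem):
    """Hypothetical filter: the pMHC's HLA allele matches the donor's HLA type."""
    return gem["hla_match"]


# Made-up GEM records for illustration only.
gems = [
    {"gem": "g1", "clonotype": "c1", "pmhc_umi": 12, "hla_match": True},
    {"gem": "g2", "clonotype": "c1", "pmhc_umi": 2, "hla_match": True},
    {"gem": "g3", "clonotype": "c2", "pmhc_umi": 1, "hla_match": True},
    {"gem": "g4", "clonotype": "c3", "pmhc_umi": 9, "hla_match": False},
    {"gem": "g5", "clonotype": "c4", "pmhc_umi": 7, "hla_match": True},
    {"gem": "g6", "clonotype": "c1", "pmhc_umi": 8, "hla_match": True},
]

kept = apply_filters(gems, [enough_umis, hla_matches])
kept_clones = {gem["clonotype"] for gem in kept}           # c2 and c3 lose all their GEMs
expanded_only = drop_small_clones(kept, min_gems=2)        # additionally removes singlet c4
print(sorted(kept_clones), [gem["gem"] for gem in expanded_only])
```

      In this toy example the singlet clone c4 survives the GEM-level filters and is removed only when the optional clone-size filter is applied, which mirrors the point that filtering away singlet specificities is a user choice.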

      4) The authors mention a second method, ICON, for conducting these types of analyses, and that the approach leads to significantly more data loss. However, given that there could be differences in the quality of the datasets themselves, and given that the ICON dataset is publicly available, it would be helpful for a more explicit cross-comparison to be conducted and presented as a figure in the paper.

      We have conducted such a comparative analysis in a separate manuscript (available at BioRxiv doi.org/10.1101/2023.02.01.526310). The overall conclusion is that both methods allow for effective denoising of the provided data, with an overall advantage in favor of ITRAP. We have extended the discussion in the current manuscript with a brief summary of the main findings from this study.

      Reviewer #2 (Public Review):

      The study by Povlsen, Bentzen et al. describes certain computational pipelines authors used to analyze the results from a single-cell sequencing experiment of pMHC-multimer stained T cells. DNA-barcoded pMHC multimers and single-cell sequencing technologies provide an opportunity for the high-throughput discovery of novel antigen-specific TCRs and profiling antigen-specific T-cell responses to multiple epitopes in parallel from a single sample. The authors' goal was to develop a computational pipeline that eliminates potential noise in TCR-pMHC assignments from single-cell sequencing data. With several reasonable biological assumptions about underlying data (absence of cross-reactivity between these epitopes, same specificity for different T-cells within a clonotype, more similarity for TCRs recognizing the same epitope, HLA-restriction of T cell response) authors identify the optimal strategy and thresholds to filter out artifacts from their data.

      It is not clear if the identified thresholds are optimal for other experiments of this kind, and how the violation of the authors' assumptions (for example, inclusion of several highly similar pMHC-multimers recognized by the same clone of cross-reactive T cells) will impact the algorithm performance and threshold selection by the algorithm. The authors do not discuss several recent papers featuring highly similar experimental techniques and the same data filtering challenges:

      https://www.science.org/doi/10.1126/sciimmunol.abk3070

      https://www.nature.com/articles/s41590-022-01184-4

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9184244/

      As described above, we have investigated the use of ITRAP on the large data set provided by 10X Genomics, and have further compared the result to that obtained by ICON in an independent publication [BioRxiv doi.org/10.1101/2023.02.01.526310]. We have included a brief summary of the findings of that study in the current manuscript. The overall results and conclusions of the two studies align very well. In both cases, UMI count filtering and donor-HLA matching are the main drivers of denoising. However, the identified UMI thresholds were found to differ between the two data sets. As stated above, we believe this to be a strength of the ITRAP framework, since it demonstrates that the tools can be robustly applied to data originating from very different technical and/or biological settings.

      We acknowledge that ITRAP is highly dependent on the data containing a set of “large” clonotypes for which a single pMHC target can be assigned using the statistical approach outlined in the manuscript. This is because the UMI filtering thresholds are defined based on these clonotypes and the associated peptide annotations. Other than this, however, the method does not exclude the identification of cross-reactive TCRs (in contrast to, for instance, ICON). We have expanded the discussion to make this point clearer.

      The papers mentioned by the reviewer are clearly of high interest to us, and we are currently in the process of analyzing these data using the ITRAP framework. However, we believe these analyses are beyond the scope of the current publication, in particular since we have conducted the parallel benchmark study on the 10X Genomics data mentioned above.

      Unfortunately, I was unable to validate the method on other datasets or apply other approaches to the authors' data because neither code nor raw or processed data were available at the moment of the review.

      All data sets and code have been made publicly available at https://services.healthtech.dtu.dk/suppl/immunology/ITRAP

      One of the weaknesses of this study is that the motivation for the experiment and underlying hypothesis is unclear from the manuscript. Why these particular epitopes were selected, why these donors were selected, are any of the donors seropositive for EBV/CMV/influenza is unclear. Without particular research questions, it is hard to evaluate pipeline performance and justify a particular filtering strategy: for some applications, maximum specificity (i.e. no incorrect TCR specificity assignments) is crucial, while for others the main goal is to retain as many cells as possible.

      We understand this concern and have elaborated on our motivation for the experimental design in the text. The overall motivation for this study was to generate TCR-pMHC data complementing what was available in the public domain at the start of the project, with the purpose of generating novel data for training TCR specificity prediction models. This is also the reason why we explicitly “deselected” T cells specific for the 3 negative control peptides, since these are already covered by large amounts of TCR sequences in the public databases.

      We do not know the serostatus of the donors included, but we determined the antigen specificities present in the donors prior to initiating the study (evaluated for T cell recognition against 945 common viral specificities, using barcoded pMHC multimers in a bulk setting). The 945 peptides were selected from prevalent epitopes within IEDB. This means that the T cell specificities of the donors selected for the current study were known a priori. We have updated the motivation for performing the study (lines 122-126).

    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript "Optimal Cancer Evasion in a Dynamic Immune Microenvironment Generates Diverse Post-Escape Tumor Antigenicity Profiles" by George and Levine describes TEAL - a mathematical model for the dynamics of cancer evolution in response to immune recognition. The authors consider a process in which tumor cells from one clone are characterized by a set of neoantigens that may be recognized by the immune system with a certain probability. In response to the recognition, the tumor may adapt to evade immune recognition, by effective removal of recognizable neoantigens. The authors characterize the statistics of this adaptive process, considering, in particular, the evasion probability parameter, and a possibility of an adaptive strategy when this parameter is optimized in each step of the evolution. The dynamics of the latter process are solved with a dynamic programming approach. In the optimal case, the model captures the tradeoff between a cancer population's need for adaptability in hostile immune microenvironments and the cost of such adaptability to that population. Additionally, immune recognition of neoantigens is incorporated. These two factors, antitumor vs pro-tumor IME as quantified by the Beta penalty term, and the level of immune recognition as quantified by the rate q, form the basis of a characterization of tumors as 'hot' or 'cold'.

      I think this framework is a valuable attempt to formally characterize the processes and conditions that result in immunologically hot vs cold tumors. The model and the analytical work are sound and potentially interesting to a major audience. However, certain points require clarification for evaluation of the relevance of the model:

      1) Tumor clonality

      My main concern is about the lack of representation of the evolutionary process in the model and that the heterogeneity of the tumor is just glossed over.

      The single mention of the problem occurs in Section 2, p2: "Our focus is on a clonal population, recognizing that subclonal TAA distributions in this model may be studied by considering independent processes in parallel for each clone."

      I don't think this assumption resolves the impact of tumor heterogeneity on the immune evasion process. Furthermore, I would claim that the process depicted in Fig 1A is very rare and that cancers rarely lose recognizable neoantigens - typically it would be realized via subclonal evolution, with an already present cancer clone without the neoantigens picking up. Similarly, the adaptation of a tumor clone is an evolutionary process - supposedly the subclones that manage to escape recognition via genetic or epigenetic changes are the ones that persist. It is not clear what the authors assume about the heterogeneity of the adapting/adapted population between different generations, n->(n+1). Is the implicit assumption that the n+1 generation is again clonal, i.e. that the fitness advantage of the resulting subclone was such that the remaining clones were eliminated? Or does the model just focuses on the fittest subclone? A discussion on whether these considerations are relevant to the result would clarify the relevance of the result.

      We thank the reviewer for these helpful clarifying points. Empirical evidence in lung cancer exists for genomic changes manifesting as lost neoantigens in treatment-resistant clones (Anagnostou et al. Cancer Discovery 2017), where the lost antigens were also shown to generate functional immune responses. Similar results have been shown for melanoma (Verdegaal et al. Nature 2016), with loss of neoantigens associated with reactivity in TILs. Recent observations (Jaeger et al. Clinical Cancer Research 2020) even show that mutated peptides may be hidden by protein stabilization, in addition to reduced expression patterns. We do, however, wish to clarify that our model implicitly treats antigen loss and the progression of a subpopulation currently adapted to evade immune targeting – either by direct pruning of the fittest subclone or by stochastic emergence and subsequent growth of a new one lacking the targeted antigens – as equivalent.

      Because we studied, for foundational understanding, the case where a single clonal signature was tracked in time, we under-explained the implementation of such a model in more complicated cases. As mentioned previously, the next most complicated scenario involves a heterogeneous population of cancer cells with disjoint neoantigen profiles. In this case, a parallel process can be studied wherein the effects of recognition in one environment are decoupled from the other (relevant to, for example, spatially distinct sub-populations). This description, however, misses the case where such disparate populations evolve to express shared antigens, or the case where there are both clonal and subclonal antigen targets. Here, our model can still be applied in parallel to study distinct clones but requires additional structure. Namely, in this case we would need to incorporate non-trivial coupling between the possible recognition/selection against certain antigens shared across clones. For example, control of a population with clonal antigens {a,b} but with unique subclones having either antigens {w,x} or {y,z} could be considered by studying the process in parallel, and control in the next periods would require recognition/selection against either 1) at least one of {w,x} and at least one of {y,z}, or 2) at least one of {a,b}. In this more general framework, the arrival of new subclones with distinct features from the parent clone in question could also be incorporated and studied across time periods. This strategy of subdividing more complicated evolutionary structures has now been further elaborated on in the Methods section, and we have expounded these points in the discussion (see additions given under Editor Comment 2).
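
      The coupled control condition in the {a,b} / {w,x} / {y,z} example can be made concrete with a minimal sketch. It is not the implementation used in the manuscript. It assumes, purely for illustration, that each antigen is recognized independently with the same per-period probability q; the function names are hypothetical.

```python
from itertools import product

# Antigen sets from the example in the text.
CLONAL = {"a", "b"}        # shared by every cell
SUBCLONE_1 = {"w", "x"}    # unique to the first subclone
SUBCLONE_2 = {"y", "z"}    # unique to the second subclone


def controlled(recognized):
    """Return True if the recognized antigens suffice to control the population.

    Control requires either at least one clonal antigen, or at least one
    antigen from each of the two subclones.
    """
    clonal_hit = bool(recognized & CLONAL)
    both_subclones_hit = bool(recognized & SUBCLONE_1) and bool(recognized & SUBCLONE_2)
    return clonal_hit or both_subclones_hit


def control_probability(q):
    """Probability of control in one period.

    Assumption (not from the manuscript): each antigen is recognized
    independently with probability q. All 2**6 recognition outcomes are
    enumerated and the probabilities of the controlling ones are summed.
    """
    antigens = sorted(CLONAL | SUBCLONE_1 | SUBCLONE_2)
    total = 0.0
    for outcome in product([False, True], repeat=len(antigens)):
        recognized = {ag for ag, hit in zip(antigens, outcome) if hit}
        prob = 1.0
        for hit in outcome:
            prob *= q if hit else (1.0 - q)
        if controlled(recognized):
            total += prob
    return total
```

      Under the same independence assumption the enumeration agrees with the closed form 1 - (1 - A)(1 - A^2), where A = 1 - (1 - q)^2 is the probability of recognizing at least one antigen of a given pair.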

      2) Time scales

      Section 2, p2: "We assume henceforth that the recognition-evasion pair consists of the T cell repertoire of the adaptive immune system and a cancer cell population, recognizable by a minimal collection of s_n TAAs present on the surface of cancer cells in sufficient abundance for recognition to occur over some time interval n.".

      How do the results depend on the duration of interval n? The duration should be long enough to allow for recognition and, up to some limiting duration, proportional to the TAA recognition probability q. However, it should not be so long that the state of the system can change significantly. A clarification on this point is needed.

      We agree with the reviewer that these points should be elaborated upon when discussing the time interval. Very briefly, we opted for a discrete-time model tracking a cancer population under selective immune pressure. In order for 𝒒 to represent the total recognition probability of an immune system against a particular TAA, the time interval 𝚫𝒏 in question is a coarse-grained feature representing the time between the earliest chance that the adaptive immune system may identify a cancer clone and the latest point after which such a recognition event would no longer be able to prevent cancer escape. This time period may vary substantially across cancer subtypes and depends on the cancer per-cell division rate, for example (George, Levine. Can Res 2020). As the reviewer pointed out, in implementing such a model there is an asymmetric risk to considering 𝚫𝒏 too large, as the future state of the system may not be well-reflected by the simple loss and addition of new TAAs. On the other hand, considering small time intervals 𝚫𝒏, while possible, would require the incorporation of additional intermediate states ending in neither cancer elimination nor cancer escape.

      We have clarified the points that the reviewer has brought up by adding them to the discussion section: In this discrete-time evolutionary model, the intertemporal period considered represents the time period between the earliest moment that the adaptive immune system may identify a cancer clone and the latest point after which such a recognition event would no longer be able to prevent cancer escape (George, Levine. Can Res 2020). This effectively gives 𝒒 a probabilistic representation for the total rate of opportunity to recognize a given TAA during cancer progression. Implementing this model in cancer subtype-specific contexts thus requires a consideration of the per-cell division rates, for example.

      Reviewer #3 (Public Review):

      Cancer cell populations co-evolve under the pressure exerted by the recognition of tumor-associated antigens by the adaptive immune system. Here, George and Levine analyze how cancers could dynamically adapt the rate of tumor-associated antigen loss to optimize their probability of escape. This is an interesting hypothesis that if confirmed experimentally could potentially inform treatments. The authors analyze mathematically how such optimally adapting tumors gain and lose tumor-associated antigens over time. By simplifying the complex interplay of immune recognition and tumor evolution in a toy model, the authors are able to study questions of practical interest analytically or through stochastic simulations. They show how different model parameters relating to the tumor microenvironment and immune surveillance lead to different dynamics of tumor immunogenicity, and more immunologically hot or cold tumors.

      Simple models are important because they allow an exhaustive study of dynamical regimes for different parameters, such as has been done elegantly in this study. However, in this quest for simplification, the authors have not considered biological features that are likely to be of importance for understanding the process of cancer immune co-evolution in generality: tumor heterogeneity and immune recognition that only stochastically results in cancer elimination. In this sense, this paper might be seen as the opening act in a series of more sophisticated models, and the authors discuss avenues towards such further developments.

      We share the reviewer’s credence in foundational modeling for comprehensive predictions on available dynamical behavior for the important problem at hand. The reviewer also correctly points out that future model refinement will be needed to further develop the foundational model presented in this work. In an attempt to illustrate one of the more reasonable generalizations, which is to include nontrivial sub-clonal heterogeneity in tumor antigens, we now describe how one would go about enhancing the existing model to address this; the description has been added to the Methods and Discussion sections (see additions given under Editor Comment 2).

    1. Author Response

      Reviewer #1 (Public Review):

      N1-methyladenosine (m1A) is a rather intriguing RNA modification that can affect gene expression and RNA stability etc. The manuscript presented the exploration of RNAs m1A modification in normal and OGD/R-treated neurons and the effects of m1A on diverse RNAs. The authors showed that m1 modification can mediate circRNA/LncRNA-miRNA-mRNA mechanism and 3'UTR methylation of mRNAs can disturb miRNA-mRNA binding.

      The manuscript provides evidence for the following,

      1) The OGD/R can have impacts on various functions of m1A mRNAs and neuron fates.

      2) The m1A methylation of mRNA 3'UTRs disturbs the miRNA-mRNA binding.

      3) The authors identified three possible patterns of m1A modification regulation in neurons.

      The main merit of the manuscript is that the authors identified some critical features and patterns of m1A modification in neurons and OGD/R-treated neurons. Moreover, the authors identified m1A modifications on different RNAs and explored the possible effects of m1A modification on the functions of different RNAs and the overall posttranscriptional regulation mechanism via an integrated approach of omics and bioinformatics. The major weakness of the manuscript is that technical details for many results are missing. Moreover, language inconsistencies can be found throughout the manuscript. My general feeling about the manuscript is that some conclusions are rather superficial and therefore require validation and discussion.

      We appreciate your endorsement and constructive opinion concerning our work. Our study provides a comprehensive exploration of the characteristics of m1A modifications in neurons. According to your suggestions, we have specified the technical details in the revised manuscript and have included our perspectives on some of the conclusions in the Discussion section. In addition, we have corrected language inconsistencies throughout the manuscript. We hope that the revisions made are acceptable and meet your requirements.

      Reviewer #2 (Public Review):

      In this manuscript, investigators explore the m1A modification, an important post-transcriptional regulatory mechanism, in primary normal neuron and OGD/R treated neuron. As far as I know, the regulatory m1A modification remains poorly characterized in neuron. This is an interesting topic in the context of epitranscriptomics. This paper not only provided us with a landscape of m1A modifications in neuron, but also explored the impact of m1A modifications on the biological functions of different RNA (mRNA, lncRNA, circRNA). In addition, the argument that m1A modification affects miRNA binding to other RNAs is of interest to reader, and the authors have performed a dual luciferase validation here to add feasibility to this conclusion.

      Thank you for your careful review of our study, and thank you for your appreciation of our work. The aim of this work was to explore the characteristics of m1A modification in neurons. We believe that incorporating your advice into the revised manuscript has enhanced the quality of our article.

      Reviewer #3 (Public Review):

      Overall, this is an interesting and well performed study that described a comprehensive landscape of m1A modification in primary neuron and investigated the role of m1A in the circRNA/lncRNA‒miRNA-mRNA regulatory network following OGD/R. The focus on the two different complex regulatory networks for differential expression and differential methylation is important and it will be a valuable resource for the research community that focuses on epitranscriptomics and central nerve system diseases. Collectively, the authors present an exciting piece of work that certainly adds to the literature regarding epitranscriptomic features in neuron. While interesting results obtained and the paper is nicely written, I have the following suggestions for minor revisions to improve the paper.

      We are grateful for your many positive comments and recognition of the potential of our work. Your suggestions helped us identify some shortcomings in our manuscript, and incorporating them has added value to our article. Our future research will continue to explore some of the conclusions obtained from this work, and we will continue to contribute our research outcomes to this field. Thank you again for your excellent suggestions!

      1) The authors have explored the role of m1A modification in neuron, but it would have been helpful if the authors described the significance of these findings in depth in some sections (Figure 5 and Figure 6) to enhance the value of the article.

      Thank you for your insightful suggestion. We agree that the significance of these findings should be described in detail. As such, we have added corresponding content to the Results (lines 407-424) and Discussion (lines 532-550) sections, respectively.

      2) The authors should describe in detail the current research state of m1A modification and the significance of this study to the field of epitranscriptomics in the introduction and Discussion section.

      Thank you for your insightful suggestion. There is relatively little knowledge in the m1A modification area, so it is important to summarize the existing knowledge and research progress in a comprehensive and detailed manner. We conducted a comprehensive search of the latest literature and added corresponding content to the Introduction (lines 78-83) and Discussion (lines 505-511 and 532-562) sections as you suggested.

    1. Author Response

      Reviewer 1 (Public Review):

      Protein oligomerization is essential to their in vivo function, and it is generally challenging to determine the distribution of oligomeric states and the corresponding conformational ensembles. By combining coarse-grained molecular dynamics simulations and experimental small-angle X-ray scattering profiles at different protein concentrations, the authors have established a robust approach to self-consistently determine the oligomeric state(s) and the conformational ensemble. The approach has been applied specifically to the speckle-type POZ protein (SPOP) and generated new insights into the conformational ensemble and structural features that determine the ensemble. The model was further tested by the analysis of several relevant mutants as well as models with different types of structural restraints. The results also support the isodesmic self-association model, with KD values comparable to those measured from independent experiments in the literature. The approach is potentially applicable to a broad set of systems.

      We thank the reviewer for taking the time to assess our work.

      Reviewer 2 (Public Review):

      This manuscript applied the SAXS data analysis of protein self-assembly by implementing the simultaneous fitting of intra- and intermolecular motions/conformations against SAXS data at a series of oligomerization states/concentrations. Despite several major assumptions hinted, a diverse pool of conformational and oligomeric candidates was generated from CG simulations, and more importantly, these candidates were fitted into these SAXS data to reach a reasonable agreement, suggesting a degree of convergence (even if the ensemble-fitting could well be at a local minimum). This is considered a technical advance, given the fairly large numbers of both the oligomer fraction phi_i (i=1, ..., N) and the conformational weight w_k (k=1, ..., n), where N is the number of oligomers and n is the number of internal conformational states.

      We thank Prof. Yang for taking the time to assess our work.

      Central is optimizing phi_i and w_k, simultaneously. The former has been illustrated in Fig. 4 and SI-Fig. 7 for the total number of 60mers. The latter relies on an overfitting-preventing strategy, as shown in SI_Fig. 1, where an effective fraction cutoff was used from 0.1 to 1.0, as opposed to the number of conformational states. What are the numbers of conformational states for these oligomers? This should be quantifiable, e.g., defining the conformational differences by chi_2.

      The reviewer is correct that the entropy-based term for preventing overfitting is a key aspect of the method. In contrast to some of the other methods to combine experiments with simulations, our approach does, however, not require us to define individual conformational states. Instead, the weights in the entropy term refer to individual configurations rather than states, and we can thus integrate the SAXS experiments and simulations without, for example, clustering the conformations. Indeed, for most of the collective variables that we have calculated from the ensembles, such as the radii of gyration, end-to-end distances, and MATH-MATH distances, we observe continuous monomodal probability distributions, which suggests that it might be difficult to define a few distinct conformational states. For the MATH-BTB/BACK distance, we observe a trimodal distribution, and these distinct conformational states are shown as overlaid structures in Fig. 4i. Thus, while these “states” change populations during reweighting, this is the result from changing weights of the individual configurations.
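
      The per-configuration weights and the “effective fraction” mentioned by the reviewer can be related with a short sketch. The response does not state the formula, so the definition below, the exponential of the relative entropy between reweighted and initial weights that is common in Bayesian/maximum-entropy reweighting, is an assumption; the function name is hypothetical.

```python
import math


def effective_fraction(weights, prior=None):
    """Effective fraction of configurations retained after reweighting.

    Assumed definition: phi_eff = exp(-sum_k w_k * ln(w_k / w0_k)), where w_k
    are the reweighted weights and w0_k the initial (prior) weights, one per
    configuration. No clustering into conformational states is needed.
    """
    n = len(weights)
    if prior is None:
        prior = [1.0] * n  # uniform prior over configurations
    w_sum = sum(weights)
    p_sum = sum(prior)
    w = [x / w_sum for x in weights]
    w0 = [x / p_sum for x in prior]
    # Configurations with zero weight contribute nothing to the sum.
    s_rel = -sum(wk * math.log(wk / w0k) for wk, w0k in zip(w, w0) if wk > 0.0)
    return math.exp(s_rel)
```

      With this definition, unchanged weights give an effective fraction of 1, and concentrating all weight on one of four equally likely configurations gives 0.25.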

      Reviewer 3 (Public Review):

      Molecular-level interpretations of SAXS data are challenging, especially for oligomeric systems of variable length with intrinsic flexibility and the possibility of multiple association interfaces. In order to make this challenge tractable, a number of assumptions are made here: 1) There is a single pathway by which individual domains associate first into homodimers and then into longer oligomers; 2) the association kinetics is isodesmic, which allows the direct calculation of oligomer distributions based on the given value of a single dissociation constant; 3) the internal dynamics within dimers is restricted essentially to relative domain-domain motions, that are sampled comprehensively via MD simulations. As a result, excellent fits to the SAXS data are obtained and the underlying conformational ensembles are highly plausible. The resulting models are useful to further understand SPOP function, especially in the context of liquid-liquid phase separation.

      We thank the reviewer for taking time to read our work and for their various suggestions.
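
      As an illustration of the isodesmic assumption noted in the review above, the following sketch computes an oligomer distribution from a single dissociation constant. It is the textbook isodesmic model for a generic subunit, not the authors' code: the function name and arguments are hypothetical, and mapping the "subunit" onto a specific SPOP building block is left open.

```python
import math


def isodesmic_mass_fractions(c_total, k_d, max_size):
    """Mass fraction of subunits found in i-mers, for i = 1..max_size.

    Textbook isodesmic model: every association step has the same
    dissociation constant k_d, so the i-mer concentration is
    c_i = k_d * x**i with x = c_1 / k_d, and mass conservation gives
    c_total / k_d = x / (1 - x)**2.
    """
    t = c_total / k_d
    # Root of t*x**2 - (2t + 1)*x + t = 0 with 0 < x < 1, written in a
    # form that avoids cancellation at low concentration.
    x = 2.0 * t / ((2.0 * t + 1.0) + math.sqrt(4.0 * t + 1.0))
    return [i * k_d * x**i / c_total for i in range(1, max_size + 1)]
```

      For example, at a total subunit concentration of twice the dissociation constant, x = 0.5 and a quarter of the subunits are free; the fractions sum to 1 when max_size is taken large enough.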

    1. Author Response

      Reviewer #1 (Public Review):

      This work provides a new general framework for estimating missing data on cervical cancer epidemiology, including sexual behavior, HPV prevalence, and cervical cancer incidence. These data are useful to determine impact projections of cervical cancer prevention. The authors suggest a three-step approach: 1) a clustering method applied on registries with an intermediate level of data availability to cluster cervical cancer incidence based on a Poisson-regression-based CEM algorithm, 2) a classification method applied on registries with a low level of data availability to classify cervical cancer incidence based on a Random Forest, 3) a projection method applied on missing data based on the mean of available data. The authors use India as a case study to implement this new methodology. Results indicate that two patterns of cervical cancer incidence are identified in India (high and low incidence), classifying all Indian states with missing data to a low incidence. From this classification, missing data is approximated using the mean of the available data within each cluster.
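
      The third step summarized above, approximating missing data by the mean of the available data within each cluster, can be sketched as follows. This is a minimal illustration, not the authors' pipeline; the function name and the input layout (dictionaries keyed by geographical unit) are assumptions.

```python
from statistics import mean


def impute_by_cluster_mean(values, clusters):
    """Fill missing values with the mean of observed values in the same cluster.

    values:   dict mapping geographical unit -> observed value, or None if missing
    clusters: dict mapping geographical unit -> cluster label (from the
              clustering or classification step)
    Raises KeyError if a cluster contains no observed value at all.
    """
    observed = {}
    for unit, value in values.items():
        if value is not None:
            observed.setdefault(clusters[unit], []).append(value)
    cluster_means = {label: mean(vals) for label, vals in observed.items()}
    return {
        unit: value if value is not None else cluster_means[clusters[unit]]
        for unit, value in values.items()
    }
```

      A cluster with no observed values cannot be imputed this way, which mirrors the requirement that each type of data be available at least once within each cluster.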

      A strength of this approach is that this methodology can be applied to regions with missing data, although a minimum set of information is needed. This makes it possible to have individual data for each unit in the region.

      One of the weaknesses of this methodology is the need for a minimum set of epidemiological data to enable impact projections. It is true that when epidemiological cervical cancer data is not available, authors mentioned that general indicators (e.g., human development index, geography) can be used but projections will be probably less realistic. As observed with other techniques, countries with fewer resources have less data available and cannot benefit from these types of techniques to have more adequate guidelines.

      Imputation of missing data is always a challenging issue. The technique proposed in this manuscript is an interesting new approach to missing data imputation that could be applied with a minimum set of available data. However, we must focus on obtaining reliable data from each region of the world to help local health authorities implement better preventive measures for the local population.

      We thank the reviewer for the considerate comments and suggestions and have tried to incorporate them as much as possible in the revised manuscript.

      As the reviewer has pointed out, the applicability of the proposed methodology depends on the available data. In our opinion, it is a general challenge for approximating missing data, rather than a weakness particular to our methodology. In fact, we believe that our framework is flexible enough to address missing data in many situations. To clarify this point, we have included the following sentences in the Discussion (lines 363-376, page 18): “It is important to note that, in general, the applicability of the proposed framework depends on the actual amount of data available. However, in our opinion, it is a general challenge for approximating missing data, rather than a weakness particular to our methodology. By allowing possible adaptations, we believe that our framework is sufficiently flexible to address missing data in many situations.”

      Finally, we fully agree with the reviewer that we should continue our effort to collect more data for countries where these are not available. The proposed framework should be considered as a solution to the situation in which collection of additional data is not or not yet possible.

      Reviewer #2 (Public Review):

      The burden of cervical cancer worldwide is well recognized. While prevention strategies, including vaccination against human papillomavirus (HPV), cervical cancer screening, and pre-cancer treatment, can reduce the burden of cervical cancer, access to these measures is still limited, especially in low- and middle-income countries. Since the impact of prevention strategies is heavily dependent on the disease's burden on a particular population, we need to know the latter to assess the impact of these context-specific prevention strategies.

      However, epidemiological data on cervical cancer are not always available for all geographical areas. This paper uses India as a case study to propose a framework called "Footprinting" to comprehensively evaluate the burden of cervical cancer. The authors applied a three-step analytical strategy to impute cervical cancer epidemiological data in states where this information was unavailable using data from cervical cancer incidence, HPV prevalence, and sexual behaviour from other regions. The findings suggest a high and low incidence of cervical cancer incidence in different parts of India; all Indian states with missing data were classified as low incidence.

      The proposed analytical strategy presents an important solution for imputing data from geographic areas of a country where data are missing.

      We thank the reviewer for the considerate comments and suggestions and have tried to incorporate them as much as possible in the revised manuscript.

      One conceptual limitation of this work is the lack of explanation or evidence that sexual behaviour can be used to approximate cervical cancer and/or HPV rates.

      A similar comment was raised by Reviewer #1. It is well established that sexual contact is the only transmission route of carcinogenic HPV infection, and hence necessary for the occurrence of cervical cancer [ref #26 Vaccerella 2006, Muñoz 1992 Int J Cancer 52, 743-749].

      We have included sexual behaviour variables that have previously been shown to be risk factors of HPV infection and cervical cancer risk, e.g., age of sexual debut and number of sexual partners [ref #26 Vaccerella 2006, ref #27 Schulte-Frohlinde 2021]. Furthermore, we used variables that are commonly available so that the analyses can be easily applied to other settings.

      As far as we know, there is no established set of sexual behaviour variables for predicting the patterns of HPV prevalence and cervical cancer incidence. The good prediction performance in the India case study shows that using the selected set is sufficient. As sexual behaviour variables are highly correlated, including more variables might even risk overfitting.

      To clarify these points we have included the following paragraph in the Discussion (lines 319-325, page 16): “In our analysis of classifying clusters of cervical cancer incidence, we only included some of the sexual behaviour variables available in the NACO report [15]. We selected variables that were previously shown to be risk factors of HPV infection and cervical cancer risk and that are commonly available so that the analyses can be easily applied to other settings, e.g., age of sexual debut and number of sexual partners [26, 27]. As far as we know, there is no established set of sexual behaviour variables for predicting the patterns of HPV prevalence and cervical cancer incidence. The good prediction performance shows that using the selected set is sufficient. As sexual behaviour variables are highly correlated, including more variables might even risk overfitting.”

      Also, full information on the three main indicators is only available in two states. This is used to impute the values for the other states.

      Indeed, HPV prevalence data were only available for two states. While we acknowledge that this affects the certainty in the imputed HPV prevalence, we considered the imputed results to be satisfactory based on the good accordance with the cervical cancer incidence data we found in the validation step (lines 286-23, page 14). We verified that the ratio of HPV prevalence between the high- and low-incidence clusters (1.7-fold) was very similar to the ratio of age-standardized cervical cancer incidence (1.9-fold).

      Furthermore, we note that previous modelling works on India relied on even less data, namely one source of HPV prevalence and cervical cancer incidence data [ref #29 Brisson 2020, Diaz 2008 Br J Cancer].

      Moreover, the available data used in this study also present some limitations; for example, cervical cancer incidence data were from 2012 to 2016, while sex behaviour data were from 2006. This large gap is likely to have a significant cohort effect, especially given changes in sexual norms in Western countries over the last few decades, which may have gradually influenced other countries, especially in this age of the internet and social media.

      In our opinion, for the purpose of modelling the natural history of cervical cancer, it is not necessarily more appropriate to use the most recent sexual behaviour data. Arguably, as sexual behaviour is the “exposure” for the “outcome” cervical cancer, calibration of an HPV transmission and cervical cancer model is best done with data on sexual behaviour and cervical cancer from the same cohorts, and hence with sexual behaviour data from an earlier period than the cervical cancer data.

      In addition, if changes of sexual behaviour occur across the country, it should not affect the clustering much.

      Finally, due to delay in reporting, cervical cancer incidence from the period 2012-2016 is the most recent edition at the moment of writing. Regarding sexual behaviour data, there is at the moment no later edition of the NACO report published after that of year 2006.

      Finally, it would be interesting to validate this methodology to confirm its utility.

      We agree that it would be very interesting to validate this proposed methodology in other regions. Unfortunately, it was beyond the scope of this work. Currently, we are working on a project in which we try to apply footprinting to a collection of low- and middle-income countries.

      The proposed framework's strength is difficult to evaluate because the steps and justification for the model variables were not clearly presented, nor were the models validated.

      We acknowledge that the framework could be more clearly presented and have added additional explanation in the following places to do so:

      • Concerning the framework steps, in Method (144-163, pages 7-8): “For convenience of explanation, we assumed earlier that data availability occurs hierarchically. However, the framework can also be applied with less stringent data requirements. First, the source of Footprint data need not necessarily cover all geographical units. It is still possible to train a classifier in the classification step with Footprint data available for only a part of the clustered geographical units. Second, if none of the key cervical cancer epidemiological data (sexual behavior, HPV prevalence, and cervical cancer incidence data) have large enough coverage to serve as Footprint data, alternative indicators of similarity, such as human development index and geographical distance, could also be used as substitutes. However, the resulting classification performance might be suboptimal, as we expect these indicators to correlate less well with cervical cancer risk. Third, for the projection step, data of cervical cancer incidence, sexual behavior, and HPV prevalence needed for calibration of projection models need not necessarily belong to the same geographical unit. Calibration can be performed as long as the three types of data are available within each cluster.

      With these less stringent data requirements, the proposed framework should be sufficiently flexible to be applied to many situations. However, one should still be cautious in applying the framework when there are little data. This means that, in some cases, we might need to exclude from the analysis some geographical units with too little data or redefine bigger geographical units if the data are not granular enough. Furthermore, we should assess the goodness-of-fit of the obtained clustering, performance of classification, correlation of data within different clusters, and calibration fits to ensure the validity of the final impact projections.”

      • Concerning selection of model variables (lines 319-325, page 16): “In our analysis of classifying clusters of cervical cancer incidence, we only included some of the sexual behaviour variables available in the NACO report [15]. We selected variables that were previously shown to be risk factors of HPV infection and cervical cancer risk and that are commonly available (e.g., age of sexual debut and number of sexual partners) so that the analyses can be easily applied to other settings [26, 27]. In the India case study, the good classification performance shows that using the selected set is sufficient. As sexual behaviour variables are highly correlated, including more variables might even risk overfitting.”

      Based on the authors' interpretation of the framework findings, this framework may help extrapolate data from one country to another. I'm curious as to whether this framework could be applied across states and countries.

      We thank the reviewer for this comment. Currently, we are working on a multi-year project in which we try to apply the framework to all low- and middle-income countries.

    1. Author Response:

      eLife assessment

      This work is an attempt to establish conditions that accurately and efficiently mimic a drought response in Arabidopsis grown on defined agar-solidified media - an admirable goal as a reliable experimental system is key to conducting successful low water potential experiments and would enable high-throughput genetic screening (and GWAS) to assess the impacts of environmental perturbations on various genetic backgrounds. The authors compare transcriptome patterns of plants subjected to water limitation imposed using different experimental systems. The work is valuable in that it lays out the challenges of such an endeavor and points out shortcomings of previous attempts. However, a lack of water relations measurements, incomplete experimental design, and lack of critical evaluation of these methods in light of previous results render the proposed new methodology inadequate.

      We thank eLife for the initial assessment and comments to our work. In our revised manuscript we plan to address the main concerns raised by reviewers. Specifically, we plan to perform water relations measurements for all our treatment assays, as well as explore the separate effects agar hardening and nutrient concentration have in our low-water agar assay. We will also provide a more in depth critical review of our results compared to previously published results.

      Reviewer #1 (Public Review):

      High-throughput genetic screening is a powerful approach to elucidate genes and gene networks involved in a variety of biological events. Such screens are well established in single-celled organisms (i.e. CRISPR-based K/O in tissue culture or unicellular organisms; screens of natural variants in response to drugs). It is desirable to extend such methodology, for example to Arabidopsis where more than 1000 ecotypes from around the Northern hemisphere are available for study. These ecotypes may be locally adapted and are fully sequenced, so the system is set up for powerful exploration of GxE. But to do so, establishing consistent "in vitro" conditions that mimic ecologically relevant conditions like drought is essential. 

      The authors note that previous attempts to mimic drought response have shortcomings, many of which are revealed by 'omics type analysis. For example, three treatments thought to induce osmotic stress (the addition of PEG, mannitol, or NaCl) fail to elicit a transcriptional response that is comparable to that of bona fide drought. As an alternative, the authors suggest using a low water-agar assay, which, in the things they measure, does a better job of mimicking osmotic stress responses. The major issue with this assay is, however, that it introduces another set of problems; for example, changing agar concentration can lead to mechanical effects, as illustrated nicely in the work of Olivier Hamant's group.

      We thank the reviewer for their comments. We hypothesize that our low-water agar assay is able to replicate drought gene expression patterns through a combination of hardened agar and higher nutrient concentration. However, we did not explore the separate effects each of these factors may play in eliciting such responses. Thus, in our revised manuscript, we will explore what role the mechanical effects of changing agar concentration have on root gene expression. However, we suspect that the mechanical effects introduced by hard agar do not introduce another issue per se, but in fact may help with replicating the transcriptional effects seen under drought.

      Reviewer #2 (Public Review):

      […] The authors have not always considered literature that would be relevant to their topic. For example, there is a number of studies that have reported (and deposited in the public database) transcriptome analysis of plants on PEG-plates or plants exposed to well-controlled, moderate severity soil drying assays (for the latter, check the paper of Des Marais et al. and others, for the former, Verslues and colleagues have published a series of studies using PEG-agar plates). They also overlook studies that have recorded growth responses of wild type and a range of mutants on properly prepared PEG plates and found that those results agree well with results when plants are exposed to a controlled, partial soil drying to impose a similar low water potential stress. In short, the authors need to make such comparisons to other data and think more about what may be wrong with their own experimental designs before making any sweeping conclusions about what is suitable or not suitable for imposing low water potential stress. 

      To solve the problem of using these other systems to impose low water potential stress, the authors propose the seemingly logical (but overly simplistic) idea of adding less water to the same mix of nutrients and agar. Because the increased agar concentration does not substantially influence water potential (the agar polymerizes and thus is not osmotically active), what they are essentially doing is using a concentrated solution of macronutrients in the growth media to impose stress. This is a rediscovery of an old proposal that concentrated macronutrient solutions could be used to study the osmotic component of salt stress (see older papers of Rana Munns). There are also effects of using very hard agar that is of unclear relationship to actual drought stress and low water potential. Thus, I see no reason to think that this would be a better method to impose low water potential. 

      We thank the reviewer for their comments. In our revised manuscript, we will address points regarding plant and soil water potential; similar concerns were also raised by Reviewer 1 and 3. We note that we report vermiculite water content in Supplementary Table 4.

      We would like to clarify that both the PEG media and overlay solution were buffered - we did not include this within the written description in the methods, but will do in our revised manuscript.

      We agree with the reviewer’s concern that it may be problematic to compare the transcriptomic profiles of seedling and mature plants. In light of this, we plan to explore what effects our treatment media has on mature rosettes.

      We note that we do not claim that PEG is unable to produce low-water potential responses similar to partial soil drying. Indeed, we indicate that it is a good technique for eliciting phenotypes comparable to drought at the physiological level (line 48). Rather, we claim that PEG is unable to produce gene expression responses that are sufficiently similar to partial vermiculite drying.

      Reviewer #3 (Public Review):

      […] The authors observed that gene expression responses of roots in their 'low-water agar' assay resembled more closely the water deficit in pots compared to the PEG, mannitol, and salt treatments (all at the highest dose). In particular, 28 % of PEG led to the down-regulation of many genes that were up-regulated under drought in pots. Through GO term analysis, it was pointed out that this may be due to the negative effect of PEG on oxygen solubility since downregulated genes were over-represented in oxygen-related categories. The data also show that the treatment with abscisic acid on plates was very good at simulating drought in roots. Gene expression changes in shoots showed generally a high concordance between all treatments at the highest dose and water deficit in pots, with mannitol being the closest match. This is surprising, since plants grow in plates under non-transpiring conditions, while a mismatch between water loss by transpiration and water supply via the roots leads to drought symptoms such as wilting in pot and field-grown plants. The authors concluded that their 'low-water agar' assay provides a better alternative to simulate drought on plates. 

      Strengths: 

      The development of a more robust assay to simulate drought on plates to allow for high-throughput screening is certainly an important goal since many phenotypes that are discovered on plates cannot be recapitulated on the soil. Adding less water to the media mix and thereby increasing agar strength and nutrient concentration appears to be a good approach since nutrients are also concentrated in soils during water deficit, as pointed out by the authors. To my knowledge, this approach has not specifically been used to simulate drought on plates previously. Comparing their new 'low-water agar' assay to popular treatments with PEG, mannitol, salt, and abscisic acid, as well as plants grown in pots on vermiculite led to a comprehensive overview of how these treatments affect gene expression changes that surpass previous studies. It is promising that the impact of 'low-water agar' on the shoot size of 20 diverse Arabidopsis accessions shows some association with plant fitness under drought in the field. Their methodology could be powerful in identifying a better substitute for plate-based high-throughput drought assays that have an emphasis on gene expression changes. 

      Weaknesses: 

      While the authors use a good methodological framework to compare the different drought treatments, gene expression changes were only compared between the highest dose of each stress assay (Fig. 2B, 3B). From Fig. 1F it appears that gene expression changes depend significantly on the level of stress that is imposed. Therefore, their conclusion that the 'low-water agar' assay is better at simulating drought is only valid when comparing the highest dose of each treatment and only for gene expression changes in roots. Considering how comparable different levels of stress were in this study leads to another weakness. The authors correctly point out that PEG, mannitol, and salt are used due to their ability to lower the water potential through an increase in osmotic strength (L. 45/46). In soils, water deficit leads to lower water potential, due to the concentration of nutrients (as pointed out in L. 171), as well as higher adhesion forces of water molecules to soil particles and a decline in soil hydraulic conductivity for water, which causes an imbalance between supply and demand (see Juenger and Verslues, The Plant Cell 2022 for a recent review). While the authors selected three different doses for each treatment that are commonly used in the literature, these are not necessarily comparable on a physiological level. For example, 200 mM mannitol has an approximate osmotic potential of around -5 bar (Michel et al. Plant Physiol. 1983) whereas 28 % PEG has an osmotic potential closer to -10 bar (Michel et al. Plant Physiol. 1973). It also remains unclear how the increase in agar concentration versus the increase in nutrient concentration in the 'low-water agar' affect water potentials. 
      For these reasons it cannot be known whether a better match of the 'low-water agar' at the 28% dose to water deficit in pots for roots in comparison to the other treatments is due to a good match in stress levels with the 'low-water agar' or adverse side-effects of PEG, mannitol, or salt on gene regulation. Lastly, since only two biological replicates for RNA sequencing were collected per treatment, it is not possible to know how much variance exists and if this variance is greater than the treatments themselves. 
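      As a rough check on the mannitol figure quoted above, the van 't Hoff relation for a dilute, non-ionizing solute can be sketched as follows. The temperature of 25 °C and the function name are assumptions made for illustration; the relation does not hold for PEG, whose solutions are strongly non-ideal and are usually described with empirical equations instead.

```python
# Van 't Hoff approximation: psi = -i * C * R * T
# Valid only for dilute solutions of small solutes such as mannitol.
R_L_BAR = 0.083145  # gas constant in L*bar/(mol*K)


def osmotic_potential_bar(molar_conc, temp_c=25.0, vant_hoff_i=1.0):
    """Approximate osmotic potential in bar (negative by convention).

    molar_conc  -- solute concentration in mol/L
    temp_c      -- temperature in degrees Celsius (assumed, not from the study)
    vant_hoff_i -- van 't Hoff factor; 1 for a non-ionizing solute
    """
    temp_k = temp_c + 273.15
    return -vant_hoff_i * molar_conc * R_L_BAR * temp_k


mannitol_200mm = osmotic_potential_bar(0.200)
print(round(mannitol_200mm, 2))  # -4.96
```

      Under these assumptions, 200 mM mannitol comes out close to the -5 bar cited by the reviewer.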

      We thank the reviewer for their comments. In our statistical analyses, we found that dose-responsive genes (as fit by a linear model) were very similar to those genes found differentially expressed at the highest dose. Thus, for clarity, we decided to simply present the genes differentially expressed at the highest dose. We see now that this might have been an oversimplification. In our revised manuscript, we will present genes that are dose responsive across the range of treatment doses, thus providing more evidence that lower doses of low-water agar are also capable of simulating drought (as is suggested by overlap analysis of Figure 2A).
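      For readers unfamiliar with the idea of a dose-responsive gene, the following minimal sketch fits an ordinary least-squares slope of expression against dose for a single gene. It is an illustration only, not the pipeline used in the study; the dose levels, expression values, and function name are all invented for the example.

```python
def dose_slope(doses, expression):
    """Ordinary least-squares slope of expression regressed on dose."""
    n = len(doses)
    mean_d = sum(doses) / n
    mean_e = sum(expression) / n
    sxx = sum((d - mean_d) ** 2 for d in doses)
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in zip(doses, expression))
    return sxy / sxx


# Toy data: four dose levels, two replicates each (values are made up).
doses = [0, 0, 1, 1, 2, 2, 3, 3]
gene_up = [1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0]    # rises with dose
gene_flat = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]  # no dose response

slope_up = dose_slope(doses, gene_up)
slope_flat = dose_slope(doses, gene_flat)
print(slope_up, slope_flat)  # 2.0 0.0
```

      A real analysis would also test whether each slope differs significantly from zero and correct for multiple testing across genes.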

      Additionally, we will also explore the osmotic potential of each of our different assays to provide a better benchmark of how comparable each of our treatments are (as similarly requested by Reviewer 1 and 2). Lastly, to address concerns regarding the size of variance in gene expression, we will sequence a 3rd replicate of RNA.

    1. Author Response

      Reviewer #2 (Public Review):

      1) Although the images and videos were of great quality, the results derived from them provided little new knowledge and few conceptual insights into male reproductive tract biology and basically confirmed what has been published using traditional methods. For example, the high intensity of the vascular network in the initial segment was previously reported by Abe in 1984 and Suzuki in 1982; the pattern of the major lymphatic vessel and drainage was beautifully depicted by Perez-Clavier, 1982.

      We thank the reviewer for his/her appreciative comments regarding the quality of the images/videos we provide in this study. We do not fully agree with his/her assessment of the lack of novelty. Our work confirms earlier reports that are now dated (1980s), which in itself is worth mentioning for the interested community, especially when the confirmation uses the most advanced technologies available today. We have never said that nothing was done in the past, and we have acknowledged all past contributors (including those mentioned by the reviewer) by pointing out the limitations of the technical tools that were available at the time. In addition, our current work provides a more comprehensive and global view by extending our approach to the entire mouse epididymis, whereas previous work was much more limited.

      2) The authors were very cautious when interpreting the results of marker immunostaining; however, these markers were not specific for a definite cell type. For example, as the authors stated, VEGFR3 marks both lymphatic vessels and fenestrated blood vessels. How could the authors claim the VEGFR3+ network was lymphatic? The authors claimed that they used three markers for the lymphatic vessel. But staining results of the networks were very different. How could the authors make conclusions about the network of lymphatic vessels in the epididymis?

      We broadly agree with the reviewer and have made it clear that one cannot be 100% sure that all the VEGFR3+ structures we present are lymphatic. However, in total, we used 4 documented lymphatic markers (not 3 as mentioned by the reviewer): VEGFR3, LYVE1, PROX1 and PDPN. Three of them give very similar profiles, while only PDPN shows some differences. We are currently studying in more detail the expression of PDPN in the mouse epididymis because we speculate that this marker may target a population of pluripotent cells in this tissue. Therefore, with the 3 similar profiles and with the subtraction of PLVAP+ structures, we are pretty confident that what we show corresponds to the different lymphatic structures.

      3) To understand the vascular network development in the epididymis, would the authors please look at the fetal stage when the vascular network is established in the first place? Wolffian duct tissues are much smaller and thinner and would be amenable for 3D imaging probably even without clearing.

      We generally agree with the reviewer that this could be an interesting addition. However, it represents a significant amount of additional work. Organ clearing will certainly be required because it is unlikely that the Wolffian duct will be sufficiently transparent to allow lightsheet microscopy. In the literature, the study of the Wolffian duct relies primarily on whole mounts, inclusions, and cryosections. Besides the fact that this represents a lot of extra work, we are not totally convinced that this would be of much use. A key reason is that the epididymis is an organ that differentiates completely after birth (Robaire and Hinton, 2015). It is reported that differentiation of mouse caput segment 1 occurs around 19DPN (Xu et al., 2016) and is intimately related to the development of the vasculature (Lebarr et al., 1986). Regarding the lymphatic network, Svingen et al. (2012) report that lymphangiogenesis in the mouse testis and epididymis is initiated late in gestation after 15DPC. Videos showing the external lymphatic vessels of the testis and epididymis at 17.5DPC can be seen at https://doi.org/10.1371/journal.pone.0052620.s002. The authors indicate that lymphangiogenesis occurs via sprouting from the adjacent mesonephros. We hypothesize that the more internal lymphatics evolve between birth and 10DPN, which corresponds to the time when we observed LEPC Lyve1pos cells.

      4) Immunofluorescence staining of VEGF factors was not convincing. As a secreted factor, VEGF will be secreted out of the cells, would it be detected more in the interstitium? I am always skeptical about the results of immunostaining secreted growth factors. Would it be possible to perform in situ or RNAscope to confirm the spatial expression pattern of VEGFs?

      Well, active VEGF factors result from alternative mRNA splicing events and posttranslational proteolytic cleavage. Therefore, in our opinion, the study of VEGF mRNA by in situ hybridization or RNAscope analysis will not be very informative about the actual presence of active forms of VEGF in the epididymis. If necessary, we can provide as supplementary material immunohistochemistry data showing the presence of VEGF-A in the epididymal principal cells. Our major objective with these data was to show that VEGF factors and their respective receptors were present in the epididymis. Nevertheless, in an attempt to convince the reviewer, we provide as accompanying data to this rebuttal letter new sets of figures (Figures VEGF-A - response editor & VEGF-C / VEGF-D - response editor) that we believe can improve the perception of our data. If the editorial office feels it is necessary, these figures could be added to the supplementary figure set (as Figure 6-figure supplement 1 and Figure 6-figure supplement 2). For VEGF-A, the data already exist in the literature, as we have indicated (Korpelainen, 1998). In fine, our goal was not to show which cell types of the epididymal epithelium produce VEGFs but rather that VEGF factors and their receptors were there to support angiogenic or lymphangiogenic activity in the tissue. In addition, we hypothesize that because septa have been reported to constitute barriers between segments restricting passive diffusion of molecules (Turner et al., 2003; Stammler et al., 2015), the VEGF factors are expected to be produced locally.

      Figure VEGF-A - response editor: Immunofluorescence of the angiogenic ligand VEGF-A in the epididymis. Figure 6 shows that this ligand is mainly found in the caput and more precisely in S1. It is very strongly expressed in the peritubular microvascularization of the SI, which expresses the VEGFR3:YFP transgene, whereas it is less expressed by intertubular blood vessels (asterisk). This seems to indicate that the peritubular vessels are chiefly responsible for the angiogenic activity measured in our study. Furthermore, it is expressed by the epithelium as secretory vesicles (IS, and S3 and enlargement), which is in agreement with in situ hybridization work performed by Korpelainen et al. (J. Cell Biol., 1998). The enlargement shown in S3_Z shows the sagittal plane of the tubule, where one can distinguish VEGFR3:YFP-positive cells that are also strongly VEGF-A positive, indicating that the same cells of the epithelium express both the receptor and the ligand. Here the transgene is detected directly, without the use of an anti-GFP antibody, which would enhance the signal.

      Figure VEGF-C / VEGF-D - response editor: Immunofluorescence of VEGF-C and VEGF-D lymphangiogenic ligands in the epididymis. This figure shows that these ligands are mainly found in the interstitial tissue throughout the organ with a higher proportion in the caudal part. This expression may be largely driven by fibroblasts, which are widely represented in the interstitium, or by endothelial cells, since these two ligands are expressed by these cell types. However, as shown in the figures and in the enlargement of panel A, VEGF-C is also produced by epithelial cells within what may appear as secretory vesicles. In contrast, for VEGF-D, we observe only a few weakly positive epithelial cells (panel B). These ligands are also detected in the lumen of epididymal tubules (visible for VEGF-C, Panel A S2). This presence may be explained by lumicrine transfer from the testis, in addition to secretion from epithelial cells. Here the transgene is detected directly, without the use of an anti-GFP antibody, which would enhance the signal.

      5) The study is descriptive and does not provide functional and mechanistic insights. Maybe, the combination of 3D imaging with lineage tracing of endothelium cells or ligation study (removal/ligation of the certain vessel) would help better understand how the vascular network is established and their functional significance.

      The technical approaches suggested by the reviewer could certainly improve our understanding of the rather complex epididymal vascular network. Taken together, they represent the body of a comprehensive follow-up study that is worth undertaking.

      6) Immune response is among many physiological processes in which vascular networks play significant roles. Discussion would be needed in other physiological processes, such as tissue metabolism and stem/progenitor cell niche microenvironment.

      We agree with the reviewer that the mammalian vasculature is involved in other physiological processes beyond immune/inflammatory responses. We have deliberately chosen to focus our discussion on the inflammatory and immune context of the epididymis, as we believe this is the most relevant aspect. It is also in full agreement with the research that our team has been conducting for 15 years to try to understand the complex orchestration of tolerance versus immune surveillance in this territory. This is a finely tuned process that, if properly understood, can help to understand and appropriately treat clinical situations of infertility and/or urological problems. As our discussion section is already quite long, we feel that it was not justified to extend it further on other aspects. However, in response to the reviewer's suggestion, we now mention at the end of the first paragraph of the discussion that the epididymal vascular network is likely to serve different processes in this tissue (page 9, lines 299 to 303).

      7) How could the author determine the Cd-A labeled vessel in Fig 1 was an artery, not a vein? This leads to another critical question. Would it be possible to stain with artery and vein markers to help illustrate the blood flow directions of the vessel?

      The reviewer is right on the fact that we arbitrarily called the Cd-A vessel in Figure 1 an artery. Cd-A is not an acronym we use anymore. What we have done is to use the acronym SEA (superior epididymal artery) to indicate what we firmly believe to be an artery, as also suggested by previous literature (e.g., Suzuki, 1982; Abe et al, 1982) in which this same structure has been consistently referred to as an artery. For other blood vessels, we now have used the acronym "Cd-BV" because we do not know whether we are dealing with a vein or an artery as rightfully pointed out by the reviewer. This is clearly stated in the legend of Figure 1.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript reassesses the strength of evidence for rapid human germline mutation spectrum evolution, using high coverage whole genome sequencing data and paying particular attention to the potential impact of confounders like biased gene conversion. The authors also refute some recently published arguments that historical changes in the age of reproduction might explain the existence of such mutation spectrum changes. My overall impression is that the paper presents a useful new angle for studying mutation spectrum evolution, and the analysis is nicely suited to addressing whether a particular model such as the parental age model can explain a set of observed polymorphism data. My main criticism is that the paper overstates certain weaknesses of previously published papers on mutation spectrum evolution as well as the generation time hypothesis; correcting these oversimplifications would more accurately capture what the paper's new analyses add to the state of knowledge in these areas.

      As part of the motivation for the current study, the introduction states in lines 97-99 that "it thus remains unclear if the numerous observed [mutation spectrum] differences across human populations stem from rapid evolution of the mutation process itself, other evolutionary processes, or technical factors." This seems to overstate the uncertainty that existed prior to this study, given that Speidel, et al. 2021 found elevated TCC>TTC fractions in ancient genomes from a specific ancient European population, which seems like pretty airtight evidence that this historical mutation rate increase really happened. In addition, earlier papers (Harris 2015, Mathieson & Reich 2016, Harris & Pritchard 2017) already presented analyses rejecting the hypothesis that biased gene conversion or genetic drift could explain the reported patterns-in fact, the Mathieson & Reich paper reports one mutation spectrum difference between populations that they conclude is an artifact caused by the Native American population bottleneck, but they conclude that other mutation spectrum differences appear more robust.

      We completely agree with the reviewer that there has been compelling evidence from multiple independent groups supporting transient elevation of TCC>TTC mutation rate in Europeans. Beyond the TCC signal, however, the mechanisms underlying the observed differences in mutation spectrum across populations remain unclear. In particular, several biological and technical factors impact the mutation spectrum and none of the previous studies have investigated their effects, independently or altogether. Thus, it remains unclear if the mutation rate is evolving rapidly across populations, or if one or more factors (like biased gene conversion) differ across groups or over evolutionary time. Our analysis framework attempts to control these effects together to more reliably investigate the effects of various factors and examine when and how often there has been evolution of mutation rate over the course of human evolution.

      As the authors acknowledge in the discussion of their own results, biased gene conversion and non-equilibrium demography are difficult confounders to deal with, and neither previous papers nor the current paper are able to do this in a way that is 100% foolproof. The current manuscript makes a valuable contribution by presenting new ways of dealing with these issues, particularly since previous papers' work on this topic was often confined to supplementary material, but it seems appropriate to acknowledge that earlier papers discussed the potential impacts of biased gene conversion and demographic complexity and presented their own analyses arguing that these phenomena were poor explanations for the existence of mutation spectrum differences between populations.

      For the most part, I found the paper's introduction to be a useful summary of previous work, but there are a few additional places where the limitations of previous work could be described more clearly. I'd suggest noting that the data artifacts discovered by Anderson-Trocmé, et al. were restricted to a few old samples and that the large differences the current manuscript focuses on were never implicated as potential cell line artifacts. In addition, when the authors mention that their new approach includes "minimiz[ing] confounding effects of selection by removing constrained regions and known targets of selection" (lines 106-107), they should note that earlier papers like Harris & Pritchard 2017 also excluded conserved regions and exons.

      We agree with the reviewer that some of the previous work also attempted to account for the contributions of selection or other factors in post hoc ways; we now acknowledge this in the Results section more explicitly. However, we note that our contribution is in introducing a framework to account for these effects a priori and then assess if there are differences in mutation spectrum across populations and over the course of human evolution. In particular, an innovation of our framework is to better control for the effect of gBGC, which has not been done in previous studies.

      One innovative aspect of the current paper's approach is the use of allele ages inferred by Relate, which certainly has advantages over using allele frequencies as a proxy for allele age. Though the authors of Relate previously used this approach to study mutation spectrum evolution, they did not perform such a thorough investigation of ancient alleles and collapsed mutation type ratios. I like the authors' approach of building uncertainty into the use of Relate's age estimates, but I wonder about the validity of assuming that the allele age posterior probability is distributed uniformly between the upper and lower confidence bounds. Can the authors address why this is more appropriate than some kind of peaked distribution like a beta distribution?

      The lower and upper bounds of the allele age reported by Relate reflect the start and end points of the branch that the mutation falls on in the reconstructed genealogical tree. If Relate does a perfect job in reconstructing the tree and estimating the branch lengths, the mutation age should be uniformly distributed in the inferred interval. It is unrealistic to expect Relate to perform perfectly in tree building, and there is likely considerable uncertainty and even bias in the estimated times of the branch endpoints. Unfortunately, Relate does not report the uncertainty in the lower and upper bounds of the mutation age, so we were not able to model the posterior distribution of the allele age properly. However, assuming a uniform distribution of the mutation age between the upper and lower confidence bounds should be valid to a first approximation.
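
      As a rough illustration of the uniform assumption described above, the following Python sketch shows two ways a mutation's age interval could be spread across time bins: by drawing ages uniformly between the bounds, or by computing the exact fraction of the interval that overlaps each bin. The function names, the bin edges and the example bounds are hypothetical; this is not the analysis code used for the manuscript, and it takes the reported bounds as given.

```python
import random


def sample_allele_ages(lower, upper, n_draws=1000, seed=0):
    """Draw candidate ages uniformly between the branch bounds (hypothetical helper)."""
    if upper < lower:
        raise ValueError("upper bound must not be smaller than lower bound")
    rng = random.Random(seed)
    return [rng.uniform(lower, upper) for _ in range(n_draws)]


def bin_weights(lower, upper, bin_edges):
    """Fraction of the interval [lower, upper] that falls into each time bin.

    Under a uniform age distribution this is the probability that the
    mutation arose in each bin, so no random sampling is needed.
    """
    if upper <= lower:
        raise ValueError("upper bound must be larger than lower bound")
    width = upper - lower
    weights = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # Length of the overlap between the age interval and this bin.
        overlap = max(0.0, min(upper, hi) - max(lower, lo))
        weights.append(overlap / width)
    return weights


if __name__ == "__main__":
    # A mutation on a branch spanning 100 to 300 generations ago,
    # split across two bins of 0-200 and 200-400 generations.
    print(bin_weights(100, 300, [0, 200, 400]))
    print(sample_allele_ages(100, 300, n_draws=3))
```

      A peaked alternative, such as the beta distribution raised by the reviewer, would change only how the weights are computed; the fractional assignment to bins would otherwise work the same way.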

      I would also argue that the statement on line 104 about Relate's reliability is not yet supported by data. There is certainly value in using Relate ages to investigate mutation spectrum change over time and compare this to what has been seen using allele frequencies, but I don't think we know enough yet to say that the Relate ages are definitely more reliable. Relate's estimates might be biased by the same processes like selection and demography that make allele frequencies challenging to interpret. The paper's statements about the limitations of allele frequencies are fair, but there is always a tradeoff between the clear drawbacks of simple summary statistics and the more cryptic possible blind spots of complicated "black box" algorithms (in the case of Relate, an MCMC that needs to converge properly). DeWitt, et al. 2021 noted that the demographic history inferred by Relate doesn't accurately predict the underlying data's site frequency spectrum, indicating that the associated allele ages might have some problems that need to be better characterized. While testing Relate for biases is beyond the scope of this work, the introduction should acknowledge that the accuracy and precision of its time estimates are still somewhat uncertain.

      We agree with the reviewer and have now added a paragraph in the Discussion highlighting some issues of Relate regarding mutation age estimation and ancestral allele polarization.

      The paper's results on C>T mutations in Europeans versus Africans are a nice confirmation of previous results, including the observation from Mathieson & Reich that neither SBS7 nor SBS11 is a good match for the mutational signature at play. More novel is the ancient mutational signature enriched in Africa and the interrogation of the ability of parental age to explain the observed patterns. I just have a few minor suggestions regarding these analyses:

      1) I like the idea of using maternal age C>G hotspots to test the plausibility of the maternal age as an explanatory factor, but I think this would be more convincing with the addition of a power analysis. Given two populations that have average maternal ages of 20 and 40, and the same population sample sizes available from 1000 Genomes, can the authors calculate whether the results they'd predict are any different from what is observed (i.e. no significant differences within the maternal hotspots and significant differences outside of these regions)?

      We thank the reviewer for this suggestion. We performed simulations to estimate the power of observing significant inter-population differences within and outside the maternal C>G mutation hotspots, under the assumption that all differences in the mutation spectrum between the two populations are related to the parental age (i.e., generation time). We found that, because of the extraordinarily strong maternal age effects in the maternal mutation hotspots, the power for detecting variation in the C>G/T>A ratio due to a change in generation time is much greater within maternal hotspots than outside, despite the smaller total size of the maternal hotspot regions (and hence fewer SNPs; Figure 3 – figure supplement 4). For example, even with an age difference of five years, there is nearly 100% power to detect significant differences in the maternal hotspots, compared to <12% for regions outside the maternal hotspots. In other words, if inter-population differences in the mutation spectrum are driven by differences in maternal age across populations, we should have enough power to observe a signal in the maternal hotspot regions alone, the lack of which (Figure 2C) strongly suggests that maternal age is not driving these signals.
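
      The general logic of such a simulation-based power estimate can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the simulation used for Figure 3 – figure supplement 4: the SNP counts, the C>G proportions, the chi-square test on a 2x2 table and all function names are hypothetical choices made for the example.

```python
import math
import random


def chi2_pvalue_2x2(a, b, c, d):
    """P-value of a chi-square test (1 degree of freedom) on the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def draw_binomial(rng, n, p):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def estimate_power(n1, n2, p1, p2, n_sim=200, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which the two populations differ significantly.

    n1, n2: number of C>G plus T>A SNPs in each population (assumed values).
    p1, p2: assumed proportion of C>G among those SNPs in each population.
    """
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sim):
        k1 = draw_binomial(rng, n1, p1)
        k2 = draw_binomial(rng, n2, p2)
        if chi2_pvalue_2x2(k1, n1 - k1, k2, n2 - k2) < alpha:
            significant += 1
    return significant / n_sim


if __name__ == "__main__":
    # A strong age effect (large difference in C>G proportion) versus a weak one.
    print("strong effect:", estimate_power(500, 500, 0.40, 0.60))
    print("weak effect:  ", estimate_power(500, 500, 0.50, 0.52))
```

      In this toy setting, a region with fewer SNPs can still yield higher power when the expected difference in proportions is large, which is the qualitative point made in the response about maternal hotspots.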

      2) Is it possible that the T>C/T>G ratio is elevated in all variants above a certain age but shows up as an African-specific signal because the African population retains more segregating variation in this age range, whereas non-African populations have fixed or lost more of this variation? Since Durvasula & Sankararaman identified putative tracts of super-archaic introgression within Africans, is it possible to test whether the mutation spectrum signal is enriched within those tracts?

      The observation that the T>C / T>G signal is driven by TpG>CpG mutations (which might be mis-polarized CpG transitions) casts doubt on the signal. Given the unresolved technical issue, we have now removed any discussion of the biological explanations behind the signal and instead focus on describing the challenges with ancestral allele polarization under context-dependent mutation rate variation.

      3) Although Coll Macià, et al. argued that generation time is capable of explaining all mutation spectrum differences between populations, including the excess of TCC>TTC in Europeans, Wang et al. argue something slightly different. They exclude TCC>TTC and the other major components of the European signature from their analysis and then argue that parental age can explain the rest of the differences between populations. I think the analysis in this paper convincingly refutes the Coll Macià, et al. argument, but refuting the Wang, et al. version would require excluding the same mutation types that are excluded in that paper.

      Although we did not present an analysis that explicitly excludes TCC>TTC mutations, our analysis still shows that generation time alone cannot explain the remaining variations in the mutation spectrum observed (Figure 4). Specifically, the temporal trend of the T>C/T>G ratio would suggest a decreasing generation time of Europeans with time, whereas the C>G/T>A ratio suggests the opposite. In addition, the power analysis for C>G maternal hotspots (suggested by the reviewer) further supports that the inter-population differences observed cannot be entirely driven by differences in parental ages. These observations, which do not involve TCC>TTC mutations, strongly suggest that generation time is not the sole or primary driver of differences in mutation spectrum across populations. Further, our analysis shows that several technical issues and biological processes, in addition to changes in life history traits, can lead to changes in the mutation spectrum of polymorphisms. Therefore, inferring generation time using changes in mutation spectrum is not as straightforward as Wang et al. proposed, because generation time is not the only or dominant factor impacting mutation spectrum.

    1. Author Response

      Reviewer #1 (Public Review):

      This is an awesome comprehensive manuscript. Authors start by sorting putative stromal cell-containing BM non-hematopoietic (CD235a-/CD45-) plus additional CD271+/CD235a-/CD45- populations to identify nine individual stromal identities by scRNA-seq. The dual sorting strategy is a clever trick as it enriches for rare stromal (progenitor) cell signals but may suffer from a certain bias towards CD271+ stromal progenitors. The lack of readable signatures already among CD45-/CD45- sorts might argue against this fear. This reviewer would appreciate a brief discussion on number & phenotype of putative additional MSSC phenotypes in light of the fact that the majority of 'blood lineage(s)'-negative scRNA-seq signatures identified blood cell progenitor identities (glycophorin A-negative & leukocyte common antigen-negative). The nine stromal cell entities share the CXCL12, VCAN, LEPR main signature. Perhaps the authors could speculate if future studies using VCAN or LEPR-based sort strategies could identify additional stromal progenitor identities?

      We would like to thank the reviewer for critically evaluating our work and for the generally positive evaluation of the paper. We apologize for the delayed resubmission, as it took a long time for a specific antibody to arrive to complete the confocal microscopy analyses.

      The reviewer asks for a brief discussion on the cell numbers and phenotypes of MSSCs. The cell numbers and percentages of MSSCs in sorted CD45low/-CD235a- and CD45low/-CD235a-CD271+ cells can be found in Supplementary File 3, and we have added a summary of the phenotypes of MSSCs in the new Supplementary File 7.

      Due to the extremely low frequency of stromal cells in human bone marrow, we chose a sorting strategy that also included CD45low cells (Fig 1A) to ensure that no stromal cells were excluded from the analysis. Although stromal elements are certainly enriched using this approach, the CD45low population contains several different hematopoietic cell types. These include CD34+ HSPCs, which are characterized by low CD45 expression [2], as well as the CD45low-expressing fractions of other hematopoietic cell populations such as B cells, T cells, NK cells, megakaryocytes, monocytes, dendritic cells, and granulocytes. Furthermore, CD235a- late-stage erythroid progenitors, which are negative for CD45, are represented as well. Of note, our data are consistent with previously reported murine studies showing the presence of a number of hematopoietic populations in CD45- cells, which accounted for the majority of CD45-Ter119-CD31- murine BM cells [3,4]. However, despite a certain enrichment of stromal elements in the CD45low cell fraction, frequencies were still too low to allow for a detailed analysis of this important bone marrow compartment. This prompted us to adopt the stromal cell-enrichment strategy as described in the manuscript to achieve a better resolution of the stromal compartment. In fact, sorting based on CD45low/-CD235a-CD271+ allowed us to sufficiently enrich bone marrow stromal cells to be clearly detectable in scRNAseq analysis. According to the reviewer’s suggestion, a brief discussion on this issue is now included in the Discussion (page 28, lines 10-15).

      The reviewer also suggested using VCAN or LEPR-based sorting strategy to identify additional stromal identities in future studies.

      However, because VCAN is an extracellular matrix protein, FACS analysis of cellular VCAN expression can only be achieved based on its intracellular expression after fixation and permeabilization [5,6]. Additionally, while VCAN is highly and ubiquitously expressed by stromal clusters, VCAN is also expressed by monocytes (cluster 36). Therefore, VCAN is not an optimal marker to isolate viable stromal cells.

      LEPR is the marker that was reported to identify the majority of colony-forming cells in adult murine bone marrow [7]. We have previously reported that the majority of human adult bone marrow CFU-Fs is contained in the LEPR+ fraction [8]. In our current scRNAseq surface marker profiling analysis, group A cells showed high expression of several canonical stromal markers including VCAM1, PDGFRB, ENG (CD73), as well as LEPR (Fig. 4A). However, the four stromal clusters in Group A could not be separated based on the expression of LEPR. Therefore, we chose not to use LEPR as a marker to prospectively isolate the different stromal cell types.

      The authors furthermore localized CD271+, CD81+ and NCAM/CD56+ cells in BM sections in situ. Finally, referring to the strong background of the group in HSC research, in silico prediction by CellPhoneDB identified a wide range of interactions between stromal cells and hematopoietic cells. Evidence for functional interdependence of CFU-F-forming cells completes the novel and clearer bone marrow stromal cell picture.

      We thank the reviewer for the positive comments.

      An illustrative abstract naming the top 9 stromal identities in their top 4 clusters by their "top 10 markers" + functions would be highly appreciated.

      We thank the reviewer for the suggestion. A summary of the characteristics of stromal clusters is now shown in the new Supplementary File 7, which we hope matches the reviewer’s expectations.

      Reviewer #2 (Public Review):

      Knowledge about composition and function of the different subpopulations of the hematopoietic niche of the BM is limited. Although such knowledge about the mouse BM has been accumulating in recent years, a thorough study of the human BM still needs to be performed. The present manuscript of Li and coworkers fills this gap by performing single cell RNA sequencing (scRNAseq) on control BM as well as CD271+ BM cells enriched for non-hematopoietic niche cells.

      We apologize for the delayed resubmission, as it took a long time for a specific antibody to arrive to complete the confocal microscopy analyses. We thank the reviewer for the critical expert review and overall positive comments.

      Based on their scRNAseq, the authors propose 41 different BM cell populations, ten of which represented non-hematopoietic cells, including one endothelial cell cluster. The nine remaining skeletal subpopulations were subdivided into multipotent stromal stem cells (MSSC), four distinct populations of osteoprogenitors, one cluster of osteoblasts and three clusters of pre-fibroblasts. Using bioinformatic tools, the authors then compare their results and divisions of subpopulations to some previously published work from others and attempt to delineate lineage relationships using RNA velocity analyses. From these, they propose different paths from which MSSC enter the progenitor stages, and might differentiate into pre-osteoblasts and -fibroblasts.

      It is of interest to note that apparently adipo-primed cells may also differentiate into osteolineage cells, something that should be further explored or validated. Furthermore, although this analysis yields a large adipo-primed population, pre-adipocytes and mature adipocytes appear not to be included in the data set the authors used, which should also be explained.

      We thank the reviewer for this comment. We chose to annotate Cluster 5 as an adipo-primed cluster based on the higher expression of adipogenic differentiation markers as well as a group of stress-related transcription factors (FOS, FOSB, JUNB, EGR1) (Fig. 2B-C, Figure 2-figure supplement 1C), some of which had been shown to mark bone marrow adipogenic progenitors [1]. Although at considerably lower levels compared to adipogenic genes, osteogenic genes were also expressed in cluster 5 cells (Fig. 2B and D), indicating the multipotent potential of this cluster. Therefore, our initial annotation of these cells as adipo-primed progenitors was too narrow as it did not include the possible osteogenic differentiation potential. We apologize for the confusion caused by the inappropriate annotation and, in order to avoid any further confusion, cluster 5 has now been re-annotated as ‘highly adipocytic gene-expressing progenitors’ (HAGEPs), which we believe is a better representation of the cells. We furthermore agree with the reviewer that in-vivo differentiation needs to be performed to address potential differentiation capacities in future studies.

      With regard to the lack of adipocytes in our data set, we described in the Materials and Methods section that human bone marrow cells were isolated based on density gradient centrifugation. After centrifugation, the mononuclear cell-containing monolayers were harvested for further analysis. However, the resulting supernatant containing mature adipocytic cells was discarded [14]. Therefore, adipocyte clusters were not identified in our dataset. We have amended the manuscript accordingly (page 5, line 7).

      Regarding the pre-adipocytes, we are not aware of any specific markers for pre-adipocytes in the bone marrow. We examined the only known markers (ICAM1, PPARG, FABP4) that have been shown to mark committed pre-adipocytes in human adipose tissue [15]. As illustrated in Fig. R1 (below), low expression of all three markers was not restricted to a single distinct cluster but could be found in almost all stromal clusters. These data thus allow us to neither confirm nor exclude the presence of pre-adipocytes in the dataset. Due to the lack of specific markers for pre-adipocytes and the absence of mature adipocytes in the current dataset, it is therefore difficult to identify a well-defined pre-adipocyte cluster.

      Figure R1. UMAP illustration of the normalized expression of the markers for pre-adipocytes in stromal clusters.

      In addition, based on a separate analysis of surface molecules, the authors propose new markers that could be used to prospectively isolate different human subpopulations of BM niche cells by using CD52, CD81 and NCAM1 (=CD56). Indeed, these analyses yield six different populations with differential abilities to form fibroblast-like colonies and differentiate into adipo-, osteo-, and chondrogenic lineages. To explore how the scRNAseq data may help to understand regulatory processes within the BM, the authors predict possible interactions between hematopoietic and non-hematopoietic subpopulations in the BM. These should be further validated to support statements such as the suggestion in the abstract that separate CXCL12- and SPP1-regulated BM niches might exist.

      We agree with the reviewer that functional validation of the CellPhoneDB results using, for example, in vivo humanized mouse models would be needed to demonstrate the presence of different niches in the bone marrow. At this point in time, we only put forward the hypothesis that different niche types exist, and we will work on providing experimental proof in our future studies.

      The scRNAseq analysis is indeed a strong and important resource, also for later studies meant to increase knowledge about the hematopoietic niche of the BM. Although the analyses using different bioinformatic tools are very helpful, they remain mostly speculative, since validatory experiments, as already mentioned, are missing. As such, I feel the authors did not succeed in achieving their goals of understanding how non-hematopoietic cells of the BM regulate the different hematopoietic processes within the BM. Nevertheless, they have created valuable resources, both in the scRNAseq data they generated, as well as the different predictions about different cell populations, their lineage relationships, and how they might interact with hematopoietic cells.

      We thank the reviewer for the appreciation of the value of this dataset. We agree with the reviewer that it is of great importance to validate the contribution of potential driver genes for stromal cell differentiation and verify the in vitro data and in-silico prediction using in-vivo models. As the main goal of the current study was to formulate hypotheses based on the scRNAseq data for future studies, we believe that in vivo validation experiments using engineered human bone marrow models or humanized bone marrow ossicles are out of the scope of the current study, but certainly need to be performed in the future.

      The impact of this work is difficult to envision, since validations still need to be performed. Also, it has to be borne in mind that humans are not mice, which can be studied in neat homogeneous inbred populations. Human populations, on the other hand, are quite diverse, so that the data generated in this manuscript and others will probably have to be combined to extrapolate data relevant to the whole of the human population. However, as it is equally difficult to generate reliable scRNAseq data from human BM, it seems likely that the data will indeed be an important resource when more data from different donors become available.

      We thank the reviewer for the generally positive evaluation of this study.

      Taken at face value, the authors provide evidence that human counterparts exist to several BM populations described in mice. In my opinion, the lineage relationships predicted using the RNA velocity analyses need more substance, as it seems the differentiation paths may diverge from what is known from mice. If so, this issue should be studied more stringently. Similarly, the paper would have been strengthened considerably if a relevant experimental validation had been attempted, perhaps by using genetically modified (knockdown) MSSC, similar to Battula et al. (doi: 10.1182/blood-2012-06-437988).

      In the study from Welner’s group, stromal differentiation trajectory was inferred based on scRNAseq analysis of murine bone marrow cells using Velocyto [16]. Velocyto identified MSCs as the ‘source’ cell state with pre-adipocytes, pro-osteoblasts, and pro-chondrocytes being end states. In our study, the MSSC population was predicted to be at the apex of the trajectory and the pre-osteoblast cluster was placed close to the terminal state of differentiation, which is consistent with the murine study. However, different stromal cell types were identified in mice compared with humans. For example, we have identified pre-fibroblasts in our dataset which are absent in the murine study, while a well-defined murine pre-adipocyte population was not identified in our human dataset. Therefore, it is not surprising to find some discrepancies between human and murine stromal differentiation trajectories. Of course, and as mentioned before, critical in-vivo functional validations need to be carried out to address these important issues in the future.

      In summary, this is a very interesting but also descriptive paper with highly important resources. However, to prospectively identify or isolate human non-hematopoietic/non-endothelial niche populations, more stringent validations should have been performed to strengthen the validity of the different analyses that have been performed. As such, it remains an open question which niche subpopulations have the most impact on the different hematopoietic processes important for normal and stress hematopoiesis, as well as malignancies.

      Thank you for this comment. We completely agree that more stringent validations are necessary but are outside of the aim of our current hypothesis-generating study. Accordingly, we are planning functional verification studies using genetically manipulated stromal cells in combination with in-vivo humanized ossicles. Furthermore, other groups will hopefully use our database and contribute with functional studies in model systems that are currently not available to us, e.g. iPS-derived bone marrow in-vitro proxies.

      Specific remarks

      • Since CD45, CD235a, and CD271 are used as distinguishing markers in the sample preparation of the scRNAseq, it would be helpful to highlight these markers in the different analyses (Figures 1D, 2B, 2C-F, and 4A), and restrict the analyses to those cells that also do not express CD45, CD235a (why use CD71?) and highly express CD271.

      Thank you for this comment. As shown in Fig. R2, we have modified figures Fig. 1D, 2B, and 4A showing now also the expression of PTPRC (CD45), GYPA (CD235a), and NGFR (CD271) on the top (Fig. 1D and 2B) or right (Fig. 4A) panel of the figures. To complement Fig. 2C-F, we have generated new stacked violin plots showing the expression level of three markers by all 9 stromal clusters (Fig. R2B). As we believe that including these three markers in the figures does not provide a better strategy to improve the analyses, we decided to leave the original figures unchanged in this respect.

      Figure R2. (A) Modified Fig. 1D, 2B and 4A with PTPRC (CD45), GYPA (CD235a) and NGFR (CD271) expression. (B) Stacked violin plots of PTPRC, GYPA and NGFR expressed by stromal clusters to complement Fig. 2C-F.

      With regard to cell exclusion based on CD45, as shown in the modified Figure corresponding to Fig 1A in the manuscript (Fig R2A), CD45 gene expression is observed also in the endothelial cluster, basal cluster, and neuronal cluster (Fig. R2A). These clusters represent non-hematopoietic clusters that we would like to keep in our dataset for further analysis, such as cell-cell interaction. Therefore, we chose not to restrict the analysis solely to CD45 non-expressing cells.

      With regard to CD235a (GYPA), expression of CD235a is not detected in any of the non-hematopoietic clusters. Thus, CD235a-expressing cell exclusion is not necessary.

      For CD271, according to our previous results (own unpublished data, belonging to a dataset of which only significantly expressed genes were reported in Li et al. [8]), protein expression of CD271 is not necessarily reflected by gene expression. In other words, stromal cells with CD271 protein expression do not always have high mRNA expression. A significant fraction of stromal cells would be excluded if we restricted the analyses only to those cells that show high CD271 gene expression, which would not reflect the real cellular composition of human bone marrow stroma. In order not to risk losing stromal cells, we therefore kept our previous analyses, which included stromal cells with various CD271 expression levels.

      With regard to using CD71 as an exclusion marker, please see also the comments to reviewer 1. Briefly, according to our data, CD71 (TFRC)-expressing erythroid precursors could still be found after excluding CD45 and CD235a positive cells (Figure 1-figure supplement 1B and R3). As furthermore shown in Figure 1-figure supplement 1G and R2, CD71 expression in the stromal clusters is negligible. Therefore, we believe that this justifies the use of CD71 as an additional marker to exclude erythroid cells. We have amended the discussion to address this issue (page 19, lines 7-8).

      Figure R3. FACS plots illustrating the expression of (A) CD71 (TFRC) vs CD271 in CD45- CD235a- cells and (B) FSC-A vs CD81 in CD45-CD235a-CD271+CD71+ cells following exclusion of doublets and dead cells.

      • Despite a distinct neuronal cluster (39), there does not seem to be a distinctive marker for these cells. Is this true?

      Yes, the reviewer is correct that there is no significantly-expressed distinctive marker for neuronal cells. Multiple markers indicating the presence of different cell types were identified in cluster 39 (Supplementary File 4). Among them, several neuronal markers (NEUROD1, CHGB, ELAVL2, ELAVL3, ELAVL4, STMN2, INSM1, ZIC2, NNAT) were found to be enriched in this cluster (Supplementary File 4 and Fig. 1D) with higher fold changes compared to other identified genes. However, the expression of these genes was not statistically significant, which is mainly due to the heterogeneity of the cluster and thus does not allow us to draw any firm conclusions.

      Several genes including MALAT1, HNRNPH1, AC010970.1, and AD000090.1 were identified to be statistically highly expressed by cluster 39 (Supplementary File 4). The expression of these genes is not restricted to any specific cell type. It is therefore impossible to annotate the cluster based on this and our data thus indicated that cluster 39 is a heterogeneous population containing multiple cell types. Based on the expression of neuronal markers, we nevertheless chose to annotate Cluster 39 as “neuronal” as the prominent expression of neuronal markers indicated the presence of neurons in this cluster. To be more accurate, the annotation of cluster 39 has been changed to ‘neuronal cell-containing cluster’ to correctly reflect the presence of non-neuronal gene expressing cells as well (page 29, lines 3-8).

      • Since, based on 2C and 2D, the authors are unable to distinguish adipo- from osteogenic cells, would the authors use the same molecules to distinguish different populations of 2C-D, or would they use other markers? If so, which and why?

      We agree with the reviewer that at first glance adipo-primed (cluster 5, now annotated as “highly adipocytic gene-expressing progenitors”, HAGEPs), balanced progenitors (cluster 16), and pre-osteoblasts (cluster 38) shared a similar expression pattern according to the violin plots in Fig. 2C and 2D. However, as illustrated in the heatmap (Fig. 2B), the expression patterns of adipo-primed (HAGEP) and balanced progenitors were quite different in terms of their expression of adipogenic and osteogenic markers. Both adipogenic and osteogenic marker expression was detected in HAGEPs, balanced progenitors, and pre-osteoblasts. Thus, as violin plots summarize the overall expression levels of a certain marker in a certain cluster, these plots tend to make it more difficult to detect differential expression patterns between different clusters. In this case, the heatmap shown in Fig. 2B is a good complement to the violin plots as it demonstrates the different expression patterns of every cell in the different stromal clusters.

      Additionally, cluster 5 showed the expression of a group of stress-related transcription factors (FOS, FOSB, JUNB, EGR1) (Fig. 2B and Figure 2-figure supplement 1C), some of which had been shown to mark bone marrow adipogenic progenitors [1]. The expression of the above-mentioned stress-related transcription factors (putative adipogenic progenitor markers) was generally lower in cluster 38 compared to cluster 5, further demonstrating that the clusters were different.

      Furthermore, there was a gradual upregulation of more mature osteogenic markers such as RUNX1, CDH11, EBF1, and EBF3 from cluster 5 to cluster 16 and finally cluster 38. As shown in Fig. 2D, the expression of these markers was higher in cluster 38 compared to cluster 5. Therefore, cluster 38 was annotated as pre-osteoblasts.

      Most of the stromal clusters form a continuum (Fig. 2A), which correlates very well with the gradual transition of different cellular states during stromal cell development. It is highly unlikely that abrupt and dramatic gene expression changes would occur during the cellular state transition of cells of the same lineage. Therefore, it is not surprising that the gene expression profiles of the stromal clusters share a certain level of similarity.

      In summary, we rely on several factors to distinguish different stromal clusters, which include canonical adipo-, osteo- and chondrogenic markers, stress markers, heatmap, violin plots, and the gradual up-regulation of certain lineage-specific markers.

      To directly answer the reviewer’s question, we believe that we are able to distinguish different stromal clusters based on our data.

      • In de Jong et al., an inflammatory MSC population (iMSC) is defined. Since the Schneider group showed that inflammatory S100A8 and A9 are expressed by inflamed MSC, is it possible that some of the designated pre-fibroblasts actually correspond to these S100A8/A9-expressing iMSC?

      We thank the reviewer for raising this interesting question.

      First of all, we would like to point out that scRNAseq was performed using viably frozen bone marrow aspirates in de Jong’s study while freshly isolated bone marrows were used in our study. There might be discrepancies between frozen and fresh bone marrow samples in terms of cellular composition including stromal composition and, importantly, processing-induced stress-related gene expression profiles.

      To investigate if designated pre-fibroblasts actually correspond to iMSCs as suggested by the reviewer, we have re-examined the expression of some of the key iMSC genes as reported by de Jong et al 17. As shown in Fig. R6, the markers that can distinguish iMSC from other MSC clusters in de Jong et al. study were not exclusively expressed by pre-fibroblasts, but also by other stromal cell types including HAGEPs, balanced progenitors, and pre-osteoblasts.

      In the study by R. Schneider’s group18, significant upregulation of S100A8/S100A9 was observed in stromal cells from patients with myelofibrosis. Furthermore, baseline expression of S100A8/A9 was also observed in the fibroblast clusters in the control group, which correlates very well with our data of S100A8/9 expression in pre-fibroblasts in normal donors (Fig. 2F). Our data thus indicate, in line with Schneider’s findings, that there is a baseline level of S100A8/9 expression in fibroblasts in hematologically normal samples and that the expression of S100A8/9 is not restricted to inflamed MSC.

      In summary, the gene expression profiles observed in our study do not indicate the presence of iMSC in the healthy bone marrow.

      • Figure 3A: Do human adipo-primed cells (cluster 5) indeed differentiate into osteogenic cells (clusters 6, 38, and 39)? This would be highly unexpected. Can the authors substantiate this "reliable outcome of the RNA velocity analysis"?

      Please refer to our previous responses regarding this topic. Briefly, as shown in Fig. 2B and D, both osteogenic and adipogenic genes are expressed in cluster 5, indicating the multipotent potential of this cluster. Although the cluster was initially annotated as adipo-primed progenitors, this was not intended to exclude the osteogenic differentiation potential of these progenitors. Nevertheless, this annotation did not correctly reflect the differentiation potential and might thus have caused confusion, for which we apologize. In order to describe the characteristics of these cells more accurately, cluster 5 has now been reannotated as ‘highly adipocytic gene-expressing progenitors (HAGEPs)’.

      In general, the outcome of the RNA velocity analysis needs to be corroborated by in-vivo differentiation experiments. But we believe that functional verification, which would be extensive, is out of the scope of the current study and we will address these questions in future studies.

      • How statistically certain are the authors, that the populations in Figure 4B as defined by flow cytometry, correspond to MSSC, adipo-primed cells, osteoprogenitors, etc., as defined by scRNAseq?

      To address this question, we sorted the A1-A4 populations and performed RT-PCR to examine the CD81 expression level in each cluster. As shown in Figure 4-figure supplement 1B, CD81 expression levels were higher in A1 and A2 compared with A3 and A4, which is consistent with the scRNAseq data that showed the highest CD81 expression in MSSCs compared to other clusters (Supplementary File 4).

      The phenotypes defined in this study allowed us to isolate different stromal cell types which demonstrated significant functional differences as described in the manuscript (page 19, lines 17-25; page 20, lines 1-11). These results, in combination with the quantitative real-time PCR results (Figure 4-figure supplement 1B), demonstrated that the A1-A4 subsets in FACS are functionally distinct populations and are likely to be, at least in large part, identical or equivalent to the transcriptionally identified clusters in group A stromal cells. However, at this point, we have not performed the experiments (scRNAseq of sorted cells) that would be required to confirm this statement statistically.

      • The immunohistochemistry results shown do not allow distinct conclusions as the colors give unequivocal mix-colors, and surface expression cannot be distinguished from intracellular expression. Please use a 3D (confocal) method for such statements.

      We thank the reviewer for the suggestion and we have performed additional confocal microscopy analysis of human bone marrow biopsies as suggested by the reviewer. Representative confocal images are now presented in the middle and right panel of Fig. 6E. We also include a separate file (Supplemental confocal image file). Here, confocal scans of all marker combinations are shown as ortho views, in addition to detailed intensity profile analyses of the cells of interest that clearly distinguish surface staining from intracellular staining.

      Confocal analysis of bone marrow biopsies confirmed our findings presented in the manuscript. As observed in the scanning images, CD271-expressing cells were negative for CD45 and were located in perivascular, endosteal, and peri-adipocytic regions. CD271/CD81 double-positive cells could be found either in the peri-adipocytic regions or perivascular regions while CD271/NCAM1 double-positive cells were exclusively situated at the bone-lining endosteal regions. The results of the confocal analysis have been added to the revised manuscript (page 21, lines 15-17).

      • Figure 5A: as all cells seem to interact with all other cells, this figure does not convey relevant information about BM regions using for instance CXCL12 or SPP1. Please reanalyze to show specificity of the interactions of the single clusters. Also, since it is unlikely the CellPhoneDB2-predicted interactions are restricted to hematopoietic responders, please also describe the possible interactions between non-hematopoietic cells.

      Fig. 5A was used to demonstrate the complexity of the interactions between hematopoietic cells and stromal cells.

      To gain a more detailed understanding of the interactions, we also performed an analysis with the top-listed ligand-receptor pairs as shown in Fig. 5B-C and Figure 5-figure supplement 1B. Here, each dot represents the interaction of a specific ligand-receptor pair listed on the x-axis between the two individual clusters indicated in the y-axis, which we believe shows what the reviewer is asking for.

      The specificity of the interactions between single clusters was shown in Fig. 5B-C and Figure 5-figure supplement 1B. The CXCL12- and SPP1-mediated interactions between MSSC/OC and hematopoietic clusters clearly suggested stromal cell type-specific interactions.

      Regarding non-hematopoietic cells, both inter- and intra-stromal interactions were identified to be operative between different stromal subsets as well as within the same stromal cell population as shown in Figure 5-figure supplement 3B. In addition, we have also analyzed the interaction pattern between endothelial cells and hematopoietic cells as shown in Fig. 7A, and thus we believe that we have sufficiently described these interactions as requested by the reviewer.

    1. Author Response

      Reviewer #2 (Public Review):

      This study identifies the neural circuits inhibited by activation of opioid receptors using complex experimental approaches such as electrophysiology, pharmacology, and optogenetics and combined them with retrograde and anterograde tracings. The authors characterize two key regions of the brainstem, the preBötzinger Complex, and the Kolliker-Fuse, and how these neuronal populations interact. Understanding the interactions of these circuits substantially increases our understanding of the neural circuits sensitive to opioid drugs which are critical to understand how opioids act on breathing and potentially design new therapies.

      Major strengths.

      This study maps the excitatory projections from the Kolliker-Fuse to the preBötzinger Complex and rostral ventral respiratory group and shows that these projections are inhibited by opioid drugs. These Kolliker-Fuse neurons express FoxP2, but not the calcitonin gene-related peptide, which distinguishes them from parabrachial neurons. In addition, the preBötzinger Complex is also hyperpolarized by opioid drugs. The experiments performed by the authors are challenging, complex, and the most appropriate types of approaches to understanding pre- and post-synaptic mechanisms, which cannot be studied in vivo. These experiments also used complex tracing methods using adenoassociated virus and cre-lox recombinase approaches.

      Limitations.

      (1) The roles of the mechanisms identified in this study have not been established in models recording opioid-induced respiratory depression or respiratory activity. This study does not record, modulate, or assess respiratory activity in-vitro or in-vivo, without or with opioid drugs such as fentanyl or morphine.

      (2) Experiments are performed in-vitro which do not mimic the effects of opioids observed in-vivo or in freely-moving animals. However, identification of pre- and post- synaptic mechanisms, as well as projections, cannot be performed in-vivo, so the authors use the right approaches for their experiments.

      We agree with both of these points. We hope this study lays the groundwork for future studies assessing the impact of these projections on respiratory activity in vitro and in vivo.

      (3) The type of neurons projecting from KF to preBötzinger Complex or ventral respiratory group have not been identified. Although some of these cells are glutamatergic, optogenetic experiments could have been performed in other cre-expressing cell populations, such as those expressing neurokinin-1 receptors.

      There are indeed many different cell populations that could be interrogated. In addition to the optogenetic identification of glutamatergic projections, we identified immunohistochemically that at least some opioid receptor-expressing, medullary-projecting KF neurons express FoxP2, and not CGRP. Further dissection of other cell populations, such as Lmx1b and Phox2b, are excellent future directions.

      Reviewer #3 (Public Review):

      This manuscript reveals opioid suppression of breathing could occur via multiple mechanisms and at multiple sites in the pontomedullary respiratory network. The authors show that opioids inhibit an excitatory pontomedullary respiratory circuit via three mechanisms: 1) postsynaptic MOR-mediated hyperpolarization of KF neurons that project to the ventrolateral medulla, 2) presynaptic MOR mediated inhibition of glutamate release from dorsolateral pontine terminals onto excitatory preBötC and rVRG neurons, and 3) postsynaptic MOR-mediated hyperpolarization of the preBötC and rVRG neurons that receive pontine glutamatergic input.

      This manuscript describes in detail a useful method for dissecting the relationship between the dorsolateral pons and the rostral medulla, which will be useful for various researchers. It's also great to see how many different methods have been applied to improve the accuracy of the results.

      1. Relationship between the dorsolateral pons and rostral ventrolateral medulla.

      The method of this paper is a good way to show a very precise relationship between the presence of opioid receptors and the dorsolateral pons and rostral ventrolateral medulla. For opioid receptors, based on the expression of Oprm1, the use of genetically modified mice with anterograde or retrograde viruses carrying additional fluorescent colors showed both anterograde and retrograde projections, revealing a relationship between the dorsolateral pons and rostral ventrolateral medulla.

      For example, to visualize dorsal pontine neurons expressing Oprm1, Oprm1Cre/Cre mice were crossed with Ai9tdTomato Cre reporter mice to generate Ai9tdT/+ oprm1Cre/+ mice (Oprm1Cre/tdT mice) expressing tdTomato on neurons that also express MOR at any point during development, and the retrograde virus encoding Cre-dependent expression of GFP (retrograde AAV-hSIN-DIO-eGFP) was injected into the respiratory center of Oprm1Cre/+ mice and into the ventral respiratory neuron group, showing that KF neurons expressing Oprm1 project to the respiration-related nucleus of the ventrolateral medulla.

      However, as the authors also acknowledge, the virus may spread beyond the intended injection site. It is important to note that the injection site was marked with an anterograde virus encoding a different fluorescent color (mCherry) and that the extent of the injection was quantified, which is excellent as a control experiment.

      In addition, the respiratory center seems to be related not only to preBötC but also to pFRG recently, so if the relation with it is described, it is important from the viewpoint of the effect on the respiratory center and the effect on the rhythm.

      Our injections centered in preBötC, rVRG or BötC did not spread extensively to slices containing 7N/pFRG (Figure 2C and Figure 2-supplement 1D, Bregma -6.0 to -6.4, shaded region labeled 7N).

    1. Author Response:

      eLife assessment

      This manuscript analyzes large-scale Neuropixels recordings from visual areas and hippocampus of mice passively viewing repeated clips of a movie and reports that neurons respond with elevated firing activities to specific, continuous sequences of movie frames. The important results support a role of rodent hippocampal neurons in general episode encoding and advance understanding of visual information processing across different brain regions. The strength of evidence for the primary conclusion is solid, but some technical limitations of the study were identified that merit further analyses.

      We thank the editors and reviewers for the assessment and reviews. We have provided clarifications and updated the manuscript to address the apparent technical limitations, which are perhaps due to some misunderstanding; please see below. We provide additional results that isolate the contribution of pupil diameter, sharp-wave ripples and theta power to show that movie tuning cannot be explained by these nonspecific effects. Nor are these mere time cells or some other internally generated patterns, owing to the many differences highlighted below.

      Reviewer #1 (Public Review):

      Taking advantage of a publicly available dataset, neuronal responses in both the visual and hippocampal areas to passive presentation of a movie are analyzed in this manuscript. Since the visual responses have been described in a number of previous studies (e.g., see Refs. 11-13), the value of this manuscript lies mostly on the hippocampal responses, especially in the context of how hippocampal neurons encode episodic memories. Previous human studies show that hippocampal neurons display selective responses to short (5 s) video clips (e.g. see Gelbard-Sagiv et al, Science 322: 96-101, 2008). The hippocampal responses in head-fixed mice to a longer (30 s) movie as studied in this manuscript could potentially offer important evidence that the rodent hippocampus encodes visual episodes.

      We have now included citations to Gelbard-Sagiv et al. Science 2008 paper and many other references too, thank you for pointing that out. There are major differences between that study and ours.

      • The movies used in the previous study contained very familiar, famous people and famous events, and the experiment was about the patient’s ability to recall those famous movie episodes. In our case the mice had seen this movie clip only twice before.

      • They did not look at the fine structure of neural responses below half a second whereas we looked at the mega-scale representations from 30ms to 30s.

      • The movie clips in that study were in full color with audio, we used an isoluminant, black-and-white, silent movie clip.

      • Their movie clips contained humans and were observed by humans, whereas in our study mice observed a movie clip with humans and no mice or other animals.

      The analysis strategy is mostly well designed and executed. A number of factors and controls, including baseline firing, locomotion, frame-to-frame visual content variation, are carefully considered. The inclusion of neuronal responses to scrambled movie frames in the analysis is a powerful method to reveal the modulation of a key element in episodic events, temporal continuity, on the hippocampal activity. The properties of movie fields are comprehensively characterized in the manuscript.

      Thank you.

      Although the hippocampal movie fields appear to be weaker than the visual ones (Fig. 2g, Ext. Fig. 6b), the existence of consistent hippocampal responses to movie frames is supported by the data shown. Interestingly, in my opinion, a strong piece of evidence for this is a "negative" result presented in Ext. Fig. 13c, which shows higher than chance-level correlations in hippocampal responses to same scrambled frames between even and odd trials (and higher than correlations with neighboring scrambled frames). The conclusion that hippocampal movie fields depend on continuous movie frames, rather than a pure visual response to visual contents in individual frames, is supported to some degree by their changed properties after the frame scrambling (Fig. 4).

      Yes, hippocampal selectivity is not entirely abolished with scrambled movie, as we show in several figures (Fig 4d,g and Extended Data Fig. 16), but it is greatly reduced, far more than in the afferent visual cortices. The fraction of tuned cells for scrambled movies dropped to 4.5% in hippocampus, which is close to the chance level of 3%. In contrast, in visual areas selectivity was still above 80%.

      Significant overlap between even and odd trials is to be expected for the tuned cells. Without a significant overlap, i.e. a stable representation, they will not be tuned. Despite this, the correlation between even and odd trials for the (only 4.5% of) tuned cells in the hippocampus was more than 2-fold smaller than (more than 80% of) cells in visual cortices. This strongly supports our hypothesis that unlike visual cortices, hippocampal subfields depended very strongly on the continuity of visual information. We will clarify this in the main text.
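
      The even/odd comparison discussed above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each cell's response is stored as a list of per-trial firing-rate vectors (one value per movie bin); the function names and the toy data are hypothetical and are not taken from the manuscript's analysis code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def trial_average(trials):
    """Average a list of per-trial rate vectors bin by bin."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def even_odd_correlation(trials):
    """Correlate the mean response on even-numbered trials with that on odd-numbered trials."""
    even = trial_average(trials[0::2])
    odd = trial_average(trials[1::2])
    return pearson(even, odd)


# Toy example (hypothetical rates): a cell with the same tuning shape on every trial.
stable_cell = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 4], [2, 4, 6, 8]]
stable_r = even_odd_correlation(stable_cell)
```

      A cell whose tuning is stable across trials gives a correlation near 1, which is why some even/odd overlap is expected by construction for any cell that passes a tuning criterion.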

      However, there are two potential issues that could complicate this main conclusion.

      One issue is related to the effect of behavioral variation or brain state. First, although the authors show that the movie fields are still present during low-speed stationary periods, there is a large drop in the movie tuning score (Z), especially in the hippocampal areas, as shown in Ext. Fig. 3b (compared to Ext. Fig. 2d). This result suggests a potentially significant enhancement by active behavior.

      There seems to be some misunderstanding here. There was no major reduction in movie tuning during immobility or active running. As we wrote in the manuscript, the drop in selectivity during purely immobile epochs is due to the reduction in the amount of data, not a reduction in selectivity per se. Specifically, as the amount of data decreases, the statistical strength of tuning (z-scored sparsity) decreases. For example, if we split the total of 60 trials' worth of data into two parts, the amount of data in each part is about half, leading to an apparent reduction in selectivity in both halves. Extended figure 2B shows nearly identical tuning in all brain regions during immobility and in equivalent subsamples chosen randomly from the entire data, including mobility and immobility. We will include additional data in the revised manuscript to demonstrate this more clearly. Please see below for more details.
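
      To make the notion of z-scored sparsity concrete, here is a minimal sketch. It assumes a commonly used sparsity definition, 1 - mean(r)^2 / mean(r^2), and a null distribution built from random circular shifts of each trial; both choices, as well as all names and parameter values, are assumptions for illustration and may differ from the exact procedure in the manuscript.

```python
import random
from statistics import mean, pstdev


def sparsity(rates):
    """Sparsity of a tuning curve: 0 for a flat curve, approaching 1 for a single sharp peak."""
    mean_sq = mean(r * r for r in rates)
    if mean_sq == 0:
        return 0.0
    return 1.0 - mean(rates) ** 2 / mean_sq


def trial_average(trials):
    """Average a list of per-trial rate vectors bin by bin."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def z_scored_sparsity(trials, n_shuffles=200, seed=0):
    """Z-score the observed sparsity against a circular-shift null (assumed shuffle scheme)."""
    rng = random.Random(seed)
    observed = sparsity(trial_average(trials))
    null = []
    for _ in range(n_shuffles):
        shifted = []
        for trial in trials:
            k = rng.randrange(len(trial))
            shifted.append(trial[k:] + trial[:k])  # rotate the trial by k bins
        null.append(sparsity(trial_average(shifted)))
    sd = pstdev(null)
    if sd == 0:
        return 0.0
    return (observed - mean(null)) / sd


# Hypothetical cell that fires in the same bin on every one of 20 trials.
tuned_trials = [[0, 0, 0, 5, 0, 0, 0, 0, 0, 0] for _ in range(20)]
z_all = z_scored_sparsity(tuned_trials)
```

      Calling `z_scored_sparsity` on all trials and on a subset of them (for example `tuned_trials[:10]`) is one way to explore how the z-score depends on the amount of data rather than on the underlying tuning.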

      Second, a general, hard-to-tackle concern is that neuronal responses could be greatly affected by changes in arousal or brain state (including drowsy or occasional brief slow-wave sleep state) in head-fixed animals without a task. Without the analysis of pupil size or local field potentials (LFPs), the arousal states during the experiment are difficult to know.

      In the revised manuscript we will show that the behavioral state effects cannot explain movie tuning. Specifically:

      • a. We compare sessions in which the mouse was mostly immobile versus sessions in which the mouse was mostly running. Movie tuned cells were found in both these cases (Extended Data Fig. 7).

      • b. We detect and remove all data around sharp-wave ripples (SWR). Movie tuning was unchanged in the remaining data.

      • c. As a further control, we quantified arousal by two standard metrics. First within a session, we split the data into two groups, segments with high theta power and segments with low theta power. Significant movie tuning persisted in both.

      • d. Finally, pupil dilation is another common method to estimate arousal, so data within a session were split into two parts: those with pupil dilation versus constriction. Movie tuning remained significant in both parts. See the new Extended Data Fig. 7.
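
      The splits described in points c and d can be sketched as a simple median split on an arousal covariate. The covariate values below are hypothetical, and the use of the median as the threshold is an assumption for illustration; the manuscript may define the high and low groups differently.

```python
from statistics import median


def median_split(values):
    """Return (low_idx, high_idx): indices of segments below vs. at or above the median."""
    m = median(values)
    low_idx = [i for i, v in enumerate(values) if v < m]
    high_idx = [i for i, v in enumerate(values) if v >= m]
    return low_idx, high_idx


# Hypothetical theta power (arbitrary units) for six data segments in one session.
theta_power = [0.2, 1.5, 0.7, 2.1, 0.9, 1.1]
low_idx, high_idx = median_split(theta_power)
# Tuning would then be evaluated separately on the segments in low_idx and in high_idx.
```

      The same helper applies unchanged to pupil diameter or any other per-segment covariate.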

      Many example movie fields in the presented raw data (e.g., Fig. 1c, Ext. Fig. 4) are broad with low-quality tuning, which could be due to broad changes in brain states. This concern is especially important for hippocampal responses, since the hippocampus can enter an offline mode indicated by the occurrence of LFP sharp-wave ripples (SWRs) while animals simply stay immobile. It is believed that the ripple-associated hippocampal activity is driven mainly by internal processing, not a direct response to external input (e.g., Foster and Wilson, Nature 440: 680, 2006). The "actual" hippocampal movie fields during a true active hippocampal network state, after the removal of SWR time periods, could have different quantifications that impact the main conclusion in the manuscript.

      We included the broadly tuned hippocampal neurons to demonstrate the movie-field broadening compared to those in visual areas. We will include more examples with sharp movie fields in the hippocampal regions (Main figure 1a-d right column, 2d and h, Extended Data Fig 5 and 8). Further, as stated above, we detected sharp-wave ripples and removed one second of data around SWR. Movie tuning was unchanged in the remaining data. Thus, movie tuning is not generated internally via SWR (Extended Data Fig. 6). See also Extended Data 7 and 8 and the response above.

      Another issue is related to the relative contribution of direct visual response versus the response to temporal continuity in movie fields. First, the data in Ext. Fig. 8 show that rapid frame-to-frame changes in visual contents contribute largely to hippocampal movie fields (similarly to visual movie fields).

      There seems to be some misunderstanding here. That figure showed that the frame-to-frame changes in the visual content had the strongest effect on the MSUA of visual areas and a much weaker effect in the hippocampus (Extended Data Fig. 8, as per previous version). For example, the depth of modulation (max – min) / (max + min) for MSUA was 21% and 24% for V1 but below 6% for hippocampal regions. Similarly, the MSUA was more strongly (negatively) correlated with F2F correlation for visual areas (r=0.48 to 0.56) than hippocampal (0.07 to 0.3). Similarly, comparing the number of peaks or their median widths, visual regions showed stronger correlation with F2F, and a larger depth of modulation, than hippocampal regions, barring a handful of exceptions (like CA3 correlation between F2F and median peak duration). This strongly supports our claim that visual regions generated a far greater response to the frame-to-frame changes in the movie than hippocampal regions.
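
      The depth-of-modulation measure quoted above, (max – min) / (max + min), is straightforward to compute. The following is a minimal sketch with made-up firing rates; the function name and the example values are illustrative only.

```python
def depth_of_modulation(rates):
    """Return (max - min) / (max + min) for a sequence of non-negative rates."""
    hi, lo = max(rates), min(rates)
    if hi + lo == 0:
        return 0.0  # all-zero response: define the modulation as zero
    return (hi - lo) / (hi + lo)


# Hypothetical multi-unit firing rates (Hz) across movie frames.
example_rates = [10.0, 12.0, 15.0, 11.0]
example_depth = depth_of_modulation(example_rates)  # (15 - 10) / (15 + 10) = 0.2
```

      The measure is bounded between 0 (flat response) and 1 (response drops to zero at its minimum), which makes it comparable across regions with different mean firing rates.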

      Interestingly, the data show that movie-field responses are correlated across all brain areas including the hippocampal ones.

      The changes in multiunit activity are strongly correlated only between visual areas and some of the hippocampal region pairs. The correlation is much weaker for hippocampal areas, or hippocampal-visual area pairs. This will be quantified explicitly in the revised text (Extended Data Fig. 11) with an additional correlation matrix. Further, in Fig 3c we compared the MSUA responses with normalization between brain regions. Amongst the 21 possible brain region pairs, 5 were uncorrelated, 7 were significantly negatively correlated and 9 were significantly positively correlated.

      This could be due to heightened behavioral arousal caused by the changing frames as mentioned above, or due to enhanced neuronal responses to visual transients, which supports a component of direct visual response in hippocampal movie fields.

      As shown in Extended Data 7 and 8 and described above, the effect of arousal as quantified by theta power or pupil diameter cannot explain the results in hippocampal areas, and the multiunit responses are uncorrelated across many brain areas.

      Second, the data in Ext. Fig. 13c show a significant correlation in hippocampal responses to same scrambled frames between even and odd trials, which also suggests a significant component of direct visual response.

      This is plausible. The fraction of hippocampal cells which were significantly tuned for the scrambled presentation (4.5%) was close to chance level (3%), and this small subset of cells was used to compute the population overlap between even and odd trials in Ext Fig. 13 (old numbering). As described above, this significant but small amount of tuning could generate significant population overlap, which is to be expected by construction.

      Is there a significant component purely due to the temporal continuity of movie frames in hippocampal movie fields? To support that this is indeed the case, the authors have presented data that hippocampal movie fields largely disappear after movie frames are scrambled. However, this could be caused by the movie-field detection method (it is unclear whether single-frame field could be detected).

      As described in the methods section, the movie-field detection algorithm had a resolution of 3.3 ms, which ensured that we could detect single-frame fields. As reported, we did find such short movie fields in several cells in the visual areas. The sparsity metric used is agnostic to the ordering of the responses, and hence single-frame fields, and the resultant significant movie tuning, if present, can be detected by our methods.

      Another concern in the analysis is that movie-fields are not analyzed on re-arranged neural responses to scrambled movie frames. The raw data in Fig. 4e seem quite convincing. Unfortunately, the quantifications of movie fields in this case are not compared to those with the original movie.

      We saw very few (3.6-4.9%) cells with significant movie tuning for scrambled presentation in the hippocampus. Hence, we did not quantify this earlier. This is now provided in new Extended Data Fig. 16. The amount of movie tuning for the scrambled presentation taken as-is, or after rearranging the frames is below 5% for all hippocampal brain regions.

      Reviewer #2 (Public Review):

      […] The authors have concluded that the neurons in the thalamo-cortical visual areas and the hippocampus commonly encode continuous visual stimuli with their firing fields spanning the mega-scale, but they respond to different aspects of the visual stimuli (i.e., visual contents of the image versus a sequence of the images). The conclusion of the study is fairly supported by the data, but some remaining concerns should be addressed.

      1) Care should be taken in interpreting the results since the animal's behavior was not controlled during the physiological recording.

      This was done intentionally since plenty of research shows that task demand (e.g., Aronov and Tank, Nature 2017) can not only modulate hippocampal responses but also dramatically alter them. We have now provided additional figures (Extended Data Fig. 6 and 7) where we quantified the effects of the behavioral states (sharp wave ripples, theta power and pupil diameter), as well as the effect of locomotion (Extended Data Fig. 4). Movie tuning remained unaffected with these manipulations. Thus, movie tuning cannot be attributed to behavioral effects.

      It has been reported that some hippocampal neuronal activities are modulated by locomotion, which may still contribute to some of the results in the current study. Although the authors claimed that the animal's locomotion did not influence the movie-tuning by showing the unaltered proportion of movie-tuned cells with stationary epochs only, the effects of locomotion should be tested in a more specific way (e.g., comparing changes in the strength of movie-tuning under certain locomotion conditions at the single-cell level).

      Single cell analysis of the effect of locomotion and visual stimulation is underway, and beyond the scope of the current work. As detailed in Extended Data Fig. 4, we have ensured that movie tuning persists despite the removal of running or stationary epochs, as well as the removal of sharp-wave ripple events (Extended Data Fig. 6). Further, we will provide examples of strongly tuned cells from sessions with predominantly running or predominantly stationary behavior (Extended Data Fig. 7).

      2) The mega-scale spanning of movie-fields needs to be further examined with a more controlled stimulus for reasonable comparison with the traditional place fields. This is because the movie used in the current study consists of a fast-changing first half and a slow-changing second half, and such varying and ununified composition of the movie might have largely affected the formation of movie-fields. According to Fig. 3, the mega-scale spanning appears to be driven by the changes in frame-to-frame correlation within the movie. That is, visual stimuli changing quickly induced several short fields while persisting stimuli with fewer changes elongated the fields.

      Please note that the speed at which the movie scene changed across frames was strongly correlated with movie-field width in the visual areas, but that correlation was much weaker in the hippocampal areas (see above). Please see Extended Data Fig. 11 and the quantification of correlation between frame-to-frame changes in the movie and the properties of movie fields.

      The presentation of persisting visual input for a long time is thought to be similar to staying in one place for a long time, and the hippocampal activities have been reported to manifest in different ways between running and standing still (i.e., theta-modulated vs. sharp wave ripple-based). Therefore, it should be further examined whether the broad movie-fields are broadly tuned to the continuous visual inputs or caused by other brain states.

      As shown in Extended Data Fig. 6, movie field properties are largely unchanged when SWR are removed from the data, or when the effects of pupil diameter or theta power are factored out (Extended Data Fig. 7).

      3) The population activities of the hippocampal movie-tuned cells in Fig. 3a-b look like those of time cells, tiling the movie playback period. It needs to be clarified whether the hippocampal cells are actively coding the visual inputs or just filling the duration.

      Tiling patterns would be observed when the maxima are sorted in any data, even for random numbers. This alone does not make them time cells. The following observations suggest that movie fields cannot be explained as being time cells.

      • a. Time cells mostly cluster at the beginning of a running epoch (Pastalkova et al. Science 2008, MacDonald et al. Neuron 2011) and they taper off towards the end. Such large clustering is not visible in these tiling plots for movie tuned cells.

      • b. Time fields become wider as the encoded temporal duration increases (Pastalkova et al. Science 2008, MacDonald et al. Neuron 2011). This is not evident in the movie fields.

      • c. Widths of movie fields in visual areas, and to a smaller extent in the hippocampal areas, were clearly modulated by the visual content, like the change from one frame to the next (F2F correlation, Extended Data Fig. 11).

      • d. A tiling pattern of movie fields was found in visual areas too, with a qualitatively similar pattern to that in the hippocampus. Clearly, visual area responses are not time cells, as shown by the scrambled stimulus experiment. Here, neural selectivity could be recovered by rearranging the responses based on the visual content of the continuous movie, and not the passage of time.
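
      The earlier remark that sorting by maxima yields a tiling pattern even for random numbers can be checked directly. The sketch below uses purely random "rate maps" with no tuning at all; the array sizes are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_bins = 200, 900            # assumed sizes, e.g. 900 movie frames
rates = rng.random((n_cells, n_bins))  # pure noise: no cell is tuned to anything

# Sort cells by the bin in which each cell's maximum occurs.
peak_bin = rates.argmax(axis=1)
order = np.argsort(peak_bin, kind="stable")
sorted_rates = rates[order]
sorted_peaks = peak_bin[order]

# The maxima of the sorted matrix now run monotonically from the first to the
# last bin, i.e. they "tile" the axis, although the data contain no structure.
print(sorted_peaks[:5], sorted_peaks[-5:])
```

      Plotting `sorted_rates` as an image would show the familiar diagonal band of maxima, which is why tiling alone is not evidence for time cells or for any other specific code.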

      The scrambled condition in which the sequence of the images was randomly permuted made the hippocampal neurons totally lose their selective responses, failing to reconstruct the neural responses to the original sequence by rearrangement of the scrambled sequence. This result indirectly indicates that a substantial portion of the hippocampal cells did not just fill the duration but represented the contents and temporal order of the images. However, it should be directly confirmed whether the tiling pattern disappeared with the population activities in the scrambled condition (as shown in Extended Data Fig. 11, but data were not shown for the hippocampus).

      As stated above for the continuous movie, a tiling pattern alone does not mean that those are time cells. Further, tuning and the tiling pattern remained intact with the scrambled movie in the visual cortices but not in the hippocampus.

      Reviewer #3 (Public Review):

      […] The paper is conceptually novel since it specifically aims to remove any behavioral or task engagement whatsoever in the head-fixed mice, a setup typically used as an open-loop control condition in virtual reality-based navigational or decision making tasks (e.g. Harvey et al., 2012). Because the study specifically addresses this aspect of encoding (i.e. exploring effects of pure visual content rather than something task-related), and because of the widespread use of video-based virtual reality paradigms in different sub-fields, the paper should be of interest to those studying visual processing as well as those studying visual and spatial coding in the hippocampal system. However, the task-free approach of the experiments (including closely controlling for movement-related effects) presents a Catch-22, since there is no way that the animal subjects can report actually recognizing or remembering any of the visual content we are to believe they do.

      Our claim is that these are movie-scene-evoked responses. We make no claims about the animal’s ability to recognize or remember the movie content. That would require an entirely different set of experiments. Meanwhile, we have shown that these results are not an artifact of brain states such as sharp wave ripples, theta power or pupil diameter (Extended Data Fig. 6 and 7) or running behavior (Extended Data Fig. 4). Please see above for a detailed response.

      We must rely on above-chance-level decoding of movie segments, and the requirement that the movie is played in order rather than scrambled, to indicate that the hippocampal system encodes episodic content of the movie. So the study represents an interesting conceptual advance, and the analyses appear solid and support the conclusion, but there are methodological limitations.

      It is important to emphasize that these responses could constitute episodic responses, but this does not prove episodic memory, just as place cell responses constitute spatial responses without proving spatial memory. The link between place cells and place memory is not entirely clear. For example, mice lacking NMDA receptors have intact place cells but are impaired in a spatial memory task (McHugh et al. Cell 1996), whereas spatial tuning was virtually destroyed in mice lacking GluR1 receptors, but they could still do various spatial memory tasks (Resnik et al. J. Neuro 2012). Studying episodic memory would require an entirely different set of experiments that involve task demand and behavioral response, which in turn would modify hippocampal responses substantially, as shown by many studies. Our hypothesis here is that, just like place cells, these episodic responses without task demand would play a role, to be determined, in episodic memory. We will emphasize this point in the main text (Ln 432-436 in the revised manuscript).

      Major concerns:

      1) A lot hinges on the cells having a z-scored sparsity >2, the cutoff for a cell to be counted as significantly modulated by the movie. What is the justification of this criterion?

      A z-scored sparsity of z>2 corresponds to p<0.03, which means that about 3% of such results could appear by chance. The z>2 threshold is a standard criterion used in many publications. Another advantage of z-scored sparsity is that it is relatively insensitive to the number of spikes generated by a neuron (i.e. the mean firing rate of the neuron and the duration of the experiment). In contrast, raw sparsity is strongly dependent on the number of spikes, which makes it difficult to compare across neurons, brain regions and conditions (see Supplement S5, Acharya et al. Cell 2016). To further address this point, we compared our z-scored sparsity measure with two other commonly used metrics of neural selectivity, depth of modulation and mutual information (Extended Data Fig. 3). Comparable movie tuning was obtained from all three metrics, upon z-scoring in an identical fashion.
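
      To make the criterion concrete, here is a minimal sketch of a z-scored sparsity computation. It assumes the Treves and Rolls form of sparsity and a null distribution built by circularly shifting each trial by a random number of frames; the shuffling scheme, the function names and the simulated neurons are illustrative assumptions and may differ from the procedure in the manuscript's Methods.

```python
import numpy as np

def sparsity(rate):
    """Sparsity of a tuning curve: 1 - (sum r)^2 / (N * sum r^2)."""
    rate = np.asarray(rate, dtype=float)
    denom = rate.size * np.sum(rate ** 2)
    return 0.0 if denom == 0 else 1.0 - rate.sum() ** 2 / denom

def z_scored_sparsity(spike_counts, n_shuffles=200, rng=None):
    """spike_counts: (n_trials, n_frames) array of spikes per movie frame.

    The null distribution (an assumption of this sketch) comes from shifting
    every trial circularly by an independent random offset, which preserves
    each trial's spike count and temporal structure but breaks the alignment
    to the movie.
    """
    rng = np.random.default_rng(rng)
    n_trials, n_frames = spike_counts.shape
    observed = sparsity(spike_counts.mean(axis=0))
    null = np.empty(n_shuffles)
    for k in range(n_shuffles):
        shifts = rng.integers(0, n_frames, size=n_trials)
        shuffled = np.stack([np.roll(row, s) for row, s in zip(spike_counts, shifts)])
        null[k] = sparsity(shuffled.mean(axis=0))
    return (observed - null.mean()) / null.std()

# Simulated example (assumption): one tuned and one untuned Poisson neuron.
rng = np.random.default_rng(0)
n_trials, n_frames = 60, 900
frames = np.arange(n_frames)
tuned_rate = 0.2 + 3.0 * np.exp(-0.5 * ((frames - 450) / 20.0) ** 2)
tuned = rng.poisson(tuned_rate, size=(n_trials, n_frames))
untuned = rng.poisson(tuned_rate.mean(), size=(n_trials, n_frames))

z_tuned = z_scored_sparsity(tuned, rng=1)
z_untuned = z_scored_sparsity(untuned, rng=2)
print(z_tuned, z_untuned)
```

      Because the observed value is compared with shuffles of the same spike train, the resulting z-score is on a common scale across neurons with very different firing rates, which is the property the response emphasizes.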

      It should be stated in the Results. Relatedly, it appears the formula used for calculating sparseness in the present study is not the same as that used to calculate lifetime sparseness in de Vries et al. 2020 quoted in the results (see the formula in the Methods of the de Vries 2020 paper immediately under the sentence: "Lifetime sparseness was computed using the definition in Vinje and Gallant").

      The definition of sparsity we used is commonly used by hippocampal scientists (Treves and Rolls 1991, Skaggs et al. 1996, Ravassard et al. 2013). The lifetime sparseness equation used by de Vries et al. 2020 differs from ours by a single constant factor, (1-1/N), where N=900 is the number of frames in the movie. This constant factor equals (1-1/900)=0.999. Hence, there is virtually no difference between the sparsity values obtained by these two methods. Further, z-scored sparsity is entirely unaffected by such constant factors. We will clarify this in the Methods of the revised manuscript.
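
      The constant-factor relationship can be written out explicitly. The following is a sketch that assumes the Treves and Rolls form of sparsity, $s$, and the Vinje and Gallant form of lifetime sparseness, $S_L$, with $r_i$ denoting the trial-averaged rate in frame $i$; the symbols $\mu$ and $\sigma$ (mean and standard deviation of the shuffled sparsity values) and $c$ are notation introduced here, not taken from the manuscript.

```latex
\begin{align}
s   &= 1 - \frac{\left(\sum_{i=1}^{N} r_i\right)^{2}}{N \sum_{i=1}^{N} r_i^{2}}, \\
S_L &= \frac{1 - \dfrac{1}{N}\,\dfrac{\left(\sum_{i=1}^{N} r_i\right)^{2}}{\sum_{i=1}^{N} r_i^{2}}}{1 - \dfrac{1}{N}}
     = \frac{s}{1 - 1/N}, \\
z(S_L) &= \frac{c\,s - c\,\mu}{c\,\sigma} = \frac{s - \mu}{\sigma} = z(s),
\qquad c = \frac{1}{1 - 1/N} > 0 .
\end{align}
```

      With $N = 900$, $1 - 1/N \approx 0.9989$, so the raw values differ by about 0.1%, and the z-scored values are identical because the positive constant $c$ cancels between numerator and denominator.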

      To rule out systematic differences between studies beyond differences in neural sampling (single units vs. calcium imaging), it would be nice to see whether calculating lifetime sparseness per de Vries et al. changed the fraction of "movie" cells in the visual and hippocampal systems.

      As stated above, the two definitions of sparsity are virtually identical and we obtained similar results using two other commonly used metrics, which are detailed in Extended Data Fig. 3.

      2) In Figures 1, 2 and the supplementary figures, the sparseness scores should be reported along with the raw data for each cell, so the readers can be apprised of what types of firing selectivity are associated with which sparseness scores, as would be shown for metrics like gridness or Rayleigh vector lengths for head direction cells. It would be helpful to include this wherever there are plots showing spike rasters arranged by frame number & the trial-averaged mean rate.

      As shown in several papers (Aghajan et al. Nature Neuroscience 2015, Acharya et al. Cell 2016), raw sparsity (or information content) is strongly dependent on the number of spikes of a neuron. This makes the raw values impossible to compare across cells, brain regions and conditions (please see Supplement S5 from Acharya et al. Cell 2016 for details). Including the raw sparsity values would thus cause undue confusion. Hence, we provide z-scored sparsity. This metric is comparable across cells and brain regions, and is now provided above each example cell in Figure 1 and Extended Data Fig. 2.

      3) The examples shown on the right in Figures 1b and c are not especially compelling examples of movie-specific tuning; it would be helpful in making the case for "movie" cells if cleaner / more robust cells are shown (like the examples on the left in 1b and c).

      We did not put the most strongly tuned hippocampal neurons in the main figures so that these cells are representative of the ensemble and not the best possible ones, so as to include examples with broad tuning responses. We have clarified in the legend that these cells are some of the best tuned cells. Although not the cleanest looking, the z-scored sparsity mentioned above the panels now indicates how strongly they are modulated compared to chance levels. Additional examples, including those with sharply tuned responses are shown in Extended Data Fig. 5 and 8.

      4) The scrambled movie condition is an essential control which, along with the stability checks in Supplementary Figure 7, provides the most persuasive evidence that the movie fields reflect more than a passive readout of visual images on a screen. However, in reference to Figure 4c, can the authors offer an explanation as to why V1 is substantially less affected by the movie scrambling than its main input (LGN) and the cortical areas immediately downstream of it? This seems to defy the interpretation that "movie coding" follows the visual processing hierarchy.

      This is an important point, one that we find very surprising as well. Perhaps this is related to other surprising observations in our manuscript, such as more neurons appeared to be tuned to the movie than the classic stimuli. A direct comparison between movie responses versus fixed images is not possible at this point due to several additional differences such as the duration of image presentations and their temporal history. The latency required to rearrange the scrambled responses (60ms for LGN, 74ms for V1, 91ms for AM/PM) supports the anatomical hierarchy. The pattern of movie tuning properties was also broadly consistent between V1 and AM/PM (Fig 2). However, all metrics of movie selectivity (Fig 2) to the continuous movie showed a consistent pattern that was the exact opposite pattern of the simple anatomical hierarchy: V1 had stronger movie tuning, higher number of movie fields per cell, narrower movie-field widths, larger mega-scale structure, and better decoding than LGN. V1 was also more robust to the scrambled sequence than LGN. One possible explanation is that there are other sources of inputs to V1, beyond LGN, that contribute significantly to movie tuning. This is an important insight and we will modify the discussion to highlight this.

      Relatedly, the hippocampal data do not quite fit with visual hierarchical ordering either, with CA3 being less sensitive to scrambling than DG. Since the data (especially in V1) seem to defy hierarchical visual processing, why not drop that interpretation? It is not particularly convincing as is.

      The anatomical organization is well established and an important factor. Even when observations do not fit the anatomical hierarchy, it provides important insights about the mechanisms. All properties of movie tuning (Fig 2), namely the strength of tuning, the number of movie peaks, their width and the decoding accuracy, firmly put visual areas upstream of hippocampal regions. But, just as in visual cortex, there are consistent patterns that do not support a simple feed-forward anatomical hierarchy. We have pointed out these patterns so that future work can build upon them.

      5) In the Discussion, the authors argue that the mice encode episodic content from the movie clip as a human or monkey would. This is supported by the (crucial) data from the scrambled movie condition, but is nevertheless difficult to prove empirically since the animals cannot give a behavioral report of recognition and, without some kind of reinforcement, why should a segment from a movie mean anything to a head-fixed, passively viewing mouse?

      We emphasize once again that our claim is about the nature of encoding of the movie across these neurons. We make no claims about whether this forms a memory or whether the mouse is able to recognize the content or remember it. Despite decades of research, similar claims are difficult to prove for place cells, with plenty of counterexamples (see the points above). The important point here is that, even without any cognitive demand, we see remarkably tuned responses in these brain areas. Determining their role in cognition would take a lot more effort and is beyond the scope of the current work.

      Would the authors also argue that hippocampal cells would exhibit "song" fields if segments of a radio song (equally arbitrary for a mouse) were presented repeatedly? (reminiscent of the study by Aronov et al. 2017, but if sound were presented outside the context of a task). How can one distinguish between mere sequence coding vs. encoding of episodically meaningful content? One or a few sentences on this should be added in the Discussion.

      Aronov et al. 2017 found encoding of an audio sweep in the hippocampus when the animals were doing a task (release the lever at a specific frequency to obtain a reward). However, without a task demand they found that hippocampal neurons did not encode the audio sequence beyond chance levels. This is at odds with our findings with the movie, where we see strong tuning in the absence of any task demand or reward. Our results are consistent with, but go far beyond, our recent findings that hippocampal (CA1) neurons can encode the position and direction of motion of a revolving bar of light (Purandare et al. Nature 2022). Please see Ln 414-420 for related discussion.

      These responses are unlikely to be mere sequence responses, since the scrambled sequence was also a fixed sequence that was presented many times, and it elicited reliable responses in visual areas but not in the hippocampus. Hence, we hypothesize that hippocampal areas encode temporally related information, i.e. episodic content. We will modify the discussion to address these points.

    1. Author Response:

      We thank the eLife editorial board and the reviewers for the assessment of our article. We look forward to thoroughly addressing their comments and concerns. We would like to correct one factual error in the consensus public review:

      “Importantly, the authors do not present evidence that value itself is stably encoded across days, despite the paper's title. The more conservative in its claims in the Discussion seems more appropriate: "these results demonstrate a lack of regional specialization in value coding and the stability of cue and lick [(not value)] codes in PFC."”

      The imaging sessions in which we identify value coding cells were in fact performed on separate days: Experimental Days 6 and 7 (see Figure 1b), which is evidence of the stability of value coding across consecutive days. Days 6 and 7 correspond to the third day of Odor Set 1 and the third day of Odor Set 2, respectively, which is why we referred to them both as “Day 3” in the manuscript, and this may have led to the confusion about the temporal relationship between these sessions. We will clarify this terminology in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this well-written manuscript, Afshar et al demonstrated the significant transcriptional and proteomic differences between cultured human umbilical vein endothelial cells (HUVECs) and those freshly isolated from the cords. They showed that TGFbeta and BMP signaling target genes were enriched in cord cells compared to those in culture. Extracellular matrix (ECM) and cell cycle-related genes were also different between the two conditions. Because master regulators of EC shear stress response genes, KLF2 and KLF4, were downregulated in culture, the authors sought to restore the in vivo transcriptional profile with the application of shear stress in an orbital shaker and dextran-containing media for various time periods. They showed that after 48 hours of shear stress the transcriptional profile of sheared cells correlated with in vivo transcriptional profile more significantly than static cultures. They also showed, using single cell RNAseq, that EC-smooth muscle cell cocultures resulted in changes in TGFbeta and NOTCH signaling pathways and rescued 9% of the in vivo transcriptional signatures.

      This is an important study that was elegantly executed. The authors should also be commended for making their data public; thereby, creating a valuable resource for vascular biologists.

      We much appreciate the comments and thank the reviewer for the time and effort evaluating the study.

      Reviewer #2 (Public Review):

      The authors profiled the transcriptome and proteome of human umbilical vein endothelial cells freshly isolated from in vivo and compared that with the same cells exposed to in vitro culture under different conditions, including static culture, flow, and co-culture with smooth muscle cells. The experiments were properly designed and performed. The authors also provided a reasonable and sound interpretation of their findings. This study provides valuable insights into how the culturing conditions impact on gene expression, encouraging the field to select their in vitro work setting appropriately. Overall, the manuscript is well-written and easy to follow.

      Several notable strengths include:

      1. Parallel transcriptome- and proteome-wide profiling of endothelial cells enabling the unbiased interrogation of gene expression and a genome-wide view of the impact of in vitro culture on endothelial transcriptome.

      2. The innovative experimental design and comparisons were done with genetically identical ECs (from the same donors) in vivo and in vitro.

      3. The analyses were robust and provided novel information on flow-dependent and cell context-dependent gene regulation, with the native freshly isolated cells as a baseline.

      4. The donor samples used in this study were diverse including Asian, White, Black, Latino, and American Indian samples which reduce racial background bias.

      Some points that can strengthen the study:

      A clear description of experimental and analytical details (e.g. how the comparisons were made) and more in-depth interpretation and discussion of the results, e.g. the complete genes that are rescued by flow and co-culture and potential synergy of these factors.

      We thank the reviewer for highlighting the strengths and appreciate the comments on experimental and analytical details, which have now been addressed in the revised manuscript. Specifically, we now include a clearer description of the experimental and analytical details (e.g. how the comparisons were made), and we have expanded the discussion to cover the genes that are rescued by flow and co-culture and the potential synergy of these factors.

      Reviewer #3 (Public Review):

      Afshar et al. performed RNA-seq and LC-MS of in vivo and in vitro HUVECs to identify the role of culture conditions on gene expression. Given the widespread use of HUVECs to study EC biology, these findings are interesting and can help design better in vitro experiments. There have been previous papers that compared in vivo and in vitro HUVECs, however, the depth of sequencing and analysis in this manuscript identifies some novel effects which should be accounted for in future in vitro experiments using ECs.

      Strengths:

      1. Major findings of distinct pathways affected by cell culture are novel and interesting. The authors identify major effects on TGFb and ECM gene expression. They also corroborate previous findings of flow response pathways, namely KLF2/4 and Notch pathway regulation.

      2. Use of multiple genomic methods to profile effects of culture conditions. The LC-MS data showed a significant correlation with RNA-seq, however, the data were not as strong so not used for subsequent analyses.

      3. Use of scRNA-seq to show the dynamic effects of co-culture and shear stress on ECs is very novel. However, the heterogeneity in the EC populations is not discussed in this manuscript.

      We would like to thank the reviewer for the in-depth analysis of our study and for highlighting the novelty and strength of the data. Note that we included comments in relation to EC heterogeneity as part of the limitations of this study (in the Discussion).

      Weaknesses:

      1. The physiological relevance of these changes in gene expression is not demonstrated in the manuscript. The authors claim the significance of their data is to improve in vitro culture to better represent in vivo biology. Is this the case with orbital shear stress? Do they rescue some functional effects in ECs with long-term shear stress? An angiogenesis, barrier function, or migration assay for HUVECs exposed to different conditions would help answer this question. A similar assay for cells after EC-VSMC co-culture would validate the importance of these stimuli.

      The reviewer is correct: our manuscript did not expand into physiological readouts, and we have now clearly acknowledged this as part of the limitations of the study. Notably, there is already extensive literature on the effects of different types of flow on several physiological parameters. For example, others have shown that laminar shear stress (by orbital or other means) reduces proliferation and migration (PMID: 31831023; PMID: 22012789, PMID: 12857765, PMID: 21312062, PMID: 15886673; PMID: 17323381), reduces inflammation (PMID: 34747636; PMID: 32951280), and improves barrier function (PMID: 20543206; PMID: 32457386; PMID: 12577139, PMID: 27246807; PMID: 31500313).

      From the outset, our objective was to bring granularity to the transcriptional changes associated with the transition from in vivo to in vitro. Further, it was our goal to identify the cohorts of transcripts that could and those that could not be rescued by altering culture conditions. Because we had transcriptional information from the identical samples at a time when they were in the vessel, we have been able to fulfill our goal. We feel these are important and currently missing data that will be of value to many investigators.

      1. One explanation for the increased expression of ECM genes in vivo is that these cells are contaminated with VSMCs/fibroblasts. This could be very likely given that cells were not sorted or purified upon isolation. Expression of other VSMC or fibroblast-specific markers (i.e. CNN1, MYH11, SMTN, DCN, FBLN1) would help determine if there is some level of non-EC contamination.

      We thank the reviewer for this comment. Prompted by it, we have included a new figure (Supplemental Figure 1) and new panels in Supplemental Figure 5 that directly address this concern.

      Amongst the several pieces of data, we included scRNAseq from cells that were obtained immediately from the umbilical vein: three independent experiments sequenced together and shown in one UMAP (Supplemental Figure 1C). As can be appreciated, the very large majority of cells are endothelial and the only other cell types present were blood cells (erythrocytes and CD45+ cells). No smooth muscle cells or fibroblasts were detected. These three examples are indeed representative of a large number of scRNAseq datasets (35 from cords and cultures for this and other projects). Furthermore, our cultures are also routinely evaluated by FACS (one example has been provided in Supplemental Figure 1E). We do not find, as illustrated in that example, cells that are not positive for CD31 and VE-Cadherin.

      We hope this information reveals the rigor of our studies and convinces the reviewer that the transcriptional changes observed are from endothelial cells.

      1. The use of scRNA-seq in Figure 4 is interesting. There appear to be 2 distinct EC populations in the co-cultured ECs. What are the marker genes for the 2 populations?

      Indeed, we and others (Kalluri et al., 2019) have noticed two distinct populations in vivo and also in cultured ECs, as pointed out by the reviewer. Evaluating whether these two subpopulations reflect two transcriptionally distinct groups or different states of cyclic expression patterns requires more thorough analysis and lineage tracing studies, and is distinct from the focus of this manuscript. Nonetheless, we have made a point in the revised manuscript to highlight these possibilities.

      Reference: Kalluri, AS, Vellarikkal, SK, Edelman, ER, Nguyen, L, Subramanian, A, Ellinor PT, Regev, A, Kathiresan, S, Gupta, RM. Single Cell Analysis of the Normal Mouse Aorta Reveals Functionally Distinct Endothelial Cell Populations. Circulation, 2019. 140:147-163.

      1. The modest shifts in gene expression with shear stress and co-culture could be attributed to the batch effect. The authors describe 1 batch correction method (ComBat) in the bulk RNA-seq, but no mention of batch correction was noted in the scRNA-seq methods. The authors should ensure that batch effect correction in all data is adequate, and these results should be added to the manuscript.

      We thank the reviewer for this comment. Indeed, batch effects are a particularly important consideration when samples are prepared separately and/or sequenced at distinct times. Note that this was not the case in this study.

      For the scRNA-seq analysis, we removed the low-quality cells, but did not use batch-effect correction methods because the samples were prepared and run at the same time. That is, isolation was performed in parallel, generation of cDNA libraries was done concurrently, and sequencing was run in the same gel. The quality of the data (and lack of batch effect) was subsequently verified when the two mono-culture biological replicates were evaluated by Seurat and were found to overlap on the UMAP (Figure 4); the same applies to the two co-culture biological replicates. These results clearly indicate that there is no batch effect among these samples, as they were not processed in distinct batches.

      1. Table 1 shows ATAC-seq was done, however, no data from these experiments are provided in the manuscript.

      As mentioned (reviewer 2), we had performed ATAC-seq but decided to remove it from the manuscript for several reasons, and we apologize for leaving the reference to it in Table 1. We have now corrected this error.

      1. Shear stress was achieved with an orbital shaker, which the accompanying citation states introduces significant heterogeneity in the ECs. This is based on the location of the culture dish. Was this heterogeneity seen in the scRNA-seq data?

      Correct. We only use the 2/3 peripheral area of the plates and discard the central aspect of the plate. We have added clarifying language to the Methods > Shear stress application to reflect this: “Orbital shear stress (130 rpm) was applied to confluent cell cultures by using an orbital shaker positioned inside the incubator as previously discussed (32). The shear stress within the cell culture well corresponds to arterial magnitudes (11.5 dynes/cm2) of shear stress. To reduce issues associated with uniformity of shear stress, the endothelial cell monolayers in 6-well plates were lysed after removing center region using cell scraper (BD Falcon #35-3085) and washing with 1X HBSS (Corning #21-022-CV). The 1.8cm blade was circumferentially used in the center of the 6-well plate to remove the center of the monolayer that did not see the higher shear stress.”

      1. It would be important to know whether the authors reproduce the findings from other papers that CD34 expression is reduced in cultured HUVECs:

      Muller AM, Cronen C, Muller KM, Kirkpatrick CJ: Comparative analysis of the reactivity of human umbilical vein endothelial cells in organ and monolayer culture. Pathobiology 1999;67:99-107. Delia D, Lampugnani MG, Resnati M, Dejana E, Aiello A, Fontanella E, Soligo D, Pierotti MA, Greaves MF: Cd34 expression is regulated reciprocally with adhesion molecules in vascular endothelial cells in vitro. Blood 1993;81:1001-1008.

      Thank you for this suggestion. Supplemental Excel 4 allows the reader to review single genes that are modulated by condition and in fact, consistent with all previous literature, CD34 expression is one of the most significantly decreased genes in cultured HUVECs (0.9, p=1E-5).

    1. Author Response

      Reviewer #1 (Public Review):

      1) I was confused about the nature of the short-term plasticity mechanism being modeled. In the Introduction, the contrast drawn is between synaptic rewiring and various plasticity mechanisms at existing synapses, including long-term potentiation/depression, and shorter-term facilitation and depression. And the synaptic modulation mechanism introduced is modeled on STDP (which is a natural fit for an associative/Hebbian rule, especially given that short-term plasticity mechanisms are more often non-Hebbian).

      Indeed, because of its associative nature, the modulation mechanism was envisioned to be STDP-like, i.e. on faster time scales than the complete rewiring of the network (via backpropagation) but slower time scales than things like STSP which, as the reviewer points out, are usually not considered associative. One thing we do want to highlight is that backpropagation and the modulation mechanism are certainly not independent of one another. During training, the network’s weights that are being adjusted by backpropagation are experiencing modulations, and said modulations certainly factor into the gradient calculation.

      We have edited the abstract and introduction to try to make the distinction of what we are trying to model clearer.

      1) cont: On the other hand, in the network models the weights being altered by backpropagation are changes in strength (since the network layers are all-to-all), corresponding more closely to LTP/LTD. And in general, standard supervised artificial neural network training more closely resembles LTP/LTD than changing which neurons are connected to which (and even if there is rewiring, these networks primarily rely on persistent weight changes at existing synapses).

      Although we did not highlight this particular biological mechanism because we wanted to keep the updates as general as possible, one could view the two mechanisms through the lens of early- versus late-phase LTP. We have added to the discussion section an account of how the associative modulation mechanism and backpropagation might map biologically onto this distinction.

      1) cont: Moreover, given the timescales of typical systems neuroscience tasks with input coming in on the 100s of ms timescale, the need for multiple repetitions to induce long-term plasticity, and the transient nature/short decay times of the synaptic modulations in the SM matrix, the SM matrix seems to be changing on a timescale faster than LTP/LTD and closer to STP mechanisms like facilitation/depression. So it was not clear to me what mechanism this was supposed to correspond to.

      We note that although the structure of the tasks certainly resembles known neuroscience experiments that happen on shorter time scales (and with the introduction of the 19 new NeuroGym tasks, even more so), we did not have a particular time scale for task effects in mind. So each piece of “evidence” in the integration tasks may indeed occur over significantly slower time scales and could abstractly represent multiple repetitions in order to induce (say) early phase LTP.

      Given that the separation between the two plasticity mechanisms may be clearer for STSP, and indeed many of the tasks we investigate may more naturally be mapped to tasks that occur on time scales more relevant to STSP, we have introduced a second modulation rule that is only dependent upon the presynaptic firing rates. See our response to the Essential Revisions above for additional details on these new results.

      2) A number of studies have explored using short-term plasticity mechanisms to store information over time and have found that these mechanisms are useful for general information integration over time. While many of these are briefly cited, I think they need to be further discussed and the current work situated in the context of these prior studies. In particular, it was not clear to me when and how the authors' assumptions differed from those in previous studies, which specific conclusions were novel to this study, and which conclusions are true for this specific mechanism as opposed to being generally true when using STP mechanisms for integration tasks.

      We have added additional works to the related works sections and expanded the introduction to try to better convey the differences between our work and previous studies. Briefly, our assumptions differed from those of previous studies mainly in that we considered a network that relied only on synaptic modulations to do computations, rather than a network with both recurrence and synaptic modulations. This allowed us to isolate the computational power and behavior of computing using synaptic modulations alone.

      It is hard to say which of the conclusions are generally true when using STP mechanisms for integration tasks without a comprehensive comparison of the various models of STP on the same tasks we investigated here. That being said, we believe we have presented in this work conclusions that are not present in other works (as far as we are aware) including: (1) a demonstration of the strength of computing with synaptic connections on a large variety of sequential tasks, (2) an investigation into the dynamics of such computations and how they might manifest in neuronal recordings, and (3) a brief look at how these different dynamics might be computationally beneficial in neuroscience-relevant areas. We also note that one reason for the simplicity of our mechanism is that we believe it captures many effects of synaptic modulations (e.g. gradual increase/decrease of synaptic strength that eventually saturates) with a relatively simple expression, and so we believe other STP mechanisms would yield qualitatively similar results. We have edited the text to try to clarify when conclusions are novel to this study and when we are referencing results from other works.

      Reviewer #2 (Public Review):

      On the other hand, the general principle appears (perhaps naively) very general: any stimulus-dependent, sufficiently long-lived change in neuronal/synaptic properties is a potential memory buffer. For instance, one might wonder whether some non-associative form of synaptic plasticity (unlike the Hebbian-like form studied in the paper), such as short-term synaptic plasticity which depends only on the pre-synaptic activity (and is better motivated experimentally), would be equally effective. Or, for that matter, one might wonder whether just neuronal adaptation, in the hidden layer, for instance, would be sufficient. In this sense, a weakness of this work is that there is little attempt at understanding when and how the proposed mechanism fails.

      We have tried to address whether the simplicity of the tasks considered in this work may be a reason for the MPN’s success by training it on 19 additional neuroscience tasks (see response to Essential Revisions above). Across all these additional tasks, we found the MPN performs comparably to its RNN counterparts.

      To address whether associativity is necessary in our setup we have introduced a version of the MPN that has modulation updates that are only presynaptic dependent. We call this the “MPNpre” and have added several results across the paper addressing its computational abilities (again, additional details are provided above in Essential Revisions). We find the MPNpre has dynamics that are qualitatively the same as its MPN counterpart and has very comparable computational capabilities.
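      The difference between the two kinds of update can be illustrated with a minimal sketch. It assumes a decaying modulation matrix M with rows indexed by postsynaptic units and columns by presynaptic units, an associative update proportional to the product of post- and presynaptic activity, and a presynaptic-only update that simply drops the postsynaptic factor. The function name, parameter values, and toy activities are hypothetical and are not taken from the manuscript.

```python
def modulation_update(M, pre, post, lam=0.9, eta=0.1, presynaptic_only=False):
    """One step of a decaying modulation update on M (rows: post, cols: pre).

    Assumed form (illustrative only): M <- lam * M + eta * gain_i * pre_j,
    where gain_i is post[i] for the associative rule and 1 otherwise.
    """
    new_M = []
    for i, row in enumerate(M):
        # The associative rule scales by postsynaptic activity; the
        # presynaptic-only rule ignores it.
        gain = 1.0 if presynaptic_only else post[i]
        new_M.append([lam * m_ij + eta * gain * pre[j] for j, m_ij in enumerate(row)])
    return new_M


pre = [1.0, 0.0, 0.5]   # toy presynaptic activity
post = [2.0, 0.0]       # toy postsynaptic activity
M0 = [[0.0] * len(pre) for _ in post]

M_assoc = modulation_update(M0, pre, post)
M_pre = modulation_update(M0, pre, post, presynaptic_only=True)

print("associative:      ", M_assoc)
print("presynaptic-only: ", M_pre)
```

      In this toy case the associative update leaves the row of the silent postsynaptic unit at zero, whereas the presynaptic-only update writes the same values into every row.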

      Certainly, some of the tasks we consider may also be solvable by introducing other forms of computation such as neuronal adaptation. Indeed, we believe the ability of the brain to solve tasks in so many different ways is one of the things that makes it so difficult to study. Our work here has attempted to highlight one particular way of doing computations (via synapse dynamics) and compared it to one particular other form (recurrent connections). Extending this work to even more forms of computation, including neuronal dynamics, would be very interesting and further help distinguish these different computational methods from one another.

      Reviewer #3 (Public Review):

      Because the MPN is essentially a low-pass filter of the activity, and the activity is the input - it seems that integration is almost automatically satisfied by the dynamics. Are these networks able to perform non-integration tasks? Decision-making (which involves saddle points), for instance, is often studied with RNNs.

      We have tested the MPN on 19 additional supervised learning tasks found in the NeuroGym package (Molano-Mazon et al., 2022), which consists of several decision-making-based tasks, and added these results to the main text (see response to Essential Revisions above, and also Figs. 7i & 7j). Across all tasks we investigated, we found the MPN performs at comparable levels to its RNN counterparts.

      Manuel Molano-Mazon, Joao Barbosa, Jordi Pastor-Ciurana, Marta Fradera, Ru-Yuan Zhang, Jeremy Forest, Jorge del Pozo Lerida, Li Ji-An, Christopher J Cueva, Jaime de la Rocha, et al. “NeuroGym: An open resource for developing and sharing neuroscience tasks”. (2022).

      The current work has some resemblance to reservoir computing models. Because the M matrix decays to zero eventually, this is reminiscent of the fading memory property of reservoir models. Specifically, the dynamic variables encode a decaying memory of the input, and - given large enough networks - almost any function of the input can be simply read out. Within this context, there were works that studied how introducing different time scales changes performance (e.g., Schrauwen et al 2007).

      Thank you for pointing out this resemblance and work. In our setup, the fact that lambda is the same for the entire network means all elements of M decrease uniformly (though the learned modulation updates may allow for the growth of M to be non-uniform). One modification that we think would be very interesting to explore is the effect on the dynamics of non-uniform learning rates or decays across synapses. In this setting, the M matrix could have significantly different time scales and may even further resemble reservoir computing setups. We have added a sentence to the discussion section discussing this possibility.

      Another point is the interaction of the proposed plasticity rule with hidden-unit dynamics. What will happen for RNNs with these plasticity rules? I see why introducing short-term plasticity in a "clean" setting can help understand it, but it would be nice to see that nothing breaks when moving to a complete setting. Here, too, there are existing works that tackle this issue (e.g., Orhan & Ma, Ballintyn et al, Rodriguez et al).

      Thank you for pointing out these additional works, they are indeed very relevant and we have added them all to the text where relevant.

      Here we believe we have shown that either recurrent connections or synaptic dynamics alone can be used to solve a wide variety of neuroscience tasks. We don’t believe a hybrid setting with both synaptic dynamics and recurrence (e.g. a Vanilla RNN with synaptic dynamics) would “break” any part of this setup. Since the network could learn to suppress either of the computational mechanisms, it could simply solve the task by relying on only one of the two. For example, it could use a strictly non-synaptic solution by driving eta (the learning rate of the modulations) to zero or it could use a non-recurrent solution by driving the influence of recurrent connections to be very small. Orhan & Ma mention they have a hard time training a Vanilla RNN with Hebbian modulations on the recurrent weights for any modulation effect that goes back more than one time step, but unlike our work they rely on a fixed modulation strength.

      Indeed, we think how networks with multiple computational mechanisms will solve tasks is a very interesting question to be further investigated, and a hybrid solution may be likely. We believe our work is valuable in that it illuminates one end of the spectrum that is relatively unexplored: how such tasks could be solved using just synaptic dynamics. However, what type of solution a complete setup ultimately lands on is likely largely dependent upon both the initialization and the training procedure, so we felt exploring the dynamics of such networks was outside the scope of this work.

      One point regarding biological plausibility - although the model is abstract, the fact that the MPN increases without bounds is hard to reconcile with physical processes.

      Note that although the MPN expression does not have explicit bounds, in practice the exponential decay eventually does balance with the SM matrix updates, and so we observe a saturation in its size (Fig. 4c, except for the case of lambda=1.0, which is not considered elsewhere in the text). However, we explicitly added modulation bounds to the M matrix update expression and did not find that it significantly changed the results (see comments on Essential Revisions above for details).
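      The balance between decay and updates described above can be sketched for a single scalar modulation. This assumes an update of the form m <- lambda * m + eta * drive with a constant drive, which is a simplification for illustration and not the full expression used in the manuscript; all names and parameter values are hypothetical.

```python
def run_modulation(lam, eta=0.1, drive=1.0, steps=200):
    """Iterate m <- lam * m + eta * drive from m = 0 (assumed scalar form)."""
    m = 0.0
    for _ in range(steps):
        m = lam * m + eta * drive
    return m


# With lam < 1 the decay balances the updates and m approaches the
# fixed point eta * drive / (1 - lam).
saturating = run_modulation(lam=0.9)
fixed_point = 0.1 * 1.0 / (1.0 - 0.9)

# With lam = 1 there is no decay, so m grows linearly with the number of steps.
unbounded = run_modulation(lam=1.0)

print(saturating, fixed_point, unbounded)
```

      Under these assumptions the modulation saturates for any decay below one and only grows without bound in the special case of no decay.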

    1. Author Response

      Reviewer #2 (Public Review):

      Here I will mainly comment on the biology of adipocytes, which is my specialty.

      In this manuscript, it has been very convincingly shown that O-GlcNAc acts as an important regulator of MSC differentiation in mice, and given previous studies in which O-GlcNAc is regulated by aging and nutritional status, it makes sense that this PTM determines differentiation and BM niche.

      The point that O-GlcNAc regulates adipocyte differentiation is convincing, but there are already previous studies using 3T3-L1 (e.g., Biochemical and Biophysical Research Communications 417 (2012) 1158-1163), and a more step-by-step demonstration of the molecular mechanism would make this an excellent paper that can be extended to adipocyte research in general, not just BM.

      While O-GlcNAc has been shown to regulate many aspects of metabolic physiology, our understanding of its role in adipogenesis has been limited so far. As the reviewer pointed out, there was an in vitro report on its inhibition of adipogenesis in 3T3-L1 cells (Ji et al., 2012). Two recent publications from Dr. Xiaoyong Yang’s group revealed the profound role of OGT in mature white adipocytes in regulating lipolysis and obesity (Li et al., 2018; Yang et al., 2020). To my knowledge, our manuscript is the first attempt to address the regulation of adipogenesis by O-GlcNAc in vivo. While using the BMSCs as a non-conventional model, we speculate that our molecular mechanisms (i.e., O-GlcNAc inhibition of C/EBPβ) could be conserved in peripheral adipose organs, including white and brown adipose tissues. Future experiments are warranted in the lab to extend the current knowledge to these adipocyte progenitors. Nonetheless, I would also like to point out that, due to the broad actions of OGT and the current lack of adipocyte progenitor-specific Cre animal tools, such efforts might be futile as results can be confounded by defects in other organs/cells.

      It is somewhat unclear whether or not the authors' in vitro experiments using 10T1/2 cells accurately reflect what is happening in vivo in knockout mice. The PDGFRa+VCAM1+ population of adipocyte progenitors shown by the authors is upregulated by about 30% by knockout of Ogt (Figure 4C). How significant is this difference? Rather, might the expression of Pparg, which indicates lineage commitment, be the underlying mechanism? In any case, this manuscript is highly impactful in the sense that the differentiation of adipocytes forming the BM niche can be controlled using tissue-specific knockouts of the Ogt gene.

      We agree with the reviewer that the role of OGT in BMSC fate determination and adipogenesis might be multifaceted. The 30% increase in PDGFRa+VCAM1+ BM adipose progenitors cannot fully explain the massive adipogenesis observed in OgtΔOsx animals (Fig. 4A). Indeed, we provided in vitro evidence that genetic deletion or chemical inhibition of OGT activates adipogenesis (Fig. 4D-I). Mechanistically, we found that the O-GlcNAcylation of C/EBPβ protein (but not PPARγ) is responsible for the inhibition, which leads to reduced expression of adipogenic genes, including Pparg (Fig. 4H).

    1. Author Response

      Reviewer #1 (Public Review):

      The paper states that they observed a combined total of 77,017 single-nucleotide variants (SNVs) and 12,031 insertion/deletions (In/Dels) across all tissue, age, and intervention groups. Collectively, these data represent the largest collection of somatic mtDNA mutations obtained in a single study to date. However, a study with more somatic mtDNA mutations by the LostArc method (PMID 32943091) revealed 35 million deletions (~ 470,000 unique spans) in skeletal muscle from 22 individuals with and 19 individuals without pathogenic variants in POLG. Thus, the authors should reword this part to say that this study represents the largest collection of mouse mtDNA point mutations detected, but not the largest number of mutations (deletions exceed this number).

      Thank you for pointing this out. When we wrote that sentence, we were referring mainly to small polymerase-based errors, as opposed to larger structural variants that likely arise from a different mechanism. However, the distinction between these two event classes is poorly defined. We have amended our statement and have added a citation to Lujan et al. Our statement now reads “We observed a combined total of 77,017 single-nucleotide variants (SNVs) and 12,031 small insertion/deletions (In/Dels) (≲15bp in size) across all tissue, age, and intervention groups. Collectively, these data represent the largest collection of somatic mtDNA point mutations obtained in a single study to date and is second only to Lujan et al. in terms overall In/Del counts (Lujan et al., 2012).” (Lines 252-256)

      What is the theoretical limit of pt mutations in the mitochondrial genome, assuming only one pt mutation per genome? Doesn't 77000 detected independent pt mutations approach that limit? Can the authors estimate how many molecules contained two or more pt mutations? Did the analysis reveal any un-mutated regions implying an essential function? For example, on p.9 can the authors provide an explanation of why OriL and other G/C-rich regions were not uniformly covered as compared to the rest of the genome?

      This is an interesting question and one we’ve given some thought to. In fact, this basic question was the inspiration for our recent Nucleic Acids Research paper (PMC8565317) where we asked how mutations were distributed in the genome. The short answer is that we likely exceed the limit for only dG site mutations (and only for G>A mutations, at that), but not the other reference sites. The reason is that there are only 2013 dG sites and the mutation spectrum is heavily skewed toward G>X (there are 47,680 dG site mutations, 42,924 of which are G>A). In comparison, we observe only 4,421 A>X, 9,277 T>X, and 15,632 C>X mutations, but with 5,629, 4,681, and 3,976 dA, dT, and dC genomic sites, respectively. Assuming the mutations are uniformly distributed along the genome (which they are not; see our NAR paper), then random binomial sampling would require a fair amount more mutations in order to reach saturation for the other genomic sites. The uneven distribution increases this number further.
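      The counts quoted above can be turned into a rough per-site comparison. The sketch below uses only the site and mutation counts given in this response and assumes, purely for illustration, that mutations land uniformly at random among sites of the same reference base (which, as noted, they do not). It ignores which alternative base is produced, so it describes saturation of sites rather than of every possible substitution. All variable and function names are hypothetical.

```python
# Reference-site counts and observed mutation counts quoted in the response.
sites = {"G": 2013, "A": 5629, "T": 4681, "C": 3976}
mutations = {"G": 47680, "A": 4421, "T": 9277, "C": 15632}


def expected_fraction_hit(n_sites, n_mutations):
    """Expected fraction of sites hit at least once under uniform placement."""
    return 1.0 - (1.0 - 1.0 / n_sites) ** n_mutations


summary = {
    base: {
        "mutations_per_site": mutations[base] / sites[base],
        "expected_fraction_hit": expected_fraction_hit(sites[base], mutations[base]),
    }
    for base in sites
}

for base, row in summary.items():
    print(base, round(row["mutations_per_site"], 2), round(row["expected_fraction_hit"], 3))
```

      Under this simplified model the dG sites carry more than twenty mutations per site on average, while roughly half of dA sites would be expected to have no mutation at all, consistent with saturation being approached only at dG sites.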

      With regard to the second question, we can’t actually do this estimation with this data set. The reason is that the ~77,000 mutations aren’t found in a single sample, but are distributed across many independent or semi-independent (i.e. different organs within a mouse) samples, which means that most, if not all, of the mutations are necessarily on different mtDNA molecules.

      With regard to the OriL and G/C rich regions, these presumably have some sort of secondary structure that prevents the sequencer from obtaining any useful information. However, this is speculative, and we do not know the actual cause. Interestingly, human mtDNA doesn’t show this dip at the OriL, despite a similar function and location in the mtDNA.

      Given that mitochondrial disease usually doesn't present until >60% of the genomes are affected, the very low level of detected pt mutations observed in the mouse (and presumably similar to human) would mean that they are well below a physiological level. Thus, these low-level pt mutations are well tolerated. Can the authors estimate a theoretical age of the mouse (well beyond their life span) where over 50% of the genomes carry at least one pt mutation?

      The reviewer brings up a frequently noted point in mitochondrial biology that is very much worth addressing in this manuscript. The often-cited statistic that mitochondrial disease doesn’t present until ~60% of genomes are affected is, while true, only pertinent to overt mitochondrial diseases, such as LHON, MERRF, etc, where all or nearly all cells in an individual are affected by the mutation. However, the impact of mtDNA mutations is not only contingent on how many cells have the mutation, but also the fraction of mtDNA molecules within a cell that harbor the variant. Because the deleterious effects of a mtDNA mutation act at the level of individual cells, it is important to know both how many cells harbor a mutation as well as what the heteroplasmic level is within the cell before making claims on their pathological impact.

      To date, nearly all studies on mtDNA mutations rely on bulk DNA analysis from thousands to millions of cells, which necessarily decouples variant phasing information between any two reads, resulting in a loss of important biological information such as the heteroplasmic level within any given cell. As such, with bulk sequencing it is impossible to tell the difference between a homoplasmic mutation in a small subset of cells and heteroplasmic mutation in all cells. In the first case, the cells harboring this mutation would be negatively impacted, whereas in the second example, it is unlikely. One can imagine a scenario where every cell contains a different homoplasmic pathogenic mutation which would negatively affect cellular function for every cell. In this case, mutations would be highly prevalent (100% of cells), yet individually rare. However, bulk sequencing would give the appearance that no mutation comes close to exceeding the phenotypic threshold. We highlight this issue in a recent review (Sanchez-Contreras and Kennedy, 2022; PMC8896747).
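      The ambiguity described above can be made concrete with a small numerical sketch. It assumes a hypothetical tissue of 100 cells with 1,000 mtDNA copies each and compares a variant that is homoplasmic in one cell with a variant present at 1% heteroplasmy in every cell. The numbers and function names are illustrative only and do not come from our data.

```python
def bulk_vaf(cells):
    """Pooled variant allele fraction for cells given as (mutant, total) copies."""
    mutant = sum(m for m, _ in cells)
    total = sum(t for _, t in cells)
    return mutant / total


def max_heteroplasmy(cells):
    """Highest within-cell fraction of mutant mtDNA copies."""
    return max(m / t for m, t in cells)


COPIES = 1000  # hypothetical mtDNA copies per cell

# Scenario A: 1 of 100 cells is homoplasmic for the variant; the rest are wild type.
clonal = [(COPIES, COPIES)] + [(0, COPIES)] * 99

# Scenario B: all 100 cells carry the variant at 1% heteroplasmy.
diffuse = [(COPIES // 100, COPIES)] * 100

print(bulk_vaf(clonal), bulk_vaf(diffuse))
print(max_heteroplasmy(clonal), max_heteroplasmy(diffuse))
```

      Both scenarios give the same pooled frequency even though only the first contains a cell above any plausible phenotypic threshold, which is the information that bulk sequencing discards.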

      The point that the reviewer brings up is extremely important, so we have added a section in the discussion related to heteroplasmy versus clones.

      Also, the problem with this low level of pt mutations is that they are not physiological; the effect of the drug treatment causing a reduction in ROS-mediated transversions would not be expected to have a detectable effect on mitochondria. The improvement in mitochondrial function seen by others is most likely independent of the mutations in the genome. There needs to be a cause and effect here and I don't see one.

      It is important to note that we do not make the claim (nor do we want to imply) that the reduction of mutations is the reason behind the improvements in mitochondrial function by these interventions. Instead, we believe that loss of ROS-linked mutations is a consequence of the mechanism by which these interventions work. We do hypothesize that the reduction in ROS-linked mutations suggests that “there is tissue specificity in how cells repair and/or destroy oxidatively damaged mitochondria and/or mtDNA resulting in a steady-state of ROS-linked mutations.” (Lines 551-553) and that “We propose that rather than the incidence and impact of ROS damage on mtDNA being minimal, recognition and removal of ROS-linked mutations are maintained at a steady state during aging.” (Lines 572-574).

      In addition, as noted above, how “low level” these mutations are and their impact on cellular function is not easily determined in bulk sequencing studies, so a strong link between cause and effect is not an answerable relationship with this data set.

      There's no mention in this paper and methodology about how point mutations in nuclear-encoded mtDNA (NUMTs) are excluded from the reads and I'm worried that these errors are being read as rare errors in the mtDNA genome. While NUMTs have been documented for decades, a recent report in Science (PMID: 36198798) documents how frequently and fluidly NUMTs occur. Can the authors provide a clear explanation of how mutations in NUMTs are excluded?

      The reviewer is absolutely correct to call attention to this important aspect of mitochondrial biology. We don’t believe NUMTs are an important confounder in our data set for several reasons.

      1) We used isogenic inbred C57BL/6J mice which, frequently, were litter mates (siblings). Therefore, any mutations from NUMTs that are there would be expected to be uniform across samples, especially between tissues from a single animal. Unknown NUMTs and variation among them would certainly be a potentially strong confounder in an outbred population, but the use of one isogenic inbred line for this study likely eliminates this confounder.

      2) We used the mm10 reference genome, which is based on the C57BL/6J strain, so any NUMT-derived variants present in our mtDNA data should preferentially align against the NUMT. Therefore, we perform a BLAST step of all reads containing at least one variant against mm10. BLAST is much more sensitive to sequence variation than bwa but is far slower, so it is impractical to run as the initial aligner. We then reassign each read to whichever genomic location has the lower E-value. The result is that typically around a dozen reads are removed, demonstrating that NUMTs are not likely a major source of false mutations.

      3) Because NUMTs are inherited, any variants would be found across all the tissues and animals we used in this study. As part of our processing, we mark and remove variants shared between multiple individual samples.
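      The logic of filtering steps 2 and 3 above can be sketched as follows. This is a minimal illustration on already-parsed alignment hits, not our actual pipeline: it does not call BLAST, and the data structures, read names, locations, variant labels, and function names are all hypothetical.

```python
from collections import defaultdict


def reassign_reads(hits):
    """Map each read to the candidate location with the lowest E-value.

    hits: {read_id: [(location, e_value), ...]} (hypothetical parsed output).
    """
    return {read: min(cands, key=lambda c: c[1])[0] for read, cands in hits.items()}


def drop_shared_variants(variants_by_sample):
    """Keep only variants seen in a single sample; shared ones are removed."""
    seen = defaultdict(set)
    for sample, variants in variants_by_sample.items():
        for variant in variants:
            seen[variant].add(sample)
    return {
        sample: {v for v in variants if len(seen[v]) == 1}
        for sample, variants in variants_by_sample.items()
    }


# Toy hits: read2 matches a nuclear NUMT better than the mitochondrial genome.
hits = {
    "read1": [("chrM", 1e-50), ("chr1_NUMT", 1e-40)],
    "read2": [("chrM", 1e-35), ("chr1_NUMT", 1e-48)],
}
assignment = reassign_reads(hits)
mito_reads = [read for read, loc in assignment.items() if loc == "chrM"]

# Toy variant calls: one variant is shared between two samples.
variants = {
    "mouse1_liver": {"m.100G>A", "m.2500C>T"},
    "mouse2_liver": {"m.2500C>T", "m.7000T>C"},
}
private = drop_shared_variants(variants)

print(mito_reads)
print(private)
```

      In this toy example the read that aligns better to the nuclear location is excluded from the mitochondrial set, and the variant shared between the two samples is dropped from both.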

      We have made edits to the Methods section (Lines 198-206) to more explicitly highlight the filtering steps and the logic behind them. In addition, we have added a paragraph in the discussion that addresses NUMTs (Starting on line 642).

      Reviewer #2 (Public Review):

      A common problem in mutation analysis is that DNA damage (present in one strand) is difficult to separate from real mutations (present in both strands). One of the approaches to solve this problem based on independent tagging of the two strands by different unique molecular identifiers was developed by the authors about 10 years ago. This study summarizes the application of this method to a wide range of mouse tissues, ages, and drug treatment regimes. Much of the results confirm previous conclusions from this laboratory. This involves overall mutational levels of somatic mtDNA mutations (~10⁻⁶-10⁻⁵), their accumulation with age, the prevalence of GA/CT transitions, and their clonality. Although these results were not new, it is important that these were confirmed in a single study with high confidence in a huge number of independent mutations.

      We thank the reviewer for the comment and really hope this data set will be of significant use to other researchers given its breadth of sample types and large number of mutations.

      What really sets this study apart from other studies is the detection of a large proportion of transversion mutations, primarily of the C>A/G>T and C>G/G>C types. Transversions are traditionally considered 'persona non grata' in mtDNA mutational spectra and are typically associated with errors of mutational analysis (which they in fact are). The presence of these mutations in both strands of the duplex makes a good case that these mutations are real, rather than converted damage. However, because this is such a novel discovery and because regular controls do not work (I mean, for example, that these mutations never clonally expand. If there is a clonal expansion, then the mutation is real, only real mutation can expand. But in the case of non-expandable C>A/G>T and C>G/G>C this control does not help to validate these mutations), it would be nice to provide extra assurances that this is not some kind of artifact that somehow slipped through the ds sequencing procedure. I would recommend including in the supplement the data on the abundance of single-stranded base changes as detected by ds sequencing (i.e., changes confirmed in one and not in the other strand of a given molecule). An unusually high presence of such single-stranded changes of the C>A/G>T and C>G/G>C type would be a red flag for me. If ratios of single and double-stranded mutations were similar for transitions and transversions - that would reassure me and hopefully the reader.

      Furthermore, a similar excess of C>A/G>T and C>G/G>C has been observed in a recent paper by Abascal 2021 (cited in the manuscript). In that paper, a UMI-free, but otherwise very similar ds sequencing approach in nuclear DNA (BotSeqS) was demonstrated to suffer from an artifact causing (among other effects) an excess of C>A/G>T and C>G/G>C transversions. This artifact is related to end repair and nick-translation of DNA fragments during library preparation. Because BotSeqS is very similar to ds sequencing, we expect that the same artifact may be taking place in the study under review. We recommend running checks similar to those undertaken by Abascal et al, which include, at the very minimum, checking the distribution of the C>A/G>T and C>G/G>C transversions within the reads (artifacts tend to be concentrated towards the ends of the reads).

      The reviewer is absolutely correct to bring up this extremely important point. We have addressed these concerns in two ways, described on Lines 332-361. 1) We performed an analysis of the single-stranded consensus data, which is a measure of PCR artifacts that frequently arise as a function of DNA damage, across all the tissues of the aged cohort. We noted no differences between tissues, which indicates that the amount of ROS-induced PCR artifacts is no different between the tissues. Thus, for artifacts to explain our results, the rate at which ROS artifacts lead to false “Duplex consensus” variants would itself have to be tissue specific. The analysis is presented in Figure 3-figure supplement 2. 2) We have included an experiment in which we show that post-fragmented DNA treated with FPG, a glycosylase that targets Fapy-dG and 8-oxo-dG, does not differ from untreated control DNA. Because Duplex-Seq requires that both strands of a parent DNA molecule be present to form a final Duplex Consensus Sequence, the scission of one strand by the lyase activity of FPG would prevent the formation of this final consensus and prevent this sort of error from “bleeding through”. This analysis can now be found in Figure 3-figure supplement 3.

      Of note, even if transversions detected in this study prove to be artifacts of the Abascal type (likely) they still may reflect real ss damage in mtDNA (not instrumental artifacts, like sequencing errors or in vitro DNA damage). This is supported by the strong variation in the levels of transversions across tissues and as a result of the ameliorating drug intervention. Artifacts, in contrast, would be expected to be at a constant level. This logic, however, does not differentiate between real ds mutations and ss damage. So UMI-based ds sequencing evidence remains the only (though very strong) independent proof. So, in my view, whereas the jury may be still out on whether the observed transversions are true ds mutations or some kind of single-stranded damage, this is a critically important observation. Evidence of ss damage that varies greatly between tissues and is detected with such precision at the single-molecule level is a very important finding as well.

      Out of caution, I would recommend mentioning the above-stated uncertainty and noting that more research is needed to fully confirm that C>A/G>T and C>G/G>C changes detected in this study are indeed double-stranded mutations.

      We agree. Together with comments from Reviewer #1 regarding NUMTs (Comment #5), we have added a paragraph in the Discussion about potential alternative explanations for our observations.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, May et al use H2B overexpression driven by Keratin14 Cre-mediated excision of a loxPstop cassette to quantify bulk chromatin dynamics in the live epidermis. They observe heterogeneity of H2B distribution within the basal stem cell layer and a change in distribution when the stem cells delaminate into the suprabasal layers. They further show that these chromatin rearrangements precede cell fate commitment, as detected by adding another Cre-mediated transgene on top (tetO-Cre mediated Keratin10 reporter). Finally, they generate an MST stem-loop transgene for the keratin 10 transcript and observe transcriptional bursting.

      We would like to clarify for the reviewer that the H2B system used is a transgenic allele of histone-2B-GFP that is driven directly by the Keratin-14 promoter (Kanda et al., 1998; Tumbar et al., 2004). This system does not rely on any Cre-mediated excision of the LoxP-stop cassette, and these mice do not carry Cre alleles. We will touch on this point below when addressing the comment on Cre expression in cells and the raised question on whether it influences the quantifications of chromatin compaction.

      The manuscript uses elegant in vivo imaging approaches to describe a set of observations that are logically based on a panel of studies that have used genetic approaches to dissect the role of heterochromatin and histone/DNA modifications in epidermal state transitions. In addition, the MST stem-loop analysis is a nice technical advance, confirming transcriptional bursting as a general phenomenon of how transcription is regulated in cells (see work from Daniel Larsson, Jonathan Chubb, Arjun Raj, and others).

      We thank the reviewer for their recognition of our contribution to the transcription field. To deepen the connection between our data and previous characterizations of transcriptional dynamics in other systems, we have added new analyses of K10MS2 transcriptional bursting on a finer temporal scale (Fig 5G-K). We find pervasive “transcriptional bursting,” consistent with findings in vitro and in other model organisms, and a surprising variation of burst durations. We believe these additional analyses significantly strengthen our conclusions and the relevance of our study to the overall transcription field.

      The value of the study in my view is recapitulating these known phenomena in a live tissue setting with high-quality imaging and careful quantification. Overall, the analyses appear thorough, although the overall changes appear relatively minor, which is perhaps to be expected from imaging bulk H2B distribution as a proxy for chromatin states.

      There is one major technical concern that might impact the interpretation of the data. The authors combine Cre lines for their key conclusions (Krt10 reporter and SRF KO) and analyze single cells that thus express very high levels of Cre. Knowing that Cre will target non-loxP sites and is genotoxic, it is possible that the effect of chromatin is due to high levels of Cre expression in single cells rather than specific effects due to cell state transitions. I would encourage the authors to carefully quantify the dose-dependent effects of the Cre protein (independent of the LoxP sites) on chromatin organization. Along these lines, is the phenotype of the SRF KO similar in the presence of two Cre alleles versus just one?

      Thank you for these kind words. This is an important potential caveat to consider. We believe that Cre activity does not significantly affect the chromatin compaction profiles for several reasons. First, we interrogated Cre activity. The quantifications in Figure 1A-E and Figure 2B-C are from mice containing the K14H2B-GFP allele alone and do not carry any Cre allele. When these data were compared to those from mice that had been treated with a high dose of tamoxifen to induce Cre-mediated recombination in the vast majority of cells, the chromatin compaction profiles were not significantly different (Supp Fig 3C). We have added this comparison to Supplemental Figure 3 and addressed this point in the text (page 9). To further determine whether Cre-mediated recombination affects our measurement of chromatin compaction, we also analyzed adjacent basal cells with and without Cre activity in the same animal. K14H2B-GFP; K14CreER; tdTomato mice were induced with a low dose of tamoxifen such that roughly 65% of epidermal cells underwent Cre recombination as demonstrated by expression of the tdTomato fluorescent reporter (Gallini et al., 2022). They also received a punch biopsy performed on the unimaged ear. Three days post injury and six days after Cre induction, the chromatin compaction profiles of cells positive and negative for Cre-mediated recombination were also not significantly different (Rebuttal Figure 1). Together, these direct comparisons between cells exposed to Cre activity and cells not exposed to Cre activity indicate that Cre activity at levels comparable to those used in our experiments has no measurable effect on our measurements of chromatin compaction.

      Rebuttal Figure 1: Effect of Cre expression on chromatin compaction profiles

      The second issue is the conclusion of "chromatin spinning". Concluding that chromatin is spinning would in my view require that the authors demonstrate that the nuclear envelope is not moving or is moving less than the chromatin. To support this conclusion the authors should do double imaging for example with LINC complex proteins, an ER/outer nuclear membrane marker, or equivalent.

      This is an excellent point. While we expect that the entire nucleus is spinning based on observations others have made in in vitro fibroblast systems, we describe our observation as “chromatin spinning” instead of “nuclear spinning” because the K14H2B-GFP allele only allows us to directly visualize chromatin itself (Kumar et al., 2014; Zhu et al., 2018).

      Unfortunately, LINC complex proteins and nuclear membrane proteins have not been fluorescently tagged in mice, which prevents us from visualizing their dynamics in vivo. Establishing these new tools and performing the experiments would take more than a year, which places this work beyond the scope of the current paper. Additionally, their relatively uniform distribution across the nuclear membrane would not allow us to visualize potential spinning of these components. We have made efforts towards the reviewer’s question by asking whether other compartments within the cell also spin in delaminating cells. To do this, we leveraged a mouse line developed by Claudio Franco’s lab (Barbacena et al., 2019), which fluorescently labels both the chromatin (H2B-GFP) and the Golgi (GTS-mCherry). As expected, this model showed a perinuclear and polarized Golgi in skin fibroblasts (Rebuttal Figure 2). However, this tool is incompatible with our questions in epidermal cells for two reasons. First, the system is toxic to epithelial cells in vivo, resulting in apoptosis, nuclear fragmentation, and binucleate cells. Second, the Golgi is not discretely polarized (or even perinuclear) in epithelial cells (Rebuttal Figure 2). As such, although we observe chromatin spinning in delaminating basal cells, we are uncertain as to whether the whole nucleus or any other cellular compartments are spinning in these cells.

      Rebuttal Figure 2: Interrogation of intracellular spinning

      Given the above reasoning and efforts, we have altered the text and specified that we only have the capacity to visualize chromatin through the H2B-GFP allele and that we hypothesize the entire nucleus is spinning (page 11).

      Reviewer #2 (Public Review):

      In this work entitled "Live imaging reveals chromatin compaction transitions and dynamic transcriptional bursting during stem cell differentiation in vivo" the authors use a combination of genetic and imaging tools to characterize dynamic changes in chromatin compaction of cells undergoing epidermal stem cell differentiation and to relate chromatin compaction to transcriptional regulation in vivo. They track this phenomenon by imaging the epithelium at the ear of live mice, thus in a physiological context. By following individual nuclei expressing H2B-GFP along time ranges of hours and up to 3 days, they develop a strategy to quantify the profile of chromatin compaction across different epidermal layers based on normalized intensity profiles of H2B-GFP. They observe that cells belonging to the basal stem cell layer display a considerable level of internuclear variability in chromatin compaction that is cell-cycle independent. Instead, intercellular variability in chromatin compaction appears more related to the differentiation status of the cells as it is stable in the hours range but dynamic in the days range. The authors show that differentiated nuclei in the spinous layer exhibit higher chromatin compaction. They also identified a subset of cells in the basal stem layer with an intermediate profile of chromatin compaction and with the dynamic expression of the early differentiation marker keratin 10. Lastly, they show that the expression of keratin-10 precedes the chromatin compaction establishing relevant temporal relationships in the process of epidermal differentiation.

      This work includes a number of challenging approaches and techniques since it is carried out in living mice. Also, it provides nice tools and methods to study chromatin structure in vivo during multiple days and within a differentiation physiological system. On the other hand, the results are descriptive and, in some respect, expected in line with previous observations.

      Thank you very much for this great summary, kind words, and the recommendations listed below. We will address each of them specifically. We have also deepened the analysis of transcriptional dynamics in ways that are more comparable with how other groups have studied transcription and included those results in Figure 5.

      References

      Kanda, T., Sullivan, K.F., and Wahl, G.M. (1998). Histone–GFP fusion protein enables sensitive analysis of chromosome dynamics in living mammalian cells. Current Biology 8, 377–385. 10.1016/S0960-9822(98)70156-3.

      Tumbar, T., Guasch, G., Greco, V., Blanpain, C., Lowry, W.E., Rendl, M., and Fuchs, E. (2004). Defining the epithelial stem cell niche in skin. Science 303, 359–363. 10.1126/science.1092436.

      Kumar, A., Maitra, A., Sumit, M., Ramaswamy, S., and Shivashankar, G.V. (2014). Actomyosin contractility rotates the cell nucleus. Sci Rep 4, 3781. 10.1038/srep03781.

      Zhu, R., Liu, C., and Gundersen, G.G. (2018). Nuclear positioning in migrating fibroblasts. Seminars in Cell & Developmental Biology 82, 41–50. 10.1016/j.semcdb.2017.11.006.

      Sara Gallini, Nur-Taz Rahman, Karl Annusver, David G. Gonzalez, Sangwon Yun, Catherine Matte-Martone, Tianchi Xin, Elizabeth Lathrop, Kathleen C. Suozzi, Maria Kasper, Valentina Greco. Injury suppresses Ras cell competitive advantage through enhanced wild-type cell proliferation. bioRxiv 2022.01.05.475078; doi: https://doi.org/10.1101/2022.01.05.475078

      Pedro Barbacena, Marie Ouarné, Jody J Haigh, Francisca F Vasconcelos, Anna Pezzarossa, Claudio A Franco. GNrep mouse: A reporter mouse for front-rear cell polarity. Genesis 2019 Jun. DOI: 10.1002/dvg.23299

      Cristiana M Pineda, Sangbum Park, Kailin R Mesa, Markus Wolfel, David G Gonzalez, Ann M Haberman, Panteleimon Rompolas, Valentina Greco. Intravital imaging of hair follicle regeneration in the mouse. Nature Protocols 2015 July. DOI: 10.1038/nprot.2015.070

    1. Author Response

      Reviewer #1 (Public Review):

      Reviewer 1 confirmed the view that your paper provides new insight into YTHDC1 function in regulating SC activation/proliferation but added that some of the data could be improved to fully support the conclusions. Specifically:

      The title "Nuclear m6A Reader YTHDC1 Promotes Muscle Stem Cell Activation/Proliferation by Regulating mRNA Splicing and Nuclear Export" seems a bit overstated. Their data are not sufficient to show YTHDC1 regulating nuclear export. From figure 6 we could see some mRNAs export was inhibited upon YTHDC1 loss but intron retention also occurs on these mRNAs, for example, Dnajc14. Since intron retention could lead to mRNA nuclear retention, the mRNA export inhibition may be caused by splicing deficiency. From the data they provided we could not draw the conclusion that YTHDC1 directly affects mRNA export. I think they could not emphasize this point in the title.

      Thanks for the suggestion. It is true that in our initial submission we had more data to support YTHDC1 regulation of mRNA splicing but not enough on nuclear export. A thorough dissection of both mechanisms would take a substantial amount of time and effort. Nevertheless, we argue that our data do provide evidence for YTHDC1 regulation of nuclear export. For example, in Figures 6 C, H, and M, only ~20% of the target mRNAs (such as Dnajc14) showed alteration in both splicing and export upon YTHDC1 loss, while the majority of the export targets showed no splicing deficiency. For example, Btbd7 and Tiparp in Figure 6 N showed no intron retention. In addition, we have now performed Co-IP experiments to validate the interaction between YTHDC1 and THOC7 (new result added in Figure 7L), which provides extra evidence to support YTHDC1 function in regulating mRNA nuclear export. We thus would like to keep the original title in order to reflect the multifaceted function of YTHDC1 in muscle stem cells.

      The mechanism of YTHDC1 promoting muscle stem cell activation/proliferation is not solidified. The authors could strengthen their evidence through bioinformatics analysis or give more discussion. Besides, the previous work done by Zhao and colleagues (Zhao et al., Nature 542, 475-478 (2017)) reported that another m6A reader, Ythdf2, promotes m6A-dependent maternal mRNA clearance to facilitate zebrafish maternal-to-zygotic transition. Does YTHDC1 regulate mRNA clearance during SC activation/proliferation? The authors should explore this possibility by deep-seq data analysis and give some discussion.

      Thanks for the critical comment. For the first concern, we think YTHDC1 promotes muscle stem cell activation/proliferation through its multi-level gene regulatory capabilities in both transcriptional and post-transcriptional processes and through the myriad targets it regulates. In addition, with the newly added data, we believe that YTHDC1’s function is largely dependent on its synergism with hnRNPG (Figure 7 K). We have added this discussion in lines 421-427 of the revised text. For the second question, our data showed that YTHDC1 predominantly localizes in the nucleus of SCs and myoblasts (Figure 1 F&G); thus it may not have a role in regulating mRNA clearance in the cytoplasm like YTHDF2. Nevertheless, a few existing reports (refs. 1, 2) suggest a possible role in mRNA degradation and stability, which may arise from its transient shuttling to the cytoplasm. We have now added this point in lines 469-472 of the revised text.

      Reviewer #2 (Public Review):

      Reviewer 2 was similarly positive stating that several tour-de-force techniques were used to examine m6A and the biological consequence in satellite cells and that there was a large amount of data supporting the conclusions with only a few minor weaknesses.

      General points: The main body is lengthy, and some content can be reduced or condensed. For example, RNA-seq was used to determine gene expression in WT and cKO cells, but the purpose of this is not well justified given that YTHDC1 mainly functions to regulate splicing and nuclear export of mRNA rather than controlling their expression levels. Does the RNA-seq data suggest that YTHDC1 may also regulate gene expression independent of m6A reader function?

      Thanks for the comment. We have now revised the entire text to condense the content. Nevertheless, we must point out that the purpose of the RNA-seq is to provide extra evidence for the proliferation defect of the YTHDC1 KO cells, not to search for the underlying mechanism. We have now revised lines 159-160 to clarify this.

      Reference:

      1. Shima, H., Matsumoto, M., Ishigami, Y., Ebina, M., Muto, A., Sato, Y., Kumagai, S., Ochiai, K., Suzuki, T. & Igarashi, K. S-Adenosylmethionine Synthesis Is Regulated by Selective N(6)-Adenosine Methylation and mRNA Degradation Involving METTL16 and YTHDC1. Cell Rep 21, 3354-3363 (2017).
      2. Zhang, Z., Wang, Q., Zhao, X., Shao, L., Liu, G., Zheng, X., Xie, L., Zhang, Y., Sun, C. & Xu, R. YTHDC1 mitigates ischemic stroke by promoting Akt phosphorylation through destabilizing PTEN mRNA. Cell Death Dis 11, 977 (2020).
      3. He, P.C. & He, C. m(6) A RNA methylation: from mechanisms to therapeutic potential. EMBO J 40, e105977 (2021).
      4. Widagdo, J., Anggono, V. & Wong, J.J. The multifaceted effects of YTHDC1-mediated nuclear m(6)A recognition. Trends Genet 38, 325-332 (2022).
      5. Sheng, Y., Wei, J., Yu, F., Xu, H., Yu, C., Wu, Q., Liu, Y., Li, L., Cui, X.L., Gu, X., Shen, B., Li, W., Huang, Y., Bhaduri-Mcintosh, S., He, C. & Qian, Z. A Critical Role of Nuclear m6A Reader YTHDC1 in Leukemogenesis by Regulating MCM Complex-Mediated DNA Replication. Blood (2021).
      6. Cheng, Y., Xie, W., Pickering, B.F., Chu, K.L., Savino, A.M., Yang, X., Luo, H., Nguyen, D.T., Mo, S., Barin, E., Velleca, A., Rohwetter, T.M., Patel, D.J., Jaffrey, S.R. & Kharas, M.G. N(6)-Methyladenosine on mRNA facilitates a phase-separated nuclear body that suppresses myeloid leukemic differentiation. Cancer Cell 39, 958-972 e958 (2021).
      7. Chen, C., Liu, W., Guo, J., Liu, Y., Liu, X., Liu, J., Dou, X., Le, R., Huang, Y., Li, C., Yang, L., Kou, X., Zhao, Y., Wu, Y., Chen, J., Wang, H., Shen, B., Gao, Y. & Gao, S. Nuclear m(6)A reader YTHDC1 regulates the scaffold function of LINE1 RNA in mouse ESCs and early embryos. Protein Cell 12, 455-474 (2021).
      8. Xiao, W., Adhikari, S., Dahal, U., Chen, Y.S., Hao, Y.J., Sun, B.F., Sun, H.Y., Li, A., Ping, X.L., Lai, W.Y., Wang, X., Ma, H.L., Huang, C.M., Yang, Y., Huang, N., Jiang, G.B., Wang, H.L., Zhou, Q., Wang, X.J., Zhao, Y.L. & Yang, Y.G. Nuclear m(6)A Reader YTHDC1 Regulates mRNA Splicing. Mol Cell 61, 507-519 (2016).
      9. Webster, M.T., Manor, U., Lippincott-Schwartz, J. & Fan, C.M. Intravital Imaging Reveals Ghost Fibers as Architectural Units Guiding Myogenic Progenitors during Regeneration. Cell Stem Cell 18, 243-252 (2016).
      10. Yankova, E., Blackaby, W., Albertella, M., Rak, J., De Braekeleer, E., Tsagkogeorga, G., Pilka, E.S., Aspris, D., Leggate, D., Hendrick, A.G., Webster, N.A., Andrews, B., Fosbeary, R., Guest, P., Irigoyen, N., Eleftheriou, M., Gozdecka, M., Dias, J.M.L., Bannister, A.J., Vick, B., Jeremias, I., Vassiliou, G.S., Rausch, O., Tzelepis, K. & Kouzarides, T. Small-molecule inhibition of METTL3 as a strategy against myeloid leukaemia. Nature 593, 597-601 (2021).
      11. Otto, A., Schmidt, C., Luke, G., Allen, S., Valasek, P., Muntoni, F., Lawrence-Watt, D. & Patel, K. Canonical Wnt signalling induces satellite-cell proliferation during adult skeletal muscle regeneration. J Cell Sci 121, 2939-2950 (2008).
      12. Liu, J., Gao, M., He, J., Wu, K., Lin, S., Jin, L., Chen, Y., Liu, H., Shi, J., Wang, X., Chang, L., Lin, Y., Zhao, Y.L., Zhang, X., Zhang, M., Luo, G.Z., Wu, G., Pei, D., Wang, J., Bao, X. & Chen, J. The RNA m(6)A reader YTHDC1 silences retrotransposons and guards ES cell identity. Nature 591, 322-326 (2021).
      13. Xu, W., Li, J., He, C., Wen, J., Ma, H., Rong, B., Diao, J., Wang, L., Wang, J., Wu, F., Tan, L., Shi, Y.G., Shi, Y. & Shen, H. METTL3 regulates heterochromatin in mouse embryonic stem cells. Nature 591, 317-321 (2021).
      14. Roberson, P.A., Romero, M.A., Osburn, S.C., Mumford, P.W., Vann, C.G., Fox, C.D., McCullough, D.J., Brown, M.D. & Roberts, M.D. Skeletal muscle LINE-1 ORF1 mRNA is higher in older humans but decreases with endurance exercise and is negatively associated with higher physical activity. J Appl Physiol (1985) 127, 895-904 (2019).
      15. Mumford, P.W., Romero, M.A., Osburn, S.C., Roberson, P.A., Vann, C.G., Mobley, C.B., Brown, M.D., Kavazis, A.N., Young, K.C. & Roberts, M.D. Skeletal muscle LINE-1 retrotransposon activity is upregulated in older versus younger rats. Am J Physiol Regul Integr Comp Physiol 317, R397-R406 (2019).
    1. Author Response

      Reviewer #1 (Public Review):

      Laurent et al. generate genotyping data from 259 individuals from Cabo Verde to investigate the histories and patterns of admixture in the set of islands that make up Cabo Verde. The authors had previously studied admixture in an earlier study but in a smaller set of individuals from two cities on one island (from Santiago) in Cabo Verde. Here, the authors sample from all the islands of Cabo Verde to study admixture in these islands and reveal that there is a varied picture of admixture in that the demographic histories are distinct amongst this set of islands.

      I found the article interesting and clearly written, and I like that it highlights that admixture is a dynamic process that has manifested differently in distinct geographical regions, which will be of broad interest. It also highlights how genetic ancestry patterns are correlated with the populations that were in power/enslaved during colonial times and proposes that certain social practices (e.g. legally enforced segregation) might have affected the distribution/length of runs of homozygosity.

      We thank the reviewer for this positive and encouraging appreciation of our work.

      My main suggestion is that the authors provide a set of hypotheses regarding admixture that may explain their observations, and it would be nice to see if at least one of these has some support using simulations. Could the authors run simulations under their proposed demographic model for populations in Cabo Verde vs what we would expect in a pseudo-panmictic population with two sources of admixture? The authors probably already have simulations they could use. And then see how pre/post admixture founding events change patterns of ancestry.

      As suggested by the reviewer, in the revised version of the manuscript, we conducted the same MetHis-ABC scenario-choice and posterior parameter inference considering the 225 Cabo Verde-born individuals as a single random-mating population, in addition to our main results considering each island of birth separately. Most interestingly, we find that our ABC inferences fail to accurately reconstruct the detailed admixture history of Cabo Verde when it is considered as a whole rather than for each island of birth separately. This is because admixture histories differ substantially across the islands of birth of individuals, which is also consistent with the significantly differentiated genetic patterns within Cabo Verde obtained from ADMIXTURE, local-ancestry inferences, ROH, and isolation-by-distance analyses. These results are now implemented throughout the revised version of the manuscript and in supplementary figures and tables. See in particular Results L758-769, and Appendix1-figures and tables, Figure7-figure supplement 1-3, and Appendix 5-table 10.

      Reviewer #2 (Public Review):

      In this article, the authors leveraged patterns in the empirical genomic data and the power of simulations and statistical inferences, and aimed to address a few biologically and culturally relevant questions about the Cabo Verde population's admixture history during the TAST era. Specifically, the authors provided evidence on which specific African and European populations contributed to the population of each island, on whether the genetic admixture history parallels language evolution, and on the best-fitting admixture scenario, which answers questions on when and which continental populations admixed on which island and how that has influenced the island population dynamics since then.

      Strengths

      1) This study sets a great example of studying population history through the lens of genetics and linguistics, jointly. Historically most of the genetic studies of population history either ignored the sociocultural aspects of the evidence or poorly (or wrongly) correlated that with genetic inference. This study identified components in language that are informative about cultural mixture (strictly African-origin words versus shared European-African words), and carefully examined the statistical correlation between genetic and linguistic variation that occurred through admixture, providing a complete picture of genetic and sociocultural transformation in the Cabo Verde islands during TAST.

      We thank the reviewer for this very enthusiastic and encouraging comment on our work.

      2) The statistical analyses are carefully designed and rigorously done. I especially appreciate the careful goodness-of-fit checking and parameter error rates estimation in the ABC part, making the inference results more convincing.

      Again, we thank the reviewer for this positive comment.

      Weaknesses

      1) Most of the methods in the main analyses here were previously developed (e.g. MDS, MetHis, RF/NN-ABC). However, when introducing and applying them here, the authors didn't restate the necessary background (strengths and weaknesses, limitations and usage) of these methods to justify them over other methods. For example, why is ADS-MDS used here to examine the genetic relationship between Cabo Verde populations and other worldwide populations, rather than classic PCA and F-statistics?

      As mentioned in the answer to the general comments, we extensively modified our manuscript in both Results and Material and Methods, to clarify and justify our reasoning for each one of the analyses conducted, and to discuss pros and cons of the methods used. We warmly thank the reviewers for this request, as we believe it allowed us to strongly improve the accessibility of our work in particular for the less specialized audience, as well as equally crucially improve replicability of our work for specialists. See in particular Results L185-193, L245-250, L368-371, L380-386, L495-511, L567-571, L606-621, and the corresponding Material and Methods sections.

      For the particular example of PCA raised by the reviewer: see Results L185-193.

      For that of F-statistics, see Results L368-386. Note that we added the F-stat analysis suggested by the reviewer to the revised version of our manuscript (see detailed answers below), Figure 3-figure supplement 2.

      We believe that these changes strongly strengthen our manuscript and enlarged its potential readership, and we thank, again, the reviewer for this request.

      2) The senior author of this paper has an earlier published article (Verdu et al. 2017 Current Biology) on the same population, using a similar set of methods and drew similar conclusions on the source of genetic and linguistic variation in Cabo Verde. Although additional samples on island levels are added here and additional analyses on admixture history were performed, half of the main messages from this paper don't seem to provide new knowledge than what we already learned from the 2017 paper.

      We substantially modified the text of the revised version of the manuscript to address the concern raised by the reviewer in numerous locations of the Abstract, Introduction and Results and Discussion sections, thus hoping to highlight better what we think is the profound novelty brought by this study. In particular, see Introduction L128-153.

      3) Furthermore, there are a few essential factors that could confound different aspects of the major analyses in this article that I believe should be taken into account and discussed. Such factors include the demographic history of source populations prior to admixture, different scenarios of the recipient population size changes, differences in recombination rates across the genome and between African and European populations, etc.

      We thank the reviewer for these comments, which allowed us to improve the clarity of our manuscript and raise very interesting discussion points that we had overlooked. As indicated in part in the general answer to reviewers above:

      1) We clarified our methods’ design and discussed extensively its limitations with respect to mis-specification of ancestral population sizes. Indeed, ancestral source population sizes are not modeled in our MetHis-ABC approach. Instead, we consider that the observed proxy source populations from Africa and Europe are at drift-mutation equilibrium and have been large since the initial and recent founding of Cabo Verde in the 1460’s, and thus use observed genetic variation patterns in these populations to build virtual gamete reservoirs for the admixture history of Cabo Verde with the MetHis-ABC framework. Therefore, while we cannot evaluate explicitly the influence of differences in ancestral source population sizes on our inferences in Cabo Verde, as we now state in the revised version of our manuscript: “we nevertheless implicitly take the real demographic histories of these source populations into account in our simulations, as we use observed genetic patterns themselves the product of this demographic history to create the virtual source populations at the root of the admixture history of each Cabo Verdean island.” We then discuss the outcome of such an approach, which mimics the real data satisfactorily for ABC inference. See in particular the revised versions of the Material and Methods L1454-1491 novel section “Simulating the admixed population from source-populations for 60,000 independent SNPs with MetHis”, and Results L637-649.

      2) Concerning the possibilities for population-size changes in the admixed population in our simulations and ABC inferences, we clarified our Material and Methods and explanations of our Results to better show that we readily consider various possible scenarios (for each island separately). Indeed, with our MetHis simulation design, given values of model-parameters correspond either to a constant, a linearly increasing, or a hyperbolic increase in reproductive size in the admixed population over time. We further clarified our Results and Discussion pointing out that we find, a posteriori, indeed, different demographic regimes among islands.

      Nevertheless, reviewers are right that we did not test the possibility for bottlenecks. We thus substantially expanded the Results and Discussion sections in multiple locations to highlight this limitation and the challenges involved in overcoming it in future work. See in particular Material and Methods L1386-1404 section “Hyperbolic increase, linear increase, or constant reproductive population size in the admixed population”, Results L739-742, and Discussion L934-941, and Perspectives.

      3) Finally, concerning recombination rate, we considered only independent SNPs in our simulation and inference process, as is now clarified in multiple locations throughout the text. We also discuss recombination specifically in relation to our ROH analyses, as suggested in the reviewer’s detailed comments. In brief, we note that Figure 8 of Pemberton et al. 2012 (AJHG 91:275-292) shows that the occurrence of long ROH at the same genomic location across individuals is correlated with low recombination rates, although the effect is relatively weak except in extreme recombination cold spots. Unless there were many extreme recombination cold spots that differed among the islands or ancestral populations, we do not anticipate fine-scale recombination rate differences to matter very much for total ROH levels in these data. Similarly, we do not expect large genome-wide differences in mutation rate, and therefore we do not anticipate minor local variation in mutation rates to make a systematic difference in total ROH levels. We now refer to these important points in the revised version of our Results L414-415.
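
      As a rough illustration of point 2 above, the three regimes named there (constant, linearly increasing, or hyperbolically increasing reproductive size) can be sketched as a single trajectory function. This is only a sketch under stated assumptions: the function name `reproductive_size`, the steepness parameter `u`, and the specific hyperbola u*x / (1 + u - x) are hypothetical choices made for illustration and are not taken from the manuscript or from the MetHis software.

```python
def reproductive_size(g, n_gen, n_start, n_end, u=None):
    """Illustrative reproductive size of the admixed population at generation g.

    Assumed (hypothetical) parameterization, not the MetHis implementation:
      - n_start == n_end       -> constant size
      - u is None              -> linear change from n_start to n_end
      - u > 0                  -> hyperbolic change; small u gives a late, sharp increase
    """
    x = g / n_gen  # fraction of the history elapsed, between 0 and 1
    if u is None:
        frac = x  # linear interpolation
    else:
        # Hyperbola passing through (0, 0) and (1, 1); increasing for u > 0
        frac = u * x / (1 + u - x)
    return round(n_start + (n_end - n_start) * frac)


if __name__ == "__main__":
    generations = 20  # arbitrary illustrative number of generations
    for g in (0, 5, 10, 15, 20):
        constant = reproductive_size(g, generations, 500, 500)
        linear = reproductive_size(g, generations, 100, 1000)
        hyperbolic = reproductive_size(g, generations, 100, 1000, u=0.1)
        print(g, constant, linear, hyperbolic)
```

      With these assumed values, the hyperbolic trajectory stays below the linear one for most of the history and catches up only in the last generations, which is the qualitative difference between the regimes that the scenario comparison is meant to distinguish. A bottleneck, which the response notes was not tested, would require a non-monotonic trajectory that this sketch does not cover.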

      Overall, the paper is of interest to the field of human evolutionary genetics - that not only does it tell the story of a historically important population, but also the methodology behind this paper sets a great example for future research to study genetic and sociocultural transformations under the same framework.

      We would like to thank the reviewer for this very encouraging conclusion and for the detailed revision of our work which, we believe, helped us to substantially improve our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The heat shock effect in the Drosophila lines was not understood in the study. Why did some lines show phenotypes only at 29C but not 22C? The study showed data that ubiquilin 2 expression was not impacted by 29C, so what caused the phenotypic differences? In addition, the method section did not describe clearly whether a temperature-sensitive promoter was used in the flies.

      The heat inducibility of the UBQLN2 transgenes is likely attributable to heat shock elements in the UAS promoter, as noted on page 6, line 4-14. The heat inducibility of dUbqln is interesting and may reflect transcriptional and/or posttranscriptional mechanisms. While it is possible that increased UBQLN2 contributes to the severe phenotypes in UBQLN24XALS flies reared at 29C, this is not seen for UBQLN2WT and UBQLN2P497H flies. Instead, we postulate that heat stress synergizes with the misfolded UBQLN24XALS protein to disrupt proteostasis and/or endolysosomal function. This clarification has been added to paragraph 2 of the Discussion section (page 16, line 15-25) of the revised MS: “The reason for enhanced toxicity of UBQLN24XALS is unclear; however, its enhanced aggregation potential may overwhelm cellular proteostasis machinery and/or accelerate disease mechanisms that are slow to manifest in neurons harboring ALS point mutations. This is consistent with the fact that UBQLN24XALS toxicity in flies was unmasked by HS, which is a well-known inducer of proteotoxicity.” We have also explicitly stated the HS inducibility of the UAS-Gal4 in the revised Materials and methods (page 20, line 24-25).

      2) The study showed data on male and female flies separately in some but not all experiments. In addition, the manuscript largely avoided discussing whether there was a sex difference in those experiments.

      We showed separate male and female eye phenotypes in Figure 1 to clearly demonstrate that UBQLN24XALS toxicity is not sex dependent. Subtle sex differences were seen in the longevity and climbing assays and were reported in Figures 4A and 4D. In Figure 4D, Unc-5 silencing extended the lifespan of Elav>Gal4 female control flies but not Elav>Gal4 male control flies. In Figure 4A, an Unc-5 KK RNAi line rescued climbing of D42>UBQLN24XALS male flies, but not female flies (a second Unc-5 RNAi line rescued both males and females). The reasons for the sex differences in these specific experiments are unclear.

      3) Some data appear to be peripheral with no significant contribution to the main findings. Moreover, some data were introduced but were not explained. For instance, the RNA-Seq analysis (Fig 2) did not contribute much to the study. The rescue effect of UBA* (F594A mutant) in Fig 1-Supplemental 1B was interesting but was not elaborated or followed up. FUS flies in Fig 6-Supplement 2 were abruptly introduced with little discussion.

      The reviewer’s point is well taken. In response, we moved both figures to the supplementary data.

      RNA-Seq (Fig. 2)

      Although not essential, the RNA-Seq adds experimental rigor to the study by providing strong molecular correlates to eye degeneration phenotypes across different UBQLN2 genotypes. It shows the unique toxicity of UBQLN24XALS and reinforces phenotypic similarity between UBQLN2WT and UBQLN2P497H flies, which likely reflects non-specific toxicity of overexpressed UBQLN2 proteins. We have carried out additional data analyses requested by the reviewer and moved the RNA-Seq data to Figure 1-figure supplement 2.

      UBA mutant (Figure 1-figure supplement 1)

      Both aggregation and toxicity of UBQLN24XALS were abolished by an inactivating F594A mutation in the UBA domain. While this implicates Ub binding in the biochemical mechanism of UBQLN2 toxicity, we have not followed up on the finding in either fly or iMN models and have chosen to remove the data (Figure 1-figure supplement 1) from the revised MS.

      Lack of genetic interaction between FUS and Unc-5 (Figure 3-figure supplement 1).

      This data was included to show that shUnc-5 is not a general suppressor of eye toxicity in Drosophila. This contrasts with lilliputian, whose mutation rescues toxicity phenotypes elicited by FUS, TDP-43, and UBQLN2. We believe that the FUS control data enhances experimental rigor and have retained the data in the revised MS, with some additional clarification on page 10, line 5-8.

      4) The main quadruple (4XALS) mutation used in the study was not found in patients. The relevance of the findings needs to be thoroughly justified.

      The use of combinatorial mutants—either in the same gene or same pathway—can sometimes be used to enhance neurodegenerative phenotypes in cellular and rodent models for neurodegenerative diseases, most notably, Alzheimer’s Disease. In the case of the 4XALS mutant, we reasoned that its enhanced aggregation might drive stronger phenotypes than those elicited by UBQLN2 clinical alleles, whose toxicity is barely discernible in flies (relative to overexpressed UBQLN2WT) or in iMNs. We have clarified the rationale for testing the 4XALS mutant and articulated its potential strengths and weaknesses in Results (page 5, line 14-page 6, line 2) and Discussion (page 16, line 15-25) sections.

      5) ALS and FTD are age-related neurodegenerative diseases, whereas the involvement of axon guidance genes is indicative of disruptions during the developmental stage. The manuscript did not discuss this potential caveat.

      We have inserted the following sentence in the discussion to note this caveat: “Consistent with this notion, UNC5B has been linked to neurodegeneration in the 6-OHDA model of Parkinson’s Disease (PD) and UNC5C has been nominated as a risk allele in late-onset Alzheimer’s Disease. Defining the contributions of pathologic UNC5 signaling to the development or progression of ALS-dementia awaits further study.” on Page 20, line 2-6. We have added a similar sentence to the Limitations paragraph at the end of the Discussion: “Third, it is possible that axon guidance genes are only relevant to UBQLN2 toxicity in the context of the developing nervous system”.

    1. Author Response

      Reviewer #1 (Public Review):

      This work describes a new method, Proteinfer, which uses dilated neural networks to predict protein function, using EC terms and GO terms. The software is fast and the server-side performance is fast and reliable. The method is very clearly described. However, it is hard to judge the accuracy of this method based on the current manuscript, and some more work is needed to do so.

      I would like to address the following statement by the authors: (p3, left column): "We focus on Swiss Prot to ensure that our models learn from human-curated labels, rather than labels generated by electronic annotation".

      There is a subtle but important point to be made here: while SwissProt (SP) entries are human-curated, they might still have their function annotated ("labeled") electronically only. The SP entry comprises the sequence, source organism, paper(s) (if any), annotations, cross-references, etc. A validated entry does not mean that the annotation was necessarily validated manually: but rather that there is a paper backing the veracity of the sequence itself, and that it is not an automatic generation from a genome project.

      Example: 009L_FRG3G is a reviewed entry, and has four function annotations, all generated by BLAST, with an IEA (inferred by electronic annotation) evidence code. Most GO annotations in SwissProt are generated that way: a reviewed Swissprot entry, unlike what the authors imply, does not guarantee that the function annotation was made by non-electronic means. If the authors would like to use non-electronic annotations for functional labels, they should use those that are annotated with the GO experimental evidence codes (or, at the very least, not exclusively annotated with IEA). Therefore, most of the annotations in the authors' gold standard protein annotations are simply generated by BLAST and not reviewed by a person. Essentially the authors are comparing predictions with predictions, or at least not taking care not to do so. This is an important point that the authors need to address since there is no apparent gold standard they are using.

      The above statement is relevant to GO. But since EC is mapped 1:1 to GO molecular function ontology (as a subset, there are many terms in GO MFO that are not enzymes of course), the authors can easily apply this to EC-based entries as well.

      This may explain why, in Figure S8(b), BLAST retains such a high and even plateau of the precision-recall curve: BLAST hits are used throughout as gold-standard, and therefore BLAST performs so well. This is in contrast, say to CAFA assessments which use as a gold standard only those proteins which have experimental GO evidence codes, and therefore BLAST performs much poorer upon assessment.

      We thank the reviewer for this point. We regret if we gave the impression that our training data derives exclusively, or even primarily, from direct experiments on the amino acid sequences in question. We had attempted to address this point in the discussion with this section:

      "On the other hand, many entries come from experts applying existing computational methods, including BLAST and HMM-based approaches, to identify protein function. Therefore, the data may be enriched for sequences with functions that are easily ascribable using these techniques which could limit the ability to estimate the added value of using an alternative alignment-free tool. An idealised dataset would involve training only on those sequences that have themselves been experimentally characterized, but at present too little data exists than would be needed for a fully supervised deep-learning approach."

      We have now added a sentence early in the manuscript reinforcing this point:

      "Despite its curated nature, SwissProt contains many proteins annotated only on the basis of electronic tools."

      We have also removed the phrase "rather than labels generated by a computational annotation pipeline" because we acknowledge that this could be read to imply that computational approaches are not used at all for SwissProt which would not be correct.

      While we agree that SwissProt contains many entries inferred via electronic means, we nevertheless think its curated nature makes an important difference. Curators as far as possible reconcile all known data for a protein, often looking for the presence of key residues in the active sites. There are proteins where electronic annotation would suggest functions in direct contradiction to experimental data, which are avoided due to this curation process. As one example, UniProt entry Q76NQ1 contains a rhomboid-like domain typically found in rhomboid proteases (IPR022764) and therefore inputting it into InterProScan results in a prediction of peptidase activity (GO:0004252). However this is in fact an inactive protein, as discovered by experiment, and so is not annotated with this activity in SwissProt. ProteInfer successfully avoids predicting peptidase activity as a result of this curated training data. (For transparency, ProteInfer is by no means perfect on this point: there are also cases in which UniProt curators have annotated single proteins as inactive but ProteInfer has not learnt this relationship, due to similar sequences which remain active).

      We had also attempted to address this point by comparing with phenotypes seen in a specific high-throughput experimental assay ("Comparison to experimental data" section).

      We have now added a new analysis in which we assess the recall of GO terms while excluding IEA annotation codes. We find that at the threshold that maximises F1 score in the full analysis, our approach is able to recall 60-75% (depending on ontology) of annotations. Inferring precision is challenging due to the fact that only a very small proportion of the possible function*gene combinations have in fact been tested, making it difficult to distinguish a true negative from a false negative.

      "We also tested how well our trained model was able to recall the subset of GO term annotations which are not associated with the "inferred from electronic annotation" (IEA) evidence code, indicating either experimental work or more intensely-curated evidence. We found that at the threshold that maximised F1 score for overall prediction, 75% of molecular function annotations could be successfully recalled, 61% of cellular component annotations, and 60% of biological process annotations."
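
The evaluation described above can be sketched in a few lines. This is a minimal illustration on invented toy data, not the authors' pipeline: the function names, the `(protein, term, evidence_code)` layout, and all scores are assumptions made for the example. It picks the score threshold that maximises F1 against all annotations, then measures recall on only those annotations carrying a non-IEA evidence code.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (protein id, GO term); both hypothetical toy values


def f1_score(predicted: Set[Pair], truth: Set[Pair]) -> float:
    """F1 of a predicted set of (protein, term) pairs against the true set."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_f1_threshold(scores: Dict[Pair, float], truth: Set[Pair]) -> float:
    """Try every observed score as a cut-off and keep the one with the best F1."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(scores.values())):
        predicted = {pair for pair, s in scores.items() if s >= t}
        f1 = f1_score(predicted, truth)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def recall_excluding_iea(scores: Dict[Pair, float],
                         annotations: List[Tuple[str, str, str]],
                         threshold: float) -> float:
    """Recall restricted to annotations whose evidence code is not IEA."""
    non_iea = {(p, term) for p, term, code in annotations if code != "IEA"}
    predicted = {pair for pair, s in scores.items() if s >= threshold}
    return len(predicted & non_iea) / len(non_iea) if non_iea else 0.0


# Toy data (invented): two proteins, three terms, mixed evidence codes.
annotations = [("P1", "GO:A", "IEA"), ("P1", "GO:B", "IDA"),
               ("P2", "GO:A", "IDA"), ("P2", "GO:C", "IEA")]
truth = {(p, term) for p, term, _ in annotations}
scores = {("P1", "GO:A"): 0.9, ("P1", "GO:B"): 0.8, ("P2", "GO:C"): 0.7,
          ("P2", "GO:B"): 0.6, ("P1", "GO:C"): 0.4, ("P2", "GO:A"): 0.3}

threshold = max_f1_threshold(scores, truth)
non_iea_recall = recall_excluding_iea(scores, annotations, threshold)
```

Only recall is computed on the non-IEA subset, in line with the response's point that precision is hard to infer when most function-gene combinations have never been tested.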

      Pooling GO DAGs together: It is unclear how the authors generate performance data over GO as a whole. GO is really 3 disjoint DAGs (molecular function ontology or MFO, Biological Process or BPO, Cellular component or CCO). Any assessment of performance should be over each DAG separately, to make biological sense. Pooling together the three GO DAGs which describe completely different aspects of the function is not informative. Interestingly enough, in the browser applications, the GO DAG results are distinctly separated into the respective DAGs.

      Thank you for this suggestion. To answer the question of how we were previously generating performance data: this was simply by treating all terms equivalently, regardless of their ontology.

      We agree that it would be helpful to the reader to split out results by ontology type, especially given clear differences in performance.

      We now provide PR-curve graphs split by ontology type.

      We have also added the following text:

      "The same trends for the relative performance of different approaches were seen for each of the directed acyclic graphs that make up the GO ontology (biological process, cellular component and molecular function), but there were substantial differences in absolute performance (Fig S10). Performance was highest for molecular function (max F1: 0.94), followed by biological process (max F1: 0.86) and then cellular component (max F1: 0.84)."

      Figure 3 and lack of baseline methods: the text refers to Figures 3A and 3B, but I could only see one figure with no panels. Is there an error here? It is not possible at this point to talk about the results in this figure as described. It looks like Figure 3A is missing, with Fmax scores. In any case, Figure 3(b?) has precision-recall curves showing the performance of predictions is the highest on Isomerases and lowest in hydrolases. It is hard to tell the Fmax values, but they seem reasonably high. However, there is no comparison with a baseline method such as BLAST or Naive, and those should be inserted. It is important to compare Proteinfer with these baseline methods to answer the following questions: (1) Does Proteinfer perform better than the go-to method of choice for most biologists? (2) does it perform better than what is expected given the frequency of these terms in the dataset? For an explanation of the Naive method which answers the latter question, see: ( https://www.nature.com/articles/nmeth.2340 )

      We apologise for the errors in figure referencing in the text here. This emerged in part from the two versions of text required to support an interactive and legacy PDF version. We had provided baseline comparisons with BLAST in Fig. 5 of the interactive version (correctly referenced in the interactive version) and in Fig. S7 of the PDF version (incorrectly referenced as Fig 3B).

      We have now moved the key panel of Fig S7 to the main-text of the PDF version (new Fig 3B), as suggested also by the editor, and updated the figure referencing appropriately. We have also added a Naive frequency-count based baseline. This baseline would not appear in Fig 3B due to axis truncation, but is shown in a supplemental figure, new Fig S9. We thank the reviewer and the editor for raising these points.
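
For readers unfamiliar with a frequency-count baseline, the idea can be sketched as follows. This is a hedged illustration of the general Naive approach the reviewer cites, not the authors' implementation: the function names and the toy labels are assumptions. Every query protein receives the same prediction, with each term scored by the fraction of training proteins that carry it.

```python
from collections import Counter
from typing import Dict, Set


def naive_term_scores(train_labels: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each term by its relative frequency among training proteins."""
    n = len(train_labels)
    counts = Counter(term for labels in train_labels.values() for term in labels)
    return {term: c / n for term, c in counts.items()}


def naive_predict(term_scores: Dict[str, float], threshold: float) -> Set[str]:
    """Terms predicted for ANY query protein; the sequence itself is ignored."""
    return {term for term, s in term_scores.items() if s >= threshold}


# Toy training set (invented labels, for illustration only).
train_labels = {
    "P1": {"hydrolase", "esterase"},
    "P2": {"hydrolase"},
    "P3": {"isomerase"},
    "P4": {"hydrolase", "esterase"},
}
term_scores = naive_term_scores(train_labels)
common_terms = naive_predict(term_scores, threshold=0.5)
```

Because the baseline never looks at the query sequence, any method that uses sequence information should outperform it at all but the most frequent terms.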

      Reviewer #2 (Public Review):

      In this paper, Sanderson et al. describe a convolutional neural network that predicts protein domains directly from amino acid sequences. They train this model with manually curated sequences from the Swiss-Prot database to predict Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. This paper builds on previous work by this group, where they trained a separate neural network to recognize each known protein domain. Here, they train one convolutional neural network to identify enzymatic functions or GO terms. They discuss how this change can deal with protein domains that frequently co-occur and more efficiently handle proteins of different lengths. The tool, ProteInfer, adds a useful new tool for computational analysis of proteins that complements existing methods like BLAST and Pfam.

      The authors make three claims:

      1) "ProteInfer models reproduce curator decisions for a variety of functional properties across sequences distant from the training data"

      This claim is well supported by the data presented in the paper. The authors compare the precision-recall curves of four model variations. The authors focus their training on the maximum F1 statistic of the precision-recall curve. Using precision-recall curves is appropriate for this kind of problem.

      2) "Attribution analysis shows that the predictions are driven by relevant regions of each protein sequence".

      This claim is very well supported by the data and particularly well illustrated by Figure 4. The examples on the interactive website are also very nice. This section is a substantial innovation of this method. It shows the value of scanning for multiple functions at the same time and the value of being able to scan proteins of any length.

      3) "ProteInfer models create a generalised mapping between sequence space and the space of protein functions, which is useful for tasks other than those for which the models were trained."

      This claim is also well supported. The print version of the figure is really clear, and the interactive version is even better. It is a clever use of UMAP representations to look at the abstract last layer of the network. It was very nice how each sub-functional class clustered.

      The interactive website was very easy to use with a good user interface. I expect it will be accessible to experimental and computational biologists.

      The manuscript has many strengths. The main text is clearly written, with high-level descriptions of the modeling. I initially printed and read the static PDF version of the paper. The interactive form is much more fun to read because of the ability to analyze my favorite proteins and zoom in on their figures (e.g. Figure 8). The new Figure 1 motivates the work nicely. The website has an excellent interactive graphic showing how the number of layers in the network and the kernel size change how data is pooled across residues. I will use this tool in my teaching.

      We are grateful for these comments. We are excited that the reviewer hopes to use this figure for teaching, which is exactly the sort of impact we hoped for this interactive manuscript. We agree that the interactive manuscript is by far the most compelling version of this work.

      The manuscript has only minor weaknesses. It was not clear if the interactive model on the website was the Single CNN model or the Ensemble CNN model.

      We thank the reviewer for pointing out the ambiguity here. The model shown on the website is a Single CNN model, and is chosen with hyperparameters that achieve good performance whilst being readily downloadable to the user's machine for this demonstration without use of excessive bandwidth. We have added additional sentences to address this better in the manuscript.

      "When the user loads the tool, lightweight EC (5MB) and GO (7MB) prediction models are downloaded and all predictions are then performed locally, with query sequences never leaving the user's computer. We selected the hyperparameters for these lightweight models by performing a tuning study in which we filtered results by the size of the model's parameters and then selected the best performing models. This approach uses a single neural network, rather than an ensemble. Inference in the browser for a 1500 amino-acid sequence takes < 1.5 seconds for both models."

      Overall, ProteInfer will be a very useful resource for a broad user base. The analysis of the 171 new proteins in Figure 7 was particularly compelling and serves as a great example of the utility and power of ProteInfer. It complements leading tools in a very valuable way. I anticipate adding it to my standard analysis workflows. The data and code are publicly available.

      Reviewer #3 (Public Review):

      In this work, the authors employ a deep convolutional neural network approach to map protein sequence to function. The rationales are that (i) once trained, the neural network would offer fast predictions for new sequences, facilitating exploration and discovery without the need for extensive computational resources, (ii) that the embedding of protein sequences in a fixed-dimensional space would allow potential analyses and interpretation of sequence-function relationships across proteins, and (iii) predicting protein function in a way that is different from alignment-based approaches could lead to new insights or superior performance, at least in certain regimes, thereby complementing existing approaches. I believe the authors demonstrate i and iii convincingly, whereas ii was left open-ended.

      A strength of the work is showing that the trained CNNs perform generally on par with existing alignment-based methods such as BLASTp, with a precision-recall tradeoff that differs from BLASTp. Because the method is more precise at lower recall values, whereas BLASTp has higher recall at lower precision values, it is indeed a good complement to BLASTp, as demonstrated by the top performance of the ensemble approach containing both methods.

      Another strength of the work is its emphasis on usability and interpretability, as demonstrated in the graphical interface, use of class activation mapping for sub-sequence attribution, and the analysis of hierarchical functional clustering when projecting the high-dimensional embedding into UMAP projections.

      We thank the reviewer for highlighting these points.

      However, a main weakness is the premise that this approach is new. For example, the authors claim that existing deep learning "models cannot infer functional annotation for full-length protein sequences." However, as the proposed method is a straightforward deep neural network implementation, there have been other very similar approaches published for protein function prediction. For example, Cai, Wang, and Deng, Frontiers in Bioengineering and Biotechnology (2020), the latter also being a CNN approach. As such, it is difficult to assess how this approach differs from or builds on previous work.

      We agree that there has been a great deal of exciting work looking at the application of deep learning to protein sequences. Our core code has been publicly available on GitHub since April 2019, and our preprint has now been available for more than a year. We regret the time taken to release a manuscript and for it to reach review: this was in part due to the SARS-CoV-2 pandemic, as the first author was heavily involved in the scientific response to it. Nevertheless, we believe that our work has a number of important features that distinguish it from much other work in this space.

      ● We train across the entire GO ontology. In the paper referenced by the reviewer, training is with 491 BP terms, 321 MF terms, and 240 CC terms. In contrast, we train with a vocabulary of 32,102 GO labels, and the majority of these are predicted at least once in our test set.

      ● We use a dilated convolutional approach. In the referenced paper, the network instead has fixed dimensions. Such an approach means there is an upper limit on how large a protein can be input into the model, and also means that this maximum length defines the computational resources used for every protein, including much smaller ones. In contrast, our dilated network scales to any size of protein, but when used with smaller input sequences it performs only the calculations needed for this size of sequence.

      ● We use class-activation mapping to determine regions of a protein responsible for predictions, and therefore potentially involved in specific functions.

      ● We provide a TensorFlow.JS implementation of our approach that allows lightweight models to be tested without any downloads.

      ● We provide a command-line tool that provides easy access to full models.

      We have made some changes to bring out these points more clearly in the text:

      "Since natural protein sequences can vary in length by at least three orders of magnitude, this pooling is advantageous because it allows our model to accommodate sequences of arbitrary length without imposing restrictive modeling assumptions or computational burdens that scale with sequence length. In contrast, many previous approaches operate on fixed sequence lengths: these techniques are unable to make predictions for proteins larger than this sequence length, and use unnecessary resources when employed on smaller proteins."
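
The length-independence argument can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the toy residue encoding, the hand-picked kernel weights, the use of mean pooling, and the function names are not taken from the manuscript, whose actual architecture (channel counts, residual connections, pooling operation) is not described in this response. The sketch shows only that a stack of dilated convolutions followed by pooling over positions yields an embedding of fixed size for any input length, with work proportional to that length.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[Sequence[float], int]  # (kernel weights, dilation rate)


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded 1D convolution whose taps are spaced `dilation` apart."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation
            if 0 <= idx < len(x):  # positions outside the sequence act as zeros
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(layers: Sequence[Layer]) -> int:
    """Number of input positions that can influence one output position."""
    return 1 + sum((len(kernel) - 1) * dilation for kernel, dilation in layers)


def embed(seq: str, filters: Sequence[Sequence[Layer]]) -> List[float]:
    """One pooled value per filter stack; assumes a non-empty sequence."""
    x = [(ord(c) - 65) / 25.0 for c in seq]  # toy encoding, not a real featurisation
    embedding = []
    for layers in filters:
        h: List[float] = x
        for kernel, dilation in layers:
            h = [max(0.0, v) for v in dilated_conv1d(h, kernel, dilation)]  # ReLU
        embedding.append(sum(h) / len(h))  # mean pooling over positions
    return embedding


# Two hypothetical filter stacks with hand-picked weights.
FILTERS = [
    [([0.5, 1.0, 0.5], 1), ([0.5, 1.0, 0.5], 2), ([0.5, 1.0, 0.5], 4)],
    [([-1.0, 1.0, 1.0], 1), ([1.0, 0.0, -1.0], 2)],
]
short_embedding = embed("MKV", FILTERS)        # 3 residues
long_embedding = embed("MKV" * 500, FILTERS)   # 1500 residues
```

Both calls return a vector with one entry per filter stack, so a downstream classifier sees the same input shape regardless of protein length; a fixed-size network would instead need padding or truncation to a preset maximum.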

      We have added a table that sets out the vocabulary sizes used in our work (5,134 for EC and 32,109 for GO):

      "Gene Ontology (GO) terms describe important protein functional properties, with 32,109 such terms in Swiss-Prot (Table S6) that cover the molecular functions of proteins (e.g. DNA-binding, amylase activity), the biological processes they are involved in (e.g. DNA replication, meiosis), and the cellular components to which they localise (e.g. mitochondrion, cytosol)."

      A second weakness is that it was not clear what new insights the UMAP projections of the sequence embedding could offer. For example, the authors mention that "a generalized mapping between sequence space and the space of protein functions...is useful for tasks other than those for which the models were trained." However, such tasks were not explicitly explained. The hierarchical clustering of enzymatic proteins shown in Fig. 5 and the clustering of non-enzymatic proteins in Fig. 6 are consistent with the expectation of separability in the high-dimensional embedding space that would be necessary for good CNN performance (although the sub-groups are sometimes not well-separated. For example, only the second level and leaf level are well-separated in the enzyme classification UMAP hierarchy). Therefore, the value-added of the UMAP representation should be something like using these plots to gain insight into a family or sub-family of enzymes.

      We thank the reviewer for highlighting this point. There are two types of embedding which we discuss in the paper. The first is the high-dimensional representation of the protein that the neural network constructs as part of the prediction process. This is the embedding we feel is most useful for downstream applications, and we discuss a specific example of training the EC-number network to recognise membrane proteins (a property on which it was not trained): "To quantitatively measure whether these embeddings capture the function of non-enzyme proteins, we trained a simple random forest classification model that used these embeddings to predict whether a protein was annotated with the intrinsic component of membrane GO term. We trained on a small set of non-enzymes containing 518 membrane proteins, and evaluated on the rest of the examples. This simple model achieved a precision of 97% and recall of 60% for an F1 score of 0.74. Model training and data-labelling took around 15 seconds. This demonstrates the power of embeddings to simplify other studies with limited labeled data, as has been observed in recent work (43, 72)."
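
As a quick consistency check on the figures quoted above, the F1 score is the harmonic mean of precision and recall. The short sketch below (the function name is an assumption) reproduces the reported value from the stated precision and recall.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the response: precision 97%, recall 60%.
reported_f1 = round(f1_from_precision_recall(0.97, 0.60), 2)
```

The rounded result matches the 0.74 stated in the quoted passage.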

      As the reviewer points out, there is a second embedding created by compressing this high-dimensional representation down to two dimensions using UMAP. This embedding can also be useful for understanding the properties seen by the network, for example the GO terms highlighted in Fig. 7, but in general it will contain less information than the higher-dimensional embedding.

      The clear presentation, ease of use, and computationally accessible downstream analytics of this work make it of broad utility to the field.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Kschonsak et al. describes the rational structure-based design of novel hybrid inhibitors targeting human Nav1.7 channel. CryoEM structure of arylsulfonamide (GNE-3565) - VSD4 NaV1.7-NaVPas channel complex confirmed binding pose observed in x-ray structure GX-936 - VSD4 Nav1.7-NavAb channel. Remarkably, cryoEM structure of acylsulfonamide (GDC-0310) - VSD4 NaV1.7-NaVPas channel complex revealed a novel binding pocket between the S3 and S4 helices, with the S3 segment adopting a distinct conformation compared to the arylsulfonamide (GNE-3565) - VSD4 NaV1.7-NaVPas channel complex. Creatively, the authors designed a novel class of hybrid inhibitors that simultaneously occupy both the aryl- and acylsulfonamide binding pockets. This study underscores the power of structure-guided drug design to target transmembrane proteins and will be useful to develop safer and more effective therapeutics.

      We thank this Reviewer for the very positive feedback and for highlighting the importance of our work in utilizing structure-based drug design to target key membrane targets.

      Reviewer #2 (Public Review):

      In this manuscript, the authors identify a critical unmet need for the (structure-based) drug design of human Nav channels, which are of clinical interest. They cleverly rationalized a hybrid strategy for developing target-specific small molecule inhibitors, which integrate binding mechanisms of two drug candidates that act orthogonally on the VSD4 of Nav 1.7. Thus, the authors illustrate a promising outlook on pharmaceutical intervention on Nav channels.

      Overall, the cryo-EM structures of the ligand-bound Nav channels are convincing, with a clear indication of the site-specific, distinct density of the small molecules. At the moment, it is difficult to tell how innovative the pipeline is compared to conventional cryo-EM structure determination.

      We thank this Reviewer for these positive comments and for the very helpful suggestions. We are addressing the concerns regarding our cryoEM pipeline.

      Reviewer #3 (Public Review):

      This is an excellent manuscript, describing a few lines of discoveries:

      1. Establishment of a structural biological pipeline for iterative structural determination of an engineered Nav1.7;

      2. Illumination of the novel compound binding mode;

      3. Structure-based development of the hybrid compounds, which led to the novel Nav1.7 inhibitor;

      The cryo-EM study on the engineered Nav1.7 consistently reveals the map at the mid to low 2 Å range, which is unprecedented and impressive, thus, demonstrating the high value of this workflow. The further strength of this study is that the authors were able to develop a new compound by combining structural information gained from the two Nav1.7 structures complexed to two different compounds with different binding modes. Overall, the depth and quality of this study are excellent.

      We thank this Reviewer for highlighting the importance of this manuscript and specifically recognizing our accomplishments in enabling iterative high-resolution structure determination for this target, which allowed us to perform SBDD and design a new series of hybrid compounds. We are also grateful to the Reviewer for noting the excellence of our studies.