  1. May 2023
    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript by Mohebi et al. examines a critical open question regarding the interaction of cholinergic interneurons of the striatum and transmitter release from dopaminergic axons in behaving animals. Activation of cholinergic interneurons in the striatum can evoke dopamine release in brain slices and in vivo as measured with voltammetry. However, it remains an open question in what context and to what extent this acetylcholine-mediated dopamine release occurs in behaving animals. Here, the authors argue that CIN activity triggers dopamine release in the nucleus accumbens which encodes the motivation to obtain a reward through increasing "ramps" of dopamine release. Their data suggest that the ramps are not reflected in the firing of dopaminergic neurons. Rather, they provide compelling evidence that the ramps of dopamine release correlate with ramps in cholinergic interneuron activity as measured with GCaMP6. What's more, the authors show that ACh-mediated dopamine release has no paired-pulse depression, a striking result that differs from all prior ex vivo brain slice data. The manuscript is extremely well written and the data are of very high quality. Overall, this study represents an important step forward in our understanding of how ACh-mediated dopamine release regulates behavior, and more broadly how axons can generate behaviors independently from somatic activity.

      Major comments

      1) The complete absence of any short-term plasticity in CIN-mediated dopamine release is a striking result that is important for the field. The authors should strengthen this result with additional quantitative analysis demonstrating the lack of STP. They have analyzed paired-pulse ratios, but they should analyze this for stimuli at the higher frequencies (4 Hz, etc) that are more physiologically relevant. For example, Fig 1e shows a CIN-evoked DA release at many optically-stimulated frequencies. The authors should quantify short-term plasticity by generating fits of the single stimulus signal and comparing the mathematical sum predicted from 4 stim DA signals at different frequencies to the recorded data. A similar analysis has been done with Ca signals (Koester and Sakmann, 2000).

      Thank you for this very helpful suggestion. We have performed this analysis as recommended, and now confirm the lack of STP even at the higher frequencies (see new Supplementary Figure 1).
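      The linear-summation check requested in comment 1 can be illustrated with a minimal sketch. It assumes a hypothetical difference-of-exponentials shape for the single-stimulus response (the time constants below are placeholders, not fitted values from the manuscript) and compares the peak of a "recorded" trace with the peak predicted by summing time-shifted copies of that single response. The function names and the synthetic depressed trace are illustrative only.

```python
import math


def single_pulse_response(t, amplitude=1.0, tau_rise=0.05, tau_decay=0.4):
    """Hypothetical single-stimulus response; zero before stimulus onset (t < 0)."""
    if t < 0:
        return 0.0
    return amplitude * (math.exp(-t / tau_decay) - math.exp(-t / tau_rise))


def summed_response(times, freq_hz, gains, kernel=single_pulse_response):
    """Sum time-shifted copies of the kernel, one per pulse, scaled by per-pulse gains.

    gains = [1, 1, 1, 1] gives the linear (no short-term plasticity) prediction.
    """
    interval = 1.0 / freq_hz
    return [
        sum(g * kernel(t - k * interval) for k, g in enumerate(gains))
        for t in times
    ]


def peak_ratio(recorded, predicted):
    """Recorded peak / predicted peak: ~1 linear, <1 depression, >1 facilitation."""
    return max(recorded) / max(predicted)


times = [i * 0.01 for i in range(300)]  # 3 s sampled at 100 Hz (placeholder)
predicted = summed_response(times, freq_hz=4.0, gains=[1.0, 1.0, 1.0, 1.0])

# Synthetic "recording" with 30% depression on pulses 2-4, for contrast only.
depressed = summed_response(times, freq_hz=4.0, gains=[1.0, 0.7, 0.7, 0.7])

linear_ratio = peak_ratio(predicted, predicted)
depressed_ratio = peak_ratio(depressed, predicted)
print(f"linear: {linear_ratio:.3f}, depressed: {depressed_ratio:.3f}")
```

      With real data, `recorded` would be the measured trace for a given stimulation frequency and the kernel would come from a fit to the single-stimulus signal; a ratio close to 1 across frequencies would be consistent with the absence of short-term plasticity.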

      2) The authors show that optical activation of CINs results in DA release as measured by dLight. To clearly establish that these signals are generated by DA release driven by nicotinic receptors (and not a partial effect of some unknown artifact), it would be useful to show that the optical CIN-evoked dLight signals shown in Fig. 1 are inhibited by nicotinic receptor antagonists such as DHbE. This control experiment would significantly strengthen the result shown here.

      We agree that combining drug manipulations with photometry would be useful, but as noted above this is not a methodology in our current technical repertoire.

      3) Similarly, the authors show clear correlations between CIN activity and DA release during behavior. The authors should consider determining whether CINs play a causal role in triggering DA release during behavior. For example, does infusion of DHbE in the NAc prevent the light-mediated DA release during behavior? As an alternative hypothesis, some groups have been suggesting that CIN activity has almost no direct influence over DA. Therefore, testing whether a causal relationship exists between CINs and DA release would be an important experiment in addressing these two opposing viewpoints.

      As noted above we are not currently able to combine drug manipulations with photometry in behaving animals.

      4) The ramps that are described in this manuscript are an order of magnitude faster (increasing over 100s of milliseconds) than ramps described in other studies that occur over seconds. In fact, the two signals may be completely different functionally. Discussion of this topic would be helpful.

      Dopamine ramps have indeed been reported over multiple different time scales, and as discussed in Berke 2018, this seems to reflect the duration of the approach behavior. We think further discussion of this topic is better saved for another paper, especially as we are now actively studying ramping over longer time scales (Krausz et al. 2023).

      Reviewer #3 (Public Review):

      This report by Mohebi et al. provides new answers to old questions by showing that the activity of striatal cholinergic interneurons (CINs) escalates progressively during specific reward-related behaviors and that this correlates with previously observed ramps in dopamine (DA) release in the nucleus accumbens core. The report is strong and provides evidence for the authors' hypothesis that DA ramps are independent of DA neuron activity, but are instead the result of CIN activity and corresponding acetylcholine (ACh) release. The authors further demonstrate that the fidelity of CIN activation and consequent driving of DA release is even more robust in vivo than observed in ex vivo slice preparations, which is fundamental for understanding the role of ACh-DA interactions in behavior. The findings complement the authors' previous evidence that ventral tegmental area (VTA) DA neuron firing patterns do not show a ramping pattern; the previously reported VTA data are appropriately included here (in Fig. 3) to illustrate the absence of VTA firing during the time-locked increases in CIN activity and DA release. The present studies stop short of showing a direct link between CIN activity and DA release, however, which would require examining DA release during behavior in the presence of an antagonist of nicotinic ACh receptors. The authors also extend the understanding of the regulation of DA release by acetylcholine (ACh) by showing that optical activation of CINs in vivo promotes DA release responses that do not attenuate with repetitive stimulation. This contrasts with previous results in ex vivo striatal slices in which ACh-evoked DA release has been found to decline progressively from rundown and/or receptor desensitization. The authors propose that in vivo, AChE may be more effective in curtailing local ACh levels than in slices because of the slightly lower temperature typically used for slice studies, as well as the use of superfusion that might facilitate some AChE washout (AChE inhibitors are still effective in slices, of course). Overall, the report not only provides evidence for the cellular substrate for DA ramps but also shows the robustness of ACh-driven DA release in vivo. A few points to strengthen the report are listed below.

      1) The authors give a few details about how CINs were activated at the beginning of the results, but say only that DA dynamics were monitored using fiber photometry. Given that the methods are at the end, a brief summary should be given here to indicate whether this means direct monitoring of DA or indirect via GCaMP, for example. It would be helpful to note the sensor used in the abstract, as well. In this light, as it were, RdLight1 should be described upon the first mention.

      We have now clarified in both abstract and text that we are using the direct DA sensor RdLight1.

      2) The authors show that infusion of DHbE in the NAc reduced the likelihood of decisions to approach the center port, as did antagonism of DA receptors. This supports the authors' argument that ramping of CIN activity and consequent ACh release underlies observed ramps in DA release. However, showing a causal interaction requires testing whether the observed DA ramps are absent after DHbE infusion in the NAc, under the same conditions that attenuated behavior.

      As noted above we are not currently able to combine drug manipulations with photometry in behaving animals.

      3) In Fig. 3, the y-axis title for the upper panels should specify VTA, not simply "rate". This is stated in the legend, but should also be specified in the figure panel.

      We have updated the y-axis titles in this figure.

      4) A recent preprint in BioRxiv by AC Krok, NX Tritsch et al. shows a related correlation between ACh and DA release in vivo in a reward task, as well as differences in other conditions. This report shows also that cortical input to CINs indeed plays a role, as suggested in the concluding sections of the present report. Consideration of the data in the preprint in the context of the present results could be valuable for the field.

      We have also noted those pre-prints with interest, even though they investigated different brain regions using different approaches. There are established differences between CIN-DA interactions in dorsal vs. ventral striatum that we suspect are relevant here. But given the rapid pace of developments in this subfield, we prefer not to speculate too much at this point and instead review the overall body of work once it is published.

    1. Author Response

      Reviewer #1 (Public Review):

      “The abstract does not adequately summarize the content of the paper. There is no mention of stimulation, or bilateral connectivity, which is a large part of the paper. The names of all five species should appear in the abstract, not just X. laevis.”

      In the revised manuscript, we have included all the names of the species and types of stimuli used to elicit fictive vocalizations in the abstract. In regard to bilateral connectivity, we believe that the reviewer was referring to the rostral-caudal connections between the parabrachial nucleus and nucleus ambiguus, which are critical for fast, but not for slow trill production. We have added this piece of information in the abstract. Furthermore, we have clarified the bilateral nature of the two central pattern generators (CPGs) in male X. laevis. In our previous study (Yamaguchi et al., 2017), we demonstrated that transections of the two commissures (one at the parabrachial nuclei level and the other at the nucleus ambiguus level) did not eliminate fictive advertisement calls in male X. laevis brains, indicating the presence of fast and slow trill CPGs in left and right hemi-brains of male X. laevis. This information was originally included in the results section (“Unilateral transection desynchronizes the fast clicks, but not the slow clicks across species”). We have now added this information to the introduction section to provide a clearer description of the anatomical organization of the two CPGs (p5, ln10 – 14).

      “The conclusion that the "fast and slow CPGs identified in male X. laevis are conserved across species." is contradicted by the last paragraph, which states, "Fast trill-like CPGs are likely present only in fast clickers..." This inherent contradiction needs to be resolved.”

      To resolve this contradiction, we have revised sentences in the abstract to clarify our findings. Specifically, we now state that “We found that even though the courtship calls of different Xenopus species vary in their click rates and duration, the CPGs used to generate clicks are conserved across species. The fast CPGs found in male X. laevis, which critically rely on reciprocal connections between the parabrachial nucleus and the nucleus ambiguus, are conserved among species that produce fast clicks. Similarly, the slow CPGs found in the caudal brainstem of male X. laevis are shared among species that produce slow clicks” (p2, ln 10 – 15). With this change, we hope to make our findings clearer and to resolve the contradiction.

      “The testosterone results are over-emphasized.” “The conclusion that there is differential expression of testosterone receptors in the brain of each species is completely speculative and not supported by the data presented here.”

      We have extensively revised our manuscript to ensure a more accurate interpretation of the results regarding testosterone experiments. Revised conclusions are outlined below:

      Abstract: “In addition, our results suggest that testosterone plays a role in organizing fast CPGs in fast-click species, but it does not appear to have the same effect in slow-click species.” (p2, ln 15 – 17)

      Introduction: “Additionally, we found that fast trill-like CPGs are present only in species that produce fast clicks and their presence appears to be regulated by testosterone in these species.” (p6, ln 2 – 4)

      Discussion: “However, this effect of testosterone appears to be limited to the fast clicker species. Male X. tropicalis, a slow clicker species, has been shown to have comparable plasma levels of testosterone to male X. laevis (mean plasma levels of testosterone of male X. laevis: 13 to 22ng/ml (Hecker et al., 2005; Hayes et al., 2010), male X. tropicalis: ~20ng/ml, (Olmstead et al., 2009)), yet the synapses between the PBN and laryngeal motoneurons in male X. tropicalis remained weak, and PBN showed little activity during fictive advertisement calls. These results suggest that testosterone acts differently on the central vocal pathways of fast and slow clickers, promoting the emergence of fast trill-like CPGs in X. laevis but not in X. tropicalis. Although further experiments with controlled testosterone levels are necessary to validate these results, we hypothesize that changes in the androgen receptors (e.g., expression patterns, ligand affinity) may have contributed to the divergence of fast and slow clickers.” (p26, ln 13 – 24)

      “The use of the word "development" implies embryology. Here, adults were treated and looked at 13 weeks later. There is no data presented about development.”

      In our revised manuscript, we have replaced the term “development” with “presence” or “acquisition” of neural circuitry to improve clarity and prevent potential misunderstandings.

    1. Author Response

      Reviewer #1 (Public Review):

      Using an in vitro DNA damage response (DDR) assay with a defined DNA substrate in Xenopus extracts, together with in vitro binding assays with purified proteins, the authors nicely showed the role of APE1 (APEX1) in ATRIP recruitment for DDR activation, particularly a non-enzymatic (structural) role of APE1 in binding to both ssDNA and ATRIP. The results described in the paper convincingly support the authors' claim. However, these studies lack quantification with proper statistics (and/or a statement of the reproducibility of the results). Given the important discovery of the role of APE1 in DDR activation in vitro, it would be nice to demonstrate the role of APE1 (APEX1) in ATR activation in vivo using siRNA-mediated knockdown in mammalian cells or yeast cells.

      Thanks for the suggestion. As shown in our response to Essential Revisions #2, we have addressed this question with an additional experiment and added a description to our revised manuscript showing that APE1 is important for the ATR DDR following oxidative stress in cultured human U2OS cells (Figure 1-figure supplement 1B). In addition, we have performed at least three independent experiments and statistical analysis to support our claims.

      Reviewer #2 (Public Review):

      ATM and Rad3-related (ATR) interacts with ATRIP and plays a central role in the DNA damage response. Previous studies have established the idea that ATR is recruited to RPA-covered ssDNA via ATRIP-RPA interaction. In this paper, the authors propose a new RPA-independent mechanism for ATR recruitment.

      Thanks for the nice summary of our major findings from the manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, the authors explore the mechanism of ATRIP recruitment to single-stranded DNA (ssDNA), which is important for ATR activation and the subsequent control of DNA repair and cell cycle progression. Using Xenopus egg extracts, in vitro interaction assays, and ssDNA constructs, the authors found that AP endonuclease 1 (APE1) plays a role in the recruitment of ATRIP to ssDNA independently of RPA. Moreover, APE1 domains are characterized for ssDNA, ATRIP, and RPA interaction, determining that the nuclease activities of APE1 are not required for this new mode of ATRIP recruitment. Overall, the work presented makes a compelling case for a novel role for APE1 in ATRIP recruitment that seems crucial for ATR activation (at least in the Xenopus system). The results are likely to have an important impact on our understanding of the determinants for activation of ATR signaling and cellular responses to DNA damage and replication stress. It remains unclear whether the findings will be extended to other organisms and be relevant for different types of DNA lesions. Also, there are several points of concern in the manuscript that require further clarification, especially regarding some of the quantitative analyses presented and the claimed importance of the RPA-independent mode of ATRIP recruitment for ATR activation.

      We thank the reviewer for the overall positive evaluation of our initial submission. We have included additional experimental data using mammalian cells showing the significance of APE1 in the ATR DDR, as well as additional discussion of other studies in the literature. We also provide further clarifications and responses to the major and minor concerns (please see the detailed responses below). In particular, we revised the proposed model of APE1 in ATRIP recruitment and ATR DDR (please see revised Figure 5).

    1. Author Response

      Reviewer #1 (Public Review):

      The data on embryonic "ventral nerve cord" glia are generated from whole embryos, and even provided that the ventral nerve cord harbors 75% of all glia and thus the majority is ventral nerve cord, the data should not be called vnc-specific. The vnc-specific data set (adult CNS) that is already published (Allen et al., 2020) is strangely not even mentioned in the current manuscript. The idea of having a comprehensive description of glial transcriptional profiles is great - but I was missing the integration of the midline glial cells, which can be considered as ensheathing glial cells that - as the cortex glia - also express wrapper (Stork et al., 2009).

      • We agree with Reviewer 1 that the embryonic glia dataset represents all glia and not just VNC glia. We have amended the text accordingly.

      • We now cite Allen et al., 2020. Apologies for this omission.

      • Midline Glia:

      The embryonic glial cells analysed in the previous version of our manuscript included only repo+ glia and therefore did not include midline glia, which do not express repo (Jacobs, 2000). In the revised manuscript, we reanalysed the complete embryonic dataset and identified the midline glia based on known markers and in vivo validation (Figure 3 – figure supplement 1). We also provide a list of genes that show enriched expression in the midline glial cluster as a supplementary file (Source data file 1).

      We performed hierarchical cluster analysis on midline glia, all embryonic repo+ glial clusters and embryonic neuronal clusters to determine the relationship of midline glia to other glia. Interestingly, midline glia formed an outgroup to both neurons and repo+ glia (Figure 3 – figure supplement 1F), suggesting that they are quite distinct from other (repo+) glial classes. This is expected given their mesectodermal origin (Kosman et al., 1991; Thomas et al., 1988). Indeed, although midline glia express wrapper, otherwise known as a cortex glia marker (Banerjee et al., 2017; Noordermeer et al., 1998; Stork et al., 2009), they do not resemble cortex glia in form or function but instead ensheath commissural axons and play critical roles in axon guidance and VNC morphogenesis (Jacobs, 2000). Midline glia have been characterised extensively by several groups (Hartenstein, 2011; Hidalgo, 2003; Jacobs, 2000; Kearney et al., 2004; Vasenkova et al., 2006; Wheeler et al., 2006), therefore, given their distinct origin and the ambiguity surrounding their functional classification, we instead focused our analyses on repo+ glia in this manuscript.

      Unfortunately, I found most of what is reported in this work not to be entirely new. The classification of glial diversity in the adult brain was presented by the Meinertzhagen and Gaul labs (Edwards and Meinertzhagen, 2010; Edwards et al., 2012; Kremer et al., 2017). The description of two astrocyte-like cell types is a reduction of data that defined three morphologically distinct astrocyte-like cells (Peco et al., 2016), which is not discussed. Some other aspects were ignored, too. Two other morphologically distinct types of ensheathing glia, ensheathing glia and ensheathing/wrapping or tract-associated glia, were described, but this is not discussed (Kremer et al., 2017; Peco et al., 2016).

      We respectfully disagree with Reviewer 1’s assessment that much of the work presented is not new. This work represents the first Drosophila glial cell atlas with thorough validation of cluster marker expression in vivo. It is also the first systematic exploration of the relationship between glial morphology and transcriptional signature, a controversial topic in the field of glial biology. We fully agree that much of the adult glial morphology had been characterised previously by the Meinertzhagen and Gaul labs, among many others, and we acknowledge this explicitly in our manuscript and in reference to Figure 2 (one out of a total of 9 main figures). Indeed, it is because Drosophila glial morphology has been so well characterised that a comprehensive exploration of the relationship between morphology and transcriptional signature was even feasible. Moreover, our revised manuscript also provides more in-depth morphological characterisation and quantification of glial morphology and defines subclasses and morphologies not described previously (e.g. channel perineurial glia and astrocyte morphologies of the lobula and lobula plate). Indeed, even the channel subperineurial glia, which were identified based on lineage relationships, nuclear position and molecular markers, were not described in morphological terms.

      The 3 distinct astrocyte populations defined in Peco et al., (2016) refer to cell body position and neuropil domains covered by astrocytes. We now include this categorisation in our quantification of astrocyte morphology (See response to (6) and Figure 1 – figure supplement 2) and discuss their relationship to the type 1 and type 2 astrocyte morphologies that we observed.

      We also now include the ensheathing/wrapping or tract ensheathing glia as a morphological category of ensheathing glia in the manuscript (Figure 1A,N,O).

    1. Author Response

      Reviewer #1 (Public Review):

      This is a simulation study comparing the performance of two major approaches for dealing with “population structure” when carrying out Genome-wide Association Studies - Principal Component Analysis and Linear Mixed-effects Models - a subject of considerable practical importance. The author correctly notes that previous comparisons have been quite limited. In particular, any study not concluding that LMM was superior has relied on very simple models of structure.

      The paper is clearly written and beautifully reviews the theoretical underpinnings (albeit in a manner that will be difficult to penetrate without deep knowledge of several fields). The simulations are well-designed and far better than previous studies. From a theoretical point of view, the work is somewhat limited by being strongly anchored in a very classical quantitative genetics framework that is focused on allele frequencies and inbreeding coefficients, and totally ignores coalescent theory, but this is a minor quibble. The simulations are limited by utilizing ridiculously small sample sizes by the standards of modern human GWAS. And of course, they do not include all the complexities of real data.

      The quantitative genetics framework we used was ideal for motivating and interpreting LMMs in particular, since they model relatedness with a kinship matrix which consists of IBD probabilities, all of which arose from quantitative genetics.

      We also added the following text to the discussion: “However, our conclusions are not expected to change with larger sample sizes, as cryptic family relatedness will continue to be abundant in such data, if not increase in abundance, and thus give LMMs an advantage over PCA (Henn et al., 2012; Shchur et al., 2018; Loh et al., 2018).”

      The main conclusion of the study is that LMMs really are generally superior - as expected on theoretical grounds. However, the authors do not address whether switching to LMM really is practicable given the sample size and lack of data sharing that characterize human genetics. Nor is it clear whether the difference in performance matters in real life given that the entire framework used is an idealized one - the fact that real human data suffers from environmental confounders that are correlated with “ancestry” is not addressed, to take the most obvious example. That said, it is surely important to note that the approach routinely used by the majority of users (PCA with 10 PCs) is mostly used for historical reasons and has little theoretical or empirical justification.

      We added simulations with environment effects correlated with ancestry, which we hope makes our study more relevant by making our evaluations more realistic than before. In the presence of environment effects, LMM without PCs remains among the best approaches, although occasionally LMM with PCs or PCA will perform slightly better. However, modeling environment directly (with the true variables) improves performance much more than using PCs to model environment indirectly, so we believe environment effects are not a strong reason for continuing to use PCs (in LMMs or otherwise) unless there is no choice.

      We also added the following text to the discussion: “However, recent approaches not tested in this work have made LMMs more scalable and applicable to biobank-scale data (Loh et al., 2015; Zhou et al., 2018; Mbatchou et al., 2021), so one clear next step is carefully evaluating these approaches in simulations with larger sample sizes.” As stated earlier, we believe that the difference in performance between LMM and PCA will remain in larger sample sizes because cryptic relatedness is more prevalent in that setting.

      We excluded the “lack of data sharing” point from our discussion because it does not align well with the goals of our manuscript. The current solution to the lack of data sharing is meta-analysis, but its use does not give PCA or LMM an inherent advantage, since it can be applied to the summary statistics of either (or even a combination of models, in theory). There is interesting recent work on “federated” PCA and LMM association (both versions exist), which allows a single model to be fit jointly to separate datasets (residing in different buildings across the world) as if they were combined into a single dataset. Thus, these issues do not explain or motivate why PCA or LMM should be used.

      Reviewer #2 (Public Review):

      Yao and Ochoa present a very nice paper examining the age-old question of whether LMM or PCA is a better way to adjust for structure (population, family, admixture). The authors provide a very nice and detailed overview of the previous research addressing this question, summarizing it in a table. They find that LMMs are generally better at accounting for population structure. However, I feel there are a couple of important factors that are missing. One is the consideration of environmental structure. Another is that the relationship between PCA and LMM is usually a bit more complicated in practice than depicted here, where the devil really lies in the details. Also, I think there are a couple of key reasons why LMMs haven’t been adopted as quickly as one might have expected, including case-control imbalance and cohort meta-analyses, which I feel the authors could point out. In fact, I believe LMMs have become sort of popular in recent years (e.g. Japan Biobank GWAS results).

      We added environment simulations, which we agree was an important shortcoming of the previous version of our work.

      We now discuss how the PCA and LMM connection can be more complicated in practice, but as the main difference is in how LD is handled, once that is correctly adjusted, PCs and random effects are still mostly modeling the same relatedness signals. Ultimately, our main conclusion is unchanged, namely that only LMMs can model family relatedness, which is their key advantage.

      We briefly commented on case-control imbalance in our discussion (now made more clear), but since this involves binary traits, which we did not explicitly test in this work, it is out of scope.

      Cohort meta-analysis does not influence whether to use PCA or LMM, since it can be performed with summary statistics from either model (and in theory even a combination of different models per cohort). The broad use of meta-analysis does not in itself prevent users from using PCA or LMM within individual cohorts. The use of meta-analysis is very interesting in its own right, but it is outside the scope of this work.
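      To make concrete why meta-analysis is indifferent to the per-cohort model, here is a minimal sketch of a standard inverse-variance-weighted fixed-effect meta-analysis. It is not the authors' code; the function name and the example numbers are hypothetical. The only inputs are an effect estimate and a standard error per cohort, which either PCA regression or an LMM can supply.

```python
import math


def fixed_effect_meta(betas, std_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas, std_errors: per-cohort effect estimates and their standard errors,
    produced by any association model (PCA, LMM, or a mix across cohorts).
    Returns (pooled beta, pooled standard error, z-score, two-sided p-value).
    """
    if len(betas) != len(std_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value


# Hypothetical example: cohort A analysed with PCA, cohort B with an LMM.
beta, se, z, p_value = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p_value:.2e}")
```

      Nothing in the pooling step refers to how each cohort's estimate was obtained, which is the sense in which meta-analysis does not favour either approach.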

      Reviewer #3 (Public Review):

      This paper examines the relative performance of linear mixed models (LMMs), principal components (PCA), and their combination (PCA-LMM) for genetic association studies in human populations. The authors claim that previous papers examining this question are inadequate and that: (i) there remains confusion on which method is best and in which context, (ii) that the metrics used in previous evaluations were insufficient, and (iii) that the simulation settings used in previous papers were not comprehensive. To fix these problems the authors perform an extensive set of simulations within several frameworks and suggest two new metrics for evaluating performance.

      Strengths:

      The simulation framework used in this paper and the extensive number of simulations provide an opportunity to examine the relative properties of the three approaches (LMM, PCA, PCA-LMM) in a variety of contexts.

      The parameters of the simulation framework are based on highly diverged populations, which is an increasingly common analysis choice that has not been examined in detail via simulation previously.

      The evaluation metrics used in this paper are AUC and a test of the uniformity of the p-value distribution under the null. This is an improvement over some previous analyses which did not examine power and relied on less sensitive tests of type I error.
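      As an illustration of what a test of p-value uniformity under the null measures, the sketch below computes the Kolmogorov-Smirnov distance between a set of null p-values and the Uniform(0, 1) distribution. This is a generic stand-in chosen for illustration, not necessarily the specific statistic used in the paper, and the example p-values are synthetic.

```python
def ks_uniform_statistic(p_values):
    """Kolmogorov-Smirnov distance between p-values and Uniform(0, 1).

    Returns the largest gap between the empirical CDF of the p-values and the
    uniform CDF F(p) = p. Values near 0 indicate well-calibrated null p-values.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    distance = 0.0
    for i, p in enumerate(sorted(p_values), start=1):
        # Check the gap just after (i / n) and just before ((i - 1) / n) each step.
        distance = max(distance, i / n - p, p - (i - 1) / n)
    return distance


# Synthetic examples: evenly spread p-values vs. p-values piled up near zero.
calibrated = [(i - 0.5) / 10 for i in range(1, 11)]
inflated = [0.001 * i for i in range(1, 11)]

d_calibrated = ks_uniform_statistic(calibrated)
d_inflated = ks_uniform_statistic(inflated)
print(f"calibrated: {d_calibrated:.3f}, inflated: {d_inflated:.3f}")
```

      A large distance for null variants signals miscalibrated type I error, in either the inflated or the deflated direction, which a single significance threshold would not reveal.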

      Weaknesses:

      This paper has a limited set of population frameworks just like all papers before it. The breakdown of which method is best (LMM, PCA, PCA-LMM) will be a function of the simulation framework chosen.

      Ameliorating this issue, we added additional simulations with low heritability and with environment effects. We are pleased to report that all of our conclusions hold at low heritability (h2 = 0.3), and for the most part under environment effects (which occasionally give LMM with PCs and PCA a small advantage, but often LMM with no PCs remains best, and we show PCs are no replacement for directly modeling these environment effects).

      The frameworks chosen for this paper are certainly not comprehensive in contemporary human genetic studies. In fact, the authors make a number of unusual choices. For example, the populations in the simulated study have extremely large Fsts. While this is also a strength, the lack of more standard study designs is a weakness. More importantly, there is no simulation of family effects, which is the basis of many of the PCA-LMM papers reported in Table 1.

      We now better motivate in the introduction our focus on association studies of multiethnic and admixed individuals, which are nowadays very common and which have greater FST values than earlier studies. In reference to higher simulated FSTs, we also now cite our recent work, which has found that many previous FST estimates are downwardly biased (Ochoa and Storey, 2019, 2021). We simulated data that were fit to each of our three real datasets using our unbiased methods, so those values that (understandably) appear high are actually more correct (for multiethnic populations such as those in 1000 Genomes, HGDP, etc.) than previous estimates in the literature. In our previous work we also determined that only previous pairwise FST estimators are unbiased (under some conditions), and using a previous pairwise FST estimator (from Bhatia et al., 2013) we obtained equally high values between the most diverged human populations (values from a revised version of Ochoa and Storey, 2019 that isn’t on bioRxiv yet): in HGDP, the largest pairwise FST is 0.479, between Pima and PapuanSepik; in Human Origins, the largest estimate is 0.396, between Cabecar and Baining_Malasait; lastly, in 1000 Genomes, the largest estimate is 0.135, between YRI and JPT. (1000 Genomes was generally less structured than HGDP and Human Origins, because the latter include more diverse populations.) Several previous estimates from the literature, all between one hunter-gatherer Sub-Saharan African subpopulation and one non-African subpopulation, resulted in values of about 0.25 (Bowcock et al., 1991, Henn et al., 2011, Bergstrom et al., 2020). FST estimates are also greater from whole-genome sequencing versus array data (revised version of Ochoa and Storey, 2019).

      Family (household) effects are a case where PCA is not expected to outperform LMM, though standard LMMs do not model this effect explicitly either and may not do much better. As this is a feature of family studies that ought to be absent in population studies (as usually only siblings are in the same household, and not more distant relatives), it is also not entirely relevant to the majority of our simulations. For these reasons, including such a feature in our simulations does not align with the goals of the present work, but we agree this is an important framework that deserves more attention in future evaluations.

      The discussion (and simulations) of LMM vs PCA, particularly LMMs with PCs as fixed effects misses the critical distinction of whether PCs are in-sample (in which case including PCs as fixed effects effectively serves as a preconditioner for the kinship matrix, speeding up iterative methods such as BOLT), or projections of individuals onto out-of-sample principal axes. There is also no discussion of LOO methods to address “proximal contamination”, also quite relevant in evaluating power as a function of the number of PCs.

      We added the following to our discussion concerning out-of-sample PC projections: “We do not consider the case where samples are projected onto PCs estimated from an external sample (Prive et al., 2020), which is uncommon in association studies, and whose primary effect is shrinkage, so if all samples are projected then they are all equally affected and larger regression coefficients compensate for the shrinkage, although this will no longer be the case if only a portion of the sample is projected onto the PCs of the rest of the sample.”

      We also added the following to the discussion concerning the LOCO approach: “Similarly, the leave-one-chromosome-out (LOCO) approach for estimating kinship matrices for LMMs prevents the test locus and loci in LD with it from being modeled by the random effect as well, which is called “proximal contamination” (Lippert et al., 2011, Yang et al., 2014). While LOCO kinship estimates vary for each chromosome, they continue to model family relatedness, thus maintaining their key advantage over PCA.”

      The same new discussion paragraph closes with the following thoughts concerning LOCO and related approaches: “LD effects must be adjusted for, if present, so in unfiltered data we advise the previous methods be applied. However, in this work, simulated genotypes do not have LD, and the real datasets were filtered to remove LD, so here there is no proximal contamination and LD confounding is minimized if present at all, so these evaluations may be considered the ideal situation where LD effects have been adjusted successfully, and in this setting LMM outperforms PCA. Overall, these alternative PCs or kinship matrices differ from their basic counterparts by either the extent to which LD influences the estimates (which may be a confounder in a small portion of the genome, by definition) or by sampling noise, neither of which are expected to change our key conclusion.”

      Lastly, we added the following to a different discussion paragraph: “A different benefit of including PCs was recently reported for BOLT-LMM, which does not result in greater power but rather in reduced runtime, a property that may be specific to its use of scalable algorithms such as conjugate gradient and variational Bayes (Loh et al., 2018).”

      There is no discussion/simulation of spatial/environmental effects or rare vs common PCs as raised in Zaidi et al 2020. There are some open questions here regarding relative performance the authors could have looked at. Same for LMMs with multiple GRMs corresponding to maf/ld bins and thresholded GRMs. For example, it would be helpful to know if multiple-GRM LMMs mitigate some of the problems raised in the Zaidi paper.

      We added simulations with environment effects, which are based on a two-level hierarchy of population labels, so they are spatial to the extent that these labels capture spatial relationships between populations. However, our small-sample-size data are not well suited to studying rare variants and their structure, so this is out of scope. (The sample size limitation is also covered in a new discussion paragraph.) We hope to tackle this very interesting question in future work.

      We added the following paragraph to our discussion: “Another limitation of this work is ignoring rare variants, a necessity given our smaller sample sizes, where rare variant association is miscalibrated and underpowered. Using simulations mimicking the UK Biobank, recent work has found that rare variants can have a more pronounced structure than common variants, and that modeling this rare variant structure (with either PCA or LMM) may better model environment confounding, improve inflation in association studies, and ameliorate stratification in polygenic risk scores (Zaidi and Mathieson, 2020). Better modeling rare variants and their structure is a key next step in association studies.”

  2. Apr 2023
    1. Author Response:

      We thank the editors and reviewers for their assessment of our manuscript, and their agreement that we present compelling evidence for post-transcriptional regulation of AURKA through the 3’UTR.

      In response to Reviewer 1, we acknowledge that much of our study is performed exclusively in U2OS cells, and that study of alternative polyadenylation in additional cell lines would serve to further generalize our findings. However, as U2OS cells are a well-known model cell line for cell cycle studies, we believe our demonstration of cell cycle regulation of AURKA through its 3’UTR offers a depth of understanding that is perhaps of greater interest than confirming the existence of alternative AURKA 3’UTRs in additional cell lines using our methods. We note that the recent rapid growth in RNA-seq data resources allows easy confirmation of the broad existence of alternative polyadenylation events on a genome-wide scale. For example, AURKA-specific data extracted from a recent benchmark study of Nanopore long-read RNA sequencing (Chen et al., 2021) clearly show the existence of two distinct AURKA 3’UTRs differentially expressed between a number of different cancer cell lines. In addition, a recent study investigating the landscape of APA at single-cell resolution detected AURKA APA isoforms in HeLa and MDA-MB-468 cell lines (Wang et al., 2022). That study further identifies AURKA among genes showing a negative correlation between the generalized distal polyA site usage index (gDPAU) and expression levels, meaning a preference for the proximal polyA site when expression levels increase, and includes AURKA in the gene cluster showing a slight increase in usage of the distal polyA site from G1 to M phase (Wang et al., 2022). Both studies support the evidence presented in our manuscript.

      We agree with Reviewer 2 that better information on translation rates would improve our understanding of the impact of translation regulation on AURKA levels. Some insight into the translation rate of AURKA in the cell cycle can be derived from inspection of the ribosome profiling dataset published by Tanenbaum et al., 2015. From their analysis, translation efficiency of AURKA mRNA in G2 is 1.59 times that in G1, and in G1 it is 0.69 times that in M phase, whilst in G2 it is 1.10 times that in M. Such data reveal a reversible increase in translation of AURKA mRNA, alongside other mitotic regulators, in preparation for M phase (Tanenbaum et al., 2015). These results are in accordance with our findings that translation rates contribute modestly to cell cycle changes in AURKA levels in normal cells, and we concur with Reviewer 3’s comment that the contribution of increased translation rate to AURKA levels at mitosis is less than the change in mRNA levels at this point in the cell cycle.
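
      As an illustrative consistency check on the three ratios quoted above, the G2-to-M ratio should equal the product of the G2-to-G1 and G1-to-M ratios. The sketch below uses only the values cited from Tanenbaum et al., 2015; the variable names are hypothetical and chosen for readability.

```python
# Translation-efficiency ratios for AURKA mRNA as quoted in the text
# (Tanenbaum et al., 2015). Variable names are illustrative only.
g2_over_g1 = 1.59           # G2 relative to G1
g1_over_m = 0.69            # G1 relative to M
g2_over_m_reported = 1.10   # G2 relative to M, as quoted

# Chaining the ratios: (G2/G1) * (G1/M) = G2/M
g2_over_m_implied = g2_over_g1 * g1_over_m

print(f"implied G2/M = {g2_over_m_implied:.2f}")
print(f"reported G2/M = {g2_over_m_reported:.2f}")
```

      The implied and reported values agree to two decimal places, so the three quoted ratios are mutually consistent.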

      We think the significance of the regulatory mechanism we describe lies rather in the large effect it has on AURKA levels in interphase (when AURKA expression is normally repressed at both mRNA and translation rate). We hypothesise that it is interphase regulation that may be relevant to roles of AURKA in cancer (and to the association of APA with cancer) (Bertolin and Tramier, 2020; Naso et al., 2021). It is indeed the case that (i) AURKA regulation by miRNA, (ii) cooperation between APA and translation and (iii) cell-cycle dependent control of AURKA at the translation level, are already known. We believe the novelty of our study lies in drawing together these elements to provide new insight into AURKA regulation, using tools that allow similar investigation of other APA events, and contributing new ideas for future therapeutic interventions for disease proteins regulated via APA.

    1. Author Response:

      We would like to thank you for your thorough review of the manuscript. We will take all comments into account in the revised version of the manuscript. Please find below our provisional responses to your comments.

      eLife assessment

      This study reports useful information on the limits of the organotypic culture of neonatal mouse testes, which has been regarded as an experimental strategy that can be extended to humans in the clinical setting for the conservation and subsequent re-use of testicular tissue. The evidence that the culture of testicular fragments of 6.5-day-old mouse testes does not allow optimal differentiation of steroidogenic cells is compelling and would be useful to the scientific community in the field for further optimizations.

      Thank you for this assessment. We will carefully consider all comments and make the requested revisions to improve the manuscript.

      Public Reviews

      Reviewer #1 (Public Review):

      In this manuscript, the authors aimed to compare, from testis tissues at different ages from mice in vivo and after culture, multiple aspects of Leydig cells. These aspects included mRNA levels, proliferation, apoptosis, steroid levels, protein levels, etc. A lot of work was put into this manuscript in terms of experiments, systems, and approaches. However, as written the manuscript is incredibly difficult to follow. The Introduction and Results sections contain rather loosely organized lists of information that were altogether confusing. At the end of reading these sections, it was unclear what advance was provided by this work. The technical aspects of this work may be of interest to labs working on the specific topics of in vitro spermatogenesis for fertility preservation but fail to appeal to a broader readership. This may be best exemplified by the statements at the end of both the Abstract and Discussion which state that more work needs to be done to improve this system.

      As explained below, we will rework and reorganize the manuscript to make it clearer, more meaningful and more precise. We believe that this work may be of interest to a broader readership. Indeed, the development of a model of in vitro spermatogenesis could be of interest for labs working on the specific period of puberty initiation, on germ and somatic cell maturation and on steroidogenesis during this period, and could even be useful for testing the toxicity of cancer therapies, drugs, chemicals and environmental agents (e.g. endocrine disruptors) on the developing testis.

      Reviewer #2 (Public Review):

      Preserving and restoring the fertility of prepubertal patients undergoing gonadotoxic treatments involves freezing testicular fragments and waking them up in a culture in the context of medically assisted procreation. This implies that spermatogenesis must be fully reproduced ex vivo. The parameters of this type of culture must be validated using non-human models. In this article, the authors make an extensive study of the quality of the organotypic culture of neonatal mouse testes, paying particular attention to the differentiation and endocrine function of Leydig cells. They show that fetal Leydig cells present at the start of culture fail to complete the differentiation process into adult Leydig cells, which has an impact on the nature of the steroids produced and even on the signaling of these hormones.

      The authors make an extensive study of the different populations of Leydig cells which are supposed to succeed each other during the first month of life of the mouse to end up with a population of adult and fully functional cells. The authors combine quantitative in situ studies with more global analyses (RT-qPCR, western blot, hormonal assays), which range from gene to hormone. This study is well written and illustrated, the description of the methods is honest, and the analyses are systematic and accompanied by multiple relevant control conditions.

      Since the aim of the study was to study Leydig cell differentiation in neonatal mouse testis cultures, the study is well conceived, the results answer the initial question and are not over-interpreted.

      My main concern is to understand why the authors undertook so much work when they mention, for RNA extractions and western blots, that the necrotic central part had to be carefully removed. There is no information on how this parameter was handled for immunohistochemistry and steroid measurements. The authors describe the initial material as a quarter testis, but they don't mention the resulting size of the fragment. A brief review of the literature shows that, while the culture medium is often crucial for the quality of the culture (in particular the supplementations, as discussed by the authors here), the size of the fragments is also a determining factor, especially for long cultures. The main limitation of the study is therefore that the authors cannot exclude that central necrosis can have harmful effects on the survival and/or the growth and/or the differentiation of the testis in culture. In this sense, the general interpretation that the authors make of their work is correct: the culture conditions are not optimized.

      When using the organotypic culture system at a gas-liquid interphase, the central part of the testicular tissue becomes necrotic. As previously reported (Komeya et al., 2016), the central region receives insufficient nutrients and oxygen. In vitro spermatogenesis therefore only occurs in the seminiferous tubules present in the peripheral region. As in our previous publications and recent RNA-seq analyses (Dumont et al., 2023), the central necrotic area was removed so that transcript and protein levels in the healthy part of the samples (i.e. where in vitro spermatogenesis occurs) could be measured and compared with in vivo controls. For histochemical and immunohistochemical analyses, only seminiferous tubules located at the periphery of the cultured fragments (outside of the necrotic region) were analyzed. Steroid measurements were performed on the entire fragments.

      The initial material was indeed a quarter testis, which represents approximately 0.75 mm3. No growth of the fragments was observed during the organotypic culture period. We agree with the reviewer that the composition of the culture medium is not the only parameter to be considered for the quality of the culture and that the size of the fragments is also a determining factor. We do not exclude that central necrosis can have harmful effects on the survival and/or the growth and/or the differentiation of the testis in culture. Optimization of the culture medium and culture design (so that the tissue center receives sufficient nutrients and oxygen) will be necessary to increase the yield of in vitro spermatogenesis.

      Organotypic culture is currently trying to cross the doors of academic research laboratories to become a clinical tool, but it requires many adjustments and many quality controls. This study shows a perfect example of the pitfall often associated with this approach. The road is still long, but every piece of information is useful.

      Reviewer #3 (Public Review):

      Moutard, Laura, et al. investigated the gene expression and functional aspects of Leydig cells in a cryopreservation/long-term culture system. The authors found that critical genetic markers for Leydig cells were diminished when compared to the in-vivo testis. The testis also showed less androgen production and androgen responsiveness. Although they did not produce normal testosterone concentrations in basal media conditions, the cultured testis still remained highly responsive to gonadotrophin exposure, exhibiting a large increase in androgen production. Even after the hCG-dependent increase in testosterone, genetic markers of Leydig cells remained low, which means there is still a missing factor in the culture media that facilitates proper Leydig cell differentiation. Optimizing this testis culture protocol to help maintain proper Leydig cell differentiation could be useful for future human testis biopsy cultures, which will help preserve fertility in child cancer patients.

      Methods: In line 226, there is mention that the central necrotic area was carefully removed before RNA extraction. This is particularly problematic for the inference of these results, especially for the RT-qPCR data. Was the central necrotic area consistent between all samples and variables (16 and 30FT)? How big was the area? This makes the in-vivo testis not a proper control for all comparisons. Leydig cells are not evenly distributed throughout the testis. A lot of Leydig cells can be found toward the center of the gonad, so the results might be driven by the loss of this region of the testis.

      When using the organotypic culture system at a gas-liquid interphase, the central part of the testicular tissue becomes necrotic. As previously reported (Komeya et al., 2016), the central region receives insufficient nutrients and oxygen. In vitro spermatogenesis therefore only occurs in the seminiferous tubules present in the peripheral region. As in our previous publications and recent RNA-seq analyses (Dumont et al., 2023), the central necrotic area was removed so that transcript levels in the healthy part of the samples (i.e. where in vitro spermatogenesis occurs) could be measured and compared with in vivo controls. The transcript levels of the selected genes were of course normalized to housekeeping genes (Gapdh and Actb) or to the Leydig cell-specific gene Hsd3b1.

      The central necrotic area was consistent between all samples and variables: it represents on average 16-27% of the explants.

      Moreover, we would like to point out that the gonads were cut into four fragments before in vitro cultures. It is therefore the central part of these explants that was removed and not the central part of the gonads. The central part of the gonads was thus included in our analyses.

      What did the morphology of the testis look like after culturing for 16 and 30 days? These images will help confirm that the culturing method is like the Nature paper Sato et al. 2011 and also give a sense of how big the necrotic region was and how it varied with culturing time.

      This point will be addressed in the detailed responses to reviewers.

      There are multiple comparisons being made. Bonferroni corrections on p-value should be done.

      This point will be addressed in the detailed responses to reviewers.
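
      For reference, the Bonferroni correction requested by the reviewer multiplies each raw p-value by the number of comparisons (capped at 1) and compares the result with the significance threshold. The following is a minimal sketch under stated assumptions: the function name `bonferroni`, the threshold of 0.05, and the raw p-values are hypothetical and are not taken from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) under the Bonferroni correction.

    Each raw p-value is multiplied by the number of tests m and capped at 1.0.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Hypothetical raw p-values, for illustration only (not data from the study).
raw = [0.001, 0.012, 0.03, 0.2]
adjusted, reject = bonferroni(raw)

for p, p_adj, r in zip(raw, adjusted, reject):
    print(f"raw={p:.3f}  adjusted={p_adj:.3f}  significant={r}")
```

      With four comparisons, a raw p-value of 0.03 no longer passes the 0.05 threshold after adjustment, which shows how the correction can change which comparisons are reported as significant.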

      Results: In the discussion, it is mentioned that IGF1 may be a missing factor in the media that could help Leydig cell differentiation. Have the authors tried this experiment? Improving this existing culturing method will be highly valuable.

      The decreased Igf1 mRNA levels found in the present study are in line with the RNA-seq data of Yao et al., 2017. As mentioned in the Discussion section, the addition of IGF1 in the culture medium led to a modest increase in the percentages of round and elongated spermatids in cultured mouse testicular fragments (Yao et al., 2017). However, the effect of IGF1 supplementation on Leydig cell differentiation was not investigated. The supplementation of organotypic culture medium with IGF1 is currently being tested in our research team.

      Add p-values and SEM for qPCR data. This was done for hormones, should be the same way for other results.

      p-values and SEM are shown for both qPCR and hormone data.

      Regarding all RT-qPCR data: there is a switch between 3bHSD and Actb/Gapdh as housekeeping genes. There does not seem to be a consistent rule, as some have 3bHSD and others do not. Why do Igf1 and Dhh not use 3bHSD for housekeeping? If this is the method to be used, then 3bHSD should be used as housekeeping for the protein data, instead of ACTB. Also, based on Figure 1B and Figure 2A (Hsd3b1) there does not seem to be a strong correlation between Leydig cell number and the gene expression of Hsd3b1. If Hsd3b1 is to be used as a housekeeper and a proxy for Leydig cell number, a correlation between these two measurements is necessary. If there is no correlation, a housekeeping gene that is stable among all samples should be used. Sorting Leydig cells and then conducting qPCR would be optimal for these experiments.

      Hsd3b1 was used as a housekeeping gene only to normalize the mRNA levels of Leydig cell-specific genes. Therefore, Igf1 and Dhh transcript levels were not normalized with Hsd3b1 since Igf1 is expressed by several cell types in the testis (Leydig cells, Sertoli cells, peritubular myoid cells) and Dhh is expressed by Sertoli cells.  

      Regarding western blots, the expression of AR, CYP19 and FAAH could not be normalized with 3bHSD since AR is expressed by Leydig cells, Sertoli cells and peritubular myoid cells, CYP19 is expressed by Leydig cells and germ cells and FAAH is expressed by Sertoli cells. We will review the western blot results for CYP17A1.

      As shown in Figure 1B, the number of Leydig cells per cm2 of testicular tissue is not significantly different between the different time points in vivo (6 dpp, 22 dpp and 36 dpp), in vitro (D16 FT and D30 FT) and between the in vivo and in vitro conditions (22 dpp versus D16 FT, 36 dpp versus D30 FT). Similarly, our data in Figure 2A show that Hsd3b1 mRNA levels are not significantly different between the different time points in vivo (6 dpp, 22 dpp and 36 dpp), in vitro (D16 FT and D30 FT) and between 22 dpp and D16 FT. However, Hsd3b1 mRNA levels were significantly lower in D30 FT tissues compared to 36 dpp. We will measure the correlation between the number of Leydig cells per cm2 of testicular tissue and Hsd3b1 mRNA levels, as suggested by the reviewer.
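
      One way such a correlation could be computed is with Pearson's coefficient on paired per-sample measurements. The sketch below is only an illustration under stated assumptions: the choice of Pearson's r, the function name `pearson_r`, and all numeric values are hypothetical and do not come from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements per sample (illustration only):
# Leydig cell counts per unit area and relative Hsd3b1 mRNA levels.
leydig_cell_counts = [10.0, 12.0, 9.0, 14.0, 11.0]
hsd3b1_relative_mrna = [1.0, 1.3, 0.8, 1.5, 1.1]

r = pearson_r(leydig_cell_counts, hsd3b1_relative_mrna)
print(f"Pearson r = {r:.3f}")
```

      A rank-based coefficient such as Spearman's would be a reasonable alternative if the relationship is monotonic but not linear.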

      Figure 2A (CYP17a1): It is surprising that the CYP17a1 gene and protein expression is very different between D30FT and 36.5dpp, however, the immunostaining looks identical between all groups. Why is this? A lower magnification image of the testis might make it easier to see the differences in Cyp17a1 expression. Leydig cells commonly have autofluorescence and need a background quencher (TrueBlack) to visualize the true signal in Leydig cells. This might reveal the true differences in Cyp17a1.

      This point will be addressed in the detailed responses to reviewers.

      Figure 3D: there are large differences in estradiol concentration in the testis. Could it be that the testis is becoming more female-like? Leydig and Sertoli cells with more granulosa and theca cell features? Were any female markers investigated?

      We show in the present study that the expression level of the Sertoli cell-specific gene Dhh is not reduced in organotypic cultures. We also previously found that the expression level of the Sertoli cell-specific gene Amh was not reduced in in vitro matured testicular tissues (Rondanino et al., 2017). Moreover, our recent unpublished data show that Sox9, a testis-specific transcription factor, is expressed in Sertoli cells in organotypic cultures. These results suggest that Sertoli cells are not becoming granulosa-like cells and that the testis is not becoming more female-like. Markers of granulosa and theca cells were not investigated.

      Figure 3D and Figure 5A: It is hard to imagine that intratesticular estradiol is maintained for 16-30 days without sufficient CYP19 activity or substrate (testosterone). 6.5 dpp was the last day with abundant CYP19 expression, so is most of the estrogen synthesized on this first day and it sticks around? Are there differences in estradiol metabolizing enzymes? Is there an alternative mechanism for E production?

      This point will be addressed in the detailed responses to reviewers.

    1. Author Response

      Reviewer #2 (Public Review):

      In a neonatal model of bacterial meningitis induced by s.c. injection of E. coli, transcriptional changes were found across all major cell types including endothelial cells, fibroblasts and macrophages. Among macrophages, they describe 2 resident subsets and 2 inflammatory subsets. By immunohistochemistry of arachnoid and dura flatmounts, they show vascular changes upon infection, including clustering of CLDN5 and PECAM1, and disorganized capillary morphology, which was dependent on Tlr4 signaling but independent of arachnoid macrophages.

      The manuscript would benefit from rewriting, it is not written in a concise manner and the rationale for experiments, time points for analyses and their conclusions are not clear. The model of s.c. bacterial infection is not well introduced and overall changes in the periphery, survival curves or bacterial counts (in the KO models) in the meninges/brain are not mentioned.

      Thank you for those comments. We hope that the text is now more readable. We have added a separate section to describe the meninges model and added data on survival and E. coli counts (Supplemental Figure 3).

    1. Author Response

      Reviewer #1 (Public Review):

      This work puts forward a comprehensive characterisation of colorectal cancer (CCCRC), by classifying it into 4 subtypes with distinct TME features. It uses 10 public databases: 8 microarray datasets for the training of molecular classification and 2 RNAseq for validation (CRC-RNAseq) to identify the 4 subtypes using unsupervised machine learning (consensus clustering). These 4 subtypes were found to be somewhat distinct in terms of immune response and the possibilities for effective treatments. They found that one subtype may be more sensitive to chemotherapy, two to WNT pathway inhibitor SB216763 and Hedgehog pathway inhibitor vismodegib, and one to ICB treatment. They show an association with patient outcome in terms of PFS, validated in the validation cohort. They used histology to correspond the subtypes to known pathological types, as well as investigating their T cell makeup. They also investigated the genetic tumour evolution that may occur between the subtypes. A single-sample gene classifier was put forward as a way of identifying the class of cancer. The evidence for the main results of the work is convincing, but a few areas need to be clarified and extended.

      In the determination of the 4 subtypes (C1-C4) the methodology is clear, and the definition of the training and validation data are clear and well presented. The techniques used are well suited to the problem. The performance of the classification as a predictor of prognosis is presented as KM curves of PFS and OS for the training and validation sets. The training data shows a significant log-rank p-value in both PFS and OS. The validation data shows a significant effect in PFS.

      What follows is quite an exhaustive process of finding differences between the cohorts using a multitude of techniques and datasets, including genomics, epigenetics, transcriptomics, and proteomics. These sections are mainly descriptive and do add understanding to the classification, especially with regard to the T-cell populations that are invasive.

      Improvements could be made to the latter sections of the main paper. The basis for the potential clinical responses of the subtypes is arrived at via a "pre-clinical model" based on 81 genes. It would benefit from clarification on what genes were used in model training and details of the final model. Similarly the description of the "Single-sample gene classifier" could be enhanced similarly with a better description of which genes are in the final classifier.

      Thank you for taking the time to review our article and for your positive feedback. Your thorough evaluation of our work has been invaluable to us, and we appreciate your recognition of the effort we put into it.

      1) The basis for the potential clinical responses of the subtypes is arrived at via a "pre-clinical model" based on 81 genes.

      The exact details of the filtering criteria used to obtain the list of pre-clinical model genes have added to the Methods section of the study (Lines 1061-108, Lines 503-511) (Supplementary file 3a). To explore the treatment for each CCCRC subtype using cancer cell line drug-sensitivity experiments, we developed a pre-clinical model based on subtype-specific, cancer cell-intrinsic gene markers according to a previously published study (Eide et al., 2017). Firstly, the “limma” package was used to identify DEGs with FDR < 0.05 between each of the four subtypes and the remaining subtypes in the CRC-AFFY cohort. To identify subtype-specific genes in one of the subtypes, we excluded those that were found to be differentially expressed in comparisons between one of the other subtypes and the remaining subtypes. The upregulated subtype-specific genes (log2 (fold change, FC) > 0 and FDR < 0.05) was ranked based on their log2FC and selected the top 500 genes for further gene screening. Secondly, the GEP of human CRC tissues versus patient-derived xenografts (PDX) in the GSE35144 dataset by the R package “limma” was used to remove those genes associated with stromal and immune components. DEGs with FDR > 0.5 and log2 FC < 1 between human CRC tissues versus PDX were considered as cancer cell-intrinsic genes. Thirdly, we also utilized human CRC cell lines to obtained cancer cell-intrinsic genes. A total of 71 human CRC cell lines with RNAseq data (log2TPM) was obtained from the Genomics of Drug Sensitivity in Cancer (GDSC) database (https://depmap.org/portal/download/all/), 43 of which had dose-response curve (area under the curve, AUC) values. The MSI status, FGA and TMB information of CRC cell lines was obtained from cbioportal website (https://www.cbioportal.org/study/summary?id=ccle_broad_2019). 
RNAseq data for the 71 human CRC cell lines were used to further refine the cancer cell-intrinsic genes, retaining genes among the top 25% with respect to (i) the 10-90% percentile range of expression values and (ii) the highest expression in at least three samples. The subtype-specific genes and cancer cell-intrinsic genes were intersected to generate the gene list for developing the pre-clinical model. The pre-clinical model was developed using the nearest template prediction (NTP) function of the R package “CMScaller”, which can be applied to cross-tissue and cross-platform predictions (Hoshida, 2010). The GEP (log2TPM) of the 71 human CRC cell lines, normalized by Z-score, was input into the pre-clinical model, and the cell lines were divided into four CCCRC subtypes. (Lines 1061-1088)
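
      The template-matching idea behind this step can be illustrated with a minimal Python sketch. It is not the CMScaller implementation: it assumes a simplified form of nearest template prediction in which each subtype template is a binary vector over the marker genes, and a Z-scored sample is assigned to the template with the highest cosine similarity. The gene names, marker sets, and expression values are hypothetical, and the permutation-based significance testing of NTP is omitted.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_template(sample, markers):
    """Assign a Z-scored sample (gene -> value) to the closest subtype template.

    markers maps each subtype to its set of upregulated marker genes.
    Simplified sketch: binary templates, no permutation test.
    """
    genes = sorted(set().union(*markers.values()))
    profile = [sample.get(g, 0.0) for g in genes]
    scores = {}
    for subtype, marker_set in markers.items():
        # Template is 1 for this subtype's markers and 0 for all other genes.
        template = [1.0 if g in marker_set else 0.0 for g in genes]
        scores[subtype] = cosine_similarity(profile, template)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical marker genes and one hypothetical Z-scored cell line.
markers = {
    "C1": {"geneA", "geneB"},
    "C2": {"geneC", "geneD"},
    "C3": {"geneE", "geneF"},
    "C4": {"geneG", "geneH"},
}
cell_line = {"geneA": -0.5, "geneB": -0.2, "geneC": 0.1, "geneD": -0.3,
             "geneE": -0.8, "geneF": -0.4, "geneG": 1.9, "geneH": 1.4}
predicted, similarity = nearest_template(cell_line, markers)
```

      In this toy input only the hypothetical C4 markers are expressed above the mean, so the C4 template is the nearest one.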

      We note that we changed from the xgboost algorithm to the NTP algorithm to build our pre-clinical model. Based on the genomic features of the cell lines, we evaluated the reliability of the final pre-clinical model and found that the model built using the NTP algorithm is more reliable. As expected, the C4 subtype cell lines demonstrated the highest TMB values and MSI frequency while exhibiting the lowest FGA scores when compared to other subtypes (Figure 6-figure supplement 1G-I). In contrast, C1 and C3 subtype cell lines showed significantly higher FGA scores and significantly lower TMB values and MSI frequency. The C2 subtype cell lines had intermediate FGA scores, TMB values, and MSI frequency. The pre-clinical model is publicly available at https://github.com/XiangkunWu/pre_clinical_model. (Lines 503-511)

      2) Similarly the description of the "Single-sample gene classifier" could be enhanced similarly with a better description of which genes are in the final classifier.

      We apologize for any confusion caused in our revised manuscript regarding the derivation of the CCCRC classifier. Specifically, we have added more details on the derivation of the model genes and the establishment of the model, and ensured the availability of the CCCRC classifier. The method details and results of deriving the model genes and building the model are described next. (Lines 1102-1121) (Lines 562-579) (Supplementary file 3c)

      To facilitate the widespread application of the CCCRC classification system, we established a simple gene classifier to predict CCCRC subtypes. Firstly, we filtered genes based on their mean expression and variance in the CRC-AFFY cohort, and genes with expression and variance in the bottom 25% were removed. Then, we applied the Random Forest algorithm (RF) in the R package "caret" to perform feature selection on the CCCRC subtype-specific genes of the CRC-AFFY cohort. The top 20 most informative features for each subtype were ranked and selected based on the impurity measure generated by the algorithm. This allowed us to identify critical genes that are strongly associated with each CCCRC subtype and develop the CCCRC classifier. Next, we randomly divided the CRC-AFFY cohort into training and validation sets at a ratio of 7:3 using the “createDataPartition” function provided in the R package "caret" (seed=123). The GEP was normalized with Z-scores prior to model training and validation. The CCCRC classifiers were trained with the top 80 subtype-specific genes using the RF, Support Vector Machine (SVM), eXtreme Gradient Boosting (xgboost), and Logistic Regression algorithms implemented in the R package "caret". Finally, we validated the CCCRC classifier on the GSE14333 and GSE17536 datasets, as well as the CRC-AFFY cohort. We evaluated the predictive performance of the CCCRC classifier using measures such as accuracy and F1 score, which were generated using the "confusionMatrix" function provided in the R package "caret". (Lines 1102-1121)
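
      For readers less familiar with the evaluation measures named above, the following Python sketch shows how overall accuracy and per-subtype F1 scores follow from true and predicted labels. It is an illustration only and does not reproduce the output of the caret "confusionMatrix" function; the label vectors are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), computed from label pairs."""
    pairs = Counter(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    result = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        denom = 2 * tp + fp + fn
        result[c] = 2 * tp / denom if denom else 0.0
    return result


# Hypothetical true and predicted CCCRC subtype labels for eight samples.
y_true = ["C1", "C1", "C2", "C2", "C3", "C3", "C4", "C4"]
y_pred = ["C1", "C2", "C2", "C2", "C3", "C3", "C4", "C1"]
acc = accuracy(y_true, y_pred)
f1 = f1_per_class(y_true, y_pred)
```

      Reporting F1 per subtype alongside accuracy helps when subtypes differ in size, because a classifier can reach high accuracy while performing poorly on a small subtype.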

      We established the CCCRC classifier on the training set by utilizing multiple machine learning algorithms based on the GEP of 80 upregulated subtype-specific genes (Supplementary file 3c). Upon application to the test set, GSE14333, and GSE17536 datasets, the performance of the eXtreme Gradient Boosting (xgboost) algorithm was the best with the highest accuracy values and F1 scores compared to the Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression algorithms (Figure 6-figure supplement 4). Notably, the CCCRC classifier based on the xgboost algorithm displayed robust performance across gene expression platforms, Affymetrix and RNA-sequencing platforms, exhibiting a balanced accuracy of > 80% for all subtypes (Supplementary file 3d). These findings demonstrated the stability and cross-platform applicability of our classifier. The CCCRC classifier based on the xgboost algorithm is publicly available at https://github.com/XiangkunWu/CCCRC_classifier, and the CCCRC subtype information of CRC patients can be obtained by directly inputting the GEP of 80 upregulated subtype-specific mRNA genes. The CCCRC classifier might facilitate the discovery of new biomarkers and the personalized treatment of clinical patients with CRC. (Lines 562-579)

      Reviewer #2 (Public Review):

      This study aimed to classify colorectal cancer (CRC) samples based on the expression of genes in selected gene lists, where the gene lists were chosen to represent aspects of the tumour microenvironment, tumour-associated immune cells, and tumour cells. The resulting clusters were then used to define a classifier, followed by a detailed description of molecular features of the tumours and tumour microenvironments assigned to each cluster. The authors claim this study is more "holistic" than previous work on CRC clustering/classifiers because they aimed to explicitly include additional components of the tumour microenvironment in both the clustering/classifier definition and in the subsequent description of molecular characteristics.

      The CCCRC clustering and the resulting classifier presented in this paper are derived from published RNAseq studies. The multi-omics aspect of the work is restricted to smaller sample numbers for which both transcriptomic and another omics dataset were available in public resources and comprises a description or correlative analysis of each omics data type within each of the assigned CCCRC subtypes.

      By applying solid computational methods to a compendium of published RNAseq datasets (n~1500 tumours), they found that tumour samples from colorectal cancers clustered into 4 subtypes ("CCCRC" subtypes) on the basis of 61 pre-defined gene expression signatures. These subtypes correlated with but did not correspond to, the previously described Consensus Molecular Subtypes (CMS) of colorectal tumours.

      Other types of molecular data were available for some tumours, obtained from the same published resources: whole-slide images, mutations, tumour proteomics, and/or scRNAseq. The authors reanalysed these datasets using standard methods and drew correlations with the CCCRC subtypes they had assigned in this work. To (semi-)quantify immune infiltration characteristics from whole-slide images (WSI), they additionally performed automated segmentation in addition to review by pathologists, which in combination produced a convincing WSI-derived dataset.

      In combination with existing CRC classifications, this study could facilitate future biomarker discoveries. This appears to be the authors' main claim, and the data and methods broadly support this claim.

      Thank you for taking the time to review our article and for your positive feedback. Your thorough evaluation of our work has been invaluable to us, and we appreciate your recognition of the effort we put into it.

      Some aspects of the work need to be clarified: 1) This work relies on the definition of 4 clusters of CRC tumours based on consensus clustering of the 61 gene lists, which in turn depends on the choice of clustering method and the choice of gene lists. Sufficient detail is provided about the gene lists and resulting clusters, but this paper does not show how robust the 4 clusters are to these choices; for example, the "Energy" gene list appears to be a relatively strong component of clusters C2 and C3.

      Thank you very much for providing such detailed and insightful feedback.

      1.1. The reviewer has raised a valid concern about the impact of gene list selection on the robustness of the clusters. To address this issue, we used the “pamr.predict” function of the R package “pamr” (Tibshirani et al., 2002) to extract centroids of each subtype that best represent each subtype and establish a PAMR classifier. PAM (Prediction Analysis of Microarrays) is a statistical technique to identify subsets of features that best characterize each class using nearest shrunken centroids (Tibshirani et al., 2002). The technique is general and can be used in many other classification problems. As shown in Figure 1-figure supplement 2E, a threshold of 0.566 with minimum 10-fold cross-validation error was selected to identify the 61 TME-related signatures that exhibit at least one non-zero difference between each subtype (seed = 11). These signatures were then used to construct a PAMR classifier with superior predictive capability, exhibiting an overall error rate of 15%. We used the established PAMR classifier to predict the CCCRC subtypes on the CRC-RNAseq cohort and the same four CCCRC subtypes were revealed, with similar patterns of differences in the TME components (Fig. S2F, G). This indicated that the 61 TME-related signatures best represent each subtype and are indispensable for achieving the identification of the four CCCRC subtypes. (Lines 161-168)
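
      The shrinkage step at the core of nearest shrunken centroids can be sketched in a few lines of Python. This is a simplified illustration, not the pamr implementation: it assumes the standardized differences between each subtype centroid and the overall centroid have already been computed, and it only applies soft thresholding and keeps signatures with at least one non-zero shrunken difference. The signature names and difference values are hypothetical; the threshold 0.566 is the one quoted above.

```python
def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values within delta of zero become 0."""
    magnitude = max(abs(d) - delta, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if d > 0 else -magnitude


def retained_features(differences, delta):
    """Keep features with at least one non-zero shrunken class difference.

    differences maps feature -> list of standardized differences between
    each class centroid and the overall centroid (one value per class).
    """
    kept = {}
    for feature, d_values in differences.items():
        shrunken = [soft_threshold(d, delta) for d in d_values]
        if any(s != 0.0 for s in shrunken):
            kept[feature] = shrunken
    return kept


# Hypothetical standardized differences for subtypes C1..C4.
differences = {
    "sig_T_cells": [-0.9, 0.2, -1.1, 1.6],
    "sig_noise": [0.3, -0.2, 0.1, -0.4],
}
kept = retained_features(differences, delta=0.566)
```

      A signature whose differences are all smaller in magnitude than the threshold is shrunk to zero for every subtype and therefore drops out of the classifier.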

      1.2. The reviewer has raised a valid concern about the impact of the clustering method selection on the robustness of the clusters.

      During our unsupervised clustering analysis, we explored the data extensively. This primarily involved trying various clustering methods, including K-means clustering, non-negative matrix factorization (NMF) clustering, and hierarchical clustering, as well as different sources and categories of TME-related signatures. To determine the optimal clustering method and TME panel, we evaluated whether the TME panel could reproduce the heterogeneity of the TME, the stability of the clustering itself, the biological characteristics of the subtypes, the correlation between subtypes and prognosis, and the correlation between subtypes and microsatellite instability (MSI), the consensus molecular subtypes (CMS) classification system, and other molecular subtype systems. Because these exploratory analyses produced a large number of results, we ultimately present only the best combination of clustering method and TME panel.

      1.3. We also performed a sensitivity analysis of the effect of the TME-related signatures on the clustering results. Because the effect of removing a single TME-related signature on the clustering results could not be evaluated well, we instead removed entire categories. We performed consensus clustering analysis again using the same parameters (partitioning around medoids (pam) clustering; "Pearson" distance; 1,000 iterations; from 2-6 clusters). When we conducted consensus clustering analysis using only immune-related signatures, we identified three subtypes: low (C2), moderate (C3), and high (C1) immune infiltration subtypes. When we included both immune-related and tumor-related signatures, we identified four subtypes: immunomodulatory (S1), cold (S2/S3), and immune-excluded (S4) subtypes. It appears that the immunosuppressed subtype in the CCCRC classification system may have been assigned to both the S1 and S4 subtypes. Limiting the consensus clustering analysis to only immune-related or immune- and stroma-related signatures, as done in previous studies (Bagaev et al., 2021; He et al., 2018), did not allow reliable identification of all four CCCRC subtypes. These sensitivity analyses underscored the necessity of our well-designed TME panel to achieve the identification of the four CCCRC subtypes. (Lines 172-176) (Figure 1-figure supplement 4)
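
      The consensus index underlying this kind of analysis can be sketched as follows. The Python example is a simplified illustration and not the procedure used in the study: it assumes the repeated clustering runs (for example, pam on resampled data) have already been carried out, and it only computes, for each pair of samples, the fraction of runs containing both samples in which they fall in the same cluster. Sample names and cluster labels are hypothetical.

```python
from itertools import combinations


def consensus_matrix(runs, samples):
    """Pairwise consensus index from repeated clustering runs.

    runs is a list of dicts (sample -> cluster label); a sample missing from
    a dict was not drawn in that resampling iteration.
    """
    together = {pair: 0 for pair in combinations(samples, 2)}
    sampled = dict.fromkeys(together, 0)
    for labels in runs:
        for a, b in together:
            if a in labels and b in labels:
                sampled[(a, b)] += 1
                if labels[a] == labels[b]:
                    together[(a, b)] += 1
    return {
        pair: (together[pair] / sampled[pair]) if sampled[pair] else 0.0
        for pair in together
    }


# Three hypothetical resampling iterations over four samples.
samples = ["s1", "s2", "s3", "s4"]
runs = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 1, "s2": 1, "s4": 0},
    {"s1": 0, "s2": 1, "s3": 1, "s4": 1},
]
consensus = consensus_matrix(runs, samples)
```

      Values close to 0 or 1 for most pairs indicate a stable partition, which is one way to compare candidate numbers of clusters or signature panels.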

      2) The authors examined whether their CCCRC classification showed differential disease progression in available retrospective cohorts of people treated with anti-PDL1 therapy. The authors presented this work as "significance of CCCRC in guiding the clinical treatment of colorectal cancer", but the data presented in this section cannot support clinical treatment decisions, which would require prospective studies and clinical trial designs. However, this section is potentially useful for generating hypotheses about potential biomarkers related to the CCCRC subtypes, and might, in the future with additional evidence, contribute to the design of a trial. The authors point out that additional experimental evidence would be required.

      Thank you for your constructive suggestions. We agree that our retrospective analysis of the CCCRC classification in relation to disease progression under immune checkpoint blockade treatment does not directly support clinical treatment decisions. We acknowledge that additional experimental evidence would be required to fully support the use of the CCCRC classification as a clinical tool for guiding treatment decisions. We have highlighted in the corresponding section of the article that this research is pre-clinical and still requires substantial basic experiments and clinical trials to validate. (Lines 536, 751)

      3) Other prognostic or predictive clinicopathological variables for colorectal cancer are not discussed in detail in the present work but are important for further work on the prognostic and predictive value of CRC molecular subtypes and biomarker derivation. Discrepancies in treatment response have previously been observed in separate CRC trials of biologically targeted agents with different chemotherapy backbones and other authors have suggested that treatment interactions with the tumour microenvironment might in part explain these discrepancies (e.g. Aderka (2019) PMID:31044725, and others).

      3.1) Other prognostic or predictive clinicopathological variables for colorectal cancer are not discussed in detail in the present work but are important for further work on the prognostic and predictive value of CRC molecular subtypes and biomarker derivation.

      Thank you for bringing up this point. We apologize for not analyzing other clinicopathological variables for colorectal cancer in more detail in our original work. We agree that these variables are important for further study of our CCCRC classification system to guide biomarker derivation and clinical treatment decisions. We have added to the article the relationship between CCCRC subtypes and clinicopathological variables, as well as the comparison with CMS subtypes (Lines 256-262, 661-666). In addition, we have identified a clerical error in our manuscript and have corrected it accordingly. Specifically, the use of PFS as the endpoint in some parts of the manuscript was a mistake and has been corrected to DFS. We would like to clarify that the endpoints for the CRC-AFFY and CRC-RNAseq cohorts are DFS and OS, while the endpoints for the GSE104645 dataset are PFS and OS. For the immune checkpoint blockade therapy cohorts, the endpoints for PRJEB23709 (Gide) are PFS and OS, and the endpoint for the GSE135222 (Jung) dataset is PFS. Progression-Free Survival (PFS) refers to the time from randomization (or treatment initiation) to the first occurrence of disease progression or death from any cause. Disease-Free Survival (DFS) is defined as the time from randomization to the appearance of evidence of disease recurrence.

      We further analyzed the association of CCCRC subtypes with clinicopathological characteristics (Supplementary file 1f, Supplementary file 1g). We found that the C4 subtype was mostly diagnosed in right-sided CRC lesions and in females, which was consistent with the CMS1 subtype. The C1 and C3 subtypes were mainly observed in left-sided CRC lesions and in males, consistent with the CMS2 and CMS4 subtypes. The C3 subtype was strongly associated with more advanced tumor stages, similar to the CMS4 subtype, while the C4 subtype was associated with higher histopathologic grade, similar to the CMS1 subtype. Furthermore, our analysis using the Kaplan-Meier method demonstrated that patients with the C4 subtype had significantly better disease-free survival (DFS) and overall survival (OS) than those with the C2 and C3 subtypes in the CRC-AFFY (Figure 1I, Figure 1-figure supplement 7A) and CRC-RNAseq cohorts (Figure 1-figure supplement 7B, C). Multivariate Cox proportional hazard regression analysis showed that the C4 subtype was an independent predictor of the best OS and DFS, whereas the C3 subtype was an independent predictor of the worst OS and DFS after adjustment for age, gender, tumor site, TNM stage, grade, adjuvant chemotherapy or not, MSI status, BRAF and KRAS mutations, and the CMS classification system in the combined cohort (the CRC-AFFY and CRC-RNAseq cohorts) (Supplementary file 1h). Considering that the C1, C2/C3, and C4 subtypes partially overlap with the CMS2, CMS4, and CMS1 subtypes, respectively, we also analyzed the prognostic differences between them in the combined cohort. 
We found that the DFS/OS of patients with the C1 subtype was worse than that of patients with the CMS2 subtype (Figure 1-figure supplement 7D, E), the DFS/OS of patients with the C2 subtype was better than that of patients with the CMS4 subtype (Figure 1-figure supplement 7F, G), the DFS/OS of patients with the C3 subtype was not significantly different from that of patients with the CMS4 subtype (Figure 1-figure supplement 7F, G), and the DFS/OS of patients with the C4 subtype was significantly better than that of patients with the CMS1 subtype (Figure 1-figure supplement 7H, I). Notably, the C2 subtype within the CMS4 subtype also had a better prognosis than the C3 subtype within the CMS4 subtype (Figure 1-figure supplement 7J, K). The above analysis demonstrated that the CCCRC classification system was closely associated with clinicopathological characteristics, was able to refine the CMS classification system and MSI status, and contributed to the understanding of the mechanisms underlying the different clinical phenotypes resulting from TME heterogeneity.
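
      The Kaplan-Meier estimate behind the DFS and OS comparisons above can be sketched in plain Python. This is a generic textbook illustration under stated assumptions, not the analysis code of the study: follow-up times and event indicators are hypothetical, and no confidence intervals or log-rank test are computed.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event (e.g. recurrence or death) was observed at
    times[i], and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


# Hypothetical follow-up times (months) and event indicators for five patients.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

      Censored patients reduce the number at risk without lowering the survival estimate, which is why DFS and OS curves can differ for the same cohort depending on which events are counted.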

      3.2) Discrepancies in treatment response have previously been observed in separate CRC trials of biologically targeted agents with different chemotherapy backbones and other authors have suggested that treatment interactions with the tumour microenvironment might in part explain these discrepancies (e.g. Aderka (2019) PMID:31044725, and others).

      The reviewer's comments greatly contributed to the quality of our study. Aderka et al. discussed the reasons for the differences in the results of the CALGB/SWOG 80405 and FIRE-3 clinical trials, which may be related to differences in the chemotherapy backbone used and TME heterogeneity (Aderka et al., 2019). Both trials evaluated the combination of cetuximab or bevacizumab with a different chemotherapy backbone: in the CALGB/SWOG 80405 trial, 75% of patients received oxaliplatin, while in the FIRE-3 trial, all patients received irinotecan. The CCCRC classification system also facilitates the understanding of the differences in the results of the CALGB/SWOG 80405 and FIRE-3 clinical trials (Heinemann et al., 2014; Lenz et al., 2019). We have added this content to the discussion section of the article (Lines 753-777). Based on our examination of the results summarized in Figure 4 of the work by Aderka et al. (Aderka et al., 2019), we found that differences in the treatment outcomes of the CMS1 and CMS4 subtypes were the crucial factor behind the divergent results observed in the two clinical trials. The CMS1 and CMS4 subtypes have a microenvironment rich in CAFs. Our CCCRC classification results also showed that CMS1, in addition to mainly consisting of the C4 subtype, also contains a considerable number of the C2 subtype, while the CMS4 subtype mainly consists of the C2 and C3 subtypes. Furthermore, our study results indicated that the C2 subtype is suitable for chemotherapy in combination with bevacizumab, possibly because the combination can inhibit the CAFs and abnormal blood vessel formation in the microenvironment, thus alleviating the immune suppression of the immune cells. However, the C3 subtype is not suitable for chemotherapy in combination with bevacizumab because it only accumulates CAFs and abnormal blood vessel formation but lacks T cell infiltration. 
Therefore, we speculate that the CMS1 and CMS4 subtypes in the CALGB/SWOG 80405 clinical trial may have contained more C2 subtype tumors than those in the FIRE-3 clinical trial, which would make chemotherapy in combination with bevacizumab, rather than cetuximab, more suitable for the CMS1 and CMS4 subtypes in the CALGB/SWOG 80405 trial than in the FIRE-3 trial. Overall, the integration of the CCCRC and CMS classification systems provides valuable insights for understanding the divergent outcomes of the two clinical trials (Lines 753-777).

      Reviewer #3 (Public Review):

      In their study: Comprehensive characterization of tumor microenvironment in colorectal cancer via histopathology-molecular analysis, Wu et al., aim to examine the contribution of the tumour microenvironment (TME) on biological and clinical heterogeneity in colorectal cancer (CRC).

      To achieve this the authors use a vast array of publicly available datasets across a variety of biological modalities (transcriptomic, epigenetic, mutational). Using thoughtfully curated genesets the authors classify CRC into 4 holistic comprehensive characterised CRC (CCCRC) subtypes which comprise immune, stromal, and tumour features of CRC biology.

      The authors investigate the association of their novel CCCRC subtypes with current "gold standard" classification schemes.

      The authors' integration of deep learning methods for HE classification and subsequent association with "Tumor level" CCCRC subtypes is a refreshing addition to the study. Comment on the degree of heterogeneity observed in HE samples and correlation to the heterogeneity of CCCRC subtypes would be a welcomed addition. It is likely publicly available datasets from such platforms as 10X Genomic Visium would be available for this type of analysis.

      Whilst one of the main outcomes of the study is the addition of another classification scheme to the field of colorectal cancer, the CCCRC scheme represents a holistic perspective on CRC classification.

      The authors provide a welcomed graphical overview of the complex narrative of the study in Figure 7.

      The authors focus on the classification of inter-patient heterogeneity and its associated predictive and prognostic utility. There appears to be a significant degree of overlap between immunosuppressive and immune excluded, and proliferative and immuno-modulatory signatures in Figure 1A. One of the major limitations of patient response to treatment is intra-patient heterogeneity, it would be nice for the authors to comment briefly on the degree of intra-patient heterogeneity of the CCCRC subtypes.

      Overall the authors succeed in providing a holistic deep characterization of CRC from the perspective of a variety of biological modalities. The authors provide a novel classification scheme for the field of CRC which demonstrates prognostic and predictive utility, which would benefit from further validation from external datasets. The authors demonstrate a pathway for integration and interpretation of complex high-dimensional data into clinically translatable currency such as the H&E.

      Thank you for taking the time to review our article and for your positive feedback. Your thorough evaluation of our work has been invaluable to us, and we appreciate your recognition of the effort we put into it.

      1) Comment on the degree of intra-patient heterogeneity of CCCRC subtypes would be nice.

      We have added intra-tumor heterogeneity analysis for each subtype (Lines 196-198). The level of intratumor heterogeneity (ITH) was significantly linked to poor prognosis and drug resistance (Caswell and Swanton, 2017). The ITH data used in our study for the CRC-RNAseq cohort was obtained from a previous study conducted by Thorsson et al. (Thorsson et al., 2018). As expected, the ITH of the C2 and C3 subtypes was higher than that of the other subtypes, while the ITH of the C4 subtype was the lowest (Figure 1F). Our analysis using the Kaplan-Meier method demonstrated that patients with the C4 subtype had significantly higher overall survival (OS) and disease-free survival (DFS) compared to those with the C2 and C3 subtypes. Furthermore, the C3 subtype was resistant to chemotherapy, cetuximab, bevacizumab, and ICB therapy. Our investigation of drug sensitivity data of cell lines also indicated that the C2 and C3 subtypes were generally not responsive to most drugs.

      2) A significant degree of overlap between immunosuppressive and immune excluded, and proliferative and immuno-modulatory signatures in Figure 1A is apparent and should be commented upon.

      Our research revealed that both C2 and C3 subtypes exhibited a high level of tumor stroma, while C1 and C4 subtypes were characterized by active DNA damage and repair and high tumor proliferation. Additionally, C2 and C4 subtypes had an abundance of immune components. This was consistent with our finding that there may be interconversion between the C1 and C4 subtypes, between the C4 and C2 subtypes, and between the C2 and C3 subtypes in this evolutionary pattern. The interconversion between C2 and C4 subtypes in this evolutionary pattern was the rarest situation, indicating that once the tumor enters the C2 subtype, it is difficult to reverse and will progress to the C3 subtype. (Lines 637-644)

      3) It is likely publicly available datasets from such platforms as 10X Genomic Visium would be available for this type of analysis.

      To investigate the spatial distribution relationship between the four CCCRC subtypes of tumor cells, T cells, and stromal cells, we conducted a re-analysis of publicly available CRC spatial transcriptomics (ST) data obtained from the 10X website (https://www.10xgenomics.com/resources/datasets). The Space Ranger output files were then processed with Seurat (V4.1.1) (Hao et al., 2021) using SCTransform for normalization (Hafemeister and Satija, 2019). RunPCA was used for dimensionality reduction and RunUMAP to visualize the data. We used the “ssGSEA” method implemented in the R package “GSVA” to score the six cell types (C1-C4 subtype cancer cells, mesenchymal cells, and T cells) (Hänzelmann et al., 2013). The “ssGSEA” method has been previously demonstrated to be highly reliable and suitable for ST data analysis (Wu et al., 2022). A cell-type-rich region was defined as the set of spots whose ssGSEA score for that cell type was larger than the 75% quantile of the scores for that cell type. The markers for the six cell types are listed in Supplementary file 1a and Supplementary file 3a. (Lines 1090-1102)
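
      The definition of a cell-type-rich region can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the code used in the study: the spot identifiers and ssGSEA scores are hypothetical, and the quantile interpolation method ("inclusive") is an assumption made for this example.

```python
import statistics


def rich_spots(scores, n_quantiles=4):
    """Return the spots whose score exceeds the upper quartile.

    scores maps spot identifier -> ssGSEA score for one cell type.
    """
    # With n=4, statistics.quantiles returns the three quartile cut points;
    # the last one is the 75% quantile.
    q75 = statistics.quantiles(scores.values(), n=n_quantiles,
                               method="inclusive")[-1]
    return sorted(spot for spot, s in scores.items() if s > q75), q75


# Hypothetical ssGSEA scores of one cell type across six spots.
t_cell_scores = {
    "spot1": 0.1, "spot2": 0.4, "spot3": 0.2,
    "spot4": 0.9, "spot5": 0.8, "spot6": 0.3,
}
t_cell_rich, threshold = rich_spots(t_cell_scores)
```

      Applying the same function to each of the six cell types yields one set of rich spots per cell type, and these sets may overlap when two cell types are enriched in the same spot.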

      The Cytassist and Visium samples had a total of 9080 and 2660 spots, respectively. We used the “ssGSEA” method to quantify the six cell subpopulations of each spot and also visualized only the spots corresponding to the top 25% of the score ranking for each cell type (Figure 6-figure supplement 2AB, Figure 6-figure supplement 3AB). In the Cytassist sample, we observed different spatial distribution patterns of the four subtypes of tumor cells (Figure 6-figure supplement 2B). Specifically, the C3 subtype of tumor cells was predominantly located in the tumor periphery with an enrichment of mesenchymal cells and T cells (areas selected by black dashed circles). In contrast, the C4 subtype of tumor cells was mainly present in the center of the tumor, accompanied by the presence of T cells. The C1 and C2 subtypes of tumor cells were distributed in relatively uniform areas, mainly in the tumor periphery, with fewer mesenchymal cells and T cells. However, the distribution areas of the C2 and C3 subtypes of tumor cells also partially overlapped (the area selected by red dashed circles). The same distribution patterns can also be observed in the Visium sample (Figure 6-figure supplement 3B). We further analyzed the correlation between the ssGSEA scores of each cell type in the cell-type-rich regions and those of other cell types (Figure 6-figure supplement 2D, E, Figure 6-figure supplement 3D, E). We found that in the C3 subtype-rich region of tumor cells, the C3 subtype score of tumor cells was significantly positively correlated with the mesenchymal cell score, while in the T cell-rich region, the C3 subtype score of tumor cells was significantly negatively correlated with the T cell score. The C4 subtype score of tumor cells was significantly positively correlated with the T cell score and negatively correlated with the mesenchymal cell score in the C4 subtype-rich, T cell-rich, and mesenchymal cell-rich regions. 
The C1 subtype and C2 subtype scores of tumor cells were negatively correlated with mesenchymal cell and T cell scores. Overall, these results were generally consistent with previous histopathologic analysis findings. (Lines 538-562)

    1. Author Response

      Reviewer #1 (Public review):

      1.0) This paper investigates the metabolic basis of a node, posterior cingulate cortex (PCC), in the default mode network (DMN). They employed sophisticated MRI-PET methods to measure both BOLD and CMRglc changes (both magnitude and dynamics) during attention-demanding and working memory tasks. They found uncoupling of BOLD and CMRglc in PCC with these different tasks. The implications of these findings are poorly interpreted, with a conclusion that is purely based on other work independent of this study. Various suggestions could allow them to place some speculations in line with a stronger interpretation of their results.

      This is one of several papers in recent years investigating the metabolic underpinnings of activated (or task-positive) and deactivated (or task-negative) cortical areas in the human brain. In this study, they used BOLD fMRI and glucose PET scans to examine the metabolic distinction of the default mode network (DMN), which is known to be deactivated during attention-demanding tasks, with different types of cognitively demanding tasks. Unlike the BOLD response in posteromedial DMN, which is consistently negative, they found that CMRglc of the posteromedial DMN (a task-negative network) is dependent on the metabolic demands of adjacent task-positive networks like the dorsal attention network (DAN) and frontoparietal network (FPN). With attention-demanding tasks (like Tetris) the BOLD and CMRglc are both downregulated in DMN (specifically the posterior cingulate cortex, PCC, a task-negative node of DMN), but working memory induces a CMRglc increase in PCC, which is decoupled from the negative BOLD response in PCC.

      We thank the reviewer for the constructive feedback and the possibility to improve our manuscript. We agree that the interpretation of the results should be strengthened to provide a stronger focus on our data. Regarding the uncoupling of BOLD and CMRGlu during working memory, we acknowledge the need to further elaborate on this topic in our discussion. These suggestions and comments have been incorporated into the revised manuscript as outlined below.

      1.1) These complicated results are the main findings, and to provide a biological basis to these data they rather surprisingly, but without their own experimental evidence, conclude that the negative BOLD and negative CMRglc in PCC during attention-demanding tasks is due to decreased glutamate signaling (which was not measured in this study) and the negative BOLD and positive CMRglc in PCC during working memory is due to increased GABAergic activity (which was not measured in this study). It is rather surprising that without measurement, a conclusion is made which would at best be considered a hypothesis to be tested. Thus, independent of these hypothesized mechanisms, they need to summarize their results based on their own measurements in this study (see 3 for a hint).

      Thank you for bringing up this point and for the insightful suggestion concerning point 3. We have now explicitly stated that the interpretation regarding glutamate and GABAergic signaling is of speculative nature, as these were not measured in the current work. Moreover, we have substantially reduced this section. As such, we agree with the reviewer that this represents an interesting hypothesis to be tested in future work. For further details please see response to comments 1.3 and 1.4.

      Discussion, page 16, line 341:

      On the neurotransmitter level, one of the current hypotheses regarding BOLD deactivations proposes that CMRO2 and CBF are affected by the balance of the excitatory and inhibitory neurotransmitters, specifically GABA and glutamate (Buzsáki et al., 2007; Lauritzen et al., 2012; Sten et al., 2017). In the PCC, glutamate release prevents negative BOLD responses (Hu et al., 2013), whereas a lower glutamate/GABA ratio is associated with greater deactivation (Gu et al., 2019). As glutamate elicits proportional glucose consumption (Lundgaard et al., 2015; Zimmer et al., 2017), decreases in glutamate signaling in the pmDMN could indeed explain both, the decreased BOLD response and decreased CMRGlu during the Tetris® task. Conversely, increased GABA supports a negative BOLD response in the PCC (Hu et al., 2013), as do working memory tasks (Koush et al., 2021) and pharmacological stimulation with GABAergic benzodiazepines (Walter et al., 2016). In consequence, the observed dissociation between BOLD changes and CMRGlu during working memory could indeed result from metabolically expensive (Harris et al., 2012) GABAergic suppression of the BOLD signal (Stiernman et al., 2021). However, we need to emphasize that glutamate and GABAergic signaling was not measured in the current study, thus, the above interpretations are of speculative nature. Nonetheless, future work may test this promising hypothesis, e.g., using pharmacological alteration of GABAergic and glutamatergic signaling or optogenetic approaches modulating GABAergic interneuron activity.

      Furthermore, to maintain a more concise discussion that is closer aligned with the measured results, we have removed the following paragraph:

      Discussion, page 15, line 309:

      The associations of these metabolic demands between the DMN and task-positive networks is also reflected in their distance along a connectivity gradient, which is hierarchically organized from unimodal sensory/motor to complex associative functions and the DMN being at the end of the processing stream (Margulies et al., 2016; Smallwood et al., 2021). A corresponding decrease in pmDMN glucose metabolism was observed for tasks that activate unimodal networks and the DAN, but not for the FPN. The inverse influence of attention and control networks on the pmDMN may therefore suggest that connectivity gradients are supported by the underlying energy metabolism.

      1.2) It is mentioned that the FDG-PET scans allow quantitative CMRglc, both in terms of units of glucose use but also with high time resolution. Based on the method described, it isn't clear how this is possible. Important details of either prior work or their own work have been excluded that show how the time course of CMRglc (regardless of whether it's absolute or relative) can be compared with the BOLD time course. Furthermore, it is extremely difficult to conceive that quantitative CMRglc can be estimated without additional measurements (e.g., blood samples, etc). Significant methodological details have to be provided, which even should make their way to results given the importance of their BOLD-CMRglc coupling and decoupling in the same region.

      We thank the reviewer for this important comment and apologize for the lack of clarity. We would like to emphasize that in the current work only spatial patterns of CMRGlu and BOLD signal changes were compared, but not the time course of these signals. The manuscript was edited throughout to clarify this point.

      Introduction, page 5, line 110:

      Studies using simultaneous fPET/fMRI have shown a strong spatial correspondence between the BOLD signal changes and glucose metabolism in several task-positive networks and across various tasks requiring different levels of cognitive engagement (Hahn et al., 2020, 2016; Jamadar et al., 2019; Rischka et al., 2018; Stiernman et al., 2021; Villien et al., 2014).

      Introduction, page 5, line 123

      Specifically, it is unknown whether the observed dissociation between patterns of metabolism and BOLD changes in the DMN generalizes for complex cognitive tasks, and whether this in turn depends on the brain networks supporting the task performance and their interaction with the DMN.

      Results, page 7, line 143:

      From this dataset (DS1) we evaluated the spatial overlap of negative task responses in the cerebral metabolic rate of glucose (CMRGlu quantified with the Patlak plot) and the BOLD signal specifically in the pmDMN. […] After that, the distinct spatial activation patterns across different tasks were used to quantitatively characterize the CMRGlu response of the pmDMN in DS1.

      The method of functional PET (fPET) imaging indeed enables the evaluation of changes in glucose metabolism with a relatively high temporal resolution. That is, a conventional bolus application and subsequent quantification yield a single CMRGlu image per scan of about 60 min (typical frame length ~1-5 min) or a single SUV image from a static scan. In contrast, the constant infusion employed in fPET allows to assess baseline metabolism and changes induced by different tasks in a single scan by using a frame length currently down to 6-30 s (Rischka et al., 2018), where the latter was also used in the current study. A general description of the fPET approach is now also included in the manuscript.

      Introduction, page 5, line 99:

      In this context, functional PET (fPET) imaging represents a promising approach to investigate the dynamics of brain metabolism. fPET refers to the assessment of stimulation-induced changes in physiological processes such as glucose metabolism (Villien et al., 2014; Hahn et al., 2016) and neurotransmitter synthesis (Hahn et al., 2021) in a single scan. The temporal resolution of this approach of 6-30 s (Rischka et al., 2018) is considerably higher than that of a conventional bolus administration. This is achieved through the constant infusion of the radioligand, thereby providing free radioligand throughout the scan that is available to bind according to the actual task demands. Here, the term “functional” is used in analogy to fMRI, where paradigms are often presented in repeated blocks of stimulation, which can subsequently be assessed by the general linear model.

      Regarding the absolute quantification of CMRGlu, arterial blood samples were obtained from all subjects of DS1. These were used for absolute quantification of CMRGlu with the Patlak plot. Full details were already provided in the methods section and are now also mentioned in the results.

      Results, page 7, line 140:

      Simultaneous fPET/fMRI data and arterial blood samples were acquired from 50 healthy participants during the performance of the video game Tetris®, a challenging cognitive task requiring rapid visuo-spatial processing and motor coordination (Hahn et al., 2020; Klug et al., 2022). From this dataset (DS1) we evaluated the spatial overlap of negative task responses in the cerebral metabolic rate of glucose (CMRGlu quantified with the Patlak plot) and the BOLD signal specifically in the pmDMN.

      Methods, page 19, line 399:

      For glucose metabolism, these changes are absolutely quantified in μmol/100g/min with the arterial input function and the Patlak plot.

      Methods, blood sampling, page 24, line 536:

      Before the PET/MRI scan blood glucose levels were assessed as triplicate (Gluplasma). During the PET/MRI acquisitions manual arterial blood samples were drawn at 3, 4, 5, 14, 25, 36 and 47 min after the start of the radiotracer administration (Rischka et al., 2018). From these samples whole-blood and plasma activity were measured in a gamma counter (Wizard2, Perkin Elmer). The arterial input function was obtained by linear interpolation of the manual samples to match PET frames and multiplication with the average plasma-to-whole-blood ratio.

      Methods, cerebral metabolic rate of glucose metabolism, page 25, line 561:

      Quantification was carried out with the Patlak plot (t* fixed to 15 min) and the influx constant Ki was converted to CMRGlu as CMRGlu = Ki * Gluplasma / LC * 100 with LC being the lumped constant = 0.89 (Graham et al. 2002, Wienhard 2002).
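
      The quantification step quoted above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `patlak_ki` and `cmrglu` are hypothetical, the time-activity curves are assumed to be sampled at common time points in minutes, and Ki is assumed to be in mL/cm³/min with plasma glucose in mmol/L. Only t* = 15 min, LC = 0.89 and the conversion formula are taken from the text.

```python
import numpy as np

def patlak_ki(t_min, tissue, plasma, t_star=15.0):
    """Estimate the influx constant Ki as the slope of the Patlak plot.

    Only samples with t >= t_star (minutes) enter the linear fit.
    """
    t = np.asarray(t_min, dtype=float)
    ct = np.asarray(tissue, dtype=float)
    cp = np.asarray(plasma, dtype=float)
    # Cumulative trapezoidal integral of the plasma input function.
    integral = np.concatenate(([0.0], np.cumsum(np.diff(t) * (cp[1:] + cp[:-1]) / 2.0)))
    late = t >= t_star
    x = integral[late] / cp[late]   # "stretched time"
    y = ct[late] / cp[late]         # tissue-to-plasma ratio
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

def cmrglu(ki, glu_plasma, lumped_constant=0.89):
    """CMRGlu = Ki * Gluplasma / LC * 100, as given in the quoted Methods."""
    return ki * glu_plasma / lumped_constant * 100.0
```

      With synthetic curves that follow the Patlak model exactly, the fitted slope recovers the Ki used to generate them, which is a quick sanity check before applying such a function to measured data.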

      1.3) It is surmised that the glutamatergic/GABAergic involvement of these metabolic differences in PCC is from another study, but what mechanism causes the BOLD signal to decrease in both stimuli? This is where the authors have to divulge the biophysical basis of the BOLD response. At the most basic level, the BOLD signal change (dS) can be positive or negative depending on the degree of coupling with changed blood flow (dCBF) and oxidative metabolism (dCMRO2) from resting condition. Unfortunately, neither CBF nor CMRO2 was measured in this study. In the absence of these additional measurements, the authors should at least discuss the basis of the BOLD response with regard to CBF and CMRO2. If we assume that both attention-demanding and working memory tasks decreased BOLD response in PCC in the same way, we have identical dCBF/dCMRO2 in PCC with both tasks, i.e., their results seem to suggest an alteration in aerobic glycolysis with different tasks. With attention-demanding tasks, CMRglc decreases similarly to CMRO2 decreases in PCC, whereas with working memory tasks, CMRglc increases differently from CMRO2 decreases. This suggests that the oxygen-to-glucose index (OGI = CMRO2/CMRglc) would rise in PCC with attention-demanding tasks, but fall in PCC with working memory tasks. This is obviously an implication rather than a conclusion as CBF or CMRO2 were not measured.

      1.4) Given the missing attention to what gives rise to the BOLD contrast mechanism, it is almost necessary to discuss the biophysical basis of BOLD contrast and specifically how metabolic changes have been linked to both increases and decreases in neuronal activity in the past. Although this type of work has largely been conducted in animal models, it seems that this topic needs to be discussed as well.

      We would like to thank the reviewer for sharing these insightful ideas and for bringing up these aspects that indeed appear to be essential for the manuscript. Since the points 1.3. and 1.4 complement each other, we have combined them and created a shared response. To fully address the points, the following paragraphs were added to the manuscript.

      Discussion, page 15, line 310:

      Metabolic and neurophysiological considerations

      The distinct relationships between BOLD and CMRGlu signals that emerge during specific tasks highlight the different physiological processes contributing to neuronal activation of cognitive processing (Goyal and Snyder, 2021; Singh, 2012). While CMRGlu measured by fPET provides an absolute indicator for glucose consumption, the BOLD signal reflects deoxyhemoglobin concentration, which depends on various factors, such as cerebral blood flow (CBF), cerebral blood volume (CBV) and the cerebral metabolic rate of oxygen (CMRO2) (Goense et al., 2016). In simple terms, the BOLD signal relates to the ratio of ∆CBF/∆CMRO2. Assuming that the observed BOLD decreases during Tetris® and WM emerge from the same mechanisms, this would result in a comparable ∆CBF/∆CMRO2 in the pmDMN for both tasks. Given that these types of tasks (external attention and cognitive control) elicit a reduction in CBF in the pmDMN (Shulman 97, Zou 2011), CMRO2 also decreases albeit to a lesser extent (Raichle 2001). Therefore, the respective metabolic processes can be described by their oxygen-to-glucose index (OGI), the ratio of CMRO2/CMRGlu. Accordingly, our results suggest two distinct pathways underlying BOLD deactivations in the pmDMN that differ regarding their OGI. During Tetris® there is a BOLD deactivation with a high OGI, resulting from a larger decrease in CMRGlu than CMRO2. This metabolically inactive state is in line with electrophysiological recordings in humans (Fox et al., 2018) and in non-human primates showing a decrease of neuronal activity in the pmDMN that covaries with the degree of exteroceptive vigilance (Shmuel et al., 2006; Bentley et al., 2016; Hayden et al., 2009). Therefore, we suggest that the negative BOLD response during external tasks reflects a reduction of neuronal activity and their respective metabolic demands. 
      On the other hand, the relatively increased CMRGlu without the corresponding surge in CMRO2 hints at another kind of BOLD deactivation with a low OGI in the pmDMN during working memory, indicating energy supply by aerobic glycolysis (Vaishnavi et al., 2010; Blazey et al., 2019). Previous work in non-human primates has indeed suggested a differential coupling of neuronal activity to hemodynamic oxygen supply in this region (Bentley et al., 2016). Furthermore, tonic suppression of PCC neuronal spiking during task performance was punctuated by positive phasic responses (Hayden et al., 2009), which could indicate differences between both tasks also at the level of electrophysiologically measured activity.
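
      The OGI argument above is simple arithmetic on relative changes and can be made concrete with a short sketch. The fractional changes below are hypothetical numbers chosen only to illustrate the direction of the effect; CMRO2 was not measured in the study, and the function name `ogi_ratio` is an assumption of this example.

```python
def ogi_ratio(d_cmro2, d_cmrglu):
    """Task OGI divided by baseline OGI, with OGI = CMRO2 / CMRGlu.

    Arguments are fractional changes from baseline (e.g. -0.10 = 10% decrease).
    """
    return (1.0 + d_cmro2) / (1.0 + d_cmrglu)

# Hypothetical values: CMRO2 falls slightly in both tasks.
# External attention: CMRGlu falls more than CMRO2, so OGI rises above baseline.
ogi_attention = ogi_ratio(d_cmro2=-0.05, d_cmrglu=-0.10)
# Working memory: CMRGlu rises while CMRO2 falls, so OGI drops below baseline.
ogi_working_memory = ogi_ratio(d_cmro2=-0.05, d_cmrglu=0.10)
```

      A ratio above 1 corresponds to the "high OGI" deactivation described for Tetris®, and a ratio below 1 to the "low OGI" deactivation described for working memory.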

      Reviewer #2 (Public Review):

      2.0) This paper provides an important and insightful investigation into patterns of activations that emerge in external task states. The authors use state-of-the-art methods and novel analytic approaches to establish that deactivations in the default mode network during external tasks are driven by activity in brain regions that are important in the current tasks (such as the visual or dorsal attention networks). It will be important in the future to understand whether this is a symmetrical phenomenon by studying this behaviour in states that maximize activity within the default mode network and also drive reductions in networks that are not relevant to these situations.

      We thank the reviewer for the encouraging feedback and the constructive comments on our manuscript. We particularly appreciate the interest in the research and the insightful suggestions for future work.

      Reviewer #3 (Public Review):

      3.0) The authors report a study where, using multiple datasets with [18F]FDG PET bolus + continuous infusion ("functional PET") and BOLD fMRI data, they re-evaluate the metabolic and hemodynamic properties of the default mode network (DMN) in a task-evoked context, with a focus on posteromedial DMN due to its relevance for across-network integration. They show how posterior DMN is differently engaged depending on the chosen task: while visual and motor tasks lead to BOLD deactivations and glucose metabolic decrease, specifically in the dorsal posterior cingulate cortex (PCC) area, working memory tasks produce BOLD deactivations but metabolic increases, specifically in ventral PCC, as shown in their previous paper (Stiernman et al. 2021, https://doi.org/10.1073/pnas.2021913118). This aims to solve the controversies elicited by findings of both increased and decreased glucose consumption in the presence of BOLD deactivation in the DMN.

      Additionally, they show how task-evoked glucose metabolism in posterior DMN seems to be shaped by that of the corresponding task-positive networks, with a positive link with dorsal attention and a negative link with frontoparietal network metabolism. This is explored using a type of directional connectivity analysis called "metabolic connectivity mapping", drawn from their previous work (Riedl et al. 2016, https://doi.org/10.1073/pnas.1513752113; Hahn et al. 2020, https://doi.org/10.7554/eLife.52443). They go on to speculate that concomitant BOLD deactivation and reductions in glucose expense might relate to decreased glutamatergic signaling, while BOLD deactivations accompanied by increased glucose consumption might depend on increased GABAergic neuronal activity.

      This is a relevant topic because it not only shows how the DMN is flexibly engaged in different tasks but also allows us to better understand the complex relationships between BOLD fMRI and [18F]FDG PET signals, which are still not fully characterized to this day. Of course, while in resting state the situation is further complicated by the more uncertain physiological meaning of the resting BOLD signal, task-evoked states are expected to provide a more interpretable intermodal link between metabolism and hemodynamics, due to the known major changes in blood flow, blood volume, and glucose metabolism - which underlie BOLD and [18F]FDG signal changes - in response to neural activation. However, even in task states, there is not always a strong association between the two responses, as previously shown by the authors themselves (Rischka et al. 2018, https://doi.org/10.1016/j.neuroimage.2018.06.079). This is something I think the authors should stress a little more, as they have previously done (Rischka et al. 2018, https://doi.org/10.1016/j.neuroimage.2018.06.079), both in the introduction and in reference to Figure 1, which shows clear differences between BOLD and [18F]FDG activations/deactivations (e.g., widespread negative responses in the cerebellum for [18F]FDG).

      Overall, the analyses reported in the manuscript are simple and seem mostly sound, drawing from well-established methods in PET and fMRI activation studies, with additional approaches previously developed by some of the authors themselves (e.g., "metabolic connectivity mapping", Riedl et al. 2016, https://doi.org/10.1073/pnas.1513752113). Moreover, a clear strength of the paper is the high number of subjects, at least from a PET perspective, i.e., n = 50 for the Tetris task, plus group averages of previously published data for working memory (Stiernman et al. 2021, https://doi.org/10.1073/pnas.2021913118) and motor tasks (Hahn et al. 2018, https://doi.org/10.1007/s00429-017-1558-0).

      The conclusions are in line with the results, and, though a little speculative, are potentially relevant for further exploration aimed at characterizing the neurotransmitter pathways underlying positive and negative BOLD and [18F]FDG responses. Moreover, the language is sufficiently clear to allow a proper understanding of the aims and the results, as well as the details of the analyses. As a side note, the title should probably be adjusted to "Task-evoked metabolic demands of the posteromedial default mode network are shaped by dorsal attention and frontoparietal control networks", to emphasize that the findings do not necessarily generalize to the resting state.

      In conclusion, I am overall quite positive about this manuscript, which seems to nicely position itself within the existing literature, making some additional contributions.

      We thank the reviewer for the thorough evaluation and the positive feedback on our manuscript, and we appreciate the constructive and insightful suggestions. We agree that the differential spatial patterns of activation between the BOLD signal and CMRGlu response require further attention. To address this point in more detail, we have added the following information to the manuscript.

      Introduction, page 5, line 110:

      Studies using simultaneous fPET/fMRI have shown a strong spatial correspondence between the BOLD signal changes and glucose metabolism in several task-positive networks and across various tasks requiring different levels of cognitive engagement (Hahn et al., 2020, 2016; Jamadar et al., 2019; Rischka et al., 2018; Stiernman et al., 2021; Villien et al., 2014). […]. However, also regional differences in activation patterns have been observed previously between these modalities in these and previous studies (Wehrl et al., 2013). Moreover, a dissociation between BOLD changes (negative) and glucose metabolism (positive) has recently been observed even in the same region of the DMN during working memory (Stiernman et al., 2021), namely the posteromedial default mode network (pmDMN).

      Results, caption Figure 1, page 8, line 173

      White clusters represent the intersection of significant CMRGlu and BOLD signal changes, irrespective of direction. Note, that also relevant differences between both imaging parameters can be observed, such as decreased CMRGlu in the cerebellum (in both datasets), without changes in the BOLD signal.

      We appreciate the reviewer’s proposal for the title as it raises awareness that the activation patterns reflect task-specific inference.

      Title:

      Task-evoked metabolic demands of the posteromedial default mode network are shaped by dorsal attention and frontoparietal control networks

      We have limited the discussion of underlying neurotransmitter effects and explicitly mention that these are of speculative nature. For manuscript adaptation on this point, we would like to refer to points 1.1, 1.3, 1.4 that address this topic as well.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors assessed the association between exposures and obesity by environment-wide and epigenome-wide association studies. The strength of this study is that exposures, body mass index, and waist-hip ratio were measured three times from adolescence to early adulthood, and the associations were repeatedly evaluated. A weakness of this study is that a loose significance threshold was used for the epigenome-wide association study and only a small number of study subjects were measured in early adulthood. Since this is an observational study, the confounding effect should be considered when interpreting the exposures associated with obesity reported in this study.

      Thank you very much for your positive comments and helpful suggestions. We agree that the study has the limitation of the loose significance threshold used for the epigenome-wide association study and the limited sample size in early adulthood. Following the reviewer’s suggestion, we have revised the threshold of significance in the epigenome-wide association study to 1×10⁻⁶. We have added more discussion on confounding, and we are more cautious in the interpretation of the results.

      Reviewer #2 (Public Review):

      Since this study is a long-term cohort study in children and adolescents, it is advisable to decide whether to highlight differences by age group or to show consistent effect after exposure. In particular, obesity and related diseases are closely related to socio-economic environmental factors, and its impact might be different according to age (group) at exposure.

      Thank you very much for your insightful suggestions. We agree that the associations of exposures, including socio-economic and environmental factors, with obesity might vary by age, which is why we examined the associations of early life exposures with BMI and WHR at different ages. It is possible that the same exposure may impact obesity differently by age, so we also assessed the associations of exposures selected at earlier ages of BMI/WHR with BMI/WHR at older ages, and compared consistency in the direction of association at different time points. We have added more explanation in the introduction and methods:

      (Introduction, paragraph 4)

      “Considering that exposures related to obesity at the outset and at the end of puberty may be different, and the associations of the same exposure with obesity may vary by age, we conducted the EWAS at different ages.”

      (Methods, Statistical analysis)

      “Third, to assess whether the associations differed by age, we checked the associations for the selected exposures from the earlier age groups (~11.5 years and ~17.6 years) in the follow-up survey (n=308) at age ~23 years and compared the direction of associations with those at earlier age groups (~11.5 years and ~17.6 years). Associations with consistent directions of associations in earlier age groups (~11.5 years or ~17.6 years) with those at ~23 years suggest a consistent association by age.”

      The part described in comparison with previous studies is a good attempt. However, some results are consistent with those of previous studies and some are not. This may be related to the time difference in socio-economic environmental factors rather than simply the difference between the West and China (Hong Kong). According to modernization/urbanization, changes in living environment, changes in family relationships, and changes in the care environment can also be factors especially in children.

      Thank you very much for your positive comments and raising this interesting point. We totally agree that inconsistency with results of previous studies is not merely due to the difference between the West and China (Hong Kong), but also related to changes in structural socio-economic and environmental factors, as well as changes in living environment, family relationships, social and community network, housing and care environment, that affect individuals’ health. Hence, we provided the necessary clarification by adding the following sentences:

      (Discussion, Strengths and Limitations)

      “Fourth, the inconsistency between some of our findings and previous studies, such as chocolate, sweets, tea and coffee consumption, should be interpreted cautiously. It may not only reflect differences between the West and China (Hong Kong), but also may be due to changes in structural socio-economic and environmental factors, as well as changes in living environment, family relationships, social and community networks, housing and the care environment.”

      In studying the effect of environment on gene expression, it can be thought that the influence of genes and the degree of expression might differ depending on the age of the subject (newborn, infant, adolescent, adult) and the duration of exposure, and these still need to be elucidated.

      Thank you very much for raising this important point. We fully agree with you. It would be interesting to examine the association of gene expression at different ages with obesity. However, we only collected blood samples at the Biobank Clinical follow-up (age ~17.6 years), so in this study we only conducted the epigenome-wide association study for DNA methylation at ~17.6 years with BMI and WHR at ~23 years. We have added this to the limitations:

      (Discussion, Strengths and Limitations)

      “Fifth, we only collected blood samples at the Biobank Clinical follow-up (age ~17.6 years), so we only conducted the epigenome-wide association study for DNA methylation at ~17.6 years. It would be worthwhile to examine the association of DNA methylation at different ages with obesity.”

    1. Author Response

      Reviewer #1 (Public Review):

      This study presents a valuable comparison of fibre orientation estimates from three different modalities: diffusion MRI, scattered light imaging, and x-ray scattering. The comparison is interesting as each modality is sensitive to different aspects of tissue microstructure - water anisotropy, micron-scale structural coherence, and myelin lamella respectively. Where scattered light and x-ray imaging can be only applied ex vivo, diffusion MRI has in vivo applications but suffers from being an indirect estimate of the microstructure of interest. By acquiring all modalities in both a vervet monkey and human brain sample, the authors provide quantitative, pixel/voxel-wise comparisons of fibre orientation estimates within the same tissue samples. The authors show convincing agreement in fibre orientations from all three methods, giving confidence in the fidelity of the methods for neuroanatomical investigations. Differences are also observed: SLI is shown to have less reliable estimates of fibre inclination, and the CSD analysis presented overestimates the number of crossing fibre populations when compared to the microscopy methods, particularly in single fibre regions such as the corpus callosum, a known artefact in some diffusion analyses.

      In the current PDF, it is very difficult to see fibre orientations in figures due to low resolution, limiting the reader's ability to assess the results. Higher-resolution images would provide more information and easier comparisons.

      The methods are generally clear though some additional information is needed:

      1) Specify the resolution at which the orientations are compared in each figure and how data was up-/down-sampled for these comparisons. For example, each SAXS pixel contains many SLI pixels. It is currently unclear whether the mean SLI orientation from a neighbourhood is what was compared, or whether a comparison was made for each SLI pixel. Similarly, for the dMRI-microscopy comparisons.

      2) I also could not follow why two SLI methods are presented in the methods: SLI scatterometry relating to Figure 2, and angular SLI relating to all other results. Further clarification is needed.

      3) Since the quality of the data co-registration can strongly impact pixel/voxel-wise comparisons, quantification of the registration accuracy or overlays demonstrating the quality of the co-registration would be valuable.

      A primary weakness of the work as a diffusion MRI validation study is that though diffusion MRI supports many different models to extract fibre orientations with different outputs, here only a single model is compared to the microscopy data, which may affect the generalisability of the results. Further, it only compares the primary orientations from the diffusion MRI and does not consider each fibre population's magnitude (density of fibres) or the orientation dispersion, both of which can influence downstream analyses.

      The paper could be strengthened by a more detailed discussion on the differences between the imaging modalities - e.g. in terms of imaging resolution, signal-generating mechanisms, and sensitivity to specific aspects of the tissue microstructure - and how these differences may limit their application to specific neuroanatomical investigations, or ability to validate one another. For example, the microscopy sections are 80 microns thick whilst the diffusion voxel is 200 microns. I expect this could contribute to the difference in the number of fibre populations per voxel.

      The hypothesis that dMRI signal contributions from extra-axonal water result in additional fibre populations could be investigated by running CSD on both low and high-b-value data (for example using the openly available MGH dataset, Fan 2016) where fewer secondary fibre populations should be observed at high b-value.

      We sincerely thank Reviewer #1 for the constructive feedback, which helped us to significantly improve our manuscript. We have done our best to address all concerns:

      First, we regret the insufficient resolution of figures. The resolution must have been reduced during the submission process, when generating the pdf version of our manuscript. We have now submitted all figures as separate files with the highest possible resolution. In addition, all parameter maps are publicly available and can be opened and zoomed in, e.g. with ImageJ, to see the fiber orientations of individual image pixels.

      As requested by the reviewer, we have modified our manuscript and added additional methods information.

      1) Concerning the data up-/downsampling: We have now specified in each figure caption at which resolution the images were compared and added the following explanation to the newly named Methods section “Image registration and pixel-wise comparison”: To minimize loss of information, the pixel/voxel-wise comparisons were performed at the spacing of the highest resolution image, i.e. the lower-resolution diffusion MRI (dMRI) and small-angle X-ray scattering (SAXS) images were upscaled to match the higher-resolution scattered light imaging (SLI) images. As a result, the fiber orientation of one SAXS pixel (px=150µm) was compared to the fiber orientations of 50x50 SLI pixels (px=3µm), and not to the mean; similarly for comparisons with dMRI.
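
      The comparison scheme described in this response (each 150 µm SAXS pixel compared to the 50x50 SLI pixels of 3 µm it covers) can be illustrated with a minimal sketch. It assumes nearest-neighbour upscaling and in-plane orientations given in degrees; the helper names are hypothetical and the interpolation actually used by the authors is not stated here.

```python
import numpy as np

def upsample_nearest(img, factor):
    """Repeat every pixel factor x factor times (nearest-neighbour upscaling)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def orientation_difference(a_deg, b_deg):
    """Smallest angle between two axial orientations, in the range [0, 90] degrees.

    Orientations are axes, not vectors, so 0 and 180 degrees are identical.
    """
    d = np.abs(np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float)) % 180.0
    return np.minimum(d, 180.0 - d)

# One row of two SAXS pixels (150 um) upscaled to the SLI grid (3 um): factor 50.
saxs = np.array([[10.0, 170.0]])
saxs_on_sli_grid = upsample_nearest(saxs, 150 // 3)
sli = np.full(saxs_on_sli_grid.shape, 175.0)  # hypothetical SLI orientations
diff = orientation_difference(saxs_on_sli_grid, sli)
```

      Taking the difference modulo 180° matters here: an orientation of 10° and one of 175° differ by 15°, not 165°.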

      2) Concerning the two SLI methods: We have added the following explanation to the Methods section “Scattered Light Imaging” to clarify why we used two different methods: To generate the scattering patterns (upper Figure 2C), a time-consuming SLI scatterometry measurement was performed in which the sample was illuminated from 6,400 different angles, as described in Menzel et al. (2021b). This was necessary to achieve sufficiently resolved scattering patterns for a visual comparison with SAXS scattering patterns. The fiber orientations can also be extracted from the peak positions in the azimuthal profiles (cf. bottom Figure 2C), without taking the overall shape of the scattering patterns into account. Therefore, all other results were obtained from more time- and data-efficient angular SLI measurements in which the sample was illuminated from 24 different angles around a circle and the fiber orientations were derived from the peak positions in the resulting line profiles, as described in Menzel et al. (2021a).

      3) Concerning the quality of the co-registration: We thank the reviewer for this comment. We agree that the accuracy of image registration has a high impact on pixel/voxel-wise comparisons and determines the quality of our cross-validation study. We have added a new Discussion section “Quality of cross-validation” and inserted a new figure (Figure 4–figure supplement 1) to demonstrate the accuracy of image registration, both for the vervet and human brain samples: The reference and registered images are shown both in direct comparison (top and middle images, respectively) and as overlays (bottom images), as suggested by the reviewer. Reference and registered images show good correspondence (white/gray matter boundaries coincide). Only the fornix of the vervet brain section is not aligned (it moved when re-mounting the sample) so that this region was evaluated separately, as described in the manuscript. We found standard linear transformations (scaling, rotation, and translation) to be sufficient for achieving a fair comparison between the different modalities, demonstrating the experimental feasibility of our approach. There might still be individual voxels that were not sufficiently well aligned, especially when comparing sections (SLI/SAXS) to volumetric measurements (dMRI). However, this would only increase the angular differences between the fiber orientations. Our results can therefore be considered as an upper bound. Using standard linear transformations, we could already show that in-plane crossing orientations from SAXS and SLI, and through-plane orientations from SAXS and dMRI correspond very well to each other.
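
      As a rough illustration of what a registration restricted to scaling, rotation, and translation involves, the sketch below applies such a transform to pixel coordinates. It is an assumption-laden example, not the registration pipeline used in the study: the scaling is taken to be isotropic, the rotation is about the origin, and the function names are hypothetical. The second helper reflects the general point that in-plane orientation values have to be rotated along with the image; whether and how the authors' pipeline handles this step is not stated in the response.

```python
import math


def similarity_transform(points, scale, angle_deg, shift):
    """Scale, rotate about the origin, then translate a list of (x, y) points."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    dx, dy = shift
    return [
        (scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
        for x, y in points
    ]


def rotate_orientation(theta_deg, angle_deg):
    """Rotate an undirected in-plane orientation, keeping it in [0, 180)."""
    return (theta_deg + angle_deg) % 180.0
```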

      We see the focus of our work as the cross-validation/evaluation of light and X-ray scattering, in comparison with the much longer-established dMRI, rather than as a “diffusion MRI validation study”: the myelin-specific SAXS orientations and crossings were cross-validated with the high-resolution SLI orientations, and SLI out-of-plane fibers were validated using SAXS/dMRI as ground truth data.

      The reviewer rightly noted that we used a single analysis method to extract fiber orientations from dMRI data (based on the MRtrix3 dwi2response and dwi2fod commands, using the dhollander and msmt_csd algorithms, respectively). Although to our knowledge this method is one of the most widely used for deriving fiber orientations for subsequent tractography, it is true that other methods might yield different results and that we cannot draw conclusions for diffusion MRI in general. We have included these considerations in the newly named Discussion section “Comparison of SAXS and SLI fiber orientations to dMRI”.

      It is also true that our comparison focused on primary dMRI orientations without taking fiber density or dispersion into account. We decided to do so because deriving such metrics from SLI or SAXS data has not been implemented yet. However, we expect this to happen in the following years, enriching future studies. We have also included these aspects in the Discussion section.

      We agree with the reviewer that our paper could be strengthened by a more detailed discussion on the differences between the imaging modalities. We have added a paragraph to the new Discussion section “Quality of cross-validation”: We compared results from three different imaging techniques (SLI, SAXS, dMRI) which all have different signal-generating mechanisms and resolutions. The different resolutions should be taken into account when interpreting the comparative studies. To investigate the relationship between SLI peak distance and fiber inclination, we used dMRI/SAXS images with at least 50 times lower in-plane resolution as reference (Figure 6). This is sufficient to validate the theoretical predictions, but insufficient to validate individual pixel values. To validate crossing fiber orientations from SAXS, we used SLI images with 30 times higher in-plane resolution, leading to a broad distribution of angular differences (depending on the region), but the mean difference around zero is evidence for a good overall correspondence (Figure 4). Finally, when comparing fiber orientations in SAXS and SLI to dMRI (Figure 5), it should be taken into account that dMRI voxels (with 200µm size) contain more fiber layers than the corresponding SAXS or SLI voxels (with 80µm section thickness), so that dMRI voxels might include additional fiber populations not present in SAXS or SLI data. On the other hand, fiber orientations that occur both in dMRI and SAXS voxels – like the out-of-plane fiber orientations from SAXS and dMRI (e.g. Figure 6B-C) – can be considered as reliable, given the substantially different contrast-generating mechanisms.

      Finally, we thank the reviewer for the suggestion to study different b-values (last comment). We agree that an analysis based on different b-values might yield different results. In particular, an analysis with high b-values is expected to be more specific to the fiber orientations, as most other components of the signal would have already been attenuated. To investigate this hypothesis, we have run a separate analysis with high b-values only (5 and 10 ms/µm²) and added a new supplementary figure (Figure 5–figure supplement 4) that compares the results for all b-values to high b-values only. We found that the fiber orientation distributions are almost identical between all b-values and high b-values only.

      Reviewer #2 (Public Review):

      This work is a cross-validation of an x-ray tomography technique (SAXS) and an optical microscopy technique (SLI) for imaging axonal orientations ex vivo. These innovative methods were introduced in recent papers by the authors, who have teamed up here to compare them side-by-side on the same tissue samples for the first time. The two methods are both label-free (do not require staining) and they are quite complementary. SAXS can provide full 3D orientation measurements on intact tissue, but it operates at a mesoscopic resolution and requires access to a synchrotron. SLI can measure the orientations of multiple fascicles per voxel at a microscopic resolution and relies on more widely accessible equipment, but its accuracy suffers for fiber orientations perpendicular to the imaging plane and it requires tissue to be sectioned before it is imaged. Therefore it makes a lot of sense to explore the complementary strengths of these two techniques, and to use one to "fill in the blanks" of the other. The paper also compares the orientation measurements obtained with SAXS and SLI to those obtained with diffusion MRI. The latter provides only indirect measurements based on water diffusion, at a mesoscopic resolution somewhat lower than that of SAXS, but has the benefit of being feasible in vivo.

      A limitation of this study is that conclusions on the comparison between SAXS and SLI are drawn from only 2 sections of a partial monkey brain sample and 2 sections of a partial human brain sample. Conclusions on diffusion MRI are drawn only on the 2 human sample sections. This is particularly an issue for the comparison to diffusion MRI, as the diffusion MRI voxels are wider than the section thickness, hence one cannot preclude that any orientations detected with diffusion MRI but not with SAXS and SLI come from the portion of the voxel that is missing from the corresponding SAXS/SLI section.

      The stated aim of the paper is to provide a framework for combining the complementary benefits of SAXS and SLI, rather than simply presenting the results of a cross-validation study. This is a significant and ambitious aim. However, in order for this to serve as a framework, there would have to be clear prescriptions for how researchers interested in obtaining ground-truth measurements of axonal orientations would do so by using these two methods in tandem. This is not adequately developed in the paper in its present form. For example, the results show reasonable agreement between SAXS and SLI orientations when fibers lie within the SLI imaging plane and decreasing agreement for fibers with increasing through-plane inclination. How would the two methods be combined in voxels where they disagree? Would one use SLI orientations in voxels with fewer through-plane fibers and SAXS orientations in voxels with more through-plane fibers? How would voxels be assigned to each category? How would the orientation vectors from the two modalities be composed and how would the resolution difference between the two be handled? When the through-plane measurement of SLI is unreliable, is its in-plane measurement still reliable? That is if there were one mainly in-plane and one mainly through-plane fiber population, would the orientation of the former still be measured correctly by SLI? There is also considerable agreement reported here between through-plane orientations obtained with SAXS and diffusion MRI. Would this mean that diffusion MRI itself could be used to supplement SLI with through-plane orientations? Any clear set of prescriptions along these lines would represent a framework for imaging orientations by combining modalities. This, however, would require detailed steps for how to perform the combination and use the multi- vs. uni-modal framework to reconstruct connectional anatomy.

      A key advantage of SAXS is that it can be performed on intact samples, i.e., before any nonlinear distortions of the tissue are introduced by sectioning. Thus it can provide an undistorted reference, with contrast on axonal orientations that would be absent in, say, a structural MRI of comparable resolution. This contrast could be used to drive registration of the distorted SLI sections to an undistorted SAXS volume, and therefore is a key way in which the two techniques can complement each other. Here, however, this is not explored, as SAXS is performed after sectioning. It is not clear if this is the authors' prescription for how a combined SAXS/SLI framework would be implemented, or if it was done specifically for this study.

      First, it would seem that SAXS on the intact sample would be lower maintenance, requiring less setup time and hence potentially less overall beamtime than performing SAXS on each section separately. This would make it more practical for routine deployment beyond a few sections.

      Second, because the SAXS data are now nonlinearly distorted, they cannot be affinely aligned to the MRI volumes. While, in principle, performing both SAXS and SLI on the sections may facilitate the comparison between the two, having to unmount, rehydrate, and remount the sections in between may negate this advantage, as now there is no guarantee that SAXS and SLI can be affinely registered to each other. Here all these registration steps are performed affinely, so it is unclear to which extent the computed errors between modalities are characterizing the inherent limitations of the respective contrasts, or limitations of the registration technique. Some of the alignment is performed manually, for example, specific regions of the images are realigned by hand, and the slice of the diffusion MRI volume that is aligned to the SAXS/SLI sections is chosen by hand. Again, for this to serve as a framework that can be deployed on whole samples, there would have to be clear prescriptions for how to perform these steps robustly, how to ensure that the MRI can be acquired in a coordinate frame parallel to the sections, etc.

      Finally, the paper puts forth a general conclusion that diffusion MRI overestimates the number of fiber populations per voxel, on the basis of small ODF peaks appearing perpendicular to the main ODF peaks. Of all conclusions in the paper, this is the least convincingly supported by evidence. First, these small perpendicular peaks are a known artifact, which would be typically eliminated by ignoring ODF peaks below a certain amplitude, a common practice in diffusion tractography algorithms. The authors refrain from using an amplitude threshold, with the rationale that it may also remove true diffusion orientations. However, they apply a threshold when they detect SLI peaks (a rather stringent 8% of the maximum). Second, the explanation that these artifactual peaks may appear due to vessel walls is not convincing. Vasculature is sparse. A single vessel wall will not impact the diffusion signal in the same way as a bundle of parallel axons. In an axon bundle, water molecule displacements are restricted in all directions except parallel to the axons. A single vessel wall in a voxel will not have the same effect on displacements (which are much smaller than the size of the voxel). From Figure 5, it looks like there would be at most 1-2 of these vessels in a diffusion MRI voxel, and they would not be in all voxels. This cannot explain the widespread appearance of these small artifactual peaks. Third, many ODF reconstruction methods have parameters that can be adjusted to make these artifactual peaks more or less prominent. The default parameters may be optimal for in vivo but not ex vivo data, due to the effects of fixation. In light of these concerns, I would caution against making such a general statement about all diffusion MRI in the human brain, especially on the basis of a single diffusion reconstruction method applied to a single location in one brain.

      We sincerely thank Reviewer #2 for the constructive feedback, which helped us to significantly improve our manuscript. We hope to have done our best to address all concerns:

      First, regarding the limited number of tissue sections used for our study (second paragraph):

      It is true that we only evaluated a limited number of samples – mainly due to the limited beam time available for SAXS experiments. We believe that the main conclusions concerning the cross-validation of SAXS crossing fibers and SLI out-of-plane fibers still remain valid.

      The reviewer correctly points out that the dMRI voxels (with 200 µm size) are wider than the section thickness (80 µm) so that additional fiber orientations detected with dMRI might come from the portion of voxels missing in the corresponding SAXS/SLI measurement. We have added a clarifying paragraph in the newly named Discussion section “Comparison of SAXS and SLI fiber orientations to dMRI” as well as in the new Discussion section “Quality of cross-validation”. Nevertheless, we do not expect additional fiber orientations in comparably homogeneous regions like the corpus callosum, and fiber orientations that occur both in dMRI and SAXS/SLI – like the out-of-plane fiber orientations from dMRI and SAXS (e.g. Figure 6B-C) – can be considered as reliable, given the substantially different contrast mechanisms of the microscopy and dMRI techniques.

      Concerning the aim of our paper and the questions raised by the reviewer in the third paragraph:

      We understand that the term “framework” is not the appropriate word in this context, as it can raise false expectations. Our aim was rather to provide a basis (“groundwork”) to enable combined measurements of SLI/SAXS (and dMRI) on the same tissue samples and cross-validate the techniques (the crossing fiber orientations in SAXS and the through-plane fiber orientations in SLI have not been validated using other techniques so far). We have changed the wording throughout the manuscript, explaining that we focused on laying the “groundwork” instead of providing a “framework”, and reformulated the corresponding sentences.

      Our aspiration was to provide a protocol for how the complementary imaging techniques can be performed on the same tissue sample. When talking about a “combination” of techniques, we were referring to combined measurements (i.e. measurements on the same sample), and not to a combined analysis (e.g. in the form of combined parameter maps and fiber orientation vectors). The latter, while very much needed in the field, would require many more and more heterogeneous samples, and work beyond the scope of this manuscript, which we hope to perform in the future. Along these lines, we have removed the term “combined” throughout the manuscript, and now write e.g. “measurements of SLI and 3D-sSAXS on the same tissue sample” instead of “combined measurements of SLI and 3D-sSAXS” to avoid confusion.

      However, it is of course a valid question how SAXS and SLI can be combined in voxels where they disagree, how the orientation vectors can be composed, and how the resolution difference between the methods can be handled. We have added a new Discussion section “Towards a combination of SLI, SAXS, and dMRI” to elaborate on how a combined analysis (e.g. in form of combined fiber orientation maps) can be achieved and what challenges we are facing.

      Concerning the reviewer’s question if the orientation of an in-plane fiber population would be correctly measured by SLI if there was another through-plane fiber population: We only evaluated regions belonging to a single fiber population (SLI azimuthal profiles with one or two dominant peaks) and regions belonging to two in-plane crossing fiber populations (SLI azimuthal profiles with two dominant peak pairs). Voxels containing both in-plane and through-plane fibers were excluded from the analysis. The determined in-plane SLI orientations can thus be considered as reliable. We have added these aspects to the new Discussion section “Quality of cross-validation”.

      Regarding the reviewer’s question if dMRI itself could be used to supplement SLI with through-plane orientations: Diffusion MRI could indeed be used as a reference to enhance the interpretation of through-plane fiber orientations from SLI measurements. One disadvantage over SAXS is the lower resolution and that it cannot directly be performed on the same tissue section as SLI. These aspects have also been added to the new Discussion section.

      Concerning the reviewer’s suggestion to perform SAXS before sectioning and the problem of image registration (fourth paragraph):

      It is true that SAXS tensor tomography can be applied to larger tissue volumes and that it is not limited to tissue sections. However, the reconstruction of crossing fibers has so far only been realized in sections (Georgiadis et al., 2022) and not in intact samples. As we wanted to cross-validate these fiber crossings using SLI as reference, we decided to perform the SAXS measurements on the same tissue sections as the SLI measurements. A comparison to results from SAXS tensor tomography might still be interesting in the future. We have added these considerations to the new Discussion section “Towards a combination of SLI, SAXS, and dMRI”.

      It is also true that cutting a section from a brain tissue sample might introduce non-linear distortions; in particular, it is challenging to identify this particular section in the original tissue volume; unmounting and remounting of an already existing section introduces far fewer distortions. We have added a new figure (Figure 4–figure supplement 1) which shows that a co-registration with linear transformations (scaling, rotation, and translation) is already sufficient to allow for a fair comparison between the different image modalities, both for vervet and human brain samples. Only the fornix of the vervet brain section moved during remounting of the sample, and was therefore evaluated separately, as described in the manuscript. In any case, even if the angular differences in some image pixels were larger due to an imperfect co-registration, a perfect co-registration would only yield even smaller differences. Hence, the reported angular differences can be considered as an upper bound, demonstrating that SAXS and SLI fiber orientations already show a very good correspondence. We have added a corresponding paragraph to the new Discussion section “Quality of cross-validation”.

      Finally, we agree that a clear prescription would be necessary to enable combined analysis on whole tissue samples. As mentioned further above, our aim was to provide the groundwork for combined measurements on the same tissue sample and cross-validate the different techniques, and not to provide combined fiber orientation maps or similar. We have added our thoughts on how to combine the different image modalities to the new Discussion section “Towards a combination of SLI, SAXS, and dMRI”.

      Concerning the final concern of the reviewer that an overestimation of the number of fiber populations per voxel is not sufficiently supported (last paragraph):

      We understand this concern and have removed all phrases that could be understood as generalized claims for MRI, including any reference to an overestimation of fiber orientations. Furthermore, we have extended the Discussion to indicate the non-generalizability of our results.

      Regarding the first point that the minor perpendicular ODF peaks could be removed by applying a suitable amplitude threshold: This is a valid remark and was discussed partly in the first version of the manuscript, when referring to increasing the threshold of secondary lobes prior to running tractography algorithms and to the problem that it might decrease the sensitivity for the cases where there exist actual but less prominent secondary fiber populations. We have extended the Discussion to address the concerns of the reviewer.

      Regarding the second point that the minor ODF peaks are probably not caused by vessel walls: We thank the reviewer for the valid remarks and have removed all mentions of blood vessels in the manuscript, including the arrows in Figure 5H.

      Regarding the third point that parameters can be adjusted to make the artifactual peaks more/less prominent, and that default parameters might be optimal for in vivo but not ex vivo data: We have added to the Discussion the remark that model parameters can be fine-tuned to decrease the percentage of false positives.

      Finally, it is true that we only used a single diffusion reconstruction method and measured only a single location in one human brain with dMRI. As mentioned at the very beginning, the number of samples was limited, and we included the reviewer’s concerns in the newly named Discussion section “Comparison of SAXS and SLI fiber orientations to dMRI”. For the main purposes of the paper like the cross-validation of out-of-plane fibers in SAXS/SLI, the dMRI data was still sufficient as we could show a good correspondence between dMRI/SAXS in these regions.

    1. Author Response

      Reviewer #1 (Public Review):

      The study tackles the topic of male harm (sexual selection favoring male reproductive strategies that incur a reduction of female fitness) from an interesting angle. The authors put emphasis on using wild-collected populations and studying them within their normal thermal range of reproductive conditions. Where previous studies have used temperature variation as a proxy for stressful environmental change, this approach should instead clarify what can be the role of male harm on female fitness in natural conditions. A minor caveat regarding this point is the fact that the polygamy treatment also has a heavily male-biased sex ratio (3:1). The authors argue that this sex ratio is within the range of normal variation in that species, but it is likely that the average is still (1:1) in natural populations and using a male-biased sex ratio could magnify the intensity of male harm. This does not undermine the conclusions regarding the temperature sensitivity of sexual conflict but should be acknowledged.

      The authors find that varying temperature within a range found in natural conditions affects the reproductive interactions between males and females, particularly through male-harm mechanisms. Male harm, measured as a reduction in lifetime reproductive success (LRS) from monogamy to polygamy settings, is present at 20°C, stronger at 24°C, and absent or undetectable at 28°C. Female senescence is always faster in the polygamy mating systems as compared to monogamy, but the effect appears strongest at 20°C. Mating behaviors of males and females in these different settings are used to attempt to uncover underlying mechanisms of the sensitivity of male harm to temperature.

      A weakness of the manuscript in its current form is the lack of clarity about the experimental design, which makes understanding the results a long and involved procedure, even for someone who is familiar with the field. If the authors consider revising the manuscript, I suggest giving a better overview of the experimental design(s) earlier in the manuscript, perhaps supported by a diagram or flowchart. I also suggest structuring the results better to aid the reader (e.g., make clearer distinctions between results that come from the different experiments). Finally, some additional figures and statistical tests corrected for multiple testing would help get a better feel of some aspects of the dataset.

      I believe that the conclusions are generally justified and the results overall convincing. Overall, this is an impressive study with a lot of dimensions to it. Its complexity is a challenge and may require additional effort from the authors to make it easier to access. The core of the question is answered by LRS measures, but the authors have also provided a wealth of behavioral data as well as other fitness components. The manuscript could be greatly improved by putting more effort into linking the different metrics together to track down potential mechanisms for the observed variation in male-harm-induced reduction in female LRS. The discussion would also benefit from considering the female side of the sexual conflict coevolution arms race.

      We are thankful for the nice words and constructive appraisal of our work. As stated above, reviews like this are extraordinarily helpful. The reviewer mentions four main points that we have addressed:

      1. We now expand a bit on the justification to use a (3:1) male-biased sex ratio in the methods section (lines 150-155). We also acknowledge potential limitations of this design in the discussion (lines 563-571).
      2. To clarify the methods, we have placed this section before the results. This, in itself, has significantly improved the clarity of the manuscript. We have also substantially re-written the methods and results (including adding some tables) to streamline the text while providing all the necessary details, and have also included several diagrams to illustrate all our experiments (in the SM, see Figs. S1.1 to S1.5) along with a general schematic figure of the general design that we present early on in the main text (in the introduction, see Fig. 1).
      3. As suggested, we have re-run all analyses using the Benjamini-Hochberg procedure in order to correct for inflation of type I error rate due to multiple testing. We have also included in the SM a complementary set of models that also test for this via post hoc Tukey contrasts. Both of these approaches corroborate our initial findings, and thus help strengthen our results.
      4. We now explicitly discuss the female side of things in the discussion (lines 636-647).
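
      For readers unfamiliar with the correction mentioned in point 3, the following is a minimal sketch of the standard Benjamini-Hochberg adjustment. It is a generic illustration of the procedure, not the authors' analysis code (which was presumably run in a statistics package); the function name is hypothetical. Each p-value is multiplied by the number of tests and divided by its rank among the sorted p-values, and the adjusted values are then made monotone from the largest rank downwards.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

      A test is then called significant at a chosen false discovery rate (e.g. 0.05) if its adjusted p-value falls at or below that rate.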

      Reviewer #2 (Public Review):

      Londoño-Nieto et al. investigated the influence of temperature on the form and intensity of sexual conflict in Drosophila melanogaster. They aimed to test the effect of naturally occurring temperature fluctuations on a wild population of Drosophila while disentangling pre- and postcopulatory episodes of sexual conflict. To this end, they exposed females to males under monogamy or polyandry, hence manipulating the degree of male harm experienced by females. The effect of temperature was explored by exposing these groups to 20, 24, or 28°C. They found that female fitness suffered from male harm most at 24°C and less at the other two temperatures. Interestingly, pre- and postcopulatory episodes of sexual conflict were affected differently by temperature. Overall, these data suggest that the relationship between sexual conflict and temperature can be strong and complex. Hence, these results can have important implications for the impact of sexual conflict on population viability, especially in light of the climate crisis.

      We want to thank the reviewer for the time invested in reading and reviewing our work. We are glad to read that the reviewer found our results interesting and considered our study to be of importance to the field.

      This paper tackles a highly relevant question using an established model organism for sexual conflict and contains a rich dataset obtained using a series of carefully planned experiments and analysed in an appropriate way. Importantly, the authors used biologically meaningful temperatures and mating treatments, which increases the relevance of the data. The main conclusions are well supported by the data. Nevertheless, the devil is in the detail, and given the way the authors frame their study (i.e. testing a natural population under naturally occurring temperature fluctuations) and their results (i.e. sexual conflict is buffered by temperature effects in the wild) there are some limitations to be considered:

      We appreciate the positive feedback! The reviewer identified potential limitations and made good suggestions that have only served to improve our manuscript considerably, for which we are very grateful. Details follow on how we have dealt with each specific comment.

      1) The authors frame their study as addressing the question of how sexual conflict reacts to naturally occurring temperature fluctuations in the wild. Nevertheless, the population used in this experiment had been kept for nearly 3 years in the laboratory prior to the experiment. Importantly, the authors ensured that the laboratory population maintained genetic diversity, by regularly crossing wild lines into it. Nevertheless, this population remained for some time in the laboratory under standardized conditions. The applied temperature fluctuations are in a biologically meaningful range (though only during the reproductive season), but it remains unclear if the applied fluctuations were in a standardized way (i.e. pre-programmed) or included random fluctuations (i.e. a more natural setting). This laboratory setup has certainly clear advantages, for example, it enables the exclusion of any effects other than the temperature on sexual conflict. Nevertheless, how these will then ultimately play out in the wild could be a different story.

      Agree. We clarify now that we meant pre-programmed fluctuations and acknowledge this limitation in the methods (lines 124-131).

      2) The authors highlight clearly that temperature fluctuations in the wild might play an important part in how sexual conflict plays out in natural populations. This very interesting and highly relevant point might lead the reader to assume that this is what was actually tested in the experiment. Nevertheless, in the experiments, different constant temperatures were applied to the flies, while only the stock population was kept at a fluctuating temperature regime. Hence, the influence of fluctuations during episodes of sexual conflict remains untested. While the present data show that sexual conflict can be modulated by temperature, the effect of naturally occurring fluctuations on the net cost of sexual conflict to a population remains unclear.

      Again, a fair point that we acknowledge in the current version (lines 571-575). “Second, our treatment temperatures were stable, designed to study how coarse-grain changes in temperature across the adult lifespan of flies may influence how sexual conflict unfolds in nature. Thus, future studies will need to encompass how fine-grained fluctuation (i.e., repeated variation of temperature across an adult’s lifespan) may affect male harm for a more comprehensive picture of temperature effects on sexual conflict in the wild”.

      3) The authors conclude that the effect of sexual conflict can be buffered by temperature in the wild. In general, I agree with this, although a more conservative way of framing this would be to say that temperature modulates or moderates sexual conflict instead of buffers it. If there really is a buffering effect of temperature in the wild remains to be tested, I believe. This will depend on how actual changes in temperature affect this dynamic (see point 2). In addition, I think another interesting open question is what the mechanism behind the observed differences might be. Are male and female interests really more aligned at different temperatures (i.e. males plastically reduce harm)? This would really buffer the harm of sexual conflict at those temperatures. Nevertheless, alternatively, males might not be perfectly adapted to manipulate the female optimally at lower or higher temperatures. This would mean that if the temperatures change, males might evolve to increase the manipulation of females, and hence the scope for sexual conflict might not change in the end under this scenario. Nevertheless, as the authors themselves state: 'An intriguing possibility is thus that SFPs are more effective at lowering female re-mating rates at warm temperatures, thereby buffering these costs.' Therefore, a temperature-dependent increase in the effectiveness of male manipulation might counterintuitively reduce sexual conflict in this species.

      We echo both points in the current version of the paper (see lines 633-655).

      4) In the end the authors argue that the climate crisis might have 'unexpected positive consequences via its effect on male harm'. Sexual conflict is indeed widespread, but it takes many different forms (as has been nicely described in the introduction of this paper). Because the studied system seems to be quite a specific example, it is questionable how far spread this phenomenon is in nature. In addition, it remains unclear how male harm will evolve in response to the climate crisis (see point 3). Finally, the relative fitness of females increased in the present experiment, as the tested range was within the reproductive optimum of the species. Nevertheless, the relative importance of the positive effect of sexual conflict on fitness outside of optimal temperatures seems questionable.

      Agree. Altogether, we have tried to tone down our conclusions regarding the implication of our results for a climate change scenario, and acknowledge all the points highlighted by the reviewer in the current version of the manuscript (see lines 563-575).

      Nonetheless, I believe these results to be of exceeding interest to the scientific community and of importance to the field. It opens up many potential research directions and adds further data to the fascinating field of sexual conflict, SFPs, and male harm in Drosophila.

      We are thrilled to read that the reviewer found our study of exceeding interest.

      Reviewer #3 (Public Review):

      In this paper, the authors explore the effects of the environment, specifically temperature, on male harm to females. Male harm is the phenomenon where males reduce female fitness in polyandrous systems, where a single female may mate with multiple males. The selection of males to increase their reproductive success in male-male competition can lead to genetic conflict that increases male fitness at the expense of female fitness. Typically, male harm has been studied in single environments under optimal conditions. However, there is an increasing focus on the effect of the environment on fitness costs of male harm to females, as a way to better understand the effect of male harm on population fitness in more realistic ecological contexts. In this paper, the authors add to these studies by exploring the effect of temperature on male harm and female fitness, using the fruit fly Drosophila melanogaster, as a model system. They find that temperature affects the impact of male harm on female fitness, with male harm having the greatest effect at 24˚C relative to 20˚C and 28˚C. The authors then go on to disentangle how temperature affects the various components of male harm that impact female fitness (e.g. harassment, ejaculate toxicity). The paper demonstrates that male harm depends on ecological context, which has implications for understanding its impact on population fitness under realistic ecological scenarios, particularly with respect to climate change.

      The strength of the paper is that it demonstrates that male harm (presented as differences in female life reproductive success between monogamous and polyandrous matings) changes with temperature. The authors dissect this general observation by showing that different aspects of precopulatory reproductive behavior, for example, male-male aggression, copulation rate, and female rejection rate, also change with temperature. Further, they demonstrate that correlates for male ejaculate quality also change with temperature, suggesting that temperature also affects postcopulatory mechanisms of male harm.

      The weakness of the paper is that the method and results section are difficult to follow, which negatively impacts the interpretation of the data. The experiments are complex and need to be for what the authors are studying. Nevertheless, the paper is written in a way that makes it challenging for the reader to fully understand how precisely the experiments were conducted. Further, the authors do not explain clearly how some of the experiments relate to the phenomenon ostensibly being assayed. For example, a more detailed explanation of why mating duration and remating latency are assays for ejaculate quality in the context of sperm competition would be very helpful in interpreting the data. Further, a clearer explanation of the statistical analyses conducted

      Thank you for the positive, detailed and constructive review. We agree with all the weaknesses laid out and we have strived to address all of them in the current version. This includes a major rearrangement, restructuring and rewrite of the methods and results sections and extra statistical analyses. Please find the details below.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors generated a detailed single-cell RNAseq dataset for the microfilariae stage of the human nematode parasite Brugia malayi. This is an impressive and important achievement, given that it is difficult to obtain sufficient material from human parasites and the microfilariae are protected by a chitin sheath. The authors collected microfilariae from jirds and carefully worked out a protocol of digestion, dissociation and filtering, to obtain single-cell material for sequencing.

      The single-cell resource was complemented with a dataset derived from FACS-sorted large secretory cells, allowing the identification of several specific proteins expressed in this unique microfilarial cell-type important for immune evasion.

      The authors also generated new data for secretory cells of Caenorhabditis elegans and concluded that there is limited similarity between the composition of Brugia and C. elegans secretory cell types.

      In a further set of experiments, the authors analysed gene expression changes in dissociated Brugia cells in response to the commonly used anthelminthic drug ivermectin. This revealed specific gene expression changes across various cell types, providing new insights into how the drug affects the parasite.

      Finally, the authors developed a method to keep dissociated Brugia cells alive in culture for two days. This method will aid cellular studies of this parasite.

      The authors may want to explore the new resource in more detail to reach more specific biological conclusions. For example, the authors mention that the large secretory cells are critical to parasite survival and immune evasion. With a more complete list of genes expressed in these cells the authors could try to reach more specific conclusions or predictions. Are there newly identified secreted factors that could contribute to immune evasion? It would be important to read in more detail about such proteins (including an analysis of the sequences and phylogenies), especially if the authors could identify new candidates as potential vaccine or diagnostic targets. Likewise, can the data be used to understand in more detail the mechanism of immune evasion or ivermectin action?

      Thank you for this comment. We have since added a source data file with the list of secretory cell DEGs along with gene ontology (GO) analysis. We have added a main figure to the revised manuscript that takes a deeper look at transcripts enriched in the secretory cell compared to other annotated cell types. Lastly, we included a deeper look at the paralogous expansion of C2H2 transcription factors that localize near exclusively to the secretory cell. This family of transcription factors is diverse and the significant presence in the secretory cell may play a role in adapting to varying host environmental conditions or in the expression of proteins contributing to immune evasion. Our single-cell data show specific transcriptional shifts in cells expressing putative IVM targets and recapitulate changes identified in whole-parasite drug exposure experiments and highlight the importance of cell connectivity to the in vitro phenotype. These supplemental analyses of the secretory cell will seed future lines of investigation about secretion and aid in further dissecting anthelmintic mode of action.

      The authors searched for known secreted proteins, including antigens, vaccine targets, and diagnostic markers and mapped the expression of these to the single-cell atlas. It is not clear from the paper how comprehensive previous studies to identify secretory proteins were. With the new resource in hand, the authors could look at all secreted proteins (with a signal peptide) expressed in the ES and other cells. The paper would benefit from a more comprehensive overview of the classes of secretory proteins and their expression.

      Thank you for this suggestion. We have completed a computational prediction of signal peptides in differentially expressed secretory cell transcripts (Figure 4) and show that although signal peptide-containing sequences are enriched in the secretory cell compared to other cell types, fewer than half of the proteins identified contained signal peptide sequences.

      This was unsurprising, as most secreted proteins identified in the literature (diagnostic and vaccine targets) do not have signal peptides. The routes of exit for these prominent circulating targets remain murky. We also carried out transmembrane prediction on protein-coding genes that are differentially expressed in the secretory cell (and other cell types) and note that some of these are established components of exosome-like vesicles, which are emerging as important players in host modulation. This additional analysis has been added to a new figure (Figure 4) and the accompanying results section.

      The authors show that an abundance of C2H2 transcription factors is localizing almost exclusively to the secretory cells. It would be useful to see a classification of these proteins and phylogenetic analysis relating them to C2H2 from C. elegans and other animals.

      The C. elegans genome contains 106 annotated C2H2 zinc finger transcription factors. Based on a reverse phylogenetic approach, we identified a total of 241 orthologous C2H2 zinc finger transcription factors in B. malayi, many of which exhibit strong and/or exclusive expression in the secretory cell. This analysis has been added to an additional figure (Figure 4) describing the secretory cell in more depth alongside signal peptide and transmembrane domain analysis of differentially expressed genes in the secretory cell compared to other identified major cell types.

      In general, a more detailed bioinformatic analysis of secretory products and more discussions of potential functions (e.g. serpins etc.) would make the paper more interesting and could stimulate more mechanistic thinking.

    1. Author Response

      Reviewer #1 (Public Review):

      Tunneling nanotubes, contrary to exosomes, directly connect remote cells and have been shown to allow the transfer of material between cells, including cellular organelles and RNAs. However, whether sorting mechanisms exist that allow to specifically transfer subspecies of RNAs, especially of mRNA, has not been shown, and the transcriptional consequences of RNA transfer have not been addressed yet.

      Using cocultures (or mix or single cultures as controls) of human MCF7 breast cancer cell line, and immortalized mouse embryo fibroblasts (MEFs), followed by separation of human and mouse cells by cell sorting, the authors performed deep sequencing of the human mRNAs detected in mouse cells. An accurate analysis of the transferred material shows that all donor cell mRNAs transfer in a manner that correlates with their expression level, with less than 1% of total mRNA being transferred in acceptor cells.

      These results show that the process of RNA transfer is nonselective and that the consequences on the cells receiving the RNAs should depend on the phenotype of the sending cells.

      Although we did not address this last point in the original paper, we concur with this statement since we presented evidence to this effect in our previous publication (Haimovich et al., 2017), which we discussed in the original Discussion section (lines 498-508 in the original manuscript; lines 529-539 in the revised manuscript). We have now amended the Introduction (line 91 of the modified manuscript) to reflect this idea.

      These results are complemented by the last part of the manuscript where the authors convincingly show that the coculture of the two cell lines results in significant transcriptomic changes in acceptor MEF cells that could become CAF-like cells.

      Reviewer #2 (Public Review):

      In this manuscript, the authors characterize the extent of RNA transfer between cells in culture, with an emphasis on trying to identify RNAs that are transferred through tunneling nanotubes (TNTs). They use an in vitro human-mouse cell co-culture model, consisting of mouse embryonic fibroblasts and human MCF7 breast cancer cells. They take advantage of the CD326 cell surface molecule, which is specifically expressed on MCF7 cells, to separate the two cell populations using magnetic beads conjugated to anti-CD326 antibodies, followed by deep sequencing to identify human RNAs present in mouse cells. They identify many 'transferred' RNAs. Further analysis of sequencing data together with experiments using synthetic reporters indicate that RNA transfer is non-selective, that the amount of transfer strongly correlates with the level of expression in donor cells, and does not appear to require specific RNA motifs. The authors also note that co-culture with MCF7 cells leads to significant changes in the MEF transcriptome.

      The experiments are overall carefully designed, and the data are clearly and quite carefully presented to point out limitations in interpretation and to distinguish speculations from experimental conclusions.

      We thank the reviewer for this comment.

      It should however be kept in mind that it is unclear to what extent these limitations influence the conclusions reached. For example, the identification of transferred RNAs relies on the purity of the isolated cell populations and, while the authors provide some supporting evidence for this, potential caveats nevertheless remain. For instance, the isolated MEF samples used for analysis appear to lack single MCF7 cells, but still contain components, labeled as 'double stained' and 'unstained' cells, which are uncharacterized. The authors present some arguments as to why these would not contribute to 'transferred' reads, but given the low level of detectable transferred RNAs, and the unclear origin of these components, whether they influence the results could be debatable.

      It is unlikely that these populations contributed to the human mRNA signals in the MEFs, since the percentage of these populations was substantially higher in the “Mix” samples than in the “Co-culture” samples. We have now added the following text (lines 174-181 in the revised manuscript), which clarifies this point: “In addition, we found small sub-populations of double-stained and unstained cells within the purified populations that we suspect are mostly MEFs (see Methods). These sub-populations were greater in the Mix-derived MEFs vs. the Co-culture-derived MEFs (i.e. 0.08% and 0.03% double-stained, and 2.8% and 2.67% unstained in Mix samples vs. 0% and 0.03% double-stained, and 1% and 1.9% unstained in the Co-culture samples). As a consequence, if these double-stained and unstained cells had contributed to the background of human reads in the MEFs, we would’ve expected to have many more human reads in the Mix-derived MEFs.” However, this was not the case; rather, we observed a 6.6-fold increase in human RNA presence in the Co-culture-derived MEFs (versus that in the Mix-derived MEFs) after subtraction of the single culture background. In addition, we note that the level of detectable human RNAs in the MEFs is not low; rather, it is the percentage of human RNA that undergoes transfer that is low.

      Furthermore, the small number of replicates (2 replicates for the genome-wide studies and 1 replicate for most of the subsequent experiments) minimizes the confidence in the conclusions.

      We apologize for not stating clearly that the smFISH, RT-qPCR, and quadrapod experiments were all performed in 2 replicates. This information has now been added to the figure legends.

      In this context, it is also notable that the profile of transferred RNAs between the two replicates of co-cultured samples appears quite different by PCA analysis. It is thus conceivable that there might be specificity in the RNA 'transferome', influenced by unknown experimental variables, which is though masked when averaging those samples in subsequent analyses.

      We have replied to Reviewer #1 on this issue. PCA analysis (Figure 2B) of the heat map data (Figure 2A) reveals the similarity between the different samples, whereby 78% of the variability in the data is revealed by PC1 and 6.7% by PC2. Given that PC2 measures only 6.7% of the variation in the data, it likely results from small differences in the individual co-culture samples (such differences are often observed within replicas of RNA-seq experiments) and not via major differences in the measured transferomes. This indicates that the co-culture samples were overall quite similar as can also be observed from the heat map shown in Figure 2A, as differentiated from the controls (e.g. Mix, Single culture). Thus, we do not believe that further replicas will greatly change the results showing the abundant presence of human RNAs in the mouse cells after subtraction of the Mix background. We included additional sentences in the text and figure legend to clarify this point (lines 208-212 in the revised manuscript).

      While the manuscript emphasizes the role of TNTs in RNA transfer, the actual involvement of TNTs relies solely on the observation that potential TNTs form between co-cultured cells. Other means of transfer, such as through engulfment or phagocytosis of cell fragments, could still possibly contribute.

      While it is possible that transfer might occur through other means, our earlier paper (Haimovich et al., 2017) showed that engulfed apoptotic bodies rarely contribute to mRNA transfer, even upon near-100% of donor cell death. Moreover, RNAs in apoptotic bodies found in acceptor cells can be clearly identified by smFISH, as the RNAs are tightly clumped together. Likewise, our quadrapod experiments (Figure 6-figure supplement 1) might have revealed RNA transfer if engulfment of cell fragments had occurred.

      Furthermore, the dependence of mRNA transfer on direct cell-to-cell contact is demonstrated for 5 RNAs and extrapolated to transcriptome-wide RNA transfer, an assumption which might, or might not, be valid.

      We concur that we extrapolate from the few validated examples and have now added the following text (lines 604-611 in the revised manuscript): “We validated several examples of transferred mRNAs that transfer via a contact-dependent mechanism, likely TNTs (Figure 6 and Figure 6-figure supplements 1 and 3), and extrapolate from them to the entire transcriptome. Although it is possible that some or many mRNAs transfer by means other than TNTs, we think it unlikely, since the results on TNT-mediated cell-to-cell transfer in both this and our previous publication (Haimovich, 2017), as well as by others (Ortin-Martinez et al., 2021; Su and Igyarto, 2019), tested a variety of mRNAs from different families and which localize to various sub-cellular localizations. This indicates that the pathway we have uncovered is more general than the few examples presented here.” In addition, we now cite in the Discussion (lines 611-621 in the revised manuscript) a new pre-print recently posted to bioRxiv that shows similar results of mRNA transfer in a human-mouse cell co-culture model.

      Finally, the results on gene expression changes induced by co-culture (Figures 7, 8) are of unclear relevance. As the authors point out, it is uncertain whether RNA transfer or other paracrine or adhesion-mediated signaling events, underlie these changes. It is therefore not easy to see how these results relate to the rest of the presented work. Furthermore, while the authors expand on the potential significance of changes observed in genes related to cancer-associated fibroblasts or to immunity-related genes, these remain speculative and untested.

      We concur that the part of the paper regarding the consequences of co-culture (upon the endogenous transcriptome) does not clarify the specific contribution of the “transferome” to the phenomenon. Future co-culture studies measuring transcriptome-wide transfer using the quadrapod co-culture system versus cell-cell contact co-culture could be performed. Yet, to make the distinction between TNT-dependent and -independent effects when cells are in contact will require further mechanistic knowledge of TNT-mediated mRNA transfer, which is beyond the scope of this paper. Nevertheless, we believe that the data on the endogenous gene expression in co-culture is important and could be useful to the cancer research community outside the context of the transferome information.

      Overall, the manuscript presents evidence indicating that RNA is transferred non-selectively in co-cultured cells, under specific conditions and between the cell types tested. The impact of the work is reduced by the lack of mechanistic understanding underlying this transfer and the uncertainty of whether this phenomenon has any subsequent physiological relevance.

      Our global analysis of TNT-mediated transfer (the transferome) is only a second step towards understanding this important process, the first step being its recent identification. Obviously, we would be happy to gain more mechanistic insight and knowledge of physiological relevance. We are currently working on several projects to try to answer some of these questions but, as one can understand, these are technically challenging and have not yet come to fruition.

    1. Author Response

      Reviewer #1 (Public Review):

      The human genetic variant Dantu increases the surface tension of red blood cells making it hard for malaria parasites to invade. This was shown beautifully by Kariuki et al in 2020 (doi.org/10.1038/s41586-020-2726-6) by analysing blood from children using in vitro assays with cultured malaria parasites. Now Kariuki et al show that parasite growth is indeed restricted in vivo by infecting Dantu adults under controlled conditions with cryopreserved Plasmodium falciparum sporozoites and analysing parasite growth by qPCR. The authors compare parasite growth, peak parasitaemia and if / when treatment was sought for malaria symptoms between non-Dantu (111) and Dantu heterozygous (27) and homozygous (3) participants. Dantu either completely prevented malaria parasite detection in the blood (for 21 days) or slowed down parasite growth considerably.

      The authors present compelling in vivo evidence that Dantu conveys protection by preventing malaria parasites from establishing a blood-stage infection. Because the effect on parasite growth is crystal clear, the link to uncomplicated malaria follows: no or fewer parasites lead to fewer participants experiencing malaria symptoms and seeking treatment. It should however be noted that the paper does not show that Dantu reduces symptomatology at identical parasite densities to non-Dantu. Its protective effect seems to be purely parasitological.

      Given that all volunteers were exposed to malaria prior to being experimentally infected (in various transmission settings ranging from low to high) the authors state that they adjusted for factors like schizont antibody concentration in their multi-variate analysis. More details on the assumptions and which dependent / independent variables were included would benefit interpretation. It would be also good to see if Dantu individuals were spread homogeneously across all transmission settings - if e.g. they all had history of intense malaria exposure and thus strong pre-existing anti-malaria immunity this might account in part for reduced parasite growth when compared to non-Dantu from lower transmission settings. Being able to de-convolute the effect of pre-existing immunity from Dantu would strengthen the paper.

      Thank you for the positive feedback and summary of the key findings. We absolutely agree that breaking down the impact of Dantu genotype by transmission would have been very interesting, but the sample numbers for some of the genotypic groups were simply too small to make stratification by area of residence meaningful. Instead, to address the core issue of whether prior immunity is a complicating factor in our analysis, we used measurements of antibodies to whole schizont extract as a proxy indicator of transmission setting or “malaria exposure” in our multivariate analyses. There was no difference in anti-schizont antibody levels across Dantu genotype groups – these data are now included in Figure 3 – figure supplement 1, as requested. This suggests that differences in pre-existing anti-malaria immunity between Dantu and non-Dantu cannot explain the differences seen in our current study. Regarding the comment about assumptions and variables in the multivariate analysis, we have added more details as requested, as outlined in further detail in subsequent points below.

      The authors also present data on other red cell polymorphisms known to modulate malaria infection and improve outcome: G6PD, blood group O, alpha thalassaemia and ATP2B4. However, no statistically significant differences between non-carriers and hetero/homozygous individuals were observed. This is probably because these mutations exert their effect not directly on parasite growth but modulate disease symptoms when parasite burden is high, which cannot be investigated in controlled human malaria infection settings as ethical considerations mandate treatment of all volunteers at parasite densities >500 parasites/µl or any parasitaemia with symptoms. Controlled infections need to be complemented with other methods to understand the protective impact of genetic polymorphisms.

      We thank the reviewer for this helpful observation with which we completely agree. To acknowledge this issue, we have added some consideration of this point to the Discussion section of the revised manuscript, within the sub-section that discusses protective mechanisms of other red cell polymorphisms on page 14.

    1. Author Response

      Reviewer #2 (Public Review):

      Despite high bone mineral density, increased fracture risk has been associated with T2D in humans. In this study, the authors established a model that could mimic some aspects of T2D in mice and then study bone turnover and metabolism in detail.

      Strengths

      This is an exciting study, the methods are detailed and well done, and the results are presented coherently and support the conclusions.

      Previous work from Dr. Long's group over the last decade has established a requirement for glycolysis in osteoblast differentiation. They showed the requirement for glycolysis not only for the anabolic action of PTH but also as an effector downstream of Wnt signaling. Using the T2D mouse model they have generated, they test if manipulating glycolysis and oxidative phosphorylation can rescue some of the detrimental effects on bone in this model. They use several novel approaches, including glucose-labeling studies, which are relatively underutilized and provide some insights into the defective TCA cycle. They also utilize BMSCs that have been sorted for performing single-cell sequencing studies to identify specific populations modified with T2D. Unfortunately, the results are modest and need some clarification on what these populations add to the story.

      We appreciate the positive comments. Although T2D had only a modest effect on the relative pool size of each cell population, the changes in metabolic pathways (glycolysis and oxphos) in several clusters were notable and provided support to the central notion that T2D altered cellular metabolism in osteoblast-lineage and other bone marrow cells.

      The authors use two approaches: a drug (Metformin) and a number of mouse genetic models to over-express genes involved in the glycolytic pathway using Dox inducible models. The results with overexpressing HIF1 and PFKFB3 show a potential rescue of bone defects with T2D, and Glut1 overexpression does not rescue T2D-induced bone loss.

      Concerns

      The authors have generated several overexpression models to manipulate the glycolytic pathway to rescue T2D-induced bone loss. The use of DOX in drinking water has been shown to affect mitochondrial metabolism. Did the authors control for these effects? Since both groups of mice got the DOX in drinking water, there is an internal control.

      The experiments were controlled for any potential effects of DOX per se as all animals were subjected to the same DOX regimen.

      Only one of the rescue experiments had a control with the chow diet. There are some studies that have shown a high-fat diet to be protective against bone loss in T1D models.

      We have now added the chow diet control for the Hif1a rescue experiment as well (Fig. 7).

      The use of metformin to correct metabolic dysfunction and, thereby, bone mass is an exciting result. Did the authors test whether this phenotype was rescued through reduced ROS levels? The decrease in OxPhos seen with the Seahorse experiments suggests there could be mitochondrial dysfunction, which is often associated with ROS generation.

      We appreciate the reviewer’s insight here. We have not examined ROS levels but agree that changes in ROS levels could potentially contribute to the bone phenotype in diabetes.

      All of the experiments used male mice (because of STZ use and the ease of T2D establishment in males). It would be better if this were made clear in the title.

      The title has been revised to specify male mice.

      Does the T2D model presented really represent what is observed in humans? Some experiments to test the other factors implicated in T2D and whether those are modulated in the rescue experiments might help address this.

      Our T2D model exhibited all typical features of T2D patients, including obesity, glucose intolerance, and insulin resistance. We have shown that metformin modestly improved glucose tolerance and insulin sensitivity in the T2D mice (Fig. 6C, E). We have not examined whether those global metabolic features were modulated in the genetic rescue experiments, which targeted only osteoblasts.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper establishes a strong case for the post-translational modification of C/EBPalpha to play a strong role in its effects, in this case, to promote macrophage differentiation in collaboration with PU.1. The cellular system being used for most of the experiments here takes advantage of the dual roles of PU.1 in B cells, which normally do not express C/EBP family factors, and in myeloid cells, which normally do express C/EBP family factors. The authors and others have previously shown that PU.1 and C/EBPalpha are very powerful collaborators, both needed to establish a macrophage identity. Thus, the title of the paper provocatively implies that the C/EBP modification that keeps it from being methylated on Arg35 works by increasing the re-distribution of PU.1 from B cells to myeloid gene sites in combination with C/EBP. Indeed, the authors show proximity ligation data to show that PU.1-C/EBPalpha juxtaposition is more frequent in the nucleus if C/EBPalpha cannot be Arg-methylated. The paper also shows careful and thorough characterization of the B to myeloid lineage conversion gene expression changes and the mapping of the Arg residues in C/EBPalpha that are most important to keep demethylation. Similarly, the paper provides strong evidence that it is Carm1, and not another protein arginine methyltransferase, that is responsible for the regulatory modification. This is a valuable and well-characterized demonstration of a mechanism that should be considered more generally as a regulator of transcription factor action.

      The mechanism proposed by the authors is that C/EBPalpha relocates PU.1 to macrophage sites and that C/EBPalpha R35A binds and relocates PU.1 more efficiently than wildtype, and this seems likely and appealing. However, it is not as strongly supported by data within the paper itself as the other points in the paper are. There is a puzzling gap in the data: no direct evidence is shown that C/EBPalpha is really relocating PU.1 from B cell to macrophage regulatory elements at all. Despite the figure titles (Fig. 4 and Fig. S4), there is no ChIP-seq data to show PU.1 binding sites before and after interaction with either wildtype or R35A mutant C/EBPalpha, just accessibility data. There is also a question of whether such a redistribution would occur fast enough to account for the impressive speed of the R35A mutant's other effects. These questions seem fairly straightforward to address. If relevant data could be added, it would greatly increase the impact and generality of the paper. The paper could be published with this claim converted to a suggestion, based on the current data, or it could be published in a higher-impact form if additional data could be provided to demonstrate the relocation more directly. The authors would be more expert about the logistics of the experiment, but it seems that a direct ChIP-seq-based comparison should be feasible and powerful for the argument of the paper.

      We have now included PU.1 and C/EBPa ChIP-seq experiments, using C/EBPaWT- and C/EBPaR35A-induced cells, replacing the virtual ChIP-seq experiments. Integrating the data obtained with our dynamic ATAC-seq data, the new findings largely support the previously proposed PU.1 redistribution (‘theft’) model. To make the data easier to understand, we now first show the PU.1 and C/EBPa binding to distinct B cell- and macrophage-restricted GREs contained in a single genomic fragment (new Fig. 5). The findings nicely visualize how PU.1 becomes redistributed from B-GREs to M-GREs, in a C/EBPa mutant-accelerated manner. We were also happy to see that a genome-wide analysis of the data again shows the accelerated redistribution of PU.1 by C/EBPaR35A (new Fig. 6). Finally, the comparison of the ChIP-seq and ATAC-seq data also added more mechanistic detail, such as by revealing that chromatin remodeling of lineage-restricted GREs can be uncoupled from the regulation of associated genes.

      Finally, the effect of the mutation is assumed to be only on the interface for interaction between C/EBPalpha and PU.1 (or other co-factors). However, C/EBPalpha is such a short-lived protein that any modification that slightly increased its half-life could increase its potency. It seems important to present some quantitative protein staining evidence to clarify whether the steady-state level of C/EBPalpha in C/EBPalpha R35A-expressing cells is really unchanged from C/EBPalpha wild-type-expressing cells.

      We agree that this is an important issue and have therefore now performed a cycloheximide experiment with 3T3 cells expressing inducible forms of the two proteins. The data in Figure S4C show that C/EBPaR35A exhibits stability similar to that of the wild-type protein and is expressed at 20-30% lower levels under steady-state conditions in uninduced cells. They also show that C/EBPa is surprisingly stable. These new findings are in line with the comparison of the two proteins by Western blots of mutant- and wild-type-transfected 293T cells and of infected B cells, which also show similar levels of the two proteins (Fig. 7C and D). Therefore, the finding that expression of C/EBPaR35A is similar to or slightly lower than that of the wild type argues against the possibility that an elevated expression level of the mutant could explain the effects observed.

      Finally, although not requested by the reviewer, we have now addressed the possibility that the effect of the alanine replacement of R35 is mostly due to a change from a charged to a non-charged hydrophobic residue. This is not the case, as a replacement of arginine 35 by the charged amino acid lysine still leads to an accelerated BMT induction (Figure S7).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper is based on the premise that ketamine exerts antidepressant effects that are rapid by increasing glutamatergic transmission. However, the authors note that how this effect occurs is unclear because ketamine antagonizes the NMDA receptor, a glutamatergic receptor. Others have suggested a compensatory change in the glutamatergic transmission and the authors suggest how this might occur. The authors should clarify if prior studies suggested a mechanism different from theirs and if so, which might be correct.

      There are also other mechanisms, such as the block of NMDA receptors on interneurons and the disinhibition of principal cells. It is important to clarify if this has already been addressed in the literature. The authors should also clarify whether their cultures are primarily glutamatergic neurons or include interneurons and glia.

      The authors show calcineurin is reduced after ketamine exposure and this increases AMPA receptor GluA1 phosphorylation. They also show that Calcium permeable AMPA receptors (CP-AMPARs) increase.

      They also suggest that the CP-AMPARs and other changes lead to enhanced synaptic plasticity, which could lead to antidepressant effects.

      Although a lot of work is done in cultured hippocampal neurons, 14 days in vitro, they show effects in vivo that are consistent with the data from cultures. For example, ketamine increases GluA1 phosphorylation. Also, blocking CP-AMPARs in vivo reduces anxiety/depressive behaviors such as the open field and tail suspension tests.

      Overall the study appears to be done well and the presentation, writing, and references are good. There are important concerns regarding statistics, behavior, and pharmacology and several minor concerns.

      Major concerns

      1) Statistics.

      What was the stat test if the control was always 1? Often the control group is 1.00 with no SD but in other tests, the control group is 1.000 with an SD.

      In the previous submission, we neglected to include this information. Immunoblotting data have variable raw values; hence, the control group was used to normalize each group and was compared to the experimental groups. Thus, the control value for immunoblotting was always 1.000 without SD. Similarly, for imaging data, the average peak amplitude in control cells was used to normalize the peak amplitude in each cell and was compared to the experimental groups' average; thus, the control group is 1.000 with SD. The Franklin A. Graybill Statistical Laboratory at Colorado State University has been consulted for statistical analysis in the current study, including sample size determination, randomization, experiment conception and design, data analysis, and interpretation. Grouped results of single comparisons were tested for normality with the Shapiro-Wilk or Kolmogorov-Smirnov test and analyzed using the unpaired two-tailed Student’s t-test when the data were normally distributed. Differences between multiple groups with normalized data were assessed by the nonparametric Kruskal-Wallis test followed by Dunn’s test.
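
      The normalization and the multi-group comparison described above can be sketched as follows. This is only an illustrative sketch, not the analysis code used in the study: the function names are hypothetical, the normalization shown corresponds to the imaging case (every value divided by the control mean), and only the Kruskal-Wallis H statistic is computed, without the tie correction, the p-value, or Dunn's post hoc test.

```python
from statistics import mean


def normalize_to_control(control, treated):
    """Divide every value by the control mean, so the control group averages 1.0."""
    c_mean = mean(control)
    return [v / c_mean for v in control], [v / c_mean for v in treated]


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic using average ranks for ties (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy 1-based ranks i+1..j; assign them the mean rank.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    weighted = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical raw peak amplitudes, for illustration only.
control_norm, treated_norm = normalize_to_control([2.0, 4.0], [6.0, 9.0])
print(mean(control_norm), treated_norm)
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

      In practice, an established statistics package would be used for the p-values and the post hoc comparisons; the sketch only makes the order of operations explicit.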

      2) Behavior.

      It is not clear that the open field and tail suspension tests measure antidepressant actions. Why were more standard tests such as forced swim or sucrose preference, novelty-suppressed feeding, etc not used?

      We agree with the Reviewer’s concern. However, both the open field test and tail suspension test have long been used to determine animals’ anxiety-like and depression-like behaviors, respectively, in rodents (Seibenhener and Wooten, 2015; Ueno et al., 2022). Specifically, the open field test has been widely used to measure the ketamine effects on anxiety-like behavior in rodents (Guarraci et al., 2018; Pitsikas et al., 2019; Shin et al., 2019; Akillioglu and Karadepe, 2021; Yang et al., 2022; Acevedo et al., 2023). The tail suspension test has also been used to examine the ketamine effects on depression-like behavior in animals (Fukumoto et al., 2017; Yang et al., 2018; Ouyang et al., 2021; Rawat et al., 2022; Viktorov et al., 2022). Studies suggest that the forced swim test and the tail suspension test are based on the same principle: measurement of immobility duration while rodents are exposed to an inescapable situation (Castagne et al., 2011). Importantly, it has been suggested that the tail suspension test is more sensitive to antidepressant agents than the forced swim test because the animal will remain immobile longer in the tail suspension test than in the forced swim test (Cryan et al., 2005). For this reason, we chose to use the tail suspension test instead of the forced swim test. This information has now been included in the revised manuscript. Additionally, because ketamine produces antidepressant effects within one hour after administration in humans (Berman et al., 2000; Zarate et al., 2006; Liebrenz et al., 2009), our study aims to understand the mechanism underlying ketamine's rapid (less than an hour) antidepressant effects. Given that the sucrose preference test and the novelty-suppressed feeding test require multiple days, they would not be suitable for our goals.

      3) Pharmacology.

      The conclusions rest on the specificity of drugs.

      Is 5 uM FK506 specific?

      20 μM 1-naphthyl acetyl spermine (NASPM)?

      10 mg/kg IEM-1460?

      We neglected to add the rationale for the drug concentrations in the previous submission. Previous research, including our own, has employed FK506 at a variety of different concentrations to inhibit neuronal calcineurin activity (1 - 50 μM) (Hsieh et al., 2006; Schwartz et al., 2009; Kim and Ziff, 2014). Specifically, we have shown that 5 μM FK506 treatment for 12 hours significantly reduces neuronal calcineurin activity to increase GluA1 phosphorylation, which induces the expression of CP-AMPARs to elevate AMPAR-mediated synaptic activity (Kim and Ziff, 2014). Moreover, previous studies, including our own, have used NASPM at a variety of different concentrations to inhibit CP-AMPARs (3 - 250 μM) (Tsubokawa et al., 1995; Koike et al., 1997; Noh et al., 2005; Nilsen and England, 2007; Hou et al., 2008; Kim and Ziff, 2014). In fact, we have shown that 20 μM NASPM significantly reduces CP-AMPAR-mediated synaptic and Ca2+ activity (Kim and Ziff, 2014; Kim et al., 2015b). Finally, multiple reports demonstrate that 10 mg/kg IEM-1460 significantly reduces in vivo CP-AMPAR activity (Wiltgen et al., 2010; Szczurowska and Mares, 2015; Adotevi et al., 2020). This information has now been included in the revised manuscript.

      Reviewer #3 (Public Review):

      Ketamine has been shown to be effective at producing a rapid-antidepressant effect at low doses, but the underlying molecular mechanism of this effect is still not clear. Previous studies have suggested that the effect of low-dose ketamine may occur by promoting neuronal plasticity in the hippocampus. However, this goes against the findings that ketamine acts as a noncompetitive NMDA receptor antagonist, which should prevent NMDAR-dependent plasticity. Furthermore, a therapeutic dose of ketamine has been shown to increase neuronal Ca2+ signaling, which again does not conform to its antagonistic action on NMDA receptors. In this paper, the authors provide evidence that therapeutic low-dose ketamine increases the expression of Ca2+-permeable AMPA receptors (CP-AMPARs) by increasing phosphorylation of GluA1 subunit of AMPARs and surface expression of GluA1-containing CP-AMPARs. They further provide evidence that this is likely mediated by a decrease in calcineurin activity and that blocking CP-AMPARs prevent the antidepressant effect of ketamine in mice. One interesting finding of this study is that the authors see heightened sensitivity of ketamine in female mice, both at the level of behavioral readout and for molecular correlates. This finding is interesting in light of the different pharmacokinetics of ketamine reported in females and that ketamine metabolites can bind estrogen receptors.

      Based on their data and previous findings, the authors outline a plausible molecular signaling mechanism for the antidepressant effect of ketamine. Specifically, the authors propose that reduced neuronal activity, which could be triggered by ketamine-induced NMDAR antagonism, causes homeostatic plasticity to upregulate GluA1-containing CP-AMPARs. Their data would support this idea, as phosphorylation of GluA1 as well as increased surface expression and functional incorporation of CP-AMPARs at synapses have been shown before in models of homeostatic plasticity.

      1) Overall, the study is well-done and the data presented support the main conclusions. One main question is whether the current finding provides a conceptual advancement in our understanding of the molecular signaling involved in ketamine's antidepressant effects.

      We thank the reviewer for the critique. In fact, research suggests multiple potential mechanisms of ketamine-induced neural plasticity. The main mechanism by which ketamine produces its therapeutic benefits on mood recovery is the enhancement of neural plasticity in the hippocampus (Miller et al., 2016; Aleksandrova et al., 2020; Kavalali and Monteggia, 2020; Grieco et al., 2022). However, ketamine is a noncompetitive NMDAR antagonist that inhibits excitatory synaptic transmission (Anis et al., 1983). A hypothesis to explain these paradoxical effects is that ketamine acts via direct inhibition of NMDARs localized on inhibitory interneurons, leading to disinhibition of excitatory neurons and a resultant rapid increase in glutamatergic synaptic activity to activate the Ca2+ signaling pathway (Deyama and Duman, 2020; Gerhard et al., 2020). This stimulates the brain-derived neurotrophic factor (BDNF) signaling pathway, which subsequently increases the translation and synthesis of synaptic proteins to enhance AMPAR-mediated synaptic plasticity (Deyama and Duman, 2020). Another potential explanation is that ketamine inhibits NMDARs on excitatory neurons, which induces a cell-autonomous form of homeostatic synaptic plasticity resulting in increased excitatory synaptic drive onto these neurons (Miller et al., 2016; Kavalali and Monteggia, 2020). Homeostatic synaptic plasticity is a negative-feedback response employed to compensate for functional disturbances in neurons and expressed via the regulation of AMPAR trafficking and synaptic expression (Wang et al., 2012). According to this hypothesis, ketamine disrupts basal activation of NMDARs on excitatory neurons, which engages a mechanism of homeostatic synaptic plasticity that results in a rapid compensatory increase in synaptic AMPAR expression in these neurons in a protein-synthesis dependent manner (Kavalali and Monteggia, 2023). Additionally, there is an NMDAR inhibition-independent mechanism mediated by hydroxynorketamine (HNK), the ketamine metabolite that lacks NMDAR inhibition properties (Carrier and Kabbaj, 2013; Franceschelli et al., 2015; Zanos et al., 2016). The current study offers a new neurobiological basis for ketamine’s actions that depend on the NMDAR inhibition-mediated elevation of GluA1-containing AMPAR trafficking, which is likely independent of the previously described mechanisms, including the BDNF-induced protein synthesis-dependent pathway (Deyama and Duman, 2020) and the NMDAR inhibition-independent pathway (Carrier and Kabbaj, 2013; Franceschelli et al., 2015; Zanos et al., 2016). Nonetheless, there are still many important questions surrounding the molecular mechanisms of ketamine's actions. This new information has now been included in the revised manuscript.

      2) There are previous studies that showed an increase in CP-AMPARs in the nucleus accumbens and an increase in the expression of GluA1 in the hippocampus with low-dose ketamine. In addition, ketamine's antidepressant effect has been shown to require GluA1 phosphorylation. The main contribution of this paper might be that it provides the potential molecular signaling within the same preparation (i.e. hippocampal neurons) and provides a causal link of CP-AMPARs in mediating the behaviorally measured antidepressant effect of ketamine.

      The study showing that ketamine induces the insertion of CP-AMPARs in the nucleus accumbens did not examine whether this change resulted in antidepressant behaviors (Skiteva et al., 2021). Therefore, it is difficult to conclude that the ketamine-induced expression of CP-AMPARs in the nucleus accumbens plays a role in behaviors. Moreover, as described above, a recent study shows that the hippocampus is selectively targeted by ketamine (Davoudian et al., 2023). We thus chose the hippocampus as our experimental model to test our hypothesis. However, we are unable to rule out the potential role of nucleus accumbens in ketamine’s antidepressant actions.

      3) Another question is whether the behavioral effect of ketamine is due to molecular changes in the hippocampus as outlined in this paper. A more targeted inhibition of CP-AMPAR function could resolve this issue. With the systemic application of CP-AMPAR antagonist as done in this study, it would be hard to know the role of CP-AMPAR upregulation in the hippocampus in mediating ketamine's effect. Especially, considering that low-dose ketamine has been shown to upregulate CP-AMPARs in the nucleus accumbens. While it would have been nice to know the site of action, this does not alter the conclusion that CP-AMPARs are involved in mediating the antidepressant effect of ketamine on behavioral readouts.

      We agree with this point. We have thus removed “the hippocampus” in the title and have further made equivalent revisions in the other parts of the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have used computational models and protein design to enhance antibody binding, which should have broad applications pending a few additional controls. The authors' new method could have a broad and immediate impact on a variety of diagnostic procedures that use antibodies as sensitivity is often an issue in these kinds of experiments and the sensitivity enhancement achieved in the two test cases is substantial. Affinity maturation is a viable approach, but it is laborious and expensive. If the catenation method is generalizable, it will open up opportunities for antibody optimization for cases where affinity maturation is either not feasible or otherwise impractical. Less clear is how this method might enhance therapeutic potency. Issues that arise when using therapeutic antibodies are often multifactorial and vary depending on the target and disease state. Many issues that occur with antibody-based therapies will not be rectified with affinity enhancement.

      We agree with the limitation.

      Reviewer #2 (Public Review):

      The paper presents an interesting design approach to having homodimeric IgGs with higher binding affinity to the antigens on a surface by fusing a weakly homodimerizing protein (a catenator) to the C-terminus of IgG. Considering the homodimeric IgGs with likely enhanced antigen binding ability and their stabilization with a reversible catenation when bound to the surface is an interesting idea. With agent-based modeling - the simulations based on Markov Chain Monte Carlo (MCMC) sampling - and proof of concept experiments, it has been possible to show the enhanced antigen binding ability of the homodimer Igs for many folds, where the weakly homodimerizing ability of the catenator is indicated to have a central role, enabling proximity effect driven catenation on the antigen bound surfaces. While the results render the enhanced binding affinity of the catenated homodimeric IgGs, the study would benefit from a more elaborated interpretation and discussions of the results.

      The following discussion is now stated in the revision (pages 19-20, in the revision); “While we demonstrated that dual catenator-fused heterodimeric IgGs can enhance binding avidity, the oligomer formation or potential intramolecular homodimerization of the catenator necessitates the development of a more robust catenator for application to conventional homodimeric IgGs. Specifically, the ideal catenator should geometrically disallow intramolecular homodimerization, exhibit fast association kinetics, and be able to withstand the standard low pH purification step. On the other hand, our demonstration indicates that this approach can be applied to bispecific antibodies employing a heterodimeric Fc.”

      One interesting base of the discussion may include how the fusion of the catenator may likely affect the binding behavior, the intrinsic binding behavior, and/or on the global structural changes, of IgGs (monomeric and homodimeric (catenated) per se beyond its proximity-driven contribution. Would it lead to a more restricted structure in the mobility in the unbound states so as to decrease the entropic cost for the binding and thus increase the binding avidity/affinity (in addition to external proximity-driven association). In other words, what would be the role of entropy in the free energy of binding, given that the enthalpic contributions remain the same? Possible effects of the length of the catenator should also in parts be related to the entropy. For example, if a longer and more flexible catenator is considered, what would the resulting observation experimentally and computationally be?

      The binding site occupancy depends on [catAb]/KD. Figure 4-figure supplement 2 shows the binding site occupancy and (KD)eff as a function of (KD)catenator. In this simulation, [catAb] was fixed (10^-9 M) while KD was varied (from 10^-8 to 10^-6). In the figure legend and in the main text, we now explicitly state that KD was varied from 10^-8 to 10^-6 (page 30, in the revision). To address this comment, we set KD = 10 nM (as used for simulation in Figures 3 and 4), and varied [catAb] from 0.1 to 10 nM. The binding site occupancy and (KD)eff as a function of [catAb] are plotted for three different set values of (KD)catenator (1 μM, 10 μM and 100 μM). The new figures are now presented as Figure 4-figure supplement 3. This simulation shows that the enhancement of (KD)eff by increasing the concentration of catAb is much less dramatic than that by increasing the affinity for catenator homodimerization at [catAb] > 10 nM.
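
      As a rough illustration of why occupancy depends only on the ratio [catAb]/KD, the sketch below uses the textbook single-site equilibrium isotherm. This is an assumption made for illustration and is not the agent-based MCMC simulation used in the paper; the function name and the 0.1 nM effective KD are hypothetical, whereas 1 nM ([catAb]) and 10 nM (KD) are the values quoted above.

```python
def occupancy(conc_nm, kd_nm):
    """Fraction of sites occupied at equilibrium for simple 1:1 binding."""
    if conc_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    # conc / (conc + KD) equals r / (r + 1) with r = conc / KD,
    # so only the ratio of the two quantities matters.
    return conc_nm / (conc_nm + kd_nm)


# [catAb] = 1 nM and KD = 10 nM are quoted in the response;
# the 0.1 nM effective KD is a made-up value for illustration only.
base = occupancy(1.0, 10.0)
enhanced = occupancy(1.0, 0.1)
print(f"KD = 10 nM: {base:.3f}; hypothetical (KD)eff = 0.1 nM: {enhanced:.3f}")
```

      Under this simplification, lowering the effective KD by catenation raises occupancy far more than a modest increase in antibody concentration would.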

      On the other side, simple simulation approaches have a high value with a level of abstraction while still keeping the physical and biological relevance. In the simulations, i.e. in the sampling of various states, three main terms/rules to govern the behavior are implemented. One is a term favoring an increase in the ability to bind (preventing to unbinding) to the surface upon the catenation of IgGs. This may need to be substantiated for the simulations not imposing a preassumed ability to increase the binding (or decrease the unbinding) ability upon the catenation.

      We agree with the reviewer that the third rule favors the binding ability of catenated IgGs, because it assumes that catenated antibodies are not allowed to dissociate from the binding site. While this assumption is not exactly correct, we think that it is valid, considering the behavior of a multivalent ligand. When the IgG portion dissociates completely from the binding site, it is still anchored by the catenation arm, and thus it will rebind the same binding site immediately. This postulation agrees with the quantitative analysis showing that a multivalent ligand exhibits an orders-of-magnitude increase in binding likelihood when the ligand size is comparable to the stretch length of a conjugating linker [Liese, S. & Netz, R. R., ACS Nano, 12, 4140 (2018)].

      The weakly homodimerizing state of the catenator appears as one of the important aspects of the proposed design strategy. Would it also be possible that the experimental observations may readily also imply the higher binding ability of the catenator-fused IgG without the homodimerization on the surface (due to the reduced entropic cost for the binding)? The presentation of the evidence of the homodimerization of the catenator and the catenated IgGs on the surface would strengthen the findings and discussions.

      To fully address this comment, we would need to consider the detailed molecular behavior of the IgG part, the catenator and the linker, probably using molecular dynamics simulation, which we think is outside the scope of the current work. We would like to describe qualitatively what we think about the raised issues. Fused to the C-terminus of Fc, the catenator would not affect the complementarity-determining region (CDR) of Fab, which is located on the opposite side of the C-terminus of Fc. This notion is supported by the observation that the SDF-1α-fused antibodies exhibited association kinetics similar to those of the mother antibodies (Figure 5).

      Regarding the mobility of the structure, we presume that the fused catenator would not interact with the antibody portion and thus it would not affect the intrinsic structural mobility of the antibody.

      Since the catenator is fused to the C-terminus of Fc by a flexible linker, the homodimerization of catenator would decrease the entropy upon catenation. However, the enthalpic contribution would overcome the entropic loss, and result in negative free energy of the catenator homodimerization.

      Figure 2-figure supplement 1 (in the revision) shows the simulation for five different values of the reach length (R), which is the sum of the linker length and half of the catenator length. The simulation results show that the likelihood of catenation decreases as the linker length increases over the distance (d) between the two adjacent catAb-2Ag complexes, while it is maximum when the reach length equals d. Since the catenator length is fixed, increasing the linker length (such that R > d) will lower the catenation effect.

      Reviewer #3 (Public Review):

      The authors proposed an antibody catenation strategy by fusing a homodimeric protein (catenator) to the C-terminus of IgG heavy chain and hypothesized that the catenated IgGs would enhance their overall antigen-binding strength (avidity) compared to individual IgGs. The thermodynamic simulations supported the hypothesis and indicated that the fold enhancement in antibody-antigen binding depended on the density of the antigen. The authors tested a catenator candidate, stromal cell-derived factor 1α (SDF-1α), on two purposely weakened antibodies, Trastuzumab(N30A/H91A), a weakened variant of the clinically used anti-HER2 antibody Trastuzumab, and glCV30, the germline version of a neutralizing antibody CV30 against SARS-CoV-2. Measured by a binding assay, the catenator-fused antibodies enhanced the two weak antibody-antigen binding by hundreds and thousands of folds, largely through slowing down the dissociation of the antibody-antigen interaction. Thus, the experimental data supported the catenation strategy and provided proof-of-concept for the enhanced overall antibody-antigen binding strength. Depending on specific applications, an enhanced antibody-antigen binding strength may improve an antibody's diagnostic sensitivity or therapeutic efficacy, thus holding clinical potential.

      Thanks for the favorable comments.

    1. Author Response

      Reviewer #1 (Public Review):

      The introduction does not clearly set up the background for the key questions that the manuscript addresses. One of the key parts of the manuscript is to attempt to determine whether locomotory behaviour evolves because of direct or indirect selection of the traits. However, the authors don't provide an argument for why a salty environment would select for locomotory traits. Indeed, in the discussion, the authors point out that it is likely an unmeasured trait (body size) correlated with locomotory traits that are under selection. They present arguments for why this might be the case and point to un-included data that show body size significantly genetically covaries with all of the traits studied. Since the authors appear to have these data, and one of their key questions is comparing direct vs. indirect responses to selection, it would be more powerful to include the body size data and estimate selection on all traits together.

      We now include body size in all of our phenotypic and genetic analyses. We also include estimates of selection gradients from the ancestral selection differentials and the G-matrix. We detail in the Introduction the biological significance of locomotion traits and their potential relationship with body size, in low and high salt environments. The experimental results show that divergence in locomotion traits (Figure 6) correlates with adaptation (Figure 5), because of direct and indirect selection (Figure 9).

      Phenotypic plasticity was estimated from a series of univariate models, with estimates arranged in a vector. As the authors point out in the manuscript, traits that are not included in a model but covary with traits that are can largely bias estimates of the traits that are included. For this reason, it would make sense to estimate phenotypic plasticity using a multivariate model, as has been done for G matrices.

      We analyze the ancestral phenotypic plasticity and the phenotypic divergence during evolution using a multivariate approach (MANOVA). This approach simplifies the text as from the eigen decomposition of the SSCP matrices we can estimate canonical traits of ancestral phenotypic plasticity (pmax; see Table 1 with notation definitions) and phenotypic divergence in the new target high salt environment (dmax). We continue to do the univariate analysis as it allows us to estimate BLUPs for each inbred line (used for visual representation), as well as the significance of phenotypic divergence at each replicate population relative to the ancestral population (delta_q). Both multivariate and univariate approaches led to similar results (shown as supplementary figures).

      The estimation and interpretation of G matrices are a critical part of the manuscript. The authors state that broad sense estimates of G are a good proxy for additive genetic variation in this system, but in the Discussion they also state that overdominance was likely important during evolution to the salt environment, leading to some lack of clarity on whether dominance is important or not.

      We are sorry for the lack of clarity. We have eliminated the discussion on overdominance as it was peripheral to our results. Broad-sense genetic variances should be a good proxy for additive genetic variances when there is no inbreeding depression and no directional dominance or dominance epistasis; cf. Lynch and Walsh 1998. We previously showed that there is no inbreeding depression for the trait we use as a surrogate for relative fitness (self-fertility) and that there is no directional dominance for locomotion behavior traits. We now explain our use of broad-sense genetic (co)variances as a proxy for additive genetic (co)variances in the Introduction and Methods.

      It is also unclear how uncertainty in estimated G matrices was assessed. Showing that G differs from noise is critical to the majority of the results presented. The authors cite Morrissey and Bonnet (2019) as providing the method for generating the null distribution of G, however, this paper does not appear to propose or describe a method to do this.

      Thanks for this comment. Morrissey and Bonnet (J Heredity, 2019) was incorrectly cited, and the explanation of how we obtained the expected noise distributions was misleading. In brief, we produced a set of 1000 G-matrices, each computed after shuffling the line ID and the block ID in the phenotypic dataset. This was done to produce random expectations of the genetic variances, because the MCMC estimates are constrained to be positive-definite. We computed the posterior mode for each of these 1000 G-matrices to obtain a null distribution (shown in orange). To infer significance, we compared the posterior mode of the empirical estimate with the 95% CI of the posterior mode distribution obtained from the randomized G-matrices. When determining which eigenvectors explain standing genetic variation, we also used the distribution of posterior modes of the randomized G-matrices. However, as pointed out by Sztepanacz and Blows (Genetics, 2017), the eigenvalues of the eigenvectors do not follow a uniform distribution, as would be expected by chance. We therefore asked whether the amount of variance along the eigenvectors of the empirical G-matrix (gmax, g2, etc.) was expected, by projecting the random G-matrices onto these eigenvectors. This is a null that is conditional on the observed data. We show these results in Figure 2 - supplement figure 3. Both approaches give similar results, particularly for the first 2 eigenvectors. There is now a paragraph in the Discussion about the potential consequences for adaptation of traits with little genetic variance.
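
      The logic of the randomization described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis itself: the covariance of line means stands in for the MCMCglmm posterior mode, only line labels are shuffled (the response also shuffles block IDs), and the function names `line_mean_cov`, `projected_variance` and `shuffle_null` are hypothetical.

```python
import numpy as np

def line_mean_cov(phen, line_ids):
    """Covariance of line means; a crude stand-in for a G-matrix estimate."""
    lines = np.unique(line_ids)
    means = np.array([phen[line_ids == l].mean(axis=0) for l in lines])
    return np.cov(means, rowvar=False)

def projected_variance(G, vecs):
    """Variance of G along each column v of vecs, computed as v' G v."""
    return np.einsum("ik,ij,jk->k", vecs, G, vecs)

def shuffle_null(phen, line_ids, n_perm=1000, seed=0):
    """Null distribution of variance along the observed eigenvectors.

    Line labels are permuted, the stand-in G-matrix is recomputed, and each
    randomized matrix is projected onto the eigenvectors of the observed one,
    so the null is conditional on the observed data.
    """
    rng = np.random.default_rng(seed)
    G_obs = line_mean_cov(phen, line_ids)
    evals, evecs = np.linalg.eigh(G_obs)
    order = np.argsort(evals)[::-1]          # sort so that gmax comes first
    evals, evecs = evals[order], evecs[:, order]
    null = np.empty((n_perm, evecs.shape[1]))
    for i in range(n_perm):
        G_rand = line_mean_cov(phen, rng.permutation(line_ids))
        null[i] = projected_variance(G_rand, evecs)
    lower, upper = np.percentile(null, [2.5, 97.5], axis=0)
    return evals, lower, upper
```

      An eigenvector would then be considered to carry more genetic variance than expected by chance when its observed eigenvalue exceeds the upper bound of the corresponding null interval.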

      Although the figure captions state that they are showing estimates of genetic variances, it appears to be heritability (bounded between 0 and 1). Whether the authors are studying heritability or genetic variance is an important difference, particularly in the context of a changing environment and phenotypic plasticity, where environmental variation is important and expected to change. For example, the result that G is smaller in evolved populations could simply be due to there being larger environmental variance in the salt environment (as you would expect). This is unrelated to an evolutionary response.

      There might have been some confusion because transition rates are positive and not normally distributed. To achieve normality, they were log-transformed. We have not reported estimates of heritability; all estimates presented are of unscaled genetic variances. The only exception is body size, where the raw data were multiplied by 50 to obtain a phenotypic scale similar to that of the transition rates when estimating genetic (co)variances, not heritability. We agree that the evolution of environmental stochastic variance is interesting, but it is not immediately relevant to the questions we address.

      It seems that comparisons to the ancestral population were done for A160, not the founding population for each evolved line at G0. It is not clear whether the founder effects of each replicate are important and if this is the most appropriate comparison (the Discussion suggests that founder effects are important).

      We now describe the derivation of the experimental populations in more detail in the Methods, and in an introductory section of the Results. The population acronyms might have been misleading. A6140 is a population that was domesticated to lab conditions for 140 generations (replicate #6 of the domestication process). We report the evolution of 3 GA populations, all derived from A6140 with minimal sampling problems given the estimated effective population sizes (10^4 individuals were sampled from A6140 for each GA, for an Ne of 1000 during domestication; Chelo and Teotónio, Evolution 2013). Therefore, GA populations after 50 generations of evolution are appropriately compared with their (unique) ancestral population. We no longer discuss potential founder effects.

      Overall, there is much interesting data collected and analysed in this manuscript, addressing a valuable question. However, it is not obvious whether the estimates of G matrices are different from noise, and heritability may not be the most appropriate scale to ask questions about phenotypic plasticity and evolution in a novel stressful environment that may affect levels of environmental variation.

      Please see previous replies. Our ancestral G-matrix estimates indicate that at least 3 eigentraits are different from random expectation in both environments (Figure 2, supplement figure 3), and that, in high salt, evolved populations continue to have more genetic variance than expected at 3-5 eigentraits (Figure 7, supplement 2). We are conservative in these estimates because, depending on the null, we could consider more eigentraits. In the previous version of the manuscript we concluded that only 2 ancestral eigentraits were orthogonal due to an error in the code (we did not divide the null expectations by 2). But even presuming that only one eigentrait (gmax) has genetic variance in the ancestral population, we previously reported that mutational variance is not in the same trait (see Mallard et al., G3, 2023; and mmax in Table 3), and further that the trait under selection is neither gmax nor mmax (compare the selection gradients with gmax or mmax in Table 3). At a minimum there are 3 genetically or environmentally independent traits. As noted in previous replies, we estimate and present genetic variances throughout. We do not present estimates of environmental variances and feel that doing so would make the manuscript overly complicated.

      Reviewer #2 (Public Review):

      Response to selection: It was not clear to me that it was appropriate to interpret locomotor behavior as having evolved in response to the salinity environment. Specifically, where is the evidence that any change in trait means is a (direct or indirect) response to selection imposed by increased salinity rather than the neutral drift of a trait due to the reduction in population size caused by the salinity? Strong evidence of adaptive evolution would be provided by all 3 replicates significantly diverging from the ancestor in the same direction. Model 2 seems to aim to test the null hypothesis that the three replicates diverged from one another via a random effects model - but with only three replicates, there is very low power, and variance is likely to be estimated as zero. I'm not sure what is shown in Tables 3 & 4, or how these results relate to models 2 & 3, so my interpretation of the information may be incorrect. Nonetheless, and noting that the errors around estimates are not presented, there seems to be considerable heterogeneity in size and direction of divergence between replicates for most of the traits. Is this study really dissecting responses to directional selection, or is it dissecting drift?

      We have modified the statistical modelling of the phenotypic data. Model 2 is no longer presented. We provide a MANOVA multivariate analysis equivalent to model 2 (with replicate populations as fixed effects) but now including both environments, together with the univariate models. MANOVA results show that all traits are significantly different across populations (i.e., at least two populations differ from one another). The fitted estimates from the MANOVA are not reported with errors in R, but it is obvious that not all traits evolved in each replicate GA population (Figure 6). We therefore tested the difference between each of the evolved populations and the ancestral population using a univariate approach (Figure 6, supplemental source data table 2). In this univariate analysis, block was modeled as having random effects (which we could not model with MANOVA). In the high salt environment, replicates GA 1, 2 and 4 differed significantly for 4, 6 and 4 transition rates (out of 6), respectively. The traits all evolved in the same direction, even when the trait difference between evolved and ancestral populations is not significant. We thus provide compelling evidence of parallel evolution, and therefore of selection (see the review on how to infer selection in evolution experiments in Teotónio et al., Genetics 2017). We tried to be exhaustive in our statistical reporting but would happily provide additional details if requested.

      What are the traits, and what is the confidence in G? My outsider's interpretation of these results is that defining 6 transition states is a way of getting at a single behavioral trait, and I was not convinced that these data were suitable for addressing questions about multivariate evolution. Genetic parameters were estimated using MCMCglmm, which imposes boundaries on estimates. The authors state that they followed Morrissey and Bonnet 2019, but I was unable to infer what this means with respect to accounting for the contribution of sampling error to covariances (or how they accounted for the positive variance constraint). Because I was unsure how sampling error was being assessed for G, I was not confident about the interpretation of statistical support for individual parameters, or for eigenvalues of G. Following this forward, if the measured characteristics constitute a single trait, with an entirely shared genetic basis, then the results of strong alignment of everything with gmax makes complete sense - there is a single trait, that is heritable and plastic, and for which the mean evolved.

      Our initial draft was misleading and we now provide a more detailed description (see also replies #5 and #12 above). We computed 1000 randomized G-matrices to account for the constraints imposed by the MCMCglmm algorithm. This should account for the bias inherent in variance estimation and in the eigendecomposition, given our sample sizes. You will find that all 6 transition rates show genetic variance (Figure 2, supplement figure 2) and that up to three eigentraits have more genetic variance than the randomized G-matrices (Figure 2, supplement figure 3).

      The 6 transition rates are the mathematical description of changing movement states in 1-dimensional space (under memoryless assumptions). A priori we do not know how many relevant traits there are, or whether they are genetically or environmentally independent. To help the reader, we provide Table 3 with the trait loadings for the several canonical traits of phenotypic plasticity, divergence and selection. The first canonical trait of standing genetic variation, gmax, is indeed aligned with phenotypic divergence (dmax; Figure 8, panels A and B) and with the axis of genetic variance reduction during evolution (emax; Figure 8, panels C and D), but not with ancestral plasticity (pmax; Figure 3) or mutational variance (mmax, from Mallard et al. G3 2023). pmax, for example, is aligned with g3, the third eigenvector of the ancestral G matrix. Note, however, that we do not have any power to detect the influence of g2 or g3 on phenotypic divergence or genetic divergence (Figure 8), though together they explain about 15% of the genetic variance. This is because performing such a test would require an alignment of the deviations in divergence not explained by gmax with g2 or g3. We now mention this issue in the Discussion. Overall, however, there are clearly several behavioral traits.
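
      One common way to quantify the alignment between two canonical traits such as gmax and dmax is the angle between the two vectors of trait loadings. The sketch below is only an illustration of that idea; the response does not state how alignment was computed, and the function name `axis_angle_deg` is hypothetical.

```python
import numpy as np

def axis_angle_deg(u, v):
    """Angle in degrees between two axes, ignoring their sign.

    Eigenvectors are defined only up to sign, so the absolute value of the
    cosine is used and the result lies between 0 (aligned) and 90 (orthogonal).
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))
```

      Under this measure, two canonical traits with an angle near 0 degrees load on the traits in the same proportions, whereas an angle near 90 degrees indicates independent axes.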

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Zheng et al. examined the disease-causing mechanisms of two missense mutations within the homeodomain (HD) of CRX protein. Both mutations were found in humans and can produce severe dominant retinopathy. The authors investigated the two CRX HD mutants via in vitro DNA-binding assay (Spec-seq), in vivo chromatin-binding assay (ChIP-seq), in vivo expression assay of downstream target genes (RNA-seq), and retinal histological and functional assays. They concluded that p.E80A increased the transactivation activity of CRX and resulted in precocious photoreceptor differentiation, whereas p.K88N significantly changed the binding specificity of CRX and led to defects in photoreceptor differentiation and maintenance. The authors performed a significant amount of analyses. The claims are sufficiently supported by the data. The results not only uncovered the underlying disease-causing mechanisms, but also can significantly improve our understanding of the interaction between HD-TF and DNA during development.

      Thank you for summarizing the key findings and strengths of our manuscript.

      Minor concerns:

      1) The E80A, K88N and R90W (previously reported by the same group) mutations are located very close to each other in the homeodomain (Figure 1A), but had distinct effects on the activity of CRX. Has the structure of the homeodomain (of CRX) been resolved? If so, could the authors discuss this phenomenon (mutations close to each other but have distinct effects) based on the HD-DNA structure?

      In paragraphs 2, 4 and 5 of the discussion section, we have added explanations of how each mutation could affect CRX HD-DNA interactions differently, based on published structural studies. We further explain how these biochemical changes relate to the molecular perturbations and cellular phenotypes seen in vivo.

      In addition, has this phenomenon been observed in other homeodomain TFs?

      Disease-associated missense mutations at residues HD50 (K88) and HD52 (R90) have also been reported in other HD TFs implicated in CNS development (see discussion paragraph 7). By contrast, different substitutions at the CRX E80 residue have been reported in multiple CoRD cases, suggesting its essential role in HD-DNA-mediated regulation during retinal development. These new points are now included in the discussion section.

      2) The authors should briefly summarize the effects/disease-causing-mechanisms of all the reported CRX mutations in the discussion part. The readers can then have a better overview of the topic.

      We have added a concise summary of the previously proposed CRX mutation classification scheme, all characterized Crx mutant mouse models, and their pathogenic mechanisms. Please see paragraph 9 in the discussion section.

      3) CRX can also function as a pioneer factor (reported by the same group). Would these HD mutations distinctively affect chromatin accessibility (which then leads to ectopic binding on the genome)?

      Prior evidence has demonstrated that regulatory regions for many photoreceptor genes failed to stay accessible upon loss of CRX in the Crx-/- model (PMID: 30068366). It is unclear from the existing data whether CRX could initiate the chromatin remodeling (true pioneering function) of these regions, or whether it simply maintains accessibility once these regions have become accessible. Future studies comparing epigenomic landscape changes in mutant Crx KI models at various ages could be informative, particularly for the CRX K88N ectopic binding events. Determining how the CRX K88N mutant protein alters the chromatin landscape important for photoreceptor fate and/or differentiation during development would shed light on the nature of these ectopic binding events.

      4) The discussion part can be shortened and simplified.

      We have re-written the discussion section to make it concise and to incorporate discussions on mutant CRX HD structures. Please see the revised manuscript.

      Reviewer #2 (Public Review):

      Zheng et al. investigated the molecular and functional mechanisms of two homeodomain missense mutations causing human retinal photoreceptor degeneration diseases in photoreceptor development regulated by the CRX transcription factor. They analyzed the E80A mutation associated with dominant cone-rod dystrophy (CRD) and the K88N mutation associated with dominant Leber Congenital Amaurosis (LCA). The authors found that E80A CRX binds to the same target DNA sites as WT CRX, but the binding specificity of K88N CRX is altered from that of WT in an in vitro assay. They generated Crx(E80A) and Crx(K88N) KI mice and performed ChIP assay and observed that K88N CRX binds to novel genomic regions distinct from the WT-binding sites, while E80A binds to the WT sites. In addition, using the KI mice, they found that E80A and K88N differently affect the expression of Crx target genes. This study is well executed with proper and solid methodologies, and the manuscript is clearly written. This study gives us insights into how single missense CRX mutations lead to different types of human retinal photoreceptor degeneration diseases.

      We greatly appreciate the reviewer’s summary and positive comments.

      While the study has strengths in principle, it has a couple of weaknesses. One is that how well E80A KI mice function as a pathological model of dominant CRD, in which cones are mainly first affected, is not clearly shown in this study. More data investigating how cones are affected by performing histological, molecular, and physiological analyses will be helpful and useful. For example, in the Discussion, the authors describe the result that E80A associates with the S-cone opsin promoter as "data not shown". This data must be presented for the readers. In addition, more molecular insights as to how E80A affects cones will strengthen this study.

      The mouse retina is rod dominant and contains only a small number of cones (3% of all photoreceptors) that are born prenatally. This poses technical challenges to appropriately assess cone-specific changes during disease initiation/progression. We are in the process of developing cellular/molecular tools to investigate how cones are being affected in Crx E80A KI model, but this is beyond the scope of the current study.

      At the same time, we have added a supplemental panel showing that, based on P0 retinal immunostaining of the early cone marker RXRγ, cones were initially born and fate-specified in CrxE80A retinas (see Figure S7A). Since the E80A protein also hyper-activated the S-cone opsin promoter-luciferase (Sop-luc) reporter in HEK293 cells (see Figure S7B), we predict that CRX E80A affects cone photoreceptor differentiation in a similar manner as in rod photoreceptors. Furthermore, the cone transcriptional program might be more prone to perturbations by abnormal CRX activities. These possibilities require future investigations. For this manuscript, we have included all these points in the discussion section.

      Another point is that it will be very valuable if the authors could show how E80A and K88N differently affect the 3D structure of the CRX homeodomain. Even a simulation model would be valuable.

      Please see our answer to Point 1 of Reviewer #1. In short, we have added in the discussion section our explanations on how each mutation could affect CRX HD-DNA interactions differently based on structural studies. We further explain how these biochemical changes relate to the molecular perturbations and cellular phenotypes seen in vivo. Additionally, since TF-DNA interactions are diverse and dynamic across binding sites with different sequence features and genomic environments, future studies that systematically and quantitatively evaluate CRX transcriptional activity at different regulatory sequences would be important.

    1. Author Response:

      We thank the reviewers for their insightful comments and will resubmit a revised version where we address most of the issues raised. At this time, our immediate responses are as follows.

      1. We have data to confirm the presence of the merodiploid strain by PCR but did not show the data in the original version for brevity. We will show that data in the revised version.

      2. We also have, of course, a no ATC control in our CRISPRi experiments and will also show that data in the resubmission.

      3. As a loading control for the SecA2 strains, we will show PknG blots (a protein secreted by SecA2; PMID: 29709019) that we have with us.

      4. In the nanoluc assays, the construct we made that was fused to CFP10 was generated so that there was a long linker between the C-terminus of CFP10 and nanoluc. We also have other controls in that experiment to show that the CFP10-nanoluc protein was secreted in the ΔRD10 strain and not in the ΔSecA2 strain. We will attempt to show fusion protein secretion using CFP10 antibodies in the revised version of the manuscript.

      5. We will perform experiments with the inhibitor using the merodiploid strain and in partial knockdown strains to confirm that the inhibitor does indeed specifically act on Rv1636.

      6. We will modify the discussion to talk more about the role and processes of cAMP synthesis and degradation in the revised version of the paper. Further, the manuscript will be checked for spelling and grammatical errors before resubmission, and the arrangement of data modified as suggested by the reviewers.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes the differences in the plasma proteome and metabolome in healthy Tanzanian and healthy Dutch adults. The inflammatory plasma proteome was measured using the Olink 92 Inflammation panel, while the plasma metabolome was analyzed using a mass spectrometry-based untargeted approach. The plasma metabolome was measured only in the Tanzanian cohort. This study aimed to link the pro-inflammatory proteome of Tanzanian and Dutch healthy individuals with environmental factors and dietary lifestyles.

      The correlation between the plasma proteome and food-derived metabolome profiles can shed light on the development of non-communicable diseases. This observation stresses the importance of dietary transition and lifestyle changes in expressing inflammation-related molecules. Moreover, this study describes the inflammatory proteome profile in healthy Tanzanian individuals covering a cohort with limited studies. The molecular differences in circulating biomolecules between healthy individuals living in East Africa and individuals living in Western Europe and the correlations with intrinsic and environmental features are novel.

      This study lacks a robust and solid validation of some of the differentially regulated circulating proteins and correlations between food-derived metabolites and proteins in a selected cohort. The discovery-driven approach in this manuscript highlights potential findings that need to be supported by a validation phase. According to this reviewer, the lack of such validation impacts the robustness of the results and the hypotheses generated. Due to that, the manuscript should incorporate validation experiments.

      We acknowledge that our study was limited by the lack of a validation phase. To address this issue, we have undertaken additional analyses to validate our key findings related to the proteins associated with mTOR and Wnt/β-catenin pathways. These analyses involved data from a proof-of-concept intervention study conducted at the same site. Our response below provides more information on these validations.

    1. Author Response

      Reviewer #1 (Public Review):

      For PRLR, the question being asked is whether and how the intracellular domain (ICD) interacts with the cellular membrane or how the disordered ICD can relay and transmit information. The authors show that PI(4,5)P2 in the membrane localizes around the transmembrane domain (TMD) due to charge interactions and facilitates binding of the ICD to the membrane, even in the absence of the TMD. Furthermore, the ICD and PI(4,5)P2 form a co-structure with JAK2 which locks a disordered part of the ICD into an extended conformation, allowing for signal relay and, through multiple complex conformations, may enable switching signalling on and off.

      Strengths:

      • NMR paired with MD is a powerful way to probe an interaction especially when peaks disappear and become difficult to probe by NMR.

      • Using NMR and MD to formulate hypotheses which are then tested by cell studies is quite informative. The combination of MD, NMR, and cell biology is a strength.

      • The authors are diligent in testing MD simulations on systems with and without PIP2.

      • The use of Pep1 and Pep2 to differentiate the KxK region that interacts with PIP2 is helpful.

      • The four utilized mutants help illustrate the co-dependence of the respective regions in the formation of the co-structure.

      Weaknesses:

      • In Figure 2G, there is a big change in CSP between 280 and 290, which the authors do not comment about.

      The region referred to contributes to binding but is on the edge of the main binding site and where the local affinities are weaker. Therefore, the exchange rate is high and allows for following the chemical shift changes. In support of this, we see an almost inverse correlation between the CSPs and the changes in intensities. For the main binding site, the exchange rate between bound and free states is slower because the affinity is stronger. Therefore, we cannot follow the chemical shifts to extract the CSPs to the bound state, as the peaks disappear. We have commented on this in the main text (p.8) as follows:

      “In the region from D285-E292 we observed an almost inverse correlation between the CSPs and the intensities. This suggests that in contrast to the preceding region, a faster local exchange rate allows us to follow the resonances from the bound state in this region, giving rise to the large CSPs.”

      • The data in Figure 2 are summarized as indicating the formation of extended structure in the ICD upon binding. It is not clear to me what data show an extended structure.

      The information on the extended structure comes from the analyses of the peptide Pep1 titrated with C8-PI(4,5)P2. The CD signature that develops in the bound state has a minimum ellipticity at 218 nm, which is a strong indicator of extended structure. We find this information adequately described in the main text (p.8), but have emphasized this further as follows:

      “In contrast, for Pep1, large spectral changes were seen, which were unrelated to helix formation. Subtracting the spectra in the presence and absence of C8-PI(4,5)P2 revealed a negative ellipticity minimum at 218 nm, a strong indication of β-strands, showing that when bound to C8-PI(4,5)P2, a distinct extended (strand-like structure) signature was seen (Figure 2G).”

      • No modelling or experiments were done with PIP3 despite conclusions and models which rely on the phosphorylation of PIP2 to PIP3. At the very least, these would be useful as negative controls.

      We have in a previous work addressed the affinity for phosphoinositides using lipid dot blots, where we observed a preference for certain species, including PI(4,5)P2 (Haxholm et al., BJ, 2015). In this study, we also observed that there was no affinity for PI(3,4,5)P3, but may not have highlighted this sufficiently in the introduction. This may have caused some confusion in understanding our choices. We have now described these data more explicitly in the introduction (p.4), in the result section (p. 8) and later in the discussion (p. 21). We thank the reviewer for bringing this up.

      • Only R2 experiments were done when the authors mention investigating dynamics. R1 and -HetNOE dynamics would be useful for creating a complete picture.

      Our aim with recording the R2 values was not to map the detailed dynamics of the disordered regions, but to explain the changes in the peak intensities we see for the variants when adding C8-PI(4,5)P2. In this case, the R2 values supported our suggestion of internal contacts and, although we agree with the reviewer that R1s and HetNOEs would be important and relevant for a more in-depth and complete analysis of the dynamics, we find that in this case, the R2 values suffice.

      • Some of the exciting results are under-emphasized including Fig 3H and 3I.

      A new version of Figure 3 has been generated to consider the reviewers’ comments and suggestions. This figure has been restructured to further emphasize some of the major conclusions obtained from the simulations. We have moved the former Figure 3 A, B, C and D to the supplemental information to increase this focus.

      Reviewer #2 (Public Review):

      The authors combine NMR experiments, cell experiments, and molecular simulations to address the question of how lipid interactions of the prolactin receptor contribute to signalling. They assess the interactions of the disordered cytoplasmic tail of the receptor with phosphoinositides among others by chemical shift perturbations from NMR for different PIP2-containing membranes, by coarse-grained simulations, as well as site-directed mutagenesis and subsequent cell signalling experiments to monitor the activation of the mutants. A major result is that PIP2 interactions are functionally important, which so far has not been known for this receptor. Their results are likely relevant for other non-receptor tyrosine kinases.

      The hypothesis that the protein complex is regulated by IDR-membrane interactions is very novel. A major strength is the close connection of and feedback between state-of-the-art experiments and simulations.

      We thank the reviewer for the positive comments on the novelty and importance of our work.

      This is where I see weaknesses:

      1) The motivation of focusing on LID1 is limited.

      We have now provided our rationale for selectively focusing on LID1 in the PRLR. The selection was made to address the conundrum of how structural disorder in the juxtamembrane regions would be able to transmit the knowledge of extracellular hormone binding to the bound JAK2. This constitutes the first step of signaling on the intracellular side. Given the distance to the other two LIDs (LID2 and LID3) and their disconnection from the TMD by long disordered regions, these were disregarded, and this work focuses on LID1. We have emphasized this choice in the introduction and in more detail in the result section (p. 5-6).

      2) The data and analysis for the JAK2-PRLR complex appear somewhat superficial, and a connection between conformational states to their functional relevance is lacking. In fact, the majority of the simulation part of the paper is about suggesting different states of the PRLR-JAK2 complex but the states and their hypothesized functional relevance are not further taken up, e.g. by experiments, and yet presented as major results, e.g. in the abstract.

      In the original manuscript we already provided a detailed analysis of the different states, highlighting accessible residues and lipid-interacting residues and comparing these across the states. From our experiments, including those performed in cellular assays, we cannot with certainty link the two major states to active and/or inactive states. We therefore have neither the intention nor the support from the data to claim this. However, what we do put forward as a major result is the presence of more than one major state, as also stated in the abstract and in the conclusion of the result section as follows:

      “Another key observation is the existence of different states in which different regions of both JAK2 FERM-SH2 domain and LID1 of PRLR are exposed to the solvent or hidden below the bilayer.”

      In the discussion we do speculate as to which state may be the active and/or inactive dimer/monomer but make no firm claims. We have now made the major finding of more than one state clearer in the text, and further compare the two major states, the Y and the Flat state, to the recent cryo-EM structures of JAK1 bound to IFNAR1, which lend some support to our speculations. The abstract now reads:

      “We find that the co-structure exists in different states which we speculate could be relevant for turning signalling on and off.”

      To discern the functional relevance of these states, if possible, will require experiments, also in cells, that would by themselves constitute a new study. We have to the best of our ability clarified that the functional relevance of the states has not been elucidated by the current work.

      3) The connection between simulations and mutational study is not very direct. An open question is if the mutants can distinguish between the effects of PRLR-PIP2 interaction or PRLR-JAK2 interaction, even though this conclusion is still drawn from the data.

      We have now explained in much more detail by which arguments the different mutations were selected (see also answer above), which property of the co-structure they are most likely to engage in and affect, and we have emphasized that the separation of function by mutation may be complicated by the intimate structure formation among the three components of the co-structure. The conclusion has therefore also been softened.

      4) The conclusions drawn from the mutagenesis study (lines 547-555) are not directly supported by data. Only a partial correlation between PRLR membrane localisation and STAT5 activation is no reason to attribute the unexplained part of the STAT5 activation to PRLR-JAK2 interactions without further studies.

      We have now explained in much more detail by which arguments the different mutations were selected (see also answer above), which property of the co-structure they are most likely to affect and emphasized that the separation of function by mutation may be complicated by the intimate structure formation among the three components of the co-structure. The conclusion has therefore also been softened.

      5) PIP2 is identified as an important regulator, with very solid support from the presented data. PIP3 is part of the model but not discussed before or as part of the results. The analysis could be similarly applied, or the data could be directly relevant, to understanding whether PIP3 plays a similar role, as interactions are likely primarily electrostatically driven.

      We have in a previous work addressed the affinity for phosphoinositides using lipid dot blots, where we observed a preference for certain species, including PI(4,5)P2 (Haxholm et al., BJ, 2015). In that study, we also observed that there was no affinity for PI(3,4,5)P3, but we agree that we did not highlight this sufficiently in the introduction. This has caused some confusion in understanding our choices. We have now described these data more explicitly, in the introduction (p. 4), in the result section (p. 8) and later in the discussion (p. 21). We thank the reviewer for bringing this up.

      Reviewer #3 (Public Review):

      Araya-Secchi and coauthors present a very interesting study on the role of PIP2 lipids in the potential modulation of prolactin receptor signaling. The study is well-conducted and employs an integrated approach that combines NMR spectroscopy, modeling (primarily coarse-grain MD simulations), and cell biology. This combination of methods is crucial for gaining a deeper understanding of cell receptors, from their biophysical properties to their cellular functions.

      The modelling work is mainly based on both coarse grain forcefield versions Martini2.2 and Martini3. These two versions of the forcefield may produce different results. Therefore, depending on the system being modeled, the results presented here should be considered in light of the limitations inherent to each version of the forcefield.

      We thank the reviewer for the positive appraisal of our work and the approach we employed. It is true that one must be aware of the limitations of the tools and models employed in this type of work. We agree that perhaps we were not too explicit about limitations of our methods in the presentation of the results. However, we have addressed and discussed such limitations in the revised version of the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This study demonstrates that Chinmo promotes larval development as part of the metamorphic gene network (MGN), in part by regulating Br-C expression in some tissues (exemplified in the wing disc) and in a Br-C independent manner in other tissues such as the salivary gland. I have included below the following comments on the submitted version of this manuscript:

      1) The authors have shown experimentally that Chinmo regulates Br-C expression in the wing disc but not the larval salivary gland. Based on this, they posit that Chinmo promotes larval development in a Br-C-dependent manner in imaginal tissues and a Br-C-independent manner in other larval tissues. This generalization of Chinmo's role in development would be more compelling if the relationship between Chinmo and Br-C were explored in other examples of imaginal/larval tissues.

      We agree with the referee that confirmation of our observations in other tissues might help to generalize Chinmo’s role. To this end, we have analyzed the role of chinmo in an additional larval tissue, the tracheal system, and an additional imaginal tissue, the eye disc. Consistent with the results reported in the manuscript, we found that the mode of action of Chinmo is conserved, as depletion of Br-C in the eye disc is able to rescue the lack of chinmo, whereas in the tracheal system it is not. We included this new information in the main text and in new SFigures 1 and 3.

      2) Chinmo, Br-C, and E93 have all been shown to be EcR-regulated in larval tissues, including the brain and wing disc (as in Zhou et al. 2006, Dev Cell; Narbonne-Reveau and Maurange 2019, PLOS Biology; Uyeharu et al. 2017, ). It would be interesting (and I believe relevant to this study) to know whether the roles of these factors in their respective developmental stages are EcR-dependent and whether their regulation by EcR (or lack thereof) depends on whether the tissue is larval or imaginal.

      Although the relevance of EcR for the regulation of the genes that make up the metamorphic gene network has already been established, whether these genes respond differently to EcR-mediated signalling in larval and imaginal tissues has not yet been properly addressed. Finding such a difference in the output of EcR signalling would be very interesting. However, we think that this is outside the scope of this report, as the main aim of this study was to determine the main role of the temporal genes during development and their repressive interactions.

      3) In the chinmo qPCR analysis shown in Fig1A, whether animals were sex-matched or controlled was not indicated. Since Chinmo has a published role in regulating sexual identity (Ma et al. 2014, Dev Cell; Grmai et al. 2018, PLOS Genetics), and since growth/body size is known to be a sexually dimorphic trait (Rideout et al. 2015, PLOS Genetics), it seems important to establish whether the requirement of Chinmo for larval development and/or growth differs between the sexes. I recommend either 1) controlling for sex by repeating qPCRs in Fig 1A in either males or females, or 2) reporting male/female chinmo levels at each stage side-by-side.

      As the referee pointed out, chinmo has been related to sexual identity, raising the possibility that chinmo affects growth differently in males and females during development. However, several observations argue against this possibility. First, the role of chinmo in sexual identity has only been reported in the adult testis, and specifically in cyst stem cells. In fact, specific mutations of chinmo that only affect its expression in the testis do not affect testis formation but its maturation, suggesting a role of chinmo in sex determination specifically in the testis cyst stem cells (Ma et al. 2014, Dev Cell; Grmai et al. 2018, PLOS Genetics). Second, a sex-dependent growth rate has been described during larval development (Rideout et al. 2015, PLOS Genetics; Sawala A. and Gould AP, PLoS Biol, 2017). However, the main difference in growth rate between males and females is found in L3 larvae (Sawala A. and Gould AP, PLoS Biol, 2017), when the expression of chinmo strongly declines in both males and females, indicating that the impact of chinmo on sexual dimorphism during larval development is likely to be limited at most.

      Thus, considering that, based on our results, chinmo exerts its main role in larval tissue growth during the L1 and L2 stages, and that body growth is practically identical in males and females during these stages (Sawala A. and Gould AP, PLoS Biol, 2017), we can assume that chinmo might not contribute to sexual body size dimorphism.

      Nevertheless, we would like to clarify that we have performed the measurements of chinmo expression always in females, when sex identification was possible, namely in L3 larvae. L1 and L2 larvae qPCRs were not sex-discriminated as sex identification was not possible in our conditions.

      4) In Fig2E, the authors show that salivary gland secretion (sgs) genes are repressed in salivary glands lacking chinmo. Sgs genes are expressed during late larval stages as the animal prepares to pupate. Thus, based on the proposed model where Chinmo promotes larval development and represses the larval-to-pupal transition, one might expect that larval salivary glands lacking chinmo would express higher than normal levels of sgs genes. This expectation directly opposes the observed result - it would be helpful to speculate on this in the interpretation of results.

      This is an interesting observation. As Sgs genes are regulated by Br-C (Duan et al. Cell Reports 2020), precocious expression of this transcription factor in chinmo-depleted animals might be expected to result in an early activation of those genes. Interestingly, we were not able to detect any Sgs gene expression in chinmo-depleted salivary glands. We think that this is because, in the absence of chinmo, this organ does not properly develop and mature, and therefore it is unable to express Sgs genes. In support of this, the double knockdown of Br-C and chinmo shows the same dramatically low levels of those genes. Altogether, these results strongly suggest that SGs lacking chinmo expression are unable to grow and synthesise Sgs proteins, even in the premature presence of Br-C. We discussed this point in the main text of the revised manuscript. Please also see the response to referee 2.

      Reviewer #2 (Public Review):

      The evolution and control of the three-part life history of holometabolous insects have been controversial issues for over a century. While the functioning of broad as a master gene controlling the pupal stage and of E93 as a master gene for the adult stage has been known for about a decade or more, chinmo has only recently been proposed as being the master gene responsible for maintaining the larval stage (Truman & Riddiford, 2022). While the former paper focused on the embryonic and early larval function of Chinmo, this paper explores its metamorphic effects and defines the roles of Broad and E93 in the phenotypes produced by manipulations of Chinmo expression.

      Overall, the paper is well presented but in places, readers would be helped if the authors were more explicit about the logic and details of their manipulations. There are a couple of conceptual issues that the authors should address.

      The role of Broad in larval tissues:

      One intriguing issue relates to the relationship of Chinmo to Broad and E93 in larval versus imaginal tissues prior to metamorphosis. The knock-down of chinmo in imaginal discs results in severe suppression of growth and the lack of metamorphic patterning genes such as cut and wingless. Normal growth and patterning are reestablished though, if broad is also knocked-down, supporting the notion that the effects of the lack of Chinmo are mediated through the premature expression of Broad.

      In the salivary glands, by contrast, chinmo knock-down suppresses growth, and this growth suppression is not reversed by simultaneous broad knockdown. They properly conclude that the role of Chinmo in supporting the growth of larval tissues does not involve Broad, but their data on the expression of salivary gland proteins suggest that Broad still plays some role in Chinmo function in salivary glands. Fig. 5E shows the levels of various salivary glue proteins in the glands of Chinmo knock-down larvae. The levels are reduced, as expected by the lack of salivary gland growth, but a significant finding is that they are there at all! The Costantino et al. (2008) paper shows that these genes are only induced in the mid-L3. Ecdysone, acting through Broad isoforms, is necessary for their appearance and these SGS genes can be induced in the L1 and L2 stages by ectopic expression of some Broad isoforms. Their low levels in Fig 5, would be due to the small size of the gland, but the gland's premature expression of Broad likely causes their induction. In larval cells, then, Chinmo may feed into two parallel pathways, one that does not involve broad and regulates growth and the other, utilizing Broad, regulating premetamorphic changes.

      It would be useful to look at early larval salivary gland proteins such as ng-1 to -3 that are expressed in salivary glands before the critical weight. Also, it would be interesting if the appearance of the SGS proteins after chinmo knock-down (Fig 5E) is abolished by simultaneous knock-down of broad.

      This is an interesting observation. We think that the main problem derived from the way we presented the data. Our results showed that depletion of chinmo in the SGs dramatically impairs the induction of Sgs gene expression, even with the premature presence of Br-C, which has been shown to be responsible for Sgs expression (Duan et al. Cell Reports 2020). The confusion might come from the way we presented the level of expression of those genes. In fact, the levels of Sgs in both chinmoRNAi and chinmoRNAi/Br-CRNAi SGs were virtually undetectable, suggesting that chinmo in the SG is not only required for Br-C repression but also for proper development of the gland. We base this on the fact that the very low levels of expression of Sgs genes in chinmo-depleted SGs are still detected in the double knockdown chinmoRNAi/Br-CRNAi. The dramatically reduced expression of the early larval SG genes ng1-3 in chinmoRNAi and in the double knockdown chinmoRNAi/Br-CRNAi supports this statement. Altogether, these results suggest that Br-C is necessary but not sufficient for the expression of those specific SG genes. We have changed the plots in Figures 2 and 3 to clarify this point and added the levels of expression of ng1-3.

      Role of Chinmo and Broad in Hemimetabolous insects:

      In the conclusion of their comparative studies on the cockroach (line 342), the authors state that Broad exerts no role in the development of hemimetabolous insects. However, this conclusion is not consistent with the literature. The first study of broad knockdown in a hemimetabolous insect was in the milkweed bug Oncopeltus fasciatus by Erezyilmaz et al. (2006). Surprisingly to Erezyilmaz et al., broad knock-down in early-stage nymphs did not cause premature metamorphosis. However, Broad expression was essential for tissues of the wing pads and dorsal thorax to undergo morphogenetic growth (rather than simple isomorphic growth), and for stage-specific changes in coloration through the nymphal series (but not for the nymph to adult color change). A similar function for Broad on wing growth during the later nymphal stages was later shown in Blattella (Fernandez-Nicolas et al., 2022; Huang et al., 2013). The wing- and genital pads represent "imaginal" tissues in the nymph and the need for Broad in these tissues are the same as seen in imaginal discs as the latter shift from isomorphic growth to morphogenesis at the critical weight checkpoint in the L3. This would suggest that important roles for Broad and E93 are already established in the hemimetabolous insects with E93 controlling the shift from immature (nymphal) to adult phenotypes and Broad controlling the premetamorphic growth of imaginal tissues in early-stage nymphs. Chinmo might then be needed to keep both in check.

      We are sorry for not having dealt with these observations in the submitted manuscript. We have taken them into consideration in the new version to discuss the role of Br-C in the transition from hemimetaboly to holometaboly.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors study single and pairs of MDCK cells adherent to an H-shaped geometry on a flat surface. In this pattern, the cells form strong peripheral stress fibers. To a lesser extent, these cells also exhibit stress fibers in the cell interior, which otherwise has a rather homogenous actin distribution. Using a combination of traction force microscopy, from which they infer the stress distribution by monolayer stress microscopy, and "contour analysis" the authors quantify the 'bulk' and the 'surface' stress in these cells. This analysis shows that single cells are mechanically polarized whereas pairs are not.

      The authors then go on to optogenetically activate the actomyosin contractility of either one half of a single cell or one cell of a pair. Combining their stress measurements in these situations and using a finite element mechanical model, the authors convincingly show that the mechanical response in the non-activated part is active. By varying the aspect ratio of the adhesion patterns, they also find that the efficacy of active stress propagation depends on the mechanical and structural polarity of the cell. Furthermore, they provide evidence that their results on cell pairs generalize to tissues.

      Strengths:

      This study uses a nice combination of physical tools to address an important question in tissue mechanics. The data is compelling and fully supports the authors' conclusions.

      Weaknesses:

      There are no major weaknesses.

      In summary, although the fact that mechanical stress propagation in tissues is an active process might not come as a surprise, the study makes substantial contributions to a quantitative understanding of this process. As such it is of fundamental significance in the field. It will be interesting to explore the consequences of this mechanism for mechanical stress propagation in the context of developmental processes. It will also be of great interest to study how this local process can be accounted for in large-scale theories.

      We thank reviewer #1 for this very positive assessment. We agree that in the future, our results should be used on the theory side to upscale them to tissue level. One way to do this would be the discontinuous Galerkin method, but it will take time to work this out. We also note that we would have loved to experimentally study intermediate cases between two and many cells, but it turned out to be very difficult to position a few cells on a micropattern and to repeat the force propagation analysis which we present here for two cells and for small tissues. In fact, it might be more rewarding to use optogenetics early in a developmental process with clearly defined cell positioning. In the revised manuscript, we have now added a comment on the challenge of working with three or four cells with the micropatterning approach, which is why we turned to small monolayers.

    1. Author Response

      Reviewer #3 (Public Review):

      In this study, the authors probe the molecular changes that occur in a neural circuit for learned behavior that depends on sensory input to maintain stereotypy. Songbirds, such as the Bengalese finches used here, are premier systems in which to ask these questions because they produce a highly stereotyped song that emerges after sensory learning becomes integrated into the function of a sensorimotor neural circuit responsible for singing. By deafening a group of birds (who show a shift in their song structure) and comparing them to hearing birds, the authors seek clues as to how plasticity in motor output may emerge from genomic changes that alter the function of cells within the various components of the neural circuit.

      There are multiple strengths of the paper:

      1) The results may have broad implications because the type of sensorimotor neural circuit (cortico-basal ganglia-thalamic-cortical loop) used for singing is generally necessary for learned behaviors.

      2) The methods and analyses are generally rigorous, including the parsing of song elements, and the type of detailed RNA sequencing and analysis that demonstrates the power of a genomic view of neural plasticity as it relates to behavioral plasticity.

      3) Because the authors assayed the pallial (cortical) areas, as well as the basal ganglia component, of the sensorimotor circuit they were able to creatively compare how different facets of the network contributed to a) unmodified brain properties, b) properties perturbed after the loss of the auditory input that is required to stabilize song structure. As a result, they have added to the known molecular profiles for each of these brain areas, the accounting of how they may be specialized in comparison to the surrounding non-song brain, and what changes occur after deafening. Utilizing some existing single-cell sequencing data, the authors present for the first time some insight into what cell types may be showing the most robust changes, and therefore which may be driving the shift in song structure. The analysis further pushes in new ways to suggest how the molecular properties of a given brain area may relate to those of directly-connected areas. Together, these findings provide valuable clues as to the specific cell types and signaling properties that may be central to the production of stabilized, learned behavior.

      4) One of the cortical brain areas, LMAN, was lesioned in a subset of the hearing subjects because it projects to the area that showed the greatest molecular difference between deafened and hearing birds (RA). The idea was to compare how this affected molecular properties with properties after the loss of auditory input; because RA is the output motor area for the song, its properties may be most directly tied to song structure. Using unilateral lesions was a strong choice of experimental design that allowed for rigorous analysis of this idea, and was interpretable because birds do not have a direct inter-hemispheric callosum.

      The foundation of the paper is solid, though the results shown raise several questions that are not fully addressed, and limit some of the power of the implications.

      The biggest questions arise from the finding that RA shows the largest number of molecular changes after deafening. The analysis and interpretations do not fully incorporate what we know of this circuit, at least from another well-studied songbird, the zebra finch, from which the authors derive other types of information. For example, it is not yet clear if RA is most changed because it is most directly involved in song output or because it receives projections from two areas within the sensorimotor circuit (LMAN and HVC). How do we consider the fact that by adulthood, LMAN and HVC cells project onto the same RA neurons? Are those the cell types being identified here? Would HVC lesions be expected to have the same effect as LMAN lesions? Are the cell types showing the greatest change those that are most involved in song output (e.g. are they projecting to nXIIts)? How do these results relate to the findings of changes in RA after HVC and LMAN lesions reported decades ago? How do these findings compare to an earlier study that also performed sequencing on areas from the sensorimotor circuit in deafened juveniles? Further, RA also receives information from the auditory processing regions of the brain, via the immediate structure RA-cup. It is not yet explicitly addressed how some effects may be from the loss of this more direct access to auditory information, rather than from information and projections originating within the sensorimotor circuit, and reinforces the question of whether or not the number of inputs to a particular brain area is a driving factor in the general pattern of changed RNAs after perturbation.

      We thank the reviewer for their review and for their excellent suggestions on how to improve the impact of the manuscript. The reviewer raises several important points, which we have expanded on in the Discussion of the revised manuscript, and will address here:

      First, there is the general consideration of how the structure of inputs to RA influences the interpretation of our results. There is the question about whether we can consider RA expression alterations as due to its direct projections to song motoneurons (‘output’) or the convergence of two important song nuclei, HVC and LMAN, onto RA (‘input’). This is a difficult question to untangle. We could interpret ‘output’ only effects as local perturbations that do not depend on song circuit afferent activity, such as hormonal fluctuations associated with the loss of hearing. ‘Input’ effects would occur through changes in afferent activity, such as those that elicit plasticity associated with song destabilization or more general alterations to the amount of afferent neural activity (a point addressed in the revised manuscript, lines 842-848). By focusing on a measure of song destabilization in our differential expression analysis, we are specifically seeking to identify gene expression responses that are associated with changes to behavioral output. Yet these behavioral changes are certainly driven by alterations in upstream regions or the manner in which they converge onto RA. The reviewer also notes inputs from RA-cup as a potential avenue through which the loss of auditory information could more directly influence expression in RA. It is certainly possible that the loss of auditory information itself could influence gene expression in different components of the song system, a point we note in the revised Discussion (lines 848-853). We also note there that future experiments leveraging different plasticity induction techniques (TS cut, delayed auditory feedback) will be important to resolve the influence of this input.

      Our lesion experiments aimed to characterize how input from LMAN influences expression in RA, due to LMAN’s important role in mediating song plasticity. We would expect HVC lesions to elicit different expression responses because of its distinct mode of transmission onto RA projection neurons (primarily AMPAR in contrast to primarily NMDAR for LMAN), the distinct activity patterns of HVC and LMAN, and likely distinct neuromodulatory signaling from the two afferents (e.g. LMAN acts as a source of BDNF). We discuss how HVC lesions would be useful to further disentangle the influence of afferent input on RA gene expression in the Discussion of the revised manuscript (lines 926-946). In the revised manuscript, we also cite previous work that examined the influence of HVC and LMAN on RA neural activity, morphology, and cell survival (lines 928-932).

      As to the cell types in RA that show expression changes following deafening, we show in Figure 5 that both glutamatergic projection neurons (‘RA_Glut’), i.e. the neurons that project to subcerebral structures such as nXIIts, as well as GABAergic interneurons (‘GABA’) show substantial expression alterations. In the Discussion, we highlight the functional roles of several genes that have enriched expression in each class (lines 864-873 and 887-893).

      In the revised manuscript, we have added a paragraph in the Discussion (lines 854-862) that references results from Mori, C. & Wada, K. Audition-independent vocal crystallization associated with intrinsic developmental gene expression dynamics. J. Neurosci. 35, 878–889 (2015). This work examined the influence of early deafening on gene expression in the song motor pathway and identified a strong developmental and audition-independent expression response. It identified an important separation between developmentally-driven and experience-dependent molecular responses in the song system. We note that the aims were distinct from the present study, which sought to identify gene expression responses to deafening-induced song plasticity.

      Importantly, since the LMAN lesions did not create significant changes in the song structure, it is difficult to know how to interpret the meaning of these molecular changes in RA, alone and in combination with the comparison to the RA profiles from deafened birds. Of importance is the question of whether or which molecular profiles are thus signatures of behavioral plasticity or not.

      The reviewer suggests an important set of follow-up experiments to assess the extent to which the transcriptional state of the song system tracks with song plasticity state. Coupling LMAN lesions with deafening, a manipulation that prevents song degradation, would be a strong approach to identify genes whose expression is closely tied to song destabilization, a possibility that we now discuss (lines 936-946).

    1. Author Response

      Reviewer #1 (Public Review):

      1) There are two main 'weaknesses'. The first is the limited power that comes from only measuring the phenotype of 387 strains. Whether this is because of the expense/difficulty of the InToxSa assay is not discussed, leaving open the question of how much this assay could be scaled up in the future.

      A previous study investigating the toxicity of S. aureus culture supernatants assessed 217 clinical strains (https://doi.org/10.1371/journal.pbio.1002229). That study had sufficient power to uncover important genetic determinants of S. aureus virulence. Here, we significantly increased the throughput to 387 clinical strains, combined with a sophisticated cell toxicity assay that measures the kinetics of cytotoxicity caused by intracellular S. aureus. We have investigated the S. aureus genetic associations using this rich dataset (each of the 387 strains was assessed in 3 to 15 replicates, accruing 655,005 measurements corresponding to kinetic cytotoxicity assessments of intracellular S. aureus). This dataset enabled the accurate identification of genomic signatures that modulate cytotoxicity; we then validated these signatures by reconstructing the mutations, thus demonstrating the power of our approach. Upscaling this method (4-fold, with adequate technical adjustments) should be possible by adopting a 384-well plate format instead of a 96-well plate. We will continue to investigate additional clinical isolates and explore the use of 384-well plates, but the analysis we present of data from the 96-well format is already a substantial advance for the field.

      Across this study, and as presented in the current manuscript, the maximum throughput of the InToxSa assay was 7 × 96-well plates per week, corresponding to 98 distinct clinical strains testable per week (encompassing 6 individual replicates, each tested across 2 different days/plates). Following the reviewer's suggestion, we have added this information to the discussion (Lines: 406-409).
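
      As a rough sanity check, the weekly throughput figures quoted above (7 plates of 96 wells, 98 strains, 6 replicates each) and the proposed 4-fold upscaling can be worked through as simple arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and reading the leftover wells as being available for controls or blanks is an assumption that is not stated in the response.

```python
# Back-of-the-envelope check of the weekly assay throughput quoted above.
# All names are hypothetical; only the four input numbers come from the text.

WELLS_PER_PLATE = 96        # 96-well plate format
PLATES_PER_WEEK = 7         # stated maximum throughput
STRAINS_PER_WEEK = 98       # stated number of distinct strains per week
REPLICATES_PER_STRAIN = 6   # stated number of individual replicates


def weekly_well_budget(plates, wells_per_plate, strains, replicates):
    """Return how the weekly wells split between test wells and the rest."""
    total_wells = plates * wells_per_plate
    test_wells = strains * replicates
    spare_wells = total_wells - test_wells
    return {
        "total_wells": total_wells,
        "test_wells": test_wells,
        # Assumption: wells not used for strain replicates are free for
        # controls or blanks; the response does not say how they are used.
        "spare_wells": spare_wells,
        "spare_wells_per_plate": spare_wells / plates,
    }


budget = weekly_well_budget(
    PLATES_PER_WEEK, WELLS_PER_PLATE, STRAINS_PER_WEEK, REPLICATES_PER_STRAIN
)

# A 384-well plate has four times the wells of a 96-well plate, so the same
# layout scaled 4-fold would give this many strains per week at best.
upscaled_strains_per_week = STRAINS_PER_WEEK * (384 // WELLS_PER_PLATE)

print(budget)
print(upscaled_strains_per_week)
```

      Under these numbers, the strain replicates occupy 588 of the 672 wells available per week, which is consistent with the stated throughput while leaving 12 wells per plate unaccounted for.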

      2) The second is that the main output of the assay is actually reduced intracellular toxicity (PI uptake AUC), which is inferred to be strongly linked to increased intracellular persistence. The linkage between the phenotypes comes primarily from microscopic studies on a limited number of strains. It may be true of all cases, but the possibility exists that for some of the strains, reduced cytotoxicity may be associated with intracellular elimination, which would presumably be a negative outcome for systemic infection.

      Whilst the reviewer’s comment is pertinent, we note that none of the least cytotoxic S. aureus isolates identified by the InToxSa assay resulted from bacterial clearance, intracellular bacterial growth defects or escape from their cellular niche. We assessed intracellular bacterial loads at 3h and 24h (post-bacterial uptake) in experimental conditions using cell-impermeant antibiotics (which kill extracellular bacteria and prevent over-infection of non-infected bystander HeLa cells), as shown in Figures 5F and 5H and in Figure 5 Supplementary figure 5. These data highlight an inverse correlation between cytotoxicity and intracellular persistence.

      Reviewer #2 (Public Review):

      1) …Thus, my concerns are focused on further understanding the practical utility of the approach and whether or not the HeLa cell model recapitulates what happens in professional phagocytes.

      HeLa cells have proven a useful cellular model in infection and pathogen biology to assess the ability of bacterial pathogens to invade, persist and replicate within host cells. Several studies have convincingly used HeLa cells to assess S. aureus phenotypes at the bacteria-host cell interface, as exemplified by the following recent research (DOIs: 10.1128/mBio.02250-20, 10.1371/journal.ppat.1009874, 10.1038/s41598-019-51894-3, and 10.1128/mSphere.00374-18). We also acknowledge the limitations of cell line models in the discussion (Lines 494-510).

      2) …it is not clear to me that this system has the statistical power to find novel, biologically relevant rare mutations without first being very mindful in selecting strains that are extremely genetically similar.

      As described, this is a S. aureus bacteraemia study, wherein the strains composing the collection are, by definition, closely related. We articulated this in the manuscript: “We used InToxSa to identify S. aureus pathoadaptive mutations, enriched in bacterial populations that are associated with human disease (e.g., upon transit from colonising to invasive)” and “We hypothesised that these mutations would support an intracellular persistence for S. aureus.” We see no foreseeable reason preventing this type of study from being replicated elsewhere.

      3) It is also not clear to me that the toxicity assay captures the important features of the intracellular persistence that occurs in vivo within professional phagocytic cells.

      Response: Indeed, it is possible that InToxSa using HeLa cells may not capture the features of intracellular S. aureus persistence within professional phagocytes. However, our data show that it remains possible to uncover genomic features related to intracellular cytotoxicity and persistence, both traits relevant to S. aureus-host cell biology. The cells forming physical barriers, such as epithelial and endothelial cells, play major roles in staphylococcal pathobiology. Whilst HeLa cells are a model cell line, their tractability makes them ideal for high-throughput studies tested over longer infection times.

    1. Author Response

      Reviewer #1 (Public Review):

      Mermithid nematodes are ecologically important parasitoids of arthropods, annelids and mollusks today. Their fossil record in amber reaches back into the Early Cretaceous, some 135 million years ago. Luo et al. more than triple this record by presenting, with ample illustrations, exceptionally well preserved new specimens from the beginning of the Late Cretaceous (99 Ma ago) of Myanmar. Their most important finding is that mermithids parasitized a number of insect clades in the Cretaceous that they are not known to infect today or in Cenozoic amber; further, the proportion of holometabolous insects among the hosts is found to be lower in the Cretaceous than in the Cenozoic. The strengths of the paper lie in the specimens, the illustrations of the specimens, and the documentation of when, where and how the specimens were acquired. Certain nomenclatural aspects of the paper require improvement. A potential weakness of the paper could be collection bias: it is not tested whether the collections used to show the shift toward holometabolous hosts from the mid-Cretaceous to the Cenozoic are representative of the fossil record as it is preserved and accessible today.

      Thank you very much for pointing out these issues. We have added a new Figure 10 and Table 1 to our paper. Indeed, collection bias is present in almost all amber biotas. However, we believe we have robust reasons to argue that the shift to holometabolous hosts does exist. Although Kachin amber has only been studied extensively in the last two decades (compared with centuries of study in Baltic amber or Dominican amber), it has become by far the most intensively studied amber biota since its Cretaceous age was appreciated, now comprising an exceptional 700 families (Ross, 2023). Also, the fossil record of holometabolous insects is clearly much better than that of heterometabolous insects in Kachin amber (1296 spp. vs 465 spp. respectively). But as shown in our paper, the nematodes we found in Kachin amber are mainly associated with heterometabolous insects. Therefore, even if collection bias might exist, such as the presence of some unreported nematode-Holometabola associations, we believe our conclusion about the shift is robust. We have also added some explanation to our paper.

      Reviewer #2 (Public Review):

      This manuscript reports on mermithid nematode fossils from amber which dates from the Cretaceous period. The specimens described in the manuscript consist of insects and associated nematodes which have been trapped in amber and fossilised. The nematodes have been identified as belonging to the Mermithidae family, a family of nematode worms that infect insects. The findings of this manuscript provide an insight into the evolutionary history of nematodes and parasitism. Despite the ubiquity of both nematodes and parasites in extant ecosystems, fossil records of both are very rare. This is because nematodes and many parasites are soft bodied, and many are located inside their hosts' bodies, thus they rarely become fossilised. Thus, most of what is known about the evolutionary history of nematodes and the evolution of parasitism is based on what could be inferred from extant examples.

      The specimens described in this manuscript provide a valuable contribution to our understanding of parasitism in the geological past. These amber specimens are a snapshot of parasite-host interactions - interactions which are commonly found in nature but are rarely captured in fossils. The identification of the specimens as mermithid nematodes is based on sound scientific reasoning. The worms' morphology and position in relation to the insects are consistent with what has been observed with extant mermithid nematodes.

      Additionally, one of the values of such parasite fossils is that they provide us with insight into parasite-host combinations or interactions which may have existed throughout the geological past, but no longer exist today or cannot be inferred from extant taxa. It helps fill in major gaps in our understanding of parasitism. This was the case with the amber fossil that contained a bristletail with its nematode parasite.

      We are very grateful for the positive and encouraging comments.

      Reviewer #3 (Public Review):

      The authors provide a timely description of new mermithid nematodes from Cretaceous amber and use it to argue an important shift in insect host exploitation. The descriptions are state-of-the-art and will become valid once the appropriate zoobank numbers are used after publication. The authors also compiled crucial and detailed new information on the host exploitation in amber nematodes in the supplementary material. This data is also depicted in pie diagrams and seems at first glance to support their interpretations of a shift in host exploitation in fossil amber deposits when analysed appropriately and statistically, but such an analysis and depiction should be part of the main manuscript to do the compilation and interpretation justice. For the sake of reproducibility and the field, such fundamental statistical analysis as well as a statistical comparison with modern hosts would make this broad-sweeping claim of a major host shift and importance of amber deposits containing such nematode-insect interactions since the Cretaceous (even) more robust and fundamental.

      Thanks. We recognized this drawback and have now calculated the 95% CI using the Agresti-Coull method of the “binom.confint” function from the binom R package (https://cran.r-project.org/package=binom) in R 4.2.2. We also added a new Figure 10 and Table 1 to our paper. However, since we compiled the “occurrence” of invertebrate–nematode associations from these amber localities, it is impossible to compare with modern mermithids. For example, the parasite Cretacimermis chironomae occurs five times in Kachin amber, but an extant dipteran-parasitizing mermithid species can occur many times in a single pond. Nevertheless, it is evident that mermithids and all invertebrate-parasitizing nematodes prefer to infect holometabolous insects rather than other invertebrates (Poinar, 1975; Poinar, personal observation). We have also added some explanation to our paper.
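
For readers unfamiliar with the Agresti-Coull interval mentioned above, the following is a minimal Python sketch of the standard formula. It is not the authors' R code, and the counts in the usage line are hypothetical placeholders rather than data from the paper. The interval adds z²/2 pseudo-successes and z² pseudo-trials before applying the usual normal approximation. Clamping the bounds to [0, 1] is a choice made in this sketch and may differ from the R package's output for extreme proportions.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull(successes: int, trials: int, conf_level: float = 0.95):
    """Agresti-Coull confidence interval for a binomial proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    n_adj = trials + z * z                    # adjusted number of trials
    p_adj = (successes + z * z / 2) / n_adj   # adjusted proportion
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to the valid probability range (a choice made in this sketch).
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical example: 5 of 10 recorded associations involve holometabolous hosts.
lower, upper = agresti_coull(5, 10)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

With small occurrence counts, as is typical for amber associations, such intervals are wide, which is why non-overlapping intervals between deposits carry more weight than differences in the raw proportions.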

    1. Author Response

      Reviewer #2 (Public Review):

      Yamaguchi et al. studied the roles of two proteins, Calaxin and Armc4, in the assembly of the outer arm dynein (OAD) docking complex (DC). By combination of the improved cryo-ET analysis and gene knockout zebrafish lacking each of these proteins, they found that Armc4 plays a critical role in the docking of OAD and that Calaxin stabilizes the molecular interaction in the docking. They further showed evidence that Calaxin changes the conformation of another compartment of DC comprising CCDC151/114. This new information provides an important basis for understanding how the DC is assembled and regulates docking of OAD. The authors' conclusion is well supported by the data but some data presentation and discussion need to be completed.

      Gui et al. (2021) already reported on a cryo-EM observation in bovine tracheal cilia, with a conclusion similar to this paper on the structure of OAD/DC on DMT. Using knockout zebrafish strains, the authors present the detailed interaction of calaxin with other DC components. They show that the binding of calaxin induces changes of conformation in the N-terminal region of CCDC151/114. The conformation further changes in the presence of Ca2+; the specific conformation of the N-terminal region of CCDC151/114 becomes undetectable, and instead an additional structure appears in the vicinity of calaxin.

      1) The authors conclude that the Ca2+-dependent conformational change of DC is subtle and not dynamic. This result is eventually valuable information but may be somewhat unexpected from the point of view that calaxin plays an important role in the regulation of flagellar motility in Ciona sperm. The authors found that calaxin changes the conformation of N-terminal CCDC151/114 region but the core dynein structure shows no dynamic change. What about the changes in the interaction between calaxin, core dynein, and DMT? Is this beyond the resolution of cryo-ET analysis?

      Since Mizuno et al., 2009 reported that Ciona Calaxin switches its interactor depending on Ca2+ concentration, it is highly expected that zebrafish Calaxin also changes its interactor in 1 mM Ca2+ buffer conditions. However, the resolution of our cryo-ET data was insufficient to detect the change of Calaxin interactors. More detailed structural analyses are required to understand the OAD structures in the Ca2+ buffer conditions. We discussed this point as follows:

      (line 389-395)

      Regarding the Calaxin conformation, a previous biochemical analysis reported that Ciona Calaxin switches its interactor depending on Ca2+: β-tubulin at lower Ca2+ concentration and OAD γ-HC at higher Ca2+ concentration (Mizuno et al., 2009). Moreover, a crystal structure analysis revealed the conformational transition of Ciona Calaxin toward the closed state by Ca2+-binding (Shojima et al., 2018). In this study, however, such conformation change of Calaxin was not detected, probably due to insufficient resolution of our cryo-ET analysis. More detailed structural analyses in the Ca2+ condition are required to understand the mechanism of the Ca2+-dependent OAD regulation.

      2) It would be very helpful if the authors could add the cryo-ET images of calaxin-/- axoneme in the presence of 1 mM EGTA in Figure 7. Although these images are thought to be similar or identical to Figure 4F, it would help to confirm that the conformational changes in CCDC151/114 and additional part of DC are induced in a Ca2+-dependent manner.

      We added the cryo-ET images of calaxin-/- OAD-DC (1 mM EGTA) in Figure 7D.

      3) To clarify the molecular interaction of calaxin with other components, it would also be helpful if the authors add the images rotated 80 degrees to Figure 4F and G, in a similar way to Figure 7.

      We added the images of OADs rotated 80 degrees in Figure 4F and G.

      4) Despite the molecular phylogenetic difference, there are several similarities between calaxin and Chlamydomonas DC3, not only in the in situ structure and configuration but in the phenotype of mutants; Chlamydomonas mutant lacking DC3 shows OAD loss in the distal part of a flagellum (Casey et al, MBC, 2003). It may be a good reference if the authors add the position of DC3 in Figure 4. A', B', and C.

      To answer this comment, we created Figure 4—figure supplement 1, which shows the cryo-ET structures and models of OAD-DCs in vertebrates and Chlamydomonas.

      5) There is a significant difference in sperm motility between WT and calaxin-/- or WT and armc4-/- (Figure 2E). However, it is not clear whether immotile sperm were included in the data for VAP (Figure 2F) or BCF (Figure 2G). For example, WT and calaxin-/- show similar VAP, although both are significantly different in the percent of motile sperm.

      In our CASA study, spermatozoa with velocities below 20 μm/s were considered immotile and excluded from the data for VAP (Figure 2F) and BCF (Figure 2G). To clarify this point, we revised the manuscript as follows:

      before

      Swimming velocity and beating frequency were calculated from the trajectories of the motile spermatozoa (Figure 2F-G; Figure 2—figure supplement 1; Video3).

      after (line 139-141)

      Swimming velocity (VAP) and beating frequency (BCF) were calculated from the trajectories of the motile spermatozoa, which have 20 μm/s or more velocities (Figure 2F-G; Figure 2—figure supplement 2; Video3).
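
The exclusion rule described in this response explains how two genotypes can differ in the percentage of motile sperm while showing similar VAP. The sketch below illustrates that logic in Python. It is not the authors' CASA pipeline: the record layout (`vap`, `bcf` keys) and the example values are hypothetical, and only the 20 μm/s cutoff comes from the text.

```python
from statistics import mean

# Cutoff stated in the response: tracks below this velocity count as immotile.
MOTILITY_THRESHOLD_UM_PER_S = 20.0


def summarize_tracks(tracks, threshold=MOTILITY_THRESHOLD_UM_PER_S):
    """Report percent motile, then average VAP and BCF over motile tracks only."""
    motile = [t for t in tracks if t["vap"] >= threshold]
    percent_motile = 100.0 * len(motile) / len(tracks) if tracks else 0.0
    return {
        "percent_motile": percent_motile,
        # Immotile tracks are excluded, so they cannot pull these means down.
        "mean_vap": mean(t["vap"] for t in motile) if motile else None,
        "mean_bcf": mean(t["bcf"] for t in motile) if motile else None,
    }


# Hypothetical tracks: VAP in μm/s, BCF in Hz.
example_tracks = [
    {"vap": 5.0, "bcf": 0.0},
    {"vap": 100.0, "bcf": 30.0},
    {"vap": 120.0, "bcf": 34.0},
    {"vap": 10.0, "bcf": 1.0},
]
summary = summarize_tracks(example_tracks)
print(summary)
```

Because the means are taken over motile tracks only, a sample with many immotile sperm can still report a VAP close to that of a fully motile sample.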

      6) In calaxin-/- zebrafish, OAD was clearly detected from the base to two-thirds of a flagellum with an unclear border (Figure 2A). Typical distributions of OAD+ class and OAD- class are shown in Figure 5 in the ~3 micrometer tomograms. Were these taken from around this unclear border? Is the proximal-most region of a flagellum occupied by OAD+ class only? The authors should clearly indicate the region of a flagellum where the tomograms in Figure 5C and D were selected.

      7) Line 229~: It is not clear what the authors meant by "probably reflecting the different distance from the sperm head". In relation to this and the comment 6, does the "proximal" in the sentence "OAD loss occurred even in the proximal part of the flagella" (line 232) indicate the region near the base of a flagellum?

      In general, axonemes are tangled on the cryo-TEM grids, which makes it difficult to identify the ends of all axonemes, especially for the long zebrafish sperm flagella. Thus, we could not determine which region of the flagellum the tomograms shown in Figure 5D came from.

      However, to answer comments (6) and (7), we created Figure 5—figure supplement 1. In this experiment, we newly generated cryo-TEM grids with sparse sperm axonemes and succeeded in finding two areas containing clear axonemal ends with suitable ice conditions for cryo-ET observations (Figure 5—figure supplement 1B). The polarity of the axonemes was judged from the 3D-reconstructed structures of the axonemes (Figure 5—figure supplement 1B, red dotted lines). By the structural classification of OAD+ class and OAD- class in the tomograms, we confirmed the OAD loss in calaxin-/- even in the proximal part of the flagella, which is near the base of a flagellum (Figure 5—figure supplement 1D, (a) and (c)). To clarify these points, we revised the manuscript as follows:

      before

      In calaxin-/-, the ratio of OAD+ class to OAD- class varied among tomograms (Figure 5D), probably reflecting the different distance from the sperm head. However, all calaxin-/- tomograms showed multiple clusters of OAD- class, indicating that the OAD loss occurred even in the proximal part of the flagella.

      after (line 236-239)

      In calaxin-/-, the ratio of OAD+ class to OAD- class varied among tomograms (Figure 5D), reflecting the different distances from the sperm head. Analysis of detailed OAD distributions along calaxin-/- axoneme revealed that OAD loss occurred even in the proximal part of the flagella (Figure 5—figure supplement 1D).

      8) In conjunction with comment 7, it would be appreciated if the authors could share their idea of why the distal region of flagella tends to lack OAD in calaxin-/-, if this is not discussed anywhere in the text.

      We discussed this point as follows:

      (line 316-323)

      calaxin-/- spermatozoa exhibited a unique OAD distribution, with OAD-missing clusters at various regions of the flagella. Interestingly, OADs decreased gradually toward the distal end, by which the mechanism is unclear. The axoneme is elongated by adding flagellar components to its distal end during ciliogenesis (Johnson & Rosenbaum, 1992). IFT88, a component of the IFT machinery, disappears as the spermatozoa mature (San Agustin et al., 2015). Thus, we speculate that the OAD supply at the distal sperm axoneme is insufficient to compensate for the OAD dissociation in the calaxin-/-. Consistent with this idea, distal OAD loss is the sperm-specific phenotype, as olfactory epithelial cells in calaxin-/- have Dnah8 along the entire length of the cilia (Figure 6B).

      9) Immunofluorescence in twister-/- epithelial cilia showed that the localization of calaxin is independent of OAD (line 271-274). Based on the authors' finding, the localization of calaxin requires Armc4, which is preassembled with calaxin in the cytoplasm. If this is true and the localization of calaxin is NOT resulting from diffusion, Armc4 must be localized with calaxin along the entire length of cilia in twister-/- epithelial cilia (Figure 6D). Although Armc4 is shown localized in cryo-ET images (e.g. Figure 1, Figure 7), authors may provide the immunofluorescence of Armc4 along the entire length of sperm flagella and epithelial cilia.

      To answer this comment, we obtained a commercially available anti-ARMC4 (human) antibody and checked the cross-reactivity of the antibody against zebrafish Armc4, but no signal was detected in our western blot analysis. Thus, we could not assess the localization of zebrafish Armc4 in twister-/- epithelial cilia.

      In our study, we found an ectopic accumulation of Calaxin at the ciliary base in armc4-/- cells (Figure 6C, white arrowheads). The small molecular weight of Calaxin (~25 kDa) suggests the possible diffusional entry of Calaxin into the ciliary compartment. However, in armc4-/- cells, Calaxin accumulated at the ciliary base, strongly suggesting that Calaxin requires Armc4 to be localized to cilia.

      Reviewer #3 (Public Review):

      ODA-DC anchors ODA, the main force generator of ciliary beating, onto the doublet microtubules. Vertebrate ODA-DC contains 5 proteins, including Calaxin and Armc4, whose mutations are associated with defective ciliary motility in animals and humans. By generating calaxin-/- and armc4-/- knockout zebrafish lines, this manuscript examined the Kupffer's vesicle cilia and spermatozoa. They showed that calaxin-/- and armc4-/- knockouts both affect ciliary motility but to different degrees. The authors conducted careful structural analyses using cryo-ET and subtomo averaging on both mutants, revealing a partial loss of ODA in calaxin-/- and a complete loss of ODA in armc4-/-. I really like the distribution analysis of calaxin-/- OADs (Figure 5), which emphasizes the strength of cryo-ET in uncovering the molecule distribution of distinct conformational states in situ. Fitting of the atomic models of ODA and ODA-DC into the cryo-ET density maps and Calaxin rescue experiments showed how Calaxin stabilizes ODA in molecular detail. By using olfactory epithelium, the authors also presented the possible assembly mechanism of ODA-DC proteins, which is also a beautiful experiment. Finally, the authors also investigated how Ca2+ regulates the ODA-DC using cryo-ET.

      The thorough structural and functional analyses of Calaxin and Armc4 in WT and gene KO animals could serve as a reference for future study of the detailed function of other ciliary proteins. The experiments are overall well designed and conducted, but some aspects need to be clarified and improved.

      The authors interpret the vertebrate ODA-DC to include four linkers (line 193). However, the authors also said that loss of one linker (Calaxin) makes ODA attach to the DMT through two linkers (line 199 and 246). These descriptions are confusing. It would make more sense to interpret the vertebrate ODA-DC as containing three linkers (CCDC151/114, Armc4/TTC25, Calaxin).

      This comment is reasonable because vertebrate OAD is tethered to DMT through three linker structures (the distal CCDC151/114, Armc4/TTC25, and Calaxin). However, vertebrate DC is composed of four parts: (a) Calaxin, (b) the Armc4-TTC25 complex, (c) the proximal CCDC151/114, and (d) the distal CCDC151/114 (Figure 4E). Part (c) is embedded in the cleft between protofilaments A07 and A08. To clarify this point, we revised the manuscript as follows:

      before

      The bovine DC model shows that vertebrate DC is composed of four linker structures: (a) Calaxin, (b) the Armc4-TTC25 complex, (c) the proximal CCDC151/114, and (d) the distal CCDC151/114 (Figure 4E).

      after (line 196-200)

      The bovine DC model shows that vertebrate DC is composed of four parts: (a) Calaxin, (b) the Armc4-TTC25 complex, (c) the proximal CCDC151/114, and (d) the distal CCDC151/114 (Figure 4E). Among the four parts, three (a, b, and d) work as linkers between OAD and DMT, while (c) the proximal CCDC151/114 is embedded in the cleft between protofilaments of the DMT.

      To confirm whether Calaxin directly interacts with β-tubulin (line 213), a control experiment could be needed by incubating WT axoneme with mEGFP-Calaxin followed by IF imaging.

      In our manuscript, we wrote as follows:

      (line 218-224)

      To assess the specificity of Calaxin binding, we also performed a rescue experiment with mEGFP-Calaxin (Figure 4H-I; Figure 4—figure supplement 2). Ciona Calaxin was reported to interact with β-tubulin (Mizuno et al., 2009), suggesting the possible binding of Calaxin along the entire length of the axoneme. However, the rescued axonemes showed partial loss of EGFP signal (Figure 4H, white arrowheads). This pattern resembled the OAD localization of calaxin-/- in immunofluorescence microscopy, suggesting the preferential binding of Calaxin to the remaining OAD-DC. mEGFP alone showed no interaction with the axoneme (Figure 4H, asterisk).

      Therefore, our manuscript is NOT intended to support or deny the interaction between Calaxin and β-tubulin, which was reported by Mizuno et al., 2009. Instead, we focused on the interaction between Calaxin and OAD-DC, revealing that Calaxin binds to Calaxin-deficient OAD-DC (Figure 4G, H, and I). Thus, we assume this comment refers to the interaction between Calaxin and OAD-DC.

      To further discuss the interaction between Calaxin and OAD-DC, we created Figure 4—figure supplement 2. We tested Calaxin’s interaction by incubating recombinant mEGFP-Calaxin with sperm axonemes of calaxin-/-, armc4-/- (representing OAD-missing DMT), and WT (representing DMT with Calaxin and OAD). The localization of mEGFP-Calaxin was assessed by fluorescence microscopy of mEGFP signals. In calaxin-/-, mEGFP-Calaxin was bound to a limited region of the axoneme, with partial loss of EGFP signals (Figure 4—figure supplement 2A, white arrowheads), consistent with Figure 4H. On the other hand, mEGFP-Calaxin showed no significant interaction with armc4-/- axoneme (Figure 4—figure supplement 2B) or WT axoneme (Figure 4—figure supplement 2C). These data show the preferential binding of Calaxin to the Calaxin-deficient OAD-DC over OAD-missing DMT or WT OAD. Although Mizuno et al., 2009 reported the interaction between Calaxin and β-tubulin, our analysis could not detect signals for such an interaction, probably due to the different binding affinities of Calaxin for OAD-DC and β-tubulin.

      The Immunoblotting experiment should be improved in Figure 5E. Could the authors get the same results in repeating experiments? Why is the Dnah8 signal higher in 50 mM NaCl of the (+)Calaxin group compared to that in 0 NaCl? This makes me doubt if the difference between (-)Calaxin and (+)Calaxin groups are significant.

      This comment is reasonable because NaCl concentration-dependent detachment of OAD from DMT predicts the highest Dnah8 signal in 0 mM NaCl of the (+)Calaxin group. To address this point, we created Figure 5—figure supplement 2, which shows the experimental replication of the immunoblot analysis in Figure 5E. In this experiment, we used calaxin-/- sperm axonemes collected independently of the Figure 5E data.

      However, again, the Dnah8 signal was higher in 50 mM NaCl of the (+)Calaxin group than that in 0 mM NaCl, confirming the result in Figure 5E. One possible explanation for this result is that the NaCl concentration affects the rescue efficiency of the Calaxin protein. We speculate that the Calaxin protein requires NaCl for efficient binding to OAD-DC, which caused the lower amount of OAD in 0 mM NaCl of the (+)Calaxin group compared to that in 50 mM NaCl.

      The authors have covered several important points in the Discussion section. Now that the function of Calaxin in both mouse and zebrafish have been reported, the authors could discuss the similarity and difference of Calaxin function in different species and tissues.

      To discuss this point, we inserted the following paragraph:

      (line 324-333)

      In mouse Calaxin-/- mutant, motile cilia in various organs (sperm flagella, tracheal cilia, and brain cilia) showed abnormal motilities, although OADs in the mutant cilia/flagella seemed mostly intact when observed by conventional transmission electron microscopy (Sasaki et al., 2019). In our study, however, we revealed that mutation of zebrafish calaxin caused OAD-missing clusters at various regions of the flagella, by using detailed cryo-ET analysis and immunofluorescence microscopy. Thus, we speculate that the same OAD defects to zebrafish calaxin-/- caused abnormal ciliary motilities in mouse Calaxin-/- mutant. One exception is the mouse nodal cilia. In mouse Calaxin-/- mutant, the formation of nodal cilia was significantly disrupted (Sasaki et al., 2019). On the other hand, zebrafish calaxin-/- mutant showed the normal formation of Kupffer’s vesicle cilia (orthologous to the mouse nodal cilia), suggesting the tissue-specific function of Calaxin on the ciliary formation.

      Because of the limited resolution, the authors should be more careful when interpreting the small densities in the difference map, for example, in Figure 4F-G black arrows. Considering that the CCDC151/114 coiled coil is overall poorly resolved both in the WT and mutant cryo-ET maps, the different densities could be due to different map quality or data processing. This makes the following statement suspicious "This structure corresponds to the N-terminus region of CCDC151/114, suggesting that Calaxin affects the conformation of neighboring DC components".

      This comment is reasonable because the resolution of our cryo-ET data was insufficient to identify each molecule in the cryo-ET map. To be more careful about the interpretation of our cryo-ET structures, we revised the manuscript as follows:

      before

      However, the difference map also showed an additional missing structure adjacent to Calaxin (Figure 4F’, black arrowhead). This structure corresponds to the N-terminus region of CCDC151/114, suggesting that Calaxin affects the conformation of neighboring DC components.

      after (line 207-210)

      However, the difference map also showed an additional missing structure adjacent to Calaxin (Figure 4F’, black arrowhead). When fitting the bovine DC model, this structure overlapped the N-terminus region of CCDC151/114, indicating that Calaxin can affect the conformation of neighboring DC components.

      To discuss the map quality and data processing of our cryo-ET analysis, we summarized the following points that can support the confidence of our data:

      (1) Two independent experiments showed the same results of OAD-DC structures, suggesting that the small changes in DC conformations were not due to different map quality or data processing:

      (a) For OAD structures in 1 mM EGTA condition, we analyzed the WT OAD (Figure 4D) and the calaxin-/- OAD rescued with recombinant Calaxin (Figure 4G). These samples were prepared in completely independent processes. However, in both cases, the small densities overlapping the N-terminus region of CCDC151/114 were visualized adjacent to Calaxin (Figure 4D and G, black arrowhead).

      (b) For OAD structures in 1 mM Ca2+ condition, we analyzed the WT OAD (Figure 7B) and the calaxin-/- OAD rescued with recombinant Calaxin (Figure 7C). These samples were prepared in completely independent processes. However, in both cases, the small densities overlapping the N-terminus region of CCDC151/114 were not observed. Instead, the additional densities appeared around DC (Figure 7B and C, white arrowheads).

      (2) We assessed the statistical significance of the changes in DC conformations. We applied Student’s t-test for WT and calaxin-/- OAD-DC structures and created Figure 7—figure supplement 1. p-values of each voxel were calculated as described in Oda & Kikkawa, 2013. The isosurface threshold of p-values corresponds to 0.05% probability in one-tailed test. p-value maps indicate not only Calaxin structures but also the adjacent small density (Figure 7—figure supplement 1A, black arrowhead) and the additional density around DC (Figure 7—figure supplement 1B, white arrowheads) as the statistically significant difference between WT and calaxin-/- OAD-DC.

    1. Author Response

      Reviewer #1 (Public Review):

      This project aimed to understand if decision making impairments commonly observed in older adults arise from working memory (WM) or reinforcement learning (RL) deficits. Evidence in the paper suggests it is the former; they observe poorer task accuracy in older adults that is accompanied by a faster memory decay in older adults using a novel hierarchical instantiation of a previously validated computational model. There were no similar changes in RL in this model. These results are extended using Magnetic Resonance Spectroscopy (MRS) to measure glutamate and GABA levels in striatum, prefrontal and parietal regions. They found that impairments in working memory were linked to reductions of glutamate in PFC, particularly in the older adult group.

      The task employed is elegant and has been studied extensively in different populations and is well-validated (though here a hierarchical Bayesian extension is developed and validated). The results however may not be definitive in some respects; the paper did not replicate previously observed RL deficits. It therefore, remains possible that this is due to the sensitivity of the task to this RL component in ageing and future work is needed to fully bridge the gap in the literature.

      Thank you for the comment. If our understanding of the comment is correct, our results suggesting no impairments in the RL system conflict with previously observed RL deficits in older adults. In the introduction section, we discuss previous literature on RL deficits in older adults, which yields largely mixed conclusions, wherein some experiments show RL impairments (Frank and Kong, 2008; Hämmerer et al., 2011; Samanez-Larkin et al., 2014) and some do not (Grogan et al., 2019; Radulescu et al., 2016). Placing our experiment in the context of these mixed results, we aimed to use a task that addresses these inconsistencies, by reasoning that commonly used RL tasks and models do not account for additional processes that may contribute to learning (e.g. executive function/WM/attention), hence explaining why sometimes the deficits are observed and sometimes they are not. We can also point to our model parameter recovery (Appendix 1 - Figure 9), where we show that RL model parameters (e.g. learning rate) are successfully recovered - indicating that our model is sensitive to RL variability in participants, but we observe no differences split across age groups.

      Although the study is well-executed, there is an obvious limitation in the use of a cross-sectional design to address this question. The authors acknowledge this limitation in the discussion but could go further to highlight the potential confound of cohort effects on gaming, RL and WM tasks more generally. Without within-person change data, the evidence can only be suggestive of potential age-related decline. For this reason, it may be more appropriate to use the terminology "age-related differences" rather than "age-related declines" given the study design.

      Thank you for the comment. We have attempted to address the cohort effects by administering RBANS to old and young participants. Age-normed total RBANS (Randolph et al., 1998) scores were similar in both age groups (described in the first paragraph of the results section), which we took to suggest that our cohorts reflected comparable samples of the population with respect to overall cognitive ability. In addition, we show that certain aspects of performance (e.g. accuracy) decline within the group of older adults, and not just between the two groups, which would constitute an argument against cohort-based effects. We now elaborate further on the point of cross-sectional design in the discussion section on lines 410-417. As suggested by the reviewer, we have also adjusted the language throughout the manuscript to imply age-related differences instead of age-related decline.

      Reviewer #2 (Public Review):

      In this study, Rmus and colleagues contribute to the important open question of whether reinforcement learning deficits observed in older adults are due to impairments in basic learning processes, or can be attributed to a decline in working memory function. The authors present cross-sectional behavioral data from a task designed to assess the role of working memory in reinforcement learning. And they use computational modeling in conjunction with MR spectroscopy to demonstrate a relationship between prefrontal glutamate and age-related impairments in learning specific to working memory decay. I found the overall story compelling, the data novel, and the analysis carefully executed. Below I outline some areas in which the claims of the manuscript could be strengthened.

      1) I may have missed this, but does glutamate correlate with other model parameters? Or did the authors only focus on the WM parameters because of the age difference? In support of the specificity argument, it would be important to show that glutamate only predicts WM related parameters regardless of whether there was an age difference or not.

      Thank you for your suggestion. In Appendix 1-figure 7, we show correlations between glutamate and all model parameters. If glutamate captured impairments in RL computational processes, we would expect to see a correlation between glutamate and the learning rate. Below we show that glutamate does positively correlate with RL learning rate. However, there are parameter correlations within the model itself, making the direct correlations hard to interpret. To better understand the relationships between learning rate, working memory, and glutamate, we ran a model predicting MFG glutamate using all parameters that significantly correlated with MFG glutamate (MFG glutamate ~ 1 + learning rate + decay + omega3 + negative learning rate), and found that only WM decay predicted MFG glutamate when controlling for other factors (learning rate: t = -0.42, β = -.03, p = .67; WM decay: t = -3.14, β = -0.30, p = .002; omega3: t = 1.84, β = .16, p = .07; negative learning rate: t = .56, β = .03, p = .57). Thus, while glutamate measures correlate with RL learning rate, these correlations seem to be driven by the fact that both glutamate and RL learning rate correlate with WM decay. Note that negative learning rate influences both RL and WM processes’ updating (see computational modeling section), and thus cannot help us make claims about specificity of RL or WM mechanisms alone being related to glutamate.
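
      The logic of this control analysis can be sketched with synthetic data. The example below is not the authors' analysis: the data are noise-free and invented, and the names `ols`, `pearson`, `decay`, `lr`, and `glu` are hypothetical. It shows how a predictor (learning rate) can correlate with the outcome (glutamate) only because both track a third variable (WM decay), and how a multiple regression assigns the effect to that third variable.

```python
from math import sin, sqrt


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )


# Synthetic participants (assumption): glutamate depends on decay only,
# while learning rate is itself negatively related to decay.
decay = [0.02 * i for i in range(40)]
lr = [0.6 - 0.5 * d + 0.05 * sin(i) for i, d in enumerate(decay)]
glu = [2.0 - 0.3 * d for d in decay]

r_lr_glu = pearson(lr, glu)  # clearly positive zero-order correlation
intercept, b_lr, b_decay = ols([[1.0, l, d] for l, d in zip(lr, decay)], glu)
# b_lr is approximately 0 and b_decay approximately -0.3: once decay is
# in the model, learning rate carries no additional information.
```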

      2) As it is somewhat common with these tasks, it seems like the model does not fully capture the performance deficit in OA (Fig. 2B), even when all the individual difference parameters in WM are allowed to vary. Can the authors say more about the discrepancy? This is an interesting datapoint which may give clues to mechanism.

      Thank you for your comment. We elaborated on this in detail in the Appendix 1 (Posterior predictive checks section). We have observed that in some blocks (particularly in ns=6 blocks), older adults only learned a correct response for a subset of the presented stimuli, and neglected to learn responses to other stimuli altogether. We have interpreted this as a possible strategy older adults used to reduce the difficulty of the ns=6 condition. This would explain the discrepancy between the data and the model predictions, as the model has no way of accounting for stimulus identity effects on learning (since the model predicts similar performance for all the stimuli). To test our reasoning, we have fit the model to a subset of data - excluding participants who have implemented this strategy, and predicted that this should reduce the model misfit. We found that this is indeed the case (Appendix 1 - Figure 4). This confirms that strategic prioritization of stimuli in some older adults negatively affected the fit of the model. While we believe that a better understanding of these contaminant response patterns in the RL-WM model is worthy of further investigation, we feel that it is beyond the scope of this paper, and might require task designs with even higher set sizes to elicit the strategic stimulus prioritization more robustly. We have now added a paragraph in the discussion to discuss this issue.

      3) Relatedly, it may not be possible with these data alone, but can the authors discuss what the WM decay parameter captures? In particular for OA, the distinction between generating and maintaining a "task set" has been extensively written about. Older adults tend to have difficulty internally generating and flexibly deploying task sets, but somewhat paradoxically can perform better than YA in certain decision situations (e.g. when reward is dependent on previous choices, see Worthy et al., 2011). The task in this study necessarily pushes OA into a regime in which relying on familiar decision strategies is sub-optimal, and task sets must be continuously generated. Is there a type of intervention that the authors expect would reverse the observed deficit in WM?

      In the RLWM model, WM stores stimulus-action-outcome weights. Using WM decay we can gradually reduce the stimulus-dependent weights on each trial where the stimulus is not observed (e.g. forgetting). These weights, therefore, get reduced with the rate of decay, by being pulled towards the uniform/uninformative values (1/nA, where nA is the number of actions) they were initialized to. It effectively captures forgetting of information with increased time delays (here time = number of intervening trials between successive stimulus presentations where the stimulus is not observed). It is possible that older adults might be prioritizing storage of different types of (irrelevant) task information (e.g. category of stimuli, or relationships between the stimuli), resulting in a tradeoff that might lead to faster decay in older adults, and that the younger adults neglect such information. This could also explain discrepancies between our model and older adults described above, as the model does not hold any assumptions about how stimulus identities might impact task performance strategy. If this was the case, if probed about such task-irrelevant prioritized information older adults could potentially perform better than younger adults (in a way that in the Worthy et al. (2011) paper the older adults perform better on a choice dependent task compared to younger adults). We are unable to test this idea in our dataset, but we believe that it could be a promising avenue for future research.
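
      The decay mechanism described above can be written as a one-line update, W ← W + decay · (1/nA − W), applied to stimuli that were not presented on the current trial. The following is a minimal sketch under that reading of the text, not the authors' model code; the function name `decay_wm` and the example values are hypothetical.

```python
def decay_wm(weights, observed_stimulus, decay, n_actions):
    """Pull the WM weights of every stimulus except the observed one
    toward the uniform value 1 / n_actions.

    weights: dict mapping stimulus -> list of action weights.
    decay:   forgetting rate in [0, 1]; 0 means no forgetting.
    """
    uniform = 1.0 / n_actions
    return {
        s: list(w)
        if s == observed_stimulus
        else [x + decay * (uniform - x) for x in w]
        for s, w in weights.items()
    }


# Stimulus "A" was learned perfectly (action 0), then "B" is shown on
# three consecutive trials, so "A" decays across three intervening trials.
weights = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
for _ in range(3):
    weights = decay_wm(weights, observed_stimulus="B", decay=0.2, n_actions=3)
```

      After k intervening trials, the distance of each weight from 1/nA has shrunk by a factor of (1 − decay)^k, which is why longer delays between presentations of the same stimulus produce more forgetting.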

      4) There is a wealth of evidence suggesting striatal DA loss in older adults, which served as the basis for many of the original investigations and hypotheses regarding a simple RL deficit in OA (e.g. work by Shu-Chen Li and others). While the authors do not directly measure DA in this study, it would be helpful to place the results in the context of that literature.

      Thank you for pointing this out. In the introduction, we have discussed the mixed results from research on RL/dopamine deficits in older adults. Some of the literature suggests impairments in striatal dopamine in older adults (Samanez-Larkin et al., 2014; Bäckman et al., 2006), while some suggests an absence of impairments (Grogan et al., 2019). Furthermore, while DA is important for RL updating, it is also potentially important for WM updating (O’Reilly and Frank, 2006); therefore, a potential DA loss could affect both RL and WM, and not RL exclusively. Prior research also suggests that, although a correlative relationship between DA and cognitive functions has been recorded, the extent of generality/specificity of the effects of DA on cognition in aging (Bäckman et al., 2006), compared to resulting noise that impairs cognition (Li et al., 2001), should be studied more extensively in the future. We have not focused on dopamine in the study, but have now added a paragraph in the discussion section to address this on lines 402-407.

      5) Finally, the main argument of the paper as I read it is that PFC glutamate mediates the performance deficits observed in RL because it reflects a compromised WM system. Sample size permitting, it would be helpful to see a formal test of this mediation relationship.

      As highlighted in the response to the mediation point in essential revisions, we observe that glutamate mediates the effect of WM on task performance, but this mediation approach might be difficult to justify because WM decay and task performance have shared signal and noise (since WM decay is estimated from task performance). We have now included the mediation analysis in Appendix 1 and provided a conservative interpretation of it in the results section.

      Reviewer #3 (Public Review):

      Aging impacts many cognitive functions, and how these changes affect performance in different tasks is an important question. By testing 42 older and 36 younger healthy adults with a novel learning task and MR spectroscopy, Rmus et al addressed the important question whether age-related declines in learning are driven by WM, or by deficiencies of the RL system. The task varied the role of working memory in learning by asking participants to learn about either 3 or 6 stimulus response associations from feedback (set sizes 3 and 6). The paper combines a detailed computational account of participants behaviour and striatal and prefrontal/parietal MR spectroscopy in order to assess individual glutamate and GABA levels.

      The authors report an effect of set-size on learning in both age groups, and show that participant age is associated with (1) worse accuracy, (2) a larger set size performance difference, and (3) a heightened sensitivity to reward. Computational modeling showed that working memory decay differed between age groups, but that reliance on WM to perform the task at hand was similar in both age groups (similarly differing between conditions in both groups). Turning to the MRS results, the paper shows that an aggregate measure of glutamate relates to aggregate task performance, that prefrontal glutamate specifically relates to WM decay observed in the task, and that age was negatively associated with glutamate levels.

      While the paper is well worth reading and offers many interesting data points, the title's suggestion that "Age-related decline in prefrontal glutamate predicts failure to efficiently deploy working memory in working memory" is, in my opinion, not fully supported by the evidence. First, the authors don't report clear evidence for any age-related differences in WM reliance in the task overall. Second, the authors find that MFG glutamate relates significantly only to WM decay, not the parameter that captures WM deployment. Third, correlations don't imply predictive relations.

      We apologize for the lack of clarity in our wording. We agree that the title of the paper implies that the reliance on WM parameter differentiates older and young adults, while the results show that the difference is mostly captured by the WM decay parameter. We meant to communicate that the age-difference seems to be particularly rooted in the WM, but have chosen misleading/confusing words. We have proposed changing the title of the manuscript to “Age-related differences in prefrontal glutamate are associated with increased working memory decay that gives appearance of learning deficits” to minimize confusion. With regards to your last point, as outlined in our response to essential revisions, we agree that we should modify the language used in our manuscript to be more consistent with the associative rather than predictive nature of our results.

      Another important open question relates to the relatively large age difference in the effect of set-size on performance. The authors write that working memory will contribute less to performance in higher set size conditions. Yet, age differences are largest in the set size 6 condition, suggesting that RL-dependent learning is most severely impaired in learning (set size 6 performance), rather than WM dependent learning (set size 3 performance). Finally, a statistically significant age difference in reward sensitivity seems to be hardly integrated into the authors' overall interpretation.

      Working memory does contribute less in the higher set-size condition; however, given the higher number of items, the delays between successive presentations of the stimuli in the high set-size condition are on average longer, which makes the effect of WM forgetting more pronounced. Furthermore, a WM impairment can have an indirect effect on RL, in that frequent failure to select the correct action through WM leads to reduced ability to train RL on encoding correct responses (especially earlier in training, when the incremental RL hasn’t ‘caught up’ yet), and thus worse performance overall. As such, a larger effect of set size could potentially be indicative of either or both WM or RL process deficits. This most clearly underscores the importance of modeling - these complex interactions are difficult to intuit, but modeling allows us to establish cleaner mechanistic explanations of observed behavioral patterns/group performance deficits (e.g. while on the surface impairment might look to be RL driven, it is actually better explained by a WM parameter, such as WM decay in older adults). With regards to reward sensitivity, the same explanation applies - there are multiple mechanisms through which differences in reward sensitivity could occur (e.g. slower learning rate, or increased RL recruitment due to failure of WM), which further emphasizes the need for modeling.
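
      The set-size argument can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the task's actual trial structure: stimuli are assumed to be fully interleaved in random order within a block, and the repetition count, number of simulated blocks, decay value, and function name are hypothetical.

```python
import random


def mean_intervening_trials(set_size, reps=12, n_blocks=200, seed=0):
    """Average number of trials between successive presentations of the
    same stimulus, for randomly interleaved blocks of set_size stimuli
    shown reps times each."""
    rng = random.Random(seed)
    total_gap, n_gaps = 0, 0
    for block in range(n_blocks):
        trials = [s for s in range(set_size) for rep in range(reps)]
        rng.shuffle(trials)
        last_seen = {}
        for t, s in enumerate(trials):
            if s in last_seen:
                total_gap += t - last_seen[s] - 1
                n_gaps += 1
            last_seen[s] = t
    return total_gap / n_gaps


delay_ns3 = mean_intervening_trials(3)
delay_ns6 = mean_intervening_trials(6)

# Fraction of a WM weight's distance from uniform that survives the
# average delay, for an assumed per-trial decay rate of 0.1.
decay = 0.1
retained_ns3 = (1 - decay) ** delay_ns3
retained_ns6 = (1 - decay) ** delay_ns6
```

      With the same decay rate, less information survives the longer average delay in the larger set size, so a group difference in decay alone can produce a larger set-size effect.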

      In short, in a complex task, there are often multiple ways to explain the same qualitative feature, and here we have leaned on computational modeling to identify the computational elements that differed across groups. However, we have now also simulated data from our computational models using posterior predictive checks to show that they can reproduce core descriptive features of the original data, including those noted above, and to examine the degree to which different features can be mapped onto the working memory decay parameter (Appendix 1 Figure 5).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents a thorough biochemical characterization of inferred ancestral versions of the Dicer helicase function. Probably the most significant finding is that the deepest ancestral protein reconstructed (AncD1D2) has significant double-stranded RNA-stimulated ATPase activity that was lost later, along the vertebrate lineage. These results strongly suggest that the previously known differences in ATPase activity between extant vertebrates and, for example, extant arthropods is due to loss of the ATPase activity over evolutionary time as opposed to gains in specific lineages. Based on their analysis, the authors also "restore" ATPase function in the vertebrate dicer, but they did so by making many (over 40) mutations in the vertebrate protein, and it is not clear which of these many mutations is required for the restoration of the activity. Thus, it is difficult to discern how the results of this experiment relate to the evolutionary history.

      We completely agree with this reviewer's assessment of our paper. Our Michaelis-Menten analyses raised the intriguing idea that loss of ATPase activity in the helicase domain of the vertebrate ancestor may indicate loss of the ability to couple dsRNA binding to formation of the active conformation. Our rescue experiments support this idea, albeit in future studies we hope to create an active ancestor with fewer amino acid changes. While the rescue experiments validate what these analyses told us, as the reviewer suggests, they do not themselves inform on the evolutionary history.

      A criticism of the paper is the authors' tendency (probably unconscious) to ascribe a purposefulness to evolution. For example, in the introduction, "We speculate that the unique role of the RLR's in the interferon signaling pathway in vertebrates...created an incentive to jettison an active helicase in vertebrates." Although this sentence is clearly labelled as speculation and "incentive" is clearly a metaphor, the implication is that evolution somehow has forethought. (There are other instances of this notion in the paper, for example, in the last line of the abstract). The author's statement also implies that the developing interferon system somehow caused the loss of active helicase, but it seems equally plausible that the helicase function was lost before the interferon system co-opted it.

      We agree with the stated critiques and have rephrased language that suggests that evolution is an active force. In addition to changing the last line of the abstract (page 2, line 35) and removing the quoted sentence from the Introduction, we have included a more nuanced discussion of the order of evolutionary events that may have preceded or followed the loss of helicase function in Dicer (page 18, lines 418-430).

      Reviewer #2 (Public Review):

      The manuscript by Aderounmu presents an interesting attempt to reconstruct evolution of the function of the helicase domain in ancestral Dicers, RNase III enzymes producing siRNAs from long double-stranded RNA and microRNAs from small hairpin precursors. The helicase has a role in long dsRNA recognition and processing and this function could have an antiviral role. Authors show on reconstructed ancestral Dicer variants that the helicase was losing dsRNA binding affinity and ATPase activity during evolution of the lineage leading to vertebrates while an early divergent Dicer-2 variant in Arthropods retained high activity and seemed better adapted for blunt ended long dsRNA, which would be consistent with antiviral function.

      The work is consistent with apparent adaptation of vertebrate Dicers for miRNA biogenesis and two known modes of substrate loading: "bottom up" dsRNA threading through the helicase domain, where the helicase domain recognizes the end of dsRNA and feeds it into the enzyme, and "top-down", where the substrate is first anchored in the PAZ domain before it locks into the enzyme. Some extant Dicer variants are known to be adapted for just one of these two modes, while Dicer in C. elegans exemplifies an "ambidextrous" variant. The reconstruction of the helicase domain complex enabled the authors to test how well ancestral helicases would support the "bottom up" feeding of long dsRNA and whether the helicase would distinguish blunt-end dsRNA from dsRNA with a 2-nucleotide 3' overhang. Although the reconstruction of an ancestral protein from highly divergent extant sequences yields just a hypothetical ancestor, which cannot be validated, the work provides remarkable data for interpreting the evolutionary history of the helicase domain and RNA silencing more generally. While it is not surprising that the ancestral helicase was a functional ATPase stimulated by dsRNA, particularly new and interesting are the data showing that the decline of the helicase function started already at the level of the common deuterostome ancestor and that the helicase was essentially dead in the vertebrate ancestor. It was reported two decades ago that human Dicer carries a helicase that has highly conserved critical residues in the ATPase domain but is non-functional (10.1093/emboj/cdf582). Recently published mouse mutants showed that these highly conserved residues are not important in vivo (10.1016/j.molcel.2022.10.010). Aderounmu et al. now suggest that Dicer carried this dead ATPase with conserved residues for over 500 million years of vertebrate evolution.

      I do not have any major comments on the biochemical analyses, and while I think that the ancestral protein reconstruction could yield hypothetical sequences that did not exist, I think they represent reasonable reconstructions, which yielded data worthy of interpretation. My major criticism of the work concerns clarity for the readership and interpretations of some results, where I wish the authors would clarify/revise the text. The following three examples are particularly significant:

      1) It should be explained to which common ancestor during metazoan evolution belongs the ancestral helicase AncD1D2 or at least what that sequence might represent in terms of common ancestry during metazoan evolution.

      We thank the reviewer for bringing this issue to our attention, and we have now included a brief discussion of the complexity in identifying AncD1D2’s exact position in metazoan evolution (page 6, lines 124-134). Our maximum likelihood phylogeny is constructed from Dicer’s helicase and DUF283 subdomains which evidently do not contain enough phylogenetic signal to resolve the finer details of early metazoan evolutionary events surrounding the divergence of non-bilaterians: Porifera, Ctenophora, Cnidaria and Placozoa. In our tree, Cnidaria even diverges later than the Nematode bilaterian branch reflecting the fact that our reported phylogeny does not match consensus species relationships, especially in the invertebrate clades. This means we cannot pinpoint AncD1D2’s exact position with certainty. While we do not intend to overinterpret the evolutionary trends from these hypothetical ancestral constructs, we believe the functional differences in biochemical activity are meaningful and correspond to big-picture changes over evolutionary time. AncD1D2 thus corresponds to some early metazoan ancestor that existed before the divergence of bilaterians from non-bilaterians. In support of this interpretation, when the phylogeny is constrained such that the bilaterian branches match the consensus species tree (Figure 1-figure supplement 2A) we observe that AncD1D2 is ancestral to the bilaterian ancestor, AncD1BILAT (now labeled on the figure), but retains 95% identity to the version of AncD1D2 constructed from the maximum likelihood phylogeny (Figure 1-figure supplement 3B).

      2) This is linked to the first point - the authors work with phylogenetic trees reconstructed from a single protein sequence, which are not well aligned with predicted early metazoan divergence (https://doi.org/10.1098/rstb.2015.0036). While their sequence-based trees show early branching of Dicer-2, as if the two Dicers existed in the common ancestor of almost all animals (except for Placozoa), I do not think there is sufficient support for such a statement, especially since antiviral RNAi-dedicated Dicers evolve faster and Dicer-2 is restricted to a few distant taxonomic groups, which might be better explained by independent duplications of ambidextrous ancestral Dicers. I would appreciate it if the authors would discuss this issue in more detail and make readers more aware of the complexity of the problem.

      We agree with the reviewer that in our initial submission we did not properly address the incongruence between our maximum likelihood phylogeny and the consensus species tree of life. We have now addressed this by revisions that discuss the difficulty in using a single gene or protein to accurately date ancient evolutionary events, especially in the case of Dicer, a protein whose evolutionary history is littered with multiple duplication events (page 6, lines 124-147, beginning with “Importantly, we observed multiple instances…”; page 16, lines 365-371, sentence beginning with “Uncertainty in the single gene or protein phylogeny…”). Our assumption that an early gene duplication produced the arthropod Dicer-2 clade is consistent with previous Dicer phylogenies that have been constructed with maximum likelihood algorithms with different parameters (https://doi.org/10.1371/journal.pone.0095350, https://doi.org/10.1093/molbev/msx187, https://doi.org/10.1093/molbev/mss263) using full length Dicer sequences with different taxon sampling depths and tree construction parameters. Removing other fast evolving taxa with long branch lengths from the sequence alignment still resulted in arthropod Dicer-2 branching out early in metazoan phylogeny (https://doi.org/10.1093/molbev/mss263).

      In analyses not included in our manuscript, we also independently constructed trees using full-length metazoan Dicers, helicase and DUF-283 subdomains using both RAXML-NG and MrBayes. We tried different taxon sampling depths and tried rooting the tree using either a non-bilaterian outgroup or a fungal outgroup and also tried breaking up potential long-branch attraction with deep taxon sampling. In every iteration, the arthropod Dicer-2 clade diverged early in animal evolution at some point before or during non-bilaterian evolution. We recognize that all these efforts are still prone to long-branch attraction that may cause the rapidly evolving Dicer-2 clade to artificially cluster with distant outgroups, but so far, the only evidence to support an arthropod-specific duplication event is parsimony. This parsimony model is plausible and one might expect a recently duplicated arthropod Dicer-2 to cluster closely with nematode Dicer-1, another antiviral Dicer that would have descended from a common ecdysozoan ancestor but this is not the case. The nematode HEL-DUF clade does get attracted to non-bilaterian Cnidaria clade in our ML tree, but unlike the arthropod Dicer-2 clade, this position varied depending on the parameters of phylogenetic analysis, and so we cannot conclude that arthropod Dicer-2’s position is due to long branch attraction. More sophisticated phylogenetic and statistical tools are needed to answer this question definitively, so we decided to proceed with the highest scoring maximum-likelihood phylogeny generated by our analysis.

      While we have now included a short discussion on the nature of this uncertainty in the revised manuscript (page 6, line 124; page 16, lines 365-371), we have excluded these additional details (paragraph above) from the main text in an attempt to prioritize readability for the generalist reader, and we hope that more specialized readers will find this discussion in the public comments helpful.

      3) The authors should take existing literature and data more into account when hypothesizing about sequences of events. Some decline of the helicase activity is apparent in AncD1DEUT, suggesting that it initiated between AncD1D2 and AncD1DEUT. This implies that a) the antiviral role of Dicer was becoming redundant with other cellular protein sensors by then and b) Dicer was already becoming adapted for miRNA biogenesis, which further progressed in the lineage leading to vertebrates to the unique top-down loading with the distinct pre-dicing state where the helicase forms a rigid arm. The authors even cite Qiao et al. (https://doi.org/10.1016/j.dci.2021.103997), who report a primitive interferon-like system in molluscs - this places the ancestry of the interferon response upstream of AncD1DEUT and suggests that this ancestral protein-based system was taking over the antiviral role of Dicer much earlier. In fact, the somewhat weaker performance of AncD1LOPH/DEUT, combined with the aforementioned interferon-like system and massive miRNA expansion in extant molluscs (10.1126/sciadv.add9938), suggests that molluscs possibly followed a convergent path like mammals. While I am missing this kind of discussion in the manuscript, I think that the model where "interferon appears ..." in AncD1VERT (Fig. 6) is incorrect and misleading.

      This comment is similar to others, including point 3 of Essential revisions, and we have revised our model in Figure 6 accordingly. We agree with the reviewer that we did not sufficiently explore the significance of the decline in Dicer helicase function between AncD1D2 and AncD1DEUT. In addition to the changes noted in point 3 of Essential revisions, we have corrected this by adding or modifying sentences in the Results (page 9, sentence beginning on line 197 “This reduction in ATP hydrolysis efficiency prior to deuterostome divergence may have coincided with…”, and page 11, sentence beginning on line 247 “One possibility is that between AncD1D2 and the deuterostome ancestor…”).

      We did not intend to suggest that this loss of Dicer helicase function was unique to vertebrates, but we focused on the deuterostome-to-vertebrate transition for the following reasons:

      a) The mollusk clade in our analysis is incongruent with its expected species position as a protostome. In our tree it clusters with deuterostomes instead. On one hand, this is probably an artefact of incomplete lineage sorting or long branch attraction. On the other hand, it is possible that this clade’s position is an underlying signal of the convergent evolution proposed by the reviewer. In support of the latter, some extant mollusk Dicer helicases (ACCESSION: XP_014781474, ACCESSION: XP_022331683) show a loss of amino acid conservation in Dicer’s ATPase motifs implying that extant mollusks have also lost Dicer helicase function like vertebrates. However, this is in contrast to vertebrate Dicer helicase where loss of function exists, but ATPase motifs remain conserved. We do not discuss this in the paper because the evidence remains inconclusive until extant mollusk Dicers can be functionally characterized, similar to Human Dicer and Drosophila Dicer-1, to determine that they are truly specialized for miRNA processing to the detriment of helicase function.

      b) Caenorhabditis elegans Dicer is an example of an ambidextrous Dicer that processes both miRNAs, with the top-down mechanism, and viral dsRNAs, with the bottom-up mechanism. Recently published work suggests that C. elegans also possesses a protein-based innate immune defense mechanism, but instead of competing with the RNA interference mechanism, the two mechanisms seem to work in concert and even share a protein: DRH-1, a RIG-I-like receptor homolog (https://doi.org/10.1128/JVI.01173-19). Furthermore, a protein-based pathway has also been reported in Drosophila, and in this scenario Drosophila Dicer-2 is the dsRNA sensor common to both pathways (https://doi.org/10.1371/journal.pntd.0002823). This collaboration observed in ecdysozoan invertebrates differs from the competition that has been well established in vertebrates. More data are needed to understand whether a model of competition or collaboration applies in lophotrochozoan invertebrates like mollusks.

    1. Author Response

      Reviewer #1 (Public Review):

      VO2max is one of the most important gross criteria of peak performance ability, and a plethora of studies have focused on VO2max prediction. This manuscript provides a large and comprehensive dataset from male runners and male cyclists. The endurance-trained athletes performed cardiopulmonary exercise testing on a treadmill (n=3330) or cycle ergometer (n=1094). In contrast to former studies, the authors used machine learning for algorithms and VO2max prediction. Models were derived and internally validated with multiple linear regression. The present study substantially expands current research.

      Sadly, the manuscript has an important and relevant main shortcoming as the limitations of the study had not been addressed properly:

      • The authors paid no attention to the fact that their results are strongly influenced by the exercise protocol used. It is obvious e.g. that maximal performance attainable in protocols with 2-minute exercise steps will be higher compared to an identical protocol with 3- or 4-minute steps.

      • The exercise intensity was kept constant for only 2 minutes before the workload was increased (by 1 km/h on the treadmill or by 20-30 W on the cycle ergometer). Due to the kinetics of lactate, VO2, etc., it is evident that the short 2-min intervals hamper the correct determination of the aerobic and anaerobic thresholds. It is well known that longer constant exercise steps (e.g. 4 minutes) are better when the focus is on threshold determination.

      The quality of this manuscript would be substantially improved if the authors added a comprehensive and blunt paragraph setting out the limitations of their study.

      We have amended our manuscript to state its limitations, as recommended. It is reasonable to suspect that the type of protocol used affects the cardiorespiratory indices obtained. Interestingly, according to available studies, this effect is more pronounced for the determination of cyclists' threshold power output or runners' treadmill running speed than for threshold and maximum cardiorespiratory indices such as VO2max or HRmax (Silva et al. 2021; Weston et al. 2002; Vucetić et al. 2014).

      In the regression models presented, the main explanatory variables with the largest effect on the prediction value are the AT/RCP threshold VO2 values (rVO2RCP; rVO2AT). The coefficients for the other explanatory variables are relatively low and differences in their values due to the use of potentially different protocols appear to be marginal. Nevertheless, we see the possibility of worsening the prediction when using less suitable testing protocols for athletes such as ramp tests or typically clinical tests such as the Bruce test.

    1. Author Response

      Reviewer #1 (Public Review):

      This study represents an important work in the field of (CAR)T-cell immunotherapy by analyzing the effect of different oxygen tension on the function and differentiation of T-cells (especially CD8+). Although it has been described that low oxygen levels can influence effector function/differentiation of T-cells, as nicely acknowledged by the authors in the introduction, a comprehensive analysis in the context of immunotherapy has been missing so far and this study adds significant findings that will be relevant for patient care in all fields applying (CAR)T-cell immunotherapy.

      The strength of the evidence is generally solid although there are some discrepancies between the different ways to induce HIF-1α (i.e. low O2, pharmacological inhibition, shRNA knockdown) that need to be clearly stated and/or discussed.

      1) The first section of the results determines the impact of low oxygen and pharmacological HIF-1α stabilization on CD8+ T-cell activation/differentiation. Low oxygen diminishes cell growth but induces T-cell activation and effector cytokines, while HIF-1α stabilization mimics the effects on activation without alterations in expansion. Unfortunately, it remains unclear why effects upon low O2 are more pronounced although pharmacological HIF-1α stabilization is more efficient.

      2) As a next step, in vitro conditioned T-cells are transferred into a subcutaneous B16-OVA model. Although only the low O2 levels increase T-cell numbers in vivo after the transfer, the initial tumor burden was nicely decreased by both low O2 and HIF-1α stabilization. However, only the latter significantly improved survival and it remains unclear and uncommented why.

      3) Next, the authors address whether pre-conditioning of human CART-cells to induce HIF-1α either by pharmacological stabilization or by silencing of VHL shows similar effects. Surprisingly, the two ways of HIF-1α stabilization resulted in different effects concerning differential gene expression and cytotoxic capacity of CART-cells. Accordingly, pharmacologically pre-conditioned CART-cells did not have a significant impact on survival in an in vivo model, while the VHL-silenced ones did significantly improve animal survival. This discrepancy between the two modes of HIF-1α stabilization remains uncommented. Unfortunately, it also remains unclear why the pharmacological HIF-1α stabilization significantly improved the survival in animals of the B16-OVA model and not in the human CART-cell model.

      4) After this, the researchers determine how the timing of hypoxic conditioning affects the (CAR)T-cells. Here it is convincingly shown that already a short period of hypoxic conditioning (1 day) with a subsequent expansion phase (additional 6 days) is sufficient to induce HIF-1α-mediated alterations (e.g. metabolic changes, calcium flux, intracellular signaling). Although this section is coherent in itself, the switch between different times of hypoxic conditioning, expansion, and analysis is difficult to follow and might lead to confusion. The expression pattern of e.g. HIF-1α on day 1 and day 7 together with the nuclear amounts of NFAT and c-Myc might be misunderstood, like the other presented data as well.

      5) Last, short-term hypoxic conditioning of CART cells is tested in a solid tumour mouse model. The previously identified conditioning protocol also increases CART-cell function against solid tumours (as shown by enhanced cytotoxicity, reduced tumour burden, and prolonged survival). Unfortunately, although both HER2-CART-cells and CD19-CART-cells are shown to have superior cytotoxicity in vitro after the pre-conditioning, only HER2-CART-cells are demonstrated to be superior upon low O2 conditioning in an in vivo adoptive transfer mouse model and CD19-CART-cells remain an open question.

      Generally speaking, the limitations of the manuscript are:

      1) The discrepancies between the effects of the different modes of HIF-1α stabilization, which are certainly caused by the complex nature of the HIF-1α regulatory network; and

      We now extend our observations and discuss these concerns more extensively in the manuscript.

      2) The restriction of the detected effects primarily to CD8+ T cells, while CART-cell products are usually a mixture of CD4+ and CD8+ cells.

      Figure S6H now shows that the effects of shorter periods of low oxygen conditioning obtained with CAR-T cells generated from isolated CD8+ T cells are reproducible in CAR-T cells generated from PBMCs. We have found that a 24h incubation of PBMC-derived CAR-T cells in 1% O2 increases cytotoxicity against target cells and effector differentiation at day 7, when compared to cells cultured at 21% oxygen.

      Reviewer #3 (Public Review):

      In this study, Cunha et al. examined the role of different oxygen tensions (21%, 5%, and 1% O2) and HIF-1α stabilisation in regulating murine and human CD8+ T cell proliferation and function. The authors find that hypoxia (1% O2) and pharmacological PHD inhibition with FG-4592, enhance murine T cell activation but impair proliferation. Furthermore, adoptive cell transfer (ACT) therapy of CD8+ T cells from both conditions reduced tumour burden in a B16-OVA melanoma model. Short hypoxic conditioning (1% O2) of human CD8+ T cells for 1 day increased HIF-1α stabilisation, with increased activation, glycolysis, and mitochondrial function still observed following 6 days of normoxic cell culture. Short hypoxic conditioning of HER2 and CD19 CAR-T cells improved their activation and cytotoxicity in vitro, while HER2 CAR-T cell counts were increased in vivo, reducing tumour burden, and increasing survival when compared to 21% O2.

      Strengths:

      The paper convincingly demonstrates that short hypoxic conditioning in a defined window improves CAR-T cell function through in vitro cytotoxicity assays and following adoptive transfer in a preclinical HER2+-SKOV3+ positive tumour model. Thus, the major conclusion of the paper is mostly well supported by the data and could represent a novel strategy to improve CAR-T cell immunotherapy for solid tumours in the future.

      Weaknesses:

      The extent to which hypoxic conditioning-mediated improvement in CAR-T cell function is dependent on HIF-1-driven metabolic reprogramming is unclear and other potential mechanisms are not explored. FG-4592 and VHL silencing in HER2 CAR-T cells did not phenocopy each other faithfully. In addition, neither approach was as effective as short hypoxic conditioning with 1% O2 in improving CAR-T cell function in vitro or in vivo. Although the authors suggest the temporal dynamics of HIF-1α stabilisation is the key point, this is not convincingly proven, and no metabolic characterisation of these CAR-T cells was performed.

      The revised manuscript now includes live metabolic analyses in a Seahorse setup, using T cells following FG-4592 treatment or VHL silencing. We found that exposure of human CD8+ T cells to FG-4592 suppresses their oxygen consumption rates, at both basal and maximal levels. This may underpin the observed reduced expression of effector molecules (PMID: 33398183). Treatment of human T cells with FG-4592 resulted in a dose-dependent reduction of in vitro cytotoxicity, similar to that observed with exposure to low oxygen (e.g., 7 day OT-I expansion in 1% O2 impairs antitumour function [Figure supplement 6L]).

      Regarding VHL silencing, we did not observe metabolic differences compared to controls. This might arise from the fact that shVHL vectors only caused an overall 30% reduction in VHL protein expression, and that the silencing occurred after T cells had been activated. As we show, the moment of activation is key for T cell differentiation and function, and this could explain the lack of metabolic differences between shNCT and shVHL-expressing cells. These points are now added to the 5th paragraph of the Discussion section.

      It is unclear how changes elicited during short hypoxic conditioning are maintained following continued normoxic cell culture. Hypoxia is known to rapidly regulate histone methylation and chromatin structure in a HIF-independent manner (PMID: 30872525; PMID: 30872526). Are similar epigenetic changes observed in T cells, and if so, could these epigenetic changes underlie improved T cell activation?

      We thank the reviewer for the insightful comment on potential epigenetic changes observed in T cells cultured in hypoxia. We have now carried out an extensive analysis of histone methylation and acetylation (Figure 4H). Human CD8+ T cells cultured for 1 day in 1% O2 and 6 days in 21% O2 showed decreased acetylation of H3K9 and H3K27, reduced trimethylation of H3K4 and H3K27, and increased dimethylation of H3K9 (H3K9me2), compared to cells continuously grown in ambient oxygen. These differences might underpin the altered differentiation and metabolic shifts of 1% cultured T cells and further indicate that the oxygen tensions during the first 24 hours of activation elicit lasting alterations in T cells. Future work will be dedicated to understanding the link between the observed alterations in histone post-translational modifications and T cell function in response to hypoxia.

      Complications may also arise when comparing different oxygen tensions, given recent data (https://www.biorxiv.org/content/10.1101/2022.11.29.516437v1) suggesting that standard cell culture conditions can lead to local hypoxia through a combination of cellular respiration and poor O2 diffusion. Although it is unclear how this will impact suspension T cells, it does raise the question of whether HIF-1α stability following T cell activation is (at least in part) mediated by pericellular O2 limitations in cell culture over time, even in presumed hyperoxic (21% O2) conditions, or whether T cells subsequently cultured at 21% O2 following short hypoxic conditioning (1% O2) still experience local hypoxia during the 6-day culturing protocol. It would be important to assess this in future work and at least discuss these potential weaknesses.

      Upon analysing HIF-1α accumulation on day 7, we only found substantial HIF levels in cells that had been in low oxygen tensions for the last 3 days of culture (Figure S4G). This suggests that cells were not experiencing hypoxia at the time of analysis on day 7, given that we did not observe substantial HIF accumulation. We have additionally designed an experiment where 21% and 1% 1 day T cells were cultured for 7 days with a single media change on day 4 (standard) or with 5 media changes (each media change performed on separate days to minimize local hypoxia in ambient oxygen). Regardless of the number of media changes, 1% 1d cultures showed increased effector differentiation and expression of effector molecules, relative to 21% cells (Figure S4H). We also did not observe any differences between control cells cultured with 1 or 5 media changes. As hypoxia elicits changes in T cell differentiation, this suggests cells do not experience local hypoxia during the phase of ambient oxygen expansion. Nevertheless, we very much agree that it is important to accurately assess oxygen concentrations in cell culture media.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors provide evidence that chromatin in Drosophila muscle cells is localized at the nuclear periphery, with the central region depleted of chromatin, and is organised such that RNA polymerase II (RNAp) surrounds dense regions of chromatin. The authors theoretically study the formation of these regions by describing chromatin as a multi-block copolymer, where the blocks correspond to active and inactive chromatin regions. These regions are assumed to phase separate and to have different solubility. The solubility of the active region is regulated by binding RNAp. The authors study the core-shell organization in a layered geometry by analyzing the various contributions to the free energy. In this way, they obtain in particular the dependence of the shell-layer thickness, which is described as a polymer brush. From these results, they infer chromatin organization in spherical core-shell chromatin domains and compare these results to Brownian dynamics simulations.

      The work is well done and even though it uses standard methods for studying block copolymers and polymer brushes obtains interesting information about local chromatin organization. These findings should be of great interest to researchers in the field of chromatin organization and in general to everybody interested in understanding the physical principles of biological organization.

      The work has two main weaknesses: The experimental evidence for RNAp and chromatin microorganization is weak as only one example is shown. It remains unclear whether the observed organization pattern is common or not. Also, no data is shown concerning the dependence of the extensions of the active and inactive phases on parameters, for example, solvent properties or transcriptional activity. Second, some parts could prove difficult for biologists to assess. For example, the expression for the brush-free energy should be explained in more detail and notions like that of 'mushrooms' need to be introduced. As a second example, biologists might benefit from a better explanation of the concept of a theta solvent and its relevance.

      We thank Reviewer #1 for the positive review and critical feedback. Below we answer the points raised in the last paragraph of the review.

      In the original version of the manuscript we only showed a representative image of nuclei of muscle cells in an intact, live Drosophila larva. Notably, this organization is representative of many nuclei analyzed in muscle tissue. In the revised version we show that in a distinct tissue, e.g. salivary gland epithelium of live Drosophila larvae, RNA Pol II distribution is similarly facing the nucleoplasm, although chromatin condensation differs due to higher DNA ploidy. The new images were added as Supplementary Information (Fig A1). Since these representative images are the main motivation behind our theoretical analysis, we think that including them will help the reader in understanding the relevance of our minimal model. Determining the effect of different biological perturbations, such as changes in the repressive marks, and how these change the core-shell structure requires extensive experiments that are outside the scope of the present paper. We also note that in live organisms (not just live cells) such as those studied here, one can only reliably use genetic perturbations; solvent quality is regulated by the organism and cannot be controlled as in synthetic polymer experiments. Our main focus in the present paper is to highlight an area that has been relatively unexplored by the chromatin organization community: how changes in the concentrations of chromatin binding partners may have a strong effect on nuclear architecture.

      We have also improved the explanation of the physical concepts for biologists. We added a more thorough explanation of the concept of a polymer brush and explained more clearly the concept of a theta solvent in terms of the scaling properties of a polymer in solution. We quote these revisions below.

      Reviewer #2 (Public Review):

      This work formulates a detailed theoretical polymer physics model intended to explain the observed morphology of chromatin in the Drosophila cell nucleus. The model is examined in detail by both analytical calculation and computer simulation. The central premise of the suggested theory is that it is again based on equilibrium statistical mechanics. Within this paradigm, the authors explore a model that views the chromatin fiber as a block copolymer and, most importantly, describes the role of RNA polymerase as it interacts with one of the copolymer blocks and regulates its effective solvent quality. Blocks are assumed to be fixed on the time scale of interest by, e.g., different levels of acetylation or methylation. RNA polymerase is supposed to interact only with one of the chromatin blocks, called active, and the assumed interaction is quite peculiar. Namely, the RNA polymerase complex may adsorb on the chromatin fiber and, the model assumes, the fiber decorated with adsorbed RNA polymerase molecules is less sticky to itself, or more repulsive, than the bare fiber. This peculiar assumption allows the authors to make interesting predictions about how proteins can regulate the genome folding architecture.

      We thank the reviewer for the positive and critical feedback. We agree that our assumption of changes in the effective solvent stemming from protein complexes binding to chromatin is at the core of our analysis and we justify it further below.

      STRENGTH

      The work includes a rather detailed theoretical description of the model and its equilibrium statistical mechanics. As both analytical theory and accompanying simulation indicate, the assumptions put forward in formulating the model do indeed produce the desired morphology, with isolated regions ("micelles") of core inactive chromatin surrounded by the less dense shell region in which RNA polymerization may potentially take place. Having such a detailed theory is potentially beneficial for the field and opens up avenues for further exploration.

      We thank the referee for appreciating the potential benefit of our minimal theory of solvent-quality regulation by binding processes.

      WEAKNESS

      The underlying assumption about the interaction of RNA polymerase complex with the fiber, although important and organic for the model, does not seem easy to justify from a molecular standpoint, especially thinking of the charges and electrostatic interactions.

      We envisage that the binding of RNA Pol II (mediated by different transcription factors) to chromatin is also associated with larger protein complexes that may contain hydrophobic and hydrophilic components, such as pre-initiation complexes. Some regions of these complexes might associate directly with chromatin due to positive charges on the surface of the Pol II complex, whereas the hydrophilic negative regions may be directed towards the solvent. Our theory is typical of the approach used in polymer physics, where coarse-grained interactions are considered. While the origin of hydrophilic interactions lies in electrostatics, such interactions are highly screened in cells (typically 200 mM concentration of salts) and can be considered as short-ranged and competitive with hydrophobic interactions. Chromatin in solution is known to condense (see Gibson et al., Cell 2019 and Strickfaden et al., Cell 2020) and even phase separate from the nucleoplasm (see Amiad-Pavlov et al., Science Advances, 2021); this can arise either from hydrophobic interactions of the histone tails or from opposite charge attraction of the histones and linker DNA. In our model, this competes with the binding of protein complexes, which then disrupt the self-attraction of chromatin. Previous work has shown that RNA Pol II associating with chromatin (in the absence of transcription) prevents the coarsening of dense chromatin domains (see Hilbert et al., Nat. Comm. 2021), which agrees with our modeling of protein complexes that bind to chromatin and interfere with its condensation; in addition, the binding of Pol II and all its binding partners that form the pre-initiation complex (see Hahn, Nat. Struct. & Mol. Biol. 2004, 11) will result in effective steric repulsion between different active, Pol II-bound chromatin domains.

      Another interesting observation is that most of the surface of RNA Polymerase II is negatively charged, with a few positively charged patches, some of which interact specifically with DNA while others serve as exit paths for RNA (see Cramer et al., Science, 2001). We agree that a more thorough analysis of the molecular interactions between what we name protein complexes and chromatin is interesting, but it is outside the scope of our paper, which uses a coarse-grained, polymer physics approach. This approach also allows our model to be predictive as to the physical organization and growth of the domains, independent of those molecular details that are as yet unknown.
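
      As a rough illustration of why electrostatic interactions can be treated as short-ranged at the salt concentration quoted above, the Debye screening length can be estimated. This is only a sketch under assumptions that are not part of the response itself: a monovalent (1:1) electrolyte, water at about 25 °C with relative permittivity near 78, and an ionic strength I taken equal to the quoted 200 mM.

      ```latex
      % Debye screening length for a 1:1 electrolyte.
      % In the first expression I is the ionic strength in mol/m^3;
      % the numerical shortcut assumes water at ~25 C and I expressed in mol/L.
      \lambda_D = \sqrt{\frac{\varepsilon_r \varepsilon_0 k_B T}{2 N_A e^2 I}}
      \;\approx\; \frac{0.304\ \mathrm{nm}}{\sqrt{I / (1\ \mathrm{M})}}
      \quad\Longrightarrow\quad
      \lambda_D(I = 0.2\ \mathrm{M}) \approx \frac{0.304\ \mathrm{nm}}{0.447} \approx 0.7\ \mathrm{nm}
      ```

      Under these assumptions the screening length is below 1 nm, much smaller than the roughly 10 nm diameter of a nucleosome, which is consistent with treating the electrostatic contribution as a short-ranged, coarse-grained interaction.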

      Reviewer #3 (Public Review):

      This theoretical study provides an explanation for a puzzling question arising from recent experiments: How can chromosomes behave like polymers collapsed in a poor solvent but also contain "open" active chromatin sections? The authors propose that the binding of proteins (e.g. RNAPs) to the active sections can effectively change the solvent quality for these sections and thus open them. They suggest further that chromosomes show micellar structures with inactive blocks forming the cores of the micelles. Protein binding causes swelling of the micellar shells, which affects the whole chromosome structure by changing the total number of micelles. This theory fits well with live imaging data of chromatin in Drosophila larvae, like the one shown in the striking Figure 1.

      The manuscript is written very clearly.

      My only suggestion is that the authors, in both the theory and simulation parts, are more explicit about how the interactions between the various components are modeled. From what I could see, in the theory part, one needs to look closely at Eq. 5 to understand how the influence of the binding of proteins affects the interaction between active monomers, and in the simulation part, one needs to go to the appendix to learn that interaction strengths between monomers within the active blocks and monomers within the inactive blocks have different values. The latter is crucial to understand the micellar structure shown at the top of Fig. 5A.

      We thank the reviewer for their positive response. We have explained Eq. 5 more carefully now and included other explanatory remarks throughout the text. We also explained more clearly the interactions considered in the simulations. Below we answer point by point and add quotes from the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Marchal-Duval et al studied the role of Prrx1 in lung fibroblasts. Prrx1 is a transcription factor expressed in lung fibroblasts but not in other cell types. The authors showed that Prrx1 gene expression was enhanced in IPF patients. Immunohistochemistry in IPF tissue suggested that Prrx1 was expressed in fibroblasts in fibroblastic foci. The authors then showed that Prrx1 expression was regulated by TGF-b1 stimulation or stiffness of substrate by in vitro experiments using primary human lung fibroblasts from either normal or IPF lungs. The authors also showed that Prrx1 regulated fibroblast proliferation and TGF-b signaling by regulating PPM1A and Tgfbr2 expression. Finally, the authors revealed that Prrx1 knockdown suppressed fibrosis in bleomycin-induced fibrosis or PCLS. This manuscript identified novel molecular roles of Prrx1 in fibroblast activation, which is expressed in not only lung fibroblasts but also in other injured or developing organs. To support the idea that Prrx1 plays a critical role in lung fibrosis, however, some discrepancies between in vitro and in vivo data need to be clarified.

      Comment #1. Although the authors showed that Prrx1 knockdown in primary fibroblasts reduced Smad2/3 phosphorylation, the reduction of Acta2 or Col1a1 after Prrx1 knockdown and TGF-b1 stimulation was not impressive (Fig. S6), suggesting that the inhibition of TGF-b signaling by Prrx1 knockdown is only partial. In contrast, Prrx1 knockdown by ASO in bleomycin-induced fibrosis showed remarkable fibrosis suppression (Fig. 6, 7). Admittedly there are differences in models and nucleotides used, but this discrepancy needs to be addressed.

      We agree with the reviewer that Prrx1 inhibition only partially affects the upregulation of ACTA2, but this effect was significant (around 50% inhibition at the protein level). As stated in the discussion (lines 569-572), our data show that key ECM proteins such as Collagen 1 and Fibronectin were still upregulated in TGF-β1-stimulated lung fibroblasts transfected with PRRX1 siRNA, whereas TNC and ELN mRNA expression levels were perturbed. These findings suggest that broader phenotypical changes are associated with Prrx1 knockdown. Notably, we also observed that Prrx1 inhibition impacted cell proliferation in vitro. We believe that the observed suppression of fibrosis in bleomycin-treated mice following Prrx1 knockdown by ASO is the result of both the partial inhibition of the TGF-β1 effect and the decrease in mesenchymal cell proliferation. Supporting this hypothesis, we observed a decrease in PDGFR-positive cell proliferation in Prrx1 ASO-treated animals (see comment #4 hereafter).

      Comment #2. Fig.6 and 7 lack control groups, where mice are treated with PBS instead of bleomycin and treated with either control ASO or Prrx1 ASO.

      As stated in the revised Materials and Methods (lines 683-686), the knockdown efficiency of the Prrx1 ASO and the lack of effect of the control ASO were first validated in naive mice treated with either Prrx1 ASO or control ASO and compared with PBS-treated mice (see Figure R2 in the answer to comment #11 of reviewer 2). Those groups were not repeated or included in the first set of bleomycin experiments, in order to comply with institutional regulations limiting animal usage. In the first set of experiments (Prrx1 ASO treatment between day 7 and day 13 after bleomycin insult), the saline + PBS group was used only to confirm fibrosis development, while the bleomycin + control ASO group was the proper control for the bleomycin + Prrx1 ASO group. In the new, second set of experiments (ASO treatment between day 21 and day 28, as suggested by reviewer #2), we were authorized by our local animal ethics committee to include a control ASO group among the saline-treated animals to confirm the lack of effect of the control ASO compared with the PBS group (see new Figure 7-figure supplement 1).

      Comment #3. In Fig. 6F, the hydroxyproline content is shown with ug collagen/ug protein. Total protein in the lung is influenced by infiltration of hematopoietic cells, which are the major population in injured lungs by cell count. Fibrosis should be ideally assessed as ug hydroxyproline/lung (or lobe).

      We completely agree with the reviewer that hydroxyproline content should ideally be assessed per lobe or lung. As stated in the revised Materials and Methods (lines 882-885), hydroxyproline and protein contents were measured on paraffin lung sections (15 sections of 10 µm per sample) with the Quickzyme Biosciences hydroxyproline and total protein assay kits, because of limited access to material and in order to limit animal usage. Furthermore, the infiltration of hematopoietic cells would, if anything, lead to an underestimate of the effect of Prrx1 ASO (less fibrosis and inflammation), since the contribution of those cells would be higher in control ASO-treated bleomycin mice. Considering the reviewer’s concern, a complete lobe was used to measure hydroxyproline content in the new set of experiments generated during the revision of the manuscript (see new Figure 7-figure supplement 1).

      Comment #4. Major proliferating populations in bleomycin-treated lungs are not mesenchymal cells but epithelial/endothelial/hematopoietic cells. Mki67+ cells (Fig. 7D) need to be identified by co-staining with mesenchymal markers if the authors claim that Prrx1 knockdown suppresses fibroblast proliferation in vivo.

      We agree with the reviewer that epithelial/endothelial/hematopoietic cells are the main proliferating populations in bleomycin-treated animals at day 14. As suggested by the reviewer, we performed MKI67 / PDGFR co-staining to identify proliferating mesenchymal cells and confirmed a decrease in proliferation of these cells after Prrx1 knockdown in bleomycin-treated mice (see lines 448-451 and Figure 6-figure supplement 3).

      Comment #5 Bleomycin-injured lungs or IPF tissue are patchy and mixed with normal and abnormal areas. Therefore, how areas of interest are chosen for histological quantifications (Fig. 6C, S14D) need to be described in the methods section.

      As now stated in the revised Materials and Methods section (lines 864-866), areas of interest were chosen according to the presence of major alveolar thickening as well as fibrous changes and masses (confirmed by picrosirius staining on serial sections).

      Reviewer #2 (Public Review):

      The paper from Marchal-Duval et al reports for the first time the important role played by the transcription factor PRRX1, expressed specifically in the mesenchyme of the lung, in the context of fibrosis. The authors used a combination of human (Donor and IPF) and mouse lungs (saline and bleomycin treated) as well as associated fibroblasts and PCLS to test the functional role of PRRX1 in the context of proliferation and differentiation induced by TGFb1. The work is supported by an impressive amount of data (7 main figures and 14 supplementary figures).

      Comment #1: A main weakness in this work is the counterintuitive result that PRRX1 is downregulated in human lung fibroblasts (from both IPF and Donor) treated with TGFb1.

      We agree with the reviewer that PRRX1 downregulation upon TGFb1 treatment may appear counterintuitive. First, as stated in the manuscript, this inhibitory effect is partial. Second, as suggested by the reviewer, we performed additional experiments for the revised manuscript to better understand the timing of PRRX1 downregulation in response to TGF-b1 in lung fibroblasts. Time course analysis of PRRX1 isoform expression levels showed that PRRX1 was downregulated only after 48h. This late downregulation of PRRX1 in response to TGF-b1 could be the signature of a negative feedback loop that limits cell responsiveness to TGF-b1 once lung fibroblasts are fully differentiated into myofibroblasts at 48h, as discussed in the revised manuscript (see lines 175-180 and lines 589-594).

      Comment #2: Another smaller weakness is the inactivation of Prrx1 in vivo using ASO starting at d7 post bleomycin treatment.

      In our study of Prrx1 inhibition in vivo, we followed a therapeutic/interventional protocol consistent with current literature on the bleomycin model of lung fibrosis (Moeller A. et al., Int J Biochem Cell Biol 2008 and Kolb M. et al., Eur Resp J. 2020), treating the animals with either control or Prrx1 ASO every other day between day 7 and day 14, during the active fibrotic phase. In the revised manuscript, we extended our investigation to assess the potential effect of Prrx1 inhibition during the late fibrosis phase, analyzed at day 28 after bleomycin treatment, by treating the animals with either control or Prrx1 ASO every other day between day 21 and day 27. Interestingly, we found that the effects of Prrx1 inhibition during the late fibrosis phase were weaker than during the active fibrotic phase, though still present (see Figure 7-figure supplement 1).

    1. Author Response

      Reviewer #2 (Public Review):

      We thank the reviewer for their assessment that our work “supports the idea that epithelial-endothelial crosstalk is important for lung regeneration and proposes a potential candidate for this process” and their helpful suggestions for strengthening and clarifying our work.

      1) The scRNA-seq analysis is performed in two separate objects ("control lung" and "H1N1 infected lung 14dpi"). For these two sets of data to be comparable, the authors should have integrated the objects and analyzed them together. This is not only important for deciding the clusters' identities and making sure that the same clusters are compared between control and infected, but also necessary to compare gene expression.

      We have integrated the control and H1N1-infected scRNA-seq datasets and reanalyzed the integrated data. We then analyzed CAP1_A and CAP1_B populations, comparing their gene expression between control and influenza conditions. Unbiased clustering of the integrated dataset reveals the same clusters we identified in the individual datasets, with cells from control and flu contributing to each cluster (with the exception of proliferating endothelial cells, which are found only in the H1N1-infected lung). We have added a supplemental figure outlining these data (Figure 1 – Figure Supplement 3).

      2) ATF3 is not only present in Cap1_B; in the infected lung, Cap1_A seems to express less ATF3. The authors should comment on this difference.

      We have added violin plots to Figure 1, which we feel will better represent the greater Atf3 expression in CAP1_Bs relative to other endothelial cell subtypes. The reviewer is correct that Atf3-expressing cells are found in large vessels, but they are also numerous in the alveolar capillary space and increase with influenza in these regions. We have added lower-magnification, higher-resolution images of Atf3CreER; ROSA26tdTomato animals, both control and influenza-infected, to illustrate this expansion in a new Figure 2 – Figure Supplement 3. This increase is also quantified in Figure 2C. We have also clarified this in the text.

      3) It is unclear how the clusters Cap1_A and Cap1_B were decided. The manuscript would benefit from clarification.

      We have added text to the Materials and Methods section to clarify this.

      4) It would be beneficial to see via immunofluorescence the morphological and spatial differences between ATF3-expressing and non-expressing endothelial cells since this transcription factor is expressed in multiple endothelial cell types.

      We have added lower-magnification, higher-resolution images of Atf3CreER; ROSA26tdTomato animals, both control and influenza-infected, to illustrate the spatial distribution of Atf3-expressing endothelial cells. This data is now shown in the new Figure 2 – Figure Supplement 3. We have also added further data to the new Figure 5 – Figure Supplement 1 to include the cytoplasmic endothelial marker Endomucin-1 (EndoM1) in an analysis of the spatial distribution of endothelial cells in wild-type and Atf3-knockout animals at 21 dpi.

      5) The authors mention ATF3 is not endothelial-specific. Expression of ATF3 in other cell types should be evaluated via immunofluorescence.

      This data is present in Figure 2 – Figure Supplement 2.

      6) The authors should have shown evidence of the deletion in their Atf3EC-KO mouse and addressed whether they had residual ATF3. If there is no antibody available, RNAscope could be used, or Western Blot or RT-PCR on sorted endothelial cells.

      We agree that this is an important quantification to make. We have performed qRT-PCR for Atf3 in both the animals used to perform the RNA sequencing experiment as well as a new cohort of animals to confirm Atf3 deletion. We have added these results to a new supplemental figure accompanying Figure 4 (Figure 4 – Figure Supplement 1).

      7) The authors only show the epithelium as evidence that the alveolar region is altered in their mutant after infection. The endothelium should have also been investigated, especially since their mutant is an endothelial-specific deletion. Within this, the different endothelial cells should have been assessed by a method other than RNAscope such as immunofluorescence, given that this method is unable to show morphology and there are antibodies available.

      This data is present in Figure 5. We have also added additional data to the new Figure 5 – Figure Supplement 1 to extend our analysis to 21 dpi and to incorporate a cytoplasmic marker of endothelial cells, Endomucin (EndoM1).

      8) Bulk RNA-seq from endothelial cells is used in the manuscript. However, because ATF3 is not specific to Cap1_B cells or even capillaries alone, the downstream gene expression analysis of bulk RNA should be placed into the context of lung endothelial heterogeneity.

      We have added qRT-PCR analysis of several downstream genes to address the comments of Reviewer #3, point #3. To place this into the context of endothelial heterogeneity, we have added dot plots to show the expression of selected genes from the RNA-seq analysis in each endothelial subtype from the H1N1 scRNA-seq dataset. These data can be found in the new Figure 4 – Figure Supplement 1. However, because of the relatively low sequencing depth of scRNA-seq compared to bulk RNA-seq, many of the transcripts examined were only present in a small percentage of endothelial cells in the scRNA-seq dataset, so the differences seen are more striking in the RNA-seq data.

      9) Although the authors mentioned that the infection with H1N1 influenza can have regional differences, they do not show how they picked regions for their analysis and quantification, and whether ATF3 upregulation was found in more severely affected regions. Furthermore, since they quantified via FACS, this heterogeneity in the infection itself could have affected their observations.

      We agree that it is essential both to define the extent of H1N1-mediated inflammation in Atf3 wild-type and knockout mice and to compare this factor between genotypes. We have therefore used a previously published method for quantifying regions of severe, damaged, and normal tissue structure (Liberti et al., Cell Reports 2021) in both Atf3 wild-type and knockout animals. Our results show that Atf3 wild-type and knockout mice have similar levels of tissue damage, and we have added a supplemental figure demonstrating these data (new Figure 3 – Figure Supplement 2). We have also clarified how regions were selected for quantification of alveolar area.

      H1N1 influenza injury in mice is heterogeneous, with regions of severe alveolar destruction marked by densely packed immune cells, adjacent regions of damaged tissue, and regions of tissue that appear to have normal tissue structure, as we and others have previously described (Zacharias, Frank et al., Nature 2018; Liberti et al., Cell Reports 2021; Niethamer et al., eLife 2020). However, it has become increasingly apparent that these regions where tissue structure appears normal are actually regions of active regeneration, and endothelial cell proliferation is increased in these regions (Niethamer et al., eLife 2020). We therefore selected 20X fields in these areas to use for quantifying alveolar area, as these are actively regenerating regions where alveolar structures are present for quantification. Because of the changes to tissue structure seen in damaged or destroyed tissue areas, we did not select these regions for quantification, although they were present at similar frequency in Atf3 wild-type and knockout animals.

    1. Author Response

      Joint Public Review:

      These RNAs come from a screen which is not well described and the descriptions of the sequence analyses are unclear, so it is difficult to know exactly what they are analyzing in the manuscript.

      We apologize for not including the required details in the manuscript. The cell cycle lncRNA screen where we identified the initial SNUL-1 probe was published in an earlier paper 6. By performing RNA-seq in cell cycle synchronized samples, we identified several hundred lncRNAs that were differentially expressed in a particular stage of the cell cycle. We performed a large-scale RNA-FISH-based screen to characterize the localization of these cell cycle-regulated lncRNAs. One of the probes in this screen hybridized to SNUL-1 RNA in the nucleolus. The original double-stranded DNA probe that detected the SNUL-1 RNA cloud(s) was mapped to the hg38-Chr17: 39549507-39550130 genomic region, encoding a lncRNA. However, other unique non-overlapping probes generated from the Chr17-encoded lncRNA failed to detect the SNUL-1 RNA cloud. Furthermore, BLAST-based analyses failed to align the SNUL-1 hybridized sequence to any other genomic loci. Since a large proportion of the p-arms of nucleolus-associated NOR-containing acrocentric chromosomes is not yet annotated, we speculated that SNUL-1 could be transcribed from an unannotated genomic region of the acrocentric p-arms.

      We have now provided the information in the revised manuscript. Specifically, we have provided the details of the PacBio iso-seq, nanopore seq analyses as well as the bioinformatic approaches that were conducted to determine the identity of the full-length SNUL-1 ncRNA.

      If these are RNAs with reasonable abundance, then they should be findable without the extensive PCR amplification they appear to have done for the PacBio sequencing (the methods section is not clear on exactly how many rounds of PCR were performed).

      We apologize for not providing the essential details. In the PacBio Iso-seq analyses, we utilized the standard protocol (recommended by the scientists from PacBio, who are authors on the manuscript), which included 13 PCR cycles. However, as described in the manuscript, in parallel to PacBio sequencing, we also performed nanopore sequencing of the nucleolus-enriched RNA without any amplification. The SNUL-1 full-length candidate sequence (CS) that we describe in the manuscript is the ncRNA that showed 100% sequence similarity in both the independent PacBio Iso-seq and the nanopore-seq analyses. We argue that if the SNUL-1 candidate transcripts were an artifact of PCR amplification in the PacBio sequencing, we would not have obtained the full-length sequence with a 100% match in the nanopore-seq reads. We have now included the detailed bioinformatic analyses in the methods section of the manuscript.

      Moreover, given the acknowledged sequence similarities of the SNULs with other RNAs, the possibility of chimaera formation during PCR amplification is high. They are clearly detecting RNAs associated with nucleoli but exactly what they are examining is unclear.

      Please see our response above (public Reviewer comment 2). In addition, we performed detailed bioinformatic analyses to rule out the possibility that the SNUL-1 full-length sequence obtained by PacBio sequencing is an artifact of PCR amplification. This analysis is described in detail in the methods section under the sub-title “sequencing analyses”.

      It is possible that a clear determination of the genomic origin of these RNAs will be complicated by the repetitive sequences in the regions of the genome where they reside.

      We thank the reviewer for acknowledging the technical difficulty of mapping the genomic locus of the SNUL-1 genes. We have pointed this out as a limitation of the present manuscript. Mapping the SNUL-1 genomic locus and characterizing the regulatory sequence elements and factors that control the monoallelic expression of SNULs will be part of future research plans.

      Note also that the idea of monoallelic expression from rRNA-encoding loci is interesting, but was established in 2009. Title: Allelic inactivation of rDNA loci. Genes Dev. 2009 Oct 15;23(20):2437-47. doi: 10.1101/gad.544509.

      We thank the reviewer for pointing out the study from the Cedar lab published in 2009. To test the idea that SNULs contribute to the allele-specific expression of rRNA previously reported by the Cedar lab in their 2009 G&D paper, we performed the same set of experiments described in their paper in three different cell lines in the presence or absence of SNULs (please see the response to Editorial comment-2). However, we could not reproduce any of the data presented in the G&D manuscript. Also, we have not seen any follow-up study in which mono-allelic expression of rDNA genes was observed. Currently, no concrete data supports monoallelic expression of rRNA 5. We therefore argue that our current study is the first to demonstrate the mono-allelic association of a ncRNA from the p-arm containing the rDNA cluster.

    1. Author Response

      Reviewer #1 (Public Review):

      The shift from outcrossing to selfing is one of the most prevalent evolutionary events in flowering plants. The ecological and genetic backgrounds of these transitions have been of major interest for decades, and one of the key questions was the dating of this transition. Timing of pseudogenization of the self-incompatibility (SI) genes has been used as a proxy for this transition because loss-of-function mutations of SI genes are often responsible for the evolution of predominant selfing. However, SI genes are identified only in a limited number of taxa, and in some cases, the evolution of selfing is not necessarily associated with loss of SI. Therefore, an independent time estimate of the evolution of selfing by genome-wide polymorphism data has been considered important in this field.

      This study provides two statistical methods: SMC-based and ABC-based methods. Both methods intend to detect the genome-wide signatures of the outcrossing-to-selfing transition that alters the ratio of population recombination rate and mutation rate. Authors validated these methods by using the simulated data, confirming that both methods can generally infer the timing of the outcrossing-to-selfing transition jointly with population size changes, although its precision depends on several population history settings.

      This study would be an important contribution to the field of mating system evolution. By applying the proposed methods to many other selfing organisms, we may be able to see a general picture of the timescale of the outcrossing-to-selfing transition combined with population size dynamics. At the same time, this is one of the extensions of the SMC method, which has already been well utilized for various inferences, including population size and recombination rate heterogeneity.

      We thank the reviewer for his positive comments and acknowledging the novelty and relevance of our study for the field.

      I do not find a major weakness in the methodologies of this study, but I have a few comments on their applications to the data of Arabidopsis thaliana. It is important that these estimates largely depend on what input data is used, especially the mutation rate and recombination rate. While the authors claim that their estimate is older than Bechsgaard's estimate (<413 kyrs), these two studies used different mutation rates: the authors used Ossowski's mutation rate, and Bechsgaard used Koch's mutation rate (Koch et al. MBE 2010). To compare these two estimates, it is important to use the same mutation rate. Shimizu & Tsuchimatsu (2015; Ann Rev Eco Evo Syst) in detail discussed this point and showed that Bechsgaard's estimate becomes <1.48 myrs when Ossowski's mutation rate was used (see Figure 4). Then it happens to overlap with the estimate of this study.

      Thank you very much for identifying this important problem. It is indeed critical to re-scale Bechsgaard’s age of the transition using the same mutation rate as used in our analysis (Ossowski et al. 2010). We now use the rescaled estimate published in your review (Shimizu and Tsuchimatsu 2015, Figure 4). We note that Bechsgaard et al. did not publish a measure of uncertainty around their estimate of the transition, making it difficult to compare it with our posterior distributions. However, Bechsgaard’s estimate is not contained within the credibility intervals of our posteriors for t_sigma, and we therefore consider the two results significantly different. We have modified the text accordingly (p. 4 l. 8-10 and p. 12 l. 27 to p. 13 l. 5).
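
      The rescaling discussed here can be illustrated with a minimal sketch. It assumes that a molecular-clock age estimate is inversely proportional to the assumed mutation rate, so that only the ratio of the two rates matters. The function name is hypothetical, and the only numbers used are the two upper bounds quoted in the reviewer's comment; the mutation rates themselves are not stated in this response and are not filled in.

```python
def rescale_age(age, mu_original, mu_new):
    """Rescale a molecular-clock age estimate to a different mutation rate.

    Assumption: age = divergence / (2 * mu), i.e. the age is inversely
    proportional to the mutation rate, so only mu_original / mu_new matters.
    """
    if age < 0 or mu_original <= 0 or mu_new <= 0:
        raise ValueError("age must be non-negative and rates must be positive")
    return age * mu_original / mu_new


# Upper bounds quoted in the reviewer's comment, in years.
bound_original = 413_000      # with the originally used mutation rate
bound_rescaled = 1_480_000    # after rescaling (Shimizu & Tsuchimatsu 2015, Figure 4)

# Ratio of the two mutation rates implied by those bounds under the assumption above.
implied_rate_ratio = bound_rescaled / bound_original
```

      Under this assumption the two published bounds imply that the originally used rate is roughly 3.6 times the rate used for rescaling; any comparison of dates across studies should first bring them onto the same rate in this way.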

      I am also concerned about the genomic regions of Arabidopsis thaliana used for this study. The authors chose five specific regions based on homogeneity of recombination rates and diversity, but how does the estimate change when randomly chosen genomic regions are used? If it is important to choose "preferable" regions according to the homogeneity of recombination rates and diversity, it may be useful to describe how these regions should be chosen for future applications of this method to other organisms.

      The genomic intervals used for the application to A. thaliana are indeed not random. They were defined so as to avoid, on each chromosome, the increased diversity observed at and around pericentromeric regions. This effect has already been described by Clark et al. (2007, Science), but no explanation for this pattern has been published yet. We have updated the text, including a recommendation for future application to other species (p. 13 l. 8-15 and p. 18 l. 25-30, and Figure S15). We have also replicated our analysis of the A. thaliana data using a different set of genomic intervals located outside pericentromeric regions (Figures S15 and S16).

      Reviewer #2 (Public Review):

      This submission seeks to detect changes in the rate of selfing through pairwise comparison of haplotypes sampled from a population. It begins, as did a previous paper by a subset of the authors (Sellinger et al. 2020), with the well-known theoretical finding that partial selfing increases the rate of coalescence and decreases the rate of crossing-over events in genealogical histories.

      I am supportive of pitching this contribution as primarily theoretical, with the very short discussion of the Arabidopsis data provided as a worked example. This perspective increases my enthusiasm, compared to an initial reading. My comments are intended to encourage development.

      Some thematic characteristics reduce the impact of the submission. Among these are:

      (1) a rather less than scholarly perspective on previous literature;

      (2) tendency to avoid theoretical development in favor of computation;

      (3) little interpretation of results of their only analysis of real data.

      We have now revised the manuscript along the lines suggested by reviewer 2. We provide more references when needed, have emphasized in the abstract and in the theoretical part of the manuscript that it is primarily a new theoretical/methodological development with an application to A. thaliana data, and have improved the interpretation of the A. thaliana data (see reply to reviewer 1).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors of this study sought to test whether the optogenetic induction of context-related freezing behavior could be enhanced by synchronizing light pulses to the ongoing hippocampal theta rhythm. Theta is a hippocampus-wide oscillation that strongly modulates almost every cell in this structure, which suggests that causal interventions locked to theta could have a more pronounced impact than open-loop ones. Indeed, the authors found that activating engram-associated dentate gyrus (DG) neurons at the trough of theta resulted in an increase in freezing relative to baseline when averaging across all stimulation epochs. In contrast, open-loop stimulation and peak-locked stimulation had weaker effects. Analysis of local field potentials showed that only the theta-locked stimulation facilitated coupling between theta and mid-gamma, indicating that this manipulation likely enhances the flow of activity from DG to CA1 via CA3 (as opposed to promoting transmission from entorhinal cortex to CA1). Previous results from mice, rats, and humans support the hypothesis that memory encoding and recall occur at distinct phases of theta. This work further strengthens the case for phase-specific segregation of memory-related functions and opens up a path toward more precise clinical interventions that take advantage of intrinsic theta rhythm.

      Strengths:

      This study recognizes that, when artificially reactivating a context-specific memory, the brain's internal context matters. In contrast to previous attempts at optogenetically inducing recall, this work adds an additional layer of precision by synchronizing the light stimulus to the ongoing theta rhythm. This approach is more challenging, because, in addition to viral expression and bilateral optical fibers, it also requires a recording electrode and real-time signal processing. The results indicate that this additional effort is worth it, as it results in a more effective intervention.

      The findings on theta-gamma cross-frequency coupling suggest a possible mechanism underlying the observed behavioral effects: trough stimulation enhances DG to CA1 interactions via CA3. LFP recordings showed that stimulation increases the coupling between theta and mid-gamma (though not in all mice), and the percentage of freezing during reactivation is correlated with the gamma modulation index.
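
      For readers unfamiliar with phase-amplitude coupling measures, the following is a minimal sketch of one commonly used, Kullback-Leibler-based modulation index. It is an assumption that this matches the index used in the manuscript; the response does not state the definition. The function name, the 18-bin default, and the inputs (instantaneous theta phase in radians and gamma amplitude envelope, both already extracted from the LFP) are illustrative choices.

```python
import math


def modulation_index(phases, amplitudes, n_bins=18):
    """KL-divergence-based phase-amplitude coupling index in [0, 1].

    phases: instantaneous phase of the slow rhythm (radians), one per sample.
    amplitudes: non-negative amplitude envelope of the fast rhythm, one per sample.
    Returns 0 when amplitude is uniform across phase bins and 1 when all
    amplitude falls in a single phase bin.
    """
    if len(phases) != len(amplitudes):
        raise ValueError("phases and amplitudes must have the same length")

    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for phase, amp in zip(phases, amplitudes):
        # Wrap the phase onto [0, 2*pi) and convert it to a bin index.
        wrapped = phase % (2 * math.pi)
        idx = min(int(wrapped / (2 * math.pi) * n_bins), n_bins - 1)
        sums[idx] += amp
        counts[idx] += 1

    if 0 in counts:
        raise ValueError("every phase bin needs at least one sample")

    # Mean amplitude per phase bin, normalized to a probability-like distribution.
    means = [s / c for s, c in zip(sums, counts)]
    total = sum(means)
    if total <= 0:
        raise ValueError("amplitudes must not all be zero")
    p = [m / total for m in means]

    # Shannon entropy of the distribution; the index is its normalized
    # distance from the entropy of the uniform distribution, log(n_bins).
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    return (math.log(n_bins) - entropy) / math.log(n_bins)
```

      An index of this kind is computed per animal and condition, which is what allows it to be correlated with the percentage of freezing as described above.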

      Weaknesses:

      Given the precision of the intervention being performed, one might expect to see a stronger behavioral impact. Instead, the overall effect is subtle, and quite variable across mice. Looking at individual data points, the biggest overall increase in freezing actually occurred in 2 mice during the 6 Hz stimulation condition. Furthermore, trough stimulation decreased freezing in 3 mice. This is not a weakness in itself; rather, the weakness lies in the lack of an attempt to make sense of this variability. There are a number of factors that could explain these differences, such as viral expression levels, electrode/fiber placement, and behavior during baseline. There is of course a risk of over-interpreting results from a few mice, but there is also a chance that the results will appear more consistent after accounting for these additional sources of variation.

      Although two mice showed negative light-induced freezing for trough stimulation, the other 15 mice showed a positive result. Stringent inclusion criteria were used to ensure that mice had adequate viral expression levels and behavior during baseline. Mice without at least 5% light-induced freezing in at least two of the four epochs were not included in the study. The negative behavior of some mice is further explained by the correlation between MI and light-induced freezing (Figure 5D). The 6 Hz condition showed mixed results across the different behavioral measures quantified. Additionally, 6 Hz stimulation did not show the physiological hallmark of memory reactivation, as measured by the theta-gamma modulation index, so an increased number of negative light-induced freezing samples is expected.
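
      The inclusion rule stated above can be written out explicitly. This is a minimal sketch of the rule as worded in this response, not the authors' analysis code; the function and parameter names are hypothetical, and light-induced freezing is assumed to be expressed in percentage points per stimulation epoch.

```python
def meets_inclusion_criterion(epoch_freezing, threshold=5.0, min_epochs=2):
    """Return True if light-induced freezing reaches `threshold` percent
    in at least `min_epochs` of the stimulation epochs.

    epoch_freezing: light-induced freezing (percent) for each epoch; the
    text describes four epochs per mouse, and negative values are allowed.
    """
    qualifying = sum(1 for value in epoch_freezing if value >= threshold)
    return qualifying >= min_epochs


# Hypothetical example values, for illustration only.
included = meets_inclusion_criterion([6.0, 5.0, -2.0, 1.0])   # two epochs reach 5%
excluded = meets_inclusion_criterion([10.0, 4.9, 0.0, -3.0])  # only one epoch reaches 5%
```

      Note that a rule of this form filters on positive responses in individual epochs, so a mouse can pass it and still show a negative average effect in a given condition.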

      Finally, the elevated baseline freezing rate relative to previous literature could have masked some of the behavioral effect.

      In the revised manuscript, we discuss the effects of exclusion criteria more clearly.

      While trough-locked optogenetic stimulation significantly increases freezing, the effects are much weaker than placing the mouse in the actual fear-conditioned context (average time freezing of 15% vs. 50%). The discussion would benefit from additional treatment of ways to further increase the specificity and effectiveness of artificial memory reactivation.

      We include content on future directions for artificial memory reactivation that could bring the induced response closer to the behavioral response of natural recall. We believe that incorporating time-varying stimulation of different cells or parts of the hippocampus could improve induced recall, as all current methods stimulate the entire sub-region simultaneously.

      Using an open-source platform (RTXI) for real-time signal processing is commendable; however, more work could be done to make it easier to adopt these methods and make them compatible with other tools. The RTXI plugin used for closed-loop stimulation should be fully documented and publicly available, to allow others to replicate these results.

      The RTXI plugin can be found here: (https://github.com/ndlBU/phase_specific_stim). The URL has been added in the description of Figure 1.

    1. Author Response

      Reviewer #1 (Public Review):

      The screening effort has revealed a number of interesting and novel suggestions of new modulators of nuclear appearance that are exciting and have the potential to be of value to the field.

      We appreciate the reviewer’s view that identification of new modulators of nuclear morphology is exciting and of value to the field.

      Major Points:

      1) The discussion of the screen hits and prior knowledge key to their interpretation is lacking. For example, the authors only report on the purported localization of the hits without an unbiased analysis of their function(s). As a sole example, multiple members of the condensin complex are hits in Fig.1 while multiple members of the cohesin complex are hits in Fig. 2 - but there are many more factors worthy of further discussion. Moreover, the authors need to provide more information on the data used to assign the localization of the hits and how rigorous these assignments may be. For example, multiple CHMP proteins (ESCRTs) are listed - indeed CHMP4B is the highest scoring hit in Fig.1 - but this protein does not reside at the nuclear envelope at steady-state; rather, it is specifically recruited at mitotic exit to drive nuclear envelope sealing. Moreover, there are many hits for which there is prior published evidence of a connection to nuclear shape or size that are ignored: examples include BANF1, CHMP7, Nup155 (and likely far more that I am not aware of). This is a missed opportunity to put the findings into context and to provide a more mechanistic interpretation of the type of effects that lead to the observed changes in nuclear appearance. For example, there are already hints as to whether the effects occur as a mitotic exit defect versus an interphase defect, but conceptually this is not addressed.

      We appreciate this important point. We find that one of the major challenges in presentation of screening results is to provide detailed information on all interesting hits within the length limits of a manuscript! To provide a more comprehensive picture, we have now performed pathway analysis using STRING to display protein interaction networks to more comprehensively classify hits and groups of hits (Figures S7 and S8). We find highly connected regions in the network corresponding to condensin and histone modifiers in fibroblast hits altering nuclear shape. In contrast, MCF10AT hits showed increased connectivity with nucleoporin proteins. Fibroblast hits displaying an increase in nuclear size identified multiple nucleoporins and MCF10AT hit analysis identified components of DNA replication. We have added these findings to Supplementary Figures 7 and 8 and discuss them on page 16. Also, as requested, we added more than 20 new references and additional information on previously identified functions of some hits discussed in the text on p. 22-24.

      2) Validation of the screen is lacking. There appears to be no evidence that the authors validated the initial screen hits by additional siRNA experiments in which the levels of the knock-down could be assessed. As an example: do nucleoporin hits decrease in their abundance at the nuclear envelope in these conditions? This validation is absolutely essential.

      As requested, we now include in Tables S6A-C, data from independent validation experiments in which we selected the primary hits and validated them using an independent set of siRNAs with distinct chemistry and target sequences. Additionally, we demonstrate efficient knockdown capabilities for 8 targets in Supplementary Figure 9 with knockdown levels for most siRNAs of at least 60%. We find no strong relationship between knockdown efficiency and the extent of the observed phenotype (compare Figure S9 and Figure S10).

      3) Differences in cell type - the authors' interpretation that a lack of overlap in the hits across cell types reveals that there are fundamentally cell type-specific mechanisms at play is a stretch. This could also reflect a lack of robustness in the screen, which should be addressed by directly testing the knock-down of the hits from one cell line in the other. Even if this approach reinforces the cell type specificity, it may be that the biology beyond the nucleus itself is different (an obvious example being the mechanical state of the cell: organization of the cytoskeleton, adhesions, etc. that influence forces exerted on the nucleus), rather than the nuclear response being different. These caveats need to be explicitly acknowledged.

      As requested, we have now performed side by side experiments between both cell lines to directly compare a subset of nuclear morphology hits in parallel. They are shown in Supplementary Figure 10. We find a number of hits display strong nuclear shape abnormalities in either fibroblasts or MCF10AT cells but not both, with the exception of LMNA, which confirms our screen data. In addition, we compared the hits from our screen with previously published reports of other factors which regulate nuclear morphology to further strengthen our findings. We mention these results on p. 16. Despite these results, we have now toned down our statements regarding cell-type specificity of individual hits considering the small number of cell lines analyzed and the possible cellular factors which could contribute to cell-type specific differences.

      4) There are major issues with the interpretation of the presented biochemistry. For example, the basis for the supposed effect of monomer/dimer state of lamin is confusing and likely misinterpreted. It is well established that GST imposes dimerization on proteins expressed as GST fusions independent of cysteines. Any effect of DTT would have to manifest through some other mechanism (disulfides between the lamin domains - assumedly what the authors are thinking). Further, GST will impose dimerization of lamin A and lamin C in the co-incubation experiments. It is therefore entirely expected that if lamin A binds H3 and lamin C does not that the mixed dimers will bind H3 with lower affinity. Critically, this does not, however, address how full-length lamin C influences binding of lamin A to H3 in vivo. Last, how an effect of lamin C on lamin A would manifest through a disulfide bond in the nucleus, which has a reducing environment, is entirely unclear.

      We directly tested the possibility that GST causes artifactual dimerization of lamins by mutating cysteines to alanine in GST-lamin and assessing their effect on histone binding experiments. We show the results in Supplementary Figure 14E. If the observed binding were artifactually due to GST-mediated dimerization, we should not expect an effect of the cysteine mutants on histone binding. We find, however, that the C522A mutation in lamin A results in increased binding of H3 in the presence of lamin C, demonstrating that the observed effects are not due to GST dimerization. We discuss these results on p. 18 and p. 19.

      We agree with the referee that it will be exceptionally challenging to determine the in-vivo relevance of disulfide bonds, not knowing what the precise environment of the nucleus is. Given these caveats, we have now toned down this point and discuss the limitations of these findings in more detail on p. 19, 23, 24, and 25.

      5) It is important for the authors to address the concept of nuclear size changes versus changes in the nuclear to cell volume ratio – biologically these could be quite different conditions, but obviously these cannot be distinguished by measuring nuclear volume alone. Addressing this experimentally would be best (to provide more depth to the size measurements).

      This is an important point. As requested, we now clearly indicate on p. 23 that we are measuring nuclear area using nuclear cross-sections as a proxy for nuclear size rather than nuclear to cell volume ratio. We have found in our imaging studies over the past two decades that measuring cell volumes is exquisitely challenging and often highly inaccurate. A major challenge in these approaches is the correct identification of cell boundaries, and this is particularly challenging in a high-throughput setting since cell volume measurements require z-stacks, which greatly complicate the imaging and quantitative analysis of the millions of cells analyzed in a screen. Ultimately, measurements of cell volume for adherent cells will only be estimates (see for example PMID 28622449). We now clearly indicate this limitation of our approach and discuss on p. 15 and 23 previous studies measuring the nuclear to cell volume ratio and how these measurements compare to measuring nuclear area alone. We have also added several references on this topic on p. 15 and 23.

      6) There are important caveats to the approach of using the nuclear area as proxy measurement for nuclear size, most prominently that it is highly responsive to changes in nuclear height that can occur for a multitude of reasons (increased height = small radius and decreased height = larger radius), particularly given the different cell types. This needs to be acknowledged directly.

      Along the lines of point 5 and as requested, we now more clearly acknowledge on p. 23 these caveats due to our screening method of measuring nuclear area as a proxy for nuclear size. Nuclear cross-sectional area has been experimentally shown to be a good proxy for nuclear size in many systems (see PMID 31085625). For this reason, and because quantifying nuclear size from z-stacks would have greatly complicated the imaging and quantitative analysis, we chose to use nuclear cross-sectional area as our metric for nuclear size. In looking through our data, we did not find any significant differences in nuclear height between the two cell lines used or amongst hits and non-hits. With respect to the issue of different cell types, our analysis focused on RNAi knockdowns that altered nuclear morphology in a given cell line and we did not compare cell lines against each other. Separate analyses were performed for each cell line, so possible differences in nuclear height between the different cell lines used should not affect our analysis. We now discuss these issues on p. 23.
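
      The caveat raised in point 6 can be illustrated with a minimal sketch. It assumes, purely for illustration, that a nucleus behaves like a cylinder of constant volume, so that cross-sectional area equals volume divided by height; the function name and the numerical values are hypothetical and are not taken from the manuscript.

```python
def cross_sectional_area(volume_um3, height_um):
    """Area of an idealized cylindrical nucleus (assumption: V = A * h)."""
    if height_um <= 0:
        raise ValueError("height must be positive")
    return volume_um3 / height_um

# Hypothetical nucleus with a fixed volume of 900 cubic micrometres
volume = 900.0
area_tall = cross_sectional_area(volume, 6.0)   # taller nucleus, smaller area
area_flat = cross_sectional_area(volume, 4.5)   # flatter nucleus, larger area

print(f"tall: {area_tall:.1f} um^2, flat: {area_flat:.1f} um^2")
```

      Under this simplified model, a change in height alone shifts the measured area even though the volume is unchanged, which is why nuclear height was compared between cell lines and between hits and non-hits.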

      7) What is the evidence that the H3 effects manifest through lamins rather than directly?

      We apologize for not being clear. We did not intend to state that H3 acts via lamins. We do find that H3 physically interacts with lamins and that H3.3 mutants (K9M, K27M, and K36M) result in nuclear morphology defects. We now also show in the new Figure S17 that H3.3 mutants slightly affect lamin levels. However, as pointed out by the reviewer, these observations do not categorically rule out non-lamin related mechanisms and we now make it clear in our discussion on p. 20 that the effect of H3 may either be mediated via lamins or independently.

      8) Context is needed for the "methyl-methyl" histone states described as being the highest binders in the peptide array experiments. Are these states commonly found? Where in the genome? Does this match any DamID data? Again - more depth of investigation is required.

      This is a good point. Unfortunately, to our knowledge there is currently no ChIP-seq human genome map of di-methyl modifications on histone tails available. We were unable to generate or procure the individual dually methylated peptides, and methyl-methyl H3 antibodies are not available, so we are not able to perform quantitative binding assays. However, to begin to address this issue, we now provide in a new Supplementary Table 8 quantitative data of binding intensities. Given these limitations, we have now toned down the claims regarding the methyl-binding sites.

      9) That oncohistones induce changes in nuclear shape or size does not mean that this is related to the mechanism in cancer. Also - how over-expression of H3 without its obligate partner H4 could disrupt the cell or an assessment of the extent of the oncohistone incorporation into chromatin achieved in these experiments makes it challenging to interpret.

      We agree and did not intend to imply that the oncogenic function of the histone mutants involves changes in nuclear morphology. We now clearly state so on p. 25 and we also mention the caveat of the overexpression experiment.

      10) Throughout the manuscript it would be helpful to the reader if the author would provide at minimum a brief statement on the previously identified functions of the hits that are explicitly discussed beyond their localization (membrane versus chromatin). References would also be helpful (for example, again - what is the evidence that SLC27A3 resides at the nuclear envelope?).

      As requested, we added more than 20 new references and now provide additional information and previously identified functions of many of the hits mentioned in the text.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Huang et al. examines the potential "self-policing" of Bacillus cells within a biofilm. The authors first discover the co-regulation of lethal extracellular toxins (BAs) and the self-immunity mechanisms; the global regulator Spo0A controls both. The authors further show that a subpopulation of cells co-express these genes and speculate that these cells engage in preferential cooperation for biofilm formation (over cells that produce neither). Based on previous literature, the authors then evaluate the relative fitness of the wild-type strain compared to mutants locked into either constantly exporting the toxins or permanently immune to these poisons. The wild-type exhibited increased fitness (compared to the mutants) for the tested biofilm conditions. The manuscript raises interesting ideas and provides a potential model to probe questions of cooperativity in Bacillus biofilms.

      Strengths:

      • The authors use fluorescence-producing reporter strains to discern the spatial expression patterns within biofilms. This real-time imaging provides striking confirmation of their conclusions about shared co-regulation.

      • The authors also nicely deploy genetic constructs in microbiological assays to show how toxin production and immunity can influence biofilm phenotypes, including resilience to stress.

      Thank you very much for your positive comments. Our detailed responses to your comments and suggestions are as follows.

      Concerns:

      • My biggest concern is that the claim of policing on a single-cell level needs more quantitative microscopy, particularly of the xylose-induced strain. The data support a more tempered consideration of self-policing via BAs and self-resistance in this Bacillus species. It seems sufficient that this manuscript opens the door for a novel and readily examinable system for examining potential cooperation and its molecular controls (without making broader claims).

      Thank you very much for your comments. We demonstrated the policing system on a single-cell level by re-filming the progression of individual nonproducers from live to dead, and even to disappearance, in a biofilm population (please see the pictures in Figure 2 and the statistical data in Figure 2-figure supplement 1 of the revised manuscript, as well as revised Figure 2-video 1-4). Separately, the xylose-induced strain (SQR9-Pxyl-accDA) was constructed to assess the involvement of AccDA expression (controlled by Spo0A~P in the wild-type but induced by exogenous xylose here) in regulating BAs synthesis and immunity. With xylose addition, the expression of AccDA is likely to be homogeneous in the colony, in contrast to the heterogeneous expression in the wild-type population.

      • The discussion is more speculative than the presented data warrants. For example, the speculation in lines 289 - 310 is not anchored in the results. It is hard for this reviewer to imagine how one would use the genetic framework and tools developed in this manuscript to address the ideas proposed in lines 289 - 310.

      Thank you for your comments. We have revised the discussion so that it is anchored more closely in the data and is less speculative. As a complement to the molecular mechanism of the policing system, the hypothesis on the evolution of this system (Lines 289-310 in the original version) was included to suggest how it might have arisen, based on ecological theories of division of labour and kin selection (refs. 4-6); we have shortened this discussion in the revised manuscript.

      • Some conclusions (in the results section) are more decisive than the data supports. For example, the microscopy of the PI staining (as presented in Figure 2 and the supplemental movies) does not prove that only non-expressing cells die. Yet the conclusion in line 143 states that "ECM and BAs producers selectively punish the nonproducing siblings." Also, the presented data shows many non-labeled cells without PI; why do some nearby non-gfp-expressing cells remain alive?

      Thank you for your constructive comments. As the reviewer suggested, an observation covering a more complete biofilm formation process, together with more convincing statistics, was needed. We therefore repeated the microscopy, observing for 3 h during biofilm formation, and assessed the source and location of dead cells for statistical analysis. The results showed that all dead cells originated from the subpopulation that did not express gfp (the nonproducers), and that the number of dead cells adjacent to the producers was significantly higher than that close to the nonproducers (please see the pictures in revised Figure 2 and Figure 2-figure supplement 1).

      In addition, regarding the survival of some non-gfp-expressing cells near the producers, based on several relevant reports (refs. 1-3) and the observations in the present study, we speculate that the coordination system for optimizing the division of labor is relatively mild, so that only a fraction of the nonproducers (relatively sensitive cells or those facing higher concentrations of the toxin) are eliminated. We think this outcome is a balance between restraining the cheater-like subpopulation and retaining the advantages of cell differentiation.

    1. Author Response

      Reviewer #1 (Public Review):

      The work in this study builds on previous studies by some of the same authors and aims to test whether the heartbeat evoked response was modulated by the local/global auditory regularities and whether this differed in post-comatose patients with different consciousness diagnoses. The authors report that during the global effect there were differences between the MCS and UWS patients.

      The study is well constructed and analysed and has data from 148 participants (although the maximum in any one group was 59). The reporting of the results is excellent and the conclusions are supported by the results presented. This study and the results presented are discussed as evidence that EEG based techniques may be a low-cost diagnostic tool for consciousness in post-comatose patients, although it should be stressed that here no classification of diagnostics was performed on the EEG data.

      One potential weakness was the relationship between the design of the experiment and the analysis pathway for the results. If I have understood the experimental design correctly, the auditory regularity changed depending on whether the local/global regularity was standard/deviant. In the analysis, all conditions in which the local or global regularity differed were compared between the standard and deviant trials. This difference was then compared between MCS and UWS patient groups. For these analyses the results for the healthy and emerging MCS groups were not included. If this is correct it would be interesting to understand the motivation for this. Relatedly, it would be good to clarify if the effects reported were corrected for the multiple planned contrasts and if not why they should not be corrected.

      Thank you for the appreciation of our work and the constructive comments. Misdiagnosis of MCS/UWS patients occurs in clinical practice because covert consciousness goes undetected in the absence of overt behavioral signs of consciousness. Therefore, the main motivation of our study is to contribute to a better distinction between these two patient groups.

      We have modified the introduction to clarify that the objective of the paper is to show in greater detail the group differences between MCS and UWS patients:

      "In this study, we analyze HERs following the presentation of auditory irregularities, with special regard on distinguishing UWS (n=40) and MCS (n=46) patients. Note that the automated classification of this cohort was previously performed in another study (Raimondo et al., 2017). Therefore, our aim is to characterize the group-wise differences between UWS and MCS patients that may allow a multi-dimensional cognitive evaluation to infer the presence of consciousness (Sergent et al., 2017), but also complement the bedside diagnosis performed with neuroimaging methods that capture neural correlates of covert consciousness (Sanz et al., 2021)."

      Reviewer #2 (Public Review):

      The goal of this study was to determine whether heartbeat-evoked responses measured at the scalp level with EEG, which followed regularity violations, could potentially help inform the diagnosis of patients with altered states of consciousness.

      The authors use high density EEG and an oddball paradigm that probes violations of both local and global regularities. Four groups were considered including unresponsive wakefulness syndrome patients, minimally conscious patients, emerging minimally conscious patients and healthy controls. A difference was found between unresponsive and minimally conscious patients in the amplitude of the heartbeat evoked responses measured with EEG following a sound that violated a global regularity. Similarly, differences were found between the variance of these responses between the two above mentioned groups (N=58 and N=59), but no differences were found in relation to the healthy control group, which appear to be "in between" the two other groups (at least for global effect of HER). I thought this was a little counterintuitive and raises some questions about what this neural signature can tell us about the state of consciousness. Having said that, the healthy control sample was very small, more than 5 times smaller (only N=11).

      We thank the reviewer for their comments. As described above, distinguishing between MCS and UWS patients is one of the main challenges in clinical practice. We have modified the manuscript to show the differences between these two patient groups. Further data on EMCS and healthy participants are not included in this revision because of the new inclusion criteria.

      In general, I thought the Discussion section was a little light on the implications of the findings, what they tell us about the brain mechanisms of consciousness and their different levels/states. A question is raised about whether it is necessary to lock EEG to heartbeats to find differences between patients. The data appeared to say that this is not the case but the discussion does not appear to reflect that very clearly.

      We have expanded the discussion to comment on the relation of HERs to perception:

      "Our results contribute to the extensive experimental evidence showing that brain-heart interactions, as measured with HERs, are related to perceptual awareness (Azzalini et al., 2019; Skora et al., 2022). For instance, neural responses to heartbeats correlate with perception in a visual detection task (Park et al., 2014). Further evidence exists on somatosensory perception, where a higher detection of somatosensory stimuli occurs when the cardiac cycle is in diastole and it is reflected in HERs (Al et al., 2020). Evidence on heart transplanted patients shows that the ability of heartbeats sensation is reduced after surgery and recovered after one year, with the evolution of the heartbeats sensation recovery reflected in the neural responses to heartbeats as well (Salamone et al., 2020). The responses to heartbeats also covary with self-perception: bodily-self-identification of the full body (Park et al., 2016), and face (Sel et al., 2017), and the self-relatedness of spontaneous thoughts (Babo-Rebelo et al., 2016) and imagination (Babo-Rebelo et al., 2019). Moreover, brain-heart interactions measured from heart rate variability correlate with conscious auditory perception as well (Banellis and Cruse, 2020; Pérez et al., 2021; Pfeiffer and Lucia, 2017)."

      Reviewer #3 (Public Review):

      I found the results very interesting but wondered why the ERP results for the global vs. local effects are not reported. This analysis is mentioned in the methods section, but I do not find it in the results. Is this what is shown in the mid row in panel D? If yes, it should be made clearer. Is there a significant local and global deviant response in each patient group?

      We thank the reviewer for their appreciation of our work and their comments.

      We have reported the new results showing clustered effects in both ERPs and HERs.

      Additionally, eyeballing Figure 1, there are a few potential issues that may be affecting the conclusion re HER:

      (1) Panel D top: it seems that the orange trace (MCS) is largely the same in both the "Local" and "global" condition. But the blue trace (UWS) shows a larger negative going deflection in the "global" case. Put differently, the UWS, but not MCS patients appear to generate a different response to the Global effect relative to the local effect. Is this the case?

      We have separated Figure 1 into three new figures to clarify the results, and we also provide a more detailed description of our results.

      In brief, our results show that MCS may have a distinctive response to global and local effects. We have included a new correlation analysis in which we show that the responses to global and local effects are uncorrelated (Table 2).

      With respect to the “negative” responses in UWS, note that the measured effect corresponds to a linear combination of evoked potentials, e.g.: global effect = mean(global deviants) – mean(global standard). Therefore, the negative group-wise response may imply that global standard responses are larger than global deviants. We have included in Table 1 the statistical tests to show whether the responses to local and global effects are different from zero.
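
      The contrast described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: the function name `contrast_effect` and the per-trial amplitudes are hypothetical, and each trial is reduced to a single HER amplitude in arbitrary units.

```python
import numpy as np

def contrast_effect(deviant_trials, standard_trials):
    """Return mean(deviants) - mean(standards) for one participant.

    Assumption: inputs are per-trial HER amplitudes (trials on axis 0).
    """
    deviants = np.asarray(deviant_trials, dtype=float)
    standards = np.asarray(standard_trials, dtype=float)
    return deviants.mean(axis=0) - standards.mean(axis=0)

# Hypothetical amplitudes (arbitrary units) for a single participant
global_deviants = [0.8, 1.0, 1.2]
global_standards = [1.4, 1.6, 1.5]

effect = contrast_effect(global_deviants, global_standards)
print(f"global effect: {effect:.2f}")  # negative when standards exceed deviants
```

      In this toy case both conditions evoke positive responses, yet the effect is negative because the standard trials are larger on average, which is the situation the response attributes to the UWS group.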

      (2) There are some MCS subjects that appear to show a global effect that is larger than that observed in EMCS and healthy controls. How do you interpret these data?

      We have included in the discussion a paragraph addressing the outliers:

      "Note that outliers are expected in disorders of consciousness and exact physiological characterization of the different levels of consciousness remains challenging. First, the standard assessment of consciousness based on behavioral measures has shown a high rate of misdiagnosis in MCS and UWS (Stender et al., 2014). The cause of the misdiagnosis of consciousness arises because consciousness does not necessarily translate into overt behavior (Hermann et al., 2021). Unresponsive and minimally conscious patients, namely non-behavioral MCS (Thibaut et al., 2021), represent the main diagnostic challenge in clinical practice. Second, some of these patients suffer from conditions that may translate to no response to stimuli, even in presence of consciousness. For instance, when they suffer from constant pain, fluctuations in arousal levels, or sensory impairments caused by brain damage (Chennu et al., 2013). Third, these patients were recorded in clinical setups, which may lead to a lower signal-to-noise ratio, and lead to biased measurements in evoked potentials (Clayson et al., 2013)."

      (3) How do you interpret the negative average HER data shown by many UWS patients?

      As mentioned above, the negative HER is a result of a linear combination of different HER-based markers (deviants minus standard).

    1. Author Response

      Reviewer #3 (Public Review):

      1) While the data are generally very convincing, the authors overstated the conclusions in several instances. For example, the authors state that EPAC and PKCε are "required" or "essential" for vesicle docking and release. However, the authors' own data show that both vesicle docking and release are clearly present (though reduced) in the absence of EPAC and PKCε, demonstrating they are not absolutely required. The language could be toned down without diminishing the impact of the excellent work.

      We thank you for these important comments. We have double-checked the manuscript and modified the language of our statements. In particular, we have changed the unnecessary words “required” and “essential” to “regulate” or “important”.

      2) The authors used analysis of cumulative EPSCs to estimate release probability (Pr) and the readily releasable pool (RRP) size. Unfortunately, this approach is likely not suited for low release probability synapses such as parallel fibers (the authors estimate Pr to be 0.04-0.06). Thanawala and Regehr (2016) extensively investigated the validity of cumulative EPSC analysis under a variety of conditions. They found that this analysis produces large errors in Pr and RRP at synapses with a Pr below ~0.2. In addition, 20 Hz EPSC stimulation (as was used here) produces much larger errors compared to the more commonly used 100 Hz stimulation. Between the low Pr at parallel fiber synapses and the low stimulus frequency used, it is likely that the cumulative EPSC analysis provides a poor estimate of Pr and RRP in this case.

      Thanks for the very insightful comment. In the previous experiments, we measured RRP and Pr based on parameters taken from work on hippocampal CA1 neurons (He et al., 2019), which, in our opinion, resemble PF-PC synapses in their low release probability. We have carefully read the Thanawala and Regehr (2016) paper and compared the different methods. While the EQ method is in general more reliable for estimating a small RRP and low Pr, it relies on p being constant throughout a stimulus train (Thanawala and Regehr, 2016). Although p may be constant at the calyx of Held synapses they studied, this cannot be the case for PF-PC synapses. Therefore, we decided to redo the estimations of RRP and Pr using a 100-Hz train (previously a 20-Hz train). This method does not require constant p and allows a better estimate of RRP and Pr at PF-PC synapses (Thanawala and Regehr, 2016).

      The new results are presented in new Fig. 2E and 2F. The PF-PC synapses were stimulated at 100 Hz, the artifacts were truncated, and the EPSCs were aligned (Fig. 2E and 2F). Note that the aim of this experiment was to investigate whether there is a difference between control and cKO mice. Indeed, we found that the amplitudes of both EPSC0 and the subsequent EPSCs were smaller in cKO mice, indicating that both the initial release and the replenishment are reduced by the conditional knockout of EPACs or PKCε. Compared to the 20-Hz train, the 100-Hz train brought EPSCs into steady state faster. We created a linear fit from the normalized steady-state EPSCs and back-extrapolated the line to the y-axis to calculate Pr. We found that the Pr value estimated from the 100-Hz train was significantly larger than that from the 20-Hz train: 0.17 (Math1-cre) and 0.19 (PKCεf/f) at 100 Hz, versus 0.07 (Math1-cre) and 0.08 (PKCεf/f) in the previous submission. This result is similar to that of Thanawala and Regehr (2016), who claimed that the accuracy of estimation from a 100-Hz train is about three times that from a 20-Hz train. Moreover, we found that the conditional knockout of either EPACs or PKCε produced a significant decrease in Pr (Math1-cre 0.17 vs Math1-cre;EPAC1cKOEPAC2cKO 0.11; PKCεf/f 0.19 vs PKCεcKO 0.12). These results have been added to the text and figure legend (Fig. 2E and 2F), and the corresponding methods have also been updated.
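
      The back-extrapolation step can be sketched as follows. This is a minimal illustration of the commonly used cumulative-EPSC train method, assuming that the y-intercept of a line fitted to the late, steady-state part of the cumulative amplitude curve estimates the RRP and that Pr is the first EPSC divided by that estimate; the function name, the number of steady-state points, and the synthetic amplitudes are hypothetical and are not the values or the exact procedure used in the study.

```python
import numpy as np

def train_method(epsc_amplitudes, n_steady=10):
    """Estimate RRP and Pr from EPSC amplitudes evoked by a stimulus train.

    Assumption: the last `n_steady` responses are at steady state.
    """
    amps = np.asarray(epsc_amplitudes, dtype=float)
    cumulative = np.cumsum(amps)                 # cumulative EPSC curve
    stim_number = np.arange(1, len(amps) + 1)    # 1-based stimulus index

    # Linear fit to the steady-state portion, then read off the y-intercept
    slope, intercept = np.polyfit(
        stim_number[-n_steady:], cumulative[-n_steady:], 1
    )
    rrp = intercept                              # back-extrapolated to x = 0
    pr = amps[0] / rrp                           # first EPSC relative to RRP
    return rrp, pr

# Synthetic train: depression over five stimuli, then a steady state
synthetic = [100, 60, 40, 25, 15] + [10] * 15
rrp, pr = train_method(synthetic)
print(f"RRP estimate: {rrp:.1f}, Pr estimate: {pr:.3f}")
```

      The slope of the fitted line reflects replenishment during the train, which is why the estimate depends on how quickly the responses reach steady state and hence on the stimulation frequency.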

      3) Using a combination of genetic knockouts and pharmacology, this paper convincingly shows that presynaptic EPAC/PKCε are necessary for presynaptic LTP, but do not alter postsynaptic LTP/LTD. However, given the experimental conditions in the slice experiments, it is difficult to extrapolate from the slice data to in vivo plasticity during motor learning. Synaptic plasticity in the cerebellar cortex is quite complex and can depend significantly on age, temperature, location, and ionic conditions. Unfortunately, these were not well matched between slice and in vivo experiments. Slice experiments used P21 mice, while in vivo experiments were performed at P60. Slice experiments were performed in the vermis, while VOR expression/adaptation generally requires the vestibulo-cerebellum/flocculus. Slice experiments were performed at room temperature, not physiological temperature. Lastly, slice experiments used 2 mM Ca2+ in the ACSF, somewhat high compared to the physiological extracellular fluid. Each of these factors can significantly affect the induction and expression of plasticity. These differences leave one wondering how well the slice data translate into understanding plasticity in the in vivo context.

      This is a great question. To date, almost all published recordings of PC plasticity were made in young adult mice (< 1 month) at room temperature, whereas most behavioral experiments were conducted at around 2-3 months of age. To address the reviewer’s comment, we repeated the LTP experiments under the requested alternative conditions (in 2-month-old mice, with low Ca2+, or at a higher recording temperature). Our new data show that, under these conditions, EPACs and PKCε are still needed for the induction of presynaptic PC-LTP (Figure 3–figure supplement 2-4). In addition, we tried to record PC EPSCs in the flocculus. Unfortunately, we found that PC EPSCs there were quite unstable, which might be due to the more complex orientation of PCs and their innervation. We have addressed the reviewer’s comment in the revised manuscript: “Second, presynaptic PF-PC LTP was performed in the cerebellar vermis in the present work, whereas VOR learning generally requires PC activity in the flocculus. Unfortunately, we found that PC-EPSCs in the flocculus were not suitable to record PC plasticity because they were unstable” (Line 557).

      4) Many experiments use synaptosomal preparation. The authors identify excitatory synapses by VGLUT labelling, but it is unclear how, or if, the authors distinguish between parallel fiber, climbing fiber, and mossy fiber synaptosomes. These synapses likely have very different properties and molecular composition; some quantification or estimation of how many synaptosomes are derived from each type of synapse would be helpful.

      We have stained synaptosomes for vGluT1/vGluT2, EAAT4 and bassoon to identify PF-PC synapses (vGluT1+EAAT4+) or CF-PC synapses (vGluT2+EAAT4+). Our staining results showed that PF-PC synapses accounted for 88.8% of the total and CF-PC synapses for 7.5%. Thus, we estimated the proportion of mossy fiber synapses to be less than 3.7%, which would not affect our conclusion. These results are presented in Figure 1–figure supplement 1.

      5) The math1-cre mouse line is used to selectively knockout EPAC or PKCε expression in cerebellar granule cells. This line also expresses Cre in unipolar brush cells (UBCs) of the cerebellum (Wang et al., 2021). This is likely not a factor in the molecular/slice studies of EPAC/PKC signaling, but UBC dysfunction could play a role in motor/learning deficits observed in vivo. This possibility is not considered in the text.

      There is indeed evidence that UBCs are involved in cerebellar ataxias (Kreko-Pierce et al., 2020). How UBCs precisely participate in motor learning or VOR learning is unclear, but they are suggested to be involved in motor performance (Mugnaini et al., 2011; Guo et al., 2021). We therefore agree with the reviewer that this possibility cannot be excluded, and we have revised the discussion to address the potential role of UBCs: “Two caveats should be considered in the present studies. First, Math1-Cre-induced deletion of EPAC or PKCε might affect the function of unipolar brush cells (UBCs), which are involved in cerebellar ataxias (Kreko-Pierce et al., 2020). However, we believe that the EPAC-PKCε module regulates VOR learning through presynaptic plasticity mechanism at PF-PC synapses rather than UBCs, in line with the observations in other granule cell-specific mutations (Galliano et al., 2013; Schonewille et al., 2021).” (Line 552).

      References:

      Mugnaini E, Sekerková G, Martina M. The unipolar brush cell: a remarkable neuron finally receiving deserved attention. Brain Res Rev. 2011;66(1-2):220-45.

      Guo C, Rudolph S, Neuwirth ME, Regehr WG. Purkinje cell outputs selectively inhibit a subset of unipolar brush cells in the input layer of the cerebellar cortex. eLife. 2021;10:e68802.

      Kreko-Pierce T, Boiko N, Harbidge DG, Marcus DC, Stockand JD, Pugh JR. Cerebellar ataxia caused by Type II unipolar brush cell dysfunction in the Asic5 knockout mouse. Sci Rep. 2020;10:2168.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper uses light field microscopy to measure calcium signals across the fly brain while it is walking and turning, and also while the fly is externally driven to walk and turn, using a treadmill. The authors drive calcium indicator expression using pan-neuronal drivers, as well as drivers specific to individual neurotransmitters and neuromodulators. From their experiments, the authors show that inhibitory and excitatory neurons in the brain are activated in similar patterns by walking and that neurons expressing machinery for different neuromodulatory amines tend to show differentially strong calcium signals during walking. By examining spontaneous and forced walking and turning, the authors identify brain regions that activate before spontaneous turning and that activate asymmetrically in concert with spontaneous or forced turning.

      Strengths: Overall, the strength of this paper is in its careful descriptions and analyses of whole brain activation patterns that correlate with spontaneous and forced behaviors. Showing how the pattern of activity relates to broad classes of cells is also useful for understanding brain activation. Especially in brain regions identified as preceding spontaneous walking and in being asymmetrically involved in spontaneous and forced turning, it provides a wealth of potential hypotheses for new experiments. Overall, it contributes to a coarse-grained understanding of broad changes in brain activity during behavior.

      Weaknesses: The primary weakness of this paper is that it presents some speculative interpretations and conclusions too strongly. Most importantly, average activity in a neuropil can represent the calcium activity of hundreds or thousands of neurons, and it is hard to know what fraction is active, for instance, or how expression pattern differences might play into calcium signals. Calcium signals also do not reliably indicate hyperpolarization, so a net increase in the average Ca++ indicator signal does not necessarily reflect that the average neuron is becoming more active, just that some labeled neurons are becoming more active, while others may be inactive or hyperpolarized. The conclusions about regions triggering walk (rather than just preceding it) are too strong for the manipulations in this paper, as are some of the links with individual neuron types. Thus, more substantial caveats need to be presented for the conclusions being drawn from the data presented here.

      We thank the reviewer for their assessment and the positive comments on our manuscript. We have made these caveats clear throughout the manuscript by adding text and removing overly strong conclusions and speculations.

      Reviewer #2 (Public Review):

      Aimon et al. used fast whole-brain imaging to investigate the relationship between walking and neural activity in adult fruit flies. They find that increases in brain-wide activity are tightly correlated with walking behavior, and not with grooming or flailing, and are independent of visual input. They reveal that excitatory, inhibitory, and neuromodulatory neurons all contribute to brain-wide increases in neural activity during walk. Aimon et al. extend their observations of brain-wide activity to reveal that activity in some inferior brain regions is more correlated with walk than in other brain regions. The authors further analyzed their imaging dataset to identify candidate brain regions and cell types that may be important for walking behavior, which will be useful in hypothesis generation in future studies. Finally, the authors show that brain-wide activity is similar between spontaneous and forced walk and that severing the connection between the ventral nerve cord and central brain abolishes walk-related increases in brain activity. These results suggest that increases in brain-wide activity during walking may be largely attributed to sensory and proprioceptive feedback ascending to the central brain from the ventral nerve cord rather than to top-down executive and motor control programs. The observations presented in this study suggest hypotheses that may be tested in future studies.

      Strengths: This paper presents a rich imaging dataset that is well-analyzed and cataloged, which will be valuable for researchers who use this paper for future hypothesis generation. The comparison of many different reagents, imaging speeds, and behavioral conditions suggests that the observed increases in brain-wide activity during walking are quite robust to imaging methods in adult fruit flies.

      Weaknesses: This study is largely observational, and the few experimental manipulations presented are insufficient to support the authors' broad claims about the generation of brain-wide neural activity.

      We thank the reviewer for their assessment and have toned down claims throughout the paper accordingly.

      Notably, the authors suggest that their image analysis can reveal individual cell types that are important for walking by matching their morphologies to registered components from whole-brain imaging experiments. While these predictions are a useful starting point for future experiments, they have not convincingly shown that their method can identify individual cell types in genetic reagents with more restricted expression patterns. Adding further validation to show that genetically subtracting the candidate neurons from the overall expression pattern of the calcium indicator abolishes that component from the response would strengthen this claim. Furthermore, imaging the matched candidate neuronal cell type to show that it recapitulates the activity dynamics of the proposed component would add additional evidence.

      We agree that the correspondence to specific neuron types is often very speculative. We have clarified this throughout the manuscript. There are a few exceptions where the neurons we discuss are the only known neurons in a specific GAL4 expression pattern in a given region, and where we find the exact anatomical pattern matching these neurons’ anatomy. Together, this makes us quite confident that the activity results indeed from these neurons. However, the experiments proposed by the reviewer would be interesting complementary approaches. We believe, however, that abolishing activity in one neuron will be difficult to interpret regarding the neuron type as it would affect the activity of other neurons in the network (which is, in our opinion, an interesting point and research direction). Nevertheless, we plan to perform such experiments and experiments looking at the activity in more restricted drivers in the future.

      In addition, increases in neural activity prior to walk onset in specific brain regions are intriguing but insufficient to demonstrate the neurons in these regions trigger walking. This claim should await further studies that employ targeted and acute manipulation of neural activity, as noted by the authors. Furthermore, that activity in these brain regions is significantly increased prior to walk onset awaits more rigorous statistical testing, as do the authors' claims that spontaneous versus forced walking alters these dynamics. The suggestion that walking increases brain-wide activity via feedback from the ventral nerve cord is an interesting possibility and would also benefit from additional experimental validation. Activating and silencing neurons that provide proprioceptive feedback from the legs and determining the effect of this manipulation on brain-wide neural activity would be a good starting point.

      We have removed claims of causality in the result section. We have also added a statistical test for activation before walk onset. Activating and silencing proprioceptive neurons from the legs would be interesting follow up experiments although it is likely to affect walking. Nevertheless, we are planning to carry out such experiments in the future. We have added this point in the discussion.

      Reviewer #3 (Public Review):

      Aimon and colleagues investigated brain activity in flies during spontaneous and forced walking. They used light-field microscopy to image calcium activity in the brain at high temporal resolution as the animal walked on a ball and they used the statistical inference methods PCA and ICA to tease out subregions of the brain that had distinct patterns of activity. They then sought to relate those patterns to walking. Most interesting are the experiments they performed comparing forced walking to spontaneous walking because this provides a framework to generate hypotheses about which aspects of neural activity are reporting the animal's movements versus generating those movements. The authors identify subregions and neuron types that may be involved in generating vs reporting walking. Their analysis is reasonable but could be further strengthened with a more powerful statistical framework that explicitly considered the multiple hypotheses being tested. More broadly, the work serves as a starting point to investigate the role of different regions in the brain and should spur follow-up investigations that involve more perturbative approaches in addition to the correlative approaches presented here.

      We thank the reviewer for their overall positive assessment of our work and fully agree with the conclusion of its current limitations.

    1. Author Response:

      Reviewer #1 (Public Review):

      Tomasi et al. performed a combination of bioinformatic and next-generation tRNA sequencing experiments to predict the set of tRNA modifications and their corresponding genes in the tRNAs of the pathogenic bacterium Mycobacterium tuberculosis. Long known to be important for translation accuracy and efficiency, tRNA modifications are now emerging as having regulatory roles. However, the basic knowledge of the position and nature of the modifications present in a given organism is very sparse beyond a handful of model organisms. Studies that can generate the tRNA modification maps in different organisms along the tree of life are good starting points for further studies. The focus here on a major human pathogen that is studied by a large community raises the general interest of the study. Finally, deletion of the gene mnmA, responsible for the insertion of s2U at position 34, revealed defects in growth in macrophages but not in test tubes, suggesting regulatory roles that will warrant further studies. The conclusions of the paper are mostly supported by the data, but the partial nature of the bioinformatic analysis and the absence of mass spectrometry data make it incomplete. The authors do not take advantage of the mass spec data that is published for Mycobacterium bovis (PMID: 27834374) to discuss what they find.

      Important points to be considered:

      1) The authors say they took a list of proteins involved in tRNA modifications from Modomics and manually added a few, but we do not know the exact set of proteins that were used to search the M. tuberculosis genome.

      Thank you for pointing out this issue. We will add the complete list of proteins used for the BLAST query.

      2) The absence of mnmGE genes in TB suggested that the xcm5U derivatives are absent. These are present in M. bovis (PMID: 27834374). Are the mnmEG genes found in M. bovis? If yes, then the authors should perform a phylogenetic distribution analysis in the Mycobacterial clade to see when they disappeared. If they are not present in M. bovis, then maybe a non-orthologous set of enzymes does the same reaction, and then the authors really do not know what modification is present or not at U34 without LC-MS. The exact same argument can be given for the xmo5U derivatives that are also found in M. bovis but not predicted by the authors in M. tuberculosis.

      The reviewer raises a valid point. In M. bovis, mnm5U and cmo5U derivatives were observed in LC-MS analysis. However, we did not identify candidate genes known to be involved in the biogenesis of mnm5U and cmo5U in the Mycobacteriaceae, including M. bovis and Mtb, suggesting that if these modifications are indeed present, they are not synthesized through canonical biogenesis pathways in this family. There are several examples where the same modification is generated by distinct modification enzymes (Kimura, 2021). These observations raise the interesting possibility that in the Mycobacteriaceae and most species in Actinomycetota (except for Bifidobacterium, Corynebacterium and Rhodococcus species), major wobble modifications are generated by biosynthesis pathways that are distinct from those employed by well-characterized organisms. Future studies will examine this hypothesis.

      3) Why is the Psi32 predicted by the authors because of the presence of the Rv3300c/Psu9 gene not detected by CMC-treated tRNA seq while the other Psi residues are? Members of this family can modify both rRNA and tRNA. So the presence of the gene does not guarantee the presence of the modification in tRNAs.

      Thank you very much for the careful read. We did not include RluA in the list of query proteins because it is not classified as a tRNA modification enzyme in Modomics. Additionally, the CMC-coupled tRNA-seq is imperfect for detection of all pseudouridylated positions. Due to this limitation, we only assigned modifications that are both predicted by the presence of putative biosynthetic enzymes and RT-derived signatures. As the reviewer points out, we cannot rule out that this homolog targets only rRNAs. We will clarify this possibility in the revised manuscript. Also, RluA will be added to the query and the name of Rv3300c will be changed to RluA in the text and related figures.

      4) Why are tsaBED not essential but tsaC (called sua5 by the authors) essential?

      Thank you for pointing out this interesting observation. We are also curious about differences in the essentiality among t6A biogenesis genes. We speculate that TsaC potentially has critical roles in cell viability other than t6A synthesis. TsaC synthesizes a compound, threonylcarbamoyl-AMP, as an intermediate for t6A biogenesis. Thus, it is possible that this intermediate has a role in other essential cellular activities besides t6A biogenesis. Further study of these factors in Mtb could reveal interesting crosstalk between modification synthesis and other cellular activities.

      Reviewer #2 (Public Review):

      In this study, Tomasi et al identify a series of tRNA modifying enzymes from Mtb, show their function in the relevant tRNA modifications and by using at least one deleted strain for MnmA, they show the relevance of tRNA modification in intra-host survival and postulate their potential role in pathogenesis.

      Conceptually it is a wonderful study; given that tRNA modifications are so fundamental to all life forms, showing their role in Mtb growth in the host is significant. However, the authors have not thoroughly analyzed the phenotype. The growth defect aspect or impact on pathogenesis needs to be adequately addressed.

      - The authors show that ΔmnmA grows equally well in the in vitro cultures as the WT. However, they show attenuated growth in the macrophages. Is it because Glu1_TTC and Gln1_TTG tRNAs are not the preferred tRNAs for incorporation of Glu and Gln, respectively? And for some reason, they get preferred over the alternate tRNAs during infection? What dictates this selectivity?

      Thank you very much for raising this excellent point. As the reviewer suggests, the attenuation of ΔmnmA Mtb growth inside of macrophages could be caused by disparate codon usage between genes required for in vitro growth and intracellular growth. Among multiple codons encoding Glu, Gln, or Lys, s2U modification-dependent codons might be preferentially distributed in genes associated with intracellular growth. For example, Mtb has two tRNA isoacceptors, Glu1_TTC and Glu2_CTC, to decipher two Glu codons, i.e., GAA and GAG. According to the wobble pairing rule, GAA is only decoded by Glu1_TTC, whereas GAG is decoded by both Glu1_TTC and Glu2_CTC; i.e., GAG can be deciphered by an s2U-independent tRNA. Thus, genes required for intracellular growth might be enriched with GAA, an s2U-dependent codon. The same may apply to the Gln and Lys codons deciphered by s2U-containing tRNAs. In the revised manuscript, we will include the perspective of codon usage for explaining the intracellular fitness defect of the ΔmnmA Mtb mutant.
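
      The codon-usage argument above could be examined, for instance, by computing how often Glu is encoded by GAA rather than GAG in a given coding sequence. The following is only a minimal sketch under stated assumptions: the function names and the toy sequence are hypothetical, the input is assumed to be an in-frame DNA coding sequence, and this is not the analysis performed in the manuscript.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons (frame 0) in a DNA coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def gaa_fraction(cds: str) -> float:
    """Fraction of Glu codons that are GAA, i.e. GAA / (GAA + GAG).

    Per the wobble argument in the response, GAA is read only by the
    s2U-containing Glu1_TTC tRNA, whereas GAG can also be read by Glu2_CTC.
    Returns 0.0 when the sequence contains no Glu codons.
    """
    counts = codon_counts(cds)
    total = counts["GAA"] + counts["GAG"]
    return counts["GAA"] / total if total else 0.0


# Toy sequence (hypothetical, not an Mtb gene): ATG GAA GAG GAA TAA
toy_cds = "ATGGAAGAGGAATAA"
toy_fraction = gaa_fraction(toy_cds)  # two GAA codons out of three Glu codons
```

      Comparing this fraction between genes required for intracellular growth and genes required for in vitro growth would be one way to test whether s2U-dependent codons are enriched in the former.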

      - As such the growth defect shown in macrophages would be more convincing if the authors also show the phenotype of complementation with WT mnmA.

      The reviewer raises a valid point. We note, however, that Rv3023c, a putative transposase, is downstream of MnmA and, unlike MnmA, Rv3023c appears to be dispensable for in vivo growth, according to the Tn-seq database. Therefore, it is likely that the intracellular growth defect is caused by loss of mnmA.

      An important consideration here is the universal nature of these modifications across the life forms. Any strategy to utilize these enzymes as the potential therapeutic candidate would have to factor in this important aspect.

      This is a valid point. Targeting a pathogen-specific system enables avoidance of the adverse side effects caused by many therapeutic reagents. There are a couple of Mtb modification enzymes that are specific to bacteria and critical for Mtb fitness (e.g., TilS). These enzymes represent ideal potential therapeutic targets to suppress Mtb intracellular growth.

      Reviewer #3 (Public Review):

      The work presented in the manuscript tries to identify tRNA modifications present in Mycobacterium tuberculosis (Mtb) using reverse transcription-derived error signatures with tRNA-seq. The study identified enzyme homologs and correlates them with presence of respective tRNA modifications in Mtb. The study used several chemical treatments (IAA and alkali treatment) to further enhance the reverse transcription signals and confirms the presence of modifications in the bases. tRNA modifications by two enzymes TruB and MnmA were established by doing tRNA-seq of respective deletion mutants. Ultimately, authors show that MnmA-dependent tRNA modification is important for intracellular growth of Mtb. Overall, this report identifies multiple tRNA modifications and discuss their implication in Mtb infection.

      Important points to be considered:

      - The presence of tRNA-based modifications is well characterised across life forms including genus Mycobacterium (Mycobacterium tuberculosis: Varshney et al, NAR, 2004; Mycobacterium bovis: Chionh et al, Nat Commun, 2016; Mycobacterium abscessus: Thomas et al, NAR, 2020). These modifications are shown to be essential for pathogenesis of multiple organisms. A comparison of tRNA modification and their respective enzymes with host organism as well as other mycobacterium strains is required. This can be discussed in detail to understand the role of common as well as specific tRNA modifications implicated in pathogenesis.

      The reviewer raises a fair point. However, with the exception of Chionh et al., the other studies cited here are not genome-wide characterizations of tRNA modification. We will add a discussion of the distribution of tRNA modification enzymes across multiple mycobacterium species and the implications of this distribution for pathogenesis to the revised manuscript.

      - Authors state in line 293 "Several strong signatures were detected in Mtb tRNAs but not in E. coli". Authors can elaborate more on the unique features identified and their relevance in Mtb infection in the discussion or result section.

      Thank you for the suggestion. We will lengthen the discussion of the RT-derived signatures observed in Mtb but not in E. coli, but the relevance of these modifications for Mtb pathogenicity remains speculative at this point.

      - MnmA is shown to be essential for E. coli growth under oxidative stress (Zhao et al, NAR, 2021). Along similar lines, MnmA-deleted Mtb struggles to grow in macrophages. Is oxidative stress in macrophages responsible for slow Mtb growth?

      This is an excellent hypothesis which we will raise in the revised manuscript.

      - Authors state in line 311-312 "Mtb does not contain apparent homologs of the tRNA modifying enzymes that introduce the additional modifications to s2U". This can be characterised further to rule out the possibility that other enzymes are specifically employed by Mtb to introduce additional modifications.

      The reviewer raises a valid point. As discussed above (Reviewer #1, pt 2), Mtb may employ distinct enzymes to generate certain tRNA modifications. Future mass spec-based analyses of Mtb tRNAs will be carried out to identify the precise chemical structure of the sulfurated uridine, and subsequent studies will attempt to determine the enzymes that account for the biogenesis of these modifications.

    1. Author Response

      Reviewer #1 (Public Review):

      This refinement of their model, coupled with the demonstration that the Sis1 J protein chaperone does not appear to play a direct role in the inactivation phase of the HSR, provide a significant advance over their earlier work.

      We are pleased that the reviewer is satisfied that our new results represent a significant advance.

      A main weakness is that while the evidence that Sis1 is important for fitness of heat-stressed yeast cells is reasonable, exactly how Sis1 achieves this is not clear. In a single sentence the authors suggest that Sis1 might be an orphan ribosome chaperone, partly based on its nucleolar localization, but provide no evidence for this. If this were true, then one might expect a reduction in ribosome content under stress conditions (because there are more ORPS to take care of because of translation stalling?) and a decreased rate of protein synthesis (yes, this happens; how much this is due to overall translation suppression vs there being fewer ribosomes to translate things is unknown and hard to test), which could be tested. Some further insights into this more general role of Sis1 would strengthen the authors' conclusions.

      We would like to make a distinction between the important biochemical roles for Sis1 in the cellular response to heat shock – which we explore elsewhere – and the role we are investigating here for the regulation of Sis1 expression by Hsf1. For new insights into the functional role of Sis1 as a chaperone for orphan ribosomal proteins, please see our recent preprint (Ali et al., https://www.biorxiv.org/content/10.1101/2022.11.09.515856v1). Here, we have focused on how Sis1 transcriptional regulation promotes fitness. Please see above for the description of the new mechanistic insight we have into the role of Sis1 expression tuning in controlling stress granules.

      Moreover, whether Sis1 plays a general role in the fitness of cells under stress has not been firmly established, i.e., is its mechanistic role the same in heat shock conditions and under nutrient stress conditions? Without knowing the mechanistic basis for how Sis1 maintains the fitness of heat-stressed cells, it is not possible to conclude that the same mechanism is at play in cells grown on a non-preferred carbon source.

      As described above, we have now provided evidence that the inability to properly tune Sis1 expression levels in the 2xSUP35-SIS1 strain results in disrupted stress granule homeostasis, linking a known function of Sis1 to a known process driven by nutrient stress.

      Figure 4: This is an ingenious experiment to study the subcellular localization of newly synthesized Sis1 in response to heat shock, compared to that of the heat-shock inducible Hsp70 Ssa1. However, based on the images presented in panel B it is hard to know how discrete the subnuclear distributions of Sis1 and Ssa1 really are, and ideally what is needed is to be able to analyze their localizations when both tagged proteins are expressed in the same cell, although this would obviously not be possible using the halo-tagged protein system. In addition, one would like to know the localization of Hsf1 in the cell at the same time. As it stands, these data seem overinterpreted, and it remains possible that some other event such as an inactivating post-translational modification of Sis1 under heat shock conditions might be involved in inactivating its function.

      To address this concern, we constructed two new imaging strains expressing Hsf1-mVenus/Halo-Sis1 and Hsf1-mVenus/Halo-Ssa1 (Hsp70) and used pulse-labeling followed by live lattice light sheet 3D imaging to resolve the subcellular localization of newly synthesized Sis1 and Hsp70 with respect to Hsf1 over a heat shock time course. Unfortunately, we cannot monitor newly induced Sis1 and newly induced Hsp70 simultaneously in the same cells with the HaloTag pulse labeling system. We found that a significantly greater fraction of newly synthesized Hsp70 colocalizes with Hsf1 than new Sis1. Thus, while we cannot directly image new Sis1 and Hsp70 in the same cell, we clearly observe a differential localization pattern with respect to Hsf1. These data are included in the revised Figure 4.

      One way to establish whether Sis1 nucleolar sequestration prevents it from acting on Hsf1 during the inactivation phase of the HSR would be to selectively disrupt its nucleolar localization signal while retaining its nuclear localization and determine how expression of such a mutant perturbs the inactivation kinetics of the HSR.

      Unfortunately, there is no known Sis1 nucleolar localization signal that we could use in the experiment you propose. In the preprint described above, we show that direct interactions with oRPs recruit Sis1 to the nucleolar periphery, but we do not yet know whether binding to oRPs is competitive with binding to Hsf1.

      Reviewer #2 (Public Review):

      This study aims to provide a needed update and validation of a previously outlined mathematical model that describes HSR/Hsf1 regulation. The purpose of the update is to incorporate the impact of newly translated proteins as negative regulators of Hsf1 following heat shock. A requirement for ongoing translation to mount the HSR and activate Hsf1 has been described in several recent studies. Moreover, the study addresses the role of the Hsp70 cochaperone Sis1 in HSR regulation, including its potential function in negative feedback regulation following heat-shock.

      The main strength of the study is that it combines quantitative modeling with a well-defined experimental system to generate data. Overall, the model appears to accurately reflect the behavior of HSR under the employed experimental conditions and provides an elegant example of a formalized model for this simple regulatory circuit. Another strength of the study is that it addresses the functional involvement of Sis1 in HSR/Hsf1 regulatory mechanisms and rules out Sis1 involvement in negative feedback regulation of Hsf1 following heat shock. This finding is of importance in light of the complexity of Sis1 involvement in HSR/Hsf1 regulation suggested by the literature. The authors also document a need for endogenous SIS1 promoter regulation during growth on non-fermentable carbon sources.

      The study is important for the advancement of Hsf1 research and it may provide inspiration for the study of other chaperone-titrated transcriptional mechanisms such as the UPR or bacterial stress sigma factors.

      We thank the reviewer for the generous evaluation.

      Reviewer #3 (Public Review):

      This paper follows other excellent work from the Pincus laboratory detailing the molecular mechanisms of Hsf1 regulation and extending experimental observations into predictive mathematical models. Overall, the work is top-quality, however, the findings are incremental in nature with respect to our understanding of the HSR and refine existing models rather than break new experimental or conceptual ground. Additionally, the relevance of the non-fermentable carbon source growth phenotype for the 2XSUP35pr-SIS1 strain is unclear with respect to HSR regulation.

      We thank the reviewer for this fair assessment of the work.

    1. Author Response

      Reviewer #1 (Public Review):

      Pelentritou and colleagues investigated the brain’s ability to infer temporal regularities in sleep. To do so, they measured the effect of the omission of an expected sound on brain and cardiac activity. Participants were presented with three different categories of sounds: fixed sound-to-sound intervals (isochronous), fixed heartbeat-to-sound intervals (synchronous), and a control condition without any regularity (asynchronous). When omitting a sound, they observed a difference in the isochronous and synchronous conditions compared to the control condition, in both wakefulness and sleep (NREM stage 2). Furthermore, in the synchronous condition, sounds were temporally associated with sleep slow waves, suggesting that temporal predictions could influence ongoing brain dynamics in sleep. Finally, at the level of cardiac activity, the synchronous condition was associated with a deceleration of cardiac frequency across vigilance states. Overall, this work suggests that the sleeping brain can learn temporal expectations and responds to their violation.

      We thank the reviewer for the very useful and informed comments, to which we carefully reply below.

      Major strengths and weaknesses:

      The paradigm is elegant and robust. It represents a clever way to investigate an important question: whether the sleeping brain can form and maintain predictions during sleep. Previous studies have so far highlighted the lack of evidence for predictive processes during sleep (e.g. (Makov et al., 2017; Strauss et al., 2015; Wilf et al., 2016)). This work shows that at least a certain type of prediction still takes place during sleep.

      However, there are some important aspects of the methodology and interpretations that appear problematic.

      (1) The methodology and how it compares to previous articles would need to be clarified. For example, the Methods section indicates that the authors used a right earlobe electrode as a reference. This is quite different from the nose reference used by SanMiguel et al. (2013) or in Dercksen et al. (2022). This could affect the polarity and topographies of the OEP or AEP and thus represents a very significant difference. Likewise, SOs are typically detected in a montage referenced to the mastoids. Perhaps the left/right asymmetries present in many plots (e.g. Figure 3) could be due to the right earlobe reference used.

      We thank the reviewer for raising this important point which has prompted us to clarify the reference choice in the manuscript both for completing the information about data recordings in our experiment and for emphasizing the influence of the reference on the EEG results and how they compare to previous reports.

      First, we would like to clarify that although EEG data is referenced to the right earlobe online, electrophysiological data from both earlobes were acquired and offline re-referencing to paired earlobes was performed. This is now clarified in the Methods section on page 26, lines 648-651 as follows:

      ‘Continuous EEG (g.HIamp, g.tec medical engineering, Graz, Austria) was acquired at 1200 Hz from 63 active ring electrodes (g.LADYbird, g.tec medical engineering) arranged according to the international 10–10 system and referenced online to the right earlobe and offline to the left and right ear lobes.’

      Additionally, after preprocessing, we performed common average re-referencing, as is common practice and recommended in the literature (see e.g. Niso et al., 2022), and hence the initial online referencing is no longer of relevance. Nonetheless, we agree with the reviewer that different online and offline referencing schemes could explain why some results in the literature are not optimally reproducible. We have clarified this point in the discussion on page 17, lines 408-411 as follows:

      ‘Finally, while we used largely similar pre-processing (i.e. filters) and experiment implementation (i.e. online and offline reference) as in Chennu et al. (2016), this was not the case for other studies with which direct comparisons are unwarranted.’

      Regarding the reference chosen for the SO analysis (linked earlobes online and common average offline in our case), we acknowledge that, as the reviewer mentioned, many groups indeed employ mastoid re-referencing for SO detection (e.g. Siclari et al., 2018; Schneider et al., 2020; Ameen et al., 2022). However, to the best of our knowledge, this is not a standard choice, as many other groups choose a linked earlobe reference for online SO detection and the mastoids only for offline SO detection (Ngo et al., 2013; Besedovsky et al., 2017; Ngo and Staresina, 2022). In addition, other recent studies used linked earlobe referencing (Bouchard et al., 2021) or common average re-referencing (Züst et al., 2019) for offline SO detection. In our study we opted for using the same average reference for SO detection and evoked potential analysis in order to be able to relate the results of the omission evoked response comparison to that of the SO analysis.
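
      The two re-referencing steps discussed above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the dictionary-of-lists data layout are hypothetical, and it assumes the left-earlobe channel was itself recorded against the right earlobe. It is not the authors' pipeline.

```python
def to_linked_earlobes(channels, left_ear):
    """Re-reference right-earlobe-referenced data to the mean of both earlobes.

    channels: dict mapping channel name -> list of samples (hypothetical layout),
              all recorded against the right earlobe.
    left_ear: left-earlobe samples, also recorded against the right earlobe.
    Because the right earlobe is implicitly zero, the linked reference is left_ear / 2.
    """
    return {name: [v - 0.5 * e for v, e in zip(sig, left_ear)]
            for name, sig in channels.items()}


def to_common_average(channels):
    """Subtract the across-channel mean from every channel at each sample."""
    names = list(channels)
    n_samples = len(channels[names[0]])
    avg = [sum(channels[n][t] for n in names) / len(names) for t in range(n_samples)]
    return {n: [v - m for v, m in zip(channels[n], avg)] for n in names}
```

      Because the common average subtracts whatever signal is shared by all channels, applying it to the right-earlobe data or to the linked-earlobe data gives the same result up to rounding, which is consistent with the statement that the initial online reference no longer matters after this step.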

      Also, the authors did not use the same filters in wakefulness and sleep, which could introduce an important bias when comparing sleep and wake results or sleep results with previous wake papers.

      We fully agree with the reviewer and thank him/her for this suggestion. We have now re-analysed the wakefulness data using a bandpass filter of 0.5-30 Hz as used for the sleep data. The chosen filtering range is commonly used in sleep research. Moreover, Chennu et al. (2016) employed a very similar filtering range (0.5-25 Hz) in an omission EEG study, whose results are similar to ours (Chennu et al., 2016). This new preprocessing resulted in a higher number of valid trials (average trial number: before N=245, now N=286) in wakefulness. Hence, the data from more participants could be used (before N=21, now N=23) and the statistical power of observed differences in our comparisons was improved. The Methods section has been updated accordingly on page 31, lines 763-764 as follows:

      ‘Continuous raw EEG data were band-pass filtered using second-order Butterworth filters between 0.5 and 30 Hz for the wakefulness and sleep session.’
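
      As a rough illustration of the filtering described above, the sketch below builds a 0.5-30 Hz band-pass from two second-order Butterworth sections (one high-pass, one low-pass) at the 1200 Hz acquisition rate. The cascade design, the single causal pass, and all function names are assumptions made for the example; the quoted Methods text does not say how the filter was implemented or whether zero-phase filtering was used.

```python
import cmath
import math


def butter2_biquad(kind, cutoff_hz, fs_hz):
    """Second-order Butterworth low- or high-pass section (bilinear transform)."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2.0)  # Q = 1/sqrt(2) gives a Butterworth response
    if kind == "low":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "high":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError("kind must be 'low' or 'high'")
    a = [1 + alpha, -2 * cos_w0, 1 - alpha]
    return [v / a[0] for v in b], [v / a[0] for v in a]  # normalise so a[0] == 1


def apply_biquad(b, a, x):
    """Direct-form I filtering of the sequence x (causal, single pass)."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def gain(b, a, freq_hz, fs_hz):
    """Magnitude of the section's frequency response at freq_hz."""
    z = cmath.exp(-1j * 2 * math.pi * freq_hz / fs_hz)  # this is z^-1
    return abs((b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z))


def bandpass(x, fs_hz=1200.0, low_hz=0.5, high_hz=30.0):
    """High-pass at low_hz followed by low-pass at high_hz."""
    hp = butter2_biquad("high", low_hz, fs_hz)
    lp = butter2_biquad("low", high_hz, fs_hz)
    return apply_biquad(*lp, apply_biquad(*hp, x))
```

      Using identical settings for the wakefulness and sleep sessions, as in the revised analysis, means that any latency shift or amplitude attenuation introduced by the filter affects both datasets equally.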

      (2) The ERP to sound omission shows significant differences between the isochronous and asynchronous conditions in wakefulness (Figure 3A and Supp. Fig.) but this difference is very different from previous reports in wakefulness. Topographies are also markedly different, which questions whether the same phenomenon is observed. For example, SanMiguel and colleagues observed an N1 in response to omitted but expected sounds. The authors argue that they observe a similar phenomenon in the iso vs baseline contrast, but the timing and topography of their effect are very different from the typical N1. The authors also mention that, within their study, wake and N2 OEPs were "largely similar" but they differ in terms of latencies and topographies (Figure 3A-B). It would be better to have a more objective way to explore differences and similarities across the different analyses of the paper or with the literature.

      We concur with the reviewer and reviewing editor, who both pointed out that the way we previously analysed (see our reply to the reviewer’s previous comment) and reported our data was sub-optimal. The new analysis of the wake data reveals more similarities with the MMN and to some extent with the omission literature (Figure 4). As requested, we also improved the description of the comparison of our results to those from the literature, in the Discussion section (pages 17-19, lines 391-458).

      (3) The authors applied a cluster permutation to identify clusters of significant time points. However, some aspects of this analysis are puzzling. Indeed, the authors restricted the cluster permutation to a temporal window of 0 to 350ms in wake (vs. -100 to 500ms in sleep). This can be misleading since the graphs show a larger temporal window (-100 to 500ms). Consequently, portions of this time window could show no cluster not because the analysis revealed an absence of significant clusters but because the cluster permutation was not applied there. Besides, some of the reported clusters are extremely brief (e.g. l. 195, cluster's duration: 62ms), which could question their physiological relevance or raise the possibility that some of these clusters could be false positives (there was no correction for multiple comparisons across the many cluster permutations performed). Finally, there seems to be a duplication of the bar graphs showing the number of significant electrodes in the positive and first negative cluster for Figure 2 Supp. Fig. 1.

      We thank the reviewer for raising this point. We have now performed cluster permutation statistical analysis over the entire -100 to 500 ms window in wakefulness, thus matching the temporal window used for the sleep data (Methods, page 34, lines 843-846). Please note that this modified temporal window was applied to the wake data for which the pre-processing had also been modified (see our reply to comment #1 above). With matching analysis for wakefulness and sleep, we now identify clusters of higher or similar significance compared to our earlier results (Cohen’s d for isoch vs asynch = 0.92 now and 0.67 before; for synch vs asynch = 0.91 now and 1.06 before). In addition, for the isoch vs asynch omission response comparisons, overlapping cluster periods are identified in wakefulness (114-159 ms) and sleep (85-223 ms). The relevant results are thoroughly described on pages 9-10, lines 202-210; page 11, lines 238-251, pages 38-39, lines 970-985.

      We would also like to mention that while multiple comparisons correction is performed across channels and latencies in the EEG using cluster permutation statistics, it is true that we do not perform multiple comparisons correction across the many comparisons. We now explicitly mention the lack of this correction for multiple comparisons in the Methods section page 34, lines 840-843 as follows:

      ‘Of note, the cluster permutation based multiple comparisons correction only applied across channels and latencies when comparing two experimental conditions, however no multiple comparisons correction was applied across the number of comparisons made in this study.’
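
      To make the logic of the cluster permutation correction concrete, here is a minimal single-channel sketch. It assumes paired data (one difference waveform per participant), a fixed cluster-forming threshold on the t statistic, and random sign flips to build the null distribution. All names are hypothetical, and the study's actual analysis clustered over channels as well as latencies, which this sketch does not attempt.

```python
import math
import random


def t_values(diffs):
    """One-sample t statistic at every time point (rows = participants)."""
    n = len(diffs)
    t = []
    for j in range(len(diffs[0])):
        col = [row[j] for row in diffs]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        t.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return t


def clusters(t, thresh):
    """Contiguous same-sign runs with |t| > thresh, as (start, end, mass)."""
    found, start = [], None
    for j, v in enumerate(t + [0.0]):  # trailing sentinel closes an open run
        inside = abs(v) > thresh
        if start is not None and (not inside or v * t[start] < 0):
            found.append((start, j - 1, sum(t[start:j])))
            start = None
        if inside and start is None:
            start = j
    return found


def cluster_permutation_test(diffs, thresh, n_perm=1000, seed=0):
    """Return (start, end, mass, p) per observed cluster, corrected over time points."""
    rng = random.Random(seed)
    observed = clusters(t_values(diffs), thresh)
    null_max = []
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1, 1))  # flipping a participant swaps the two conditions
            flipped.append([sign * v for v in row])
        masses = [abs(m) for _, _, m in clusters(t_values(flipped), thresh)]
        null_max.append(max(masses, default=0.0))
    results = []
    for start, end, mass in observed:
        exceed = sum(1 for m in null_max if m >= abs(mass))
        results.append((start, end, mass, (exceed + 1) / (n_perm + 1)))
    return results
```

      Each call controls the error rate only within one condition contrast, because the null distribution is built from the maximum cluster mass of that contrast alone. Running the procedure for several contrasts therefore leaves the family of comparisons uncorrected, which is the limitation acknowledged in the quoted Methods text.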

      (4) More generally, regarding statistics, the absence of exact p-values can render the interpretation of statistical outputs difficult. For example, the authors report a significant modulation of the sound-to-SO latency across conditions (p<0.05) but no significant effect of heartbeat peak-to-SO latency (p>0.05). They interpret this pattern of results rather strongly as evidence that the "readjustment of SOs was specific to auditory regularities and not to cardiac input". Yet, examining the reported chi-square values shows very close values between the two analyses (7.9 vs. 7.4). It seems thus difficult to argue for a real dissociation between the two effects. Providing exact p-values for all statistical tests could help avoid this pitfall.

      To assist the interpretation of statistical analysis results, we have now included exact p-values.

      Specifically, for SOs, we agree with the reviewer on the highly similar chi-squared values for the two analyses of Sound onset to SO peak and R peak onset to SO peak and have now included a comment in the discussion to reflect this on page 20, lines 478-480 as follows:

      ‘However, it should be noted that although not significant, we observed a trend of lower R peak to SO peak latencies during cardio-audio regularity compared to the other auditory conditions, possibly driven by the fixed relationship between heartbeat and sound in the synch condition.’
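
      The reviewer's point about the two chi-square values can be illustrated by converting them to exact p-values. The degrees of freedom are not stated in this exchange, so the value of 3 used below is purely an assumption for illustration, and the helper name is hypothetical. The function uses the closed-form expressions for the chi-square upper tail with integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum exp(-y) * sum_{j < df/2} y^j / j!
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc term plus a finite sum over half-integer powers of y
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(y ** (j - 0.5) / math.gamma(j + 0.5)
                               for j in range(1, (df - 1) // 2 + 1))
    return tail


ASSUMED_DF = 3  # assumption for illustration only; not stated in the text above
p_sound = chi2_sf(7.9, ASSUMED_DF)   # sound onset to SO peak
p_heart = chi2_sf(7.4, ASSUMED_DF)   # R peak to SO peak
print(f"p = {p_sound:.3f} vs p = {p_heart:.3f}")
```

      Under this assumed df, the two statistics give p-values of roughly 0.048 and 0.060, on either side of the 0.05 threshold. This is why a significant versus non-significant split by itself is weak evidence for a dissociation, and why reporting exact p-values helps.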

      Reviewer #2 (Public Review):

      This study was designed to study the cortical response to violations in auditory temporal sequences during wakefulness and sleep. To this end, the study had three levels of temporal sequence, a regular temporal sequence, an auditory tone that was yoked to the cardiac signal, and an irregular tone. The authors show significant EEG differences to an omitted tone when the auditory tone was predictable both during wakefulness and sleep.

      The authors analyze the ERP to the omitted tone as well as when aligned to the R-peak of the HEP. The analysis was comprehensive and the effects reported align with the interpretation given. Of particular interest was the fact that a deceleration of the heart rate was present for omissions when the auditory tone was yoked to the R-peak (synch) in all stages of wakefulness and sleep.

      We thank the reviewer for his/her positive judgment.

      However, one weakness was the rationale for the current study and how the results link to current theoretical frameworks for the role of interoception in perception and cognition. This was in contrast to the clear background and explanation to study the response to omissions for a predictable auditory sequence in wakefulness and sleep. It was unclear why the authors selected the cardiac signal to yoke their auditory stimuli. What is the specific motivation for the cardiac signal rather than the respiratory signal? This was not clear.

      In the revised Introduction section, we improved our description of these aspects, including the interaction between interoception and external stimulus processing. We hypothesized that cardiac signals would be more relevant than respiratory signals in coordinating temporal expectation because of existing prior experimental evidence thereof, as well as data showing a modulation of the neural response to heartbeat by levels of vigilance/consciousness, and the sharp cardiac R peak offering an ideal candidate for online temporal locking to administered sounds (see our detailed reply to the reviewer’s comment #2 below). However, we cannot exclude that respiratory signals could also be used by the brain to assist temporal regularities detection.

      Future studies may test for this possibility.

    1. Author Response

      Reviewer #1 (Public Review):

      Kozol et al adapt an important tool, in the form of the atlas, to the Astyanax research community. While broadly the atlas appears to correctly identify large brain regions, it is unclear what is the significance of the finer divisions. The external confirmations are restricted to just a few large brain regions (by independent human observer: e.g., optic tectum, hypothalamus. By molecular marker: hypothalamus only.). As such, interpretations of results from as many as 180 small subregions should be interpreted sceptically.

      The authors also suggest that some brain regions have increased in size during cavefish evolution (e.g., hypothalamus, subpallium). The analysis of progeny from a genetic cross of cave and surface morphs suggest a complex genetic program has evolved to control this variant set of brain structures. With the development of genetic manipulation tools in this species, an exciting series of experiments may link causal variants with brain development differences.

      MAJOR ISSUES

      Line 85+. Segmentation accuracy is not well established by the authors. For example, Figure S2 states that the pixel correlation is high between Astyanax populations. But the details of how this cross-correlation was done are sparse. Is the Y-axis here showing the fraction of pixels that are shared in the morphs? While the annotation appears to function similarly across morphs, the 80% machine:human correlation is difficult to put into context. On the one hand, this seems low. For what values should one strive? Are there common "mistakes" or differences in human & machine annotations that lead to certain regions being excluded? A discussion of these is warranted and will be useful to others who wish to use this approach.

      Line 87. "such as" is misleading since these were the only two antibodies used to confirm molecular definitions of regions.

      But more to the point, additional markers should be used to confirm more than just the ISL+ hypothalamic divisions.

      This is particularly warranted, as Fig 1d is not convincing. I believe that the yellow label is ISL; this is difficult to see in the figures. ISL is not ideal since this is widespread in the hypothalamus. There are no ISL-negative regions depicted, which would be necessary to demonstrate that the resolution of this subregion labeling tool is high. A complementary approach would be to find molecular markers that are more restricted than ISL which label only subsets of hypothalamic regions.

      Finally, do the mid/hindbrain ISL labeled regions correspond to known ISL+ subregions?

      We agree with the reviewer that the Islet1/2 assessment was insufficient for demonstrating automated segmentation accuracy and that the labeling was difficult to visualize in the previous version of the figure. We have addressed the reviewer's concern by adding new molecular markers for verification of segment accuracy and through a modified presentation of the original data. The first, and in our opinion most convincing, is the addition of more markers of known neuroanatomical regions. This required not only adding extra antibody stains to our brain atlas, but also optimizing a Hybridization Chain Reaction (HCR) in situ protocol that could be coupled with immunohistochemistry, permitting automated segmentation via total ERK registration and brain atlas inverse registration. This novel protocol showed corresponding localization of markers, such as 5-hydroxytryptamine (5-HT), gastrulation brain homeobox 1 (gbx1), and oxytocin (oxt), in the expected neuroanatomical areas. It should be noted that these markers included both large neuroanatomical areas as well as small, well-defined areas such as the superior, and also labeled disparate neuroanatomical loci throughout the brain. We also modified our original figure to better illustrate the regions that islet+ staining labeled. These markers show that islet1/2 labels precise regions of the hypothalamus which correspond to known expression patterns. The updated methodology can be found in lines 422-440, while the results can be found in lines 105-118 of the text, Figure 1 and Figure 1 – Figure Supplement 1a.

      We believe these two changes address the reviewer's concerns, and suggest that the neuroanatomical labels generated in this study faithfully label the Astyanax brain.

      The molecular and human-observed confirmations of brain regions suggest that the annotated borders of gross anatomical regions are correctly identified by the algorithm. However, data is not presented that indicates whether the smaller regions correspond to biologically meaningful compartments.

      We agree with the reviewer that our assessment of regional accuracy for automated segmentation necessitated additional markers, which labeled smaller, more refined compartments. To address this, we developed an HCR in situ hybridization strategy that was compatible with our brain atlas, and used several markers that label smaller regions, such as the 5-HT positive neurons of the dorsal raphe and oxytocin positive neurons of the medial preoptic region. Together, these results were consistent with our previous finding that anatomical regions confirmed by human observation and molecular staining did faithfully label the correct regions of the brain. These findings can be found in lines 105-118 in the text, along with Figure 1 and Figure 1 – Figure Supplement 1a-d. Together, we hope this shows that not only large neuroanatomical areas, but also finer areas are correctly labeled by CobraZ.

      Parameters used in CobraZ to perform the segmentation are not defined. More transparency is required here for others to replicate.

      We agree with the reviewer that the parameters used for CobraZ and Advanced Normalization Tools (ANTs) are necessary for reproducibility of our results. We have since added sentences to clarify that we did not change the original ANTs or CobraZ parameters from Gupta et al. 2018 (lines 474-475) and have added the CobraZ parameter file and ANTs bash scripts to our Dryad repository.

      Reviewer #3 (Public Review):

      In this manuscript the authors use novel techniques and analytical methods on an up and coming animal model for brain evolution. The manuscript utilizes the cavefish Astyanax mexicanus, which can provide future important insights into the field of neurobiology and in evolution in general.

      The authors however, only argue that Astyanax is a powerful system for functionally determining basic principles of brain evolution (which clearly it will be), but fail to actually describe what brain evolution insights Astyanax gives. The data is in the paper, but the interpretation needs refinement. This would be a much more valuable paper with a thorough evolutionary context based on the already existing, extensive literature. I believe this manuscript has the potential to be extremely impactful.

      We thank the reviewer for her positive critique of our manuscript, and more broadly for the thoughtful comments, the challenge to re-evaluate the way we have thought about our own data, and for pointing us in a scientific direction that is more impactful. We have spent a lot of time re-thinking this work to address the reviewer's critique, and believe that it is a far better study for it.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors of this manuscript aimed to systematically evaluate the pleiotropic effects of MCR-1-mediated colistin resistance. They evaluated the effect of MCR-1 and MCR-3 carried on different plasmids on antimicrobial peptides (AMPs) and assessed their ultimate effect on virulence. The authors find that MCR-1-mediated colistin resistance correlates with increased resistance against some host AMPs, but also increased sensitivity to others. The authors also find that MCR-1 alone is associated with resistance to human serum and to elements of the complement system. This highlights a potential selective advantage for MCR-1-mediated resistance to host immune factors and a potential for enhanced virulence.

      The methods have been well established before and adequately support their main findings. While determining the role of MCR-1 in a single genetic background is important to better understand its potential pleiotropic effects against a diversity of AMPs and in a variety of scenarios, the impact and significance of the results are partially ameliorated because different genetic backgrounds, particularly those most relevant to a clinical (or agricultural) context were not considered. The results depicted here are still a necessary and important step towards a more comprehensive understanding of the pleiotropic effects of MCR-1. But, interactions between plasmids and host genomes and their co-evolution can have important effects more generally. The authors do mention this in the discussion and suggest it to be an important avenue for future work. However, given the objective of the study and the clinical and agricultural context in which the authors have framed their work, it seems more relevant to include those distinct genetic backgrounds already here.

      The conclusions stemming from the results found in Figure 3, and Figures 4c and d seem too overreaching to me. The associated resistance to AMPs from pigs seems to be only strong enough against one of the five tested AMPs and hence concluding that these impose a strong selective pressure in the pig's gut seems unsubstantiated. Similarly, the difference in survival probability within their in vivo system, though statistically significant, seems to be very mild between their MCR-1 and empty vector control.

      Thank you for the comment. We agree on the effect of MCR-MOR on AMP susceptibility and have edited the paragraph by removing the lines on strong selective pressure in the pig gut. As regards the 4c and 4d results (4e and 4f in the revised version), it is interesting and statistically convincing that MCR increases bacterial virulence despite the cost of MCR expression. And importantly, this effect is even stronger in the case of LPS treatment where the immune system is stimulated, expressing diverse host AMPs (PMID: 19897755). This shows MCR-mediated advantages to bacteria in the complex host environment.

      Reviewer #2 (Public Review):

      Jangir et al test the hypothesis that resistance to the antimicrobial peptide (AMP) colistin can simultaneously increase resistance to other AMPS with related modes of action. Because AMPS comprise part of innate immunity, their central concern is that colistin resistance may compromise host defenses and thereby increase bacterial virulence. Their results show that MCR-1, whether expressed from naturally circulating or synthetic plasmids, can increase the MIC to AMPS from humans, pigs, and chickens, and impart fitness benefits at sub-MIC concentrations. In addition, they find that MCR-1-containing strains have increased survival in human plasma and are more lethal in an insect infection model.

      The conclusions of the paper are generally well supported by the results, but some aspects could be clearer and better defended with a few small additional experiments.

      Strengths:

      Using both synthetic and natural plasmids makes it possible to cleanly separate the effects of MCR-1 from the effects of other plasmid-borne genes or plasmid copy numbers. This helps confirm the causal role of MCR-1 on altered AMP susceptibility.

      Testing the survival of transformed isolates in human serum and in insects points to relevance in the more immunologically complex host environment where cells are exposed to a suite of factors that reduce bacterial survival.

      Thank you!

      Weaknesses/suggestions:

      Although increases in MIC are evident for different AMPS, the effects are generally modest. To address this, it might be helpful to use pairwise competition assays, as in Figure 1, to establish that even small changes to MIC are associated with clear selective benefits.

      Thank you for the suggestion. We agree that in some cases the change in MIC is modest, however, we would like to highlight that small-level changes in resistance have important clinical implications. For example, resistance mutations conferring a small change in MIC can ensure the survival of pathogenic bacteria in antibiotic-treated hosts (PMID: 30131514). Additionally, a comparison between competition assays (Fig 1) and MICs (Fig 2) clearly shows that small changes in MIC are associated with substantial fitness benefits. For example, for pSEVA:MCR-1, the fold change in MIC of CATH2 (chicken), PMAP23 (pig), and LL37 (human) ranges between 1.05 and 1.5, however, the competitive fitness ranges from 10% to 17%. This issue is discussed in the revised manuscript (lines 306-317, page 13)

      ….This would be especially helpful in assays with human serum and in Galleria where the concentrations of AMPS or other immune components are unknown.

      It is clear that MCR-1 increases resistance to serum and virulence (Figure 4). However, we agree with the reviewer that the selective benefits of MCR-1 in complex host environments are not known (i.e., serum or Galleria). We have revised the final paragraph of the discussion to reflect this limitation of our study (lines 370-382, page 15).

      Assays using human serum are interesting but challenging to interpret given the diverse causes of bacterial killing, including complement. Although this was partly addressed in Supplementary Figure 6, I found the predictions of these experiments unclear. First, I think these experiments are too central to be relegated to the supplemental materials; they belong in the main text. Secondly, it is important to explicitly spell out the expectations of using heat-killed serum (which will degrade any heat-labile components) or complement-deficient serum. It should be clearer under which conditions MCR-1-containing strains are predicted to do better or worse than controls.

      We have addressed this in the revised version. We have moved Supplementary Fig 6 to the main text, and have edited the text, clarifying the model prediction (lines 245-257, page 10).

      Galleria is a useful infection model for virulence, but it is unclear what drives differences between strains. First, bacterial numbers aren't measured in this assay, so it isn't known if increased virulence is due to increased bacterial growth or decreased bacterial clearance. As above, I think these assays would be stronger using the competition-based approach in Figure 1. This would indicate bacterial numbers through time and directly show the selective benefit associated with MCR-1. Second, it would be useful to elaborate on why MCR-1 increases virulence, especially any known similarities between Galleria AMPS and those tested in Figures 1 and 2. Overall, it would help if Galleria were less of a black box.

      We agree that the mechanism underlying increased virulence remains to be explored, and we already address this as a limitation in the discussion (lines 370-382, page 15). However, elucidating the mechanisms by which MCR-1 increases virulence would clearly be an interesting line of research moving forward.

    1. Author Response

      Reviewer #1 (Public Review):

      The adhesion of Leishmania promastigotes to the stomodeal valve in the anterior region of the sandfly vector midgut is thought to be important to facilitate the transmission of the parasites by bite. The promastigote form found in attachment is termed a 'haptomonad', although its adhesion mechanism and role in facilitating transmission have not been well studied. Using 3D EM techniques, the paper provides detailed new information pertaining to the adhesion mechanism. Electron tomography was especially useful to reveal the ultrastructure of the attachment plaque and the extensive remodelling of the flagellum that occurs. A few of the attached haptomonads were found to be in division, which is a novel observation. The attachment of cultured promastigotes to plastic and glass surfaces in vitro was found to involve a similar remodeling of the flagellum and was exploited to image the sequential steps in attachment, flagellar remodeling, and haptomonad differentiation. The in vitro attachment was found to be calcium (Ca2+) dependent. Based mainly on the in vitro observations, a sound model of the haptomonad attachment plaque and differentiation process is provided.

      We thank the reviewer for highlighting the significant progress we have made in dissecting the adhesion mechanism and flagellum restructuring in the Leishmania haptomonad.

      Reviewer #2 (Public Review):

      The study by Yanase et al. investigated the details of the 3D architecture of Leishmania haptomonad promastigote's adhesion to the midgut of the insect vector. The authors generated a dataset of images that reveal intricate details of the formed adhesion plaque and expanded the study with in vitro alternatives for exploring how strong, hemidesmosome-mediated adhesion of Leishmania promastigotes to surfaces can arise and be maintained. They show with unprecedented detail the ultrastructure of the attachment plaque. The in vitro dataset of the paper adds important details to the specific literature on how to explore micro/nanostructures involved in an important attachment step for this eukaryotic parasite. However, the in vitro data should be reconsidered in its discussion and conclusions as it does not support direct comparison with in vivo Leishmania forms as pictured by the authors. In general, the dataset presented in this manuscript adds valuable data and resources for the study of Leishmania promastigote adhesion to surfaces, especially to the thoracic midgut parts of its insect vector.

      The dataset of this paper is well-collected and robust, but some aspects of image analysis need to be clarified and extended. Also, the in vitro data from the manuscript will benefit from an extensive adjustment in its discussion. Points to focus on:

      We thank the reviewer for recognising the ultrastructural detail we have now provided of this cryptic parasite life cycle stage. Below we address each of your points in detail.

      1) The haptomonad promastigote is indeed a possible critical form for transmission, but it lacks formal demonstration still in all literature available. This should not be claimed without proper formal demonstration.

      We agree with the reviewer that any relationship between transmission and the haptomonad form has yet to be formally demonstrated. Hence, we revised the descriptions referring to the relationship between transmission and the haptomonad form (Line 22-23, 31 and 113-114).

      2) Literature available and cited in this manuscript regarding in vitro adhesion of culture Leishmania promastigotes does not provide direct evidence for haptomonad differentiation. Haptomonads are still a largely unknown promastigote form with no defined ontogeny. With that, to propose an in vitro haptomonad differentiation protocol, more detailed direct evidence of in vivo haptomonads will be necessary. The in vitro experiments available show how cultured promastigotes attach to surfaces. Detailed studies in vivo will be needed still to attribute the findings in vitro to haptomonads.

      We would like to highlight that promastigotes and haptomonads have morphological definitions within the literature and our cells are definitely more like haptomonads than promastigotes. As the reviewer highlights, the haptomonad-like cells we generate in vitro have an almost identical morphology and attachment plaque structure to those haptomonads we observed attached to the stomodeal valve. In addition, we have been able to watch individual cells that had a promastigote morphology acquire a haptomonad morphology and we believe this will provide future insights to the ontogeny of these forms. However, as there are currently no published molecular markers for haptomonads we have not been able to provide direct evidence other than the morphology and ultrastructure that in vitro attachment replicates in vivo haptomonad differentiation. Therefore, we have revised our nomenclature and now refer to the in vitro haptomonad-like cell. In the discussion, we have been careful to highlight that certain aspects of our model rely on in vitro data and therefore may not accurately reflect the situation in the sand fly.

      3) This manuscript will benefit by having a detailed description of how to analyze and get to the 3D models presented. This has a strong potential for usage beyond the Leishmania/sand fly field. Statistics should be made available with ease across the manuscript and with a dedicated section on methods.

      We added a detailed description of how to analyse the 3D models (Line 756-763), and added videos showing a rotated view of each 3D model (Figure 1—video 3 and 4, Figure 2—video 2, and Figure 3—video 2 and 4). We have deposited the SBF-SEM and tomography data on the Electron Microscopy Public Image Archive (EMPIAR; https://www.ebi.ac.uk/empiar/), enabling access to the raw data (Line 763-766). We have added a statistics section into the Materials and Methods (Line 864-868).

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Sampaio et al. tackle the role of fluid flow during left-right axis symmetry breaking. The left-right axis is broken in the left-right organiser (LRO), where cilia motility generates a directional flow that makes it possible to distinguish the left from the right embryonic side. By manipulating the fluid moved by cilia in zebrafish, the authors conclude that the key symmetry-breaking event occurs within 1 hour through a mechanosensory process.

      Overall, while the study undeniably represents a huge amount of work, the conclusions are not sufficiently backed up by the experiments. Furthermore, the results provided present a limited advance to the field: the transient activity of the LRO is well established, and narrowing down this activity to 1 hour (even though unclear from the presented data that it is a valid conclusion) does not help to understand better the mechanism of symmetry breaking.

      We thank Reviewer 1 for acknowledging the demanding experimental setup. However, we would argue that knowing the exact period that is most sensitive to fluid flow manipulations is a very important advance that we provide here, because this type of experiment gives us the physiological timing in a WT embryo. It is one thing to know that the system can respond to optical tweezers earlier than 5 ss and later than 5 ss, as the Yuan lab showed recently, but quite another to constrain the physiological timing at which the process occurs in a manner that is as unperturbed as possible. Our aim was the latter. Our rationale is that knowing the physiological time provides important clues. For example, we had these types of questions at the time: Is the physiological time before or after cell rearrangements occur? Does it fall in a directional or non-directional flow regime? Is it governed by a mild flow or a stronger one? Is it before or after dand5 becomes asymmetric? Some of these questions, whose answers we think we all know, could be challenged by our experiments, so it is indeed very important not to assume we know the answer, and to ask the question again in an unbiased way with every new technique available. We wanted to be unbiased, and we think that is the beauty of our time-window experiment. Indeed, it shows that the physiological time-window peaks at 5 ss, which is later than the Yuan lab’s calcium transient recordings and before dand5 asymmetric expression. In our opinion this is compatible and makes perfect sense: although the system already shows calcium transients earlier and can respond to lack of Pkd2 or optical tweezer cilia manipulations at 1 ss – 3 ss, it is from 4 to 6 ss, peaking at 5 ss, that it is most responsive physiologically to the fluid extraction and therefore to both mechanical and chemical perturbations.

      We have performed additional experiments, using smFISH on WT embryos to detect dand5 expression with cellular resolution, and we have quantified asymmetries in the number of dand5 transcripts as early as 6 ss (new Figure 7 and new author: Catarina Bota), which further support our time-window claim. Degradation of dand5 mRNA, usually a very fast mechanism, has been suggested to underlie the asymmetric dand5 expression. This new piece of evidence supports that the physiological breaking of symmetry is stronger around 5 ss (see the new discussion on this subject on page 27).

      Regarding the symmetry breaking: the fact that anterior angular velocity was the major difference between embryos that recovered without LR defects and those that did not reveals that angular velocity must be tightly regulated by cilia motility and CFTR activity to bring back fluid and flow directionality, which together confer the robustness of flow. This is now better explained in the manuscript. We agree that the novelty regarding angular velocity may seem incremental compared to our work from 2014, where we only analyzed speed (Sampaio et al, 2014). However, here we provide more resolution and detailed parameters of angular velocity per section of the LRO, as well as tangential and radial velocities, the components of angular velocity. The radial component shows a trend towards left anterior that is now discussed in the text as evidence for a left difference. The present work shows that anterior angular velocity has a major role in the successful recovery of the symmetry breaking process, which was not claimed before. Here we challenged the embryo to bring to light the most important parameters.

      Importantly, the authors do not provide any convincing experiments to back up the mechanosensory hypothesis because the fluid extraction experiments affect both the chemical and physical features of the LRO, so it is impossible to disentangle the two with this approach.

      We agree that the first extraction experiment (Figures 1-3 and Table 1) affects both mechanisms and does not disentangle them; that was, in fact, our goal for the first experiment: finding the exact time-window for symmetry breaking. However, in the second part of the work (Figures 4-5 and Table 2) we provide a 20,000-fold dilution experiment, which is very different from the extraction one. We apologize if this was not clear and hope to have made it clear this time.

      We must agree with the reviewer that chemosensing is not excluded. In fact, we had provided a paragraph in the discussion about EV secretion rates to tone down our claim, and we did acknowledge that secretion could still overcome the dilution we are causing. We think we had already addressed this problem in the previous eLife manuscript, but we have now discussed the possibilities and the experimental evidence that supports each of them (see page 28, last paragraph). The key experiment that does not fit with secretion is pointed out at the end, and we ask the reviewer to read it in the context of wildtype animals. We agree that both scenarios must be discussed and leave space for future data on mmp21 and CIROP. However, so far, in zebrafish we cannot favor chemosensing as much as mechanosensing; we can only wait for more discoveries and remain open.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Lee and colleagues address the participation of NBR1 in chloroplast clearance after treatment with high light intensity. The authors use NBR1 fused to reporter proteins (GFP, mCherry), with the aid of nbr1, atg7, and nbr1-atg7 mutants, in combination with immunogold labelling, to show localization of NBR1 to the surface and interior of photodamaged chloroplasts, followed by their engulfment in the vacuole, a process which is independent of ATG7. The combined use of ATG8 fused to GFP further shows that NBR1 and ATG8 are recruited independently to photodamaged chloroplasts. In addition, using mutant versions of NBR1 in combination with mutants lacking the E3 ligases PUB4 and SP1 and the toc132-2 and tic40-4 mutants, which lack members of the TIC-TOC complex of protein translocation to the chloroplast, the authors show that chloroplast localization of NBR1 requires the ubiquitin ligase domain (UBA2) of the protein, whereas the PB1 domain exerts a negative effect on NBR1 chloroplast association, yet neither the PUB4 and SP1 E3 ligases nor the TOC-TIC complex is required for NBR1 association with photodamaged chloroplasts. All these approaches are well described and strongly support the authors' conclusions that the loss of chloroplast envelope integrity allows the entrance of cytosolic ubiquitin ligases and the participation of NBR1 in photodamaged chloroplast clearance by a process of microautophagy. All these findings add valuable information to our knowledge of chloroplast homeostasis in response to light stress.

      To further support these conclusions, the authors perform a chloroplast proteomic analysis of the WT, nbr1, atg7, and nbr1-atg7 mutants. However, in contrast with the above results, the description of the proteomic data is rather confusing. The paragraph on Page 17 (lines 393-406) is hard to follow. The term "over-representation of less abundant chloroplast protein" is also quite confusing, as are the data in Fig. 6 and the supplements to this figure (what does the PCA analysis in Fig. 6-suppl. 1 show?). I wonder whether it would be possible to show all these data as supplementary and to present the data supporting the major conclusion of these analyses (if I understood correctly, that nbr1, atg7, and the double mutant have lower contents of chloroplast proteins) in a simpler and clearer format.

      Following the reviewer’s comments, we have rewritten the Results section describing the proteomic data to make it more concise and clearer. We have also modified Figure 6 to make it more concise and generated new graphs for Figure 6 supplemental figures 1 and 2.

      Reviewer #2 (Public Review):

      The authors conducted a wide-ranging series of experiments which lead to the conclusion that NBR1 is involved in the clearance of photodamaged chloroplasts. It is a novel finding because the role of NBR1 in this process was never documented. Notably, the NBR1-mediated clearance is only one of several possible mechanisms responsible for chloroplast turnover. This is not surprising, considering that the nbr1 mutants are viable. The work is arranged very well. The rationale of the subsequent experiments is logically justified and the outcomes are followed by clear conclusions. In consequence, the authors managed not only to observe the association of NBR1 with the chloroplasts but also to throw some light on the corresponding mechanisms. The manuscript contains numerous high-quality images from a confocal microscope and from a transmission electron microscope. All images are accompanied by statistical analysis of the respective microscopic observations, which greatly improves the credibility of the conclusions. In short, the authors demonstrate that NBR1 decorates not only the exterior but also the interior of damaged chloroplasts in an ATG7-independent way. Next, they establish that NBR1 and ATG8 are recruited to different populations of damaged chloroplasts, and they document differences in chloroplast turnover, differences in chlorophyll abundance and chlorophyll photochemical properties, as well as differences in the total proteome of the nbr1 mutant in comparison to the wild type and atg7 mutant in two light regimes (low light and high light). Finally, they exclude the requirement for the known E3 ligases PUB4 and SP1 for NBR1-mediated degradation and show that the NBR1 internalization relies on chloroplastic membrane rupture rather than on the TIC-TOC-dependent processes. In summary, the authors postulate that NBR1-mediated chloroplast clearance is a novel, not yet described mechanism and summarize it in a clear diagram.

      The work is interesting, the figures are convincing and the conclusions are justified by the results. It provides novel data on the function of the selective autophagy receptor NBR1 in plant cells. However, it also leaves the reader with some unanswered questions. The most important is the relative contribution of each of the chloroplast degradation routes to the turnover of these organelles in different stresses, light regimes, plant growth stages, etc. This is a difficult problem because the mutations in relevant genes have pleiotropic effects and it is difficult to separate the functions of the individual turnover routes. For example, the defects in core autophagy genes (like the atg7 mutant used in this study) result in an increased level of NBR1. These issues are not sufficiently addressed in the discussion.

      The reviewer is correct; indeed, we also detected higher levels of NBR1 in the atg7 mutant (Fig 2G). This could be, for example, the underlying reason why there are more chloroplasts decorated with NBR1 in the atg7 mutant than in complemented nbr1 plants 24h after high light treatment (Fig 1F). However, the higher frequency of photodamaged chloroplasts observed in atg7 (Fig 2D) supports a different scenario: the larger number of photodamaged chloroplasts that are not successfully repaired or degraded by canonical autophagy in atg7 become substrates of NBR1. The increased levels of NBR1 in the atg7 mutant, and how this could influence the effects seen in the mutants studied in this manuscript, are now discussed in lines 670-673.

      Reviewer #3 (Public Review):

      The authors use an impressive array of techniques to determine the role of the NBR1 autophagy receptor protein specifically in the clearing of photodamaged chloroplasts. The authors describe the mechanism(s) by which this receptor operates in this context and demonstrate that this NBR1-mediated process occurs independently of SP1 and PUB4 (whose own roles in other aspects of chloroplast autophagy have previously been shown). The authors further dissect the functional domains of NBR1 to identify which are important in this process.

      The major strength of this work is the myriad techniques used to approach the problem. The data are of high quality, and on the whole, well replicated and statistically analysed. In the main, these data substantiate the findings of the authors, although some findings are quite correlative/descriptive. However, the authors show good circumspection in their conclusions and discussion. One potential weakness is that the genetic data (use of mutants) rely on single mutant alleles, therefore whilst genetic linkage to the mutations is assumed, it cannot strictly be guaranteed. The authors performed effective genetic complementation to analyse the domain structure of NBR1 shown in Figure 7. It would have been good if complementation of nbr1 and atg1 mutants and/or alternative mutant alleles had been used for experiments described in Figures 1 to 6. Without this, I think even more circumspection regarding the data obtained from these single-allele mutants would be advised.

      We agree with the reviewer that more mutant alleles would have provided stronger support for our conclusions, but we would also like to highlight that the atg7-2 (Chung et al 2010), nbr1-2, and atg7-2 nbr1-2 mutants (Jung et al 2020) have been well characterized previously, and the nbr1-2 mutant has been shown to be rescued by the expression of fluorescently tagged NBR1 (Jung et al 2020). We are confident about the results on the localization of NBR1 in chloroplasts, not only because the fluorescently tagged NBR1 proteins are functional but also because we were able to corroborate the localization of NBR1 by using antibodies against the native proteins (Fig 2). That said, the reviewer does raise an important point and therefore we have acknowledged more explicitly the limitation of our conclusions based on the analysis of single mutant alleles in lines 630-631 of the discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The model put forward by the authors in this manuscript is a simple and exciting one, explaining the function of AGS3 as a negative regulator of LGN, acting as a 'dominant-negative' version of LGN. Overall, the results support the model very well, and the results shown in Fig 6, which clearly reveal the functional relevance of AGS3, add strength to the paper.

      We thank the reviewer for their enthusiasm regarding our finding that AGS3 acts as an endogenous dominant-negative to inhibit LGN. We appreciate their assertion that the results support the model and that the functional relevance to epidermal stratification is a strength.

      In Figures 3A and B, the authors claim that AGS3 overexpression leads to depolarization of LGN in epidermal stem cells. However, in the example provided in Figure 3A, the LGN signal appears to be stronger than the control, with more LGN still on the apical side (many would categorize this as 'apically polarized'). In the scoring shown in Figure 3B, I am not sure if 'eyeballing' is the right way to decide whether it is polarized/depolarized/absent. The authors should come up with a bit more quantitative method to quantify the localization/amount of LGN and explain the method well in the manuscript. A similar concern regarding the determination of the LGN localization pattern applies to the rest of figure 3 as well.

      We agree with this important critique about the methodology used to assess LGN expression patterns. While we have historically included categorical analyses like those used in Fig. 3A,B in past publications (Williams et al, NCB 2014; Lough et al eLife, 2019), we have also now performed additional, unbiased, quantitative measures of LGN fluorescent intensity, as described in greater detail above. We added these new data in Fig. 4C-J, while the data previously in Fig. 3A,B have now been redistributed between Fig. 3E,F (overexpression) and Fig. 4A,B (knockdown).

      Reviewer #2 (Public Review):

      To date, only a handful of studies have addressed the importance of AGS3, a paralog of the relatively well-characterized spindle orientation factor LGN. The authors now show that AGS3 acts as a negative regulator of LGN and propose that this activity could work through competition for binding partner(s). Remarkably, regulation is temporally restricted in such a way that the conserved role played by LGN in metaphase spindle orientation is unaffected. Instead, AGS3 regulates a post-metaphase function for LGN, namely Telophase Correction. The article is well-written, the experiments are performed at a high level, and the claims are generally supported by the data. Two main points of confusion are raised in the current version. 1) The authors show that AGS3 regulates cortical localization of LGN, but would need to clarify how LGN is being affected. 2) The authors propose in the discussion that AGS3 might exert its regulatory effect through competition for NuMA, an important binding partner for LGN, but would need to clarify how and why NuMA would be involved in Telophase Correction.

      We thank the reviewer for appreciating the novelty of our findings regarding the understudied LGN/pins paralog AGS3. With regard to the first point, as described earlier, we have added additional quantitative analyses of how AGS3 affects cortical LGN fluorescent intensity in Fig. 4C-J. We now show that AGS3 loss leads to broader and higher expression levels throughout mitosis, and therefore we have amended our model to soften the claim that AGS3 primarily operates during telophase correction. This renders the second point somewhat moot, but we have nonetheless expanded our Discussion to note that NuMA can be cortically recruited to the anaphase cortex independent of LGN (lines 531-542). We also contextualize our findings with the Reviewer’s own recent study, which proposes a “threshold model” of cortical Insc as a determinant of spindle orientation (Neville et al, 2023), and speculate that a similar model could apply in our system, perhaps with AGS3 binding and sequestering Insc rather than NuMA (lines 543-556).

      Reviewer #3 (Public Review):

      This paper examines the mechanisms that control division orientation in the basal layers of the epidermis. Previous work established LGN as a key promoter of divisions where one of the siblings populates the differentiated layers (perpendicular). This work addresses two important, related issues: the mechanisms that determine whether a particular division is planar vs perpendicular, and the function of AGS3, an LGN paralog that has been enigmatic. A central finding is that AGS3 is required for the normal distribution of planar and perpendicular divisions (roughly equal) such that in its absence the distribution is skewed towards the perpendicular. Interestingly, however, the authors find that AGS3 has no detectable effect on orientation if the orientation is measured at anaphase. This timing aspect builds upon previous work from this group demonstrating a phenomenon they term "telophase correction" in which the orientation changes at the latest phases of division (and possibly post division?). Thus AGS3 seems to exert its effect using these later mechanisms and this is supported by further analysis by the authors. Importantly, the authors show that AGS3 acts through LGN, based on localization data and an epistasis analysis. The function of AGS3 has been highly enigmatic, so resolving this issue, while providing a useful step towards understanding how the division orientation decision is made, makes for exciting progress towards an important problem. I found the overall narrative and presentation to be quite good and especially appreciated the thoughtful discussion section that did an excellent job of putting the results in context and speculating how unknown aspects of the mechanism might work based on current clues. With that said, I think there are some important issues that should be resolved.

      We thank the Reviewer for this excellent summary of our findings and appreciation of the significance of the issues that our study addresses.

      Regarding the orientation measurements, the authors should specify how the midbody marker was used to mark sibling cells, especially given the midbody can move following division. For example, how can the authors be confident that the siblings in the middle panel of 1A are correct and not an adjacent cell? Regarding quantification, it would be useful for the authors to comment on how the following would influence their measurements: 1) movements along the z-axis, and 2) movement of the nucleus within the cell.

      We have used this methodology for over a decade, and while it is not flawless, we have included several safeguards to ensure that sibling cells are correctly identified. We have added additional details to the Methods section (lines 867-869, 873-879).

      A similar question is how much telophase correction really happens in telophase. How confident are the authors that the process actually occurs during division and not subsequent to it? What is drawn in their previous paper and in Figure 7A implies that post-division movements may be important. It would be useful for the authors to comment on whether they can make the distinction and whether or not it might be important.

      Our intent in coining the term “telophase correction” was to imply that this process initiates, rather than completes, during telophase. We apologize for this confusion and have clarified this in the text (lines 80-82). Since most mammalian cells complete M phase in ~1h, with the longest time spent in prophase, in the absence of direct evidence to the contrary, it may be prudent to assume that telophase, like metaphase and anaphase, is relatively short, on the order of minutes. Since we cannot directly observe reformation of the nuclear membrane in our movies, we cannot be sure when telophase ends. Likewise, we do not currently have a suitable marker of the spindle midbody for live-imaging, so we cannot be sure when cytokinesis completes. That said, we feel confident that most of the reorientation is occurring prior to cytokinesis, because we have previously reported that the greatest changes in daughter cell positioning occur within the first 10-15 minutes of anaphase onset, when a gap in membrane-GFP/TdTomato is still visible (Lough et al, eLife, 2019). However, while we feel that our work raises many interesting questions about the timing of reorientation relative to specific mitotic stages (e.g. is the midbody asymmetrically positioned, inherited, or ejected?), these questions are beyond the scope of the present study.

      Does the division angle in the AGS3 OE experiment (Figure 1D) correlate with AGS3 levels within the cell?

      This is an interesting question, and indeed, our hypothesis would predict that it would. However, it is not straightforward to quantify AGS3 or mRFP1 levels, and as we explain in a new section of the Results (lines 212-237), we have some concerns that N-terminally tagged AGS3 may not be fully functional. We have added new data with C-terminally tagged AGS3-mKate2, which we feel provides even stronger evidence that mKate2+ cells show a planar shift compared to mKate2- cells (Fig. 3C,D). In the future, we could test this hypothesis at the population level by comparing division orientation profiles for AGS3-mKate2+ cells carrying either a non-targeting scramble or Gpsm11147 shRNA. We would predict that knocking down endogenous AGS3 while overexpressing AGS3-mKate2 should give an intermediate phenotype.

      I found the localization data to be the weakest part of the paper and feel that some reconsideration and reanalysis are warranted. First, the quantifications in Figures 2C, 3B, and 3F are unnecessarily vague scoring-based metrics. In 2C, "Localization pattern" should be replaced with membrane/cytoplasm ratio or an equivalent quantification. In 3B "LGN localization" should be replaced with apical/cytoplasmic and apical/basal ratios or equivalents. In 3F, "Polarized LGN frequency" should be replaced with apical/basal ratio or equivalent. It seems to me that non-AI processed data would be most appropriate for these quantifications unless such processing can be justified.

      This issue was raised by the previous two Reviewers and has been addressed by new data added to Figure 4.

      Second, it is important to note that the cytoplasmic localization of AGS3 does not allow one to conclude that AGS3 is not on the membrane. Unfortunately, high cytoplasmic signal can preclude the determination of membrane-bound signal.

      We agree with the Reviewer and have softened our language throughout the text.

      Finally, I had difficulty reconciling the images of LGN shown in Figure 3 with the conclusions made by the authors.

      We have added additional, representative images of LGN expression in control and AGS3 KD cells in Figure 4C-E.

      The challenge of the localization data is troubling because an important conclusion of the paper is that AGS3 acts via LGN. The localization data provided one leg of support for this conclusion and the other is provided by an epistasis analysis. Unfortunately, this data seems to be right on the edge because it is based on the difference between the solid and dashed blue lines in Figure 5B not being significant. However, we can see how close this is by comparing the solid and dashed red lines in the adjacent 5C, which are significantly different. Between the localization data, which doesn't seem clear cut, and the epistasis experiment, which is on the razor's edge, I'm concerned that the conclusion that AGS3 acts through LGN may be going beyond what the data allows.

      We appreciate the Reviewer’s comments about the importance of these two lines of experimentation: 1) AGS3’s effect on LGN localization, and 2) epistasis experiments between AGS3/Gpsm1 and LGN/Gpsm2. We feel we have significantly strengthened this first pillar with the additional data presented in Fig. 4C-J. Regarding the second point, we would like to emphasize that we present three lines of evidence for the existence of an epistatic relationship between LGN and AGS3: 1) the static division orientation data comparing LGN single KOs to both LGN KO + AGS3 KD and AGS3+LGN dKOs (Fig. 6B); 2) live imaging division orientation/telophase correction comparing LGN KOs to AGS3+LGN dKOs (Fig. 6C-E); 3) lineage tracing data comparing LGN KOs to AGS3+LGN dKOs (Fig. 7H,I). Further, we think the reviewer may have misconstrued the data presented in Fig. 5C (now Fig. 6C). The dashed lines indicate orientation at anaphase and solid lines 1h after anaphase, so the shift between dashed and solid lines indicates telophase correction, which occurs to similar (and statistically significant) degrees in both LGN single mutants and AGS3+LGN dKOs. Comparisons between the single and double mutant would be between red and magenta solid lines or red and magenta dashed lines, and neither of these is statistically significant. We realize that our use of dashed lines in Fig. 5B (now Fig. 6B), which we normally only use to refer to anaphase entry in live imaging data, may have caused this confusion. Therefore, we have changed all plots to solid lines in Fig. 6B, and use light and dark magenta, respectively, to differentiate between LGN KO + AGS3 KD and AGS3+LGN dKOs.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors took a comprehensive set of analyses to examine the relationship between pupil diameter / derivative and BOLD-signal during rest in the ascending arousal system nuclei in 72 young participants. Focus is on the locus coeruleus, ventral tegmental area, substantia nigra, dorsal and median raphe nuclei and the basal forebrain. Analyses were performed using various processing pipelines: canonical versus custom hemodynamic response functions, with/without smoothing, time to peak analyses and cross spectral power density analyses to define the time lag between both measurements. The authors could not replicate previous correlations between locus coeruleus BOLD and pupil measurements using standard analytic approaches, and also found no relationship between locus coeruleus BOLD and pupil measurements when using custom hemodynamic response functions. When using time to peak and cross-correlation analyses, the authors found that coupling between pupil size and AAS BOLD patterns increases with decreasing time to peak, when the two signals were close in time. The authors conclude that these findings suggest that pupil size could be used as a noninvasive readout of AAS activity under passive conditions.

      These authors did a thorough assessment, and described the methods and results well and in a balanced manner.

      Outstanding questions:

      • How reliable are these observations? Would we see the same findings in a different cohort or using a different sequence/field strength?

      • What is the independent association of each assessed nucleus with pupil dilation? That could be informative to understand their shared or unique role.

      We are grateful to the reviewer for their expert advice in helping us strengthen our manuscript. We agree with the reviewer that these two outstanding questions are important and we have done our best to answer these questions below. We believe that our manuscript has greatly improved, thanks to the reviewer’s suggestions for running these additional analyses.

    1. Author Response

      Reviewer #2 (Public Review):

      The availability of large collections of Mycobacterium tuberculosis (Mtb) isolates has enabled many important studies looking to identify mycobacterial genetic polymorphisms associated with anti-tuberculosis (TB) drug resistance, including both classical "resistance-conferring" mutations and novel "resistance-enabling" mutations. Importantly, these studies have expanded our understanding of mycobacterial genetic adaptations undermining chemotherapy, in many cases allowing for improved diagnostic tests and predictions of treatment failure. In this submission, Gao and colleagues adopt a different approach to the problem: although also applying a GWAS-type analysis, they instead attempt to elucidate polymorphisms implicated in poor outcomes of TB patients undergoing treatment for the drug-susceptible disease. Starting with a large dataset comprising 3496 samples with corresponding clinical (host) metadata, the authors generate Mtb whole-genome sequence data for 91 samples obtained from patients with "poor" outcomes and 3105 patients with "good" outcomes. These are used to identify 14 fixed and >230 unfixed mutations that might be associated with "poor" treatment outcomes, a conclusion which they argue is plausible given transcriptional evidence implicating many of the identified genes in the mycobacterial response in vitro to first-line drug exposure and/or hypoxia, both of which are considered relevant to clinical disease. Notably, they also identify a tendency for a greater proportion of "ROS mutational signatures" in unfixed mutations from "poor" outcome samples. Finally, incorporating these observations in a prediction model, the authors observe that the mycobacterial factors aren't adequate on their own but, when combined with key host factors - including patient age, sex, and duration of diagnostic delay (which have stronger predictive value) - they enhance predictive capacity. 
      In summary, this paper reports a novel approach yielding observations that offer tantalizing insight into the mycobacterial factors which might influence TB treatment outcomes independent of drug resistance; however, the following must be considered:

      (i) The manuscript provides little to no detail about how the samples were obtained, other than the fact that they comprise "pre-treatment" samples: are they all sputum samples? Were they induced? Similarly, no information is provided about sample propagation: were the samples cultured to achieve sufficient biomass for whole-genome sequencing? If so, in what growth media, for how long, and how many passages? Were all samples treated identically? And were they plated to single colonies - or are the "isolates" referred to throughout the manuscript actually heterogenous populations of potentially different Mtb clones obtained - and propagated - as a mixed sample? This information is critical given the potential that the identified polymorphisms - both fixed and (perhaps even more so) unfixed - might have arisen as a consequence of in vitro (laboratory) manipulation under standard aerobic conditions.

      Thanks for your encouraging comments. The requested information about sample propagation has been added to the methods section in the new version. For details, please see our response, above, to the essential revisions (Q1).

      (ii) A key question that arises from this study (and others like it) is whether causation has been adequately established. Ideally, the Mtb genotypes contained within samples obtained pre-treatment should be compared with samples obtained from the same patients following treatment - that is, when the "poor" outcome was manifest. The expectation is that the polymorphisms identified prior to initiation of therapy - especially the 14 fixed mutations - should be evident (even dominant) at the later stage when therapy failed (or at the subsequent presentation in cases of relapse). Recognizing that this is not easily accomplished, though, it seems fair to suggest that the perceived relevance of the identified mutations would be strengthened if the authors were able to provide any other evidence - perhaps from studies of drug-resistant Mtb isolates - supporting their inferred role in undermining frontline treatment.

      Thank you for these insightful questions. We sequenced the isolates obtained at the time of relapse for all 47 relapse cases and found that the 14 GWAS-identified fixed mutations were only detected in relapse isolates from the 13 patients whose first samples also contained the GWAS-identified mutations. None of the 14 mutations we identified were found in isolates from the other relapsed patients. We also searched for the presence or absence of these 14 mutations in published studies seeking noncanonical mutations associated with drug-resistant Mtb isolates [5-7]. None of the 14 mutations we identified were reported in any of these studies, but two of the genes (ctpB & metA) in which our mutations were found had been previously identified as potentially associated with first-line drug resistance.

      (iii) Related to the above, the authors make the valid point that their intention here was different from other studies which have deliberately utilized drug-resistant Mtb isolates to identify resistance-conferring and resistance-enabling mutations (such as in the study they cite by Hicks et al). It would be interesting to know, however, if any of the mutations identified in those other studies were also picked up in this work - and, if not, why that might be the case.

      As mentioned in our response to the previous question, none of our mutations were mentioned in prior studies. Our inference is that the 14 fixed mutations we identified had only limited effects on outcomes, which would explain why: they were not identified in previous studies; isolates from only 24.2% (22/91) of patients carried any of these 14 mutations; and none of the mutations were shared amongst all 22 patients.

      (iv) Finally, the analyses presented in this study are heavily dependent on the use of appropriate statistical methods to identify potentially rare genetic polymorphisms. However, as noted for sample processing (see my earlier comment above), there is very little detail provided about the methodology applied. This omission detracts from the interpretation, especially given that the predominance of lineage 2 (which contributes >75% of the isolates, with sublineage 2.3 constituting >50%) risks a lineage-specific association, rather than a more generalizable pathogenicity phenotype. Similarly, the heavy skew in the numbers of "good" (3105 samples) versus "poor" (91 samples) collections (approximately 34x difference in sample size) raises the possibility that mutations identified in the "poor" category might be artificially over-represented. More clarity in detailing the statistical methods is required to allay any concerns about the identification of candidate polymorphisms.

      Thank you for pointing this out. We have added details of our statistical methods to the methods section, and in the results section we have indicated the specific statistical methods used and the meaning of the statistical metrics.
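      The skew the reviewer describes can be checked directly from the counts quoted in this exchange. The following is only an arithmetic sketch; the variable names are illustrative and it does not reproduce the authors' statistical methods.

```python
# Sample counts quoted in the reviewer's comment (iv).
n_good = 3105
n_poor = 91

# Number of "good" outcome samples per "poor" outcome sample.
imbalance_ratio = n_good / n_poor

# Share of sequenced samples that belong to the "poor" outcome group.
poor_fraction = n_poor / (n_good + n_poor)

# Carrier fraction quoted in the response to point (iii): 22 of 91 patients.
carrier_fraction = 22 / 91

print(f"good:poor ratio = {imbalance_ratio:.1f}")
print(f"poor-outcome share = {poor_fraction:.1%}")
print(f"carriers among poor outcomes = {carrier_fraction:.1%}")
```

      The ratio comes out at roughly 34, matching the reviewer's "approximately 34x" figure, and 22/91 rounds to the 24.2% stated above.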

    1. Author Response

      Reviewer #1 (Public Review):

      Lammer et al. examined the effects of social loneliness, and longitudinal change in social loneliness, on cognitive and brain aging. In a large sample longitudinal dataset, the authors found that both baseline loneliness and an increase in loneliness at follow-up were significantly associated with smaller hippocampal volume, reduced cortical thickness, and worse cognition in healthy older adults. In addition, those older adults with high loneliness at baseline showed even smaller hippocampal volume at follow-up. These results are interesting in identifying the importance of social support to cognitive and brain health in old age. With a longitudinal design, they were able to show that increased loneliness was related to reduced brain structural measures. Such results could help guide clinicians and policymakers in designing social support systems that would benefit the growing aging population.

      The strength of the current study lies in the large sample size and longitudinal follow-up design. The multilevel models used to separate within and between subject effects are well constructed. Combining neuroimaging data with behavioral changes provided further evidence that social loneliness may be related to accelerated brain aging. Stringent FDR correction, Bayes factor comparison, and the additional analyses for sensitivity showed the robustness and credibility of the results.

      Thank you for a thorough and overall positive evaluation of our manuscript and the constructive feedback. We considered all of your comments valuable; please see the point-by-point responses below for more details.

      Weaknesses of the study were related to the interpretation and discussion of their findings.

      1a) Social loneliness is a relatively little-studied factor in cognitive ageing, and the authors should consider expanding the discussion, with some additional analyses, as to how their results could be used by clinicians and older adults to monitor social behaviors.

      We agree with the reviewer and are thankful for these suggestions. We have run additional analyses using the clinical cut-off of the questionnaire on social isolation and added them, with their interpretation, to the results and discussion sections. Please see our responses to questions 2a) and 3a) below, as well as those in section b) addressed to this reviewer, for details of how we implemented the reviewer’s advice.

      2a) The authors examined the interaction between baseline and age change to see if higher baseline loneliness was associated with accelerated decline. The interaction was significant, but the authors did not further explore the interaction effect, which may have clinical significance. The authors should consider identifying a cut-off point in LSNS that suggests persons scoring less than this score on the LSNS may be at greater risk of accelerated brain decline than others. Such a cut-off point is important for clinicians, as well as for future researchers to compare their results.

      2a) Thanks to your recommendation, we decided to explore differences between handling LSNS as a categorical (using the standard threshold of 12) and continuous variable and recalculated all LMEs on HCV and cognitive functions with LSNS coded dichotomously. We found the results to be similarly good in detecting adverse effects of social isolation (see new Tables S16-18). The interaction of categorical LSNS with change in age on HCV tends towards showing an effect but does not reach significance even before FDR-correction.
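      The dichotomous coding described above can be illustrated with a minimal sketch. The function name, the parameter names, and the direction of the comparison are assumptions made for illustration: the reviewer's comment speaks of persons scoring less than the cut-off being at risk, but the manuscript's scoring direction may differ, so the direction is left as a parameter.

```python
from typing import Iterable, List


def dichotomise_lsns(
    scores: Iterable[float],
    threshold: float = 12,
    isolated_if_below: bool = True,
) -> List[bool]:
    """Return True for each score classified as socially isolated.

    Hypothetical helper: `threshold=12` follows the cut-off named in the
    response above; `isolated_if_below` is an assumption and should be
    flipped if higher scores indicate more isolation.
    """
    if isolated_if_below:
        return [score < threshold for score in scores]
    return [score > threshold for score in scores]


example_scores = [5, 12, 20]
print(dichotomise_lsns(example_scores))
```

      The resulting indicator could then replace the continuous score as a predictor, which is the comparison the response reports in Tables S16-18.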

      As cut-off points are central to clinical work, we are convinced that this expansion has greatly improved our study and increased its value to our readers, and we are very grateful for this valuable question.

      Our analyses indicate that the cut-off can be employed in clinical settings to detect social isolation that might harm patients’ brain health.

      However, this does not answer another important question, namely which public health strategy is most suitable to target social isolation for preventive purposes. Should it focus on the most isolated individuals (i.e. those categorised as socially isolated) or pursue a population strategy (Rose et al., 2008)? This is the topic of ongoing research in our group and we hope to answer it in future work. For now, we ran additional models testing an interaction effect of dichotomous LSNS with continuous LSNS. Finding evidence for such an interaction effect would suggest that having less social contact has stronger negative effects for those who are categorised as socially isolated. Roughly speaking, is it worse to have one instead of two reliable friends than it is to have four instead of five? If this were the case, it would point public health towards a high-risk rather than a population strategy. We did not find any evidence for such an interaction effect and thus cannot say that more social contact ceases to be beneficial beyond the threshold score of 12. In addition to the new results, we have expanded on this in the discussion section, where it now reads: “We showed that the established LSNS cut-off can be employed by clinicians to identify subjects likely to suffer adverse effects due to social isolation. However, the absence of evidence for more pronounced negative effects of less social contact amongst those that are deemed socially isolated by the cut-off renders a public health strategy focused on high-risk individuals questionable.”
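      One way to write down the interaction model described above is sketched below. The notation is an assumption introduced for illustration and is not taken from the manuscript: $y_{ij}$ is the outcome (for example hippocampal volume) of participant $i$ at timepoint $j$, $L_{ij}$ the continuous LSNS score, $D_{ij}$ the indicator for being classified as socially isolated by the cut-off, $\mathbf{x}_{ij}$ the remaining covariates, and $u_i$ a participant-level random intercept.

```latex
y_{ij} = \beta_0 + \beta_1 L_{ij} + \beta_2 D_{ij}
       + \beta_3 \,(L_{ij} \times D_{ij})
       + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma}
       + u_i + \varepsilon_{ij}
```

      Under this sketch, the slope of the outcome on $L_{ij}$ is $\beta_1$ for participants with $D_{ij}=0$ and $\beta_1 + \beta_3$ for those with $D_{ij}=1$, so a non-zero $\beta_3$ would correspond to the interaction effect for which no evidence was found.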

      3a) Although it was not directly tested in the paper, LSNS scores did not seem to change with increasing age (Table 1). This general stability of LSNS scores in older adults should be discussed further. The authors should consider how their relatively healthy and high SES sample may be less vulnerable to loss of family or friends in old age, making this sample sub-optimal for the question they have. The significance of the subject effect suggests that some individuals still experience a loss of social connectedness. The authors may want to elaborate on this and give some explanations of such subject differences in the ageing effect on social loneliness. Although stress was not a significant mediating factor, is it related to baseline loneliness or changes in loneliness in the current sample?

      Concerning the link between change in age and LSNS, we indeed found a statistically significant effect of age change on higher social isolation in an ancillary LME. However, as the reviewer noticed, the per-year effect is very small, meaning that a participant would need to age more than 20 years to score one point higher on the LSNS sum score (see new Table S2, see also answers below to questions 4a and 3b). We therefore tend to agree that in our sample, higher age does not affect social isolation substantially.

      Furthermore, we very much appreciated your recommendation to further discuss how our relatively high SES-sample might be less vulnerable to loss of social contact during the aging process. As a foundation for this discussion, we investigated the link between SES and LSNS using an LME and found the association to be highly significant (see new Table S2). Furthermore, we added a table showing which percentage of our participants fell into the SES quintiles that would be observed in a fully representative German sample to help our readers to interpret our findings (see new Table S3). Following your advice, we have added a comment highlighting how the relatively high SES of our sample might have contributed to this in the limitations section: “As we found higher SES to be associated with lower LSNS scores, this relatively high SES sample might have led to underestimation of the detrimental effects of social isolation and increases in social isolation in the aging process.”

      Regarding the importance of chronic stress to social isolation, we found neither a mediating effect of stress nor a significant simple association between TICS and LSNS scores (see new Table S2). We are hesitant to attribute this finding to the incorrectness of the stress-buffering hypothesis, as the missingness in stress data makes all interpretations of analyses involving TICS scores problematic. We have expanded on this in the discussion section and added emphasis to the importance of also pursuing other mechanistic theories in our discussion, where it reads: “we could not find evidence that social isolation affected hippocampal volume through higher chronic stress measured with questionnaires, a hypothesis put forward by the stress buffering theory (Kawachi & Berkman, 2001). These latter analyses suffered from small sample sizes and a limited number of timepoints. Nonetheless, the lack of any significant link between chronic stress and social isolation (see Table S2) is hard to align with the stress-buffering hypothesis in spite of the missingness in the TICS.”

      4a) The presentation of longitudinal data (Figure 1) lacks dimensionality. The scatter plots presented here are more suitable for cross-sectional studies and could cause confusion regarding the interpretation of the results. The authors should consider individual growth curves or spaghetti plots in visualizing change within subjects.

      We are grateful for your advice to visualise individual developments in social isolation and outcome measures over time in spaghetti plots and have done so to give our readers insight into these developments (see new Fig. S1). As you had assumed, there is no unequivocal pattern of increasing social isolation over time (see also answer to 3a). In addition, we decided to stick with presenting the results of the linear mixed-effects modeling using scatterplots in Figure 1, as this is regarded as the most appropriate visualization of the tested effectors. Please see also our response to 5b.

      Reviewer #2 (Public Review):

      The paper by Laurenz Lammer and colleagues used cohort data to investigate the cross-sectional and longitudinal association between loneliness and brain structure and cognitive function. The main finding was that baseline social isolation and change in social isolation were associated with smaller hippocampus volumes, reduced cortical thickness, and poorer cognitive function. Given that more and more people feel lonely nowadays (e.g., due to the pandemic), the study by Lammer and colleagues addresses a highly relevant health concern of our time.

      Significant strengths of the study:

      • large cohort;

      • the cross-sectional and longitudinal analyses confirmed the findings;

      • the study was preregistered;

      • the study included men and women;

      • analyses were sound and controlled for essential confounders.

      Thank you for your time to thoroughly review the manuscript and for the encouraging comments. Please see below how we implemented your advice.

      The major weaknesses of the study:

      1a) it is unclear whether loneliness causally contributes to brain structure and cognitive function;

      Indeed, based on structural equation analyses of the available data from this cohort, we could not find strong evidence for either causality (social isolation causes brain/cognitive decline) or reverse causality (brain/cognitive decline causes social isolation). This could be due to a lack of power to detect such effects, given the drop in sample size for these analyses. Overall, regarding these two competing hypotheses, we see some minor indication of support for causality of social isolation in our data due to the presence of robust and significant associations in our very healthy sample, the absence of clear increases in effect size when including cognitively less healthy participants and the absence of clear decreases in effect sizes when only including participants with high MMST scores. Accordingly, we added this concluding synopsis to our paragraph on causality in our discussion: “Still, overall these results only add a modicum of corroboration to the case for a causal role of social isolation.” and pointed towards the key role of RCTs in understanding causality in this regard: “Intervention studies will be the gold standard to provide evidence with regards to the causal role and effect size of social isolation.”

      2a) the factors that may cause loneliness are unclear.

      Thank you very much for encouraging us to shed some light on participant characteristics of potential relevance to social isolation. Starting from the impulse to look into marital status and employment, we also investigated links to socioeconomic status, migration background, age at baseline, change in age, gender, living alone and the number of persons living in the participants’ dwelling. We found all of these factors except for gender and migration background to be significantly linked to social isolation. Results are presented in Table S2 and briefly referred to in the results section: “In our sample, social isolation was positively correlated with not living alone, being married, the number of persons living in the participants’ dwelling, being gainfully employed, younger baseline age and less change in age and being married but not to gender or having a migration background. See Tables S1-2 for descriptive statistics and details of the associations. To contextualise the observed link to SES, a comparison of SES category frequencies in LIFE-Adult and a fully representative sample (Lampert et al., 2013) is provided in Table S3.” We also added to the discussion: “Existing and future research on reasons for and the role of social isolation in health and disease should provide guidance for the urgently needed development and evaluation of tailored strategies against social isolation and its detrimental effects.”

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses of the study include:

      1) There are no data supporting a role for insulin regulation of microtubule-dependent GLUT4-containing vesicle movement. The data in Fig.2B do not support a difference in the number of "moving" GLUT4 vesicles between basal and insulin-stimulated fibers. I found the statement on line 103 that they "observed a ~16% but insignificant increase" to be confusing. These data do not support an effect of insulin on the number of moving GLUT4 vesicles that can be detected in an individual experiment. There is also no effect of insulin on GLUT4 vesicles in the data reported in Fig.S2D, Fig.S5B, and Fig.S5F. However, the data in Fig. 2C suggest there was a consistent increase in "moving" vesicles in insulin-stimulated conditions in 4 independent experiments (how are these data normalized?). Because the basis of insulin regulation of glucose uptake is the control of GLUT4 translocation to the plasma membrane, the authors need to clarify their thinking on why they do not detect robust insulin effects on GLUT4 dynamics in the individual experiments. Is it that they are not measuring the correct parameter? That the assay is not sensitive to the changes?

      The small (or absent) effect of insulin distracts a bit from the findings that there is microtubule-dependent GLUT4 movement in basal and stimulated muscle fibers, and that disruption of this movement by depolymerization of microtubules or Kif5b knockdown blunts GLUT4 translocation. As noted above, the data strongly support microtubule-dependent GLUT4 dynamics as permissive for insulin-stimulated GLUT4 translocation even if these dynamics might not be a target of insulin action.

      In light of the reviewer's comment, and to avoid confusing or distracting readers, we have removed figure 2C showing the effect of insulin based on pooled data across all our independent experiments. We discuss several possibilities for the lack of a significant insulin effect on GLUT4 movement in individual experiments in the discussion section (lines 342 to 361 in TC version of MS). The discussion has been updated to reflect the points raised by the reviewer. More sensitive techniques than those currently available in our lab are required to firmly conclude whether microtubule-based GLUT4 trafficking is directly regulated by insulin.

      2) The analyses of GLUT4-containing structures are not particularly informative. Co-localization with other markers (beyond syntaxin6) are needed to understand these structures. Defining structures as small, medium or large is incomplete. In particular, it is important to probe the microtubule nucleation site clusters for other membrane markers. Transferrin receptor? IRAP?

      While our analysis based on structure-segmentation clearly demonstrates a microtubule-dependent effect on GLUT4 localization, we completely agree that additional work including co-labelling of GLUT4 and various compartment markers is required to fully understand the localization changes observed for GLUT4-containing structures upon microtubule disruption. However, for practical reasons, it is not currently feasible for us to complete these analyses within a reasonable time-frame, so we will reserve this for future studies.

      3) The Kinesore data do not support the authors' hypothesis. The data show that Kinesore increases the amount of GLUT4 in the plasma membrane of basal cells and that insulin further increases plasma membrane GLUT4 to the same extent as it does in control cells. How does that provide insight into the role of microtubules (or kif5b) in GLUT4 biology? Why does Kinesore increase plasma membrane GLUT4? Is it an effect of Kinesin 1 on GLUT4 vesicles? Kinesore is reported to remodel the microtubule cytoskeleton by a mechanism dependent on Kinesin 1. Is that the reason for the change in GLUT4?

      To better understand the effect of kinesore on GLUT4-dependent glucose uptake, we have now incubated EDL and Soleus muscles ± kinesore and ± insulin and measured 2-DG uptake (GLUT4 translocation and glucose transport are considered the rate-limiting step for 2-DG uptake in incubated muscles due to the lack of muscle perfusion in this model) and proximal insulin signaling. In contrast to the enhancing effect on membrane GLUT4 observed following kinesore treatment in basal and insulin-stimulated L6 cells, kinesore did not stimulate basal 2-DG uptake in EDL and Soleus. Furthermore, kinesore markedly impaired insulin-stimulated 2-DG uptake (figure 4B). We also tested the effect of 2h kinesore treatment in differentiated primary human myotubes. In this model, kinesore reduced basal glucose uptake and blocked the insulin effect (figure 4C). Together, this suggests that kinesore inhibits GLUT4-dependent glucose uptake in adult muscle and primary human muscle cells, presumably by inhibiting the binding of GLUT4-containing cargo, despite kinesore also having an activating effect on Kinesin-1 motor function. This possibility is discussed in the current version of the manuscript (lines 177-180, 203-211). These data are consistent with the KIF5B knockdown data in L6 and support a necessary role of this motor protein in skeletal muscle GLUT4 trafficking.

      To better understand why kinesore led to increased rather than decreased GLUT4 translocation in L6 cells, we also disrupted the microtubule network using nocodazole and colchicine prior to kinesore stimulation. Surprisingly, kinesore stimulation enhanced membrane GLUT4 even in microtubule-disrupted L6 cells, indicating that the effect of kinesore on GLUT4 translocation is microtubule-independent in L6 cells. With three of four data sets supporting a necessary role of Kinesin-1 motor proteins in GLUT4 trafficking, including the adult muscle data, we conclude:

      …our shRNA data in L6 myoblasts and kinesore data in adult muscle support the requirement of KIF5B-containing Kinesin-1 motor proteins in insulin-stimulated GLUT4-dependent glucose uptake in skeletal muscle.

      However, we would also like to include the discrepant effect of Kinesore in L6 myoblasts as this may be useful information to others using this compound and/or studying GLUT4 in cultured cells.

      4) The analysis of Kif5b is a bit cursory. Depolymerization of microtubules in muscle fibers essentially blocks all GLUT4 movement (only the insulin condition is shown in Fig.2B but I assume basal would be equally inhibited), and fully inhibits insulin-stimulated glucose uptake in muscle fibers. What are the effects of nocodazole in L6 cells (the cells used for the kif5b studies), and are they similar in magnitude to kif5b knockdown? Those data would identify whether there are non-Kif5b microtubule-dependent effects.

      To address the magnitude of reduced insulin-stimulated GLUT4 translocation in microtubule-disrupted L6 cells, we investigated the effect of nocodazole (13 µM) and colchicine (25 µM) on GLUT4 translocation in L6 cells.

      Insulin stimulated GLUT4 translocation was reduced but not blocked by either nocodazole or colchicine. This is in accordance with previous in vitro studies in 3T3 adipocytes and muscle cells (PMID: 11085918, PMID: 11145966, PMID: 24705014). Overall, these data still support that Kif5b is a major microtubule motor protein regulating GLUT4 translocation across cell-types.

      5) The authors need to show that the fibers isolated from the HFD mice remain insulin-resistant ex vivo by measuring glucose uptake. It is possible that once removed from the mice they "revert" to normal insulin-sensitivity, which might contribute to the differences reported in Fig5.

      This is an important point. In figure 5 figure supplement 1E, we show that the fibers isolated from the diet-induced obese mice display impaired insulin-induced p-Akt Thr308 and p-TBC1D4 Thr642 after isolation and in vitro culture. This shows that the insulin resistance is present at the muscular level and is preserved after isolation and in vitro culturing.

      6) Although it is interesting that the authors have included the insulin-resistance models/experiments, they are not well developed and therefore the conclusions are not particularly strong.

      In this study, we induced insulin resistance by two different means (C2 ceramide treatment and diet-induced obesity) and demonstrated at the level of p-Akt and p-TBC1D4 in cultured muscle fibers that we successfully achieved insulin resistance in our models. In particular the high fat diet model is arguably the most common in vivo model of obesity-linked insulin resistance. Thus, we were able to study GLUT4 trafficking on microtubules in normal vs. insulin-resistant muscle fibers and found this to be impaired in insulin-resistant muscle. Although one could always have done more, we believe that our data on adult muscle GLUT4 movement in insulin-resistance are robust, novel and do support our conclusions and title.

      7) The data do not support the title.

      We respectfully disagree. See our reply to comment 6 above.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The study finds Lyn to be degraded more efficiently via the proteasome and to be more tightly controlled by phosphatases when compared to Lck. However, rather than interpreting the findings as distinct kinase-intrinsic properties, one could attribute the slower degradation and stricter PTP control of Lyn to the fact that Lyn is the principal and predominant SFK in B cells and thus a "standard target" of the B-lymphoid molecular machinery, to which it is better adapted.

      We respectfully disagree with the reviewer’s comment that our interpretation is limited to “kinase-intrinsic properties”. In many points within the manuscript we refer to the “B-lymphoid molecular machinery”. More specifically:

      • Lines 62-64 in the original submission (lines 60-61 in the revised manuscript): “….enzymatic promiscuity of SFKs can be buffered by their differential susceptibility to regulatory control mechanisms designed for keeping global SFK activity levels under strict control….”

      • Lines 113-114 in the original submission (lines 137-138 in the revised manuscript): “Lck and Lyn differ in the efficiency for signal ignition and in their susceptibility to regulatory mechanisms in B-cells”

      • Lines 135-136 in the original submission (lines 159-160 in the revised manuscript): “Thus, the proteasomal degradation machinery constrains the abundance of Lyn, but not Lck, within B-cells.”

      • Lines 162-163 in the original submission (lines 185-186 in the revised manuscript): “Collectively these data show that the BCR signaling machinery is more responsive to the action of Lyn, at the same time imposing stricter regulation on its expression and activity levels.”

      • Lines 475-477 in the original submission (lines 527-528 in the revised manuscript): “…identified specialized control mechanisms designed to keep Lyn, but not Lck, activity levels under strict control.”

      However, we cannot rule out, as a mutually inclusive scenario, that intrinsic SFK features contribute to their differential regulation by cellular mechanisms, a possibility that we also refer to in the manuscript. More specifically:

      • Lines 335-337 in the original submission (modified text in the revised version, lines 372-374): “On one hand there is the total amount of SFK activity within the cell, and on the other the individuality of SFK family members, dictated by intrinsic molecular features.”

      • Lines 477-478 in the original submission (lines 528-529 in the revised manuscript): “These data may signify that SFKs have been evolutionarily diversified to best suit the needs of the cellular environment they are expressed in…”

      Based on the reviewer’s comment, and to clarify further, we have modified the revised version of the manuscript (lines 372-374) as follows:

      “On one hand there is the total amount of SFK activity within the cell, and on the other the individuality of SFK family members, dictated by intrinsic molecular features and/or adaptation to cell-specific regulatory mechanisms.”

      We hope that our clarifications satisfy the reviewer.

      2) Venn diagram depicting differentially regulated transcripts between Lck- and Lyn-expressing cells, it does not seem like Lck is able to regulate pathways which are not "canonically" regulated by Lyn.

      and

      As a distinct functional difference between Lck and Lyn is not established in this work, said SFKs' largely exclusive expression in T and B cells remains enigmatic.

      We thank the reviewer for the comment. We address this issue in the discussion section of the revised manuscript (lines 514-519).

      3) There is also the persisting problem of Lck being expressed to a much higher extent and the effect of the endogenously expressed Lyn since the model systems are not based on a Lyn-deficient cell line.

      For the purpose of the analysis, we tried to circumvent the discrepancies between Lck and Lyn expression levels by our equal GFP gating strategy (explained in Figure 1-figure supplement 3E/Fig.S3E in the original submission). Nevertheless, as shown in Figure 1C there is a physiological reason for the two SFKs not being equally expressed, and we refer to the biological implications of these individualities in the Discussion.

      The effect of endogenously expressed Lyn is represented by the phenotype of -Dox cells, which we use as background in all our studies. Because we show that Lck overexpression does not alter the activation status of Lyn or any other SFK (Figure 1-figure supplement 2B/ Fig.S2B in the original submission), we do not believe this is a problem. Additionally, a Lyn-deficient environment would not be perfect either, since it could very plausibly have undergone further signaling and survival adaptations that we could not account for.

      4) Lastly, the authors follow up their finding of deregulated transcripts belonging to the ER/UPR ontology cluster. Flow cytometric analysis indeed shows an influence of Lck and Lyn expression on ER homeostasis, which can be reverted with SFK inhibitors. Alas, additional follow-up experiments to functionally investigate the deregulated pathways suggested by the RNAseq analysis are not included in this study.

      We thank the reviewer for the comment, and we agree. However, it is beyond our capabilities and manpower, and beyond the scope of the present work, to perform numerous functional or semi-functional studies for every GO analysis pathway that emerged from the transcriptomics studies. Although follow-up work from our group will focus on comprehensive and meticulous analyses of gene expression profiles, such an effort would currently require long-lasting studies that would significantly extend the size of the manuscript and shift the focus away from the effects we wish to pinpoint with the present work, i.e. the unique adaptation of SFKs within the lymphocyte environment and gene expression profile tendencies exclusively controlled by SFK-generated signals.

      In an effort to satisfy the reviewer, we performed focused follow-up studies specifically on the ER effect of SFK-transduced signals, since it appears to be a so-far unknown aspect of their function. The new data are presented in the revised version of Figure 4 (panels C and D) and Supplementary Figure 4-figure supplement 1. Corresponding text can be found in lines 323-345 of the revised manuscript (results section) and lines 499-512 and line 531 of the discussion. In brief, we show an SFK kinase-activity dependent activation of the ER-phagy receptor FAM134B, which is not accompanied by the recruitment of LC3B that the currently known canonical ER-phagy pathway would dictate. This is the first report of SFK involvement in the ER-phagy process and the first time FAM134B activation has been described in B-cells. Since this field is relatively new, and the role and regulation of ER-phagy is almost unexplored in B-cells, we hope that the reviewers will appreciate the novelty of the finding and its sufficiency for the current manuscript. We do realize that these initial data call for more detailed mechanistic investigation, which we are pursuing in the form of a more complete and comprehensive future study.

      Reviewer #2 (Public Review):

      1) Studies reveal no qualitative functional differences in Lck and Lyn that are likely to explain its unique ectopic expression of Lck in CLL

      and

      If Lck promotes pathophysiology by transduction of a qualitatively unique signal, one would expect that transcriptome analysis should reveal this difference.

      We thank the reviewer for the comment. We address this issue in the Discussion section of the revised manuscript (lines 514-519).

      2) It is unclear from the material and methods whether the overexpressed Lyn is LynA or Lyn B. It appears in the text (lines 130-133) that they overexpress LynB specifically. A recent paper from Tania Freedman (Sci Adv 2022 PMID:35452291) suggests that LynA is more activating whereas LynB is more balanced with an inhibitory bias. The point is that it is important to discuss this because they may not be making a relevant comparison.

      We thank the reviewer for the comment. To clarify this, we now state the use of Lyn isoform B in the Materials and Methods section of the revised manuscript (under “Cloning and Plasmids”).

      We initially attempted to produce BJAB lines overexpressing LynA; however, expression levels of this isoform were particularly low and we could not proceed with further analyses. We therefore cannot comment on how LynA might behave in an overexpression model in B-cells, especially given the absence of relevant information in the existing literature.

      The recent Sci Adv 2022 PMID:35452291 study deals with germline LynA and LynB isoform-specific knockouts and their propensity towards autoimmunity in mice. The authors compared the single isoform (LynA or LynB) and total Lyn knockouts by performing systemic phenotypic analyses of autoimmunity features (splenomegaly, myeloid cell profiles, proinflammatory markers on myeloid cells, B cell development, expansion of activated and autoimmunity-associated B cell subsets, autoimmunity scores). Differences they pinpoint between LynA and LynB are summarized as follows:

      1. “It was found that LynB has the dominant regulatory role in mice of both sexes, but that LynA expression is uniquely required to prevent autoimmunity in female mice”. The etiology of which is unclear.

      2. “LynB generally appears to be the dominant immunosuppressive isoform, with LynB deletion causing severe autoimmune disease in male and female mice. For some indicators (splenomegaly, glomerular IgG and C3 deposition, and kidney fibrosis), LynBKO and total LynKO mice developed equally severe phenotypes. In other cases (serum IgM and BAFF, glomerular immune infiltration, myeloid cell polarization, and monocyte/granulocyte expansion), LynBKO mice had less severe phenotypes than total LynKO mice, suggesting an additive effect with LynA”.

      3. “LynA and LynB seemed equally capable of promoting B cell development, regulating myeloid cell polarization and restraining myeloid-driven inflammation. Given the increased number of activated/inflammatory B cell types in LynAKO and LynBKO mice, future studies will be aimed at determining whether the single-isoform knockouts have a more B cell–initiated than myeloid cell–initiated form of autoimmune disease”.

      After careful reading of the manuscript, we could not find any functional analyses of the activation status of the distinct isoforms, or of the signaling events they elicit. Furthermore, the authors do not report any conclusion that LynA is more activating at the molecular level. Based on the above, we cannot connect the data published in the PMID:35452291 paper to our results in order to discuss “LynA being more activating” and the implications this might have for our studies.

      To comply with the reviewer’s suggestion, in our revised manuscript we cite this study (ref number 29) in the following sentence appearing in lines 380-383:

      “Lyn exists as two alternatively spliced variants LynA and LynB. Distinct biological functions between the two isoforms still remain poorly understood. A recent study (29) documented that LynB provides an advantage in protecting against autoimmunity compared to LynA; however, the underlying mechanisms for this phenotype are unclear.”

    1. Author Response

      Reviewer #2 (Public Review):

      The authors use data from 3 cross-sectional age-stratified serosurveys on Enterovirus D68 from England between 2006 and 2017 to examine the transmission dynamics of this pathogen in this setting. A key public health challenge on EV-D68 has been its implication in outbreaks of acute flaccid myelitis over the past decade, and past circulation patterns and population immunity to this pathogen are not yet well-understood. Towards this end, the authors develop and compare a suite of catalytic models as fitted to this dataset and incorporate different assumptions on how the force of infection varies over time and age. They find high overall EV-D68 seroprevalence as measured by neutralizing antibodies, and detect increased transmission during this time period as measured by the annual probability of infection and basic reproduction number. Interestingly, their data indicate very high seroprevalence in the youngest children (1 year-olds), and to accommodate this observation, the authors separate the force of infection in this age class from the other groups. They then reconstruct the historical patterns of EV-D68 circulation using their models and conclude that, while the serologic data suggest that transmissibility has increased between serosurvey rounds, additional factors not accounted for here (e.g., changes in pathogenicity) are likely necessary to explain the recent emergence of AFM outbreaks, particularly given the broader age-profile of reported AFM cases. The Discussion mentions important current unknowns on the biological interpretation of EV-D68 neutralizing antibody titers for protection against infection and disease. The analysis is rigorous and the conclusions are well-supported, but a few aspects of the work need to be clarified and extended, detailed below:

      1) Due to the lack of a clear single cut-point for seropositivity on this assay, the authors sensibly present results for two cut-points in the main text (1:16 and 1:64). While some differences that stem from using different cut-points are fully expected (i.e., seroprevalence being higher using the less stringent cut-point), differences that are less expected should be further discussed. For instance, it was not clear in Figure 2 why the annual probability of infection decreased after 2010 using the 1:64 cut-point, while it continued to increase using the 1:16 cut-point. It would also be helpful to explain why overall seroprevalence and R0 continue to increase over this time period using the 1:64 cut-point. Lastly, it would be useful to see the x-axis in Figure 4 extended to the start of the time period that FOI is estimated, with accompanying credible intervals.

      For the discussion on differences between the two cut-offs, please see response to essential comment 1.

      Extending the x-axis before 2006 in Figure 4 is not possible. Estimates of the overall seroprevalence at a year y require FOI estimates up until y-40. This implies the first estimates we can provide are for 2006.
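      The dependence on earlier FOI values can be illustrated with a minimal sketch of a simple catalytic model. This is not the model fitted in the manuscript: it assumes lifelong seropositivity, no maternal antibodies, an FOI that depends on calendar year only, and a flat age structure over ages 1 to 40. The function names, the FOI series and its start year are hypothetical.

```python
import math


def seroprevalence_at_age(foi_by_year, year, age):
    """Proportion seropositive at a given age in a given survey year.

    Simple catalytic model (assumptions: lifelong seropositivity, no maternal
    antibodies, FOI depends on calendar year only).
    """
    # Sum the FOI over every calendar year the birth cohort has lived through.
    cumulative = sum(foi_by_year[y] for y in range(year - age, year))
    return 1.0 - math.exp(-cumulative)


def overall_seroprevalence(foi_by_year, year, max_age=40):
    """Unweighted mean over ages 1..max_age (flat age structure assumed)."""
    ages = range(1, max_age + 1)
    return sum(seroprevalence_at_age(foi_by_year, year, a) for a in ages) / max_age


# Hypothetical constant FOI of 0.1 per year, available from 1966 to 2017.
foi = {y: 0.1 for y in range(1966, 2018)}
print(round(overall_seroprevalence(foi, 2006), 3))

# The 40-year-olds of 2005 were exposed from 1965, which the series lacks.
try:
    overall_seroprevalence(foi, 2005)
except KeyError as missing:
    print("missing FOI for year", missing)
```

      In this sketch the oldest age class in a given year needs the FOI 40 years back, so the overall seroprevalence cannot be computed for any year earlier than 40 years after the start of the FOI series.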

      Credible intervals have been added to Figure 4.

      2) Additional context of EV-D68 in the study setting of England would be useful. While the Introduction does mention AFM cases "in the UK and elsewhere in Europe" (line 53), a summary of reported data on EV-D68/AFM in England prior to this study would provide important context. The Methods refers to "whether transmission had increased over time (before the first reported big outbreak of EV-D68 in the US in 2014)" (lines 133-134), rather than in this setting. It would be useful to summarize the viral genomic data from the region for additional context - particularly since the emergence of a viral clade is highlighted as a co-occurrence with the increased transmissibility detected in this analysis.

      We have added a figure (new Figure 1 – figure supplement 1) showing the annual number of EV-D68 detections reported by Public Health England from 2004 to 2020.

      We have also added the following text to the introduction: “Similarly, in the UK, reported EV-D68 virus detections also show a biennial pattern between 2014 and 2018 (Figure 1 – figure supplement 1).”

      We have also amended the sentence in the Methods.

      Finally, below is a screenshot of the Nextstrain tree for EV-D68 based on the VP1 region and with tips representing sequences from the UK (light blue) and European countries in colour. There is a lot of mixing between sequences from different regions, indicating widespread transmission and little regional clustering. We have added the following text to the Discussion: “Reported EV-D68 outbreaks in 2014 and 2016 were due to clade B viruses, while the 2018 outbreaks were reported to be linked to both B3 and A2 clade viruses in the UK (10), France (32) and elsewhere.”

      Reviewer #3 (Public Review):

      In the proposed manuscript, the authors use cross-sectional seroprevalence data from blood samples that were tested for evidence of antibodies against D68 for the UK. Samples were collected at 3 time points from individuals of all ages. The authors then fit a suite of serocatalytic models to explain the changing level of seropositivity by age. From each model they estimate the force of infection and assess whether there have been changes in transmissibility over the study period. D68 is an important pathogen, especially due to its links with acute flaccid myelitis, and its transmission intensity remains poorly understood.

      Serocatalytic models appear to be appropriate here. I have a few comments.

      The biggest challenge to this project is the difficulty in assigning individuals as seronegative or seropositive. There is no clear bimodal distribution in titers that would allow obvious discrimination and apparently no good validation data with controls with known serostatus. The authors tackle this problem by presenting results to four different cut-points (1:16 to 1:128) - resulting in seropositivity ranging from around 50% to around 80%. They then run the serocatalytic models with two of these (1:16 and 1:64) - leading to a range of FoI values of 0.25-0.90 for the 1 year olds and 0.05-0.25 for older age groups (depending on model and cutpoint). This represents a substantial amount of variability. While I certainly see the benefit of attacking this uncertainty head on, it does ultimately limit the inferences that can be made about the underlying risk of infection in UK communities, except that it's very uncertain and possibly quite high.

      I find the force of infection in 1 year olds very high (with a suggestion that up to 75% get infected within a year) and difficult to believe, especially as the force of infection is assumed much lower for all other ages.

      The authors exclude all <1s due to maternal antibodies, which seems sensible, however, does this mean that it is impossible for <1s to become infected in the model? We know for other pathogens (e.g., dengue virus) with protection from maternal antibodies that the protection from infection is gone after a few months. Maybe allowing for infections in the first year of life too would reduce the very large, and difficult to believe, difference in risk between 1 year olds and older age groups. I suspect you wouldn't need to rely on <1 serodata - just allow for infections in this time period.

      Relatedly, would it be possible to break the age data into months rather than years in these infants to help tease apart what happens in the critical early stages of life.

      Yes. We have added two figures (new Figures 1C and 1D) showing the prevalence of antibodies in children <1 yo. We show these data for the three serosurveys combined, because the number of individuals per month of age is very small.

      One of the major findings of the paper is that there is a steadily increasing R0. This again is difficult to understand. It would suggest there are either year on year increases in inherent transmissibility of the virus through fitness changes, or year on year increases in the mixing of the population. It would be useful for the authors to discuss potential explanations for an inferred gradual increase in R0.

      We have removed the estimates of R0 from the manuscript.

      On a similar note, I struggle to reconcile evidence of a stable or even small drop in FoI in the 1:64 models 4 and 5 from 2010/11 (Figure 3) with steadily increasing R0 in this period (Figure 4). Is this due to changes in the susceptibility proportion. It would be good to understand if there are important assumptions in the Farrington approach that may also contribute to this discrepancy.

      We have removed the estimates of R0 from the manuscript and only present the reconstruction of the annual number of new infections per age class and year (new Figure 5). We think this measure is better suited to the discussion of the results.

      In addition, when using the classical expression R_{0t} = 1/(1 - S(t)), with S(t) the annual proportion seropositive, the high seroprevalence estimates (new Figure 4) result in extremely high estimates of the basic reproduction number (median ranges: 11.6 – 29.7 for 1:16 and 3.3 – 7.6 for 1:64 during the period 2006 to 2017).

      We had previously used the Farrington approach because it is suited to cases where the force of infection differs between age classes.
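      As a worked illustration of the classical expression quoted above, the following sketch converts between a proportion seropositive and the corresponding reproduction number. The function names are hypothetical, and the only inputs taken from the response are the reported median R0 bounds.

```python
def r0_from_seroprevalence(s):
    """Classical expression R0 = 1 / (1 - S), with S the proportion seropositive."""
    if not 0.0 <= s < 1.0:
        raise ValueError("seroprevalence must lie in [0, 1)")
    return 1.0 / (1.0 - s)


def seroprevalence_from_r0(r0):
    """Inverse of the expression above: S = 1 - 1 / R0."""
    if r0 < 1.0:
        raise ValueError("R0 must be at least 1")
    return 1.0 - 1.0 / r0


print(r0_from_seroprevalence(0.75))  # 4.0

# Seroprevalence implied by the median R0 bounds reported above.
for r0 in (3.3, 7.6, 11.6, 29.7):
    print(r0, round(seroprevalence_from_r0(r0), 3))
```

      Because the expression diverges as S approaches 1, small differences in an already high seroprevalence translate into large differences in the estimated reproduction number.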

      The R0 estimates (Figure 4) should also be presented with uncertainty.

      R0 no longer presented, but estimates of overall seroprevalence now presented with uncertainty.

      Finally, given the substantial uncertainty in the assay, it seems optimistic to attempt to fit annual force of infections in the 30 year period prior to the start of the sampling periods. I would be tempted to include a constant lambda prior to the dates of the first study across the models considered.

      We thank the reviewers for the suggestion.

      We implemented this change (constant FOI before 2006) in the previous models without maternal antibodies and the result for the random-walk-based models was that the variance of the random walk was estimated over a very short period, thus resulting in a rather non-smoothed FOI.

      Implementing this change with the new models with maternal antibodies and random-walk on the FOI was technically a bit complex. We therefore kept the simple random-walk over the whole period and added the following paragraph to the Discussion:

      “It is important to interpret well the results for the estimates of the FOI over time from our analysis under the assumptions of the models. First, as the best model uses a random walk on the FOI, the change in transmission that we infer happens continuously over several years. In reality, this may have occurred differently (e.g. in a shorter period of time). Our ability to recover more complex changes in transmission is limited by the data available. It would not be surprising if EV-D68 has exhibited biennial (or longer) cycles of transmission in England over the last few years, as it has been shown in the US (7) and is common for other enteroviruses (30). However, it is difficult to recover changes at this finer time scale with serology data unless sampling is very frequent (at least annual). Therefore, our study can only reveal broader long-term secular changes. Second, interpretation of the results before 2006 must be avoided for two reasons. On the one hand, as we go backwards in time, there is more uncertainty about the time of seroconversion of the individuals informing the estimates of the FOI. On the other hand, because age and time are confounded in cross-sectional seroprevalence measurements, the random walk on time may account for possible differences in the FOI through age (possibly higher in the youngest age classes, and lowest in the oldest), which are not explicitly accounted for here. This may explain the decline in FOI when going backwards in time before the first cross-sectional study in 2006.”

    1. Author Response

      Reviewer #3 (Public Review):

      A large body of work in the literature has established that the diversity in cells of identical genetic background occurs due to two components: 1) intrinsic noise - such as stochastic fluctuations in gene expression - as well as 2) extrinsic noise - variability that arises from sources that are external to the biochemical process of gene expression, such as abundances of ribosomes or stage in the cell cycle. Note that this widely-accepted definition does not separate intrinsic and extrinsic from intracellular and extracellular. The authors cite a few of these seminal papers (which focus on noise introduced to gene expression) but then define their interpretation of intrinsic noise much more broadly "... intrinsic noise as phenotype(s) fluctuations across isogenic cell populations cultured under the same conditions. Measurement noise in some cases can also be thought of as intrinsic noise. Fluctuations in cellular phenotype(s) driven by the global environment will be referred to as extrinsic noise." This misuse of widely accepted terminology creates significant confusion in the interpretation of the results.

      A point of contention with redefining noise as the authors have done is that they are lumping all processes unique to the cell as intrinsic and all environmental factors as extrinsic. Thus, when statements are made such as "external factors that contribute to noise are principally manifest through convection" (line 40-41, page 2) the veracity of these assumptions must be established. For example, when a ligand binds and unbinds from a receptor due to thermal energy, that "noise" in cellular stimulation is not convection-based, yet an example of how extrinsic noise can influence cellular responses. The definition is important because the underlying premise for the pipeline presented is that "While intrinsic cell variability can be significant, we believe that it is the extrinsic factor(s) that drive sample variability in most experimental cellular systems" (lines 42-43, page 4).

      We thank the referee for this very important critical comment. The referee correctly points out that the terminology (intrinsic vs. extrinsic noise) used in the cited papers has to be adapted and more clearly stated.

      We wish to point out that the autonomous system in Michael Elowitz and colleagues’ original paper was a single protein within a single cell. The noise that was measured in these experiments was driven by temporal fluctuations. An example of extrinsic noise for this system is, indeed, as pointed out by the referee, ligand binding and unbinding from a receptor.

      By contrast, our autonomous system is an ensemble of cells isolated from other samples but still subject to fluctuations in the external environment. We did not continuously measure temporal fluctuations in individual cells, but recorded snapshot(s) of cellular phenotype(s) within a single sample. The source of noise in these measurements is variability between individual cells, and we referred to this type of noise as intrinsic because it is driven by the processes within the sample. We denoted as extrinsic noise that which is driven by factors external to this autonomous system (a particular sample), such as variability between different samples due to temperature, humidity, etc.

      All of these external factors (to the best of our knowledge) are related to movement and gradient formation of fluid or gas and, hence, from a physicochemical perspective, driven by convection process(es). The initial cell seeding that eventually leads to unique microenvironment formation can also be thought of as an example of extrinsic noise using this terminology. The process of cell sedimentation and attachment is driven by advection, as the referee correctly points out. We have, therefore, adjusted the text accordingly.

      We hope that clarifying the intrinsic/extrinsic terminology in the "Introduction" section of the manuscript (line 37) will be sufficient to avoid the confusion the referee discusses. We are open (very reluctantly) to switching to the terms internal and external noise.

      Throughout, figures lack labels and sufficient explanation for interpretation, as well as the number of experiments used to generate the data that is processed through the pipeline for each condition. For a study designed to eliminate replicate culture conditions, the onus is on the authors to show that replicates are in fact fully recapitulated in the population variance after statistical binning/processing.

      To address this comment, we modified the figure legends and labels of most of the figures.

      We wish to emphasize that each point-injection experiment we performed is unique due to randomness in the local delivery method. This is due to the variability in the manual micro-injection release rate and direction of the initial flow. Several experiments (3+) were performed to improve the width of the label(s) distribution(s) and their mixing condition, and the results of the better optimized local delivery were selected as representative for the manuscript. Sample selection was independent of the outcome of drug action and based on initial label distribution only. An experimental improvement of our method, similar to initialization of the pseudo-random number generator in numerical experiments, is required to achieve systematic reproducibility of drug(s) distribution(s). One way to do so is robotically, but certainly the best is to design a system that utilizes a predictably constant drug gradient within a sample that contains large enough cells, a topic that will be the subject of future experiments.

      Ultimately, when the paper presents results such as Figure 9 as the culmination of the pipeline as applied to cell viability studies, it is unclear how useful insight is extracted from this methodology. Four drugs are applied in combination to adherent HeLa cells and time-dependent local cell density is provided as a proxy for cell viability. While it is stated that "The absolute drug concentration can be determined using the homogeneous delivery method discussed above" (line 421-422, page 19), this analysis is not performed, and I am left unsure of whether extrinsic factors are truly driving sample variability under this context. It is unclear to the reader how the point injections were administered, and no discussion of how the confounding factors of synergy or antagonism will be addressed through this methodology.

      We attempted to explain that data shown in Figure 9 were not meant to be the climactic point of the entire pipeline (rather, the data shown in Figure 6 represent our key achievement). In this four-drug experiment, we exhausted the fluorescent spectrum bandwidth necessary to distinguish drug labels (i.e., using commonly available microscopy tools). In order to estimate local cell density, we had to rely on bright field imaging data which is not the most accurate possible implementation (see further response to your comment below). More importantly, we had to wash samples between the measurements to remove detached (dead) cells and cell debris. This step can (and usually does) influence local cell density in a non-uniform fashion, since both media removal and deposition are performed locally by pipetting (cells in the vicinity of aspiration/media deposit sites can be washed off regardless of the drug treatment.)

      To clarify how point injections were administered, we added a detailed description in the Methods section. Please see section Drug labeling and delivery, pages 11-12.

      In this manuscript, we wished to establish possible applications of our method and avoid in-depth analysis or biological interpretation of a specific drug combination that is dependent on the cell line or on a particular experimental condition. We added a paragraph in the "Discussion" section suggesting the necessity of future research dedicated to the methodology and analytical interpretation of high-dimensional context-dependent drug interaction data.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors unexpectedly found that the protein Grb2, an adaptor protein that mediates the recruitment of the Ras guanine-nucleotide exchange factor, SOS, to the EGF receptor, can be recruited to membranes by the immune cell tyrosine kinase Btk. The authors show, using total internal reflection fluorescence (TIRF) microscopy that the interaction with Grb2 is reversible, dependent on the proline-rich region of Btk, and independent of PIP3. These experiments are well performed and unambiguous.

      The authors next asked whether Grb2 binding to Btk influences its kinase activity, by evaluating (i) Btk autophosphorylation and (ii) the phosphorylation of a peptide from the endogenous substrate PLCy1. The readout relies on non-specific antibody-mediated detection of phosphotyrosine but nevertheless reveals a concentration-dependent increase in both Btk autophosphorylation and PLCy1 phosphorylation. The experiments, however, have only been performed in duplicate and, particularly in the case of PLCy1 phosphorylation, exhibit enormous variability which is not reflected in the example blot the authors have chosen to display in Figure 3C. Comparison of the same, duplicate experiment presented in Figure 3 Supplement 2 paints a very different picture.

      We added an experiment wherein we measure phosphorylation of the PLC𝛾2-peptide fusion by Btk in the presence of different concentrations of Grb2, and we have carried out LC-MS/MS to probe which Tyr are phosphorylated in these experiments. We have also modified our presentation of the Western blot data to allow readers to view each replicate separately. We believe this makes it easier to evaluate the trends observed in each replicate, and because the intensity measured here is only semi-quantitative, due to limitations of the technique, we believe this is a more accurate way to present our results. Both Tyr of the PLC𝛾2-peptide are phosphorylated, as well as one Tyr at the very C-terminus of GFP (Figure 3 – Supplements 3-5).

      The authors next sought to determine which domains of Grb2 are required for activation of Btk. Again, these experiments were only performed in duplicates, and the authors’ claims that Grb2 can moderately stimulate the SH3-SH2-kinase module of Grb2 are not well supported by their data (Figure 4C-D).

      We have opted to remove the data for the activation of the SH3-SH2-kinase construct (Src module) from the revised manuscript. Upon further inspection, we agree that these experiments only showed a weak trend and believe that much more experimentation is needed to draw firm conclusions regarding this construct. We do still speculate that SH2 linker displacement may contribute to our observations of enhanced catalytic activity of Btk in the presence of Grb2, however this speculation is based solely on previous work with Btk and other kinases (Aryal et al., 2022; Moarefi et al., 1997).

      The authors next asked whether Grb2 stimulates Btk by promoting its dimerization and trans-autophosphorylation. The authors measured the diffusion coefficient of Btk on PIP3-containing supported lipid bilayers in the presence and absence of Grb2. They noted that the diffusion coefficient of individual Btk particles decreases with increasing unlabeled Btk, which they interpret as Btk dimerization. Grb2 does not appear to influence the diffusion of Btk on the membrane (Figure 5A). Presumably, the diffusion coefficient reported here is the average of a number of single-molecule tracks, which should result in error bars. It is unclear why these have not been reported. Next, the authors assessed the ability of Grb2 to stimulate a mutant of Btk that is impaired in its ability to dimerize on PIP3-containing membranes. In contrast to wild-type Btk, autophosphorylation of dimerization-deficient Btk is not enhanced by Grb2. Whilst the data are consistent with this conclusion, again, the experiment has only been repeated once and the western blot presented in Figure 5 Supplement 2 is unreadable. It is also puzzling why Grb2 gets phosphorylated in this experiment, but not in the same experiment reported in Figure 3 Supplement 2.

      The diffusion coefficient reported here is determined from a large number of single molecule tracks. We have expanded our explanation of how this is done in the Materials and Methods, as well as providing an example of the data and fits from one of the conditions in Figure 4 – Supplement 3. We are now including standard deviation for each diffusion coefficient determined from the fit of the step size distribution.
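
      As an illustration of the kind of fit described above, the following is a minimal sketch and not the authors' analysis code. It assumes a single freely diffusing population in two dimensions, for which step sizes r over a time lag tau follow p(r) = r/(2*D*tau) * exp(-r^2/(4*D*tau)), and it uses a closed-form maximum-likelihood estimate rather than whatever fitting procedure the manuscript describes. The function names, the simulated data, and the values of D and tau are all hypothetical, and the reported uncertainty is the standard error of the estimate, which may differ from the standard deviation the authors report.

```python
import math
import random


def simulate_step_sizes(d_true, tau, n_steps, seed=0):
    """Draw step sizes for free 2D Brownian motion (hypothetical test data)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * d_true * tau)  # per-axis displacement std. dev.
    return [
        math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        for _ in range(n_steps)
    ]


def fit_diffusion_coefficient(step_sizes, tau):
    """Maximum-likelihood fit of D for p(r) = r/(2*D*tau) * exp(-r**2/(4*D*tau)).

    Returns (D, standard error of D). Assumes one diffusing species only.
    """
    n = len(step_sizes)
    mean_sq = sum(r * r for r in step_sizes) / n  # mean squared step size
    d_hat = mean_sq / (4.0 * tau)                 # <r^2> = 4*D*tau in 2D
    return d_hat, d_hat / math.sqrt(n)            # r^2/(4*D*tau) is Exp(1)


# Assumed units: micrometres and seconds; values are illustrative only.
steps = simulate_step_sizes(d_true=1.0, tau=0.02, n_steps=20000)
d_hat, d_err = fit_diffusion_coefficient(steps, tau=0.02)
print(f"D = {d_hat:.3f} +/- {d_err:.3f} um^2/s")
```

      A mixture of populations (for example monomers and dimers) would require a multi-component version of this distribution; the single-component form is used here only to keep the sketch short.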

      We have opted to remove the data involving the dimerization-deficient Btk construct. We agree that these results are difficult to interpret.

      We have investigated the Grb2 phosphorylation signal and concluded that this is an off-target effect of the antibody. MS/MS cannot detect any phosphorylation on Grb2. We now comment on this in the figure legend of Figure 3 – Supplement 2.

      Finally, the authors argue that Grb2 facilitates the recruitment of Btk to molecular condensates of adaptor and scaffold proteins immobilized on a supported lipid bilayer (SLB) (Figure 6). This is a highly complex series of experiments in which various components are added to supported lipid bilayers and the diffusion of labelled Btk is measured. When Btk is added to SLBs containing the LAT adaptor protein (phosphorylated in situ by Hck immobilized on the membrane via its His tag), it exhibits similar mobility to LAT alone, and its mobility is decreased by the addition of Grb2. The addition of the proline-rich region (PRR) of SOS further decreases this mobility. In this final condition, the authors incubate the reactions for 1 h until LAT undergoes a phase transition, forming gel-like, protein-rich domains on the membrane, shown in Figure 6B. The authors’ conclusion that Btk is recruited into these phase-separated domains based on a slow-down in its diffusion is not well supported by the data, which rather indicates that Btk is excluded from these domains (Figure 6B – Btk puncta (green) are almost exclusively found in between the LAT condensates (red)). As such, the restricted mobility of Btk that the authors report may simply reflect the influence of barriers to diffusion on the membrane that result from LAT condensation into phase-separated domains. The authors also present data in Figure 6 Supplement 1 indicating that Grb2 recruitment to Btk is out-competed by SOS-PRR and that Btk does not support the co-recruitment of Grb2 and SOS-PRR to the membrane. These data would appear to suggest that the authors’ interpretation of the decreased mobility of Btk on the membrane may not be correct.

      We have now included an example of one of the single molecule videos, overlaid with the surrounding LAT phase, to more directly display the data that were recorded for this experiment. In this video, it is possible to see that the LAT dense phase occupies only some of the observed window, and although it is possible that these dense “islands” function as barriers to Btk diffusion, Btk would be expected to diffuse freely outside of the LAT dense areas of the bilayer. This property can be clearly seen in the video we have now included. This is reminiscent of what was observed previously during the LAT phase transition for tracking of LAT itself (Sun et al., 2022). Given the extensive previous analysis of LAT diffusion on supported lipid bilayers (Lin et al., 2022; Sun et al., 2022), we believe the necessary controls have been included to support our conclusions. However, we agree there is much to be learned about this interaction and we hope that future studies will further investigate the relationship between cytoplasmic kinases and plasma membrane associated signaling clusters.

      Reviewer #3 (Public Review):

      The study of Nocka and colleagues examines the role of membrane scaffolding in Btk kinase activation by the Grb2 adaptor protein. The studies appear to make a case for a reinterpretation of the "Saraste dimer" of Btk as a signaling entity and assign roles to the component domains in the Src module in Btk activation. The point of distinction from earlier studies is that this work ascribes a function to an adaptor protein as promoting the kinase activation, rather than vice versa, and also illustrates why Btk can be activated via modes distinct from its close relatives, such as Itk. Importantly, these studies address these key questions through membrane tethering of Btk, which is a successful, reductionist way to mimic cellular scenarios. The writing could be improved and can absolutely be more economical in word choice and use; currently, there is a good deal of background to each section that is not always comprehensive or crucial to contextualise the findings, while key information is often omitted. The results are currently not described in a detailed manner so there is an imbalance between the findings, which should be the focus, relative to background and interpretations or models.

      We have assessed the manuscript and made many improvements to shift the focus to the findings, while providing only the necessary background for readers unfamiliar with the specifics of Btk and Grb2 signaling and structure.

    1. Author Response

      Reviewer #1 (Public Review):

      Ge et al. examined sodium-glucose cotransporter-2 inhibitors (SGLT2i) in Alport syndrome (AS) and demonstrate that they were beneficial in AS, with reduced lipotoxicity in podocytes as a key mechanism of action. The SGLT2i empagliflozin has been previously shown to have positive effects on hyperglycemia control, as well as on cardiovascular and renal outcomes of type II diabetes mellitus through tubuloglomerular feedback, but its effects on glomerular diseases such as AS are unknown to date. The authors have previously identified that cholesterol efflux in podocytes plays a critical pathogenic role in a diabetic kidney disease setting. The authors' hypothesis was supported in a disease of non-metabolic origin such as AS, as the SGLT2i reduced the deleterious effects of lipotoxicity in podocytes, ameliorated glomerular injury and proteinuria, and extended the life span of Col4a3 knockout mice. They further show that empagliflozin treatment protected AS podocytes from cell death through apoptosis, but did not impact the cell's cytotoxicity. These results support the notion that empagliflozin affects the regulation of an important metabolic switch in mouse kidneys, perhaps through decreasing lipid accumulation in podocytes.

      However, the authors rely solely on one IHC staining image of a human biopsy to demonstrate SGLT2 expression in podocytes in vivo. Although the authors have done several experiments which greatly increase the confidence in their findings that empagliflozin is beneficial in AS and would have clinical significance, their data do not rule out the possibility that empagliflozin has beneficial effects through the other glomerular cells in AS, or that its effects are not limited to impacting lipids in podocytes in AS.

      We thank the reviewer for recognizing the significance of our findings and for pointing out some additional concerns with our study. In this revised version, we have added experiments that focus on investigating the specific effect of empagliflozin on AS podocytes. We added immunofluorescence staining of AS mouse kidney sections which supports the idea that SGLT2 is expressed in podocytes. We investigated the effect of SGLT2 knockdown in AS podocytes using siRNA and compared the anti-lipotoxic effects of siSGLT2 to SGLT2i.

      Reviewer #3 (Public Review):

      Using cultured human podocytes, the expression of SGLT2 is established using immunostaining and western blotting. An analysis of podocyte RNA wasn't performed, but the expression in cultured podocytes was comparable to that seen in human cultured proximal tubular cells. This work then paved the way for treatment of immortalized cells obtained from an Alport syndrome mouse model (Col4a3-/-), representing an autosomal recessive form of Alport syndrome. Podocytes from Alport syndrome mice showed a lipid droplet accumulation which was reduced to some extent by SGLT2 inhibition. In a series of metabolic experiments, it was shown that SGLT2 inhibition reduced the formation of pyruvate as a metabolic substrate in Alport podocytes. In vivo experiments showed an improvement in survival of Col4a3-/- mice treated with SGLT2 inhibition. When compared to ACE inhibitor, SGLT2 inhibition has a similar effect on renal function and no additive effect was seen with SGLT2 inhibitor plus ACE inhibitor. Like the cell assays, the in vivo treatment seemed to prevent the podocyte lipid accumulation in Alport syndrome mice.

      This data in cells and animals generally supports the findings in SGLT2 inhibitor human studies, where Alport syndrome patients with proteinuria and progressive CKD seem to benefit. The work paves the way for a dedicated trial of SGLT2i in Alport patients and a reassessment of the human podocyte disease phenotype in this condition, before and after treatment. There are patients with mutations in SGLT2 with familial renal glycosuria - it would be interesting to test via urine derived podocytes whether a similar metabolic switch was occurring and its consequences to pave the way for long term treatment regimes.

      We thank the reviewer for recognizing the significance of our findings. We appreciate the reviewer’s concern that podocyte SGLT2 RNA levels should be studied. In this revised version, we added the results of SGLT2 mRNA expression analysis in immortalized podocytes and tubular cells. These results were added in Figure 1E. We agree with the insightful suggestions to study the metabolic switch in familial renal glucosuria in patients with SGLT2 mutations, as well as to evaluate a Col4a5 AS model. We have included these insights in our discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors address the origin of the macrophage increase in sensory ganglia after peripheral nerve injury, showing that there is no major influx by blood-derived monocytes into ganglia after injury and that resident macrophages proliferate, which is dependent on CX3CR1 signaling.

      • Interesting and relevant question, mainly addressed with adequate experimental approaches.

      • Most conclusions are supported by the data, however, some important controls and experiments are missing.

      • The authors should demarcate their results from the study of Iwai et al, 2021 which addresses similar questions.

      Thank you for the positive comments. We hope that our point-by-point responses below and the important changes/inclusions in the MS satisfactorily address your concerns. We agree that some important controls were missing, and we have included additional data in the revised manuscript. Regarding the Iwai et al. paper, it is in line with our hypothesis. In fact, they suggest that in trigeminal ganglia (TG), resident macrophages proliferate after peripheral injury, although they detected few blood monocytes infiltrating the TG. Besides confirming the results of Iwai et al. using different and complementary approaches that are more specific than BM transfer into irradiated mice, our paper also advances the understanding of the mechanism by which these cells proliferate (CX3CR1 signalling) and of the impact of this proliferation on neuropathic pain development. We discussed these points in the new version of the MS. Please see page 4 lines 88-93.

      Reviewer #2 (Public Review):

      The investigators looked at mφs in lumbar DRG after a spared nerve injury in which two of the three branches of the sciatic nerve are transected and the third left intact. This is a classical preparation for studying neuropathic pain. This paper demonstrates that the increase of mφs is an increase in the number of CX3CR1+ (resident) mφs and not CCR2+ (infiltrating mφs) by using CX3CR1 and CCR2 individual reporter mice. Using a CX3CR1 conditional knockout (KO) mouse, they found that this receptor must be present on the mφs for the increase in number to occur. Next, they did a parabiosis experiment with GFP+ mice and found that neither of these mφ subtypes infiltrated into the DRG. To examine proliferation, they injected animals with Ki67 and found this label, which is an indication of proliferation, was present in the CX3CR1+ mφs (but not the CCR2+ mφs). Finally, they identified the CX3CR1 mφs to be the cells that express TNFα and IL-1β but not IL-6.

      An experiment that would be useful would be to determine if there is an increase or a decrease in the availability to mφs of the ligand CX3CL1 after the spared nerve injury. The authors state from the work of others that membrane-bound CX3CL1 is constitutively expressed and that it is decreased after nerve injury. They hypothesize that this indicates a release of the chemokine, but such a decrease could also indicate a decrease in expression. A few sentences on what is known in other systems on the importance and mode of action of membrane-bound and non-membrane-bound CX3CL1 would be useful.

      Thanks to the reviewer for a great summary of our manuscript. We have now performed a time course of Cx3cl1 expression in the DRG after the spared nerve injury, which is included in Figure 7A. We also apologise for the lack of information regarding the importance and mode of action of membrane-bound and non-membrane-bound CX3CL1, which is now included in the discussion section (Page 16).

      The main weakness of the manuscript is that many highly relevant previous findings, in some cases reporting nearly identical experiments sometimes with the same and sometimes with somewhat different results, are not mentioned. Kalinski et al. (which is cited but not in this context) reported a very similar parabiosis experiment. While they did not identify subtypes of mφs, they found an increase in infiltration of mφs, which was small (though statistically significant) compared to the larger increase that occurred in the distal nerve. In 2013 and 2018, Niemi et al. and Lindborg et al. (J Neurosci and J Neuroinflammation respectively) reported that mφs in the DRG are somewhat decreased in a CCR2 KO mouse, suggesting again that there is some infiltration of mφs into the DRG after axotomy. They also showed that the mφ chemokine CCL2 increases in the DRG after sciatic nerve injury. With regard to proliferation, Yu et al. in 2020 (which again is cited but not in this context) also used a spared nerve paradigm, stained DRGs for CX3CR1+ mφs and found an increase. They then stained DRG sections for Ki67 and demonstrated proliferation in this population. An earlier reference by Krishnan et al. in 2018 published in J Neuropathol Exp Neurol is entitled "An Intimate Role for Adult Dorsal Root Ganglia Resident Cycling Cells in the Generation of Local Macrophages and Satellite Glial Cells". With regard to cytokine expression, in 1995, Murphy et al. published a paper in J Neurosci demonstrating induction of interleukin-6 in axotomized sensory neurons.

      Thank you for the comment. The papers you have indicated are the main reason we conceived our MS: there is controversy regarding the possible contribution of infiltrating peripheral blood monocytes to the increase in the number of macrophages in the sensory ganglia after peripheral nerve injury. Furthermore, some of the papers you indicated came out during the execution of this work, and they also brought controversies or did not explore some points. Therefore, we believe that our work, by using different and complementary approaches, strongly supports the hypothesis that after peripheral nerve injury, peripheral blood monocytes do not infiltrate the DRGs significantly, and that the increase in the macrophage population is instead due to the proliferation of resident macrophages. Furthermore, we provide novel mechanistic evidence of the role of CX3CR1 signalling in the proliferation of these cells (Figures 7 and S6). In addition, our new experiments, suggested by the referees and editor, suggest that CX3CR1-dependent proliferation of DRG macrophages is involved in the development of neuropathic pain (Figures 6D and 7E). We have made these points clear in the new version of the MS. Please see pages 11, 12, 14 and 17 (discussion and introduction section).

      Reviewer #3 (Public Review):

      This paper addresses the mechanism underlying a well-documented finding whereby the numbers of resident macrophages increase in dorsal root ganglia following peripheral nerve injury. It delineates the relative contribution of monocyte recruitment via circulation and local proliferation. The paper is clearly structured and written, and the data overall support the main conclusion that the increase in nerve-associated macrophages is primarily driven by proliferation, not monocyte recruitment. Its main weakness is that the question that is being asked is rather restricted, so the additional insight gained for the field will be incremental. It would be particularly interesting in the future to address whether the existence of a protective barrier indeed is the reason peripheral cells are not recruited to the nerve injury lesion and to assess e.g. whether forced breaching of this barrier results in monocyte influx and altered injury response.

      We appreciate your comments and suggestions. In the new version of the MS, we present a series of novel experiments that confirm and support our initial hypothesis. Furthermore, the novel experiments also explore the importance of this phenomenon in the context of neuropathic pain development. Regarding your suggestion about the next steps, we are now working to understand why these cells are not able to infiltrate the DRGs after injury. Interestingly, one paper that came out during the revision of this work showed that CD8+ T cells, which are not able to infiltrate the DRGs after nerve injury in adult mice, start to infiltrate the DRGs of old mice (Zhou et al. 2022), indicating that the ageing process may promote changes in this protective barrier. In addition, we have published a recent paper indicating that immune cells infiltrate the dorsal root leptomeninges after SNI (Maganin et al. 2022). We included these references and discussed these points in the new version of our MS. Please see page 15 lines 366 and 370.

      References:

      Zhou, L., G. Kong, I. Palmisano, M. T. Cencioni, M. Danzi, F. De Virgiliis, J. S. Chadwick, G. Crawford, Z. Yu, F. De Winter, V. Lemmon, J. Bixby, R. Puttagunta, J. Verhaagen, C. Pospori, C. Lo Celso, J. Strid, M. Botto, and S. Di Giovanni. 2022. "Reversible CD8 T cell-neuron cross-talk causes aging-dependent neuronal regenerative decline." Science 376 (6594): eabd5926. https://doi.org/10.1126/science.abd5926.

      Maganin, A. G., G. R. Souza, M. D. Fonseca, A. H. Lopes, R. M. Guimarães, A. Dagostin, N. T. Cecilio, A. S. Mendes, W. A. Gonçalves, C. E. Silva, F. I. Fernandes Gomes, L. M. Mauriz Marques, R. L. Silva, L. M. Arruda, D. A. Santana, H. Lemos, L. Huang, M. Davoli-Ferreira, D. Santana-Coelho, M. B. Sant'Anna, R. Kusuda, J. Talbot, G. Pacholczyk, G. A. Buqui, N. P. Lopes, J. C. Alves-Filho, R. M. Leão, J. C. O'Connor, F. Q. Cunha, A. Mellor, and T. M. Cunha. 2022. "Meningeal dendritic cells drive neuropathic pain through elevation of the kynurenine metabolic pathway in mice." J Clin Invest 132 (23). https://doi.org/10.1172/JCI153805.

  3. Mar 2023
    1. Author Response

      Reviewer #1 (Public Review):

      This study focuses on the role of polo like kinase 1 (PLK-1) during oocyte meiosis. In mammalian oocytes, Plk1 localizes to chromosomes and spindle poles, and there is evidence that it is required for nuclear envelope breakdown, spindle formation, chromosome segregation, and polar body extrusion. However, how Plk1 is targeted to its various locations and how it performs these functions is not well understood. This study uses C. elegans oocytes as a model to explore PLK-1 function during meiosis. They take advantage of an analogue-sensitive allele of plk-1, which enabled them to bypass nuclear envelope breakdown defects that occur following PLK-1 RNAi. This allowed them to dissect later roles of PLK-1 in oocytes, demonstrating that depletion causes defects in spindle organization, chromosome congression, segregation, and polar body extrusion. Moreover, the authors defined mechanisms by which PLK-1 is targeted to chromosomes, showing that CENP-C (HCP-4) is required for localization to chromosome arms and that BUB-1 is required for targeting to the midbivalent region. Finally, they demonstrate that upon removal of PLK-1 from both domains, there are severe meiotic defects. These findings are interesting. However, there is a need for additional analysis to better support some of their conclusions, and to aid in interpretation of particular phenotypes. Specific comments are below.

      • For many important claims of the paper, a single representative image is shown but the n is not noted. This is an issue throughout the paper for much of the localization analysis (e.g. Figure 1B, 1C, 1D, 2A, 2B, 3A, 3B, 3C, etc.); in cases like this, numbers should be included to increase the rigor of the presented data. How many images or movies were analyzed that looked like the one shown? For linescans, were they done only on one image? How many independent experiments were done, etc?

      We had initially chosen a representative image. Localisation was the same in all images that allowed ‘proper’ assessment of PLK-1 localisation. In our case, this means that we can only analyse bivalents that are perpendicular to the light path to distinguish between bivalent, chromosome arms, and kinetochore. We now report the number of oocytes (N) and bivalents (n) analysed for each condition. The line scans were done in one representative image.

      • In the abstract, it is stated that PLK-1 plays a role in spindle assembly/stability (this is also stated elsewhere, e.g. line 101). This phrasing implies that the authors have demonstrated roles in both spindle assembly and stability. However, to distinguish between these roles, they would have to show that removal of PLK-1 before spindle assembly causes defects, and also that removal of PLK-1 from pre-formed spindles causes collapse. I don't think it is necessary to do this, as the spindle roles of PLK-1 are not a focus of the paper. However, the language should be altered so that it does not imply that the paper has demonstrated roles in both. A good place to do this would be in the section from lines 144-147, where they first discuss the spindle defects. It would be straightforward to explain that their approach does not distinguish between spindle assembly and stability, and that PLK-1 could have a role in either or both.

      We fully agree with this comment. We cannot distinguish between spindle assembly and stability, and it is also not the focus of our current work. We have changed the text accordingly.

      • It is stated that there is kinetochore localization of PLK-1 (and I do see some dim cup-like localization in images after PLK-1 is removed from the chromosome arms via HCP-4 RNAi). However, this cup-like localization is not clear in most wild-type images (e.g. Figure 1B, 1D, 2A, 3A, etc.). Although I recognize that the chromatin staining might be obscuring kinetochore localization, if PLK-1 was truly a kinetochore protein I would also expect it to localize to filaments within the spindle (as many other kinetochore proteins do), especially since the authors state that BUB-1 targets PLK-1 to the kinetochore (and BUB-1 is in the filaments). In fact, the only images where it looks like PLK-1 may be localized to filaments are in Figure 4C and 6A, when HCP-4 has been depleted (though I don't know if this is generally true across all HCP-4 RNAi images). For me, this calls into question the conclusion that PLK-1 truly is on the kinetochore in wild type conditions - could it be that PLK-1 only localizes to the kinetochore (and to the filaments) when HCP-4 is depleted? The authors need to resolve this issue and provide better evidence that PLK-1 normally localizes to the kinetochore, if they want to make this claim. Additionally, the observation that PLK-1 is not on the kinetochore filaments (in wild type conditions) should be addressed in the text somewhere - do the authors think that this is a special type of kinetochore protein that does not localize to the filaments?

      While our initial claim of PLK-1 kinetochore localisation was based on its cup-like localisation, we have now performed additional analysis and experiments to confirm this claim. First, we corroborated that the PLK-1 cup-like pattern co-localises with the Mis12 complex component KNL-3 (New Figure 5-figure supplement 1). Second, we show that PLK-1 is present in the so-called ‘linear elements’ (filaments) both within the spindle and in the cortex. Since PLK-1 presence in these filaments is seen in wild type as well as hcp-4 mutant oocytes, we conclude that PLK-1 likely localises to the kinetochore under normal conditions.

      • The authors should provide a control experiment, treating wild-type worms with 10uM 3-IB-PP1. This would be important to ensure that the spindle defects seen at this concentration in the plk-1as strain are not non-specific effects of the inhibitor. There is a control in Figure 1 - figure supplement 3 using 1uM 3-IB-PP1 but I didn't see a control for 10uM (the concentration at which spindle defects are observed).

      This control has now been included in Figure 1-figure supplement 3.

      • In Figure 2F, the gels for BUB-1+PLK-1 look different in the presence and absence of phosphorylation by Cdk1 - for these data, I agree with the authors that it looks as if the complex elutes at a higher volume if BUB-1 is not phosphorylated (lines 200-204). However, Figure 2G has a repeat of the condition with phosphorylated BUB-1, and in this panel, the complex appears to elute at a higher volume than it did on the gel in panel F. The gel in panel G looks much more similar to the unphosphorylated condition in panel F. The authors need to explain this discrepancy (i.e., Is there a reason why the gels cannot be compared between panels? How reproducible are these data?). Ideally, the authors would include a repeat of the unphosphorylated BUB-1 + PLK-1 condition in panel G, done at the same time as the conditions shown in that panel, to avoid the impression that their results may not be reproducible.

      The specific elution volume cannot be compared in different experiments as the column has proven to “drift” over time – with proteins eluting at a later volume than they did previously despite extensive washing. What is reproducible under the experimental conditions is that the unphosphorylated wild type proteins, or the phosphorylated T527A/T163A mutant proteins A) elute at a later volume than the phosphorylated wild type proteins and B) bind to a lower proportion of the MBP-PLK1PBD (as you can see in the relative absorbance profiles and Coomassie gels).

      • The authors would need to provide convincing evidence that co-depletion of BUB-1 and HCP-4 delocalizes PLK-1 from the chromosomes entirely, and that this co-depletion condition is more severe than either single depletion alone.

      We now provide a quantification of the total PLK-1 levels to accompany the images (New Figure 8-figure supplement 1).

      Additionally, the bub-1T527A and hcp-4T163A alleles are nice tools to, in theory, more specifically delocalize PLK-1 from the midbivalent and chromosome arms, respectively, to explore the functions of chromosome-associated PLK-1. However, I think the authors cannot rule out the possibility that other proteins are also being depleted from the midbivalent and/or chromosome arms in their conditions, and that this delocalization may contribute to the phenotypes observed. For example, hcp-4 depletion was recently shown to delocalize KLP-19 from the chromosome arms (Horton et al. 2022), so in the experiment shown in Figure 6E (HCP-4 RNAi in the bub-1 mutant), PLK-1 was likely not the only protein missing from the chromosome arms. Therefore, understanding if other proteins are absent from these domains (in the bub-1T527A and hcp-4T163A mutants) would help the reader understand and interpret the presented phenotypes (and how specific they are to PLK-1 loss). Consequently, I think that to better understand the co-depletion analysis presented in Figure 6 (and Figure 6 supplement 1), the authors should analyze other midbivalent and chromosome arm proteins, to determine if any are also delocalized (e.g. SUMO, KLP-19, MCAK, etc.).

      As stated above, this paper focuses on identifying the specific meiotic events PLK-1 plays a role in and characterising its targeting mechanism. We are following up on this work to understand what proteins are regulated by PLK-1 in different chromosome domains and how this relates to the observed phenotypes.

      For the current work, we should emphasise that mutating a single Thr residue within an STP motif in a largely disordered region is far more specific than depleting HCP-4 or BUB-1, making it likely that the observed effects are mediated through PLK-1 targeting. It should be noted that the finding presented in Horton et al. 2022 is in contradiction with another study in which hcp-4 depletion did not impact KLP-19 localisation (Hattersley et al. 2022).

      Additionally, instead of performing a combination of mutant and RNAi analysis (i.e. HCP-4 RNAi in the bub-1 mutant (Figure 6) and BUB-1 RNAi in the hcp-4 mutant (Figure 6 figure supplement 1)), it would be more powerful to generate a double mutant - this has a higher chance of being a more specific depletion condition.

      We have performed these experiments, which are now presented in Figure 9.

    1. Author Response

      Reviewer #1 (Public Review):

      Sorkac et al. devised a genetically encoded retrograde synaptic tracing method they call retro-Tango based on their previously developed anterograde synaptic tracing method trans-Tango. The development of genetically encoded trans-synaptic tracers has long been a difficult stumbling block in the field, and the development of trans-Tango a few years back was a breakthrough that was immediately, widely, and successfully applied. The recent development of the retrograde tracer method BAcTrace was also exciting for the field, but it requires lexA driver lines and, by its design, requires testing candidate presynaptic neurons instead of an unbiased test for connectivity.

      Retro-Tango now provides an unbiased retrograde tracer. They cleverly used the same reporter system as for trans-Tango by reversing the signaling modules to be placed in pre-synaptic neurons instead of post-synaptic neurons. Therefore, synaptic tracing leads to the labeling of pre-synaptic neurons under the regulation of the QUAS system. Using visual, olfactory, as well as sexually dimorphic circuits, the authors provide examples of the specificity, efficiency, and usefulness of the retro-Tango method. The authors successfully demonstrated that many of the known pre-synaptic neurons can be successfully and specifically labelled using the retro-Tango method.

      Most importantly, because it is based on the most used, very well tested and widely adopted trans-Tango method, retro-Tango promises to not just be a clever development, but a really widely and well-used technique as well. This is an outstanding contribution.

      We would like to thank Dr. Hiesinger for his very kind words and for the overall appreciation of the contribution of the development of retro-Tango to the field. We are also grateful for the suggestions below aimed at improving the clarity of our manuscript. We individually address the points raised by Dr. Hiesinger below.

      Reviewer #2 (Public Review):

      Tools that enable labeling and genetic manipulations of synaptic partners are important to reveal the structure and function of neural circuits. In a previous study, Barnea and colleagues developed an anterograde tracing method in Drosophila, trans-TANGO, which targets a synthetic ligand to presynaptic terminals to activate a postsynaptic receptor and trigger nuclear translocation of a transcription factor. This allows the labeling and genetic manipulation of cells postsynaptic to the ligand-expressing starter cells. Here, the same group modified trans-TANGO by targeting the ligand to the dendrites of starter cells to genetically access pre-synaptic partners of the starter cells; they call this method retro-TANGO. The authors applied retro-TANGO to various neural circuits, including those involved in escape response, navigation, and sensory circuits for sex peptides and odorants. They also compared their retro-TANGO data with synaptic connectivity derived from serial electron microscopy (EM) reconstruction and concluded that retro-TANGO can allow trans-synaptic labeling of presynaptic neurons that make ~17 synapses or more with the starter cells.

      Overall, this study has generated and characterized a valuable retrograde transsynaptic tracing tool in Drosophila. It's simpler to use than the recently described BAcTrace (Cachero et al., 2020) and can also be adapted to other species. However, the manuscript can be substantially strengthened by providing more quantitative data and more evidence supporting retrograde specificity.

      We thank Dr. Luo for his kind words and his assessment of the value of retro-Tango as a new tool in the transsynaptic labeling toolkit in Drosophila. We followed the suggestions of Dr. Luo for providing more quantitative data and addressing the specificity and directionality of retro-Tango. We strongly believe that the implementation of his suggestions did enhance the quality of our manuscript.

      Reviewer #3 (Public Review):

      This is a valuable addition to the currently available arsenal of methods to study the Drosophila brain.

      There are many positives to the present manuscript as it is:

      (i) The introduction makes a clear and fair comparison with other available tracing methods.

      (ii) The authors do a systematic analysis of the factors that influence the labeling by retro-tango (age, temperature, male versus female, etc...)

      (iii) The authors acknowledge that there are some limitations to retro-TANGO. For example, the fact that retro-TANGO does not label all the expected neurons as indicated by the EM connectome. This is fine because no technique is perfect, and it is very laudable that the authors did a serious study of what one should expect from retro-TANGO (for example, a threshold determined by the number of synapses between the connected neurons).

      We would like to thank the reviewer for the kind words and the positive assessment of our manuscript. In addition, we would like to acknowledge the reviewer for the recommendations below, which we followed and we think made our manuscript stronger.

    1. Author Response

      Reviewer #1 (Public Review):

      Bustion and colleagues outline the creation and testing of an in silico method to query gut microbiome databases for genes encoding enzymes predicted to catalyze a reaction of interest, which is provided by the user. Strengths of the tool include attempts to examine nearly 9,000 MetaCyc reactions in a pre-calculated fashion and to rank order enzymes based on their likelihood of catalyzing a reaction. Substrates, products, and even cofactors, if known, are employed to strengthen the power of the search algorithm, which also employs a hidden Markov model to improve the selection of putative hit enzymes. The authors outline high success rates with examples presented and compare those results with other extant methods, which are reported to perform in a less robust manner. Weaknesses include lack of evidence of success on a more difficult "real world" example. However, the tool outlined is a clear advance over existing methods and will be useful to explore the diversity of chemical transformations performed by commensal microbiota.

      We thank Reviewer 1 for their positive feedback and constructive summary. We agree that a real-world example would add confidence to our findings. We previously demonstrated SIMMER’s utility using published datasets. To expand upon these findings, we added another evaluation on an external dataset (Artacho et al., 2020) and performed new experiments to test SIMMER predictions for methotrexate metabolism into DAMPA and glutamate, a reaction known to be performed by the human microbiome but for which human gut strains and specific gut enzymes were not previously known. Both the new external dataset and our experimental findings validate SIMMER’s predictions of bacteria capable of metabolizing methotrexate, the mainline therapeutic for rheumatoid arthritis patients.

      Reviewer #2 (Public Review):

      This work provides a new computational tool for the systematic characterization of biotransformation reactions in the human gut microbiome: given a biotransformation reaction of interest, it predicts a list of candidate bacterial species, enzymes, and EC identifiers putatively capable of performing the queried reaction. The method is innovative and clearly presented.

      The pipeline that relies on both chemical and protein similarity algorithms, is in principle applicable to any biotransformation reaction that can be formulated as linked substrates and products (possibly including co-factors). This contrasts with other approaches that, for example, only rely on smaller databases and solely rely on substrates and chemical similarity. Moreover, SIMMER outperformed two other recently developed methods, against which it was benchmarked for its prediction accuracy when tested on a control test set derived from literature.

      The work interestingly focuses on predicting bacterial enzymes responsible for drug biotransformation, therefore showcasing its potential as a hypothesis generator for characterizing and validating novel bacterial enzymes in vitro.

      The authors correctly describe the relevance of an accurate input (in terms of reaction completeness, including cofactors and reaction products) as paramount for the quality of the prediction.

      The conclusions of this paper are mostly well supported by data, but some aspects of performance evaluation and its generality might benefit from additional elaborations and clarifications.

      1) Great emphasis has been dedicated to the prediction performance of SIMMER over a positive control set derived from the available literature. However, a more extensive description and analysis of false positive results are needed to better understand the possible impact of the (potentially many) false positive predictions listed for each reaction.

      We agree that our analysis would benefit from an assessment of false positives. Unfortunately, current literature usually reports which reactions an enzyme is capable, rather than incapable, of performing. For this reason, we took a conservative approach and decided to define all reactions preceding that which yielded a positive control enzyme sequence as false positives. This is now described above in Essential Revisions Response 1.3.
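
      The conservative false-positive definition above can be illustrated with a short Python sketch. This is not SIMMER's code: the function name `count_false_positives`, the reaction identifiers, and the assumption that predictions arrive as a ranked list are all hypothetical and used only for illustration.

```python
from typing import Optional, Sequence, Set


def count_false_positives(ranked_reactions: Sequence[str],
                          positive_controls: Set[str]) -> Optional[int]:
    """Count ranked predictions that precede the first positive control.

    Every reaction ranked ahead of the first one that yields a known
    positive-control enzyme is treated as a false positive.
    Returns None when no positive control appears in the ranking.
    """
    for rank, reaction in enumerate(ranked_reactions):
        if reaction in positive_controls:
            return rank  # number of entries ranked ahead of the hit
    return None


# Hypothetical ranked output for one query reaction.
ranking = ["rxn_A", "rxn_B", "rxn_C", "rxn_D"]
print(count_false_positives(ranking, {"rxn_C"}))  # 2
```

      Under this definition the count is an upper bound: some of the reactions ranked ahead of the positive control may in fact be true but unreported activities.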

      2) The authors imply that the current method is superior to two other methods based on accuracy. However, a more extensive description of the benchmarking results would strengthen these benchmarking efforts.

      We have addressed this concern in Essential Revisions Response 3.

      3) The authors only showcase SIMMER in the context of drug metabolism but claim its applicability to be general enough to also describe other biotransformation in the human gut microbiota. Although in principle believable, the authors could improve the credibility and generalizability of their method by demonstrating another use case, e.g., food compounds, for which extensive metagenomic and metabolomic data are already available from previous gut microbiome studies.

      We agree that assessments of SIMMER’s predictions on food metabolism would improve the generalizability of the method. We have edited the text to focus on drug metabolism, as we believe SIMMER’s application to food metabolism merits a more thorough, future investigation.

      4) Showcasing experimental in vitro validation of SIMMER predicted enzyme(s) could greatly strengthen the relevance of this work.

      We have addressed this in Essential Revisions Response 2.

      5) Throughout the text and the title, a more careful and precise phrasing of the tool's scope (characterization of microbiome-encoded enzymatic reactions and not the identification of novel chemical transformations) would improve the reader's understanding of the work.

      We agree, and have reworded many key phrases in the text, including the title.

      Reviewer #3 (Public Review):

      This manuscript presents a new tool, SIMMER, to predict bacterial enzyme-mediated transformations of compounds, an important and incompletely understood aspect of microbiome drug metabolism. The authors compare their resource to existing resources that allow users to generate hypotheses related to compound toxicity and putative routes of compound metabolism. The authors identify the key innovations of their resource as including full chemical representations of reactions and a novel method to predict an enzyme's EC number (a description of function) from its reaction.

      Strengths

      Generating user-friendly tools to explore existing knowledge of bacterial enzymes and their reactions is important.

      SIMMER is a novel resource where the user provides the substrates and products as input and receives a list of potential microbiome enzymes as output.

      SIMMER includes a novel EC predictor based on reaction rather than based on sequence.

      Weaknesses

      Validation claims are not well supported by the results.

      We have extensively edited the manuscript to better describe our previous computational validations, and we have added new analyses to further evaluate SIMMER. We added an additional validation on an external dataset, an in vitro experimental assessment of SIMMER’s predictions for methotrexate metabolism, two new reactions to the positive control analysis, a false positive rate, and additional comparisons to the two competing methods.

      Need for the user to know both the substrate and the product for a reaction of interest limits the utility of the resource.

      We agree that this is a limitation for the user, but as we show in our Results, relying on substrates alone does not yield appropriate representations of reactions and therefore does not allow for accurate predictions of responsible species/strains and enzymes (i.e., finding True Positives, and confirming associations from previously collected data). We agree that tools requiring only substrates are convenient, but our results show that they are less helpful in finding appropriate metabolism and enzyme predictions. Many studies of biotransformation in the human gut identify the product information or product structure via HPLC, LC-MS, and NMR techniques. In cases where such data was not gathered, or not gathered with enough structural resolution, researchers can use tools such as Biotransformer to make product template predictions before inputting a query to SIMMER. This recommendation is included in the present manuscript’s lines 376–391:

      In instances when DrugBug and MicrobeFDT did make predictions, they suffered from low accuracy (Table 1), which we hypothesized was due to both methods’ reliance on substrate rather than reaction chemistry. Biotransformations involve the relationship between substrate(s), cofactor(s), and an enzyme to yield a particular product(s). As one substrate can exhibit affinity for multiple enzymes, resulting in multiple unique products, sole employment of substrates in a chemical fingerprint does not achieve the resolution necessary to make relevant predictions. To test if SIMMER’s better performance could be attributed to including cofactors and products, we modified our code to run with a chemical representation that includes only the substrate of each positive control reaction. Enzyme prediction accuracy dropped from 88% down to 33%, and EC prediction accuracy dropped from 93% down to 48% (Table 1—source data), supporting the hypothesis that SIMMER’s better performance when compared to DrugBug and MicrobeFDT is due in large part to our using chemical representations that include the full reaction. These results are in line with our previous demonstration that SIMMER clusters enzymatic reaction chemistry only when a full reaction is employed (Figure 2, Figure 2—figure supplement 4).
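
      The argument that a substrate-only fingerprint cannot separate reactions sharing a substrate can be made concrete with a toy Python sketch. It is only an illustration under stated assumptions: the feature names are invented, the role-tagged feature sets are a stand-in for a real chemical fingerprint, and nothing here reproduces SIMMER's actual representation.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def reaction_features(substrates, products) -> frozenset:
    """Tag each feature with its role so substrate and product features stay distinct."""
    return (frozenset(("S", f) for f in substrates)
            | frozenset(("P", f) for f in products))


# Two hypothetical reactions: same substrate features, different products.
rxn1 = {"substrates": {"amide", "aryl"}, "products": {"carboxylate", "amine"}}
rxn2 = {"substrates": {"amide", "aryl"}, "products": {"hydroxyl", "aryl"}}

substrate_only = tanimoto(frozenset(rxn1["substrates"]),
                          frozenset(rxn2["substrates"]))
full_reaction = tanimoto(reaction_features(**rxn1), reaction_features(**rxn2))

print(substrate_only)           # 1.0: the two reactions look identical
print(round(full_reaction, 2))  # 0.33: the products separate them
```

      In this toy case the substrate-only representation scores the two reactions as identical, whereas including products distinguishes them, which is the qualitative effect the quoted passage describes.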

      Reliance on homology transfer annotation to predict enzyme function; this approach has important, microbiome-relevant, limitations.

      Please refer to our separate Common_Questions.pdf document, Common question 1: Are EC codes sufficient to select enzyme orthologs within an overall class?

    1. Author Response:

      The authors would like to thank the Editors and reviewers for their careful consideration of our article and we express our appreciation for the work required by both Editors and reviewers to study and produce the detailed reviewer reports. We are pleased at the general consensus that our paper is of interest and highlights an important region of the channel for drug-protein interaction. We are also cognizant that the reviewer reports highlight areas where important revisions need to be made to our work before it can be considered fully complete. We will revise the paper according to the comments of the reviewers and submit a new version in the near future which we hope will become the version of record.

    1. Author Response

      We thank the editors and reviewers for their support of our work, as well as their constructive feedback and useful suggestions, which have improved the readability and presentation of the manuscript for a broader audience.

    1. Author Response

      Reviewer 1 (Public Review):

      Fox, Birman, and Gardner use a previously proposed convolutional neural network of the ventral visual pathway to test the behavioral and physiological impact of an attentional gain spotlight operating on the inputs to the network. They show that a gain modulation that matches the behavioral benefit of attentional cueing in a matching behavioral task, induces changes in the receptive fields (RFs) of the model units, which are consistent with previous neurophysiological reports: RF scaling, RF shift towards the attentional focus, and RF shrinkage around the focus of attention. Ingenious simulations then allow them to isolate the specific impact of these RF modulations in achieving performance improvements. The simulations show that RF scaling is primarily responsible for the improvement in performance in this computational model, whereas RF shift does not induce any significant change in decoding performance. This is significant because many previous studies have hypothesized a leading role of RF shifts in attentional selection. With their elegant approach, the authors show in this manuscript that this is questionable and argue that changes in the shape of RFs are epiphenomena of the truly relevant modulation, which is the multiplicative scaling of neural responses.

      Strengths:

      The use of a multi-layer network that accomplishes visual processing, with an approximate correspondence with the visual system, is a strength of this manuscript that allows it to address in a principled way the behavioral advantage contributed by various attentional neural modulations.

      The simulations designed to isolate the contributions of the various RF modulations are very ingenious and convincingly demonstrate a superior role of gain modulation over RF shifts in improving detection performance in the model.

      We thank the reviewer for these supportive comments.

      Weaknesses:

      There is no mention of a possible specificity of the manuscript conclusions in relation to the type of task to be performed. It is conceivable that mechanisms that are not important for detection tasks are instead crucial for a reproduction task, as in Vo et al. (2017).

      We agree that other behavioral tasks may rely on different attentional mechanisms than the ones we have studied here for detection and discrimination and now specifically point this out in the discussion [379-395].

      The manuscript puts emphasis on the biological plausibility of the model, and some quantitative agreements. But at some important points these comparisons do not appear very consistent:

      1) It is unclear what output of the model at each cortical area is to be compared with neurophysiological data. On the one hand, the manuscript argues that a 1.25 attentional factor is consistent with single-neuron results, but here this factor is applied to the inputs into V1 units. When this modulation goes through normalization in area V1, the output of V1 has a 2x gain. Intuitively, one would think that recordings in V1 neurons would correspond to layer V1 outputs in the model, but this is not the approach taken in the manuscript. This needs clarification. Also, note that the 20-40% gain reported in line 287 corresponds to high-order visual areas (V4 or MT), but not to V1, in the cited references. The quantitative correspondence between gain factors at various processing steps in the model and in the data is confusing and should be clearer.

      We agree that making a one-to-one mapping of gain effects measured in neurophysiology and different layers of the CNN is problematic. We therefore have clarified that the introduction of gain at the earliest stages of processing is meant to study how gain propagates through a complex CNN and has downstream effects [49-52 and 410-447] and we have also clarified the various uncertainties in making a one-to-one mapping from the CNN to neurophysiological measurements of gain [410-447].

      2) The model assumes a gain modulation in the inputs to V1. This would correspond to an attentional gain modulation in LGN unit outputs. There is little evidence of such strong modulation of LGN activity by attention. Also in V1 attentional modulation is small. As stated in Discussion (line 295), there is no reason to favor the current model as opposed to a model where the attentional gain is imposed later on in the visual hierarchy (for example V4). If anything, neurophysiology would be more consistent with this last scenario, given the evidence for direct V4 gain control from frontal eye fields (Moore and Armstrong, Nature 2003). The rationale for focusing on a model that incorporates the attentional spotlight on the inputs to V1 should be disclosed.

      We agree that measurements of gain changes with attention appear larger in later stages of visual processing and do not wish to explicitly link the gain changes imposed at the earliest stages of processing in our CNN observer model with changes in input from LGN as we agree this would be unrealistic. Instead, our goal was to examine how gain changes can propagate through complex neural networks and cause downstream effects on spatial tuning properties and the efficacy of readout. We have substantially re-written the manuscript, in particular the introduction [24-38, 49-52] and discussion [441-447] to better describe this rationale. We also now explicitly discuss how our propagated gain test demonstrates exactly the reviewer’s point - that gain can be injected late in the system, rather than at the earliest stages [274-276, 441-447].
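
      For readers unfamiliar with the manipulation under discussion, a multiplicative gain applied to the inputs of a network can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian spotlight shape, the `sigma` value, the image size, and the function names are assumptions; only the 1.25 factor is taken from the review above.

```python
import math


def gaussian_gain_map(height, width, center, sigma, peak_gain):
    """Gain map equal to peak_gain at `center` and decaying to 1.0 far away."""
    cy, cx = center
    return [
        [1.0 + (peak_gain - 1.0)
         * math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def apply_gain(image, gain):
    """Multiply each pixel by its gain (the multiplicative 'spotlight')."""
    return [[p * g for p, g in zip(img_row, gain_row)]
            for img_row, gain_row in zip(image, gain)]


# Hypothetical 8x8 uniform input with the spotlight centred at (2, 2).
image = [[1.0] * 8 for _ in range(8)]
gain = gaussian_gain_map(8, 8, center=(2, 2), sigma=1.5, peak_gain=1.25)
attended = apply_gain(image, gain)

print(attended[2][2])  # 1.25 at the attended location
```

      Pixels far from the centre are left essentially unchanged, so any downstream change in receptive fields or readout arises from how later layers transform this localized scaling.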

      3) The model chosen is the CORnet-z model, but this model does not include recurrent dynamics within each layer. Recurrent dynamics is a prominent feature in the cortex, and there is evidence indicating that attentional modulations operate differently in feedforward and in recurrent architectures (Compte and Wang, Cerebral Cortex 2006). A specific feature of recurrent models is that the attentional spotlight need not be a multiplicative factor (which is biologically complicated) but an additive term before the ReLU non-linearity, which achieves the expected RF modulations (Compte and Wang, 2006). A model with recurrence thus represents another architecture that links gain and shift in a way that has not been explored in this manuscript, and this may limit the generalization of the conclusions (line 205).

      We appreciate the reviewer pointing us toward the Compte paper and we’ve added a discussion of recurrence as an alternate model [410-423].

      Reviewer 2 (Public Review):

      This manuscript by Fox, Birman, and Gardner combines human behavioral experiments with spatial attention manipulation and computational modeling (image-computable convolutional neural network models) to investigate the computational mechanisms that may underlie improvements in behavioral performance when deploying spatial attention.

      Strengths:

      • The manuscript is clear and the analyses, modeling, and exposition are executed well.

      • The behavioral experiments are carefully conducted and of high quality.

      • The manuscript takes a creative approach to constructing a “neural network observer model”, that is, coupling an image-computable model to a potential readout mechanism that specifies how the representations might be used for the purposes of behavior. The focused analyses of the model innards (architecture, parameters) provide insight into how different model components lead to the final behavior of the model.

      Thank you for these supportive comments.

      Weaknesses:

      • The overall conclusions and insights gained seem heavily dependent on particular choices and design decisions made in this specific model. In particular, the readout mechanism lacks some critical descriptive details, and it is not clear whether the readout mechanism (512-dimensional representation that reflects summing over visual space) is a reasonable choice. As such, while the computational analyses and results may be correct for this model, it is not clear whether the strong general conclusions are justified. Thus, the results in their current form feel more like exploratory work showing proof of concept of how the issue of attention and underlying computational mechanisms can be studied in a rigorous and concrete computational modeling context, rather than definitive results concerning how attention operates in the visual system.

      Please see below for our response to the issue with readout and conclusions.

      Overall, the work is solidly constructed, but the overall generality and strength of the conclusions require substantial dampening.

    1. Author Response:

      We would like to thank the reviewers for their time, insights, and constructive feedback. We appreciate the recognition by the reviewers of the value and importance of our study. The reviewers also highlighted: the importance of carefully using and interpreting data from small molecule inhibitors due to possible off-target effects, considering inter-study differences in the cardiomyocyte cell trajectories, examining a possible role of PI3K signaling in proliferation and the intriguing yet not fully elucidated role of membrane protrusions in cardiac fusion. We agree with this important feedback. We plan to address these comments and others directly, in detail.

    1. Author Response:

      We thank the reviewers and editors for their careful reading and reviews of our work. We are grateful that they appreciate the value in our experimental approach and results. We acknowledge what we interpret as the major criticism, that in our original manuscript we focused too heavily on the hypothesized role of GABAergic neurons in driving habituation. This hypothesis will remain only indirectly supported until we can identify a GABAergic population of neurons that drives habituation. Therefore, we will revise our manuscript, decreasing the focus on GABA, and rather emphasizing the following three points:

      1. By performing the first Ca2+ imaging experiments during dark flash habituation, we identify multiple distinct functional classes of neurons which have different adaptation profiles, including non-adapting and potentiating classes. These neurons are spread throughout the brain, indicating that habituation is a complex and distributed process. 

      2. By performing a pharmacological screen for dark flash habituation modifiers, we confirm habituation behaviour manifests from multiple distinct molecular mechanisms that independently modulate different behavioural outputs. We also implicate multiple novel pathways in habituation plasticity, some of which we have validated through dose-response studies.

      3. By combining pharmacology and Ca2+ imaging, we did not observe a simple relationship between the behavioural effects of a drug treatment and functional alterations in neurons. This observation further supports our model that habituation is a multidimensional process, for which a simple circuit model will be insufficient. 

      We would like to point out that, in our opinion, there appears to be a factual error in the final sentence of the eLife assessment: “However, the data presented are incomplete and do not show a convincing causative link between pharmacological manipulations, neural activity patterns, and behavioral outcomes”. We believe that a “convincing causative link” between pharmacological manipulations and behavioural outcomes has been clearly demonstrated for PTX, Melatonin, Estradiol and Hexestrol through our dose-response experiments. Similarly, a link between pharmacology and neural activity patterns has also been directly demonstrated. As mentioned in (3), we acknowledge that our data linking neural activity and behaviour are more tenuous, as will be more explicitly reflected in our revised manuscript. Nevertheless, we maintain that one of the primary strengths of our study is our attempt to integrate analyses that span the behavioural, pharmacological, and neural activity levels.

    1. Author Response

      Reviewer #1 (Public Review):

      Rosas et al. studied the mechanism(s) that enabled the carbapenem resistance of a Klebsiella isolate, FK688, which was isolated from an infected patient. To identify and characterize this mechanism, they used a combination of multiple methods. They started by sequencing the genome of this strain by a combination of short- and long-read sequencing. They show that Klebsiella FK688 does not encode a carbapenemase, and thus looked for other mechanisms that can explain this resistance. They discover that both DHA-1 (located on the mega-plasmid) and an inactivation of the porin OmpK36 are required for carbapenem resistance in this strain. By using experimental evolution, it was shown that resistance is lost rapidly in the absence of antibiotic selection, by a deletion in pNAR1 that removed blaDHA-1. Moreover, their results suggested that it is likely that exposure to other antibiotics selected for the acquisition of the mega-plasmid that carries DHA-1, which then enabled this strain to gain resistance to carbapenems by a single deletion.

      The major strength of this study is the use of various approaches, to tackle an important and interesting problem.

      The conclusions of this paper are mostly well supported by data, but one aspect is not clear enough. The description of the evolutionary experiment is not clear. I could not find a clear description of the names of the evolved populations. However, the authors describe strains B3 and A2, but their source is not clear. The legends of the relevant figure (Figure 5) are confusing. For example, the text describing panel B is not related to the image shown in this panel. Moreover, it is shown in panel C (and written in the main text) that the OmpK36+ evolved populations had only translucent colonies, so what is the source of B3(o)?

      We appreciate the point and in response have added a panel to Figure 5 (in the revised paper this is now Fig. 5A) to illustrate the evolutionary experiment and specify that there are two lineages (A and B) with 20 replicates each that, after 200 generations of evolution, give rise to populations of which A2 and B3 are the exemplars characterized.

      We have corrected the legends in Figure 5.

      We now explain (sentence starting on Line 197) that B3(o) is the single isolate of an opaque colony from lineage B3; it is the only opaque colony that we identified out of 595 colonies observed in the B3 population. B3(o) was sequenced and analysed as a comparator and has some value in that regard, despite being an anomaly.

      Reviewer #2 (Public Review):

      The authors sequenced a clinical pathogen, Klebsiella FK688, and definitively establish the genetic basis of the carbapenem-resistance phenotype of this strain. They also show that the causal mutations confer reduced fitness under laboratory conditions, and that carbapenem sensitivity readily re-evolves in the lab due to the fitness costs associated with the resistance mutations in the clinical isolate. They also establish that subinhibitory concentrations of ceftazidime select for the otherwise deleterious blaDHA-1 gene. Based on this finding the authors speculate that prior beta-lactam selection faced by the ancestors of Klebsiella FK688 potentiated the evolution of the carbapenem-resistance phenotype of this strain. If this hypothesis is true, then prior history of beta-lactam exposure may generally potentiate the evolution of carbapenem resistance.

      Strengths:

      From a technical perspective, the findings in this paper are solid. In addition, the authors establish a simple genetic basis for carbapenem resistance in a clinical strain, which is a valuable and non-trivial finding (i.e. they show that the CRE phenotype in this strain is not an omnigenic trait distributed over hundreds of loci).

      Weaknesses:

      The main weakness of this paper is that the authors draw overly broad conclusions of a conceptual nature from narrow experimental findings. This could be addressed by drawing more modest and narrow implications from the findings.

      1) The title of this paper is "Treatment history shapes the evolution of complex carbapenem-resistant phenotypes in Klebsiella spp." But they provide no data on the treatment history of the patient from whom this strain was isolated. Therefore, the authors have no evidence to support their central claim. Indeed, it is completely possible that this strain never faced beta-lactam selection in the past, or that the patient's hypothetical history of beta-lactamase was irrelevant for the evolution of FK688. First, it is completely possible that this is a hospital-acquired infection, such that the history of this strain is due to selection in other contexts in the hospital that have little to do with the patient's treatment history. Second, it is completely possible that this strain (the chromosome anyway) has no prior history of beta-lactamase selection, and that it acquired the megaplasmid containing blaDHA-1 via conjugation from some other strain. In this second hypothetical scenario, it is possible that the fitness cost of the blaDHA-1 gene is not particularly high in a different source strain, but that it has some cost in the FK688 strain that it was isolated from. And of course, fitness costs in the human host could be very different than fitness costs in the laboratory, where strains are evolving under strong selection for fast growth. And given the benefit of resistance, it's clear that this strain has a strong fitness advantage over faster-growing sensitive strains in the context of the source patient under antibiotic treatment.

      My general point here is that the broad claims made about patient history or prior history shaping the evolution of this strain are largely indefensible because there is no data here to make solid inferences about how prior history shaped the evolution of this strain.

      We appreciate the point and have changed our title and scaled back the strength of our conclusions regarding patient treatment history.

      2) Historical contingency. The authors claim that their work shows how historical contingency shapes the evolution of resistance. One problem with this claim is that it is trivial: this is only a significant claim if the reader believes that prior history is not important in the evolution of antibiotic resistance, which is a straw-man null hypothesis, to mix a couple of metaphors. To be more concrete, clearly strain background (prior history) matters: eliminating the plasmid with the resistance gene eliminates resistance. But that is not particularly surprising, given the past 50 years of evolutionary microbiology literature on plasmids and resistance. By contrast to this work, the major contribution of papers that examine the role of historical contingency in evolution (i.e. various Lenski papers) is that those works quantitatively measure the role of history in comparison to other factors (chance, adaptation). Since this work is a deep dive into a single clinical isolate, the data presented here do not and cannot shed light on the role of historical contingency in the emergence of this strain. The authors' claims about the prior history that led to the CRE phenotype are reasonable, but are fundamentally speculative. I have nothing against speculation, as long as it is clear what claims are speculative, and what are concrete implications. But the authors frame these speculative claims as concrete implications of their findings.

      This is a fair point. We have reframed the study to not focus on historical contingency.

      As the reviewer points out, any discussion about historical contingency in the context of evolution is trivial in one sense. One of the reasons that the studies of Lenski and Blount provide new insights into the role of history in evolution is that they knew the history of their populations (at least for the number of generations since the LTEE began), and had a high degree of control and understanding of the growth conditions where the trait evolved. As such, they could go back to time points before the trait evolved, and then repeat the evolution experiment many times, in the exact same environment where the trait originally evolved, and then count how often they observed the evolution of that trait.

      Here we study a clinical isolate, and have less understanding of the evolutionary history of our strain. While we cannot re-evolve carbapenem resistance in the exact same environment experienced by the FK688 strain, we did test the capacity for the wild type, and two possible intermediate genotypes, to evolve carbapenem resistance in growth media with carbapenem.

      Altogether, we have comprehensive evidence for the genetic cause of carbapenem resistance: the BLA1 plasmid + OmpK36. We showed, by experiment, that it is much more likely for carbapenem resistance to evolve in an FK688 strain that carries the BLA1 plasmid than in an FK688 strain that does not carry the plasmid, even if it has acquired the OmpK36 mutation. We think this is not trivial because a significant proportion of all of the carbapenem-resistant Klebsiella that have been isolated are non-carbapenemase CRE. Our reconstruction provides a plausible explanation for why non-carbapenemase CRE evolve – because they are evolving from strains that have already been treated with a non-carbapenem beta-lactam drug and have thereby selected for the presence of a beta-lactamase (that is not a carbapenemase).

      So, while we have scaled back the strength of our claims, we do think that our results can provide some insight into how the evolutionary history of a pathogen can shape the molecular path to antibiotic resistance.

      3) The authors claim that "[This work] suggests that the strategic combinations of antibiotics could direct the evolution of low-fitness, drug-resistant genotypes". I suppose this is true, but I also think this is a stretch of an implication given these findings. To be blunt, while I suppose it's better to have costly resistance variants that re-evolve sensitivity than to have low-cost high-resistance strains circulating, I think the patient's family would probably disagree that the evolution of a low-fitness drug-resistant genotype was good or strategic in the clinical context, even if better from a public health perspective. Low-fitness drug-resistant strains are just as lethal under clinical antibiotic concentrations!

      Thank you for the comment. We see how this sentence could be read as too strong a conclusion and have rewritten the last sentence of the DISCUSSION (line 351):

      “These results show how an individual’s treatment history might shape the evolution of AMR, and should be taken into consideration in order to explain the evolution of non-carbapenemase CRE”

      The authors do show the plausibility of their hypothesis/model that prior beta-lactam selection is sufficient to potentiate the evolution of carbapenem-resistance (by the additional ompK loss-of-function mutation). I think those findings are very nice. But the authors undermine their results by extrapolating too far from their data. Hence, I think narrowing the scope of the implications would improve this paper.

      In addition to narrowing the scope of the implications as written, I also would like to add that there may be other ways of framing this paper (other than historical contingency) that may make the significance of this work more apparent to a broader audience. This may be worth considering during the revision process.

      We have taken these suggestions on board and have re-framed the final sentences of the ABSTRACT, INTRODUCTION and DISCUSSION accordingly. Specifically, we have removed reference to historical contingency and instead have reframed our experiments as providing a genetic and evolutionary explanation for an interesting and concerning cause of antibiotic resistance – non-carbapenemase CRE.

    1. Author Response

      Reviewer #1 (Public Review):

      During the height of the COVID-19 pandemic, there was great and widespread concern about the lowered protection that cancer screening programs could offer. Not only were programs halted for some periods because of a lack of staff or concern about the spread of SARS-CoV-2; when screening activities were upheld, participation decreased, and follow-up of positive test results was delayed. Mariam El-Zein and coworkers have addressed this concern in the context of cervical screening in Canada, one of the rather few countries in the world with a well-organized, population-based, although regionalized, cervical screening program.

      Comment 1: Despite the existence of screening registries, they chose to do this in the form of a survey on the internet, sent to different professional groups within the chain of care in cervical screening and colposcopy. The reason for taking this "soft data" approach is somewhat diffuse.

      We are happy to provide a counterargument to the reviewer’s concern about the “soft data” approach. Our unit – McGill’s Division of Cancer Epidemiology – is a major stakeholder in policymaking and cervical screening guideline development in Canada. It is one of the components in a McGill Task Force on COVID-19 and Cancer that has been widely engaged in assessing the pandemic’s impact on the entire spectrum of cancer control and care (examples: PMID: 33669102, PMID: 34843106). Canada is a country of continental size, and during the pandemic even travel between provinces was interrupted. It is only via a web-based survey that one could have captured the required information. We took advantage of our unit’s credibility and stature to secure a substantial response to the survey, which elicited a high level of detail.

      The survey questionnaire instrument was thoughtfully developed with input from Canadian experts who are active in the field of cervical cancer prevention and involved in clinical care to comprehensively formulate informative questions (and practical, reasonable responses) underpinning each of the themes covered. Of note, some of these coinvestigators, having executive roles in relevant clinical professional bodies, advised our team on the logistics of circulating the survey to members. The administration of the survey was coordinated with the pertinent societies. Our aim was to provide an overall portrait across Canada of the extent of the harms to cervical cancer screening and treatment processes at the beginning of the COVID-19 pandemic (specifically a snapshot from mid-March to mid-August 2020), as perceived by professional groups in multiple health disciplines.

      Indeed, as the reviewer mentioned, there are fully (i.e., for Saskatchewan) and partially (i.e., for British Columbia, Alberta, Manitoba, Ontario) organized cervical cancer screening programs in Canada in addition to opportunistic programs (i.e., for Northwest Territories, Yukon, Nunavut, Quebec). The Canadian Partnership Against Cancer also collects information on cervical cancer screening programs and/or strategies across Canada. Using data from these different sources would enable a quantitative assessment of the impact of the pandemic on cervical cancer screening, but this was not the research methodology used. The survey approach was our research strategy because we attempted to collect responses from all provinces and territories, regardless of the different screening programs and modalities implemented across the country, and including regions that do not have an official screening program.

      Since the effects of the COVID-19 pandemic will stay with us for years to come, our research team is also examining – using a “hard data” approach via administrative healthcare datasets – the long-term effects that will accrue on cervical cancer morbidity and mortality from the interruptions and delays in screening processes and other activities in the process of care. A discussion of this is, however, beyond the scope and objectives of our manuscript.

      No modifications were made in the manuscript to address this comment.

      Comment 2: The authors claim they want to "capture modifications". However, the suggestions that come from this study are limited and are submitted for publication 2 years after the survey, when the height of the pandemic has long since passed and its burden on the screening program has largely disappeared. The value of the study would have been greater if either the conclusions had been communicated almost directly, or if the survey had been done later, to sum up the total effect of the pandemic on the Canadian cervical screening program.

      We appreciate this comment. As part of our commitment to transparency, we now plainly acknowledge that considerable time (1.5 years) has elapsed between the time the survey data were available (March 2021) and manuscript submission (September 2022) for publication in the special issue, curated by eLife, on the impact of the COVID-19 pandemic on cancer prevention, control, care and survivorship. However, we also argue that this lag time is reasonable given the undertaking of data management, analysis, and reporting of a large amount of data, including the synthesis of replies to open-ended questions. We also took this opportunity to expose two graduate students to the research process.

      Changes made: Page 15, Lines 437-440.

      In terms of assessing the total effect of the pandemic on the Canadian cervical screening program, this work is in progress, but not within the current manuscript. The PubMed references mentioned above show examples of directions we are taking. Also, as mentioned in our response to Comment 1, we will use data from administrative healthcare datasets (medical and drug claims, hospitalization data, death registry data) and hospital cancer registries (clinical characteristics such as cancer stage, grade, and biomarkers) on cancer patients diagnosed in Quebec between 2010 and 2026. Using these datasets, we intend to compare the pre- and post-pandemic eras in order to analyze changes in patterns of cancer care, cancer prognosis, and survival, including shifts in stage at diagnosis.

      Comment 3: Another major problem with this study is the coverage. The results of persistent activities to get a large uptake are somewhat depressing, although this is not expressed by the authors. 510 professionals filled out the survey partially or in total. 10 professions were targeted. The authors make no attempt to assess the coverage or the validity of the sample. They state the method used does not make that possible. But the number of family practitioners, colposcopists, cytotechnicians, etc. involved in the program should roughly be known and the proportion of those who answered the survey could have been calculated. My guess is that it is far below 10%.

      There were no extensive additional efforts to increase the participation rate, apart from follow-up reminder emails to complete the survey, which is standard practice followed by the societies that administered the survey to their constituents. We respectfully disagree with the reviewer concerning coverage being a major limitation, particularly in view of the general difficulty of securing a high response rate in a survey such as ours, at a time like the middle of the pandemic. Although the classic non-response rate appears easy to compute, information on the “population of interest” (i.e., the number of professionals approached, in addition to those reached by the advertisement of the survey on social media platforms) is not available to estimate the extent of non-response. Even if the response rate is below 10% as suggested by the Reviewer, our survey and findings should be considered on their merits; the target population was involved in the survey design to ensure the validity of coverage of the questions along the continuum of care in cervical cancer screening and treatment. In addition, we followed the Checklist for Reporting Results of Internet E-surveys to inform the design, conduct, and reporting of our survey research.

      Changes made: Page 14, Lines 421-425.

      Comment 4: The national distribution seems skewed despite the authors boasting of its pan-Canadian character. I am just faintly familiar with the Canadian regions, but, as an example, only 2 replies from Quebec must question the national validity of this survey.

      We apologize for this error in Table 1; many cells were accidentally shifted down (the last couple of provinces had the wrong numbers). There were actually 21 survey respondents from the province of Quebec. This has now been corrected.

      Changes made: Page 19.

      Comment 5: The result section is dominated by quantitative data from the responses to the 61 questions. All questions and their answers are tabulated. As there is no way to assess the selection bias of the answers these quantitative results have no real value from an epidemiological standpoint.

      Indeed, we opted to provide the reader with descriptive results on all the questions and sub-questions that were asked, with explicit annotation to each question number and clear reference to the formulated question by appending the full survey instrument to the manuscript. We designed the survey as a descriptive and not an analytical study, contrary to traditional epidemiology studies that investigate a specific exposure-outcome relationship.

      Changes made: Page 12, Lines 366-368.

      In the spirit of other papers in the special issue on COVID-19 and cancer, curated by eLife, we measured the impact of the pandemic on the process of care like many other eLife articles did. The eLife collection is a snapshot of a period when not only was cancer control disrupted, but the ability to conduct valid research was also severely curtailed. The reviewer will likely agree that our paper is not the only one to suffer from these methodological shortcomings. Yet, taken together, the gestalt value of the eLife collection will inform epidemiologic modellers for the next long while on how this period affected cancer control. We are happy to contribute with this paper a few more pieces of the puzzle, adding to that which eLife published for many other jurisdictions.

      Comment 6: The replies to the open-ended questions are summarized in a table and in the text. The main conclusion of the content analysis of the answers to the direct questions, and one of the main conclusions of the study, is that the majority favors HPV self-sampling in light of the pandemic. However, this not-surprising view is taken by only 80 responders while almost as many (n=60) had no knowledge about HPV self-sampling.

      Another aim of our survey was to identify the windows of opportunity that were created by the pandemic and pinpoint positive aspects that could enable the transformation of cervical cancer screening (i.e., HPV primary based screening and HPV self-sampling). We found that 33% of respondents were of the opinion that the pandemic context could facilitate the implementation of self-sampling and that 50.1% were in favor of the implementation of this new screening practice (described in Results Theme 1: Screening Practice and Stable 5).

      Changes made: Page 4, Lines 93-97.

      The reviewer is correct that in the open-ended sub-question of Question 23 “Are you in favor of the implementation of HPV self-sampling as an alternative screening method in your clinical practice?”, 60 respondents justified their answer to the nominal question by their lack of familiarity with HPV self-sampling, compared to 80 who shared positive comments. However, we would like to draw the reviewer’s attention to the responses to the nominal part of the question in Stable 5. Of those who answered “Maybe”, 47.1% said that they were not familiar enough to express a favorable or unfavorable opinion. We would also like to draw the reviewer’s attention to the results of our cross-tabulation of profession and the question of relevance (described in Results Theme 1: Screening Practice). The lack of familiarity with novel screening practices such as self-sampling can be explained by the fact that most (75.0%) of those who expressed these views were primary healthcare professionals, and not secondary and tertiary specialists.

      Changes made: Page 12, Lines 344-346

      Comment 7: The authors conclude that their study identified the need for recommendations and strategies and building resilience in the screening system. No one would dispute the need, but the additional weight this study adds, unfortunately, is low, from a scientific standpoint.

      Although no one would dispute the need, as the reviewer suggests, as epidemiologists we needed to collect this empirical evidence. We urge the reviewer to consider that this article is meant to contribute to a more complete picture of the collective process of discovery of the impact of the pandemic initiated by eLife’s special issue.

      No modifications were made in the manuscript to address this comment.

      Comment 8: The conclusion I draw from this study is that the authors have done a good job in identifying some possible areas within the Canadian screening programs where the SARS-CoV-2 pandemic had negative effects and received some support for that in a survey. Furthermore, they listed a few actions that could be taken to alleviate the vulnerability of the program in a future similar situation, and received limited support for that. No more, no less.

      We thank the Reviewer for the positive feedback provided in the first part of the comment. As for the rest, we believe we have addressed above the reviewer’s concerns.

      Reviewer #2 (Public Review):

      The study aimed to provide information on the extent to which the COVID-19 pandemic impacted cervical cancer (CC) screening and treatment in 3 Canadian provinces. The survey methodology is appropriate, and the results provide detailed descriptive statistics by province and type of practice. The results support the authors' conclusions. This evidence together with data gathered from other national surveys may provide baseline data on the impact of the pandemic on CC outcomes such as late-stage diagnoses and CC treatment outcomes due to these delays.

      We are flattered by the Reviewer’s overall assessment of our manuscript.

      Comment: This study relies mostly on descriptive statistics and open-ended questions that provide details about what CC screening and treatment procedures were delayed. It is unclear how the reader would use the results to affect current or future practice.

      As mentioned in our reply above to a similar comment raised by reviewer 1, our overarching aim was to portray in a purely descriptive manner the negative and positive impacts of the COVID-19 pandemic on cervical cancer screening-related activities, as perceived by healthcare professionals. Please refer to arguments above.

      Changes made: Page 12, Lines 366-368; Page 15, Lines 437-440.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the authors set out to determine the degree to which early language experience affects neural representations of concepts. To do so, they use fMRI to measure responses to 90 words in adults who are deaf. One group of deaf adults (n=16) were native signers (and thus had early language exposure); a second group (n=21) was exposed to sign language later on. The groups were relatively well-matched in other respects. The primary finding was that the high dimensional representations of concepts in the left lateral anterior temporal lobe (ATL) differed between native and delayed signers, suggesting a role for early language experience in concept representation.

      The analyses are carefully conducted and reflect a number of thoughtful choices. These include the "inverted MDS" method for constructing semantic RDMs, a normal hearing comparison group for both behavioral and fMRI data, and care taken to avoid bias in defining functional ROIs. And, comparing early and delayed signing groups is a clever way to study the role of early language experience on adult language representations.

      We greatly appreciate the reviewer’s positive evaluation and constructive comments on our study.

      One interesting result that I struggled to put in a broader context relates to the disconnect between behavioral and neural results. Specifically, the behavioral semantic RDMs (Figure 1a) did not differ between any of the groups of participants. This suggests that the 90 concepts are represented similarly in all of the participants. However, the similarity of the neural RDMs in left lateral ATL differs between the native and delayed signing groups (but not in other regions). Given the similarity of the behavioral semantic RDMs, it is unclear how to interpret the difference in left lateral ATL representations. In other words, the neural differences in left ATL do not affect behavior (semantic representation). The importance of the differences in neural RDMs is therefore questionable.

      Thank you for this comment. In the Revision we have added explicit discussions about this important issue of the relationship between the behavioral and neural profiles for semantics:

      Introduction (pages 4-5): “(previous) studies have reported little effects on semantics behaviors, including semantic interference effects in the picture-sign paradigm (Baus et al., 2008), scalar implicature (Davidson and Mayberry, 2015), or accuracy scores of several written word semantic tasks (e.g., synonym judgment) (Choubsaz and Gheitury, 2017). However, as shown by the color knowledge in the congenitally blind studies (e.g., Wang et al., 2020), similar semantic behaviors may arise from (partly) different neural representations. Semantic processing is supported by a multifaceted cognitive system and a complex neural network entailing distributed semantic regions (Bi, 2021; Binder and Desai, 2011; Lambon Ralph et al., 2017; Martin, 2016), and thus focal neural changes may not necessarily lead to semantic behavioral changes. Neurally, neurophysiological signatures assumed to reflect semantic processes showed incongruent effects across studies: N400 effects in the semantic violation of written sentences were not affected (Skotara et al., 2012), whereas M400 in the picture-sign matching task showed atypical activation patterns (reduced recruitment of left fronto-temporal regions and involvement of right parietal and occipital regions) (Ferjan Ramirez et al., 2016, 2014; Mayberry et al., 2018). It remains to be tested whether and where delayed L1 acquisition affects how semantics are neurally represented, using imaging techniques with higher spatial resolutions.”

      Discussion (pages 17-18): “Notably, different from phonological and syntactic processes, where both visible behavioral underdevelopment (e.g., Caselli et al., 2021; Cheng and Mayberry, 2021; Mayberry et al., 2002) and brain functional changes (Mayberry et al., 2011; Richardson et al., 2020; Twomey et al., 2020) were observed, for semantics we only observed brain functional changes in dATL but no visible behavioral effects. Consistent with the literature where deaf delayed signers did not show differences to controls in semantic interference effects in the picture-sign paradigm (Baus et al., 2008), scalar implicature (Davidson and Mayberry, 2015), or N400 measures (Skotara et al., 2012), we did not observe visible differences in terms of semantic distance structures (Figure 1a) or reaction time of lexical decision and word-triplet semantic judgment (Supplementary file 1). As reasoned in the Introduction, this seeming neuro-behavior discrepancy might be related to the multifaceted, distributed nature of the cognitive and neural basis of semantics more broadly. The general semantic behavioral tasks we employed could be achieved with representations derived from multiple types of experiences, supported by highly distributed neural systems (e.g., Bi, 2021; Binder and Desai, 2011; Lambon Ralph et al., 2017; Martin, 2016), including those not affected by the delayed L1 acquisition in regions beyond the dATL. This finding invites future studies to specify the exact developmental mechanisms in the left dATL (Fu et al., 2022; Unger and Fisher, 2021) and to uncover semantic behavioral consequences related to the functionality of this area.”

      An important point is that, if I understand correctly, the semantic space is defined by the 90 experimental items. That is, behavioral RDMs were created by having normal hearing participants arrange 90 items spatially, and neural RDMs were created by comparing patterns of responses to these 90 experimental items. This 90-dimensional space is thus both (a) lower dimensional than many semantic space models that include hundreds of directions and (b) constrained by the specific 90 experimental items chosen. On the one hand, this seems to limit the generalizability of the findings for semantic representations more broadly.

      Indeed, the RDM spaces were constructed from the relations among the 90 items, as is standard practice for current RSA analyses. Regarding the dimensionality issue, we would like to clarify that although the space is a 90 x 90 matrix, the semantic distance for each pair was obtained from the subjects’ ratings, i.e., the psychological space, which is likely to be high-dimensional in nature. That is, we compressed the potentially high-dimensional psychological construct into one measure to construct the 90 x 90 matrix. If we understood correctly, the semantic space models with hundreds of dimensions that the reviewer referred to are various types of embedding and/or distributional models. There, although each word is projected onto a high-dimensional vector, the distance for each pair is still extracted (e.g., by cosine similarity) to construct the cross-item similarity matrix for RSA. Regarding the generalization of the findings across items, we greatly appreciate this concern, and indeed that was one of the reasons why we extracted the categorical structure based on the clustering of the items (see also the response to the next comment). We also examined the univariate abstractness contrast, which looked at the broad categorical effects rather than specific items. We have made clarifications accordingly in the Revision to address these concerns (page 8).
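
The point about embedding models can be made concrete with a minimal sketch, which is illustrative only and not the pipeline used in the study. It assumes each item is a plain list of floats (a hypothetical embedding) and that dissimilarity is defined as one minus cosine similarity. The function names `cosine_distance`, `build_rdm`, and `upper_triangle` are placeholders introduced here.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def build_rdm(vectors):
    """Build an item-by-item dissimilarity matrix (RDM) from item vectors.

    However many dimensions each vector has, every pair of items is
    reduced to a single number, so n items always give an n x n matrix.
    """
    n = len(vectors)
    return [
        [0.0 if i == j else cosine_distance(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


def upper_triangle(rdm):
    """Flatten the unique off-diagonal pairs, the part usually compared in RSA."""
    n = len(rdm)
    return [rdm[i][j] for i in range(n) for j in range(i + 1, n)]


if __name__ == "__main__":
    # Three toy 2-dimensional "embeddings" (hypothetical values).
    toy_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    rdm = build_rdm(toy_vectors)
    print(upper_triangle(rdm))
```

With 90 items the flattened upper triangle has 90 × 89 / 2 = 4005 pairwise distances, regardless of whether each item started as a rating-based judgment or a vector with hundreds of dimensions.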

      The logic behind using a categorical semantic RDM (e.g., Figure 2a) was not clear. The behavioral semantic RDMs (Figure 1a) clearly show gradations in dissimilarity, particularly for the abstract categories. It would seem that using the behavioral semantic RDM would capture a more accurate representation of the semantic space than the categorical one.

      Thank you for this suggestion. We opted for the categorical structural similarity based on the clustering analyses to boost signal and to allow for better generalization across items (i.e., along the categorical structure). Agreeing with the reviewer that such an approach may lose the important graded space especially for the abstract items, we added an analysis using continuous semantic distances specifically focused on the abstract items (page 10):

      “1) Types of semantic distance measures: While semantic categories for concrete/object words are robust and well-documented, the semantic categorization within the abstract/nonobject words is much fuzzier and remains controversial (Catricalà et al., 2014; Wang et al., 2021). The behavioral semantic RDM in Figure 1a indeed shows gradations in dissimilarity for abstract/nonobject words. We thus checked the two groups’ semantic RDMs using the continuous behavioral measures and further examined whether group differences in the left dATL were affected by the types of semantic distance (categorical vs. continuous) being used for abstract/nonobject words. The two deaf groups showed comparable similarities to the hearing benchmark (by correlating each deaf subject’s RDM with the group-averaged RDM of hearing subjects, Welch’s t(23.0) = -0.12, two-tailed p = .90). RSA was performed by correlating each deaf subject’s neural RDM in the left dATL with these two types of semantic RDMs. Significant group differences were observed (Figure 3), for both the categorical RDM (Welch’s t(31.0) = 3.06, two-tailed p = .005, Hedges’ g = 0.98) and the continuous behavioral semantic RDM (Welch’s t(36.7) = 2.47, two-tailed p = .018, Hedges’ g = 0.76), with significant semantic encoding in dATL observed in both analyses for native signers (one-tailed ps < .003) and neither for delayed signers (one-tailed ps > .42). These results indicate that the reduced dATL encoding of abstract/nonobject word meanings induced by delayed L1 acquisition was reliable across semantic distance measures.”
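
As a rough illustration of the RSA step described in the quoted passage, the sketch below correlates a toy neural RDM with a categorical model RDM. Several choices are assumptions and not details confirmed by the manuscript: the categorical RDM is coded as 0 for same-category pairs and 1 otherwise, the correlation is a Spearman rank correlation with averaged ranks for ties, and all names (`categorical_rdm`, `rsa_score`, and so on) and data values are hypothetical.

```python
import math


def categorical_rdm(labels):
    """Model RDM: 0 if two items share a category label, 1 otherwise."""
    return [[0.0 if a == b else 1.0 for b in labels] for a in labels]


def upper_triangle(rdm):
    """Flatten the unique off-diagonal pairs of a square matrix."""
    n = len(rdm)
    return [rdm[i][j] for i in range(n) for j in range(i + 1, n)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rsa_score(neural_rdm, model_rdm):
    """Spearman correlation between the upper triangles of two RDMs."""
    return pearson(
        average_ranks(upper_triangle(neural_rdm)),
        average_ranks(upper_triangle(model_rdm)),
    )


if __name__ == "__main__":
    labels = ["animal", "animal", "emotion", "emotion"]  # toy categories
    model = categorical_rdm(labels)
    neural = [  # toy neural dissimilarities (hypothetical values)
        [0.0, 0.2, 0.8, 0.9],
        [0.2, 0.0, 0.7, 0.6],
        [0.8, 0.7, 0.0, 0.3],
        [0.9, 0.6, 0.3, 0.0],
    ]
    print(rsa_score(neural, model))
```

Swapping `model` for a continuous behavioral RDM leaves `rsa_score` unchanged, which is why the two distance measures can be compared within the same analysis. One such score per subject would then feed the group comparison.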

      As the reviewer suggested, we could also carry out RSA using the 90-word behavioral semantic RDM. We did observe similar group differences with this RDM, with delayed signers showing a trend toward reduced semantic encoding in the left dATL relative to native signers (native signers, mean (SD): 0.019 (0.023); delayed signers, mean (SD): 0.006 (0.022); Welch’s t(31.5) = 1.78, two-tailed p = .085; one delayed signer was excluded from this analysis as an outlier beyond 3 standard deviations). It appears that the behavioral semantic RDM yielded smaller effect sizes in group differences than the categorical RDM, but the ANOVA (within-subject factor RDM-type: categorical, behavioral; between-subject factor group: native, delayed) revealed no significant effect of RDM-type or of its interaction with group (ps > .71), but a significant main effect of group (F(1,36) = 9.19, p = .004). The seemingly weaker group differences using the behavioral semantic RDM should therefore not be over-interpreted.
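
For readers less familiar with the group statistics reported above, the following minimal sketch shows how Welch's t, its fractional degrees of freedom, and Hedges' g are commonly computed. It uses only the Python standard library and toy numbers. The small-sample correction shown, 1 - 3 / (4(n1 + n2) - 9), is the usual approximation and is an assumption here, since the exact variant used in the manuscript is not stated. No p-value is computed because that requires a t-distribution CDF.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    var_a = statistics.variance(a) / len(a)  # squared standard error, group a
    var_b = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


def hedges_g(a, b):
    """Cohen's d with pooled SD, scaled by the approximate bias correction."""
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    d = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)
    correction = 1.0 - 3.0 / (4.0 * (n_a + n_b) - 9.0)
    return d * correction


if __name__ == "__main__":
    # Toy per-subject RSA scores for two groups (hypothetical values).
    group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
    t, df = welch_t(group_a, group_b)
    print(round(t, 3), round(df, 2), round(hedges_g(group_a, group_b), 3))
```

Because the degrees of freedom depend on the two sample variances, they are generally not whole numbers, which is why values such as 31.5 or 36.7 appear in the reported tests.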

      Reviewer #2 (Public Review):

      The authors investigated patterns of fMRI activation for familiar words in two groups of deaf people. One "language rich" group received exposure to sign from birth, whereas the "language poor" group included kids born to hearing parents who had limited exposure to language during the first few years of life. The primary findings involved group differences in BOLD activation patterns across different areas of interest within the semantic network when participants made intermittent 1-back category judgments for words appearing in succession.

      There was much to be liked about this study, including the rigor of the methods and the novel contrasts of two deaf samples. These strengths were balanced by a number of questions about the assumptions and theoretical interpretations underlying the data. I will elaborate on the major points in the paragraphs to follow, but briefly, the ways in which the authors are framing critical period constraints in language fundamentally differ from the standard nativist perspectives (e.g., Chomsky, Lenneberg). The assumptions of what constitutes a deprivation model require further justification and perhaps recasting to avoid unnecessary stigma (i.e., this reviewer was uncomfortable with the assertion that being born deaf to hearing parents by default constitutes deprivation). The introduction lacked principled hypotheses that motivated the choice of comparing abstract and concrete words, and potential accounts of group differences were underdeveloped (e.g., how do parents in China typically react to having a deaf child, and what supports are in place for preventing language deprivation? Are newborn infants universally screened for hearing loss in China? The answers to these questions might help the readers to understand why/how deaf children in this circumstance might experience deprivation).

      We appreciate the reviewer’s positive evaluations and constructive comments on our study. We have revised the manuscript substantially in light of these comments (see below).

      References to critical periods require a bit more elaboration with respect to lexical-semantic vs. semantic acquisition. The nature of the critical period in language acquisition remains controversial with respect to its constraints. Lenneberg and Chomsky speculated that the limit of the critical period for language acquisition was about puberty (13ish years of age). This is much older than the deaf sample tested here so arguments about aging out of the critical period at least for language acquisition need more nuance. Another issue relates to learning semantic mappings vs. learning language as falling under the same critical period umbrella. This seems highly unlikely as semantic acquisition in early childhood is aided by linguistic labeling but would likely occur in parallel even in the context of language deprivation. Much of the prior literature on critical periods and nativist approaches to language development has focused on syntactic acquisition and elements such as recursion rather than a mapping of symbols to conceptual referents. This makes the critical period group comparison somewhat tenuous because what you are really interested in is a critical period for word meaning acquisition not the more general case of syntactic competency.

      The point above is highlighted in the following statement underlying one of the primary assumptions of the study:

      Pg. 3, "Here, we take advantage of a special early-life language-deprivation human model: individuals who were born profoundly deaf in hearing families and thus had very limited natural language exposure (speech or sign) during the critical period of language acquisition in early childhood"

      "hypofunction of the language system as a result of missing the critical period of language acquisition" (pg 3), same critique as previous - the critical period window is thought to be 13ish years old.

      There are a couple of problems with this assertion/assumption. Although it is true that most children who are born deaf have hearing parents, it is not justifiable to label this condition an early-life deprivation model. Hearing parents who are extremely motivated to learn sign language and pursue related language enrichment strategies can successfully offset many of these effects. Similarly, it is not inconceivable that a deaf child born to a deaf parent might be neglected or abandoned without the benefit of early sign exposure. My argument here is that classifying deaf children born to hearing parents as automatically 'language deprived' is potentially both stigmatizing and scientifically unjustified.

      We originally used the term “language deprivation” because it has been recently advocated in the deaf field mainly to increase society’s awareness of the risks of language deprivation and the lifelong impact that deaf and hard-of-hearing children face (e.g., Hall, 2017, Maternal and Child Health Journal; Lillo-Martin & Henner, 2020, Annual Review of Linguistics). In the current context, we agree with the reviewer that “early-life deprivation” model may not precisely describe the language acquisition condition of delayed signers. Indeed, for some of the delayed subjects in our study, their hearing parents actively tried to provide additional aids of exposure to signs (via preschool special education programs; learning signs by themselves) or speech (via hearing aids). In the revision, we avoided the term “language deprivation” and used the terms “subjects with varying amounts and qualities of early language exposure” or “delayed L1 acquisition” to more precisely describe our experimental manipulation throughout the revised manuscript.

      We fully agree with the reviewer that the “critical period” of language acquisition is too much of an umbrella term, which may be taken to refer to critical periods for different, specific aspects of cognitive and/or neural development in the literature. In the revision, we avoided using this term to reduce ambiguity. Instead, we now make explicit throughout the specific processes being discussed (phonology, syntax, semantics). The effects of early language experience (reduced in delayed L1 acquisition) on the behavioral and neural patterns relating to phonology, syntax, and semantics are now elaborated and discussed separately and explicitly in both the Introduction and Discussion (pages 3-4, 17-18).

      Regarding the potential nonlinguistic socio-environmental differences (e.g., coping strategies after deafness awareness), we have added further clarifications (page 15): “Notably, routine nation-wide neonate hearing screening in China did not start until 2009, years after the early childhood of our participants (born before 2000), and some hearing parents may nonetheless try to give deaf children additional aids of exposure to signs (via preschool special education programs) or speech (via hearing aids). Critically, our positive results of the robust group differences in dATL suggest that early homesign/aid measures and later formal education for sign and written language experiences are insufficient for typical dATL neurodevelopment; the full-fledged language experience during early infancy and childhood (before school age) plays a necessary role in this process.” Relevant information has also been added in the Method/Result sections.

      Pg. 6 "It should be noted that the neural semantic abstractness effect does not equate with language-derived semantic knowledge, as it might arise from some nonverbal cognitive processes that are more engaged in abstract word processing (Binder et al., 2016)." - I had great difficulty understanding what this meant.

      We have revised this sentence as follows: “While the abstractness effect has often been used to reflect linguistic processes (e.g., (Wang et al., 2010)), “abstractness” is not a single dimension and instead relates to both linguistic and nonlinguistic (e.g., emotion) cognitive processes (Binder et al., 2016; Troche et al., 2014; Wang et al., 2018).” (page 11)

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors present a method for discovering response properties of neurons, which often have complex relationships with other experimentally measured variables, like stimuli and animal behaviors. To find these relationships, the authors fit neural data with artificial neural networks, which are chosen to have an architecture that is tractable and interpretable. To interpret the results, they examine the first- and second-order approximations of the fitted artificial neural network models. They apply their method profitably to two datasets.

      The strength of this paper is in the problem it is attempting to solve: it is important for the field to develop more useful ways to analyze and understand the massive neural datasets collected with modern imaging techniques.

      The weaknesses of this paper lie in its claims (1) to be model-free and (2) to distinguish the method from prior methods for systems identification, including spike-triggered averaging and covariance (or rather their continuous response equivalents). On the first claim, the systems identification methods are arguably a substantially more model-free approach. On the second claim, this reviewer would require more evidence that the presented approach is substantially different from or an improvement on systems identification methods in common use applied directly to the data.

      We thank the reviewer for carefully engaging with the manuscript and believe that our revisions address these points of critique both through novel analysis and through clarifications.

      First claim: We fully agree that systems identification approaches are in theory truly model-free while MINE imposes constraints through the chosen architecture. However, our new analysis comparing MINE to direct fitting of the kernels of a Volterra expansion highlights that this is not really the case in practice. In order to obtain good fits, the model-free-ness has to be substantially reduced by imposing constraints on the degrees of freedom. We quantify this reduction in Figure S3 and directly compare it to the effective degrees of freedom of the CNN. Reducing degrees of freedom is also a theme that can be found throughout the literature on systems identification, especially when the analysis does not involve Gaussian white noise as input stimuli. We therefore stand by our claim that MINE is “essentially model-free” in the sense that it does not rely on defining a model a priori, much like systems identification. We also clarify our choice of calling the method “model-free” in the introduction, where we state: “While the architecture and hyper-parameters of the CNN used by MINE do impose constraints on which relationships can be modeled, we consider the convolutional network ‘model-free’ because it does not make any explicit assumptions about the underlying probability distributions or functional forms of the data.”

      Second claim: We believe that our new analysis for the comparison with the Volterra expansion approach of systems identification addresses this point. By directly fitting Volterra kernels instead of relying on spike-triggered analysis we put the comparison on a more equal footing than our previous STA/STC exposition. We can show that while the methods are equivalent for Gaussian white noise stimuli, MINE is superior for highly correlated input stimuli. We show that imposing constraints on the regression used to identify the Volterra kernels can overcome this gap to a large extent, but MINE still produces a model that has higher predictive power and MINE also does more than extracting receptive fields. We are also not entirely sure to what extent Wiener/Volterra analysis has been applied to calcium imaging data. While there is a vast body of literature on systems identification, there is little evidence that it has been widely applied to data in which both inputs and outputs are highly correlated across time, such as calcium imaging experiments using naturalistic stimuli. While this doesn’t have to mean anything in and of itself it might point to the fact that this analysis is not easily accessible and requires ample tuning. These are precisely two problems that MINE aims to overcome. We now more explicitly state in the manuscript that we believe this accessibility to be one of the core strengths of MINE.
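
      To make the comparison concrete, the first-order (linear) Volterra kernel can be estimated by regressing the response on time-lagged copies of the stimulus. The sketch below is illustrative only and is not the analysis used in the manuscript: the function names are hypothetical, and the ridge penalty is one assumed example of "imposing constraints on the regression"; the response does not specify which constraint was used.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Build a design matrix whose column k holds the stimulus k steps in the past."""
    stimulus = np.asarray(stimulus, dtype=float)
    n = len(stimulus)
    design = np.zeros((n, n_lags))
    for k in range(n_lags):
        design[k:, k] = stimulus[:n - k]
    return design

def fit_linear_kernel(stimulus, response, n_lags, ridge=0.0):
    """Least-squares estimate of the first-order kernel.

    ridge > 0 adds an L2 penalty (an assumed form of constraint) that
    stabilizes the estimate when stimulus lags are highly correlated.
    """
    design = lagged_design(stimulus, n_lags)
    target = np.asarray(response, dtype=float)
    gram = design.T @ design + ridge * np.eye(n_lags)
    return np.linalg.solve(gram, design.T @ target)
```

      With white-noise input the lagged columns are nearly orthogonal and the unpenalized fit recovers the kernel directly. With temporally correlated input the Gram matrix becomes ill-conditioned, which is the regime where the response argues that constraints, or an approach such as MINE, become necessary.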

      Reviewer #2 (Public Review):

      This paper describes a relatively unbiased and sensitive method for identifying the contributions of different behavioral parameters to neural activity. Their approach addresses, in an elegant way, several difficulties that arise in modeling of neuronal responses in population imaging data, namely variations in temporal filtering and latency, the effects of calcium indicator kinetics, interactions between different variables, and non-linear computations. Typical approaches to solving these problems require the introduction of prior knowledge or assumptions that bias the output, or involve a trade-off between model complexity and interpretability. The authors fit individual neuron's responses using neural network models that allow for complex non-linear relationships between behavioral variables and outputs, but combine this with analysis, based on Taylor series approximations of the network function, that gives insight into how different variables are contributing to the model.

      The authors have thoroughly validated their method using simulated data as well as showing its applicability to example state of the art data sets from mouse and zebrafish. They provide evidence that it can outperform current approaches based on linear regression for the identification of neurons carrying behaviorally relevant signals. They also demonstrate use cases showing how their approach can be used to classify neurons based on computational features. They have provided Python code for the implementation and have explained the methods well, so it will be easy for other groups to replicate their work. The method could be applied productively to many types of experiments in behavioral and systems neuroscience across different model systems. Overall, the paper is clearly written and the experiments are well designed and analysed, and represent a useful contribution to the neuroscience field.

      We thank the reviewer for their favorable assessment of our work.

      Reviewer #3 (Public Review):

      In the current study, the authors present a novel and original approach (termed MINE) to analyze neuronal recordings in terms of task features. The method proposed combines the interpretability of regressor-based methods with the flexibility of convolutional neural networks and the aim is to provide an unbiased, "model-free" approach to this very important problem.

      In my opinion, the authors succeed in most of these aspects. They use three datasets: an artificially-generated one that provides a ground-truth, a published dataset from wide-scale cortical mouse recordings and a novel one that studies thermosensation in larval zebrafish. MINE compares favorably in all three cases.

      I believe that the paper would mostly benefit from an increased effort in clear exposition of the Taylor expansion approach, which is at the core of the method. The methods section describes the mathematics, but I wonder whether it would be possible to illustrate or schematize this in a main Figure, e.g. as an addition to Figure 1 or as a new figure. Around line 185, the manuscript reads: "We therefore perform local Taylor expansions of the network at different experimental timepoints. In other words, we differentiate the network's learned transfer function that transforms predictors into neural activity."

      It would help to explicitly state with respect to what the derivative is being computed (i.e. time) and maybe a diagram (which I had to draw to understand the paper) in which a neuronal activity trace is shown and from time t onwards a prediction is computed using terms in the Taylor expansion would be very instructive (showing on an actual trace how disregarding certain terms changes the prediction and hence the conclusions about the actual dependence of the trace on the behavioral features). The formulation in terms of Jacobians and Hessians can then be restricted to the Methods section and the paper will be easier to read for a wider audience.

      We agree with the reviewer that readability is key. We hope that our re-write and re-organization of the manuscript makes it easier to follow. We now start with a unified description of complexity and non-linearity both derived from a Taylor decomposition around the data-average. We use this section (starting Line 91) to lay out the logic of the Taylor expansion and explicitly state that the derivatives describe the expected change in output given any change in predictors. We did not want to remove the math entirely from the paper, simply because we found it hard to explain the concept entirely without it. We have provided an annotation to the formula parts in the new Figure 2 and a small schematic to illustrate the pointwise expansion of the Taylor metric in the new Figure 4.

      The method is presented as a "model-free" approach (title and introduction). I think it would help to discuss this with some precision. The Taylor expansion approach does imply certain beliefs on the structure of the data (which are well founded in most cases). Do the authors agree that MINE would encapsulate any regression model where both linear and interaction terms are allowed to include an arbitrary non-linearity (in the case of the interaction terms, different non-linearities for both variables)? If this is the case, maybe an explicit statement would allow the reader to quickly identify the versatility of MINE.

      We are now attempting to make the statement of model-free more precise through quantifications in our rewritten section on deriving receptive fields. We now provide an explanation in the introduction for why we believe that “model-free” is justified. We state: “While the architecture and hyper-parameters of the CNN used by MINE do impose constraints on which relationships can be modeled, we consider the convolutional network ‘model-free’ because it does not make any explicit assumptions about the underlying probability distributions or functional forms of the data.”

      In principle, MINE can accommodate higher-order interactions as well (say of the form xyz or x*y^2) and it certainly has flexibility in applying nonlinear transformations. However, we did not find a satisfying way to quantify the space of possible models MINE can represent exactly and therefore do not feel comfortable making a precise statement about this.

      I find the section relating to non-linearities interesting, but was slightly disappointed to find that the authors do not propose a single method. In Figure 3E, the authors show that a logistic regression model that combines the curvature and NLC approaches outperforms either, but the model is not described in any sort of detail. I appreciate the attempt made by the authors to apply this to the zebrafish imaging dataset in Figure 7, but it was still unclear to me how non-linearities and complexity are related.

      We fully agree with the reviewer. We have now merged non-linearity and complexity determination. We hope that this a) simplifies the paper and b) creates a metric that likely generalizes better and in which specific values are more interpretable. In brief, we now define both the nonlinearity and complexity based on truncations of the Taylor expansion around the data average. This new result section (Lines 90-142) also gives us a chance to (hopefully) better introduce the Taylor expansion approach.
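
      The idea of truncating a Taylor expansion around the data average can be illustrated with a toy function standing in for the fitted network. This is a sketch under stated assumptions, not MINE itself: `toy_response` is a hypothetical stand-in for the learned transfer function, and central finite differences replace the derivatives that would normally be taken of the network directly.

```python
import numpy as np

def toy_response(x):
    """Hypothetical stand-in for a fitted network: one linear term plus one interaction."""
    return 2.0 * x[0] + x[0] * x[1]

def gradient(f, x0, eps=1e-4):
    """Central-difference estimate of the first derivatives of f at x0."""
    x0 = np.asarray(x0, dtype=float)
    grad = np.zeros(len(x0))
    for i in range(len(x0)):
        step = np.zeros(len(x0))
        step[i] = eps
        grad[i] = (f(x0 + step) - f(x0 - step)) / (2 * eps)
    return grad

def hessian(f, x0, eps=1e-3):
    """Central-difference estimate of the second derivatives of f at x0."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            hess[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                          - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * eps ** 2)
    return hess

def taylor_predict(f, x0, x, order):
    """Predict f(x) from an expansion around x0 truncated at the given order (0, 1 or 2)."""
    x0 = np.asarray(x0, dtype=float)
    delta = np.asarray(x, dtype=float) - x0
    prediction = float(f(x0))
    if order >= 1:
        prediction += float(gradient(f, x0) @ delta)
    if order >= 2:
        prediction += float(0.5 * delta @ hessian(f, x0) @ delta)
    return prediction
```

      For this toy function the first-order truncation misses the interaction term while the second-order truncation reproduces the function exactly. Comparing how much predictive accuracy each truncation retains is the kind of quantity on which a nonlinearity or complexity metric can be built.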

    1. Author Response

      Reviewer #1 (Public Review):

      Li et al investigated the behavioral response and fMRI activations associated with deep brain stimulation (DBS) of the lateral habenula (LHb) in 2 distinct rodent models of depression. They found that a) LHb DBS reduces depressive and anxiety behaviors using multiple behavioral tests: sucrose preference, forced swim, and open field. These results held across multiple models of depression and multiple tests, and generally restored results of these behavioral tests to parity with controls. Furthermore, fMRI activations of brain regions with known connectivity to LHb strongly correlated with behavioral responses to LHb DBS, particularly in limbic regions. These behavioral responses clearly depended on electrode location, with more medial placements within the LHb producing a more robust behavioral effect.

      The conclusions of this paper are generally well supported by the data, with the primary weaknesses of the study being 1) limited novelty due to LHb already being a well-established target for DBS in depression, and 2) the questionable validity of rodent models of depression in general. The authors deal with the first point (novelty) by extending their study to electrode localization and fMRI correlates with the behavioral response, leading to insight into surgical targeting as well as mechanism of effect, respectively. They also partially mitigate fundamental problems with rodent models of depression by using 2 different models and showing consistent responses to LHb DBS across both. The methods used in this study were sound, with high-quality techniques used for electrode implantation, confirmation of electrode placement, fMRI acquisition, anesthesia and physiological monitoring, as well as an appropriate statistical analytic approach.

      We sincerely thank the reviewer for the positive assessment of our work.

      Reviewer #2 (Public Review):

      This important paper is a real tour de force and combines functional MRI, behaviour, and brain stimulation to characterise the effect of stimulation of the lateral habenula in a rodent model for depression. The results are stunning and the data presented seems compelling.

      My only comment is I would like more discussion on the relevance of these results for the treatment of depression in humans, both in terms of the rodent model and in terms of the results shown in this study.

      We sincerely thank the reviewer for the positive assessment of our work. We have added a discussion of the relevance of our findings for the treatment of depression in humans on Page 17 of the revised manuscript, as follows:

      “The WKY and LPS-treated depressive rat models share similar characteristics, including abnormalities in various neurotransmitter and endocrine systems and emotional changes resulting from inflammatory stimuli. These models are widely used in pharmacological and nonpharmacological depression treatment studies (Caldarone et al., 2015; Aleksandrova et al., 2019; Lasselin et al., 2020). Previous research indicates that classic antidepressants used in humans, such as selective serotonin reuptake inhibitors, also cause an antidepressant reaction in WKY rats. Ketamine, a rapid-acting antidepressant in clinical practice, has been shown to be effective in both WKY and LPS-treated rats (Aleksandrova et al., 2019; J. Zhao et al., 2020). In WKY rats, DBS of the NAc increased exploratory activity and exerted anxiolytic effects, and NAc-DBS was found to be effective for TRD treatment in humans (Dandekar et al., 2018; Aleksandrova et al., 2019). These results suggest that the depression rat models can provide valuable information about the efficacy of various pharmacological and nonpharmacological therapies. In a recent case report, researchers observed acute stimulation effects in addition to long-term clinical improvements in depression, anxiety, and sleep in a patient with TRD upon administering LHb-DBS (Wang et al., 2020). This finding supports the clinical relevance of our observations. However, no animal model of depression can completely replicate human symptoms, and further research is necessary to validate our findings in human patients. Additionally, the long-term efficacy and side effects of LHb-DBS require further investigation. Nevertheless, we believe that our findings propose a promising addition to the rapid-acting therapeutic options for the most refractory depression patients.”

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript is clear in that it shows no/minimal weight gain in a mouse model of trisomy 21 compared to the control mouse, even under a high-calorie diet. The difference is the clear demonstration of the increased expression of sarcolipin. It is important that the expression of SERCA was also shown not different between the genotypes. Additionally, an important result is that manipulating the skeletal muscle was sufficient to promote weight loss without the need for hypermetabolism in other tissues such as adipose tissue.

      • A clear explanation of why the expression of sarcolipin/hypermetabolism is different between mouse and human under the same condition would be useful.

      Overexpression of sarcolipin is only seen in this particular mouse model carrying the near-complete human chromosome 21. In another widely used mouse model of Down syndrome (Ts65Dn), where all the triplicated genes (~40% of the human Chr21 orthologs) are of mouse origin, we did not observe the same overexpression of sarcolipin (PMID: 36587842). The reason for this is presently unknown. Human Chr21 contains a significant number of non-coding human genes (>400) with uncertain effects on the mouse transcriptome. The data in Figure 8 represent our efforts to understand what drives the overexpression of the mouse sarcolipin (Sln) gene in the TcMAC21 mouse model. Although we narrowed it down and highlighted some potential candidate transcriptional drivers for Sln overexpression (Fig. 8), future work is clearly needed to confirm whether any of those candidates is the bona fide driver, or one of several.

      • p.12-13 and15. The language around 'futile' cycling is not correct because Ca movement through the sarcoplasmic reticulum of the resting fiber is essential to the function of the muscle. Firstly, the cycle of Ca through the SR is through the ryanodine receptor (RyR) as well as due to slippage through the SERCA (PMID: 11306667, PMID: 35311921). This is not made clear anywhere in the manuscript. Ca leak out of the SR through RyR is an essential component to the control/setting of the resting cytoplasmic [Ca2+] via the activation of store-operated Ca2+ entry, which is in a balance with the activation of the PMCA on the t-system membrane (PMID: 35218018). The SERCA resequesters the leaked Ca2+ from the SR. It is not possible that the resting [Ca2+] is set by the reduced efficiency of the SERCA, as indicated in the ms (PMID: 20709761). It is expected that the mito [Ca2+] steady state is set by the raised resting cyto [Ca2+] (PMID: 20709761). Ca2+ transients during EC coupling will promote transient increases in mito Ca2+ (PMID: 21795684, PMID: 36121378), but not steady-state increases. Some of these problems are highlighted by the errors in the diagram Fig 5D: please change/correct (i) the invagination of the sarcolemma is called the t-system; (ii) the cycle of Ca leak through the SR starts with RyR Ca leak, where the Ca is resequestered by the SERCA, in addition to Ca slippage through the pump. Draw a RyR opposite the t-system on the SR terminal cisternae. The heat generated by SERCA is absorbed in the cytoplasm, metabolites enter the mito and the OxPhos generates heat (PMID: 31346851). (iii) Ca does not enter mito because it cannot get into the SR (the resting cyto Ca is controlled by the t-system/plasma membrane, PMID: 20709761, PMID: 35218018). Please redraw.

      We have redrawn the Fig. 6D diagram as suggested by the reviewer. We have also clarified the information presented in the revised Fig. 6D in the text and figure legend. Heat is generated by mitochondrial oxidative activity. In addition, ATP hydrolysis by the Ca2+ ATPase (SERCA pump) also generates heat (PMID: 12512777; PMID: 34826239; PMID: 11342561; PMID: 17018526; PMID: 12887329). In resting muscle, for every ATP hydrolyzed by the SERCA pump, 2 Ca2+ ions are transported into the sarcoplasmic reticulum (SR) (PMID: 15189143). In the presence of sarcolipin (SLN), more ATP must be hydrolyzed to move the same number of Ca2+ ions into the SR, due to Ca2+ slippage (PMID: 34826239; PMID: 23341466). In essence, ATP hydrolysis and Ca2+ transport into the SR by SERCA become uncoupled in the presence of SLN. This uncoupling of the SERCA pump, in the context of Ca2+ cycling in and out of the SR (also involving RyR1), represents the ATP-consuming futile cycle in the skeletal muscle (PMID: 34741717). Since SLN is persistently overexpressed, the ATP-consuming futile activity of the SERCA pump is presumably happening in resting muscle, as well as during EC coupling (since the TcMAC21 mice are also hyperactive).
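
      The stoichiometric argument above can be written as a short calculation. This is only an arithmetic illustration: the coupled ratio of 2 Ca2+ per ATP comes from the response, whereas the uncoupled ratio used below is a hypothetical value chosen for the example, not a measurement.

```python
COUPLED_CA_PER_ATP = 2.0    # stated in the response: 2 Ca2+ per ATP hydrolyzed
UNCOUPLED_CA_PER_ATP = 1.0  # hypothetical effective ratio with SLN-induced slippage

def atp_hydrolyzed(ca_transported, ca_per_atp):
    """ATP consumed to move a given amount of Ca2+ into the SR at a given coupling ratio."""
    if ca_per_atp <= 0:
        raise ValueError("ca_per_atp must be positive")
    return ca_transported / ca_per_atp

def extra_atp_fraction(coupled_ratio, uncoupled_ratio):
    """Fractional increase in ATP use for the same Ca2+ load when coupling is reduced."""
    return coupled_ratio / uncoupled_ratio - 1.0
```

      Under these assumed numbers, moving the same Ca2+ load costs twice as much ATP when the pump is uncoupled. The actual size of the effect depends on the true degree of slippage, which the response does not quantify.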

      • The changing of the properties of the muscle towards oxidative properties is consistent with the expression of sarcolipin in mouse muscle (all of it is in type II fibers). It is important to show whether the muscles have fiber-type shifts. Please report the fiber types of the muscles that have been surveyed in this project.

      In the qPCR data shown in Figure 6C, we profiled many genes associated with slow- and fast-twitch muscle fibers in gastrocnemius, and little if any change was noted. At least at the level of the transcript, there is no indication of fiber type switching in gastrocnemius muscle. However, we did not perform the same qPCR analyses for all the other muscle types isolated (i.e., EDL, quadriceps, plantaris, soleus, and tongue). The main reason for this is that we had used all of these muscle tissues in our respirometry analysis, as shown in Figure 6O-Q and Figure 6-Figure Supplement 4-9. Unfortunately, we did not have any leftover muscle tissue to profile muscle fiber types.

      • Non-shivering thermogenesis (NST) is mentioned in this manuscript as the means of hypermetabolism, as has the lengthened duration of the cyto Ca transients during EC coupling. It is not clear at all what the contribution of NST compared to the increased work of the SERCA to clear released Ca from the cyto to the hypermetabolism. What are the relative proportions? If sarcolipin is largely for NST, then hypermetabolism is about the resting muscle.

      In our view, the hypermetabolism we observed in the TcMAC21 mice is primarily due to SLN-mediated uncoupling of the SERCA pump. The chronic effects of SLN overexpression elevate ATP consumption by the SERCA pump and drive the catabolic process (i.e., increased mitochondrial OXPHOS) to generate the ATP needed to meet the demand created by the persistent uncoupling of the SERCA pump. However, the TcMAC21 mice are also hyperactive, and this can also contribute to increased metabolic rate. Since the mice are both hyperactive and hypermetabolic, we do not know the relative contribution of each to the overall phenotype of the mice.

      • The link that SLN is causing more ATP use at the pump but the heat generated by OxPhos in mito is important and should be made, see Barclays' work (eg. PMID: 31346851). A direct link between the SERCA function and mito function is occurring but I currently don't see one being made in the ms. This could be made clear in Fig 5D diagram.

      We have modified and clarified Figure 6D as suggested.

      p.22. "The reprogramming of glycolytic...elevated Ca transients...". The language is wrong here. Oxidative fibers do not have elevated Ca transients compared to glycolytic. The amplitude of Ca release is greater in glycolytic and the duration of the transient is longer in the oxidative (eg. PMID: 12813151).

      We have corrected this in the text and added the citation.

      • p.22. "as less calcium is being transported into the SR due to uncoupling of the SERCA pumps". The same amount of Ca is being transported, just at the expense of more ATP than would be the case in the absence of SLN. Otherwise, the SR Ca2+ content would not be at a steady state while the SR continuously leaks Ca2+.

      We have corrected this in the revised text. The incorrect statement has been deleted.

      • p.23. Tavi & Westerblad (PMID: 21911615) show how Ca transient amplitude and frequency signal in slow and fast twitch fibres. Here, we are not concerned with what is happening in myotubes, where the SR is less developed than in adult fibres.

      We did not use any myotubes in the present study. The myotube was mentioned in the context of discussing a published work (PMID: 30208317).

      Reviewer #3 (Public Review):

      Sarver et al. propose that TcMAC21 mice are hypermetabolic and that this is the cause of their reduced weight. Unfortunately, the developmental defects of TcMAC21 mice make this a challenging question to definitively answer. The authors claim that TcMAC21 mice are hypermetabolic due to futile calcium cycling in skeletal muscle, which is caused by up-regulation of SLN. However, all of the data that would go into the energy balance equation (food intake, energy absorption, and energy expenditure) have been improperly analyzed. TcMAC21 pups are 8.5 g lighter than euploid littermates. The body weight data and images in Fig. 3A indicate that TcMAC21 mice are runted. This difference is primarily a result of lower lean mass (FIG. 2B). This is important as it sets up many concerns that need to be addressed. Specific comments are noted below.

      There is no overt developmental defect in the TcMAC21 mice, as their birth weights are not different from those of the euploid controls (PMID: 32597754). A “runted” mouse is considered very small, poorly developed, and less competitive (PMID: 22822473). The lean phenotype of TcMAC21 mice is due to their hypermetabolism and not the result of developmental defects. The absolute lean mass of TcMAC21 mice is lower than that of the euploid controls. This is to be expected. A human being who weighs 150 pounds will have less lean mass than another person weighing 250 pounds. Lean mass scales with body weight. This does not mean that there is a muscle deficit in the person weighing 150 pounds. That is the reason why lean mass is also generally presented as % lean mass (after normalizing to body weight). This normalization can tell us whether the amount of lean mass is appropriate (or normal) for a given weight. The % lean mass is either not different between TcMAC21 and euploid mice fed a control chow (Fig. 2B) or significantly higher in TcMAC21 mice fed a high-fat diet (Fig. 3B). This tells us that there is no developmental deficit in the skeletal muscle (the biggest contributor to lean mass) of TcMAC21 mice. The amount of lean mass seen in TcMAC21 mice scales appropriately with their lower body weight. Our food intake and energy absorption data were correctly collected and analyzed (addressed below). In fact, TcMAC21 mice have the same or slightly higher food intake (absolute amount without normalization) despite weighing much less than the euploid controls (Fig. 2C and Fig. 3A, and Supplementary File 2 and Supplementary File 5). A sick or runted mouse generally consumes much less food and is physically much less active. The TcMAC21 mice are actually hyperactive (Fig. 2D-F and Fig. 4D-F). All our data argue against the notion of “runting” or “developmental defects” in TcMAC21 mice, and instead support our conclusion that TcMAC21 mice are lean due to elevated activity and hypermetabolism.

      Specific comments:

      1) It is incorrect to normalize EE to lean mass if this parameter is different between groups. Normalizing the EE data to lean mass makes it appear as though TcMAC21 mice exhibited increased EE when in fact this is a mathematical artefact. EE data should simply be plotted as ml/h (or kcal/h) per mouse. Alternatively, ANCOVA can be applied using lean mass as a covariate. Excellent reviews on this topic have been written (PMID: 20103710; PMID: 22205519).

      Energy expenditure (EE) data should not be plotted as kcal/h per mouse, as indicated in the review article that the reviewer alluded to (PMID: 22205519). It is a given that EE increases as a function of body weight, as larger body mass requires greater energy to maintain. Plotting EE data per mouse (i.e., kcal/h) would lead to the erroneous conclusion that a fat mouse has a higher EE than a lean mouse. Because lean mass is metabolically much more active than fat mass, normalizing EE data to lean mass is an acceptable, although not ideal, way to plot EE data, as indicated by the review article the reviewer alluded to (PMID: 20103710). Oftentimes, normalizing EE to lean mass gives results similar to the ANCOVA, as pointed out by the authors (PMID: 22205519). However, both review articles recommend ANCOVA (using body mass as a covariate of EE) as the preferred method to plot and evaluate EE data. Alongside the EE data (normalized to lean mass), we have now also included the ANCOVA data (Fig. 2D-F and Fig. 4D-F), where we used body weight as a covariate as recommended (PMID: 22205519). The results clearly indicate that the TcMAC21 mice have significantly higher EE compared to the euploid controls.
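
      To make the covariate adjustment concrete, the following is a minimal sketch of a two-group ANCOVA with body weight as the covariate. It assumes a single slope shared by both groups; the function name and all numbers are hypothetical illustrations, not the analysis code or data from the study.

```python
from statistics import fmean


def ancova_adjusted_difference(x_ctrl, y_ctrl, x_test, y_test):
    """Return (common_slope, adjusted_difference) for a two-group ANCOVA.

    Model: y = intercept + difference * is_test + slope * x, with one slope
    shared by both groups (the equal-slopes assumption).
    """
    mx_c, my_c = fmean(x_ctrl), fmean(y_ctrl)
    mx_t, my_t = fmean(x_test), fmean(y_test)
    # Pooled within-group cross-products and sums of squares.
    sxy = sum((x - mx_c) * (y - my_c) for x, y in zip(x_ctrl, y_ctrl))
    sxy += sum((x - mx_t) * (y - my_t) for x, y in zip(x_test, y_test))
    sxx = sum((x - mx_c) ** 2 for x in x_ctrl)
    sxx += sum((x - mx_t) ** 2 for x in x_test)
    slope = sxy / sxx
    # Group difference after removing the part explained by the covariate.
    difference = (my_t - my_c) - slope * (mx_t - mx_c)
    return slope, difference


# Made-up, noise-free numbers: body weight (g) and EE (kcal/h) per mouse.
bw_ctrl = [40.0, 42.0, 45.0, 48.0]
bw_test = [28.0, 30.0, 31.0, 33.0]
ee_ctrl = [0.10 + 0.010 * w for w in bw_ctrl]
ee_test = [0.10 + 0.010 * w + 0.08 for w in bw_test]

slope, diff = ancova_adjusted_difference(bw_ctrl, ee_ctrl, bw_test, ee_test)
raw_diff = fmean(ee_test) - fmean(ee_ctrl)
```

      In this made-up example the lighter group has the lower mean EE per mouse (`raw_diff` is negative) but the higher EE once body weight is accounted for (`diff` is positive), which is the kind of situation the covariate adjustment is meant to resolve.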

      2) It makes no sense to normalize food intake to weight, as it makes no sense to divide metabolic rate by weight as well (see above). If food intake is not normalized, this will clearly show that TcMAC21 mice eat much less than controls, and if plotted as cumulative food intake will show that TcMAC21 are smaller and gain less weight on a high-fat diet because they simply eat less. This further indicates that the major tenet of this paper is not correct.

      It is expected that a smaller mouse will eat less food than a bigger mouse. Normalizing food intake to body weight can tell you whether the amount of food intake is appropriate (or normal) for a given weight. Remarkably, despite a much lower body weight, ad libitum fed TcMAC21 mice consumed the same or a slightly higher absolute amount of food, without normalizing the data to body weight (Fig. 2C and Fig. 4A and Supplementary File 2 for the chow-fed group and Supplementary File 5 for the HFD-fed group). In fact, the absolute food intake (without normalization) in the refeeding period, after a fast, was significantly higher in the TcMAC21 mice relative to euploid controls (17.7 ± 0.082 vs. 13 ± 0.87 kcal, P = 0.002; Supplementary File 5). Thus, relative to their body weight, ad libitum fed TcMAC21 mice consumed significantly more calories (Fig. 2C and Fig. 4A). For transparency, we chose to show both the absolute and relative food intake data side by side. These results, along with the rest of the data, provide compelling evidence that hypermetabolism, and not reduced food intake, underlies the lean phenotype of the TcMAC21 mice.

      3) The authors have tried to address the smaller weight of TcMAC21 mice by including weight-matched wild-type mice. However, they only focus on analyzing surface temperature, which is not an indicator of thermogenesis. Moreover, there is no information on whether these weight-matched wild-type mice are similar in age or body composition to the TcMAC21 mice. Nevertheless, the increased surface temperature can also indicate increased heat conservation, which is opposite to thermogenesis. It would make sense that TcMAC21 mice with massive reductions in lean mass would activate compensatory mechanisms of heat conservation to offset increased heat dissipation to the environment. This does seem to be the case, based on the data shown in Fig. 6D (see below).

      Skin temperature has been widely and extensively used as a proxy for thermogenesis, often in association with thermogenesis of brown adipose tissue (BAT), which is located just deep to the skin over the shoulder blades of the mouse. Mice fed a high-fat diet lose the “brownness” of their brown adipose tissue as excessive circulating lipid is stored in this depot. This is a well-known phenomenon. One can see this clearly in Figure 4K, where the euploid BAT has accumulated a significant amount of lipid while the TcMAC21 BAT has not. The addition of weight-matched mice was solely to help indicate whether or not the BAT was a major contributor to the TcMAC21 hypermetabolic phenotype.

      We did not conduct body composition analysis on the weight-matched mice. With a body weight of less than 30 grams, these wild-type mice represent a similarly lean and healthy adult mouse. They are not age-matched (the control mice are younger) because this is not possible. A wild-type mouse of the same age as the TcMAC21 mice (already on a high-fat diet for 12 weeks or longer) will weigh significantly more than the TcMAC21 mice, just as the age-matched euploid littermates weighed significantly more than the TcMAC21 mice.

      The idea of heat conservation is possible, but our data clearly indicate that the TcMAC21 mice have elevated thermogenesis. The supporting data include: 1) increased deep colonic temperature; 2) activation of the oxidative and thermogenic gene program in skeletal muscle; 3) overexpression of sarcolipin in skeletal muscle, leading to futile SERCA pump activity and heat generation; 4) increased skeletal muscle mitochondrial respiration; 5) elevated T3 levels; 6) increased physical activity level; 7) increased energy expenditure (EE normalized to lean mass or ANCOVA using body weight as a covariate). Taken together, these data provide compelling evidence to support our conclusion that the TcMAC21 mice are indeed hypermetabolic and have elevated thermogenesis.

      4) A more optimal method of testing whether increased heat dissipation plays a role in the EE of TcMAC21 mice, is to measure EE at thermoneutrality, where energy dissipation to the environment will be minimized. Here the authors have attempted this in Fig. 6D. Unfortunately, the authors normalized EE to lean mass, artefactually elevating TcMAC21 EE. Despite this mistake, it now looks as though the large differences in EE that were seen at room temp have been attenuated, and only significantly limited to the dark phase. This indicates that in addition to the normalization artefact, higher heat dissipation from smaller TcMAC21 mice may also contribute to the elevated EE at 22C.

      It is well known that mice markedly reduce their EE at thermoneutrality. Therefore, it is not surprising that the TcMAC21 mice housed at thermoneutrality have lower EE than the TcMAC21 mice housed at room temperature. This also holds true for the euploid controls. Yet, remarkably, the TcMAC21 mice still have significantly higher EE compared to the euploid controls when housed at thermoneutrality. The TcMAC21 mice never reduce their EE to the level of the euploid controls. We have now included the ANCOVA data for EE using body weight as a covariate as recommended (PMID: 22205519) (Fig. 7F). The results clearly indicate that the TcMAC21 mice have significantly higher EE compared to euploid controls even at thermoneutrality. The data obtained at thermoneutrality, as well as the body weight-matched control experiment shown in Figure 4I, argue against heat dissipation as the driver of increased EE. Instead, our data support hyperactivity and hypermetabolism as the driver of increased EE.

      5) In Fig. 6D, why is the hourly plot not shown here (like 2D and 4C)? The data clearly are not as striking as the EE data at 22C?

      Because of space limitations in Figure 7, we did not include the hourly tracing data and instead showed the overall energy expenditure (EE) during the light and dark cycle as bar graphs. Per the reviewer's request, we have now included the hourly tracing data in Fig. 7F, along with the ANCOVA data. The data clearly indicate that TcMAC21 mice housed at thermoneutrality have higher EE, especially in the dark cycle when they are active. This is quite remarkable. We know from many published studies that mice significantly reduce their EE when housed at thermoneutrality. And yet, the TcMAC21 mice never reduce their EE to the level of euploid controls when housed at thermoneutrality.

      6) GTT was similar between TcMAC21 and controls (Fig. 3I). However, the smaller insulin response could be due to the fact that glucose was normalized to body weight. It would be better to normalize to lean mass, since that is different as well, or simply give all mice the same amount of glucose that the control group receives since this is how it is done in humans.

      Dosing glucose by body weight in the GTT is widely and extensively practiced across the metabolic community. The TcMAC21 mice are markedly more insulin sensitive, as supported by multiple independent lines of evidence: 1) Overnight fasting blood glucose and insulin levels are significantly lower in TcMAC21 mice relative to euploid controls (Figure 3G). 2) Insulin tolerance tests clearly indicate a substantial improvement in insulin sensitivity in TcMAC21 mice even though the insulin dose injected was much smaller (i.e., the insulin dose was based on body weight) (Figure 3K). 3) The insulin response during refeeding, after an overnight fast, is dramatically lower even though the refeeding blood glucose levels rise to the same levels as in the euploid controls (Fig. 3L-M). This is similar to the GTT data, where the rate of glucose clearance in TcMAC21 mice is the same as in the euploid controls despite a dramatically lower insulin response (Fig. 3I-J). Taken together, these data clearly indicate a markedly heightened insulin sensitivity in TcMAC21 mice relative to euploid controls.

      7) The fecal energy in Fig. 4B only measures the concentration of energy per gram of feces. However, this analysis has failed to take into account total fecal excretion, which should be multiplied by the energy density of the feces. Thus, these data are incomplete and not sufficient to exclude absorption differences between the groups. And it is now curious why, if all other metabolic measurements (even though wrong), such as food intake and EE, are normalized to body weight, the authors have not normalized the feces data to body weight. Is this because doing so would show a massive elevation in fecal energy in TcMAC21 mice and thus falsify their hypothesis?

      The fecal data the reviewer requested were originally in the supplemental figure section. We have now moved these data to the main figure to ensure that they will not be missed by any reader. As indicated in the text and in Fig. 4B, TcMAC21 mice fed a HFD show no difference in fecal frequency (movements/day), fecal weight (g/movement), fecal energy composition (cal/g), or total fecal energy (kcal/day). These data clearly indicate that the fecal energy content is not different between TcMAC21 and euploid mice. These results, along with the rest of the data in the paper, provide compelling evidence that hypermetabolism, and not reduced nutrient absorption in the gut, underlies the lean phenotype and resistance of TcMAC21 mice to weight gain when fed a high-fat diet.
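
      As a worked illustration of how the four fecal measures relate, total daily fecal energy can be taken as the product of the other three. This is a sketch under that assumption, with 1 kcal = 1000 cal; the function name and example values are hypothetical and are not data from the study.

```python
def total_fecal_energy_kcal_per_day(movements_per_day, grams_per_movement, cal_per_gram):
    """Total fecal energy (kcal/day) from frequency, weight, and energy density.

    Units: (movements/day) * (g/movement) * (cal/g) = cal/day, then / 1000 for kcal/day.
    """
    grams_per_day = movements_per_day * grams_per_movement
    return grams_per_day * cal_per_gram / 1000.0


# Hypothetical values: 10 movements/day, 0.05 g each, 4000 cal/g.
example_total = total_fecal_energy_kcal_per_day(10, 0.05, 4000)
```

      Because the total depends on all three factors, comparing energy density alone would not settle the question; reporting frequency, weight, and density together allows the daily total to be checked.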

      8) I cannot find any indication of sample size in any of the EE experiments, aside from the bar graph in Fig. 6D. In any case, this experiment only has an n=4 to 5 per group. This is an extremely small number for these types of experiments, so how can the authors be sure of reproducibility with such a low sample size? Are all of the other EE experiments also of similarly small sample sizes?

      Sample sizes for all EE experiments were clearly indicated in the original text, figure legends, and figures themselves, as well as in all supplemental figures and Supplementary files. In addition, for transparency, we always include individual data points, whenever possible, in all our data figures. The experiments were sufficiently powered (n = 8-9 per genotype) and the effect size was large. Sample sizes for all thermoneutral experiments were lower than in both the chow-fed and HFD-fed experiments because these mice are hard to breed and in limited supply.

    1. Author Response

      Reviewer #1 (Public Review):

      How morphogens spread within tissues remains an important question in developmental biology. Here the authors revisit the role of glypicans in the formation of the Dpp gradient in wing imaginal discs of Drosophila. They first use sophisticated genome engineering to demonstrate that the two glypicans of Drosophila are not equivalent despite being redundant for viability. They show that Dally is the relevant glypican for Dpp gradient formation. They then provide genetic evidence that, surprisingly, the core domain of Dally suffices to trap Dpp at the cell surface (suggesting a minor role for GAGs). They conclude with a model that Dally modulates the range of Dpp signaling by interfering with Dpp's degradation by Tkv. These are important conclusions, but more independent (biochemical/cell biological) evidence is needed.

      As indicated above, the genetic evidence for the predominant role of Dally in Dpp protein/signalling gradient formation is strong. In passing, the authors could discuss why overexpressed Dlp has a negative effect on signaling, especially in the anterior compartment. The authors then move on to determine the role of GAG (=HS) chains of Dally. They find that in an overexpression assay, Dally lacking GAGs traps Dpp at the cell surface and, counterintuitively, suppresses signaling (fig 4 C, F). Both findings are unexpected and therefore require further validation and clarification, as outlined in a and b below.

      a) In loss of function experiments (dallyDeltaHS replacing endogenous dally), Dpp protein is markedly reduced (fig 4R), as much as in the KO (panel Q), suggesting that GAG chains do contribute to trapping Dpp at the cell surface. This is all the more significant given that, according to the overexpression assays, DallyDeltaHS seems more stable than WT Dally (by the way, this difference should also be assessed in the knock-ins, which is possible since they are YFP-tagged). The authors acknowledge that HS chains of Dally are critical for Dpp distribution (and signaling) under physiological conditions. If this is true, one can wonder why overexpressed dally core 'binds' Dpp and whether this is a physiologically relevant activity.

      According to the overexpression assay, DallyDeltaHS seems more stable than WT Dally (Fig. 4B’, E’, 5H, I). As the reviewer suggested, we addressed the difference using the two knock-in alleles and found that DallyDeltaHS is more stable than WT Dally (Fig.4 L, M inset), further emphasizing that the core protein of Dally is insufficient for extracellular Dpp distribution.

      (While revising our figure, we found a labeling mistake in Fig. 4M, N and Fig. 4Q, R and corrected the genotypes.)

      In summary, we showed that, although the overexpression assay indicates that Dally interacts with Dpp mainly through its core protein (Fig. 4E, I), HS chains are essential for extracellular Dpp distribution (Fig. 4R). Thus, the core protein of Dally alone is not sufficient for extracellular Dpp distribution under physiological conditions. These results raise the question of whether the interaction of the core protein of Dally with Dpp is physiologically relevant. Since the increase of HS upon dally expression, but not upon dlp expression, resulted in the accumulation of extracellular Dpp (Fig. 2), and this accumulation was mainly through the core protein of Dally (Fig. 4E, I), we speculate that the interaction of the core protein of Dally with Dpp gives ligand specificity to Dally under physiological conditions.

      To understand the importance of the interaction of the core protein of Dally with Dpp under physiological conditions, it is important to identify the region responsible for the interaction. Our preliminary results from overexpressing a dally mutant lacking the majority of the core protein (but keeping the HS-modified region intact) showed that HS chain modification was also lost. Although this is consistent with our results that enzymes adding HS chains also interact with the core protein of Dally (Fig. 4D), a dally mutant allele lacking the core protein would prevent us from distinguishing the role of the core protein of Dally from that of the HS chains.

      Nevertheless, we can infer the importance of the interaction of the core protein of Dally with Dpp using the dally[3xHA-dlp, attP] allele, where dlp is expressed in dally-expressing cells. Since Dally-like is modified by HS chains but does not interact with Dpp (Fig. 2, 4), the dally[3xHA-dlp, attP] allele mimics a dally allele where HS chains are properly added but the interaction of the core protein with Dpp is lost. As we showed in Fig.3O, S, the allele could not rescue dallyKO phenotypes, consistent with the idea that the interaction of the core protein of Dally with Dpp is essential for Dpp distribution and signaling and that HS chains alone are not sufficient for Dpp distribution.

      b) The authors infer that dallycore (at least if overexpressed) can bind Dpp. This assertion needs independent validation by a biochemical assay, ideally with surface plasmon resonance or similar so that an affinity can be estimated. I understand that this will require a method that is outside the authors' core expertise but there is no reason why they could not approach a collaborator for such a common technique. In vitro binding data is, in my view, essential.

      We agree with the reviewer that a biochemical assay such as SPR would help us characterize the interaction of the core protein of Dally and Dpp (if the interaction is direct), although a biochemical assay would also not demonstrate the interaction under physiological conditions.

      However, SPR has never been applied to Dpp, probably because functional refolded Dpp dimer purified from bacteria has been found to be stable only at low pH and to precipitate in normal-pH buffer (Groppe J, et al., 1998)(Matsuda et al., 2021). As the reviewer suggests, collaborating with experts is an important step for the future.

      Nevertheless, SPR was applied to the interaction between BMP4 and Dally (Kirkpatrick et al., 2006), probably because BMP4 is more stable in normal buffer. Although the binding affinity was not calculated, SPR showed that BMP4 directly binds to Dally and that this interaction was only partially inhibited by a molar excess of exogenous HS, suggesting that BMP4 can interact with the core protein of Dally as well as its HS chains. In addition, the same study performed Co-IP experiments using lysates of S2 cells and showed that Dpp and the core protein of Dally are co-immunoprecipitated, although this does not demonstrate whether the interaction is direct.

      In a subsequent set of experiments, the authors assess the activity of a form of Dpp that is expected not to bind GAGs (DppDeltaN). Overexpression assays show that this protein is trapped by DallyWT but not dallyDeltaHS. This is a good first step validation of the deltaN mutation, although, as before, an in vitro binding assay would be preferable.

      Our overexpression assays actually showed that DppDeltaN is trapped by DallyWT and by dallyDeltaHS at similar levels (Fig. 5H-J), indicating that interaction of DppDeltaN and HS chains of Dally is largely lost but DppDeltaN can still interact with core protein of Dally.

      (Related to this, we found a typo in the sentence “In contrast, the relative DppΔN accumulation upon DallyΔHS expression in JAX;dppΔN was comparable to that upon DallyΔHS expression in JAX;dppΔN (Fig. 5H-J).” and corrected it as follows: “In contrast, the relative DppΔN accumulation upon Dally expression in JAX;dppΔN was comparable to that upon DallyΔHS expression in JAX;dppΔN (Fig. 5H-J).”)

      We thank the reviewer for suggesting the in vitro experiment. Although we decided not to develop biophysical experiments such as SPR for Dpp in this study for the reasons discussed above, we would like to point out that our result is consistent with a previous Co-IP experiment using S2 cells showing that DppDeltaN loses interaction with heparin (Akiyama2008).

      However, in contrast to our results, the same study also proposed, based on Co-IP experiments using S2 cells, that DppDeltaN loses interaction with Dally (Akiyama2008). It is hard to draw a conclusion from this, since the western blot was saturated and lacked loading controls and normalization (Fig. 1C in Akiyama 2008), and negative in vitro experiments do not necessarily demonstrate the lack of interaction in vivo. One explanation for why the interaction was missed in the previous study is that some factors required for the interaction of DppDeltaN with the core protein of Dally are missing in S2 cells. In this case, the in vivo interaction assay we used in this study has the advantage of robustly detecting the interaction.

      Nevertheless, the authors show that DppDeltaN is surprisingly active in a knock-in strain. At face value (assuming that DeltaN fully abrogates binding to GAGs), this suggests that interaction of Dpp with the GAG chains of Dally is not required for signaling activity. This leads the authors to suggest (as shown in their final model) that GAG chains could be involved in mediating the interactions of Dally with Tkv (and not with Dpp). This is an interesting idea, which would need to be reconciled with the observation that the distribution of Dpp is affected in dallyDeltaHS knock-ins (item a above). It would also be strengthened by biochemical data (although more technically challenging than the experiments suggested above). In an attempt to determine the role of Dally (GAGs in particular) in the signaling gradient, the paper next addresses its relation to Tkv. They first show that reducing Tkv leads to Dpp accumulation at the cell surface, a clear indication that Tkv normally contributes to the degradation of Dpp. From this they suggest that Tkv could be required for Dpp internalisation although this is not shown directly. The authors then show that a Dpp gradient still forms upon double knockdown (Dally and Tkv). This intriguing observation shows that Dally is not strictly required for the spread of Dpp, an important conclusion that is compatible with early work by Lander suggesting that Dpp spreads by free diffusion. These results show that Dally is required for gradient formation only when Tkv is present. They suggest therefore that Dally prevents Tkv-mediated internalisation of Dpp. Although this is a reasonable inference, internalisation assays (e.g. with anti-Ollas or anti-HA Ab) would strengthen the authors' conclusions especially because they contradict a recent paper from the Gonzalez-Gaitan lab.

      Thanks for suggesting the internalization assay. As we noted in the discussion, our results suggest that extracellular Dpp distribution is severely reduced in dally mutants due to Tkv-mediated internalization of Dpp (Fig. 6). Thus, the extracellular Dpp available for labelling with the nanobody is severely reduced in dally mutants, which can explain the reduced internalization of Dpp in dally mutants in the internalization assay. Therefore, we think that the nanobody internalization assay would not distinguish between the two contradicting possibilities.

      The paper ends with a model suggesting that HS chains have a dual function of suppressing Tkv internalisation and stimulating signaling. This constitutes a novel view of a glypican's mode of action and possibly an important contribution of this paper. As indicated above, further experiments could considerably strengthen the conclusion. Speculation on how the authors imagine that GAG chains have these activities would also be warranted.

      Thank you very much!

      Reviewer #2 (Public Review):

      The authors are trying to distinguish between four models of the role of glypicans (HSPGs) on the Dpp/BMP gradient in the Drosophila wing, schematized in Fig. 1: (1) "Restricted diffusion" (HSPGs transport Dpp via repetitive interaction of HS chains with Dpp); (2) "Hindered diffusion" (HSPGs hinder Dpp spreading via reversible interaction of HS chains with Dpp); (3) "Stabilization" (HSPGs stabilize Dpp on the cell surface via reversible interaction of HS chains with Dpp that antagonizes Tkv-mediated Dpp internalization); and (4) "Recycling" (HSPGs internalize and recycle Dpp).

      To distinguish between these models, the authors generate new alleles for the glypicans Dally and Dally-like protein (Dlp) and for Dpp: a Dally knock-out allele, a Dally YFP-tagged allele, a Dally knock-out allele with 3HA-Dlp, a Dlp knock-out allele, a Dlp allele containing 3-HA tags, and a Dpp lacking the HS-interacting domain. Additionally, they use an OLLAS-tag Dpp (OLLAS being an epitope tag against which extremely high affinity antibodies exist). They examine OLLAS-Dpp or HA-Dpp distribution, phospho-Mad staining, adult wing size.

      They find that over-expressed Dally - but not Dlp - expands Dpp distribution in the larval wing disc. They find that the Dally[KO] allele behaves like a Dally strong hypomorph Dally[MH32]. The Dally[KO] - but not the Dlp[KO] - caused reduced pMad in both anterior and posterior domains and reduced adult wing size (particularly in the Anterior-Posterior axis). These defects can be substantially corrected by supplying an endogenously tagged YFP-tagged Dally. By contrast, they were not rescued when a 3xHA Dlp was inserted in the Dally locus. These results support their conclusion that Dpp interacts with Dally but not Dlp.

      They next wanted to determine the relative contributions of the Dally core or the HS chains to the Dpp distribution. To test this, they over-expressed UAS-Dally or UAS-Dally[deltaHS] (lacking the HS chains) in the dorsal wing. Dally[deltaHS] over-expression increased the distribution of OLLAS-Dpp but caused a reduction in pMad. Then they write that after they normalize for expression levels, they find that Dally[deltaHS] only mildly reduces pMad and this result indicates a major contribution of the Dally core protein to Dpp stability.

      Thanks for the comments. We actually showed that compared with Dally overexpression, Dally[deltaHS] overexpression only mildly reduces extracellular Dpp accumulation (Fig. 4I). This indicates a major contribution of the Dally core protein to interaction with Dpp, although the interaction is not sufficient to sustain extracellular Dpp distribution and signaling gradient.

      The "normalization" is a key part of this model, but how the normalization was done is not mentioned. When they do the critical experiment, making the Dally[deltaHS] allele, they find that loss of the HS chains is nearly as severe as total loss of Dally (i.e., Dally[KO]). Additional experimental approaches are needed here to prove the role of the Dally core.

      Since the expression level of Dally[deltaHS] is higher than that of Dally when overexpressed, we normalized extracellular Dpp distribution (a-Ollas staining) against the GFP fluorescent signal (Dally or Dally[deltaHS]). To do this, we first extracted both signals along the A-P axis from the same ROI. The ratio was calculated by dividing the intensity of a-Ollas staining by the intensity of the GFP fluorescent signal at a given position x. The average of the normalized profiles was then generated and plotted using the script described in the methods (wingdisc_comparison.py), as for the other pMad or extracellular staining profiles.
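
      The position-wise normalization described above can be sketched as follows. This is only an illustration of the arithmetic, not the wingdisc_comparison.py script; the function names and the intensity values are hypothetical, and the sketch assumes that all profiles are sampled at the same positions and that the GFP signal is non-zero everywhere in the ROI.

```python
from statistics import fmean


def normalized_profile(ollas, gfp):
    """Divide the Ollas intensity by the GFP intensity at each position x."""
    if len(ollas) != len(gfp):
        raise ValueError("profiles must be sampled at the same positions")
    return [o / g for o, g in zip(ollas, gfp)]


def average_profile(profiles):
    """Average several equally long profiles position by position."""
    return [fmean(values) for values in zip(*profiles)]


# Two hypothetical discs, three positions along the A-P axis each.
disc_a = normalized_profile([2.0, 4.0, 6.0], [1.0, 2.0, 2.0])
disc_b = normalized_profile([3.0, 3.0, 5.0], [1.0, 1.0, 5.0])
mean_profile = average_profile([disc_a, disc_b])
```

      Here each disc is normalized first and the normalized profiles are averaged afterwards, so a disc with strong overall GFP expression does not dominate the mean.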

      Although this analysis provides normalized extracellular Dpp accumulation at different positions along the A-P axis, we are more interested in the total amount of Dpp or DppDeltaN accumulation upon Dally or dallyDeltaHS expression. Therefore, we plan to analyze the total amount of Dpp normalized against the GFP fluorescent signal (Dally or Dally[deltaHS]) in the revised ms. In this case, normalization will be performed by dividing the total signal intensity of extracellular Dpp staining (ExOllas staining) by the total GFP fluorescent signal (Dally or Dally[deltaHS]) in the ROI of each wing disc.
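
      The whole-ROI normalization differs from the position-wise ratio in that the signals are summed before dividing. A minimal sketch, with a hypothetical function name and made-up intensities, assuming both signals are measured over the same ROI:

```python
def total_normalized_signal(ollas_roi, gfp_roi):
    """Total Ollas intensity in the ROI divided by total GFP intensity in the ROI."""
    total_gfp = sum(gfp_roi)
    if total_gfp == 0:
        raise ValueError("GFP signal in the ROI must be non-zero")
    return sum(ollas_roi) / total_gfp


# One value per wing disc; the intensities are hypothetical.
discs = [
    ([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]),
    ([3.0, 3.0, 6.0], [2.0, 2.0, 4.0]),
]
per_disc_ratio = [total_normalized_signal(ollas, gfp) for ollas, gfp in discs]
```

      This yields a single number per disc, which is convenient for comparing genotypes, at the cost of discarding the positional information that the profile-based analysis retains.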

      We agree with the reviewer that additional experimental approaches are needed to address the role of the core protein of Dally. As we discussed in the response to Reviewer #1, to understand the importance of the interaction of the core protein of Dally with Dpp, it is important to identify a region responsible for the interaction. Our preliminary results overexpressing a dally mutant lacking the majority of the core protein (but keeping the HS-modified region intact) showed that HS chain modification was also lost. Although this is consistent with our results that enzymes adding HS chains also interact with the core protein of Dally (Fig. 4D), the dally mutant allele lacking the core protein would hamper us from distinguishing the role of the core protein of Dally from that of the HS chains.

      Nevertheless, we can infer the importance of the interaction of core protein of Dally with Dpp using dally[3xHA-dlp, attP] allele, where dlp is expressed in dally expressing cells. Since Dally-like is modified by HS chains but does not interact with Dpp (Fig. 2, 4), dally[3xHA-dlp, attP] allele mimics a dally allele where HS chains are properly added but interaction of core protein with Dpp is lost. As we showed in Fig.3O, S, the allele could not rescue dallyKO phenotypes, consistent with the idea that interaction of core protein of Dally with Dpp is essential for Dpp distribution and signaling.

      Prior work has shown that a stretch of 7 amino acids in the Dpp N-terminal domain is required to interact with heparin but not with Dpp receptors (Akiyama, 2008). The authors generated an HA-tagged Dpp allele lacking these residues (HA-dpp[deltaN]). It is an embryonic lethal allele, but they can get some animals to survive to larval stages if they also supply a transgene called “JAX” containing dpp regulatory sequences. In the JAX; HA-dpp[deltaN] mutant background, they find that the distribution and signaling of this Dpp molecule is largely normal. While over-expressed Dally can increase the distribution of HA-dpp[deltaN], over-expression of Dally[deltaHS] cannot. These latter results support the model that the HS chains in Dally are required for Dpp function but not because of a direct interaction with Dpp.

      Our overexpression assays actually showed that both Dally and Dally[deltaHS] can accumulate Dpp upon overexpression and the accumulation of Dpp is comparable after normalization (Fig. 5H-J), consistent with the idea that the interaction of DppdeltaN with HS chains is largely lost. As the reviewer pointed out, these results support the model that the HS chains in Dally are required for Dpp function but not because of a direct interaction with Dpp.

      In the last part of the results, they attempt to determine if the Dpp receptor Thickveins (Tkv) is required for Dally-HS chains interaction. The 2008 (Akiyama) model posits that Tkv activates pMad downstream of Dpp and also internalizes and degrades Dpp. A 2022 (Romanova-Michaelides) model proposes that Dally (not Tkv) internalizes Dpp.

      To distinguish between these models, the authors deplete Tkv from the dorsal compartment of the wing disc and found that extracellular Dpp increased and expanded in that domain. These results support the model that Tkv is required to internalize Dpp.

      They then tested the model that Dally antagonizes Tkv-mediated Dpp internalization by determining whether the defective extracellular Dpp distribution in Dally[KO] mutants could be rescued by depleting Tkv. Extracellular Dpp did increase in the D vs V compartment, potentially providing some support for their model. However, no statistics were performed, and these are needed for full confidence in the results. The lack of statistics is particularly problematic (1) when they state that extracellular Dpp does not rise in ap>tkv RNAi vs ap>tkv RNAi, dally[KO] wing discs (Fig. 6E) or (2) when they state that the extracellular Dpp gradient expanded in the dorsal compartment when tkv was dorsally depleted in dally[deltaHS] mutants (Fig. 6I). These last two experiments are important for their model but the differences are assessed only visually. In fact, extracellular Dpp in ap>tkv RNAi, dally[KO] (Fig. 6B) appears to be lower than extracellular Dpp in ap>tkv RNAi (Fig. 6A) and the histogram of Dpp in ap>tkv RNAi, dally[KO] is actually a bit lower than Dpp in ap>tkv RNAi, but the authors claim that there is no difference between the two. Their conclusion would be strengthened by statistical analyses of the two lines.

      We will provide the statistical analyses in the revised ms.

      Strengths:

      1) New genomically-engineered alleles

      A considerable strength of the study is the generation and characterization of new Dally, Dlp and Dpp alleles. These reagents will be of great use to the field.

      Thanks. We hope that these resources are indeed useful to the field.

      2) Surveying multiple phenotypes

      The authors survey numerous parameters (Dpp distribution, Dpp signaling (pMad) and adult wing phenotypes) which provides many points of analysis.

      Thanks!

      Weaknesses:

      1) Confusing discussion regarding the Dally core vs HS in Dpp stability. They don't provide any measurements or information on how they "normalize" for the level of Dally vs Dally[deltaHS]. This is an important part of their model that currently is not supported by any measurements.

      We explained how we normalized in the above section. We will update the analysis in the revised ms.

      2) Lacking quantifications and statistical analyses:

      a) Why is statistical significance for histograms (pMad and Dpp distribution) not supplied? These histograms provide the key results supporting the authors' conclusions but no statistical tests/results are presented. This is a pervasive shortcoming in the current study.

      Thanks. We will provide statistics in the revised ms.

      b) dpp[deltaN] with JAX transgene - it would strengthen the study to supply quantitative data on the percent survival/lethal stage of dpp[deltaN] mutants with or without the JAX transgene.

      In this study, we are interested in the role of dpp[deltaN] during the wing disc development. Therefore, we decided not to perform the detailed analysis on the percent survival/lethal stage of dpp[deltaN] mutants with or without the JAX transgene in the current study. Nevertheless, the fact that dpp[deltaN] allele is maintained with a balanced stock and JAX;dpp[deltaN] allele can be maintained as homozygous stock indicates that the lethality of dpp[deltaN] allele comes from the early stages. Indeed, our preliminary results showed that pMad signal is severely lost in the dpp[deltaN] embryo without JAX (data not shown), indicating that the allele is lethal at early embryonic stages.

      c) The graphs on wing size etc should start at zero.

      Thanks. We corrected this in the current ms.

      d) The sizes of histograms and graphs in each figure should be increased so that the reader can properly assess them. Currently, they are very small.

      Thanks. We changed the sizes in the current ms.

      The authors' model is that Dally (not Dlp) is required for Dpp distribution and signaling but that this is not due to a direct interaction with Dpp. Rather, they posit that Dally-HS antagonize Tkv-mediated Dpp internalization. Currently the results of the experiments could be considered consistent with their model, but as noted above, the lack of statistical analyses of some parameters is a weakness.

      Thanks. We will perform the statistical analyses in the revised ms.

      One problematic part of their result for me is the role of the Dally core protein (Fig. 7B). There is a mis-match between the over-expression results and Dally allele lacking HS (but containing the core). Finally, their results support the idea that one or more as-yet unidentified proteins interact with Dally-HS chains to control Dpp distribution and signaling in the wing disc.

      Our results simply suggest that Dpp can interact with Dally mainly through core protein but this interaction is not sufficient to sustain extracellular Dpp gradient formation under physiological conditions (dallyDeltaHS) (Fig. 4Q). We find that the mis-match is not problematic if the role of Dally is not simply mediated through interaction with Dpp. We speculate that interaction of Dpp and core protein of Dally is transient and not sufficient to sustain the Dpp gradient without HS chains of Dally stabilizing extracellular Dpp distribution by blocking Tkv-mediated Dpp internalization.

      There is much debate and controversy in the Dpp morphogen field. The generation of new, high quality alleles in this study will be useful to the Drosophila community, and the results of this study support the concept that Tkv but not Dally regulates Dpp internalization. Thus the work could be impactful and fuel new debates among morphogen researchers.

      Thanks.

      The manuscript is currently written in a manner that really is only accessible to researchers who work on the Dpp gradient. It would be very helpful for the authors to re-write the manuscript and carefully explain in each section of the results (1) the exact question that will be asked, (2) the prior work on the topic, (3) the precise experiment that will be done, and (4) the predicted results. This would make the study more accessible to developmental biologists outside of the morphogen gradient and Drosophila communities.

      Thanks. We will modify our texts to help non-experts understand our story in the revised ms.

    1. Author Response

      Reviewer #2 (Public Review):

      Major points:

      1). This study does not provide any evidence about the cell death of the transplanted cells. The immunostaining of the Caspase-3 or TUNEL staining should be used to address this issue.

      We have conducted immunostaining of Caspase-3 at 7 days after transplantation using the human-specific STEM121 antibody to demonstrate the transplanted cells. We have added the results to Figure 3A and modified the text accordingly (Page 8, Line 156-165).

      2). The authors showed that the neurological functions (evaluated by balance beam, ladder rung, rotarod test and Modified Neurological Severity Score (mNSS) up to 8 weeks after treatment (Figure 1C)) were significantly improved in the NES+Exo group compared to their control groups. However, these cells (transplanted cells) are progenitors (Nestin+) or undifferentiated cells (Tuj1+) at this stage (Figure 3). Thus, I was curious how immature neurons can perform neurological functions. This point should be explained.

      We agree with the reviewer’s insightful comments. We have performed immunostaining using antibodies against the post-mitotic mature neuron marker RBFOX3/NeuN, post-synaptic marker PSD-95 and human-specific STEM121 at 4 weeks after transplantation. The results confirmed that NeuN+/STEM121+ and PSD-95+/STEM121+ mature neurons appeared in NSC group and increased in NSC+Exo group (Figure 3B and Figure 3 - supplement 1D). Furthermore, our additional data showed that the expression of presynaptic marker SYN1 was increased in both NSC and NSC+Exo groups at 8 weeks after treatment. Therefore, we believe that there are mature neurons and newly formed synapses involved in neurological functions.

      3). The authors used the Golgi staining to show the NES+Exo can improve dendritic density and length. How do you know these neurons are transplanted cells?

      Our data show that mature neurons and synapses are generated by the transplanted cells (please also see the response to Reviewer #2, major point #2). We believe that the newly generated neurons partly contribute to the improved dendritic density and length. However, we agree that the neurons with increased dendritic density and length may include both surviving local neurons and those generated by the transplanted cells.

      4). The cell morphology of tdTomato+ cells is fuzzy and it is difficult to distinguish the cell body. It looks as if these cells are out of whack.

      We have performed immunostaining using the human-specific STEM121 antibody to demonstrate the transplanted cells and additional neuronal markers such as RBFOX3/NeuN to identify NSC differentiation (Figure 3A and 3B; Figure 3 - supplement 1C and 1D).

    1. Author Response

      Reviewer #1 (Public Review):

      Lemerle et al utilize elegant imaging and molecular biology approaches to convincingly demonstrate the presence of Bin1 and caveolae containing rings capable of tubulation in developing muscle. The data is of fundamental potential significance as it advances our understanding of t-tubule biogenesis, which represents a major knowledge gap in muscle biology. The paper will be of broad interest to skeletal and cardiac muscle biologists and physiologists. The paper is well written, with a comprehensive yet concise introduction, clearly presented results, and an appropriate discussion. The imaging is spectacular, and the use of CLEM provides compelling validation of the protein constituents of ring structures identified via EM. When combined with time-lapse imaging, the combination of approaches provides powerful nanoscale structural information alongside temporal dynamics and live-cell confirmation of tubulating ability by Bin1-Cav3 containing rings. The data indicate that Bin1 is sufficient to generate circular structures that are subsequently decorated by caveolae which facilitate tubule formation at the membrane, and they support the requirement of both Bin1 and Cav3 for efficient tubule initiation and elongation. The authors also utilize myotubes from patients with cav3 mutations to explore whether altered ring formation may contribute to muscle pathology - however, this section requires additional controls and validation to confer pathological insight. Further, additional quantification of imaging data across the study is required to increase the rigor and strength of the conclusions of this work.

      We would like to thank reviewer #1 for their appreciation of our work, in particular the imaging experiments, and for deeming our overall conclusions convincing. We have now performed additional experiments on patient myotubes including a rescue of Cav3, performed rigorous quantifications of rings and tubules under our different experimental conditions and rewritten the corresponding parts of the discussion to increase the strength of our conclusions.

      Reviewer #2 (Public Review):

      In this work Lemerle et al. provide long-awaited insight into how transverse tubules develop in skeletal muscle. Together with the sarcoplasmic reticulum transverse tubules form the triad, a specialized structure required for excitation-contraction coupling in skeletal muscle. Defects in transverse tubules or the triad can lead to problems such as muscular dystrophy. Whilst the involvement of specialist membrane structures (caveolae) and the membrane-bending protein Bin1 have long been recognized, the precise mechanism of how caveolae and Bin1 cause transverse tubules to form and extend has remained unknown. This work provides compelling evidence, correlating antibody labelling with electron microscopy, to support the concept that caveolae rings form underneath the cell membrane which is surrounded by the endo/sarcoplasmic reticulum. These rings contain caveolin-3 and Bin1 and the authors show Bin1 enriched tubes extend from multiple points on these rings. Their data suggest that Bin1 assembles to initially form these scaffolds that then recruit the caveolae to form the ring. In addition, tubules appear continuous with the extracellular environment which is necessary for their function of facilitating calcium release during excitation-contraction coupling. In patients with mutations in caveolin-3 the caveolin ring formation as well as Bin1 tubulation were defective which may play a role in the pathology. The elegant experiments including time-lapse work clearly support the conclusions of the authors.

      The ability of the authors to combine labelling studies with advanced microscopy to show the underlying structures provides very strong evidence for the proposed mechanisms. The authors suggest that the muscle-specific isoforms of BIN1 are key to tubule extension from caveolae rings but it would be interesting for them to discuss how this fits with studies suggesting that constitutive Bin1 isoforms can also form transverse tubules. It would also be interesting to understand the authors' views on whether caveolae rings are involved in the turnover of transverse tubules in adult myotubes as well as the initial formation and, additionally, if the caveolae rings are restricted to the region just under the surface membrane.

      Insight into how transverse tubules are formed sets the groundwork for future therapies. This is clearly important for skeletal muscle myopathies but should also be considered in the heart. Cardiac transverse tubule loss and disorder play an important role in dysfunction in heart failure and atrial fibrillation and as such lessons learned in skeletal muscle may be successfully applied to the heart.

      We would like to thank reviewer #2 for this appreciation of our work. We agree with the points raised and have updated our discussion section to highlight these points.

      Reviewer #3 (Public Review):

      T-tubules are an elaborate series of membrane invaginations that bring membrane voltage-activated Ca2+ channels in close apposition to the sarcoplasmic reticulum containing RyR, allowing for Ca2+-induced Ca2+ release. They serve as critical hubs of excitation-contraction coupling and play a central role in myopathies and inherited and acquired cardiomyopathies. Several membrane structures and proteins have been implicated in striated muscle t-tubule biogenesis, but the specific mechanisms of early t-tubule biogenesis are not defined. Lemerle et al here investigate the biogenesis of transverse tubules in skeletal muscle. They use skeletal myoblasts from murine and human muscle as well as sophisticated high-resolution microscopy, live cell imaging, and adenoviral targeting to forward a model of BIN1 mediated caveolae ring formation which give rise to DHPR enriched t-tubules and associate with SR. While they demonstrate that BIN1 and Cav3 enriched caveolae act together to form t-tubules, the precise pathophysiological mechanisms by which this process acts in disease remain unclear. Strengths of the study consist in the use of both murine and human skeletal muscle experiments, suggesting a conserved molecular mechanism; the innovative approach of correlative light and electron microscopy, and the use of pathological specimens. The live cell timelapse provides imaging evidence of Cav3-enriched caveolae-rings forming in centers of high BIN1 enrichment, from which t-tubules emanate. This is novel evidence in support of the biogenesis model proposed by the authors. The pathological correlation of their model is promising but limited. Specifically, while the study of Cav3 mutant specimens is used to show the Cav3 dependence of BIN1 action (in experiments using BIN1 overload), the authors have not tested the sufficiency of their proposed mechanism by rescuing the pathologic state. Moreover, the conditions of development likely have an important effect on the studied mechanism - such as mechanical loading, contractile state, neurohormonal environment, and so on. Furthermore, a more complete description of the precise molecular binding sites between BIN1 and Cav3 would be important. While exon11 is required for tubulation, BIN1 not expressing exon 11 appears sufficient to assemble caveolar rings, suggesting this is mediated by other specific BIN1 regions.

      Overall, the study provides new details on early t-tubule biogenesis in skeletal muscle (likely shared with other striated muscle) and lays the foundations for further definition of the precise molecular mechanisms.

      We would like to thank reviewer #3 for the appreciation of our work. We have now performed additional experiments on patient myotubes including rescue experiments, analysis of key excitation-contraction coupling proteins by Western blot and quantification of caveolae rings and tubules to strengthen our claims with patient myotubes.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, Mastrototaro et al. perform a series of experiments in transgenic murine models assessing the function of Palladin (PALLD) in the heart. Global PALLD KOs are embryonic lethal, precluding the assessment of the roles of this protein in adulthood. To circumvent this limitation, the authors generated a floxed Palld allele and ablated it with two cardiomyocyte-specific Cres: the constitutively active Myh6-Cre and the tamoxifen-inducible aMHC-MerCreMer. Interestingly, ablation with the constitutive Cre (cKO) did not produce any overt phenotype, but ablation in adulthood (cKOi) resulted in compromised cardiac function. These observations suggest a compensation mechanism that takes place when cardiomyocytes develop in the complete absence of this protein but not when cardiomyocytes develop in a wild-type background and are deprived of this protein after achieving full maturation. These experiments were complemented with yeast two-hybrid techniques to identify novel partners that bind to a region of PALLD for which no interactors had been previously identified. Experiments in human samples revealed an upregulation of PALLD transcripts in the hearts of patients.

      This manuscript adds important information to our understanding of sarcomeric proteins. Data are generally of good quality and well presented in figures. The numbers of animals in echocardiographic studies are also adequate for proper conclusions. Authors achieve most of their goals, including the identification of novel partners of PALLD and the identification of a requirement for PALLD in cardiomyocytes for normal heart function. However, given that all experiments performed in this study were focused on the loss-of-function of PALLD, it is not clear what is the relevance of the PALLD upregulation observed in human patients. Authors should clearly state this limitation in their results.

      Considering that authors have observed evidence for nuclear PALLD, which could hint at potential major gene expression changes when this protein is ablated, it would be interesting to perform an unbiased assessment of transcriptional alterations (RNA-seq) in cardiomyocytes isolated from control and cKOi hearts. In addition, to test if the compensation observed in the embryonic cKO involves mechanisms of transcriptional adaptation, it would be interesting to compare RNA-seq results from cKOi and cKO (genes encoding proteins similar to PALLD that are upregulated in cKO but not cKOi cardiomyocytes would be very strong candidates). However, these transcriptomic data are not essential to support current findings and can be performed in follow-up studies.

      We agree with the reviewer that it would be interesting to perform RNA-Seq on isolated cardiomyocytes from cPKOi mice and we are in fact planning to do this in a follow-up study.

      Reviewer #2 (Public Review):

      The role of the actin-binding protein palladin (PALLD) in cardiomyocyte development, growth, and function has not been defined. In order to address this question, the authors first identified that CARP and FHOD1 interact with PALLD in cardiomyocytes. They then performed cardiomyocyte selective deletion of PALLD in embryonic and adult mice and discovered that deletion of PALLD in adult mice leads to dilated cardiomyopathy (DCM) and intercalated disc ultrastructural changes. In contrast, embryonic deletion of cardiomyocyte PALLD did not cause a cardiomyopathy phenotype in neonatal or adult animals.

      1. The divergent cardiac phenotypes of the embryonic deletion of cardiomyocyte PALLD (no cardiomyopathy) versus the adult deletion of cardiomyocyte PALLD (dilated cardiomyopathy (DCM)) is an interesting result. The authors speculate that embryonic deletion of PALLD induces compensatory pathways that prevent the development of adult cardiomyopathy in these mice. However, these compensatory pathways remain unexplored.

      2. The authors discovered that mice with adult cardiomyocyte deletion of PALLD had significant changes in the cardiomyocyte intercalated disc (ICD) ultrastructure. They suggest these changes in ICD ultrastructure contribute to DCM formation in the adult PALLD deletion mice (line 270). However, it remains unclear if these changes in ICD ultrastructure are specific to mice with adult deletion of PALLD.

      3. The different transgenic Cre mouse lines may be an alternative explanation for the divergent cardiac phenotypes in the embryonic versus adult deletion of cardiomyocyte PALLD. The tamoxifen dose administered for the inducible Myh6:MerCreMer mice was 30mg/kg/day x 5 which has been reported to lead to the induction of cardiomyocyte DNA damage response pathways (Dis Model Mech. 2013 Nov; 6(6): 1459-1469, J Cardiovasc Aging 2022;2:8). The electron micrograph experiments in Figure 5 did not include a group of Myh6:MerCreMer mice administered tamoxifen. The authors only compared PALLD fl/fl and Myh6:MerCreMer/PALLD fl/fl mice.

      In the papers that the Reviewer refers to, it was shown that administration of tamoxifen to Myh6:MerCreMer mice at a dose of 30 mg/kg/day for 3 (Bersell et al., Dis Model Mech. 6, 1459-1469, 2013) or 5 days (Rouhi et al., J Cardiovasc Aging 2, 8, 2022) is not associated with apoptosis. Bersell et al. found that amounts ≥40 mg/kg/day for 3 days are associated with apoptosis, and Rouhi et al. showed that injection of 30 mg/kg/day for 5 days causes transient minor changes in gene expression with no discernible effects on cardiac function, myocardial fibrosis, apoptosis, or induction of double-stranded DNA breaks. We chose to inject tamoxifen at 30 mg/kg/day for 5 days precisely because this dose has been shown not to be associated with severe effects and has been widely used in the literature.

      4. The apoptosis assessment was performed 24 weeks after administration of tamoxifen to the Myh6:MerCreMer/PALLD fl/fl mice. However, cardiomyocyte apoptosis may have occurred much earlier if it was secondary to Myh6:MerCreMer tamoxifen-induced cardiotoxicity (or related to PALLD deletion).

      5. The animal studies in Fig 3D show a DCM phenotype in mice with adult deletion of cardiomyocyte 200kDa PALLD which suggests a potential loss of function mechanism for DCM formation. However, the authors then report in Fig 6 that human DCM heart tissue samples have a ~2.5-fold increase in mRNA expression of the 200kDa PALLD transcript which would suggest a possible gain of function mechanism for DCM formation. How do the authors reconcile these divergent results with regard to palladin's role in cardiomyocyte homeostasis and cardiomyopathy formation?

      In the revised manuscript we demonstrate that the transcriptional changes in PALLD expression are not reflected at the protein level.

      Reviewer #3 (Public Review):

      This study shows for the first time changes in palladin expression under disease conditions and mRNA alterations in human samples. The authors have identified novel binding partners for the protein as a first step toward determining how palladin mediates its effects in the heart. Finally, through the use of mouse models to decrease palladin expression they identify a crucial role for palladin in the cardiac response to pathological stress, with some interesting findings that show the effects of palladin depend on when the protein is altered.

      We appreciate that the Reviewer finds our study interesting. However, we did not show a role of PALLD in the cardiac response to pathological stress. On the contrary, we demonstrated that mice with constitutive knockout of PALLD in the heart (cPKO mice) show no pathological cardiac phenotype either under basal conditions or in response to mechanical pressure overload by transaortic constriction. On the other hand, deletion of PALLD in adult mice resulted in DCM under basal conditions within 8 weeks after tamoxifen induction.

      The novel findings of the study are supported by the data presented, but there are several instances where clarification is needed or where the conclusions drawn from the data reach beyond what is presented in the Results section.

      The focus on only male mice is a significant limitation of the paper, as it is well known that there are profound sex differences in the response to pathological stressors. While the ability to obtain sufficient heart samples from male and female patients may be a reasonable justification for focusing on males, the preclinical mouse model should have been examined in both sexes and the limitation of this choice should be clearly noted in the paper.

      Due to the three Rs and the high costs associated with breeding the large number of mice required for the project, we chose to focus only on male mice.

      In lines 537-539, we stated: “All experiments were performed on male mice as females often develop a less severe cardiac phenotype due to the cardioprotective role of estrogen (Brower, Gardner, & Janicki, 2003; Du, 2004).”

      The changes in myopalladin expression were not measured in the disease model (TAC), which limits the ability to determine if myopalladin was altered in the disease state. This addition would strengthen the study.

      We have previously demonstrated that myopalladin protein levels are significantly reduced after TAC in wildtype mice (Figure 6K, L in Filomena et al., eLife 10:e58313, 2021). We did not measure myopalladin levels in cPKO subjected to TAC and unfortunately don’t have tissue from cPKO mice to perform the measurements.

      Finally, the myofilament data are presented as evidence that changes in the contractile apparatus are contributors to the observed contractile dysfunction at the organ level. But these studies were conducted using levels of calcium that far exceed what is seen in vivo and, therefore, do not support the conclusion drawn.

      The reviewer is right that the myofibril experiments were conducted at Ca2+ concentrations that cannot be reached under the physiological conditions of cardiac contraction. However, the result clearly demonstrates that the intrinsic force generating capacity of the cardiac sarcomeres of cPKOi mice is impaired 8 weeks after TAM independently from any changes in myofilament Ca2+ sensitivity and cardiomyocyte Ca2+ handling. Experiments at lower (more physiological) Ca2+ concentrations would have produced less clear results in the absence of a full investigation of the relation between force and [Ca2+]. Since data demonstrate that cross bridge mechanics and kinetics are not affected, the reported finding supports the idea that a myofibril structural defect is responsible for the lower maximal force of the KO sarcomeres.

    1. Author Response:

      Reviewer #1 (Public Review):

      This study presents a resource aiming to unify language and rules used in the literature to describe, curate and assess biology experiments, published or not. Focusing on host-pathogen interactions, the work presents a new ontology and controlled vocabulary, as well as rules to describe 'metagenotypes', a term coined for the joint description of interacting host-pathogen genotypes. 'PHI-Canto' extends a previous resource by also enabling using UniProtKB IDs to curate proteins. Among other important by-products, PHI-Canto could contribute to damping proliferating names and acronyms for genes, processes, and interactions; a chronic annoyance in the biosciences.

      The tool does give the impression that, with sufficient time and usage, it could become a rich and robust resource. Just addressing the Uniprot IDs issue is a nice move.

      We thank the reviewer for their positive comments and acknowledgement of the importance of using unified language in literature curation. We are pleased to see that our effort to improve interoperability and use existing resources has been recognized. We are also pleased that this reviewer recognizes the additional benefits of choosing to use UniProtKB accession numbers. 

      Reviewer #2 (Public Review):

      In this paper, the authors propose a system for annotating and curating scientific publications in the context of interspecies host-pathogen interactions. This system, called PHI-Canto (the Pathogen-Host Interaction Community Annotation Tool), is an extension of an existing tool (called Canto). In addition, they present the development of new concepts, controlled vocabularies, and an ontology for annotating relevant aspects in this domain, called PHIPO (Pathogen-Host Interaction Phenotype Ontology).

      The approach has been empirically validated by annotating ten publications. The application's source code is available, as well as the associated ontologies and vocabularies and an example of the data resulting from the annotation process.

      We thank the reviewer for their positive comments on our framework for curating interspecies interactions literature. We are pleased that the reviewer has recognized that the source code, associated ontologies and curated data are freely available for others to use. We are delighted that the reviewer found the curation of ten trial publications in PHI-Canto informative and benefited from the worked curation examples.

      Reviewer #3 (Public Review):

      In this work, the authors have built a framework for the annotation of interactions between species. The framework includes ontologies, methodologies, and an annotation tool called PHI-Canto. The framework makes use of multiple existing ontologies that are in wide use in the biocuration community. In addition, the authors have built their own project-specific controlled vocabularies and ontologies for the capture of pathogen-host interaction phenotypes (PHIPO), diseases (PHIDO), and environmental conditions (PHI-ECO). Their work builds on and extends methods that have been developed within the Gene Ontology Consortium and model organism databases. The tool PHI-Canto is an extension of the tool Canto developed by PomBase for curation. The authors used this framework to annotate pathogen-host interactions within the Pathogen-Host Interactions Database.

      Strengths: The manuscript is well-written and includes significant detail regarding curation policies/methods and the use of the actual PHI-Canto tool. The appendices are very detailed and provide useful illustrations of the annotation practices and tool interface. The work has built upon and extended well-established standards and methods that have proven their utility over many years of use in the biocuration community. The authors have rigorously tested their framework with the curation of a variety of publications providing a diverse assortment of annotation challenges. The concept of a "metagenotype" is important and providing such a structured system for the capture of this information is useful. All of the materials produced by the work are completely freely available for use by the wider community.

      Weaknesses: There are some areas of the manuscript and appendices which are a bit confusing and could be improved. The authors have developed their own set of disease terms (PHIDO) but do not comment on why existing disease terminologies (such as Mondo or DO) were not used or if the PHIDO terms relate to those other vocabularies. There is no discussion of the possible use of a graph representation for the capture of this complex information (which is being done in many settings including the Gene Ontology with GO Causal Activity Models (GO-CAMs)) or why such a structure was not used. Although the abstract talks about the use of the framework within the PHI database as a test case for broader use regarding interspecies interactions, there is no mention of extending the use of the tool to other species interaction communities beyond pathogen-host interactions.

      We thank the reviewer for their detailed response. We are pleased that the reviewer found the manuscript to be well-written and informative with useful examples. We thank the reviewer for their helpful suggestions to improve the appendices and manuscript text.

      We would like to clarify that PHIDO is not intended to compete with existing disease ontologies: it is instead being used as a placeholder, until the time when its terms can be replaced with terms from existing disease ontologies. PHIDO was an expedient solution, in the sense that it provided the fastest way for us to test the process of curating diseases with PHI-Canto. This is because we only had to convert the existing list of disease names already in PHI-base into a controlled vocabulary, thus removing the need to wait for maintainers of other ontologies to add terms for us (as reported in Urban et al., 2022).

      Additionally, we were required to use terms from PHIDO due to the lack of representation for plant and animal diseases in existing ontologies or vocabularies. Plant disease, in particular, is very underrepresented, with the ontologies we surveyed having either inappropriate semantics (e.g. the Plant Trait Ontology focusing on traits related to disease, rather than the diseases themselves) or still being in development (e.g. the Plant Stress Ontology). The majority of source ontologies used by MONDO are human-centric, and DO is exclusively for human disease, yet human disease represents only part of the focus of PHI-base (~35%). Furthermore, our choice of vocabularies is limited by the fact that Canto currently only supports ontologies in OBO format (for historical reasons).

      We have begun the process of harmonizing disease names in PHI-base with terms from existing disease ontologies – such as MONDO, DO, and the National Cancer Institute Thesaurus – with the ultimate aim of using terms from those ontologies in curation, instead of terms from PHIDO. As general vocabularies for animal and plant disease emerge or are identified, we will extend this procedure to those diseases.

      With regard to a graph representation of the data, we are aware of the examples the reviewer described, and we agree that this type of representation could be preferable. However, our data model is currently constrained by the developers of Canto, who use a relational data model and currently have no plans to implement a graph data model or a graph representation. We acknowledge that query languages like GraphQL can provide a graph-based interface to an existing relational data model, but we believe this would require a significant technological investment. For PHI-base, we plan to enable a graph representation of the data by integrating with existing knowledge graph tools, such as KnetMiner (www.knetminer.com; doi.org/10.1111/pbi.13583), which will provide graph-based queries on PHI-base (albeit only on select species for which knowledge graphs will be provided, i.e. Arabidopsis, rice, wheat, eight plant and human infecting fungal ascomycete pathogens, and two non-pathogenic yeast species). We will also use KnetMiner integration to embed subgraphs of the complete knowledge graph into the gene-centric pages on the PHI-base 5 website.

      We acknowledge the lack of discussion about extending the tool for broader interspecies interactions. These examples may have been omitted from a previous draft due to journal word count limits. Possible future uses of the PHI-Canto schema could include insect–plant interactions (both beneficial and detrimental), endosymbiotic relationships such as mycorrhiza–plant rhizosphere interactions, nodulating bacteria–plant rhizosphere interactions, fungi–fungi interactions, plant–plant interactions or bacteria–insect interactions, and non-pathogenic relationships in natural environments, such as bulk soil, rhizosphere, phyllosphere, air, freshwater, estuarine water or seawater, and tissues or organs (e.g. the gut, lungs, and skin of humans, birds, or other animals). The schema could also be extended to situations where phenotype relations to genes or genotypes have been established for predator–prey relationships, or where there is competition in herbivore–herbivore, predator–predator, or prey–prey relationships in the air, on land or in the water. Customizing Canto to use other ontologies and controlled vocabularies is as simple as editing a configuration file within the source code.

    1. Author Response:

      We appreciate the Reviewers’ feedback. The manuscript was extensively revised and ultimately accepted for publication (Petrican and Fornito, 2023, Developmental Cognitive Neuroscience). The revisions address the Reviewers’ key concerns, including the theoretical basis of the link between MDD and AD, the rationale for studying this link in adolescence, clear references to significant genetic associations between the two, detailed assessment of CCA and PLS model generalisability and reliability, quantification of resilience, residualization of confounders, and corrections for multiple comparisons. We also note that the details concerning the receptor density maps we use in our analysis have now been published (Hansen et al., 2022, Nature Neuroscience; Markello et al., 2022, Nature Methods).

    1. Author Response

      Reviewer #1 (Public Review):

      By performing immunopeptidomics of macrophages infected with virulent M. tuberculosis, the authors were able to appropriately address whether Mtb proteins are able to enter the MHC-I antigen processing pathway. Their interrogation provides convincing evidence that substrates of Mtb's type VII secretion systems (T7SS) are a significant contributor to the Mtb-derived peptides presented on MHC-I. Compelling data are provided to demonstrate that ESX-1 activity is required for the MHC-1 presentation of these newly identified peptides.

      Strength

      Employing a virulent strain of Mtb for infection of human monocyte-derived macrophages to identify Mtb proteins that access the MHC-I antigen processing pathways and the associated mechanisms.

      Weakness

      The immunogenicity of at least some of the identified peptides should have been evaluated.

      Although obtaining T cells from a cohort of TB-exposed patients was not within the scope of this study, we are also eager to assess the immunogenicity of the epitopes we identified in future work. In addition to the references we made in our initial submission to prior work showing that many of the proteins from which the epitopes we identified derive elicit T cell responses in Mtb-exposed humans, we’ve added references to prior studies that show that a few of the specific epitopes we identified are immunogenic, providing at least a preliminary indication that MHC-I peptides identified by MS can be immunogenic T cell epitopes (lines 420-423): “Individual peptides we identified by MS have also been previously shown to be recognized by human T cells, including EsxJ24-34 (Grotzke et al., 2010; Lewinsohn et al., 2013) and EsxA28-36 (Tully et al., 2005), providing a proof of concept that particular epitopes identified by MS can be immunogenic.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have performed scATACseq on multiple timepoints during mouse male gonadogenesis and germ cell maturation during the fetal to neonatal transition (E18.5 and postnatal days 1,2,5). Clustering of thousands of cells revealed striking cellular diversity and led to the identification of cell populations that were not known before. This work may have far reaching implications, but additional validation is needed.

      We would like to start by expressing our appreciation for the reviewer’s valuable comments and feedback on our manuscript. We would also like to express our sincere apologies for the delay in submitting our revised manuscript. The COVID-19 pandemic has had a significant impact on academic research and publication, and we encountered several challenges during this time. Both co-first authors of this manuscript were promoted to new roles, which required additional time and effort to transition into these new positions. Furthermore, we experienced significant delays in obtaining the necessary research materials due to longer shipment times for antibodies and other reagents during the pandemic, which further contributed to the delay. We understand that our delay may have caused inconvenience, but we want to assure you that we have carefully addressed all of the reviewer comments, and we deeply appreciate your understanding and patience during these challenging times.

      The identification of a novel transitional spermatogonia population in Figure 4D is intriguing. Independent validation by flow cytometry or in testis cross sections, to better show the colocalization of nr5a1, Oct4, and other germ cell markers, would be important. Additional validation is needed to ensure that populations 1 and 2 in Figure 4D are not due to doublets. Providing violin plots for both soma and germ cell markers will be helpful. Is SF1 the only gene expressed in this unique germ cell population, or are many other somatic markers expressed in the population? Do these cells express well-recognized SPG markers like Oct4+, PLZF, and GFRA?

      We have performed immunostaining of NR5A1 in testicular sections and showed that NR5A1+ germ cells (TRA98+ cells) exist in P5.5 testis (Figure 4D). We appreciate the reviewer's comment and understand the concern regarding potential doublets in figure 4d. We examined the expression of various markers in both scATAC-seq (gene score) and scRNA-seq (mRNA) datasets and provided violin plots. Sertoli cell markers and germ cell markers showed variable levels in unknown 1 and 2 populations while the Leydig cell marker did not (Supplementary figure S6D).

      As additional evidence supporting our finding that a subset of somatic markers is expressed in the unique germ cell population we identified, we reference a study where cells in the spermatogonial signature 3 cluster showed high levels of mRNAs characteristic of Sertoli cells, including Nr5a1, Sox9, and Wt1 (PMID: 25568304). This indicates that cells with germ cell identity can express somatic cell genes, which is consistent with our findings. Additionally, another study reported the expression of the somatic cell marker WT1 in some germ cells through immunostaining (Figure 3B, PMID: 34815802). We have included this information in the revised manuscript to further support our conclusion (line 301). In addition, as we have isolated nuclei rather than whole cells, it is less likely that germ cells and Sertoli cells are sticking together during single cell capture. We hope that the additional evidence and analysis provided will help to ease the reviewer's concerns and further support the conclusions drawn from our data.

      The IF validation in 5F is less convincing in showing that these cells are potentially Sertoli stem cells. IF in cross-sections will be easier to interpret, especially when co-stained with several germ, somatic, or novel markers of that population. Purification of these cells and further characterization are needed. A hallmark of fetal Sertoli cells is to mediate the migration of endothelial cells to the seminiferous tubules during testicular cord formation. Is it possible to purify these cells to determine whether they have functional Sertoli cell properties in vitro using human umbilical vein endothelial cells (HUVECs)? Do these cells have immune privilege properties? Can they suppress proliferation of Jurkat E6 cells?

      Following the reviewer’s suggestions, we conducted further immunostaining of MBD3 and AMH in Sertoli cells (Figure 5F). The observed staining results not only confirm the properties of MBD3+ cells (MBD3-high/AMH-high) but also highlight the heterogeneity of Sertoli cells, as evidenced by the presence of various expression patterns such as MBD3-low/AMH-high (cluster SC3 in Figure 5A) and MBD3-low/AMH-low (cluster SC2/4/5/6 in Figure 5A). This further emphasizes the complexity and diversity within the Sertoli cell population.

      However, we understand that it is premature to definitively conclude that MBD3-high cells are Sertoli stem cells without functional studies. We appreciate the suggestion of using additional functional assays such as in vitro co-culture with HUVECs and immune privilege assays to further characterize the potential Sertoli stem cell population. These are valuable experiments to consider for future research in order to gain a deeper understanding of the properties and functions of these cells. To more accurately reflect the scope of our study and avoid potential misinterpretation, we have revised the language to reflect that we have identified subpopulations of Sertoli cells with unique characteristics, rather than using the term "stem cell". We hope that our revised data adequately addresses the reviewer’s concerns.

      Reviewer #2 (Public Review):

      Liao et al. performed single cell ATAC sequencing to reveal chromatin status in various cell types in the perinatal mouse testes. The chromatin status was then used to define cell types and identify potential transcription factors that control the progress of differentiation. This work could provide new insights into how various cell types acquire their fate in early testis development and establish a genomic framework that can be used to correlate with human data for infertility. The strength lies in the novelty of single cell analyses. The weaknesses include a lack of statistical power, the uncertainty on the correlation between chromatin status, gene expression, and transcription factor activity, and insufficient information and confirmation on some of the experiments and results.

      We would like to start by expressing our appreciation for the reviewer’s valuable comments and feedback on our manuscript. We would also like to express our sincere apologies for the delay in submitting our revised manuscript. The COVID-19 pandemic has had a significant impact on academic research and publication, and we encountered several challenges during this time. Both co-first authors of this manuscript were promoted to new roles, which required additional time and effort to transition into these new positions. Furthermore, we experienced significant delays in obtaining the necessary research materials due to longer shipment times for antibodies and other reagents during the pandemic, which further contributed to the delay. We understand that our delay may have caused inconvenience, but we want to assure you that we have carefully addressed all of the reviewer comments, and we deeply appreciate your understanding and patience during these challenging times.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a well-performed and carefully executed and quantified study. There is however a point that needs clarification:

      We thank the reviewer for these motivating comments and appreciate the careful reflection of our work.

      The authors state that acute regeneration occurs between 5-10dpt. However, the graphs in Fig 1D, F, and 2F indicate that most PC generation occurs from 20-30 days. What happens in this period? Does proliferation increase? Can the authors perform BrdU incorporation between 6 days and 1 month?

      The reviewer is right that PC regeneration seems to be more intense from 20-30 days. Yet during this stage wildtype larvae also add a number of PCs to their PC population pool. We would therefore consider only PCs added in surplus to the number of regularly added PCs as a contribution to regeneration, and here we see in quantified samples the largest increase of regenerating PCs during 8-10 days post-treatment, with 20.9 and 23.2 additional (surplus) PCs on average, respectively.

      This question also relates to the first comment of reviewer 3, who asked for a combined BrdU and EdU labeling approach to address the cell cycle length of PC progenitors. We have therefore performed this experiment with the first pulse of BrdU-labeling at 18 days after PC-ablation to include the request stated here for BrdU-labeling at later stages of regeneration. Again, no significant difference in BrdU-positive PC progenitors was found at this later stage of PC regeneration, but a small number of PC progenitors underwent additional rounds of proliferation compared to controls, which provides an explanation of how the entire PC population is replenished and why complete PC regeneration requires several months. Please see also our answer to question 1 of reviewer 3. These new findings are now presented in an additional Supplementary Figure (Figure 1-figure supplement 3) and have been added to the last paragraph of the section reporting the findings presented in Figure 1.

      Related to this, as the authors indicate in lines 129-131, the regeneration of new PCs overlaps with normal development. Are other neuronal cell types generated in appropriate numbers?

      This is an interesting question raised by the reviewer. However, it is very general, relating to all cerebellar neuronal cell types, which is beyond what we are able to address. We considered eurydendroid cells the cell population most likely to be affected in number by PC ablation and regeneration, because eurydendroid cells share the same ptf1a+-expressing progenitor cells with Purkinje cells. Eurydendroid cells, the zebrafish equivalents of deep nuclei neurons in mammals, can be identified by their expression of olig2. We have therefore quantified the number of eurydendroid cells in the cerebellum of double transgenic PC-ATTAC/olig2:GFP larvae 15 days after PC ablation. No significant difference in olig2:GFP-positive cells could be observed between PC-regenerating and control zebrafish, suggesting that eurydendroid cells are not affected in their quantity and are generated in appropriate numbers in PC-regenerating larvae. These findings are presented in a new Supplementary Figure (Figure 3-figure supplement 3) and are described together with findings about eurydendroid cells presented in the main Figure 3.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Gonzalez et al investigated the dynamics of dopamine signals, measured with optophysiological methods in the lateral shell of the nucleus accumbens (LNAc), in response to different types of visual stimuli. Contrary to most current theories of dopamine signaling, the authors found that LNAc dopamine transients tracked sensory transitions in visual stimulation rather than any immediately apparent motivational variable. This unorthodox finding is of potential interest to the field, as it suggests that dopamine in this particular area of the striatum supports a very different, albeit unclear behavioral function than what has been previously attributed to this neuromodulator. Many of the approaches used by the authors were very elegant, like the careful selection of visual stimuli parameters and the use of Gnat1/2 KO mice to demonstrate that the dopamine responses were directly dependent on the visual stimulation of rods and cones. That said, the authors did not discuss how their findings relate to much previously published work, many of which offer potential alternative explanations for their results. It is also not clear from the manuscript text which mice were used for which experiments, and how testing history might affect the results.

      We would like to thank the reviewer for their careful review of our manuscript. In our revised manuscript, we reworked our Materials and Methods to better detail the experimental workflow, which is highlighted in yellow. We have also added new data in stimulus-naïve animals to better examine the effect of exposure history on the dopaminergic response to light. To provide validation of our recording sites, we have included a new figure (Figure 1-Figure Supplement 1) that contains a representative histological image showing the location of the optical fiber/virus expression, as well as a schematic demonstrating optical fiber placements. Finally, the reviewer’s point about discussing the current results in the context of previous literature is well taken, and we have added three new paragraphs of text in the Discussion to highlight these findings.

      Reviewer #2 (Public Review):

      In this elegant work, the authors investigated dopamine release (measured by dLight sensor fiber photometry) in the nucleus accumbens shell, in response to salient luminance change. They show that abrupt visual stimuli - including stimuli not detectable by the human eye - can evoke robust dopamine release in the accumbens shell.

      The fact that dopamine signals can be evoked by salient sensory stimuli is not itself novel, but the paper manages to make several important and new findings:

      1) The authors show that the dopamine signal is not related to the level of threat evoked by the visual stimuli.

      2) They provide important detail about the stimuli parameters relevant to dopamine release. For instance, they show that the rate of luminance change (or abruptness) is a key factor in evoking dopamine responses.

      3) They show that robust dopamine responses can be evoked by visual stimuli of low intensity, including stimuli not perceptible by the human eye.

      4) They show that these dopamine responses can be evoked by all wavelengths in the visible spectrum (with some higher sensitivity at certain wavelengths).

      5) Finally, by recording dopamine responses in two knockout mice strains, the authors show that the light-evoked dopamine release critically relies on rod and cone photoreceptors, but not melanopsin phototransduction.

      These results add to a series of recent findings showing that dopamine signals are not restricted to the encoding of reward prediction error, but instead contribute to signaling environmental changes more broadly. The study has been skillfully executed, the results are clear and appropriately analyzed, and the manuscript is very well written. Although the work did not include control mice lacking the dLight sensor, the fact that light-evoked dopamine responses were not observed in mice lacking cone + rod phototransduction is strong evidence that the fiber photometry signals were not due to direct light artifacts.

      We would like to thank the reviewer for taking their valuable time over the holidays to review our manuscript. We appreciate their feedback and have responded to their concerns below.

      Comment/concerns are minor:

      1) The authors show that the dopamine response evoked by a brief visual stimulus is drastically reduced when the visual stimulus is repeated in rapid succession (stimulus train). The authors interpret this as evidence for the HABITUATION of this light-evoked dopamine release. An alternative explanation is that it is the prediction of the stimulus that is responsible for canceling the dopamine response (i.e. sensory prediction error). The authors should discuss this alternative explanation for this finding.

      This is a valid point, which we have now addressed in the revised Discussion section (Paragraph 3).

      2) Although the study largely focuses on dopamine responses to visual stimuli, the results are largely consistent with previous studies showing dopamine signals encoding value-neutral changes in sensory inputs (i.e. sensory prediction errors) in different modalities (taste or odors; cf. Takahashi et al., 2017, Neuron; Howard & Kahnt, 2018, Nat. Comm.). The authors might want to cite those papers (note that I am not affiliated with those papers).

      This is similar to the point brought up by Reviewer 1, namely that several key pieces of literature were not discussed in the original manuscript. We agree that this was an oversight and hope we have remedied it in the revised Discussion, as detailed in the response to Reviewer 1. We have included both citations in the new text.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes efforts to understand how independence from ribonucleotide reduction might evolve in obligate intracellular bacterial pathogens using E. coli as a model for this process. The authors successfully deleted the three ribonucleotide reductase (RNR) operons present in E. coli and showed that growth of this knockout strain can be achieved with deoxyribonucleotide supplementation. They also performed evolutionary experiments and analysis of cell growth and morphology under conditions of low nucleotide availability. In this work, they established that certain genes are consistently mutated to compensate for the loss of RNR activity and the low availability of deoxynucleotides. Comparison to genomes of intracellular pathogens that lack RNR genes shows that these patterns are largely conserved.

      While the experimental results support the conclusions of the study, the authors do report changes in cell morphology upon the growth of the RNR knockout strains with low concentrations of nucleotides. It would be ideal to note this complication earlier in the manuscript and to clarify how the possibility of cell elongation might affect the OD measurements in Figure 3, which describes the experiments establishing that dC is necessary for growth in the knockout strain. It would also be ideal to provide a more detailed explanation for that observation in the discussion.

      Thank you for the feedback. We have now added mention of cell morphology in the final paragraph of the introduction, where we summarise key findings.

      For establishing if there is either growth or no growth under various conditions, as we have done, a qualitative assessment such as the one presented in Figure 3 is sufficient. The issue of whether OD is impacted by cell elongation has been documented by Stevenson et al. (https://www.nature.com/articles/srep38828), and becomes a problem if trying to quantify parameters such as doubling time or when trying to estimate cell counts. We do not do either of these, as calculation of both requires an assumption of normal cell morphology in E. coli. We have added a note to clarify this in the first paragraph of the Discussion section, as per the suggestion from Reviewer #1.

      Reviewer #2 (Public Review):

      Ribonucleotide reductase (RNR) is crucial for de novo synthesis of the dNTP building blocks needed for DNA synthesis and is essential in nearly all organisms. In the current study, all three E. coli RNRs have been removed and the essential function of the enzyme is bypassed by the introduction of an exogenous deoxyribonucleoside kinase that enables dNTP production via salvage synthesis. This leads to a complete dependency on exogenously supplied deoxyribonucleosides (dNs), loss of control of dNTP regulation, and a highly increased mutation rate. The bacteria could also grow with only supplied deoxycytidine (and no other dNs), indicating that all dNTPs could be synthesized from deoxycytidine. An evolutionary analysis of the recombinant E. coli strain grown in multiple generations showed that mutations accumulated in genes involved in the catabolism of deoxycytidine and deoxyribose-1-P, supporting a model that all the other deoxyribonucleosides can be produced by a phosphorylase using nucleobases and deoxyribose-1-P as substrates and that the deoxycytidine (besides being a precursor of dCTP) could be a substrate to produce the deoxyribose-1-P needed by the phosphorylase working in the opposite direction.

      The story is very interesting with novel findings, and the experiments are well performed. There are a few missing pieces of information, but on the other hand, it is many steps to cover if everything is going to be shown in a single paper and I came to the conclusion that the data is enough at this stage. One of the missing points for future research is to check what happens with the dNTP pools. RNR is a very important enzyme to control the dNTP levels and it is likely that it is unbalanced dNTP pools that lead to the increased mutation rates. However, it would be interesting to really measure the dNTP pools and connect them to the mutations reported. Another missing piece is to identify which nucleoside phosphorylase is involved and investigate its substrate specificity to better understand why the cells can live on deoxycytidine but not other dNs.

      We thank the reviewer for these comments. It is certainly possible that the mutational biases we observe across the genomes of our evolved lines are related to skewed pools. We hope to examine this in a follow-up study. Likewise, it will be interesting to investigate the biochemical basis for our lines being able to grow solely on deoxycytidine, and to ascertain how this might also impact mutation.

      Reviewer #3 (Public Review):

      The study addresses a compelling question concerning a largely indispensable mechanism, ribonucleotide reduction. The authors generate a unique bacterial strain in which the ribonucleotide reductase operon is deleted entirely. They grow the mutant strain in environments with various levels of the necessary deoxyribonucleosides. Further, they perform evolution experiments to see whether and how the evolved lines would be able to adapt to the limited deoxyribonucleosides. Finally, the researchers identify key mutations and generate key isogenic genetic constructs where target mutants are deleted. A summary postulation based on the evolutionary trajectory of ribonucleotide reduction by bacteria is presented. Overall, the study is well presented, well-justified, and builds on fairly classic genetic and evolution experiments. The selected question and hypotheses and the overall framing of the story are fairly novel for the respective communities. The results should be interesting to evolutionary biology researchers, especially those interested in RNA>DNA directional evolution, as well as molecular microbiologists interested in the ribonucleotide reception dependence and selection by the environment. A discussion on the limitations of the laboratory study for the broader understanding of the host dependence during endosymbiosis and parasitism would be a good addition given the emphasis on this phenomenon as a part of the broader impacts of the study.

      We thank the reviewer for the suggestion that we consider the broader implications of our work. We have now added a final paragraph which addresses the question of why loss of ribonucleotide reduction appears so rare.

    1. Author Response:

      What is novel here is that we calculated the time-varying retinal motion patterns generated during the gait cycle using a 3D reconstruction of the terrain. This allows calculation of the actual statistics of retinal motion experienced by walkers over a broad range of normal experience. We certainly do not mean to claim that stabilizing gaze is novel, and agree that the general patterns follow directly from the geometry as worked out very elegantly by Koenderink and others. We spend time describing the terrain-linked gaze behavior because it is essential for understanding the paper. We do not claim that the basic saccade/stabilize/saccade behavior is novel and now make this clearer.

      The other novel aspect is that the motion patterns vary with gaze location, which in turn varies with terrain in a way that depends on behavioral goals. So while some aspects of the general patterns are not unexpected, the quantitative values depend on the statistics of the behavior. The actual statistics require these in situ measurements, and this has not previously been done, as stated in the abstract.

      The measured statistics provide a well-defined set of hypotheses about the pattern of direction and speed tuning across the visual field in humans. Points of comparison in the existing literature are hard to find because the stimuli have not been closely matched to actual retinal flow patterns, and the statistics will vary with the species in question. However, recent advances allow for neurophysiological measurements and eye tracking during experiments with head-fixed running, head-free, and freely moving animals. These emerging paradigms will allow the study of retinal optic flow processing in contexts that do not require simulated locomotion. While the exact relation between the retinal motion statistics we have measured and the response properties of motion-sensitive cells remains unresolved, the emerging tools in neurophysiology and computation make similar approaches with different species more feasible.

      A more detailed description of the methods including the photogrammetry and the reference frames for the measurements has been added primarily to the Methods section.

      Reviewer #1 (Public Review):

      Much experimental work on understanding how the visual system processes optic flow during navigation has involved the use of artificial visual stimuli that do not recapitulate the complexity of optic flow patterns generated by actual walking through a natural environment. The paper by Muller and colleagues aims to carefully document "retinal" optic flow patterns generated by human participants walking a straight path in real terrains that differ in "smoothness". By doing so, they gain unique insights into an aspect of natural behavior that should move the field forward and allow for the development of new, more principled, computational models that may better explain the visual processing taking place during walking in humans.

      Strengths:

      Appropriate, state-of-the-art technology was used to obtain a simultaneous assessment of eye movements, head movements, and gait, together with an analysis of the scene, so as to estimate retinal motion maps across the central 90 deg of the visual field. This allowed the team to show that walkers stabilize gaze, causing low velocities to be concentrated around the fovea and faster velocities at the visual periphery (albeit more the periphery of the camera used than the actual visual field). The study concluded that the pattern of optic flow observed around the visual field was most likely related to the translation of the eye and body in space, and the rotations and counter-rotations this entailed to maintain stability. The authors were able to specify what aspects of the retinal motion flow pattern were impacted by terrain roughness, and why (concentration of gaze closer to the body, to control foot placement), and to differentiate this from the impact of lateral eye movements. They were also able to identify generalizable aspects of the pattern of retinal flow across terrains by subsampling identical behaviors in different conditions.

      Weaknesses:

      While the study has much to commend, it could benefit from additional methodological information about the computations performed to generate the data shown. In addition, an estimation of inter-individual variability, and the role of sex, age, and optical correction would increase our understanding of factors that could impact these results, thus providing a clearer estimate of how generalizable they are outside the confines of the present experiments.

      Properties of gait depend on the passive dynamics of the body and on factors such as leg length and subject-specific cost functions, which are influenced by image quality and therefore by optical correction. In this experiment all subjects had normal acuity or were corrected to normal (with no information regarding their uncorrected vision). This is now noted in the Methods. The goal of the present work was to calculate average statistics over a range of observers and conditions in order to constrain the experience-dependent properties one might see in neurophysiology. We have added between-subjects error bars to Figure 2 and added gaze angle distributions as a function of terrain for individual observers in the Supplementary materials. Figure 4 b and d now show standard errors across subjects. Individual subject plots are shown in the Supplementary materials. For Figure 2, most variability between subjects occurs in the Flat and Bark terrains, where one might expect individual trade-offs between energetic cost, speed, and stability to come into play. This is supported by our subsequent unpublished work on factors influencing foothold choice. We have also found that leg length determines path choices and thus will influence the retinal motion. Differences between observers are now noted in the text. These individual subject differences should indicate the range of variability that might be expected in the underlying neural properties and perhaps in behavioral sensitivity. Because of the size of our dataset (n=11) it is not feasible to make comparisons of sex or age. There were equal numbers of males and females and age ranged from 24 to 54. This is now noted in the Methods section.

      Reviewer #2 (Public Review):

      The goal of this study was to provide in situ measurements of how combined eye and body movements interact with real 3D environments to shape the statistics of retinal motion signals. To achieve this, they had human walkers navigate different natural terrains while they measured information about eyes, body, and the 3D environment. They found average flow fields that resemble the Gibsonian view of optic flow, an asymmetry between upper and lower visual fields, low velocities at the fovea, a compression of directions near the horizontal meridian, and a preponderance of vertical directions modulated by lateral gaze positions.

      Strengths of the work include the methodological rigor with which the measurements were obtained. The 3D capture and motion capture systems, which have been tested and published before, are state-of-the-art. In addition, the authors used computer vision to reconstruct the 3D terrain structure from the recorded video.

      Together this setup makes for an exciting rig that should enable state-of-the-art measurements of eye and body movements during locomotion. The results are presented clearly and convincingly and reveal a number of interesting statistical properties (summarized above) that are a direct result of human walking behavior.

      A weakness of the article concerns tying the behavioral results and statistical descriptions to insights about neural organization. Although the authors relate their findings about the statistics of retinal motion to previous literature, the implications of their findings for neural organization remain somewhat speculative and inconclusive. An efficient coding theory of visual motion would indeed suggest that some of the statistics of retinal motion patterns should be reflected in the tuning of neural populations in the visual cortex, but as they stand, the present findings could not be convincingly tied to known findings about the neural code of vision. Thus, the behavioral results remain strong, but the link to neural organization principles appears somewhat weak.

      We agree, but we think that strengthening the neural links requires future studies. As mentioned above, it is very difficult to relate the measured statistics to existing neurophysiological literature and we have tried to make this clearer in the Discussion (pp. 14-16). This is because the stimuli chosen are typically arbitrary and not chosen to be realistic examples of patterns consistent with natural motion across a ground plane. Other stimuli are simply inconsistent with self-motion together with gaze stabilization (e.g., not zero velocity at the fovea). It has also been technically difficult to map cell properties across the visual field. We have made the comparisons we thought were useful. The point of the paper is to provide a hypothesis about the pattern of direction and speed tuning across the visual field. So the challenge for neurophysiology is to show how the observed cell properties vary across the visual field. Note also that the motion patterns will be influenced by the body motion of the animal in question, and because of this we are now collaborating with a group who are attempting to record from monkey MT/MST during locomotion while tracking eyes and body. Similarly, we are training neural networks to learn the patterns generated by human gait to develop more specific hypotheses about receptive field properties.

      Reviewer #3 (Public Review):

      Gaze-stabilizing motor coordination and the resulting patterns of retinal image flow are computed from empirically recorded eye movement and motion capture data. These patterns are assessed in terms of the information that would be potentially useful for guiding locomotion that the retinal signals actually yield. (As opposed to the "ecological" information in the optic array, defined as independent of a particular sensor and sampling strategy).

      While the question posed is fundamental, and the concept of the methodology shows promise, there are some methodological details to resolve. Also, some terminological ambiguities remain, which are the legacy of the field not having settled on a standardized meaning for several technical terms that would be consistent across laboratory setups and field experiments.

      Technical limits and potential error sources should be discussed more. Additional ideas about how to extend/scale up the approach to tasks with more complex scenes, higher speed or other additional task demands and what that might reveal beyond the present results could be discussed.

      This issue is addressed in more detail in the Discussion, second paragraph, and also the second last paragraph.

    1. Author Response

      Reviewer #1 (Public Review):

      This work presents a unification model (of sorts) for explaining how the flow of evidence through networks can be controlled during decision-making. The authors combine two general frameworks previously used as neural models of cortical decision-making, dynamic normalization (which implements value encoding via firing activity) and recurrent network models (which capture winner-take-all selection processes), into a unified model called the local disinhibition-based decision model (LDDM). The simple motif of the LDDM allows for the disinhibition of excitatory cells that represent the engagement of individual actions, which happens through a recurrent inhibitory loop (i.e., a leaky competing accumulator). The authors show how the LDDM works well at explaining both decision dynamics and the properties of cortical cells during perceptual decision-making tasks.

      All in all, I thought this was an interesting study with an ambitious goal. But like any good study, there are some open issues worth noting and correcting.

      MAJOR CONCERNS

      1. Big picture

      This was a comprehensive and extremely well-vetted set of theoretical experiments. However, the scope and complexity also made the take-home message hard to discern. The abstract and most of the introduction focus on the framing of LDDM as a hybrid of dynamic normalization models (DNM) and recurrent network models (RNMs). This is sold as a unification of value normalization and selection into a novel unified framework. Then the focus shifts to the role of disinhibition in decision-making. Then in the Discussion, the goal is stated as to determine whether the LDDM generates persistent activity and does this activity differ from RNMs. As a reader, it seems like the paper jumps between two high-level goals: 1) the unification of DNM and RNM architectures, and 2) the role of disinhibition. This constant changing makes it hard to focus as the reader goes on. So what is the big picture goal specifically?

      Also, the framing of value normalization and WTA as a novel computational goal is a bit odd as this is a major focus of the field of reinforcement learning (both abstractly at the computational level and more concretely in models of the circuits that regulate it). I know that the authors do not think they are the first to unify value judgements with selection criteria. The writing just comes across that way and should be clarified.

      We thank the Reviewer for their thoughtful consideration of the overall framing of the big picture goals of the paper. Upon reflection, we agree that the paper really centers on the importance of incorporating disinhibition into computational circuit-based models of decision-making. Thus, we have significantly revised the Introduction and Discussion to focus on the theoretical and empirical importance of incorporating disinhibition into computational models of decision-making, and use the integration of value normalization and WTA selection as an example of how disinhibition increases the richness of circuit decision models. Please see the response to recommendations below for more detail on the changes.

      2. Link to other models

      The LDDM is described as a novel unification of value normalization and winner-take-all (WTA) selection, combining value processing and selection. While the authors do an excellent job of referencing a significant chunk of the decision neuroscience literature (160 references!) the motif they end up designing has a highly similar structure to a well-known neural circuit linked to decision-making: the cortico-basal ganglia pathways. Extensive work over the past 20+ years has highlighted how cortical-basal ganglia loops work via disinhibition of cortical decision units in a similar way as the LDDM (see the work by Michael Frank, Wei Wei, Jonathan Rubin, Fred Hamker, Rafal Bogacz, and many others). It was surprising to not see this link brought up in the paper as most of the framing was on the possibility of the LDDM representing cortical motifs, yet as far as I know, there does not exist evidence for such architectures in the cortex, but there is in these cortical-basal ganglia systems.

      We thank the Reviewer for the suggestion to link the LDDM to disinhibition in cortico-basal ganglia (CBG) models; this is indeed an important body of empirical and computational work that we overlooked in the original manuscript. We have now added text to the Discussion to highlight the link between LDDM and these CBG disinhibition models, focusing on how they are conceptually similar and how they differ. Please see our response to recommendations below for a more detailed discussion of the revisions.

      3. Model evaluations

      The authors do a great job of extensively probing the LDDM under different conditions and against some empirical data. However, most of the time there is no "control" model or current state-of-the-art model that the LDDM is being compared against. In a few of the simulation experiments, the LDDM is compared against the DNM and RNM alone, so as to show how the two components of the LDDM motif compare against the holistic model itself. But this component model comparison is inconsistently used across simulation experiments.

      Also, it is worth asking whether the DNM and RNM are appropriate comparison models to vet the LDDM against for two reasons. First, these are the components of the full LDDM. So these tests show us how the two underlying architectural systems that go into LDDM perform independently, but not necessarily how the LDDM compares against other architectures without these features. Second, as pointed out in my previous comment, the LDDM is a more complex model, with more parameters, than either the DNM or RNM. The field of decision neuroscience is awash in competing decision models (including probabilistic attractor models, non-recurrent integrators, etc.). If we really want to understand the utility of the LDDM, it would be good to know how it performs against similarly complex models, as opposed to its two underlying component models.

      We greatly appreciate the Reviewer’s comments on model comparison, which point out that our original manuscript failed to clearly convey a very important difference between the LDDM and the existing RNM(s). In the revision, we now make it clearer that the fundamental difference between the LDDM and the RNMs is the architecture of disinhibition (see the revised Introduction, especially p. 8 lines 164-168). The LDDM is not simply a combination of the DNM model with RNM architecture (a point we may have mistakenly conveyed in the original manuscript): the introduction of disinhibition separates LDDM inhibition into option-selective subpopulations, as opposed to the single pooled inhibition of RNM models. Given this fact, the LDDM predicts the unique selective inhibition dynamics shown in recent optogenetic and calcium imaging results, a finding inconsistent with the common-pooled and non-selective inhibition assumed in the existing RNMs and many of their variants. Thus, we believe that a comparison between the LDDM and the RNM, which share a similar level of complexity and number of parameters, is important.

      We also appreciated the Reviewer’s concern about testing the LDDM against alternative models. In order to better connect to the existing literature, we now compare the LDDM to another standard circuit model of decision-making - the leaky competing accumulator (LCA) model. The LCA is a circuit model that captures many of the aspects of perceptual decision-making seen in the mathematical drift diffusion model (DDM), but with a construction that allows for fitting to behavioral data and comparison of underlying unit activities. Please see our response to recommendations below for further detail.

      4. Comparison to physiological data

      I quite enjoyed the comparisons of the excitatory cell activity to empirical data from the Shadlen lab experiments. However, these were largely qualitative in nature. In conjunction with my prior point on the models that the LDDM is being compared against, it would be ideal to have a direct measure of model fits that can be used to compare the performance of different competing "control" models. These measures would have to account for differences in model complexity (e.g., AIC or BIC), but such an analysis would help the reader understand the utility of the LDDM in connecting with empirical data much better.

      We agree with the Reviewer that a quantitative comparison of the match between model neural predictions and empirical neurophysiological data is important. First, we wish to clarify that the model neural predictions are simulated from models fit to the behavioral data (choice and RT), not from fits to the neural activity traces – a point we now clarify in the text. While directly fitting dynamic models (LDDM, RNM, or LCA) to the neurophysiological data is appealing, there are currently several obstacles to this approach. The first problem is the complexity of the dynamic neural traces. Despite the long history of the random-dot motion paradigm, detailed features of the dynamics are still not understood. For example, the stereotyped initial dip after stimulus onset may reflect a reset of the network state to improve the signal-to-noise ratio (Conen and Padoa-Schioppa, 2015) or simply reflect a surround suppression-like lateral inhibition in visual processing. A second problem is that the primary difference between the models is the activity of inhibitory (and disinhibitory) neurons, which are typically not recorded in neurophysiological experiments; thus, there is a lack of empirical data to which to fit the models. In the revision, we clarified that the model fitting to the Roitman & Shadlen data is for behavioral data only, and model unit activity traces are derived from models fit to behavioral data.

      That being said, we agree that a quantitative comparison of model activity predictions is helpful. Because the models are fit not to the neural data but to the behavioral data, we used a simple RMSE measure, rather than likelihood-based measures like AIC and BIC, to compare the match between predicted and neural activity patterns (revised Fig. 6E, Fig 6-S4E, Fig 6-S5E). Please see response to recommendations below for details.
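
      As an illustration of the kind of comparison described above, the following is a minimal sketch of a root-mean-square error between a model-predicted activity trace and a recorded one. It is not the analysis code used for the revised figures; the function name `rmse`, the variable names, and the trace values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np


def rmse(predicted, observed):
    """Root-mean-square error between two equally shaped activity traces."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("traces must have the same shape")
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


# Hypothetical traces in arbitrary units; these are NOT data from the study.
observed = np.array([10.0, 12.0, 15.0, 20.0])
model_a = np.array([10.0, 12.0, 15.0, 20.0])  # matches the observed trace exactly
model_b = np.array([12.0, 14.0, 17.0, 22.0])  # constant offset of 2 at every time point

rmse_a = rmse(model_a, observed)
rmse_b = rmse(model_b, observed)
print(rmse_a, rmse_b)
```

      A lower value indicates a closer match. Unlike AIC or BIC, this measure carries no penalty for the number of parameters, so it is most informative when the compared models are of similar complexity.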

      Reviewer #2 (Public Review):

      The aim of this article was to create a biologically plausible model of decision-making that can both represent a choice's value and reproduce winner-take-all ramping behavior that determines the choice, two fundamental components of value-based decision-making. Both of these aspects have been studied and modeled independently but empirical studies have found that single neurons can switch between both of the aspects (i.e., from representing value to winner-take-all ramping behavior) in ways that are not well described by current biologically plausible models of decision making.

      The current article provides a thorough investigation of a new model (the local disinhibition decision model; LDDM) that has the goal of combining value representations and winner-takes-all ramping dynamics related to choice. Their model uses biologically plausible disinhibition to control the levels of inhibition in a local network of simulated neurons. Through a careful series of simulation experiments, they demonstrate that their network can first represent the value of different options, then switch to winner-takes-all ramping dynamics when a choice needs to be made. They further demonstrate that their single model reproduces key components of value-based and winner-takes-all dynamics found in both neural and behavioral data. They additionally conduct simulation studies to demonstrate that recurrent excitatory properties in their network produce value-persistence behavior that could be related to memory. They end by conducting a careful simulation study of the influence of GABA agonists that provide clear and testable predictions of their proposed role of inhibition in the neural processes that underlie decision-making. This last piece is especially important as it provides a clear set of predictions and experiments to help support or falsify their model.

      There are overall many strengths to this paper. As the authors note, current network models do not explain both value-based and ramping-like decision-making properties. Their thorough simulation studies and their validation against empirical neural and behavioral data will be of strong interest to neuroscientists and psychologists interested in value-based decision-making. The simulations related to persistence and the GABA-agonist experiments they propose also provide very clear guidelines for future research that would help advance the field of decision-making research.

      Although the methods and model were generally clear, there was a fair amount of emphasis on the role of recurrence in the LDDM, but very little evidence that recurrence was important or necessary for any of the empirical data examined. The authors do demonstrate the importance of recurrence in some of their simulation studies (particularly in their studies of persistence), but these would need to be compared against empirical data to be validated. Nevertheless, the model and thorough simulation investigations will likely help develop more precise theories of value-based decision-making.

      We appreciate the Reviewer’s thoughtful comments. These comments - especially about anatomic recurrence and its relationship to the parameter 𝛼 - inspired us to think more about how the current circuit differs from others, especially the implications related to the parameters 𝛼 (i.e., self-excitation) and 𝛽 (i.e., local disinhibition). Recurrence is required to drive winner-take-all competition in the standard RNM of decision-making. However, we show here with both analytical and numerical approaches that recurrence helps WTA competition but is not necessary in our model. Instead, the key feature of the LDDM is to utilize disinhibition in conjunction with lateral inhibition to realize winner-take-all competition. This leads to many predictions that distinguish the current model from existing models, such as selective inhibition and flexible control of dynamics.

      In response to the Reviewer’s points and after careful consideration of the differential equations, we realized that in our model fitting, the 𝛼 parameter fitting to zero does not necessarily mean recurrence should be zero. The 𝛼 parameter shares a lot of similarity to the baseline gain control (parameter BG in our revision), and thus is unidentifiable in the current dataset. In the interest of parsimony, we did not include the parameter BG in the original manuscript, but now include it because it reveals the difficulty of interpreting fit 𝛼 values as simply the level of recurrence.

      Overall, disinhibition (𝛽) in the LDDM is required for WTA activity while recurrence (𝛼) can contribute but is not necessary; however, 𝛼 is theoretically important for generating persistent activity, with the caveat that in the current framework there is an unclear relationship between fit 𝛼 and recurrence. Regardless, we agree that the contribution of 𝛼 to the LDDM framework is worth further testing and examining with future empirical data.

      Reviewer #3 (Public Review):

      Shen et al. attempt to reconcile two distinct features of neural responses in frontoparietal areas during perceptual and value-guided decision-making into a single biologically realistic circuit model. First, previous work has demonstrated that value coding in the parietal cortex is relative (dependent on the value of all available choice options) and that this feature can be explained by divisive normalization, implemented using adaptive gain control in a recurrently connected circuit model (Louie et al, 2011). Second, a wealth of previous studies on perceptual decision-making (Gold & Shadlen 2007) have provided strong evidence that competitive winner-take-all dynamics implemented through recurrent dynamics characterized by mutual inhibition (Wang 2008) can account for categorical choice coding. The authors propose a circuit model whose key feature is the flexible gating of 'disinhibition', which captures both types of computation - divisive normalization and winner-take-all competition. The model is qualitatively able to explain the 'early' transients in parietal neural responses, which show signatures of divisive normalization indicating a relative value code, persistent activity during delay periods, and 'late' accumulation-to-bound type categorical responses prior to the report of choice/action onset.

      The attempt to integrate these two sets of findings by a unified circuit model is certainly interesting and would be useful to those who seek a tighter link between biologically realistic recurrent neural network models and neural recordings. I also appreciate the effort undertaken by the authors in using analytical tools to gain an understanding of the underlying dynamical mechanism of the proposed model. However, I have two major concerns. First, the manuscript in its current form lacks sufficient clarity, specifically in how some of the key parameters of the model are supposed to be interpreted (see point 1 below). Second, the authors overlook important previous work that is closely related to the ideas that are being presented in this paper (see point 2 below).

      1) The behavior of the proposed model is critically dependent on a single parameter 'beta' whose value, the authors claim, controls the switch from value-coding to choice-coding. However, the precise definition/interpretation of 'beta' seems inconsistent in different parts of the text. I elaborate on this issue in sub-points (1a-b) below:

      1a). For instance, in the equations of the main text (Equations 1-3), 'beta' is used to denote the coupling from the excitatory units (R) to the disinhibitory units (D). However, in the main figures (Fig 2) and in the methods (Equations 5-8), 'beta' is instead used to refer to the coupling between the disinhibitory (D) and the inhibitory gain control units (G). Based on my reading of the text (and the predominant definition used by the authors themselves in the main figures and the methods), it seems that 'beta' should be the coupling between the D and G units.

      1b). A more general and critical issue is the failure to clearly specify whether this coupling of D-G units (parameterized by 'beta') should be interpreted as a 'functional' one, or an 'anatomical' one. A straightforward interpretation of the model equations (Equations 5-8) suggests that 'beta' is the synaptic weight (anatomical coupling) between the D and G units/populations. However, significant portions of the text seem to indicate otherwise (i.e., a 'functional' coupling). I elaborate on this in subpoints (i-iii) below:

      (1b-i). One of the main claims of the paper is that the value of 'beta' is under 'external' top-down control (Figure 2 caption, lines 124-126). When 'beta' equals zero, the model is consistent with the previous DNM model (dynamic normalization, Louie et al 2011), but for moderate/large non-zero values of 'beta', the network exhibits WTA dynamics. If 'beta' is indeed the anatomical coupling between D and G (as suggested by the equations of the model), then, are we to interpret that the synaptic weight between D-G is changed by the top-down control signal within a trial? My understanding of the text suggests that this is not in fact the case. Instead, the authors seem to want to convey that top-down input "functionally" gates the activity of D units. When the top-down control signal is "off", the disinhibitory units (D) are "effectively absent" (i.e., their activity is clamped at zero as in the schematic in Fig 2B), and therefore do not drive the G units. This would in turn be equivalent to there being no "anatomical coupling" between D and G. However, when the top-down signal is "on", D units have non-zero activity (schematic in Fig 2B), and therefore drive the G units, ultimately resulting in WTA-like dynamics.

      (1b-ii). Therefore, it seems like when the authors say that beta equals zero during the value coding phase they are almost certainly referring to a functional coupling from D to G, or else it would be inconsistent with their other claim that the proposed model flexibly reconfigures dynamics only through a single top-down input but without a change to the circuit architecture (reiterated in lines 398-399, 442-444, 544-546, 557-558, 579-590). However, such a 'functional' definition of 'beta' would seem inconsistent with how it should actually be interpreted based on the model equations, and also somewhat misleading considering the claim that the proposed network is a biologically realistic circuit model.

      (1b-iii). The only way to reconcile the results with an 'anatomical' interpretation of 'beta' is if there is a way to clamp the values of the 'D' units to zero when the top-down control signal is 'off'. Considering that the D units also integrate feed-forward inputs from the excitatory R units (Fig 2, Equations 1-3 or 5-8), this can be achieved either via a non-linearity, or if the top-down control input multiplicatively gates the synapse (consistent with the argument made in lines 115-116 and 585-586 that this top-down control signal is 'neuromodulatory' in nature). Neither of these two scenarios seems to be consistent with the basic definition of the model (Equations 1-3), which therefore confirms my suspicion that the interpretation of 'beta' being used in the text is more consistent with a 'functional' coupling from D to G.

      We thank the reviewer for pointing out this confusion. We apologize that the original illustrations (Fig. 2A) and the differential equations in Methods (Eqs. 5-8) did not convey our ideas well. 𝛽 is intended to denote the coupling from R to D, not a change in the weights between D and G units. We realize there was some confusion on this point due to inconsistency between our original figures, text, and supplementary material.

      Given the lack of clarity in the previous version as well as the Reviewer’s questions, we now emphasize that 𝛽 represents a functional coupling between the R and D neurons. The biological assumption of the disinhibitory architecture is based on recent findings that VIP neurons in the cortex always inhibit other neighboring inhibitory cells, such as SST and PV neurons, and consequently disinhibit the neighboring primary neurons (e.g., Fu et al., 2014; Karnani et al., 2014, 2016). We did not see evidence in the literature of fast-changing (anatomic) connections between VIP and SST/PV. However, there is evidence that the responsiveness of VIP neurons to excitatory neurons can be modulated by changing the concentrations of neuromodulators, such as acetylcholine and serotonin (Prönneke et al., 2020). While the stereotype of neuromodulator action is slow dynamics, recent findings show that, for example, basal forebrain cholinergic neurons respond to reward and punishment with surprising speed and precision (18 ± 3 ms) (Hangya et al., 2015) to modulate arousal, attention, and learning in the neocortex. Given the large number of studies that identify long-term projections and neuromodulatory inputs to VIP neurons (e.g., Pfeffer et al., 2013; Pi et al., 2013; Alitto & Dan, 2013; Tremblay et al., 2016), we believe that it is more plausible to assume that the connection weights between R and D in our case are quickly modulated within a trial.

      To clarify this issue in the revised manuscript, we made the following corrections:

      1. We repositioned the 𝛽 parameter in Fig. 2A between the connection from R to D, to align the description of 𝛽 modulating R to D in the main text.

      2. We modified the differential equations 5-8 (now numbered as Eqs. 28-32) in Methods (p. 61) to include the disinhibitory unit D as an independent control from the inhibitory unit I, in order to be consistent with the disinhibitory D units in LDDM. This change makes only tiny differences in the model predictions (please see dynamics simulated after the change in Fig. 2-figure supplement 1B).

      3. We updated the neural circuit motif in Fig. 2-figure supplement 1A accordingly.
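
      The role of 𝛽 as a gain on the R-to-D pathway, discussed in point 1 above, can be illustrated with a small numerical sketch. The equations below are an assumed, simplified form of a normalization circuit with disinhibition, written only for illustration; they are not the manuscript's Equations 1-3 or 28-32. The function name `simulate`, the parameter values, the shared time constant, and the Euler step are all hypothetical. Under these assumptions, setting `beta = 0` leaves the D units silent, so the steady-state R activities keep the ratio of their inputs (normalized value coding), whereas a non-zero `beta` lets each option reduce its own gain control and exaggerates the contrast between options.

```python
def simulate(values, beta, w=1.0, alpha=0.0, tau=1.0, dt=0.01, steps=5000):
    """Euler integration of an assumed R/G/D circuit (illustrative only).

    Assumed dynamics for option i (not the manuscript's exact equations):
        tau * dR_i/dt = -R_i + (V_i + alpha * R_i) / (1 + G_i)
        tau * dG_i/dt = -G_i + w * sum_j R_j - D_i
        tau * dD_i/dt = -D_i + beta * R_i
    beta scales the drive from R to D; beta = 0 keeps D at zero.
    """
    n = len(values)
    R = [0.0] * n
    G = [0.0] * n
    D = [0.0] * n
    for _ in range(steps):
        total = sum(R)  # pooled excitatory activity feeding every G unit
        dR = [(-R[i] + (values[i] + alpha * R[i]) / (1.0 + G[i])) / tau for i in range(n)]
        dG = [(-G[i] + w * total - D[i]) / tau for i in range(n)]
        dD = [(-D[i] + beta * R[i]) / tau for i in range(n)]
        R = [R[i] + dt * dR[i] for i in range(n)]
        G = [G[i] + dt * dG[i] for i in range(n)]
        D = [D[i] + dt * dD[i] for i in range(n)]
    return R, G, D


# Hypothetical inputs: option 1 is worth twice as much as option 2.
R_off, _, D_off = simulate([2.0, 1.0], beta=0.0)  # top-down gate "off"
R_on, _, D_on = simulate([2.0, 1.0], beta=0.2)    # top-down gate "on"
print("ratio with beta = 0:  ", R_off[0] / R_off[1])
print("ratio with beta = 0.2:", R_on[0] / R_on[1])
```

      In this sketch the circuit wiring is identical in both runs; only the scalar gain on the R-to-D drive differs, which is the sense of "functional" coupling used in the response.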

      2) The main contribution of the manuscript is to integrate the characteristics of the dynamic normalization model (Louie et al, 2011) and the winner-take-all behavior of recurrent circuit models that employ mutual inhibition (Wang, 2008), into a circuit motif that can flexibly switch between these two computations. The main ingredient for achieving this seems to be the dynamical 'gating' of the disinhibition, which produces a switch in the dynamics, from point-attractor-like 'stable' dynamics during value coding to saddle-point-like 'unstable' dynamics during categorical choice coding. While the specific use of disinhibition to switch between these two computations is new, the authors fail to cite previous work that has explored similar ideas that are closely related to the results being presented in their study. It would be very useful if the authors can elaborate on the relationship between their work and some of these previous studies. I elaborate on this point in (a-b) below:

      2a) While the authors may be correct in claiming that RNM models based on mutual inhibition are incapable of relative value coding, it has already been shown previously that RNM models characterized by mutual inhibition can be flexibly reconfigured to produce dynamical regimes other than those that just support WTA competition (Machens, Romo & Brody, 2005). Similar to the behavior of the proposed model (Fig 9), the model by Machens and colleagues can flexibly switch between point-attractor dynamics (during stimulus encoding), line-attractor dynamics (during working memory), and saddle-point dynamics (during categorical choice) depending on the task epoch. It achieves this via a flexible reconfiguration of the external inputs to the RNM. Therefore, the authors should acknowledge that the mechanism they propose may just be one of many potential ways in which a single circuit motif is reconfigured to produce different task dynamics. This also brings into question their claim that the type of persistent activity produced by the model is "novel", which I don't believe it is (see Machens et al 2005 for the same line-attractor-based mechanism for working memory)

      We thank the Reviewer for pointing out the conceptual similarities between the LDDM and the Machens Romo Brody model, and now include a discussion of the link between the two early in the revised Discussion (p. 38, lines 826-837). Please see response to recommendations below for a more detailed discussion of this point.

      2b) The authors also fail to cite or describe their work in relation to previous work that has used disinhibition-based circuit motifs to achieve all 3 proposed functions of their model - (i) divisive normalization (Litwin-Kumar et al, 2016), (ii) flexible gating/decision making (Yang et al, 2016), and (iii) working memory maintenance (Kim & Sejnowski, 2021).

      The Reviewer notes several relevant papers, and we have now discussed them and their relationship to the LDDM in a revised Discussion section (pp. 35-36). Please see response to recommendations below for more details.

    1. Author Response

      Reviewer #2 (Public Review):

      The two new micropeptides are well characterized in the manuscript and appear to be functionally important with some chromatin-level consequences of their loss (which can be either direct or indirect), but the finding that lincRNA sequences encode micropeptides is not novel, and the two described in the paper appear to be zebrafish-specific and their function was tested only in zebrafish, which limits the interest in these genes. The use of ribosome profiling data along with behavioral screening to identify micropeptides is interesting and important, but the scope of the screen, the candidates selected for testing, etc. are not clear enough as presented. The ChIP-seq analysis of the new proteins is very interesting but is not described in any detail. Overall, the experimental part is well designed and the phenotypes reported by the authors appear to be strong and convincing, but the mechanistic understanding of what the two new proteins do and how, and the general interest in the results given the current scope of understanding of micropeptides, are limited.

      We apologize for the misunderstanding that these genes are zebrafish-specific. In this revision, we have clarified throughout the text and with additional data that these genes are not zebrafish-specific, but that linc-mipep and linc-wrb are homologous to human Hmgn1.

    1. Author Response

      Reviewer #1 (Public Review):

      Francou et al. examine the dynamics of cell ingression at the primitive streak during mouse gastrulation and correlate this with the localization of elements of the apical Crumbs complex and the actomyosin cytoskeleton. Using time-lapse live imaging, they show that cells at the primitive streak ingress in a stochastic manner, by constricting their apical surface through a ratcheting shrinkage of individual junctions. Meticulous evaluation of immunofluorescent staining for many elements of the actomyosin contractile process as well as junctional and apical domain elements reveals anisotropic localization of Crumbs2, ZO1, and ppMLC. In addition, the localization of two groups of proteins showed a close correlation - actomyosin regulators and apical and junctional components - but there was a lack of correlation of localization of these two groups of proteins to each other. The localization of actomyosin and its activity, was altered and more homogeneous in Crumbs2-/- embryos, and there was a significant decrease in aPKC and Rock1. The authors conclude from these observations that Crumbs2 regulates anisotropic actomyosin contractility to promote apical constriction and cell ingression.

      The strengths of this manuscript are the very detailed observations on the process of apical constriction and the meticulous evaluation of the localization of the many proteins likely to be involved in the process. While many of the general observations are not new, Francou et al. provide a much richer understanding of this process, as well as a paradigm with which to evaluate the effects of mutations on the gastrulation process. The figures are beautiful, clear, and informative, and support the conclusions made by the authors. The data provide a very compelling picture of both the dynamics of cell behavior and the anisotropies in protein localization associated with it.

      However, much of the Crumbs2 mutant phenotype is not sufficiently explained by the authors' data or conclusions. First, the loss of Crumbs2 does not prevent ingression, as there are mesoderm cells evident between the epiblast and endoderm (Ramkumar et al., 2016, Xiao et al., 2011). There are certainly fewer, and the biggest effect appears to be during the elongation of the axis from E7.75 onward and not during the earlier migratory period (E6.5-E7.75) according to data from both previously published work (Xiao et al., 2011; Ramkumar et al., 2015, 2016) and the data presented here.

      • The reviewer makes a good point regarding the defects observed in Crumbs2 mutant embryos. It is true that in this mutant, a first wave of gastrulation EMT, taking place around E6.5, does not appear to be affected. We interpret this to mean that the gastrulation EMT is a sequential process under differential regulation, and that Crumbs2 is not required for the first wave of cell ingression through the primitive streak, at the onset of gastrulation. Consequently, a small number of early mesodermal cells are produced in Crumbs2 mutants. However, within 24 hours of the onset of gastrulation, corresponding to around E7.75, ingression defects are evident in Crumbs2 mutant embryos.

      • For simplicity, these distinct sequential phases of gastrulation regulation, initially independent of Crumbs2, but subsequently dependent, were not initially discussed in our manuscript. We have now elaborated these details in the revised manuscript.

      Nor does the loss of Crumbs2 prevent apical constriction. Ramkumar et al. in their 2016 paper show by live imaging that the major effect of the Crumbs2 mutation is to prevent the cells from detaching from the epithelium, but that the apical domain does undergo constriction, leading to many elongated flask-shaped cells still attached at the apical end. These observations do not fit well with the model proposed by the authors of Crumbs2 regulating anisotropic actomyosin contractility to promote apical constriction and suggest a more complicated story.

      • We thank the reviewer for bringing this up, as it is an important point that we now discuss in greater detail and clarify in the revised manuscript.

      • Importantly, we do not believe our data are in disagreement with the previous study of Ramkumar et al. The precise details of the defect observed in Crumbs2 mutants are still not totally clear. However, we would like to point out that in Ramkumar et al., the timelapse imaging data did not depict cells constricting their surfaces, but rather these data revealed that cells having small apical surfaces failed to detach and delaminate out of the epiblast layer. Thus, this previous study focused on the subsequent step in the process of ingression (delamination), to that being addressed in the present work.

      • Furthermore, epiblast cells outside the domain occupied by the primitive streak, and even some cells positioned on the lateral sides of the embryo, were reported by Ramkumar and colleagues to exhibit abnormally small apical surfaces in Crumbs2 mutants. These cells, at a distance from the primitive streak, will not normally constrict their apical surfaces, since they are not going to undergo the gastrulation EMT, a behavior restricted to the region of the primitive streak. Thus, these previous data do not directly address nor demonstrate that epiblast cells in Crumbs2 mutants undergo apical constriction.

      • Moreover, in Crumbs2 mutants a large number of cells were reported to fail to ingress at the primitive streak, and consequently they were seen to accumulate within the epiblast epithelial layer. Indeed, we believe that the small apical surfaces first reported in Crumbs2 mutants by Ramkumar and colleagues, most likely result from the crowding/jamming of cells within the epiblast layer, and that this causes changes in the shape and volume of cells due to them being spatially constrained. Thus, increased crowding of epithelial cells within a spatially constrained tissue, likely drives a reduction in apical surface area and extensive apico-basal elongation, as observed in Crumbs2 mutants.

      However, the complications of the Crumbs2 mutant do not detract from the value of the basic observations presented in this manuscript, which are solid and well-documented, and will be a valuable resource for the field.

      Reviewer #2 (Public Review):

      In their manuscript, Francou and colleagues study the delamination of epiblast cells into the mesodermal layers using live imaging of mouse embryos cultured ex vivo. By segmenting the apical area of delaminating cells, they quantify extensively the dynamic behavior of delaminating cells. Using immunostaining and crumbs2 mutants, they propose that apical constriction of cells results from pulsed contractions, which could be guided by crumbs2 signals.

      The manuscript is interesting and provides extremely valuable data for our understanding of mouse gastrulation. Occasionally, the manuscript can be a bit confusing and contains a few inaccuracies.

      However, the main issues I have are with some of the interpretations from the authors, which may be incorrect due to limited time resolution (with a 5 min time resolution that was used, it might be difficult to distinguish pulses from measurement noise) and the analysis of immunostaining data, which would require more rigorous quantification.

      • We acknowledge the reviewer’s comments and agree that a shorter time resolution would be ideal to facilitate the detection of constriction pulses of apical surfaces. However, we need to consider that imaging the apical surface of cells within the epiblast layer, which constitutes the most internal surface inside the embryo, is technically challenging in a gastrulating mouse embryo.

      • As suggested by the reviewer, we attempted to image with a shorter time interval than 5 min on several different microscope systems and modalities available at our institution (including two different laser point scanning confocals, a spinning disc system, as well as light-sheet microscopes with both upright and inverted configurations) and were not successful in acquiring usable images (having a shorter time resolution) with the ZO1GFP knock-in reporter. We also need to consider that single-copy GFP knock-in reporters are often dim, thereby exacerbating the issue. In our hands, a high-speed resonant scanning confocal (Nikon A1RHD25) was the system that gave us the best signal-to-noise ratio, spatial resolution and temporal resolution, and was the set-up we used for our most recent live imaging experiments. Using this system, we were able to acquire a limited number of time-lapses with a time resolution of 2 min, but none with a shorter time interval, and from our analyses, we determined that movies with a 2 min time interval did not yield enough additional detail over movies with 5 min time intervals to warrant a detailed reanalysis. We have provided additional detail relating to these technical issues within the revised manuscript and edited some of the conclusions.

      • We acknowledge that immunostaining is not the most quantitative method, but we were unable to come up with alternative methods that can be used with our samples. We believe the junctional reduction of Myosin, aPKC and Rock1 is generally due to a failure to recruit or activate these proteins at junctions, and does not reflect their reduced expression at the gene or protein level. We do not believe that methods such as RT-qPCR or Western blotting would be informative in the context in which we are looking, especially since they do not yield spatial resolution. Furthermore, we would need to isolate primitive streak cells to consider applying these methods, and we do not believe they would provide a sufficient improvement over immunostaining.

      • By contrast to the live imaging, which was performed by placing the objective at the posterior side of the embryo in closest proximity to the outer visceral endoderm layer, for fixed tissue imaging, embryos were microdissected to recover the posterior side containing the primitive streak. Microdissected posterior regions were imaged on the side of the cavity by placing the objective in closest proximity to the inner epiblast layer, which permitted direct access to the apical surface of epiblast cells at the primitive streak. In this fixed tissue imaging configuration, the apical surfaces of cells in WT and Crumbs2 mutants were in closest proximity to the imaging objective and thus directly accessible. Thus, any difference in tissue thickness on the other side of the epithelium did not interfere with light penetration. We have edited the figures and include schematics to clarify how the objective positions are flipped with respect to the primitive streak regions at the embryo’s posterior for live vs. fixed tissue imaging.

      • We have now measured the signal intensity in the cytoplasmic region of WT and Crumbs2 mutant embryos, and junctional intensity measurements have been normalized to cytoplasmic intensities.
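
      The normalization described in the last point can be sketched as follows. This is a minimal illustration, assuming the normalization is a simple ratio of mean junctional intensity to mean cytoplasmic intensity within the same cell; the response does not state the exact formula, and the function name and the example numbers are hypothetical rather than measured values.

```python
def normalize_junction_intensity(junction_means, cytoplasm_mean):
    """Express junctional intensities relative to the cytoplasmic signal.

    junction_means: mean pixel intensity of each junction of one cell
    cytoplasm_mean: mean pixel intensity of that cell's cytoplasmic region
    Assumes a ratio-based normalization (an assumption, not the stated method).
    """
    if cytoplasm_mean <= 0:
        raise ValueError("cytoplasmic intensity must be positive")
    return [j / cytoplasm_mean for j in junction_means]


# Made-up intensities (arbitrary units) for one cell with two junctions.
print(normalize_junction_intensity([300.0, 150.0], 100.0))
```

      Dividing by a per-cell cytoplasmic reference makes values comparable across embryos whose overall staining or imaging brightness differs, which is the usual motivation for this kind of normalization.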

      Reviewer #3 (Public Review):

      The manuscript by Francou et al investigated cellular mechanisms of epiblast ingression during mouse gastrulation. The authors wanted to know whether/how epiblast cell-cell junctional dynamics correlate with apical constriction and subsequent ingression. Because mouse gastrula adopts an inverted-cup morphology (as a result of differential invasive behavior of polar and mural trophoblast cells), epiblast cells are located in the innermost position and are difficult to image. This is more so when one wants to perform live imaging of epiblast cells' apical surface. The authors tackled such problems/limitations by using a combination of ZO-1 GFP line, confocal time-lapse microscopy, fixed embryo immunostaining, and Crumbs2 mutant embryos. The authors observed that apical constriction was associated with cell ingression, that this constriction occurred in a pulsed fashion (i.e., 2-4 cycles with phases of contraction and expansion, eventually leading to reduction of apical surface and ingression), that this constriction took place asynchronously (i.e., neighboring epiblast cells did not exhibit coordinated behavior) and that junctional shrinkage during apical constriction also occurred in a pulsed and asynchronous manner. The authors also investigated localization/co-localization of several apical proteins (Crumbs2, Myosin2B, pMLC, ppMLC, Rock1, F-actin, PatJ, and aPKC) in fixed samples, uncovering somewhat reciprocal distribution of two groups of proteins (represented by Myosin2B in one group, and Crumbs2 in the other). Finally, the authors showed that Crumbs2 -/- embryos had disturbed actomyosin distribution/levels without affecting junctional integrity (partially explaining the ingression defect reported in Crumbs2 -/- mutant embryos). Overall, this manuscript offers high-quality live imaging data on the dynamic remodeling of epiblast apical junctions during mouse gastrulation.

      It would be interesting to see whether phenomena reported in this manuscript can be extended to the entire primitive streak (or are they specific only to a subset of mesoderm precursors) and to the entire period of mesendoderm formation. More importantly, it would be interesting to see whether the ingression behavior seen here is representative of all eutherian mammals regardless of their gastrular topography.

      • The reviewer raises a very interesting and important point. We focused our data analysis on a middle region in the proximo-distal axis of the embryo, because this is the most optically accessible and the flattest region of the posterior of the embryo to analyze. We also focused on the E7.5 stage of development when the primitive streak is fully elongated, so as to capture as many ingression events within a single time-lapse experiment as possible. Due to the difficulties associated with live imaging the apical epiblast layer of embryos at these stages, we chose to focus our analysis on a defined region of the embryo and a defined period of time. We acknowledge that it will be important to analyze different regions of the primitive streak and at different stages of gastrulation to glean any general versus more distinct modes of epiblast cell ingression, but given the technical difficulties discussed we believe that any extended analysis is beyond the scope of the current study.

      • We also agree that it would be interesting to know if the ingression behavior we observe in the mouse embryo is representative of all mammals, and even more generally of amniotes, but this is beyond the scope of our study.

    1. Author Response

      Reviewer #2 (Public Review):

      Throughout the manuscript, the authors aim to distinguish signal from the lack of it. All conclusions depend on the success of this process. In such an endeavor, the sensitivity of the applied methods is critical. Thus, the authors must use the most sensitive tools to draw meaningful conclusions. The latest iGluSnFR has amazing sensitivity allowing the detection of single AP-evoked responses. This is not the case for vGpH, which requires a hundred APs to get a meaningful signal. Similarly, synthetic Ca2+ dyes have much better dynamic range, linearity and sensitivity compared to GCaMP6f.

      The rate of silent boutons at 2 mM [Ca2+]e is lower for a single AP compared to 20 or 200 APs. The overall failure rate cannot be increased with increasing the number of APs. This clearly indicates a technical issue (e.g. insufficient sensitivity of vGpH and GCaMP6f).

      We thank the reviewer for raising this concern. We attribute the relatively lower rate of silencing with 1 AP in [Ca2+]e 2.0 mM in neurons expressing iGluSnFR to its sensitivity to detect glutamate exocytosed from neighboring, possibly non-transfected terminals. This limitation is described in the manuscript (page 7, line 26 – page 8, line 5). The overall agreement in the proportion of silencing with iGluSnFR compared to physin-GCaMP or vGpH at lower [Ca2+]e, where the contributions from neighboring terminals are likely greatly diminished, supports this interpretation.

      The authors used three different measuring tools and used three different stimulation protocols, making the interpretation of the data challenging. It is impossible to tell how the failure rate changes from 1 to 20 APs without knowing the release probability, the pool size, depletion, recovery of SVs, and facilitation. These are all unknown.

      In an ideal world, a measure of release probability during a train of stimuli at varied [Ca2+]e would provide the most insight, but this is difficult to achieve with any of the existing methods, including the remarkable new iGluSnFR. The challenge we face is that, with our approach, it is impossible to exclude signals from neighboring axons that are closely packed near the axon harboring the indicator. This limitation is described in the manuscript (page 7, line 26 – page 8, line 5). Given this, we felt that showing that silencing can be revealed with all the different techniques was the most conservative approach to address the issue. Because we have focused on this phenomenon, the number of APs is experimentally important only to ensure that an adequate response can be detected. We have also included, in the discussion, an acknowledgement of the possibility that we are failing to detect minimal Ca2+ entry (see response to #8 from the synthesized review).

      The last experiment with the GABAB agonist has little novelty in its present form. The authors demonstrate that GABAB agonism increases the rate of silent terminals. The interesting issue would be to reveal how the effect of GABAB activation depends on the [Ca2+]e. This information is essential to see whether there is indeed a shoulder in its effectiveness curve.

      We are grateful to the reviewer for this recommendation and we have performed additional experiments (see response to #7 from the synthesized review).

      The authors refer to a theoretical set-point in [Ca2+]e below which the function of the terminals is fundamentally different. From the presented experiments, the reviewer does not see any data that is inconsistent with a continuum. 'Thus, as with Ca2+ influx, SV recycling is modulated in an all-or-none manner by modest changes in [Ca2+]e around the physiological set point.' This statement is not supported by the data. The reviewer cannot see a set point.

      We appreciate the reviewer’s criticism and wish to clarify that we mean the normal physiologic [Ca2+]e in the CSF. We have changed the text to clarify this point (page 7, line 20).

    1. Author Response

      Reviewer #1 (Public Review):

      Part 1: Type 2 deiodinase

      Table I is supposed to clarify and summarize the results but brings confusion. The text says that table I supports the claim that "in the cerebellum, Luc-mRNA was lower in the Ala92-Dio2 mice" whereas figure 1G does not show any difference. It is unclear whether Table I and figure 1 report the same data, and what the statistical tests are actually addressing (effect of genotype vs effect of treatment, whereas what matters here is only the interaction between genotype and treatment). Overall, it is not acceptable to present quantitative data without giving numbers, standard deviation, p-value, etc. as in Table I.

      Thank you. We agree with the reviewer. We intended to minimize the amount of data presented, which was already very large, and therefore only presented the ratios of Thr/Ala-Dio2, which created confusion. This part was removed from the new version of the MS.

      Also, evaluating T3 signaling by only looking at the luc reporter and the Hprt housekeeping gene is not always sufficient (many T3 responsive genes can be found in the literature and more than one housekeeping gene should be used as a reference).

      Thank you. The advantage of using the THAI mouse is that the Luciferase reporter gene is driven by a promoter that is only sensitive to T3, which is not the case for any other T3-responsive gene. The Hprt housekeeping signal was stable among the samples, and the differences observed were not caused by differences in the housekeeping gene expression. This part was removed from the new version of the MS.

      Another important weakness is that the wild-type mice have a proline at position 92. Why not include them? In absence of structural prediction, one wonders whether the mouse models are relevant to the human situation and whether the absence of the proline reduces the enzymatic activity when substituted for an Ala or Thr. This might have been addressed in previous work, but the authors should explain.

      Position 92 in DIO2 is occupied by Thr in humans. Its Km(T4) is indistinguishable from that of mouse Dio2, which has a Pro at position 92 (4 nM vs. 3.1 nM) [PMID: 8754756; PMID: 10655523]. Humans also carry an Ala at position 92. Comparing the two human alleles is the purpose of the study.

      Experiment 2: Ala92-Dio2 Astrocytes Have Limited Ability to Activate T4 to T3

      Here, the authors use primary cell cultures from different areas of the brain to measure the in vitro conversion of T4 to T3 by Dio2. They find that hippocampus astrocytes are less active, notably if they come from Ala92-Dio2 mice.

      This part has the following weaknesses:

      • This result correlates with the results from Fig 1F; however, the difference between Ala92-Dio2 and Thr92-Dio2 is significant in vitro, but not in vivo.

      From a deiodinase perspective, TH signaling in vivo depends on the presence of D2 (expressed in glial cells) and D3 (expressed in neurons), whereas in vitro it only depends on D2. In fact, D2 and D3 are known for a reciprocal regulation to preserve TH signaling [PMID: 33123655]. Thus, it is conceivable that the differences observed between the two models are explained by the intrinsic differences in the models.

      What matters is not the activity/astrocytes, but the total activity of the brain area, which depends on the number of astrocytes x individual activity. This is not measured.

      We respectfully disagree with the reviewer. The total D2 activity in a brain area depends fundamentally on the number of astrocytes in that area and on the intrinsic activity of the enzyme. The reviewer is suggesting that having an area denser in astrocytes expressing a catalytically less active D2 preserves a normal local T3 production. This is unlikely to be the case because we have no evidence that the density of astrocytes is different in Ala92-Dio2 mice. Please keep in mind that the intimate relationship between astrocytes and neurons is what defines the microenvironment that surrounds the neuron. By separating astrocytes from neurons we are able to measure T3 production that is occurring in the neuronal microenvironment and show that cells obtained from Ala92-Dio2 mice produce less T3.

      • What the authors called 'primary astrocytes' is an undefined mixed population of glial cells, (including radial glial cells, stem cells, ependymal cells, progenitor cells, etc...) that proliferated differentially for more than a week in culture, among which an unknown ratio expresses Dio2. The cellular model is thus poorly characterized, and the interpretation must be prudent.

      • Again, wild-type mice are not included.

      Thank you. We now include a reference to illustrate the types and percentages of cells present in our cultures. Given that the study is to compare the Thr92 and the Ala92 alleles, which are both present in humans, we did not believe it was necessary to include them here. Please note (as explained above) the Km(T4) for Thr92 and Pro92-Dio2 is indistinguishable.

      Part 2: Neuronal response to T3 Involves MCT8 and Retrograde TH transport

      The authors next move to primary neuronal cultures, prepared from the fetal cortex which they grow in the microfluidic chamber to study axonal transport. This is a surprising move: the focus is not on Dio2 anymore, but on the MCT8 transporter, which is known in humans to play an important role to transfer TH into the brain. It is expressed mainly in glia, but also in neurons. They study the influence of endosomes and type 3 deiodinase on the trafficking and metabolism of TH.

      Thank you.

      It would be useful to perform an experiment, in which radioactive T3 is introduced in the "wrong" side of the chamber, in an attempt to detect a possible anterograde transport. This would address the possibility that Mct8 also promotes efflux and control so that the chamber is not leaking.

      Thank you. To satisfy the reviewer, we have conducted three new experiments adding 125IT3 in the MC-CS. The first experiment verified that the T3 transport in the cortical neurons also occurs anterogradely. The second experiment showed that the anterograde transport depends on Mct8. The third experiment showed that D3 activity in the neuronal soma limits the amount of T3 transported along axons. We have included a new paragraph in the results section describing these experiments (lines 154 to 167), and a new supplementary figure (Figure 3—figure supplement 3). We have also discussed these new findings (lines 383 to 386). In every experiment, we have controlled for the possibility of leaking using one device without neurons that received radioactive T3. After 24 and 72h, samples from the opposite side were obtained but did not contain any radioactive T3. We refer the reviewer to figure 1, where this is explained.

      The authors use silychristin as an inhibitor of Mct8, to demonstrate that transport is Mct8 dependent. They do not provide indications or references that would clearly indicate that this drug is a fully selective antagonist of Mct8 (but not of Oatp1c1, Mct10, Lat1, Lat2, etc., the other TH transporters). A good alternative would be to use Mct8 KO mice as controls.

      Thank you. We refer the reviewer to reference 27 [J. Johannes et al., Silychristin, a Flavonolignan Derived from the Milk Thistle, Is a Potent Inhibitor of the Thyroid Hormone Transporter MCT8. Endocrinology 157, 1694-1701 (2016)], which clearly indicates that silychristin has a remarkable specificity toward MCT8. While using the Mct8 KO is interesting, it would have prevented us from testing some of our hypotheses. Being able to selectively inhibit Mct8 either in the MC-CS or in the MC-AS was a clear advantage. For example, please see the experiment in which we add T3 in the MC-AS and the silychristin in the MC-CS (Fig. 3F). Here, we discovered new roles of Mct8, such as its involvement in the release of T3 from the endosomes (lines 228 to 231).

      The B27 used in primary neuronal culture might contain TH. This is not easy to know, but at least some batches do.

      Thank you. While the neurons were cultured in B27, all experiments were performed in cells incubated with neurobasal only (B27 was removed 24 h earlier). This was not clear in the initial version, where there was only a vague reference in the legend of figure 3F. Now, this has been explained in the footnote of figure 3 and in line 207.

      The presence of astrocytes, probably expressing Mct8 and Dio2 is inevitable in primary neuronal cultures, and is not mentioned, but might interfere with TH metabolism.

      Thank you. We were aware that, under normal conditions, primary neuronal culture contains 25% of astrocytes. This was however minimized/eliminated by 2-day culture with the anti-mitotic cytosine arabinoside, which restricts astrocytes and microglia to <0.01 in this type of culture. This was explained in the initial version of the manuscript in the material and methods section (lines x to x) and supported with reference 53 (reference 57 in the previous version).

      Part 3: T3 Transport Triggers Localized TH Signaling in the Mouse Brain

      The authors return to in vivo experiments, implanting T3 crystals, labeled or not with radioactive iodine. They do so in the hypothalamus, where they address the retrograde transport of TH in TRH neurons, and in the cortex, looking for contralateral transport. These data are the most difficult to interpret.

      • First, T3 is hydrosoluble and would probably migrate without active transport.

      Thank you. Please note that at no point did we characterize the T3 transport as “active transport”, which by definition is an ATP-dependent process. To address the issue raised by the reviewer (“migrate without active transport”), we included controls in both experimental approaches to assess the random diffusion of T3.

      In hypothalamic studies, we used (i) the cerebral cortex and (ii) the lateral hypothalamus, a region that is immediately adjacent to the PVN. Neither region exhibits an axonal connection to the median eminence. The results, in both cases, show that the presence of radioactive T3 in the control areas was minimal when compared to the PVN (Fig. 5C).

      In the cerebral cortical studies, we included ipsi- and contra-lateral hypothalamic measurements that served as controls given the absence of a connection between the cortex and the hypothalamus. Accordingly, T3 signaling was not detected in any of the control regions (Fig. 6C previous version; now figure 5). Thus, these controls indicate that it is unlikely that the results could be explained by “migrate without active transport” of T3.

      • The authors do not demonstrate that these specific neuronal populations contain Mct8, and that these observations are connected to the previous in vitro observation (which used cortical neurons prepared from the fetus).

      Thank you. In the previous version, we did not make it abundantly clear that the EM pictures in Fig. 3D-G (previous version; now figure 2 D-G) were from neurons in the mouse motor cortex (this information is now explained in lines 149 to 151), which is where we inserted the T3 crystals. In addition, we have done more histological work on the brain M1 (cortex) of adult mice and found that many neurons in the M1 express D3 and Mct8—lines 433-434 and Figure 5 G-K (along with histological studies showing the specificity of the ab against D3 Fig S6).

      The possibility that astrocytes are involved, as reported in the literature, is not considered.

      • Here again, using Mct8KO mice would greatly help to interpret the data. In particular, the experiments with cold T3 involve a 48h delay which is very long in comparison to the 30 minutes required for long-distance transfer of radioactive T3.

      Thank you. We are unsure about the question posed by the reviewer. How would astrocytes play a role in the inter-hemispheric transport of T3? Given that astrocytes are not known to project across long distances, we have not considered this possibility. We agree that using the Mct8KO mouse could have provided supporting evidence of the role played by Mct8 in this process, but please keep in mind that the Mct8KO mouse has no, or only a very mild, brain phenotype, indicating that during development compensatory mechanisms have occurred that obviate the function of the transporter. This compensatory mechanism most likely involved Oatp1c1, given that only the double Mct8 and Oatp1c1 KO mouse develops a significant phenotype. This consideration directed us to the utilization of silychristin, the highly selective Mct8 inhibitor, which disrupts the Mct8 pathway in a mouse that developed normally.

      The two approaches used to demonstrate neuronal T3 transport in vivo are fundamentally different. The hypothalamus experiments employed radioactive T3, whereas T3 crystals were used in the cerebral cortex. The first approach studied T3 transport whereas the second studied downstream T3 effects, logically requiring more time. The solid T3 implant requires time to release T3 and activate gene expression. In the original paper that utilized T3 implants in the rodent brain, samples were processed after 4 days. (Dyess et al. 1988 Endo; PMID 3139393)

      Discussion

      Considering the diversity of questions that are addressed in the study, it is not surprising that the discussion is not covering all aspects. The authors implicitly consider that their conclusions can be extended to all neurons, while they use in their experiments a variety of different populations coming from either the fetal cortex, hippocampus, adult cortex, or hypothalamus. The claim that they discovered a mechanism applying to all neurons is not supported by the data.

      Thank you. We agree with the reviewer: the high number of neuronal subtypes might include different mechanisms in T3 transport. Our studies involved cortical (central) and dorsal root ganglia (peripheral) neurons in vitro and cortical and hypothalamic neurons in vivo. Thus we think that the described mechanism is not confined to specific neuronal subtypes. The discussion has been modified accordingly (lines 402 to 411).

      Moreover, we have done immunofluorescence studies to better characterize the neurons present in the MC-CS. We have found that all the neurons residing in the MC-CS are excitatory, expressing the vesicular glutamate transporter 1 (Vglut1), but no neurons were expressing GAD67, a marker for inhibitory neurons (Figure 5—figure supplement 5). This is supported by the fact that, during mouse brain development, embryonic days 14.5 to 17.5 are the birth dates of layer 4 and 2/3 excitatory neurons (PMID: 34163074). These neurons are migrating and have not extended their cellular processes, making them more likely to survive the isolation protocol from the cortex. On the other hand, the neurons (mostly excitatory) already residing in the cortex may have expanded their processes and changed their morphology, making them less capable of surviving the isolation process.

      Some highly relevant literature is not cited. In particular:

      • Mct8 KO mice do not have marked brain hypothyroidism (PMID: 24691440) which at least suggests that the pathway discovered by the authors can be efficiently compensated by alternative pathways.

      We agree with the reviewer. As mentioned above, a compensatory mechanism triggered during development “compensates” for the inactivation of Mct8. That, however, does not mean that Mct8 is not critically important. We have added that limitation to the discussion (line 342); ref 46.

      • Dio3 KO only increases T3 signaling in a few brain areas and only in the long term (PMID: 20719855).

      Thank you. That is now included in the ms; ref 25.

      • Anterograde transport of T3 has been reported for some brainstem neurons (PMID: 10473259).

      Thank you. This was our mistake, indeed. We had worked on several versions of the manuscript that included references to her seminal work but unfortunately deleted it from the final version. This is now included in refs 48 and 49.

      Reviewer #2 (Public Review):

      Salas-Lucia et al. investigated two main questions: whether the Thr92Ala-DIO2 mutation impairs brain responsiveness to T4 therapy under hypothyroidism induction and the mechanisms of neuronal retrograde transport of T3. They find that the Thr92Ala-DIO2 mutation reduces T4-initiated T3 signaling in the hippocampus, but not in other brain regions. Using neurons cultured in microfluidic chambers, they further describe a novel mechanism for retrograde transport of T3 that depends on MCT8 and endosomal loading (possibly protecting T3 from D3-mediated cytosolic degradation) and microtubule retrotransport. Finally, they present evidence of retrograde transport of T3 through hypothalamic projections and interhemispheric connections in vivo. The main novelty of this study is the delineation of the mechanism of T3 retrograde transport in neurons. This is interesting from the cell biology perspective. The notion of impaired hippocampal T3 signaling is relevant for the cognitive outcomes of hypothyroidism and its associated therapy.

      Thank you.

      Although the data are exciting and relevant for the community, some issues need to be addressed so that conclusions are more clearly justified by data:

      1) The title and the abstract mean that dissecting this novel mechanism of T3 retrograde transport may help improve cognition or brain responsiveness in patients taking T4 or L-T3 therapy. However, how initial results (Figs 1 and 2) connect to later data is not essentially clear. For example, do Thr92Ala-DIO2 mice present altered retrograde transport of T3? Would stimulation of retrograde transport in Thr92Ala-DIO2 mice rescue neurological phenotypes? Can the authors address this experimentally?

      Thank you. These are all interesting points raised by the reviewer. However, the three reviewers felt that a connection between the studies in astrocytes and the studies in neurons was missing, and complained about the disjoint nature of the manuscript. To satisfy the reviewers we removed from the MS the experiments with astrocytes and DIO2 polymorphism, and focused on the neuronal transport of T3.

      2) Although the authors present in vivo evidence of retrograde T3 transport in the hypothalamus and motor cortex, given the select susceptibility of the hippocampus to hypothyroidism, it would be especially interesting to test whether this mechanism also happens in a hippocampal circuit (CA3-CA1 Schaffer collaterals, mossy fibers or perforant pathway).

      Thank you. We agree that this would be interesting, but technically challenging. Nonetheless, we intend to study this in the future.

      3) Table 1 should present the raw values for Ala92-DIO2 mice and treatments instead of only displaying the direction of change and statistical significance. From Panels 1E-J, it is unclear if Thr92Ala-DIO2 mice or treatments caused any real change in brain regions other than the hippocampus.

      Thank you. These experiments were removed from the new version of the MS.

      4) The authors put forward the notion that a rapid nondegradative endosome/lysosome incorporation protects T3 from D3 degradation in the cytosol. Their experiments with pharmacological modulation of MCT8, lysosomes, and microtubules are in this direction. However, they do not represent an unequivocal demonstration of this mechanism. Therefore, the authors should be more cautious in their interpretation and discuss the limitations of their approaches.

      Thank you. The manuscript was edited to reflect these important points.

      Reviewer #3 (Public Review):

      Initially, Salas-Lucia et al examined the effect of deiodinase polymorphism on thyroid hormone-mediated transcription using a transgenic animal model and found that the hippocampus may be the region responsible for altered behavior. Then, changing the topic completely, they examined T3 transport through the axon using a compartmentalized microfluidic device. By using various techniques including electron microscopy, they identified that T3 is taken up into clathrin-dependent, endosomal/non-degradative lysosomes (NDLs), transported in the axon to reach the nucleus, and activates thyroid hormone receptor-mediated transcription.

      Although both topics are interesting, it may not be appropriate to deal with two completely different topics in one paper. By deleting the topic shown in Table 1, Figure 1, and Figure 2, the scope of the manuscript can be more clear.

      Thank you. We did as suggested by the reviewer. These studies were removed from the present version of the ms.

      Their finding showing that triiodothyronine is retrogradely transported through axon without degradation by type 3 deiodinase provides a novel pathway of thyroid hormone transport to the cell nucleus and thus can contribute greatly to increasing our understanding of the mechanisms of thyroid hormone action in the brain.

      Thank you.

    1. Author Response

      Reviewer #2 (Public Review):

      In their study the authors aimed to investigate the dissemination of Enterobacterales plasmids between geographically and temporally restricted isolates recovered from different niches, such as human blood stream infections, livestock, and wastewater treatment works. By using a very strict similarity threshold (Mash distance < 0.0001) the authors identified so-called groups of near-identical plasmids in which plasmids from different genera, species, and clonal background co-clustered. Also, 8% of these groups contained plasmids from different niches (e.g., human BSI and livestock) while in 35% of these cross-niche groups plasmids carried antimicrobial resistance (AMR) genes suggesting recent transfer of AMR plasmids between these ecological niches.
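
      The grouping step summarised above can be illustrated with a minimal sketch. It assumes that near-identical groups are the connected components of a graph linking every plasmid pair whose distance falls below the threshold; the source states only the threshold (Mash distance < 0.0001), not the exact grouping rule. The plasmid names, distances, and niche labels are invented toy values, and `near_identical_groups` and `cross_niche` are hypothetical helper names.

```python
def near_identical_groups(distances, threshold=0.0001):
    """Group plasmids into connected components of the graph whose edges are
    plasmid pairs with distance < threshold (assumed grouping rule)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        root_a, root_b = find(a), find(b)  # also registers both plasmids
        if d < threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for plasmid in list(parent):
        groups.setdefault(find(plasmid), set()).add(plasmid)
    # A group needs at least two plasmids.
    return [g for g in groups.values() if len(g) > 1]


def cross_niche(groups, niche_of):
    """Keep only groups whose members come from more than one niche."""
    return [g for g in groups if len({niche_of[p] for p in g}) > 1]


# Toy pairwise distances (hypothetical values, not from the study).
distances = {
    ("pA", "pB"): 0.00005,
    ("pB", "pC"): 0.00008,
    ("pA", "pC"): 0.00030,
    ("pC", "pD"): 0.02,
    ("pD", "pE"): 0.00002,
}
niche_of = {"pA": "BSI", "pB": "livestock", "pC": "BSI",
            "pD": "wastewater", "pE": "wastewater"}

groups = near_identical_groups(distances)
shared = cross_niche(groups, niche_of)
```

      Under this assumed rule, pA and pC end up in the same group through pB even though their direct distance exceeds the threshold, which is one reason the choice of grouping rule matters when interpreting group counts.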

      Next, the authors set-out to examine the wider plasmid population structure by clustering plasmids based on 21-mer distributions capturing both coding and non-coding plasmid regions and using a data-driven threshold to build plasmid networks and the Louvain algorithm to detect the plasmid clusters. This yielded 247 clusters of which almost half of the clusters contained BSI plasmids and plasmids from at least one other niche, while 21% contained plasmids carrying AMR genes. To further assess cross-niche plasmids similarities, the authors performed an additional plasmid pangenome-like analysis. This highlighted patterns of gain and loss of accessory plasmid functions in the background of a conserved plasmid backbone.

      By comparing plasmid core gene or plasmid backbone phylogenies with chromosome core gene phylogenies, the authors assessed in more detail the dissemination of plasmids between humans and livestock. This indicated that, at least for E. coli, AMR dissemination between human and livestock-associated niches is most likely not the result of clonal spread but that plasmid movement plays an important role in cross-niche dissemination of AMR.

      Based on these data the authors conclude that in Enterobacterales plasmid spread between different ecological niches could be relatively common, and might even be occurring at greater rates than estimated, as signatures of near-identity could be transient once plasmids occupy and adapt to a different niche. After such a host jump, subsequent acquisition and loss of parts of the accessory plasmid gene content, as a result of plasmid evolution after inter-host transfer, may obscure this near-identity signature. As stated by the authors, this will raise challenges for future One Health-based genomic studies.

      Strengths

      The article is well written with a clear structure. The authors have used for their analysis a comprehensive collection of more than 1500 whole genome sequenced and fully assembled isolates, yielding a dataset of more than 3600 fully assembled plasmids across different bacterial genera, species, clonal backgrounds, and ecological niches. A strong asset of the collection, especially when analyzing dissemination of AMR contained on plasmids, is that isolates were geographically and temporally restricted. Bioinformatic analyses used to discern plasmid similarity are beyond state-of-the-art. The conclusions about dissemination of plasmids between genera, species, clonal background and across ecological niches are well supported by the data. Although conclusions about inter-host plasmid dissemination patterns may have been drawn before, this is to my knowledge the first time that patterns of dissemination of plasmids have been studied at such a high-level of detail in such a well selected dataset using so many fully assembled genomes.

      Weaknesses

      One conclusion that is not entirely supported by the data is the general statement in the discussion that "cross-niche plasmid in not driven by clonal lineages". From the tanglegram, displaying the low congruence between the plasmid and chromosome core gene phylogeny in E. coli, this conclusion is probably valid for E. coli, but this does not necessarily mean that this is also the case for the other Enterobacterales genera and species included in this study. For these other genera, the data supporting this conclusion are not given, probably because the total number of isolates for certain genera was low, or because certain niches were clearly underrepresented in certain genera.

      Thank you for reviewing our manuscript.

      We agree that this statement in the conclusion was too general, and have adapted it (lines 407-409):

      “By examining plasmid relatedness compared to bacterial host relatedness in E. coli, we demonstrated that plasmids seen across different niches are not necessarily associated with clonal lineages”

      In the limitations section of the Discussion, we have also referenced this specifically as a limitation (lines 422-424):

      “Although we evaluated four bacterial genera, 72% (1,044/1,458) of our sequenced isolates were E. coli, and so our analyses and findings are particularly focused on this species.”

      Furthermore, the BSI as well as the livestock niches were analyzed as single niches while the BSI niche included both nosocomial and community-derived BSI isolates and the Livestock niche included samples from different livestock-related hosts. Given the fact that a substantial number of plasmids were available from cattle, sheep, pigs, and poultry, it would be interesting to see whether particular livestock hosts were more frequently found in the cross-niche plasmid clusters than other livestock hosts and whether the BSI plasmids in these cross-niche clusters were predominantly of community or nosocomial origin.

      We agree that analyses which distinguish between nosocomial/community acquired BSI isolates would be interesting further work, but are beyond the scope of this study. Our analysis of the BSI/livestock cross-niche near-identical plasmid groups details the livestock hosts involved (lines 144-154). Briefly, of the n=8 BSI/livestock cross-niche groups, these involved

      • pig/poultry (1/8)

      • poultry (1/8)

      • pig (2/8)

      • sheep (3/8)

      • cattle/pig/poultry (1/8)

      We have added a note of explanation in the methods to explain how the distance threshold we use for near-identical clustering is maximally conservative at small plasmid sizes (a single SNP produces a new plasmid cluster) but remains highly conservative (tens of SNPs) at large plasmid sizes.
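
      The size dependence described here can be checked with a short worked calculation. This is a first-order sketch that assumes the distance behaves like a per-site substitution rate, so that a pair differing by n SNPs over a plasmid of length L has a distance of roughly n / L; it ignores indels, rearrangements, and the finite resolution of k-mer sketches. The plasmid lengths are illustrative values, not taken from the study, and the function names are hypothetical.

```python
THRESHOLD = 0.0001  # near-identity threshold quoted in the text


def approx_distance(snps, length_bp):
    """First-order approximation: distance ~ SNPs per aligned base."""
    return snps / length_bp


def snp_budget(length_bp, threshold=THRESHOLD):
    """Approximate SNP count at which the distance reaches the threshold."""
    return threshold * length_bp


# Illustrative plasmid sizes (hypothetical): small, medium, large.
for length_bp in (2_000, 10_000, 200_000):
    print(f"{length_bp:>7} bp: threshold reached at about "
          f"{snp_budget(length_bp):.1f} SNPs")
```

      Under these assumptions a single SNP already pushes a 2 kb plasmid past the threshold, whereas a 200 kb plasmid stays within it until roughly 20 SNPs have accumulated, which matches the "single SNP" versus "tens of SNPs" behaviour described in the note.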

      We have carefully considered the point about whether particular hosts were more frequently found in cross-niche plasmid clusters. However, we do not think it is easy to infer whether a particular livestock host is represented more frequently in these cross-niche events than would be expected from chance, given the low density of the sampling.

      We have reorganised the paragraph in lines 144-154 to provide more clarity on the groups’ niches.

      “Sharing between BSI and livestock-associated isolates was supported by 8/17 cross-niche groups (n=45 plasmids). Of these, n=3/8 groups contained BSI/sheep plasmids: one group contained mobilisable Col-type plasmids, the remaining two groups contained conjugative FIB-type plasmids. Of these, one group contained plasmids carrying the AMR genes aph(3'')-Ib, aph(6)-Id, blaTEM-1, dfrA5, sul2, and the other group contained plasmids carrying the MDR efflux pump protein robA (see Materials and Methods). A further n=2/8 groups contained BSI/pig mobilisable Col-type plasmids, of which one group carried the AMR genes aph(3'')-Ib, aph(6)-Id, dfrA14, and sul2. Lastly, n=1/8 groups contained BSI/poultry non-mobilisable Col-type plasmids, n=1/8 contained BSI/pig/poultry/influent non-mobilisable Col-type plasmids, and n=1/8 contained BSI/cattle/pig/poultry/influent mobilisable Col-type plasmids.”

      We have also added this as a limitation in the discussion (lines 424-426):

      “Additionally, we did not sample livestock-associated niches densely enough to explore individual livestock types (cattle/pigs/poultry/sheep) sharing plasmids with BSI isolates (see Appendix 1 Fig. 9).”

      We have already recognised that our culture methods may have affected our sensitivity to detect Klebsiella spp. isolates in the livestock/environmental samples – we have expanded on this to explicitly highlight that this may have affected our capacity to evaluate Klebsiella-associated plasmids (lines 443-444):

      “This limited our ability to study the epidemiology of livestock Klebsiella plasmids.”

    1. Author Response

      Reviewer #1 (Public Review):

      Although the authors have identified some properties/molecular markers of canine H3N2 influenza viruses that highlight the potential for infecting humans, caution is needed in emphasizing the threat of these viruses to public health. One fact is that despite the increasing prevalence of these viruses in dogs and the close proximity between dogs and humans, there is so far no report of human infection with canine H3N2 influenza viruses. The authors should discuss this in their manuscript so that readers can have a more comprehensive understanding of their findings and the public health importance of canine influenza viruses.

      We agree with the reviewer. We added the related discussion and revised some words to not emphasize the threat of these viruses to public health (lines 342-346).

      Reviewer #3 (Public Review):

      1) The investigators should run neuraminidase inhibition assays to establish the level of cross-reactivity of human sera to the canine-origin NA (one of the reasons proposed for the lower impact of the H3N2 pandemic was the presence of anti-N2 antibodies in the human population).

      We performed neuraminidase inhibition assays as suggested for both ferret sera against human H3N2 virus and human sera. The results showed that the NI titers of ferret sera against human H3N2 virus to canine H3N2 viruses were <10 (lines 147- 148, Supplementary file 2). Additionally, 2.0%–3.0% of the children's serum samples, 1.0%–2.0% of the adult's serum samples, and 1.0%–2.0% of the elderly adult's serum samples had NI antibody titers of ≥10 to canine origin NA (lines 158-161, Table 1, and lines 435-445).

      2) Please tone down the significance of ferret-to-ferret transmission as a predictor of human-to-human transmission. Although flu viruses that transmit among humans do show the same capacity in ferrets, the opposite is NOT always true.

      We agree with the reviewer. To tone down the significance of ferret-to-ferret transmission as a predictor of human-to-human transmission, we added the related discussion and deleted or revised some words (lines 342-346, line 37, line 302, line 308, line 322, and line 341).

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Vias and co-authors develop HGSOC PDOs and characterized their genomes, transcriptomes, drug sensitivity, and intra-tumoural heterogeneity. They show that PDOs represent the high variability in copy number genotypes observed in HGSOC patients. Drug sensitivity was reproducible compared to parental tissues and the ability of these models to grow in vivo.

      Overall, the manuscript lacks sufficient novelty. Several pieces of information and a number of conclusions that are presented here have been previously published by other groups (Nina Maenhoudt, Stem cell reports, 2020; Shuang Zhang, Cancer Discov, 2021).

      We agree that several important papers on HGSOC organoids have been published. However, we disagree with the assessment that the manuscript “lacks sufficient novelty”. Our MS addresses critical questions about conservation of mechanisms of chromosomal instability, how PDOs can be selected as clinically relevant models based on patterns of CIN, and their comparative drug response. These questions are vital to using PDOs for therapeutic development and have not been explored before. By contrast, Maenhoudt et al. performed many analyses on several organoids (whole-genome sequencing, whole exome sequencing) but did not analyse the relationships between copy number profiles, mutational signatures or drug sensitivity between donor tissues and derived organoids and did not perform transcriptomic or scDNA analyses. A major novelty of our approach is to provide robust clinical validation of individual HGSOC PDOs by analysing how our PDOs are statistically representative of the various CN subclasses of HGSOC. Maenhoudt et al. and Zhang et al. classify their models only using infrequent recurrent mutations in driver genes. We do not understand how the Zhang MS overlaps with our MS as it describes the CRISPR-engineering of mouse cells to model HGSOC and investigates drivers of the mouse tumour microenvironment.

      Reviewer #3 (Public Review):

      1) The manuscript adequately demonstrates that genomic instability is maintained in HGSOC tumourspheres. The use of 3-dimensional HGSOC models to more greatly resemble the in vivo environment has been used for more than a decade, but this is the first demonstration using a variety of genomic assessment tools to show genomic instability in the HGSOC tumoursphere model. It is clearly demonstrated that these HGSOC tumourspheres represent copy number variations similar to information in public datasets (TCGA, PAWG, BriTROC-1) and that cellular heterogeneity is present in these tumourspheres. The simple steps outlined to establish and passage tumourspheres will benefit the field to further study mechanisms of genomic instability in HGSOC.

      Thank you for these positive comments.

      2) A weakness of the manuscript is the lack of operational definitions for what constitutes an organoid and an appropriate definition to distinguish genomic instability from chromosomal instability (a distinct type of genomic instability). Line 147 states "As PDOs consist of 100% tumour cells...", although this does not appear to have been established by any assessment. This limited characterization of the 3D model is a weakness since no data is provided on whether the tumourspheres constitute only a single cell type (as indicated on line 147) or multiple cell types (e.g., HGSOC cell, mesothelial cells) using markers beyond p53 expression. Based on this information, this model cannot be called a PDO, rather it should be referred to as a tumoursphere.

      We define continuous PDO models on page 3 stating our criteria based on passage > 5 and successful reculture after thawing (previous publications have not defined whether their models are continuous or finite). As shown in our targeted-gene mutation analysis, all our PDOs contain a TP53 mutation allele fraction between 80–95%. Moreover, in our single cell DNA-Seq data we do not observe any normal copy number profiles that would indicate normal cells. This information is now included in the text for clarification. Our reasons not to use the term spheroids or tumourspheres are:

      1. The word spheroid comes from the in vitro spheroid formation assay which was originally designed to overcome the difficulties found in functional in vivo serial transplantations. This method generates colony-forming units in suspension. Our patient-derived cells are not growing in suspension but within an extra-cellular matrix.

      2. Spheroids are clonally expanded from a single-cell as part of the colony-forming assay; our patient-derived organoids were not clonally expanded in any way.

      3. Organoids derived from patient-tumours have been named PDOs in multiple publications where pure tumour cellularity was stated for the PDOs [Vlachogiannis et al. Science (2018) 359, 920; Li et al. Nat. Comm. (2018) 9, 2983; Lee et al. Cell (2018) 173, 515; Kopper et al. Nat Med (2019) 25, 838]. Use of other terms will cause confusion for readers and prevent important comparisons between PDOs from different researchers.

      3) Chromosome instability (CIN) is a type of genomic instability that is broadly defined as an increased rate of chromosome gains or losses and is best identified through analysis of single cells (e.g., karyotype analysis), something that bulk whole genome sequencing cannot determine since it is a reflection of cell populations and not individual cells. While the data demonstrate genomic instability is retained in the tumourspheres, and chromosome losses or copy-number amplifications were observed using single-cell whole genome sequencing, samples from the same patient were not evaluated over time. While there is evidence to support CIN in these samples, in agreement with other published work that has demonstrated CIN in >95% of HGSOC samples analyzed at the single-cell level, this work is not conclusive. The title of the manuscript should be modified to more accurately represent what the evidence supports.

      We have discussed the ambiguity of CIN in our recent publication “A pan-cancer compendium of chromosomal instability” Drews et al Nature 2022.

      “CIN has complex consequences, including loss or amplification of driver genes, focal rearrangements, extrachromosomal DNA, micronuclei formation and activation of innate immune signalling. This leads to associations with disease stage, metastasis, poor prognosis and therapeutic resistance. The causes of CIN are also diverse and include mitotic errors, replication stress, homologous recombination deficiency (HRD), telomere crisis and breakage fusion bridge cycles, among others.

      Because of the diversity of these causes and consequences, CIN is generally used as an umbrella term. Measures of CIN either divide tumours into broad categories of high or low CIN, are restricted to a single aetiology such as HRD, are limited to a particular genomic feature such as whole-chromosome-arm changes, or can only be quantified in specific cancer types. As a result, there is no systematic framework to comprehensively characterize the diversity, extent and origins of CIN pan-cancer, or to define how different types of CIN within a tumour relate to clinical phenotypes. Here we present a robust analysis framework to quantitatively measure different types of CIN across cancer types.”

      Many authors use CIN to include the consequences of CIN, whereas others specifically use CIN to indicate ongoing numerical and structural change. We do not think our usage of CIN in the title and text is controversial; it is consistent with previous peer-reviewed publications, including our own.

      4) An additional weakness is missing information (e.g., Figure 1d, Supplementary Figure 3b, and Supplementary Table 4 were not included in the manuscript; the 13 anticancer compounds used to test drug sensitivity are not indicated) making an assessment of the data impossible, and assessment of some conclusions difficult.

      We apologise for this misunderstanding as a typo suggested that there was a Figure 1d (it should have referred to Figure 1c) or Figure 1-Figure supplement 3B (the label of which was missing); we also apologise for the omission of Supplementary Table 4. These errors have been corrected and the list of compounds is now included in the Methods section.

    1. Author Response

      Reviewer #1 (Public Review):

      We would like to thank reviewer #1 for her helpful comments and would like to respond to these as follows:

      1) “Editing efficiencies were variable (99% to 0%) depending on the species, being worst for L. major.”

      It is true that the editing efficiency was different in each species and worst for L. major. However, it is important to note that these efficiencies varied not only between species but also between genes and especially between the chosen sgRNA sequences. Variations in efficiency across sgRNAs targeting the same gene and locus are a common problem in any CRISPR approach. We made this clearer in our revised manuscript (line 670 – 673).

      2) “The use of premature termination codons also clearly raises issues for false positives and negatives, especially as there is no evidence for nonsense-mediated mRNA decay in Leishmania.”

      We have now included in our revised manuscript that it is currently unclear whether a classical nonsense-mediated decay pathway is present in Leishmania or not. If such a pathway would be present, mutant mRNAs in which a termination codon is present within the normal open reading frame would be removed (Clayton, Open Biology 2019; Delhi et al., PLoS One 2011). But if not, remaining N-terminal protein parts could be functional and may lead to false positive and negative results. However, as reviewer #2 pointed out, this may also provide extra information about functional domains of the targeted protein and highlights that our tool can not only be used to create functional null mutants by inserting premature STOP codons but also to pursue targeted mutagenesis screens (line 674 - 683).

      3) “There are already two genome-wide screening options for Leishmania, so the advantages and disadvantages of the method proposed here need to be discussed in a much more detailed and balanced way.”

      We have revised our manuscript to include in our introduction (line 36 - 73) and discussion (line 658 - 697) a better comparison of all potential tools for genome-wide screening in Leishmania, including RNAi, bar-seq and base editing screening. We highlight why we think that base editing has unique advantages.

      4) “In the "LeishGEM" project (http://www.leishgem.org) all Leishmania mexicana genes will be knocked out and each KO will be bar-coded. At the end, 170 pooled populations of 48 bar-coded mutants will be publicly available. The only real reason the authors of the current paper give for not using this approach is that it is labour-intensive. However, LeishGEM is funded and underway, with several centres involved, so that argument is weak.”

      In our original manuscript we gave multiple reasons why we think that the LeishGEdit method, which is being used for the LeishGEM screen and was developed by the lead author of the study presented here, has clear disadvantages compared to base editing.

      As written in our original manuscript (line 709 – 716): “However, for a bar-seq screen, each barcoded mutant needs to be created individually by replacing target genes with drug selectable marker cassettes (20,21), making them extremely labour intensive and most likely “one-offs” on a genome-wide scale. Furthermore, aneuploidy in some Leishmania species can be a major challenge for gene replacement strategies as multiple rounds of transfection or isolation of clones may be required to target genes on multi-copy chromosomes. Using gene replacement approaches it is also not feasible to study multi-copy genes that have copies on multiple chromosomes. These are major disadvantages of bar-seq screening.”

      Therefore, we still think that the main disadvantage of bar-seq screening is that it is labour-intensive as each mutant needs to be created individually. The fact that LeishGEM requires five years and several research centres to knockout all genes in just one Leishmania species is proof for this argument.

      However, to clarify our position about this further, we have listed other disadvantages of the LeishGEM screen, including difficulties of sharing mutant pools between labs, possible problems in expanding mutant pools without losing uniformity, no ability to change the composition of generated pools and limited ability to distinguish between technical failures and essentiality. If any of these problems would occur, it would require a de novo generation of barcoded mutants and therefore this is an extremely labour-intensive method for large-scale screening. We also added that bar-seq screens are not feasible in Leishmania species that display extreme cases of aneuploidy, such as L. donovani (line 59 – 73).

      Despite all these disadvantages of the LeishGEdit approach for the LeishGEM project, there are of course also clear advantages, which we also point out in our introduction (line 52 – 55).

      5) “There is also a preprint describing RNAi for functional analysis in Leishmania braziliensis.”

      Although our original manuscript already included the pre-print about RNAi screening in Leishmania braziliensis (line 706-709), we understand that this deserves a stronger discussion. We have therefore now highlighted RNAi as a possible tool for genome-wide screening in selected Leishmania species in our revised introduction (line 36 - 43). However, we also argue that RNAi approaches are at the moment only available to Leishmania of the Viannia subgenus and that RNAi activity varies greatly between the species (line 36 – 43 and 665 - 669). In addition, we discuss that RNAi genome-wide screens are much less specific, as randomly sheared genomic DNA is usually used to generate RNAi libraries (line 687 - 689). Since the pre-print is now published, we have replaced the pre-print citation with the peer-reviewed one.

      Reviewer #2 (Public Review):

      We would like to thank reviewer #2 for helpful comments and would like to respond to those as follows:

      1) “Line 482 - the authors wrote 'As expected, the proportion of cells showing a motility phenotype in the IFT88 targeted L. infantum population decreased further' Why is this result expected? Presumably, this is due to the fact that cells without a functional IFT system lack flagella and grow slower so can be outcompeted by faster-growing mutants. This speaks to the major caveat highlighted by the authors in the discussion and the final small-scale screen. In a population of cells, those with deleterious mutations in an essential gene or one whose disruption results in slower growth will be outcompeted by cells in which a non-deleterious mutation has occurred, which feeds into the issue of timing.”

      As the reviewer notes, cells carrying deleterious mutations that slow growth will be outcompeted by cells in which a non-deleterious mutation has occurred. We have stated that complete deletion of IFT88 in Leishmania mexicana has been shown to slow growth (Beneke et al., PLoS Pathogens 2019), so these mutants are most likely outcompeted in the pool (line 529 – 532 and 767 - 769).

      2) “The authors show with CRK3 this process of non-deleterious mutants outcompeting deleterious mutants does result in a detectable drop in the number of parasites with specific CRK3 guides but not in those with IFT88. Is this due to the fact that the outgrowth of the non-deleterious IFT88 mutants occurs rapidly or that the mutation of the targets in IFT88 was ineffective? The data presented in Figure 5 shows that for some species at least a mutation of the IFT88 gene was possible. This might mean that for certain genes the outgrowth occurs within the first 12 days after transfections so will not be seen using this approach, without a wider study, which is beyond the scope of this manuscript it will be difficult to know.”

      As we stated in our discussion, we did not test IFT88 guides individually in L. mexicana. Therefore, the editing rate observed for the IFT88 guides in L. major and L. infantum (Fig. 5) may differ from the editing rate in L. mexicana, which is the species we used for the pooled transfection screen. It is therefore difficult to conclude why IFT88 was not depleted from the pool. This may be due to lower guide activity in L. mexicana or rapid selection of non-deleterious mutations (line 769 - 774). We are therefore planning to further optimize our system by streamlining the editing efficiency and eliminating species-specific effects (line 735 - 745). As the reviewer highlighted, this is beyond the scope of this study.

      However, the reviewer raises a fair point about the exact timing of isolating DNA from pools, which might influence when exactly parasites with a deleterious mutation are depleted from the pool. This may differ between guides and may even be gene specific. We have added this point to our discussion (776 - 780).

      3) “The authors highlight that this base editing approach will leave potentially functional regions of the NT of proteins, which is true and may mean genes are missed. However, this may also provide extra information about the protein's function/domain structure if STOP codons in certain positions showed an effect on function whereas those in others don't.”

      We thank reviewer #2 for pointing out that functional parts of truncated proteins following base editing may actually allow additional conclusions to be drawn. We have included this in the manuscript (681 - 683).

    1. Author Response

      Reviewer #1 (Public Review):

      This umbrella review aims to synthesize the results of systematic reviews of the impact of the COVID-19 pandemic on various dimensions of cancer care from prevention to treatment. This is a challenging endeavor given the diversity of outcomes that can be assessed in cancer care.

      Search and review methods are good and are in line with recommendations for umbrella reviews. Perhaps one weakness of the search strategy was that only one database (Pubmed) was searched. The search strategy appears adequate, though perhaps some more search terms related to reviews and cancer could have been included. It is therefore possible that some reviews may have been missed by the search strategy.

      It is challenging to perform a good umbrella review that yields novel insights, as it is difficult to combine results from different reviews which themselves combine results from different studies with different methodologies. However, I think perhaps one of the main weaknesses of this study is that it is not clear to me what is the core objective of the umbrella review, and how analyses relate to that core objective. In other words, I do not understand based on the introduction what new information the authors are hoping to learn from their umbrella review that could not be learned from reading the individual systematic reviews, beyond a vague objective of "synthesizing" the literature. Because of this, it is not very clear to me how the data extracted and the analysis fits into the larger objectives, and what the new knowledge generated by this review is. Based on the reported results, it would appear that one of the main goals is to assess the quality of systematic reviews and of the underlying studies in the reviews, but it is hard to tell. I think there are potentially important insights this review could tell us, but the message and implications of current evidence remain for me a little confused in the current manuscript.

      We thank the reviewer for the encouraging remarks on our work, and for the useful feedback. We have now addressed all concerns as outlined below.

      Reviewer #2 (Public Review):

      This umbrella review summarizes the results of systematic reviews about the impact of the COVID-19 pandemic on cancer care. PRISMA checklist is used for reporting. The literature search was performed in PubMed and systematic reviews published until November 29th, 2022 were included. The quality of included systematic reviews was appraised using the AMSTAR-2 tool and data were reported descriptively due to the high heterogeneity of 45 included studies. Based on the results of this paper, regardless of the low quality of included evidence, COVID-19 affected cancer care in many ways including delay and postponement of cancer screening, diagnosis, and treatment. Also, patients with cancer had been affected psychologically, socially, and financially during the COVID-19 pandemic.

      The main limitation of the current study is that the authors have searched only one database, which might have missed some relevant systematic reviews. Also, most of the included reviews in this paper had low and medium methodological quality.

      We thank the reviewer for this excellent remark. Guidelines on umbrella reviews suggest PubMed, reference screening and an additional bibliographic database as an optimal database combination for searching systematic reviews (Goossen K et al. 2020). To follow the guidelines, and considering the specialized focus on COVID-19, in addition to PubMed and reference screening, we also performed a search in the WHO COVID-19 Database. Furthermore, we revised the search strategy in PubMed to include MeSH terms. The search was performed by a specialized librarian with experience in systematic review searches. Overall, we retrieved 485 new references and found 6 new studies that met our inclusion criteria and were included in the final analysis. We have now revised the manuscript to reflect the above changes, and also highlighted this as a strength of our work. In addition, we added the new detailed search strategy to the supplemental material.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors describe in the nematode C. elegans the effects of perturbed organization of Intermediate filaments (IFs), which form the cytoskeleton of animal cells together with actin filaments. They focus on a previously identified mutant of the kinase SMA-5, which when mutated leads to disorganized IF structure in intestinal cells of C. elegans. The authors found that the phenotypes caused by the mutated SMA-5 kinase concerning gut morphology and animal health can be reversed by removing IF network components such as the protein IFB-2. This finding is extended to other components of the IF network, which also display a certain degree of sma-5 phenotype alleviation when depleted.

      Strength:

      The finding that the intestinal phenotypes of sma-5 mutants can be suppressed by removing functional IF components is an interesting observation. It confirms a previous study showing that bbln-1 mutation-caused IF phenotypes can be suppressed by depleting IFB-2.

      Weakness:

      1) The finding that the intestinal phenotypes of sma-5 mutants can be suppressed can be considered a minor conceptual advancement. However, the study falls short of providing insight into the molecular processes by which deranged IF networks and their consequences can be rescued/suppressed by removing e.g. the IFB-2 filaments. Many statements concerning the relationship between SMA-5 and the IFs are based on assumptions. The study requires protein biochemical analysis to show whether SMA-5 phosphorylates the IF proteins - mainly the IFB-2 polypeptide. The relationship between SMA-5 / IFB-2 is a central aspect of this study but the main conclusions are based on the notion that IFB-2 and other IF proteins may be phosphorylated by SMA-5. Mutating putative phosphorylation sites of IFB-2 without having shown any proof that the modification occurs by SMA-5 is futile. This important open question needs to be addressed; doing so will also allow statements on whether the shorter IFB-2 protein encoded by the ifb-2(kc20) mutant allele lacks phosphorylation or not.

      We have addressed the major concern of the Reviewer by performing phosphorylation analyses of IFB-2 showing that loss of SMA-5 induces phosphorylation of multiple sites throughout the IFB-2 molecule. The results are presented in new Figs. 5 and S5.

      2) No quantification of the morphological defects such as using fluorescent-labeled IF proteins as in previous studies is provided in the manuscript. The EM pictures are not sufficient to provide information on how often the IF network perturbations and morphology defects occur. Also, the rescue of the actual morphological gut defects was not quantified. The assessment of development time and arrest, body length, lifespan, oxidative stress resistance, and others should be related to intestinal tube defects. They are useful and important but are an indirect measure of intestine defects and rescue.

      We provide the requested data on IF localization and intestinal morphology in new Figs. S2 and S3, respectively.

      3) It is not clear how exactly the mutant ifb-2 allele kc20 was identified. In the Materials and methods section, the authors provide information on the specific primers for the ifb-2 locus. But how did they know that the mutation lies within this region? Was there mutation mapping or whole-genome sequencing applied?

      The requested information is included in the revised Results section (first paragraph).

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, the authors use an embedding of human olfactory perceptual data within a graph neural network (which they term principal odor map, or POM). This embedding is a better predictor of a diverse set of olfactory neural and behavior data than methods that use chemical features as a starting point to create embeddings. The embedding is also seen to be better for comparison of pairwise similarities (distances of various sorts) - the claim is that proximity of pairs of odors in the POM is predictive of their similarity in neural data from olfactory receptor neurons.

      A major strength of the paper is the conceptualization of the problem. The authors have previously described a graph neural net (GNN) to predict verbal odor descriptors from molecular features (here, a 2019 preprint is cited, but a newer related one in 2022 describing the POM is not cited). They now use the embedding created by that GNN to predict similarities in large and diverse datasets in olfactory neuroscience (which the authors have curated from published work). They show that predictions from POM are better than just generic chemical features. The authors also present an interesting hypothesis that the underlying latent structure discovered by the GNN relates to metabolic pathway proximity, which they claim accounts for the success in the prediction of a wide range of data (insect sensory neuron responses to human behavior). In addition to the creativity of the project, the technical aspects are sound and thorough.

      There are some questions about the ideas, and the size of the effects observed.

      1) The authors frame the manuscript by invoking an analogy to other senses, and how natural statistics affect what's represented (and how similarity is defined). However, in vision or audition, the part of the world that different animals "look at" can be very different (different wavelengths, different textures and spatial frequencies, etc). It is still unresolved why any given animal has the particular range of reception it has. Each animal is presumably adapted for its ecological niche, which can have different salient sensory features. In vision or audition, different animals pick different EM spectra or sound bandwidths. Therefore, it is puzzling to think that all animals will somehow treat chemicals the same way.

      Our assumption (an assumption of the broader interpretation, not of the analyses themselves) that all terrestrial animals have a correlated odor environment is certainly only true for some values of “correlated”. One could imagine, for example, that some animals are able to exploit food energy sources that humans cannot (for example, plants with high cellulose content), and that they might therefore be adapted to smell metabolic signatures of such plants, whereas humans would not be so adapted. This seems quite reasonable and there are probably many such examples. In future work they might be used to test the theory directly: representations might be more likely to differ across species on tasks when the relevant ecological niches are non-overlapping. We have updated the discussion to propose such future tests. However, it is also apparent that the odor environment overall is nonetheless highly correlated across species. Recent work (Mayhew et al, PNAS) showed that nearly all molecules that pass simple mass transport requirements (that should apply to all mammals, at the least) are likely to have an odor to humans, so it seems unlikely that the “olfactory blind spots” are intrinsically large.

      2) The performance index could be made clearer, and perhaps raw numbers shown before showing the differences from the benchmark (Mordred molecular descriptor). For example, can we get a sense of how much variance in the data does it explain, what percent of the hold-out tests does it fit well, etc.?

      The performance index in Figure 1 is required to compare across different types of tasks, which are in turn dictated by the nature of the data (e.g. continuous vs categorical). Regression tasks yield an R² value and categorical tasks yield an AUROC. We normalized and placed these on a single scale in order to show all of the tasks clearly together. We have added a table to the shared code (from link in Methods section, go to predictive_performance/data/dataset_performance_index_raw.csv) that shows the original (non-normalized) values, for both the POM and the benchmark(s) across multiple seeds and various metrics with the model hyper-parameters that generate the best performance.

      3) The "fitting" and predictions are in line with how ML is used for classification and regression in lots of applications. The end result is a better fit (prediction), but it's not actually clear whether there are any fundamental regularities or orders identified. The metabolic angle is very intriguing, but it looks like Mordred descriptor does a very good job as well (extended figure 5 [now Figure 2-figure supplement 5]). Is it possible to show the relation between metabolic distance and Mordred distance in Figure 2c? In fact, even there, cFP distance looks very well correlated with metabolic distance (we are talking about r= 0.9 vs r = 0.8). This could simply be due to a slightly nonlinear mapping between chemical similarity and perceptual similarity (which was used to get POM distance).

      We show additional “showdown” comparisons between metabolic distance, POM distance, and alternative distance metrics in the new Figure 2-figure supplement 3 and Figure 2-figure supplement 4. Indeed, the Mordred descriptors perform well; after all, metabolic reactants and products must be at least somewhat structurally related. But POM (derived only from human perceptual data) outperforms it significantly. Visual inspection of Figure 2c also reveals that the dispersion of structural distances (at a given metabolic distance) is just much higher than the dispersion of POM distances. This won’t change if one uses a non-linear curve fit, as it is a property of the data itself.

      It’s also worth noting that while r=0.8 and r=0.9 might seem close, in terms of variance unexplained (1 - r²) they are approximately two-fold different (0.36 vs 0.19). Reducing the unexplained variance by half seems like a meaningful difference. Alternatively, if one simulates scatter plots with correlation r=0.8 vs r=0.9, it is apparent that the latter is simply a much tighter relationship.
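
      The arithmetic behind this comparison can be checked with a minimal sketch. It is only an illustration and not part of the shared analysis code: the function names, sample size and seed are assumptions chosen for the example. It computes 1 - r² for both correlations and also simulates correlated Gaussian pairs to confirm that the residual variance around a fitted line matches that figure.

```python
import math
import random


def unexplained_variance(r):
    """Fraction of variance left unexplained by a linear fit with correlation r."""
    return 1.0 - r * r


def simulated_unexplained_variance(r, n=20000, seed=0):
    """Simulate n standard-normal pairs with population correlation r and
    return the fraction of variance in y left after a least-squares line fit.
    (Hypothetical helper; n and seed are arbitrary illustration values.)"""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # y = r*x + sqrt(1 - r^2)*noise has unit variance and correlation r with x.
    ys = [r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for x in xs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    residual_ss = syy - slope * sxy  # residual sum of squares of the fit
    return residual_ss / syy


for r in (0.8, 0.9):
    print(r, unexplained_variance(r), simulated_unexplained_variance(r))

# Ratio of unexplained variance at r = 0.8 to that at r = 0.9
print(unexplained_variance(0.8) / unexplained_variance(0.9))
```

      The ratio printed on the last line is a little under two, which is the sense in which the two correlations differ "approximately two-fold" in unexplained variance.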

      4) How frequent are such examples shown in Fig 2d? Pentenal and pentenol are actually very similar in many ways, and it may be that Tanimoto distance is not a great descriptor of chemical similarity. cFP edit distance is quite small, just like metabolic distance. The thiol example on the right is much better. Also, even in Fig 2C POM vs metabolic distance, the lowest metabolic distances have large variations in the POM values - so there too, metabolic reactions that create very different molecules in 1 step can vary widely in POM distance as well.

      We agree that Tanimoto distance is not perfect. We were unable to find a measure of structural distance that agreed with human intuitions about “structural distance” in all cases; indeed that intuition is often generated by an understanding of odor/flavor characteristics or function in metabolic networks, which would beg the question! To answer the question about the frequency of examples like the ones shown in Figure 2d, we created a new density map (Figure 2-figure supplement 4) showing the number of one-step metabolite pairs for a given range of POM vs cFP edit/Tanimoto distance. We found >25 pairs of metabolites in the same “small POM distance” and “large structural distance” quadrant from which we found the original examples shown in Figure 2d.

      5) A major worry is that Mordred descriptors are doing fine, and POM offers only a small improvement (but statistically significant of course). Another way to ask this question is this: if you plot pairwise correlation/distance of pairs of odors from POM against that for Mordred, how correlated does this look? My suspicion is that it will be highly correlated.

      It will look highly correlated (as shown in the new Figure 2-figure supplement 3). The reason is that metabolic reactions cannot make arbitrary transformations to molecules (the reactants must have some structural relationship to the products) or similarly that olfactory receptors (in any species) cannot have arbitrary tuning – at the end of the day receptors mostly bind to similar-looking classes of molecules. As stated above, we believe that the improvement here is not just statistically significant but meaningful – a 2-fold drop in unexplained variance is large – and that it is important to identify principles by which the nervous system can be tuned, above and beyond the physical constraints imposed by basic rules of chemistry.

      Also, the metabolic distances that we constructed from available data are themselves noisy, since not all metabolic pathways and the compounds that compose them are known, which places an upper bound on the correlation that we could have obtained. Despite that, we still found a correlation of r>0.9.

      6) The co-occurrence in mixtures and close POM distance may arise from the way the embedding was done - with perceptual descriptors used as a key variable. Humans may just classify molecules that occur in a mixture as similar just from experiencing them together. Can the authors show that these same molecules in Fig 4d,e have very similar representations in neural data from insects or mice?

      We have added a new Figure 4-figure supplement 1 to show this. One constraint is that the neural datasets must contain molecules that are also in the natural substance datasets used in Figure 4. In all cases where the data are sufficient to power a test of the hypothesis (i.e. more than five co-occurring pairs of molecules in essential oil), we observe an effect in the predicted direction.

    1. Author Response

      Reviewer #1 (Public Review):

      This work focuses on the characterization of neutralizing antibodies from human survivors of SNV and ANDV hantavirus infections, including the mapping of epitopes located in the Gn and/or Gc glycoproteins, and their mechanism of viral interference blocking receptor binding or membrane fusion. It also confirms previous data on broadly neutralizing epitopes allowing inhibition of different hantavirus species. The work provides for the first time in vivo evidence of cross-protection against HTNV infection by a broadly neutralizing antibody prepared from SNV infection using a prophylaxis animal model and compares the data with protection from ANDV lethal challenge using ANDV-specific neutralizing antibodies. The work provides valuable information for the development of therapeutic measures that cross-protect against several hantavirus species, which seems a promising strategy to raise pharmaceutical interest in a group of viruses causing an orphan disease.

      The strength of the work lies in the impressive amount of work and versatility of methods used to identify residues involved in the binding of and/or escape from seven different neutralizing antibody clones, which allow for important conclusions on species-specific antigenic regions and confirm data on a region that seems broadly conserved among different hantavirus species. At the same time, the weakness of the work is that the data processing does not allow readers to analyze the data themselves (Figs. 1b, 2a, 2c, Ext. Data Fig. 4).

      The authors clearly achieve their aim of characterizing the antigenic sites of neutralizing antibodies. Yet, the presented data on binding to ANDV mutant constructs and negative-staining EM do not allow for the conclusion that the epitope of the broadly neutralizing antibodies ANDV-44 and SNV-53 involves the Gn capping loop. The escape mutations in the Gn capping loop could alternatively be explained by an allosteric effect on the Gc fusion loop region, and a role in structuring the Gc fusion loop has been previously demonstrated (References 7 and 9). In addition, it is not clear why SNV-24 has no broad neutralizing activity although escape mutations occurred at the highly conserved residues K833 and D822 in Gc domain I.

      . . . it would be important to show viral RNA levels in lungs and kidneys in the lethal ANDV animal model (Fig. 7) to allow for comparison with the prophylaxis from HTNV infection (Fig. 6).

      ANDV does not necessarily cause significant viremia but this challenge model does allow detection of substantial virus load in organs. To monitor virus in organs, a separate animal study would be required with serial euthanasia. All treated animals survived and were kept until day 28. The previous study (DOI: 10.1016/j.celrep.2021.109086) demonstrated that virus was not detected in animals that survived until day 28. Here, we would have to perform another ABSL3 animal experiment with euthanasia and harvest organs at the expected peak for viral replication to confirm this finding. We do not believe repeating such a study is justified at this point, since the key endpoint for the experiment here is survival, and the study provided clear results. Increasing the number of animals in the study so that a subset can be euthanized to collect organs on a specific day makes more sense in a drug discovery effort where a candidate drug is not expected to protect the animals but might have some impact on the virologic endpoint only (e.g., reduce viremia in blood or organs). Thus, we do not believe repeated studies are justified to obtain this additional confirmatory data point.

    1. Author Response

      Reviewer #1 (Public Review):

      Collins et al use mesoscopic two-photon imaging to simultaneously record activity from basal forebrain cholinergic or noradrenergic axons in several distant regions of the dorsal cortex during spontaneous behavior in head-fixed awake mice. They find that activity in axons from both neuromodulatory systems is closely correlated with measures of behavioral state, such as whisking, locomotion and face movements. While axons were globally correlated with these behavioral state-related metrics across the dorsal cortex, they also find evidence of behavioral state independent heterogenous signals.

      The use of simultaneous multiarea optical recordings across a large extent of dorsal cortex with single axon resolution for studying the coherence of neuromodulatory afferents across cortical areas is novel and addresses important questions regarding neuromodulation in the neocortex. The manuscript is clearly written, the data is well presented and, for the most part, carefully analyzed. Parts of the manuscript confirm previous results on the influence of behavioral state on norepinephrine and acetylcholine cortical afferents. However, the observation that these modulations are globally broadcasted to the dorsal cortex while behavioral state independent heterogenous signals are also present in these axons is novel and important for the field.

      While the evidence for a behavioral state driven global modulation of activity in both neuromodulatory systems is quite clear, I have concerns that the apparent heterogeneity in axonal responses might be driven by movement-induced artifacts. Moreover, even in the case that the heterogeneity in calcium activity across axons is confirmed, it might not be driven by differences in spiking activity across neuromodulatory axons as concluded, but by other mechanisms that are not explicitly discussed or considered.

      1) Motion artifacts are always a concern when imaging from small structures in behaving animals. This issue is addressed in the manuscript in Fig 2A-C by comparing axonal responses to "autofluorescent blebs that did not have calcium-dependent activity" (line 1011). Still, as calcium-dependent activity and motion artifacts can both be locked to behavioral variables, the "bleb" selection criterion seems biased and flawed by circular logic. "Blebs" presenting motion-induced changes in fluorescence that may pass as neural activity will be wrongly excluded from the "bleb" control group using this criterion. This will result in an underestimation of the extent of the contamination of the GCaMP signals by movement-induced artifacts. This potential confound might generate apparent heterogeneity across axons and regions, as some axons and some cortical areas might be more prone to movement artifacts than others.

      Thank you for the suggestion. We agree that motion artifacts are a reasonable concern. We rigorously addressed this concern by introducing non-calcium-dependent mCherry into cholinergic cortical axons and demonstrating that motion cannot explain our results (see Fig. 2F, Fig. 4H,L,P, Fig. 4 - figure supplement 1G, Video 3, and response above). These axons were chosen for analysis based solely on their ability to be imaged, in a manner identical to that of GCaMP6s containing axons.

      We agree that the observed evidence of heterogeneity is not as clear as the evidence of a common signal. We now carefully present our evidence. Heterogeneity may arise from variations in activity between single axons that are not explained by a common signal such as behavioral state. Heterogeneity could also be signaled by variations in correlated activity between axons. We now address these two possibilities in our manuscript. Our new analysis reveals that the correlated activity between axons is as expected for axons that are variably correlated to a common signal, such as behavioral state. Although we do find some evidence of correlation outside this common signal, we are not able to discern if this is related to imaging axon segments that are part of the same axon, or if it truly represents an independent signal. This is now stated in the text. On the other hand, strong variations in axonal activity from trial to trial that appear to be separate from the common signal are also prevalent. We now point out this variation as a possible source of heterogeneity. Since we do not know the source or meaning of this heterogeneous activity, we discuss only the possibility that it may hold behaviorally relevant information in these modulatory systems.

      2) In the case that the heterogeneity is indeed due to differences in calcium activity, it might be not due to modularity in spiking activity within the LC or the BF as interpreted and discussed in the manuscript. As calcium signaling in axons not only relates to spiking activity but can also reflect presynaptic modulations, the observed heterogeneity might be due to local action of presynaptic modulators in a context of global identical broadcasted activity. The current dataset does not allow distinguishing which of the two different mechanisms underlies the observed signal heterogeneity.

      It is true that our data set is unable to determine whether presynaptic modulations contribute to any observed heterogeneity. We have adjusted our interpretation of heterogeneity throughout the manuscript and have specifically addressed this comment in the discussion by presenting the possibility that a global signal could be locally modulated.

      Reviewer #3 (Public Review):

      Acetylcholine and Norepinephrine are two of the most powerful neuromodulators in the CNS. Recent developments of new methods allow monitoring of the dynamic changes in the activity of these agents in the brain in vivo. Here the authors explore the relationship between the dynamic changes in behavioral states and those of ACh and NE in the cortex. Since neuromodulatory systems cover most of the cortical tissue, it is essential to be able to monitor the activity of these systems in many cortical areas simultaneously. This is a daunting task because the axons releasing NE and ACh are very thin. To my knowledge, this study is the first to use mesoscopic imaging over a wide range of the cortex at the single axon resolution in awake animals. They find that almost any observable change in behavioral state is accompanied by a transient change in the activity of cortical ACh and NE axonal segments. Whisking is significantly correlated with ACh and NE. The authors also explore the spatial pattern of activity of ACh and NE axons over the dorsal cortex and find that most of the dynamics is synchronous over a wide spatial scale. They look for deviation from this pattern (which I will discuss later). Lastly, the authors monitor the activity of cortical interneurons capable of releasing ACh.

      Comments:

      1) On a broad overview, I find the discussion of behavioral states, brain states, and neuromodulation states quite confusing. To begin with, I am not convinced by the statement that "brain states or behavioral states change on a moment-to-moment basis." I find that the division of brain activity into microstates (e.g., microarousal) is counterproductive. After all, at the extreme, going along this path, we might eventually have an extremely high dimensional space of all neuronal activity, and any change in any neuron would define a new brain state. Similarly, mice can walk without whisking, can whisk without walking, can walk and whisk; are all these different behavioral states? And if so, are they all associated with different brain states? Most importantly, in the context of this manuscript, one would expect that different states (brain, behavior) would be associated with at least four potential states of the ACh x NE system (high ACh and High NE, High ACh and Low NE, etc.). However, the reported findings indicate that the two systems are highly synchronized (or at least correlated), and both transiently go on with any change from a passive state to an active state. Therefore, the manuscript describes a rather confined relationship of the neuromodulation systems with the rather rich potential of brain and behavioral states. Of course, this is only my viewpoint, and the authors are not obliged to accept it, but they should recognize that the viewpoint they take for granted is not shared by all and consider acknowledging it in the manuscript.

      We thank this reviewer for this thoughtful comment. While it is clear that animals do in fact exhibit distinct and clear brain and behavioral states (e.g. sleep, waking, grooming, still, walking, etc.), it is beyond the scope of the present manuscript to attempt to tackle this complex field - rather, we refer the reader to a recent review that we have published on this important topic (McCormick, Nestvogel, and He 2020). We agree that properly delineating brain and behavioral states is of great importance, as it could significantly impact experimental design and interpretation of results. Since all of the relevant substates that a mouse may exhibit have not yet been determined, we decided to use changes in whisking and walking behaviors to differentiate between distinct behavioral states owing to: 1) historical use of these measures in behavioral and neural states in head-fixed mice, 2) relative ease of measurement of these variables, 3) a clearly observable relationship with cholinergic and noradrenergic activity with these measures of behavior, and, arguably most importantly, 4) assumed relevance to the animal (Musall et al. 2019; Reimer et al. 2016; Salkoff et al. 2020; Stringer et al. 2019).

      Our manuscript seeks to simply relate the activity of cholinergic and noradrenergic axons across the dorsal surface of the cortex in comparison to these commonly used measures of spontaneous behavior in head-fixed mice to discern to what relative degree there are common, global signals in these two modulatory systems and how they relate to changes in the measured behaviors. Somewhat surprisingly, previous studies have found that neural activity throughout the dorsal cortex of mice is strongly related to movements of the face and body as well as behavioral arousal (Stringer et al. 2019; Musall et al. 2019; Salkoff et al. 2020). Here we determine to what degree these commonly used measures of “state” are already reflected in the GCaMP6s activity of cholinergic and noradrenergic axons (and local cortical interneurons).

      We agree with the interpretation that our results suggest a confined relationship between spontaneous cholinergic and noradrenergic activity in the cortex within the spontaneous behaviors that we observe. We, by no means, mean to suggest that this confined relationship is the only relationship cholinergic and noradrenergic systems exhibit to each other or to behavior. It seems very likely that in the wide variety of behavior exhibited by freely moving mice in their lifetime, there are times in which the activity of cholinergic and noradrenergic systems exhibit a radically different relationship to each other and to behavior. We simply cannot know this without experimental examination. We now mention this possibility in the discussion and give a few appropriate references.

      2) Most of the manuscript (bar one case) reports nearly identical dynamics of ACh and NE. Is that a principle? What makes these systems behave so similarly? Why have two systems that act nearly the same? Still, if there is a difference, it is the time scale of the ACh compared to the NE. Can the authors explain this difference or speculate what drives it?

      Perhaps one of the most striking findings in recent years from examination of mouse brain activity is the prominence and prevalence of a general signal in nearly all neural systems that relates to movement and arousal of the animal (Stringer et al. 2019; Salkoff et al. 2020). Here we report that this signal is also strongly present within the cholinergic and noradrenergic systems. Perhaps this is unsurprising, since everywhere one looks, one finds this global signal. However, we feel that understanding the presence and nature of this large signal is critical to deciphering behavior-related signals in these systems in the future. We discuss this point in the discussion. The one difference we did find is in the more transient nature of NE axonal activity versus both behavior and cholinergic axon activity. We now speculate on this difference in the discussion.

      3) Whisker activity explains most strongly the neuromodulators dynamics, but pupil dilation almost does not (in contrast to many previous reports including reports of the same authors). If I am not mistaken, this was nearly ignored in the presentation of the results and the discussion section. Could the author elaborate more on what is the reason for this discrepancy?

      We apologize for the misleading presentation of our results. In Fig. 3C and D it is clear that pupil diameter is highly coherent with both cholinergic and noradrenergic axon activity, as published previously. In the present study, this coherence peaks at 0.4 to 0.5 for both. In our previous study (Reimer et al. 2016), the cholinergic activity also peaked in coherence at low frequencies at around 0.4 to 0.5 (Reimer et al., Fig. 1H) while the noradrenergic activity coherence peaked at 0.6 to 0.7. The present study was not optimized for pupil diameter examination, since we kept the light levels as low as possible (resulting in low dynamic range of pupil dilations since they were nearly always enlarged to near maximum) in order to increase the S/N of cortical axon activity. We now mention these similarities and differences and caveats in the manuscript. An additional important point is that the kinetics of pupil diameter changes are slow in comparison to whisker movements, reducing the ability of pupil dilation to accurately track changes in axonal activity at frequencies greater than approximately 0.2 Hz (Fig. 2 - figure supplement 2). This is now mentioned in the text.

      4) I find the question of homogenous vs. heterogenous signaling of both the ACh and NE systems quite important. It is one thing if the two systems just broadcast "one bit" information to the whole brain or if there are neuromodulation signals that are confined in space and are uncorrelated with the global signal. However, the way the analysis of this question is presented in the manuscript is very difficult to follow, and eventually, the take-home message is unclear. The discussion section indicates that the results support that beyond a global synchronized signal, there is a significant amount of heterogeneous activity. I think this question could benefit from further analysis. I suggest trying to demonstrate more specific examples of axonal ROIs where their activity is decorrelated with the global signal, test how consistent this property is (for those ROIs), and find a behavioral parameter that it predicts.

      Also, in the discussion part, I am missing a discussion of the potential mechanism that allows this heterogeneity. On the one hand, an area may receive NE/ACh innervation from different BF/LC neurons, which are not completely synchronized. But those neurons also innervate other areas, so what is the expected eventual pattern? Also, do the results support neuromodulation control by local interneuron circuits targeting the axons (as is the case with dopaminergic axons in the Basal Ganglia)?

      Our results clearly demonstrate a robust global signal that is common across cholinergic and noradrenergic axons and is related to behavioral state. We have less strong, but still present, evidence for a heterogeneous signal in addition to this global signal. This evidence is based largely upon the large variation in activities in different axon segments during behavioral events that appear similar. This result suggests that the axon segments we monitored do not all act as if they are members of the same axon. We now discuss the strong evidence for the global signal present in our data, and leave open the possibility of a heterogeneous signal whose mechanisms and importance remain to be determined.

      5) The axonal signal seems to be very similar across the cortex. I am not sure this is technically possible, but given that NE axons are thin and non-myelinated and taking advantage of the mesoscopic scale, could the author find any clue for the propagation of the signal on the rostral to caudal axis?

      We were unable to detect propagation across the cortical sheet and believe this is beyond the scope of the present study.

      6) While the section about local VCIN is consistent with the story, it is somewhat of a sidetrack and ends the manuscript on the wrong note. I leave it to the authors to decide but recommend that they reconsider if and where to include it. Unfortunately, the attached figure was of very poor resolution, and I could not look into the details, so I am afraid that I could not review this section properly.

      We believe this adds to the manuscript and therefore have decided to include this data.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the authors aim to identify the cell state dynamics and molecular mechanisms underlying melanocyte regeneration in zebrafish. By analyzing thousands of single-cell transcriptomes over regeneration in both wild-type and Kit mutant animals, they provide thorough and convincing evidence of (1) two paths to melanocyte regeneration and (2) that Kit signaling, via the RAS/MAPK pathway, is a key regulator of this process. Finally, the authors suggest that another proliferative subpopulation of cells, expressing markers of a separate pigment cell type, constitutes an additional population of progenitors with the ability to contribute to melanocytes. The data supporting this claim are not as convincing, and the authors failed to show that these cells did indeed differentiate into melanocytes. Despite the challenges of describing this third cell state, this study offers compelling new findings on the mechanisms of melanocyte regeneration and provides paths forward to understanding why some animals lack this capacity.

      The majority of the main conclusions are well supported by the data, but one claim, in particular, should be revisited by the authors.

      (1) Provided evidence that the aox5(hi)mitfa(lo) population of cells contributes to melanocyte regeneration is inconclusive and somewhat circumstantial. First, the transcriptional profiles of these cells are much more consistent with the xanthophore lineage. Indeed, xanthophores have been shown to express mitfa (in embryos in Parichy, et al. 2003 (PMID: 10862741), and in post-embryonic cells in Saunders, et al. 2019). Second, while the authors address this possibility in Supplemental figure 7 by showing that interstripe xanthophores fail to divide following melanocyte ablation, they fail to account for the stripe-resident xanthophores/xanthoblasts. The presence and dynamics of aox5+ stripe-resident xanthophores/xanthoblasts are detailed in McMenamin, et al., 2014 (PMID: 25170046) and Eom, et al., 2015 (PMID: 26701906). Without direct evidence that the symmetrically-dividing, aox5+ cells measured in this study do indeed differentiate into melanocytes, it is more likely that these cells are a dividing population of xanthophores/xanthoblasts. The authors should revise their claims accordingly.

      We agree with the editor and reviewers that the identities of the mitfa+aox5hi cells and the interplay between these cells and the mitfa+aox5lo cells is a fascinating, and originally unexpected, aspect of this manuscript. The issue, as we see it, is whether mitfa+aox5hi cells that arise via cell division during regeneration are multipotent pigment cell progenitors or ‘cryptic’ xanthophores. The experiments we have performed to address this ambiguity have not worked for technical reasons, so we have tempered text in the relevant Results and Discussion sections to leave both options open. We have backed off from calling these cells progenitors but have included additional data showing that they (i.e. the mitfa+aox5hi subpopulation of cells that we believe are daughters of mitfa+aox5hi cycling cells) express multiple markers associated with multipotent pigment cell progenitors that have been characterized in developing zebrafish. Our expanded Discussion is as follows:

      “Heterogeneity may also be evident by the additional mitfa+aox5hi G2/M adj subpopulation that likely arises via cell divisions during regeneration. There are reasons to think that this could be a progenitor subpopulation. Firstly, these cells arose in response to specific ablation of melanocytes. Secondly, this subpopulation expresses markers that are associated with multipotent pigment progenitor cells found during development (Budi, et al., 2011; Saunders, et al., 2019). Thirdly, although this subpopulation expresses aox5 and some other markers associated with xanthophores, we showed that differentiated xanthophores are not ablated by the melanocyte-ablating drug neocuproine and this mitfa+aox5hi subpopulation does not make new pigmented xanthophores following neocuproine treatment. However, current observations cannot definitively determine the potency and fates adopted by these cells. One possibility is that these cells are indeed progenitors that arise through cell divisions, are in an as yet undefined way lineally related to MP-0 and MP-1 subpopulations, and ultimately give rise to new melanocytes during additional rounds of regeneration. Given their expression of markers associated with multipotent pigment cell progenitors, these cells could be multipotent but fated toward the melanocyte lineage following melanocyte-specific ablation. However, we cannot exclude the possibility that these cells are another cell type. For example, there is a type of partially differentiated xanthophores that populate adult melanocyte stripes (McMenamin, et al., 2014). At least some of these cells arise from embryonic xanthophores that transitioned through a cryptic and proliferative state (McMenamin, et al., 2014). That the descendants remain partially differentiated could indicate that they are in more of a xanthoblast state and maintain proliferative capacity (Eom, et al., 2015). It is possible that some or all of the cells in question are melanocyte stripe-resident, partially-differentiated xanthophores that arise: a) from cell divisions that are triggered by loss of interactions with melanocytes or, b) simply to fill space that is vacated due to melanocyte death. Such causes for partially-differentiated xanthophore divisions have not been documented, but nonetheless this possibility must be considered given the mitfa and aox5 expression and proliferative potential of these cells. Transcriptional profiles of ‘cryptic’ xanthophores are not available to help clarify the nature of these cells. Lastly, the relationship between adult progenitor populations – MP-0, MP-1 and, potentially, mitfa+aox5hi G2/M adj – and other progenitors present at earlier developmental stages is unclear and could be defined through additional long-term lineage tracing studies. In particular, previous examinations of pigment cell progenitors in developing zebrafish have identified dorsal root ganglion-associated pigment cell progenitors in larvae that contribute to adult pigmentation patterns (Singh, et al., 2016; Dooley, et al., 2013; Budi, et al., 2011). It is possible that these cells give rise to the adult progenitors we have identified. The further alignment of cell types that have been observed in vivo and cell subpopulations defined through expression profiling is a necessary route for understanding the complex relationship between stem and progenitor cells in development, homeostasis, and regeneration.”

      (1) At line 140, it is noted that Xanthophores are pteridine-producing, but they also get their yellow color from carotenoids (especially in adults). This should be noted as well, especially since the authors display the xanthophore marker, scarb1, which plays a key role in xanthophore carotenoid coloration.

      [Mapping expression levels onto UMAP space for scarb1 and perhaps other markers of xan, irid, or proliferation would be helpful as a supplement to the dot plot in Fig 1 and could help to clarify the transcriptomic signature of mitfa+ aox5-hi cells and plausibility of the model that they are an McSC population. -Parichy]

      We thank the reviewer for the suggestion, and we have changed the text to include the carotenoid coloration facts of xanthophores as follows:

      “aox5 is expressed in differentiated xanthophores, a pteridine- and carotenoid-producing pigment cell type of zebrafish, and in some undifferentiated pigment progenitor cells”

      Additionally, we have also added a new Figure Supplement to Figure 1 (Figure 1 – figure supplement 3) with feature plots demonstrating the expression of xanthophore markers scarb1 and bco2b, iridophore markers lypc and cdh11, and proliferation markers pcna and mki67. As noted above, there is some heterogeneity within the large grouping of mitfa+aox5hi cells. Whereas some markers associated with xanthophores are broadly expressed in this grouping (e.g. scarb1), others have more restricted expression (e.g. bco2b). The heterogeneity could reflect multiple differentiation states of xanthophores, multiple types of differentiated xanthophores, xanthophore progenitors and/or less fate-restricted pigment cell progenitors that cluster in this grouping.

      (2) The authors should provide the list of genes that comprise their cluster signatures (line 252) as part of the supplementary tables.

      We have now included a table of genes in the cluster signatures. The Supplementary Table is called “Supplementary File 2.”

      (3) The authors should more clearly describe how they performed lineage tracing (line 339). Additionally, for the corresponding figure 4E, the authors should list the number of cells traced. The source data only contains calculated percentages rather than counts for each type of differentiation. My understanding is that the number listed in the figure legend is the number of fish (i.e. n = 4), but this should be clarified as well.

      [A supplementary figure of labeled cells is important here with enough context to show that cells can be re-identified unambiguously. Additionally note that "lineage tracing" will typically be assumed to mean single-cell labeling and tracking, so if that is not the case for these experiments it would be preferable to use an alternative descriptor. -Parichy]

      We have included additional detail in our revised manuscript. In Figure 4E we now include the number of cells imaged and have included a breakdown of the raw numbers in the Source Data. We have also included Supplementary Animations as examples of the single-cell tracing that we perform through serial imaging.

      Additionally, the point about using ‘lineage tracing’ is well taken. We have replaced this with ‘serial imaging’ throughout the text.

      (4) At line 321, the authors list the mean regeneration percentages for the kita and kitlga(lf) mutants, but these are not significantly different according to Figure 4B. By listing the means (which should be noted), the authors seem to be highlighting the differences but then do not comment on them. The description and integration of this result into the main text should be clarified.

      We have changed the wording in the text to clarify that the mean percentage is being listed. We have also reworded the text to de-emphasize the mean percentage difference between kita(lf) and kitlga(lf) mutants, instead highlighting that their defects are similar. In the figure legend we have clarified that the mean percentage regeneration is being shown.

      (5) In Figure 6E, the RNA-velocity result is not particularly consistent with the authors' claims. Visually, the arrows seem fairly randomly directed. The data in 6B, showing gene expression associated with the S phase and G2/M phase much more clearly convey the directionality of the loop (S phase, followed by G2/M). I suggest that the authors weaken their claim about the RNA-velocity result or remove it altogether and focus on the cell cycle-related gene expression signatures.

      We thank the reviewer for their careful eye here. We have decided to remove the RNA-velocity result previously displayed in Figure 6E. As the reviewer points out the results are more clearly demonstrated by Figure 6B.

    1. Author Response

      Reviewer #1 (Public Review):

      This study addresses the role of the general transcription factor TBP (TATA-binding protein), a subunit of the TFIID complex, in RNA polymerase II transcription. While TBP has been described as a key component of protein complexes involved in transcription by all three RNA polymerases, several previous studies on TBP loss of function and on the function of its TRF2 and TRF3 paralogues have questioned its essential role in RNA polymerase II transcription. This new study uses auxin induced TBP degradation in mouse ES cells to provide strong evidence that its loss does not affect ongoing polymerase II transcription or heat-shock and retinoic acid-induced transcription activation, but severely inhibits polymerase III transcription. The authors coupled TBP degradation with TRF2 knock out to show that it does not account for the residual TBP-independent transcription. Rather the study provides evidence that TFIID can assemble and is recruited to promoters in the absence of TBP.

      All together the study provides compelling evidence for TBP-independent polymerase II transcription, but a better characterization of the residual TFIID complex and recruitment of other general transcription factors to promoters would strengthen the conclusions.

      We thank the reviewer for their accurate summary of our findings and the public assessment of our manuscript.

      Reviewer #2 (Public Review):

      The paper is intriguing, but to me, a main weakness is that the imaging experiments are done with overexpressed protein. Another is that the different results for the different subunits of TFIID would indicate that there are multiple forms of TFIID in the nucleus, which no one has observed/proposed before. Otherwise, the experimental data would have to be interpreted in a more nuanced way. Additionally, there is no real model of how a TBP-depleted TFIID would recruit Pol II. Do the authors suggest that when TBP is present, it is not playing a role in Pol II transcription, despite being at all promoters? Or that in its absence an alternative mechanism takes over? In the latter case, are they proposing that it is just based on the rest of TFIID? How? The authors do not provide a mechanistic explanation of what is actually happening and how Pol II is being recruited to promoters.

      We thank the reviewer for their public review of our manuscript. The reviewer poses many interesting questions raised by our findings, which would be a great focus for future work.

      We agree that our imaging experiments using over-expressed constructs have limitations. Though they provide insight that is unique and orthogonal to the genomics analyses, we agree that they are still preliminary, and therefore we have removed them from the manuscript, with the hope of further developing these experiments into a follow-up manuscript.

      While we cannot exclude different forms of TFIID in the cell, previous studies have identified different TAF-containing complexes. Indeed, we referenced several of these studies in our manuscript, including TFTC/SAGA. Furthermore, in our Discussion section, we speculated how a large multi-subunit complex like TFIID may not behave as a monolith but rather have distinct dynamics/behavior among the subunits. Some studies are now revealing that biochemically defined complexes behave more as a hub, with subunits having distinct dynamics coming in and out of the complex, but in a way such that a snapshot at any given time would show a stably formed complex.

      What TBP does for Pol II is an intriguing question, and one that we had thought we could answer with our rapid depletion system. One possibility is that Pol II initiation has evolved to have so many redundant mechanisms such that removal of one factor (TBP) would not disrupt the whole system. And yet, TBP remains a highly essential gene (perhaps mostly for its essential role in Pol III transcription), and therefore, its binding to Pol II gene promoters has been maintained, almost in a vestigial way. Of course, this is speculative, and our rapid depletion system only shows us that TBP is not required for Pol II transcription, not what it does when it binds to promoters.

      Lastly, we believe that our study tested 3 potential mechanisms that could explain TBP-independence for Pol II transcription. 1) We tested the possibility that TBP is only needed for induction and not for subsequent re-initiation. We provide evidence using two orthogonal induction systems that this is not the case. 2) We tested whether the TRF2 paralog could functionally replace TBP, and show that this is also not the case. 3) We show that TFIID can form in the absence of TBP. While we agree that there are more mechanisms to test, addressing all of them would require a re-examination of over 50 years of research that would not be feasible to report in a single manuscript, especially for a system as complex as Pol II initiation.

      Reviewer #3 (Public Review):

      In this study, the authors set out to study the requirement of the TATA binding protein (TBP) in transcription initiation in mESCs. To this end they used an auxin inducible degradation (AID) system. They report that by using the AID-TBP system after auxin degradation, 10-20% of TBP protein is remaining in mESCs. The authors claim that, as the observed 80-90% decrease of TBP levels is not accompanied by global changes in RNA polymerase II (Pol II) chromatin occupancy or nascent mRNA levels, TBP is not required for Pol II transcription. In contrast, they find that under similar TBP-depletion conditions tRNA transcription and Pol III chromatin occupancy were impaired. The authors also asked whether the mouse TBP paralogue, TBPL1 (also called TRF2) could functionally replace TBP, but they find that it does not. From these and additional experiments the authors conclude that redundant mechanisms may exist in which TBP-independent TFIID like complexes may function in Pol II transcription.

      The major strengths of this manuscript are the numerous genome-wide investigations, such as many different CUT&Tag experiments, and NET-seq experiments under control and +auxin conditions and their analyses. Weaknesses lie in some experimental setups (i.e. overexpression of Halo-tagged TAFs), mainly in the overinterpretation (or misinterpretation) of the data and in the lack of a fair discussion of the obtained data in comparison to observations described in the literature. As a result, very often the interpretation of data does not fully support the conclusions. Nevertheless, the findings that 80-90% decrease in cellular TBP levels do not have a major effect on Pol II transcription are interesting, but the manuscript needs some tuning down of many of the authors' very strong conclusions, correcting several weaker points and with a more careful and eventually more interesting Discussion.

      We thank the reviewer for their public review of our manuscript. We would like to add that, in addition to testing the TBP paralog for redundancy, we also tested a mechanism in which TBP would be required for the initial round of transcription but not for subsequent ones. We show with data from orthogonal experiments that this is not the case. As in our response to Reviewer 2, we agree that our over-expression imaging experiments are still somewhat preliminary, and therefore we have removed these experiments and potential over/misinterpretation of these results from the manuscript.

    1. Author Response

      Reviewer #3 (Public Review):

      Dominant pathogenic variants of the Aac2/Ant1 ATP transporter cause disease by an unknown mechanism. In this manuscript the authors aim to reveal how these gain of function mutants impair cellular and mitochondrial health. To characterize the phenotype of Aac2 mutants in yeast, the authors use a series of single and double Aac2 mutations, within the 2nd and 3rd transmembrane domains that are associated with human diseases. The Aac2A128P,A137D mutant, which caused high toxicity and damaged the mitochondrial DNA, was selected for further analysis. This mutant was not imported efficiently into mitochondria and exhibited an increased association with TOM, suggesting that it clogs the TOM translocase. As a result, expression of Aac2A128P,A137D led to impaired import of other mitochondrial proteins. Several findings suggested that the single mutant Aac2A128P impaired mitochondrial import in a similar manner: 1. Mass spec analysis revealed its increased association with cytosolic chaperones, TOM and TIM22 subunits, 2. Aac2A128P overexpression led to global mitochondrial protein import deficiency, demonstrated by HSP60 precursor accumulation and activation of stress responses (transcription of chaperones, proteasome induction, and CIS1). Parallel mutants of human Ant1 (Ant1A114P and Ant1A114P,A123D) were ectopically expressed in HeLa cells. The mutants were demonstrated to clog TOM and cause a global defect in mitochondrial protein import. This was confirmed in tissues from Ant1A114P,A123D/+ knock-in mice. The Ant1A114P,A123D/+ mice exhibited decreased maximal mitochondrial respiration in muscles. Examination of the skeletal muscle myofiber diameter and COX and SDH activity revealed that Ant1A114P,A123D expression in heterozygous mice acts dominantly and causes a myopathic phenotype and in some cases neurodegeneration.

      Major strengths -

      The ability of proteins to clog TOM and sequentially disrupt protein import into mitochondria was demonstrated in recent years. However, until now this was achieved using chemicals, artificial cloggers and overexpression of mitochondrial proteins. This study reveals, for the first time, that disease associated variants of native mitochondrial proteins can clog the entry into the organelle. Thus, this work demonstrates that TOM clogging is a physiologically relevant phenomenon that is involved in human diseases.

      The manuscript is well-written and the experiments are well-designed, presenting convincing data that mostly support the conclusions. The methods used are well-established and suitable techniques that are often used in the field. This work took advantage of 3 different biological systems/model organisms (yeast, cell culture, and mouse tissues) to validate the results, show conservation, and exploit the strengths of each system.

      Overall, this study is impactful, greatly contributes to the field and should be of interest to the general scientific community. The work sheds light on the mechanisms by which Ant1 pathogenic mutants impact cellular health and provides evidence for the involvement of translocase clogging and impaired protein import in human diseases. The gain of function Aac2/Ant1 mutants will provide a new and powerful tool for future studies of mitochondrial quality control and repair mechanisms.

      Major weaknesses -

      1) The evidence for clogging of mitochondrial translocases and for a general defect in protein import is solid. However, there is not enough evidence to conclude that all phenotypes seen in mice and yeast are directly connected to clogging.

      We completely agree with the reviewer that it is unreasonable to ascribe all phenotypes seen in mice and yeast directly to clogging. We are very open to the possibility that other unknown mechanisms contribute as well. The language in the manuscript has been modified to reflect this.

      2) This work implies that Aac2/Ant1 variants can clog TOM, TIM22, or both. Clogging of TIM22 is novel and interesting but is not fully discussed in the manuscript, as well as the possibility that clogging of different translocases can result in different defects.

      We thank the reviewer for this comment, and have directly addressed this in the revised manuscript. We added some speculation but overall, we prefer to keep this brief because the precise mechanism of carrier protein import and IMM insertion by the TIM22 complex remains unresolved, making an extensive discussion on its clogging premature.

    1. Author Responses

      Reviewer #1 (Public Review):

      This work aimed at investigating how BMI decoding performance is impacted by changing the conditions under which a motor task is performed. The authors recorded motor cortical activity using multielectrode arrays in two monkeys executing a finger flexion and extension task in four conditions: normal (no load, neutral wrist position), loaded (manipulandum attached to springs or rubber bands to resist flexion), wrist (no load, flexed wrist position) or both (loaded and flexed wrist). They found, as expected, that BMI decoders trained and tested on data sets collected during the same conditions performed better at predicting kinematics and muscle activity than others trained and tested across conditions. They also report that the performance of monkeys in a BMI task involving the online control of a virtual hand was almost unaffected by changing either the actual manipulandum conditions as above or switching between decoders trained from data collected under different conditions. As for the neuronal activity, they found a mix of changes across task contexts. Interestingly, a principal component analysis revealed that activity in each context falls within well-aligned manifolds, and that the context-dependent variance in neuronal activity strongly correlated to the amplitude of muscle activity.

      Strengths

      The current study expands on previous findings about BMI decoders generalizability and contributes scientifically in at least three important ways.

      First, their results are obtained from monkeys performing a fine finger control task with up to two degrees of freedom. This provides a powerful setting to investigate fine motor control of the hand in primates. The authors use the accuracy of BMI decoders between data sets as a measure of stationarity in the neurons-to-fingers mapping, which provides a reliable assessment. They show that changes in wrist angle or finger load affect the relationship between cortical neurons and otherwise identical movements. Interestingly, this result holds up for both kinematics and muscle activity predictions, albeit being stronger for the latter.

      Second, their results confirming that neuronal activity recorded during different task conditions lies effectively within a common manifold is interesting. It supports prior observations, but in the specific context of finger movements.

      Third, the dPCA results provide interesting and perhaps unexpected information about the fact that the amplitude of muscle activity (or force) is clearly present in the motor cortical activity. This is possibly one of the most interesting findings because extracting a component from neural activity that can be related robustly to muscle activity across contexts would provide great benefits to the development of BMIs for functional electrical stimulation.

      Overall, the analyses are well designed and the interpretation of the results is sound.

      Weaknesses

      I found the discussion about the possible reasons why offline decoders are more sensitive to context than online decoders very interesting. Nonetheless, as the authors recognize, the possibility that the BMI itself causes a change in context, "in the plant", limits their interpretation. It could mean for the monkeys to switch from one suboptimal decoder to another, causing a ceiling effect occluding generalization errors.

      Overall, several new and original results were obtained through these experiments and analyses. Nonetheless, I found it difficult to extract a clear unique and strong take-home message. The study comes short of proposing a new way to improve BMIs generalizability or precisely identifying factors that influence decoders generalizability.

      We thank the reviewer for the positive comments. Relating these results to BMI design and interpreting the adaptation to contexts during online trials comprised a bulk of the essential revisions from the eLife editorial staff. More details can be found in common response #2 and essential revisions #1-3. To summarize, we added an analysis of neural activity during online trials to provide insight into how the monkeys were adapting. We have expanded the discussion of online adaptation, as detailed in essential revision #2. We also expanded discussion of how both the online and offline results might affect BMI design, as detailed in essential revision #3.

      Reviewer #2 (Public Review):

      The authors motivate this study by the medical need to develop brain-machine interfaces (BMIs) to restore lost arm and hand function, for example through functional electrical stimulation. More specifically, they are interested in developing BMI decoding algorithms that work across a variety of "contexts" that a BMI user would encounter out in the real world, for example having their hand in different postures and manipulating a variety of objects. They note that in different contexts, the motor cortex neural activity patterns that produce the desired muscle outputs may change (including neurons' specific relationship to different muscles' activations), which could render a static decoder trained in a different context inaccurate.

      To test whether this potential challenge is indeed the case, this study tested BMI control of virtual (onscreen) fingers by two rhesus macaques trained to perform 1 or 2 degree-of-freedom non-grasping tasks either by moving their fingers, or just controlling the virtual finger kinematics with neural activity. The key experimental manipulations were context shifts in the form of springs on the fingers or flexion of the wrist (or both). BMI performance was then evaluated when these context changes were present, which builds on this group's previous demonstration of accurate finger BMI without any context shifts.

      The study convincingly shows the aforementioned context shifts do cause large changes in measured firing rates. When neural decoding accuracy (for both muscle and position/velocity) is evaluated across these context changes, reconstruction accuracy is substantially impaired. The headline finding, however, is that despite this, BMI performance is, on aggregate, not substantially reduced. It is noteworthy, though, that in a second experiment paradigm where the decoder was trained on the spring or wrist-manipulated context and tested in a normal context, there were quite large performance reductions in several datasets as quantified by multiple performance measures; this asymmetry in the results is not really explored much further. The changes in neural activity due to context shifts appear to be relatively modest in magnitude and can be fit well as simple linear shifts (in the neural state space), and the authors posit that this would make it feasible (in future work) to find context-invariant neural readouts that would result in more robust muscle activity decoders.
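The reasoning in the paragraph above, that a simple linear shift in neural state space can degrade a fixed offline decoder and that a context-invariant readout could remove the problem, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the single simulated channel, the `GAIN`, `BASELINE`, and `SHIFT` values, and the mean-centering step are hypothetical and are not the authors' data, decoder, or analysis.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def mse(x, y, slope, intercept):
    """Mean squared error of the linear decoder on (x, y)."""
    return mean([(slope * xi + intercept - yi) ** 2 for xi, yi in zip(x, y)])

def center(x):
    """Subtract the context mean, one simple way to remove a constant offset."""
    m = mean(x)
    return [xi - m for xi in x]

# Hypothetical toy data: one channel whose rate is linear in finger velocity.
velocity = [-2.0, -1.0, 0.0, 1.0, 2.0]
GAIN, BASELINE, SHIFT = 3.0, 10.0, 4.0          # assumed values
rate_normal = [GAIN * v + BASELINE for v in velocity]
rate_spring = [GAIN * v + BASELINE + SHIFT for v in velocity]  # shifted context

# Decoder trained in the normal context, tested within and across contexts.
slope, intercept = fit_line(rate_normal, velocity)
within_mse = mse(rate_normal, velocity, slope, intercept)
across_mse = mse(rate_spring, velocity, slope, intercept)

# Decoder trained and tested on mean-centered rates ignores the constant shift.
slope_c, intercept_c = fit_line(center(rate_normal), velocity)
recentered_mse = mse(center(rate_spring), velocity, slope_c, intercept_c)

print(within_mse, across_mse, recentered_mse)
```

In this noiseless toy case the across-context error comes entirely from the constant offset, so removing the offset restores the fit. Real recordings would add noise, many channels, and context effects that need not be a pure shift.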

      An additional novel contribution of this study is showing that these motor cortical signals support quite accurate decoding of muscle activations during non-prehensile finger movements (and also that the EMG decoding was more negatively affected by context shifts than kinematics decoding); previous work decoded finger kinematics but not these kinetics. Note that this was demonstrated with just one of the two monkeys (the second did not have muscle recordings).

      This is a rigorous study, its main results are well-supported, and it does not make major claims beyond what the data support.

      One of its limitations is that while the eventual motivating goal is to show that decoders are robust across a variety of tasks of daily living, only two specific types of context shifts are tested here, and they are relatively simple and potentially do not result in as strong a neural change as could be encountered in real-world context shifts. This is by no means a major flaw (simplifying experimental preparations are a standard and prudent way to make progress). But the study could point out a bit more prominently that its results do not preclude that more challenging context shifts will be encountered by BMI users, and this study in its current form does not indicate how strong a perturbation the tested context shifts are relative to the full possible range of hand movement context shifts that would be encountered during human daily living activities.

      A second limitation is that while the discrepancy between large offline decoding performance reduction and small online performance reduction are attributed to rapid sensorimotor adaptation, this process is not directly examined in any detail.

      Third, the assessment of how neural dynamics change in a way that preserves the overall shape of the dynamics is qualitative rather than quantitative, and the implementation of a more context-agnostic finger BMI is left for future work.

      We thank the reviewer for the positive comments. We agree that the paper could discuss how this work impacts a wider range of movements and we now include more discussion to that point as detailed in the responses to feedback below. We also acknowledge that the paper did not directly examine online adaptation and we have now included an analysis aimed at answering how the monkeys adapted to the context changes during online tasks.

      Reviewer #3 (Public Review):

      In this manuscript the authors ask whether finger movements in non-human primates can be predicted from neural activity recorded from the primary motor cortex. This question is driven by an ultimate goal of using neural decoding to create brain-computer interfaces that can restore upper limb function using prosthetics or functional electrical stimulation systems. More specifically, since functional use of the hand (real or prosthetic) will ultimately require generating very different grasp forces for different objects, these experiments use a constant set of finger kinematics, but introduce different force requirements for the finger muscles using several different techniques. Under these different conditions (contexts), the study examines how population neural activity changed and uses decoder analyses to look at how these different contexts affect offline predictions of muscle forces and finger kinematics, as well as the animals' ability to use different decoders to control 1 or 2-DOF online. In general, the study found that when linear models were trained on one context from offline data, they did not generalize well to the other context. However, when performance was tested online (monkeys controlling a virtual hand in real time using neural activity related to movement of their own hands) with a ReFIT Kalman filter, the animals were able to complete the task effectively, even with a decoder trained without the springs or wrist perturbation. The authors show data to support the idea that neural activity was constrained to the same manifold in the different contexts, which enabled the animals to rapidly change their behavior to achieve the task goals, compared to the more complex requirement of having to learn entirely new patterns of neural activity. This work takes studies that have been conducted for upper-limb movements and extends them to include hand grasp, which is important for creating decoders for brain-computer interfaces. 
      Finally, the authors show that dPCA can extract features during changes in context that may be related to the activity of specific muscles and that would allow for improved decoders.

      Strengths

      The issue of hand control, and how it compares to arm control, is an important question to tackle in sensorimotor control and in the development of brain-computer interfaces. Interestingly, the experiments use two very different ways of changing the muscle force requirements for achieving the same finger movements; springs attached to a manipulandum and changes in wrist posture. Using both paradigms the decoder analysis clearly shows that linear models trained without any manipulation do not predict muscle forces or finger kinematics well, clearly illustrating the limitations of common linear decoders to generalize to scenarios that might encompass real grasping activities that require forceful interactions. Using a well-described real-time decoder (ReFIT Kalman Filter), the authors show that this performance decrease observed offline is easily overcome in online testing. The metrics used to make these claims are well-described, and the likely explanations for these findings are described well. A particular strength of this manuscript is that, at least for these relatively simple movements and contexts, a component of neural activity (identified using dPCA) is identified that is significantly modulated by the task context in a way that sensibly represents the changes in muscle activity that would be required to complete the task in the new contexts.

      We thank the reviewer for the positive comments.

      Weaknesses

      The differences between exemplar data sets and comprehensively tested contexts were difficult to follow. There are many references to how many datasets or trials were used for a particular experiment, but overall, this is fragmented across the manuscript. As a result, it is difficult to assess how generalizable the results of the manuscript were across time or animal, or whether day-to-day variations, or the different data collection schedules had an effect.

      Thank you for the comment. We have added the number of sessions to the results in multiple places throughout the paper. For example, starting at line 274 in the results:

      "During these 10 sessions the context changes were tested 15 times: four times for the wrist context, seven times for the spring context, and four times for the combined wrist and spring context."

      The introduction allocates a lot of space to discussing the concepts of generating (computing) movements as opposed to representing movements and relates this to ideas of neural dynamics. The distinction between these as described in the introduction is not very clear, nor is it clear what specific hypothesis this leads to for these experiments. Further, this line of thinking is not returned to in the discussion, so the contribution of these experiments to ideas raised in the introduction are unclear.

      Thank you for the comment. We have written a new paragraph relating these results to the concept of generating movement. Starting at line 452 of the discussion:

      "During the offline tasks, many channels changed neural activity with context, with 20.9% to 61.7% of tuned SBP channels modulating activity with context (Table I). The magnitude of these shifts were relatively small, especially when compared to the large changes in required muscle activation (Figure 2D-E), with weak trends to require greater activation for resisted flexion and lesser for assisted extension (Figure 7B-C). Additionally, the neural manifolds underlying movements in each context were well-aligned (Figure 7D). Using dPCA we found that while a large proportion of neural variance was explained by dPCA components that did not change with context, a significant proportion of the neural variance is associated with components that are context-dependent (Figure 8B). Visually, the context components are shifting the trajectories without changing the overall shape and the shift in neural activity is strongly correlated with muscle activations in new contexts (Figure 8C). This agrees with other studies which found lower variance activity may be related to the actual motor commands (Gallego et al., 2018; Russo et al., 2018; Saxena et al., 2022)."

      The complexity of the control that was possible in this task (1 or 2 DOF finger flexion/extension) was low. Further, the manipulations that were used to control context were simple and static. Both these factors likely contribute to the finding that there was little change in the principal angles of the high-variance principal components. While this is not a criticism of the specific results presented here, the simplicity of the task and contexts, contrasted with the complexity of hand control more generally, especially for even moderately dexterous movements, makes it unclear how well the finding of stable manifolds will scale. On a related point, it is unclear whether the feature, identified using dPCA, that could account for changes in muscle activity, could be robustly captured in more realistic behaviors. It is stated that future work is needed, but at this point, the value of identifying this feature is highly speculative.

      Thank you for the comment. We have included more discussion to relate these results to decoder development in general, as described in essential revision #3 from the editor.

      The maintained control in online BMI trials could also be explained by another factor, which I don't think was explicitly described by either of the two suggestions. Prism goggle experiments introduce a visual shift that can be learned quickly, and some BCI experiments have introduced simple rotations in the decoder output (e.g. Chase et al. 2012, J Neurophys). This latter case is likely similar in concept to in-manifold perturbations. Regardless, the performance can be rapidly rescued by simply re-aiming, which is a simple behavioral adaptation. In a 1DOF or 2DOF control case like that used in these experiments, with constant visual feedback on performance, the change in context could likely be rapidly learned by the animals, maybe even within a single trial. In other words, the high performance in the online case may be a consequence of the relatively simple task demands, and the simple biomechanical solution to this problem (push harder). What is the expectation that the results seen in these experiments would be relevant to more realistic situations that require grasp and interaction?

      Thank you for the suggestion. We agree that the quick adaptation is likely related to re-aiming. To this end, we have included a re-aiming analysis, as described in essential revisions #1 and #2 from the editor and common response #2, to look into the quick adjustment.

      Some of the figures were difficult to read and the captions contained some minor incorrect information. The primary purpose of some of the figures was not immediately clear from the caption. For example, the bar plots in Figures 5 and 6 were very small and difficult to read. This also made distinguishing the data from the two different animals challenging.

      Thank you for the comments. Multiple figures have been edited to increase legibility, and the text has been reviewed to fix errors and improve interpretability.

      There is no specific quantification of the data in Figures 4D and 5D. In Figure 4D it seems apparent that the vast majority of the points are below the unity line. But, it remains unclear, particularly in Figure 5D whether the correlations between the two contexts truly are different or not in a way that would allow conclusive statements.

      Thank you for the comments. Figure 4D has been moved to the supplement, and Figure 5D has been replaced by figures analyzing the neural activity patterns during the online task.

    1. Author Response

      Reviewer #1 (Public Review):

      This is thorough, quantitative microbial ecology research on one of the most important problems of species coexistence in infection biology. The intermediate disturbance hypothesis is supported once again, and they show unsurprisingly that nutrition matters for their ratio of coexistence, but more specifically as a novel function of the ratio of metabolic fueling to reproductive rate, which the authors term absolute growth. I like this study for its care and completeness even though the results are fairly intuitive to those in the field of cystic fibrosis microbial ecology.

      We would like to thank the reviewer for acknowledging the importance, care, and completeness of our original manuscript. We have continued to employ our standards of rigor for this revision.

      Reviewer #2 (Public Review):

      The authors present a manuscript that addresses an important topic of bacterial co-existence. Specifically modeling infection-relevant scenarios to determine how two highly antibiotic-resistant pathogens will develop over time. Understanding how such organisms can persist and tolerate therapeutic interventions has important consequences for the design of future treatment strategies.

      We would like to thank the reviewer for acknowledging the importance of our work.

      A major strength of this paper is the methodical approach taken to assess the dynamics between the two bacterial species. Using carbon sources to regulate growth to test different community structures provides a level of control to be able to directly assess the impact of one dominant pathogen over another.

      The modeling aspect of this manuscript provides a basis for testing other disturbances and/or the impact of additional incoming pathogens. This could easily be applied to other infection settings where multiple microbes are observed (for example, viral/bacterial interactions in the lung).

      Thank you for acknowledging the rigor in our experimental and modeling approaches.

      The authors clearly show that by altering the growth rate and metabolism of various carbon sources, population structure can be modified, with one out-competing the other. Both modeling and experimental approaches support this.

      The exploration of the role of virulence factors is less clear, for example how strains unable to produce virulence factors are impacted in regard to their overall growth and whether S. aureus is able to sense virulence factors without transcriptional assays here. Although the hypothesis is strong, the experimental data does not fully support this conclusion.

      In addressing your comments below, we hope that we have increased your confidence in our hypotheses presented in our manuscript as it pertains to the involvement of virulence factors.

      Spatial disturbance has a significant impact on community structure. Although using one approach to assess this, it is not clear if the spatial structure is impacted without the comparable microscopy evaluation.

      We have acknowledged this shortcoming in our revised manuscript. In the discussion, we write:

      “While we did not explicitly quantify spatial organization experimentally owing to technical limitations of our microplate reader and microscope setups, in theory, co-culture in an undisturbed condition should facilitate the creation of spatial organization.”

      In fact, we would really like to be able to track the position of each bacterium during shaking events. However, the plate reader cannot accommodate a microscope setup. While we could remove the plate from the plate reader and transport it to the microscope (two floors down), we cannot be certain that the positions of the bacteria would not be altered during transport. We have thought about fixing the bacteria in place prior to transport. However, the injection of liquid for the purposes of fixation would likely alter the positioning of the bacteria. Thus, we chose a modeling approach using an agent-based model that is parametrized based on our experimental approach. Accordingly, we agree that this is a limitation of our current study. We hope that acknowledging this limitation in the discussion sits well with the reviewer.

      Overall this paper highlights the use of modeling approaches in combination with wet lab experiments to predict microbial interactions in changing environments.

      Reviewer #3 (Public Review):

      This is an intriguing manuscript with a rigorous experimental and computational methodology looking at the interaction of Pseudomonas aeruginosa (Pa) and Staphylococcus aureus (Sa). These two pathogens frequently co-habit infections but in standard liquid media often show a winner-take-all outcome. This study seeks to be mechanistically predictive as to the outcome of the co-culture based on the addition of specific carbon sources as filtered through the lens of metabolic efficiency or, as the authors term - absolute growth. Overall, the study is sound, but there are some specific caveats that I would like to present:

      We would like to thank the reviewer for acknowledging the rigor of our work.

      1) The study undersells the knowledge in the literature of what allows or prohibits the stability of the Pa and Sa co-cultures. While most of the correct papers are cited, the outcomes of those studies are downplayed in favor of the current predictive study. While the current study is indeed more "predictive", it strays exceedingly far from an infection-relevant media, whereas other studies show reasonable co-existence in host-relevant media.

      We have addressed this comment in two different ways. First, we have included an entire paragraph in the discussion that acknowledges previous work and how our results fit into previous findings. We write:

      “Given the clinical importance of co-infection with both P. aeruginosa and S. aureus, multiple previous studies have identified mechanisms of co-existence. Indeed, long term co-existence of both species can result in physiological changes that reduce their competitive interactions. Strains of P. aeruginosa isolated from patients that enter into a mucoid state show reduced production of siderophores, pyocyanin, rhamnolipids and HQNO, which facilitates the survival of S. aureus [23, 24]. These strains can also overproduce the polysaccharide alginate, which in itself is sufficient to decrease the production of these virulence factors. Moreover, exogenously supplied alginate can reduce the production of pyoverdine and expression from the PQS quorum sensing system, which is responsible for the production of HQNO [25]. Changes in the physiology of S. aureus can also facilitate co-existence. Strains of S. aureus isolated from patients with cystic fibrosis show multiple changes in the abundance of proteins including super oxide dismutase, the GroEL chaperone protein, and multiple surface associated proteins [26]. Interestingly, the majority of proteins that show changes in abundance in S. aureus are related to central metabolism, which is consistent with our findings demonstrating that metabolism can influence the co-existence of both species. While it is unclear as to how long-term co-culture would affect the ratio of absolute growth, our findings provide an additional mechanism that can determine the co-existence of these bacterial species.”

      Second, as noted in our response in the ‘essential revisions’ section, we have tested the relationship between the final density ratio and the absolute growth ratio in SCFM medium, which we believe is host relevant. Our findings were fully consistent with the trends that we saw in our original submission. This data is presented in Fig. 3 and Figure 5 – figure supplement 3.

      2) The major weakness in the ability of this study to be extrapolatable to infection conditions is the basal media selected for this analysis. The authors choose TSB, which is an incredibly rich media from the start, and proceed to alter only 11% of the available carbon (per mass) with their carbon source manipulations. This suggests an underappreciation for the amino acid metabolism routes of these two pathogens that are taking advantage of the roughly 89% of carbon as amino acid content in the TSB components of tryptone and soytone (17g and 3g, respectively vs the 2.5g carbon source). There are a few major issues with this basal formulation:

      a) Comparison to all extant literature on Pa - The media historically used to assess Pa include (rich) LB, BHI, MH; (minimal) MOPS, M63, M9; (host-associated) ASM, SCFM, SCFM2, Serum, and DMEM. TSB is not a historically evaluated formulation for Pa (though it is often for non-mammalian pathogenic Pseudomonads and environmental species). Thus, this study is not inherently integrated into the Pa literature and presents an offshoot study for which a direct connection to extant literature is difficult. Explicitly testing these predictions in the most minimal media possible and then in a host-relevant model would be optimal.

      We would truly like to thank the reviewer for their rigor in reviewing our manuscript. We, admittedly, overlooked how amino acids might be influencing the growth of P. aeruginosa in TSB medium. We originally chose TSB medium as previous studies that have examined the co-culture of S. aureus and P. aeruginosa, or their mechanisms of interaction, have used this medium (e.g., [29-34]).

      To address this comment directly, we grew co-cultures in AMM minimal medium. This medium, to our knowledge, is the only minimal medium that allows growth of S. aureus. We, and others, have not reported growth of S. aureus in M9 or MOPS minimal medium despite the addition of components such as casamino acids and increases in the concentration of thiamine.

      While AMM as reported is quite complex relative to media such as MOPS and M9, we removed several vitamins (nicotinic acid, thiamine, calcium pantothenate, biotin), decreased the concentration of some salts, used a low concentration of casamino acids (0.01%), and used a higher concentration of carbon source (0.04%). In doing so, we hoped to reduce any ‘background effect’ of media components and thus absolute growth could be driven more by carbon source.

      Importantly, in using AMM medium, we continue to find a strong and significant relationship between the final density ratio and the absolute growth ratio. This data is presented in Figure 3 and is described in a standalone paragraph in the results, along with our findings using SCFM.

      b) TSB is not remotely host-relevant. The Whiteley lab has done monumental work evaluating in vitro models that mimic human infection (scrupulously matching transcriptomes) and TSB is about as far as you can get. Thus, the ability to extrapolate from the current work to infection without testing in host-relevant media is limited.

      As noted above, we repeated our core experimental analysis in SCFM. The results are fully consistent with our original submission. This data is presented in Figure 3 and in Figure 5 - figure supplement 3.

      c) The experimental situation has a component that is both good and bad- O2 tension. By overlaying with mineral oil, the authors immediately bias Staph (a more versatile fermenter) to success, whereas Pa deals with most of these carbon sources better at body level or higher O2 levels. The benefit of this is that many of the infection sites in which these two species co-occur are low in O2.

      This was an interesting observation that we have partially addressed experimentally and acknowledged in the discussion.

      First, we acknowledged the limitations of our experimental approach as it pertains to O2 levels in the discussion as follows:

      “We note that our findings may be relevant to infections occurring in both high and low O2 environments. While P. aeruginosa is limited in its ability to perform fermentation [35], we have provided evidence that the absolute growth ratio can affect community composition in both aerobic (Figures 2-5) and more anaerobic environments (Figure 2 - figure supplement 1, panel H). The limited ability of P. aeruginosa to grow in anaerobic environments was apparent in SCFM as we could not obtain reliable or robustly quantifiable growth of this bacteria when succinate or α-ketoglutarate was provided as a carbon source.”

      Second, we tested the effect of placing mineral oil over top of the co-culture experiments, thus increasing the anaerobic nature of the environment. We found that, in general, as the ratio of absolute growth increased, so did the dominance of P. aeruginosa in the growth medium. This new data is presented in Figure 2 - figure supplement 1, panel H.

      Taken together, we hope that these two modifications meet the Reviewer’s expectations.

      d) Some of the tested metabolites are osmotically active (sucrose), while others are not (acetate), confounding the interpretation of what absolute metabolism means in the context of this study since the concentrations of all tested metabolites vary from above to below physiologic-dependent on the metabolite. A much better approach would have been to vary a single metabolite or combination to alter 'absolute metabolism' and test whether the stability of the co-culture held.

      e) The manuscript never goes into the fact that for some of these "the carbon source" sources, they are catabolite repressed compared to the basal TSB amino acids (or not). Both organisms show exquisite catabolite repression control, yet this is not addressed at all within the text of the manuscript. Since this response in both organisms is sensitive to relative proportions of the various C-sources, failure to vary C-sources or compare utilization compared to the massive excess tryptone and soytone in the media makes the 'absolute metabolism' difficult to interpret.

      To address comments d and e, and to acknowledge the potential limitations of our findings, we have included the following in the discussion. In this paragraph, we acknowledge the osmotic activity of the different carbon sources and preferential consumption of amino acids in TSB medium.

      “One drawback of our approach in using different carbon sources to manipulate absolute growth is that some carbon sources are osmotically active, whereas others are not, which could have additional physiological effects on the bacteria outside of changing growth and metabolism. Moreover, both species of bacteria have different carbon source preferences; as above S. aureus tends to prefer carbon sources such as glucose [36] whereas P. aeruginosa prefers organic and amino acids [37]. Given the carbon source preferences of each species, in complex medium such as TSB, there is the potential that P. aeruginosa consumes amino acids prior to consuming the supplied carbon source. This is perhaps less of a concern in AMM medium or SCFM where the concentration of amino acids and additional nutrient components is reduced as compared to TSB medium. Along this line, it is certainly worth investigating how each nutrient component and its ordered utilization by both species contributes to changes in absolute growth. Minor or transient changes in absolute growth owing to preferential nutrient consumption may provide windows of opportunity for one species to increase its relative density to the other.”

      f) The authors left out the 'favorite' sources of Pa that are known to be relevant in vivo - the TCA intermediates: citrate, succinate, fumarate (and directly relevant to host-pathogen interactions, itaconate)

      We have included the analysis of succinate as a carbon source in both TSB medium (Figs. 1 and 2) and AMM medium (Fig. 3). However, we could not achieve a reliable or quantifiable growth rate of P. aeruginosa in SCFM medium supplemented with succinate in our experimental setup. Accordingly, this carbon source was not used in SCFM.

      3) Statistics: Most of the experiments presented are comparisons in which there are more than two experimental groups and the t-tests employed therefore need to be corrected for multiple comparisons. The standard way to do this is to employ an ANOVA with the appropriate multiple-comparison-corrected post-test. These appear to be appropriate for Dunnett's post-testing but the comparator group is not directly defined within the figure legends. Multiple comparison testing is critical for this analysis, as the H0 is that all are the same - the more samples potentially pulled from the same distribution will result in a higher likelihood that one or more will appear as from a distinct population (i.e. H0 rejected). Multiple comparisons correct for this and are absolutely critical for the evaluation of the data presented in this manuscript.
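      The inflation of false positives described in this comment can be illustrated with a short calculation. This is only a sketch: it assumes the tests are independent and that the null hypothesis holds for all of them, and the function name is hypothetical rather than taken from the manuscript.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    independent tests, each run at significance level alpha,
    when every null hypothesis is true (illustrative assumption)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Show how quickly the chance of a spurious rejection grows.
    for m in (1, 3, 5, 10):
        print(m, round(familywise_error_rate(0.05, m), 4))
```

      With a per-test level of 0.05, the chance of at least one spurious rejection grows from 5% for a single test to roughly 40% for ten tests, which is why a corrected post-test is requested.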

      We have addressed this comment in two different ways.

      First, where there was a clear control group, we performed either a Dunnett’s test (for normally distributed data) or a Dunn’s test (for non-parametric data sets) following either an ANOVA or a Kruskal-Wallis test, respectively. These tests were applied to the data presented in Figure 2B, 5H (top and bottom panels) and in Figure 2 - figure supplement 1, panels K-L.

      Second, we did not broadly perform multiple comparisons across all data sets, because this approach would test the significance of relationships that are not relevant to the central premise of the manuscript. For example, a multiple comparison for Figure 1B would test the growth rate on every carbon source against every other carbon source, whereas we are only interested in whether S. aureus or P. aeruginosa grows faster than the other. However, we do understand the need for a corrected P value to reduce the occurrence of Type 1 errors. To accomplish this, we applied a Benjamini-Hochberg procedure [38] with an 8.5% false discovery rate to all P values in the manuscript, including those that tested the distribution of data. This reduced the threshold for significance to P < 0.0472. We have updated all claims and indications of significance in the figures based on this adjusted P value.
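      For readers unfamiliar with the Benjamini-Hochberg step-up procedure mentioned in this response, a minimal sketch follows. It is not the authors' analysis code: the function name and the example P values are hypothetical, and only the 8.5% level is taken from the response. The procedure ranks the P values, compares the k-th smallest to (k / m) × FDR, and declares significant every P value up to the largest rank that passes.

```python
def benjamini_hochberg(p_values, fdr=0.085):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns (threshold, rejected): the largest P value declared
    significant (0.0 if none) and a list of booleans aligned with
    the input order.
    """
    m = len(p_values)
    # Indices of the P values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank

    # Every P value ranked at or below k_max is declared significant.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True

    threshold = p_values[order[k_max - 1]] if k_max else 0.0
    return threshold, rejected


if __name__ == "__main__":
    # Hypothetical P values, for illustration only.
    example = [0.01, 0.04, 0.03, 0.5]
    print(benjamini_hochberg(example))
```

      The effective significance threshold depends on the full set of P values entered, so a cutoff such as the one reported in the response is specific to that data set.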

      4) The authors missed including Alves et Maddocks 2018 in relation to priority effects and other contributing factors to stable Pa/Sa co-culture.

      We have indeed included this manuscript and its findings in the introduction where we write:

      “While S. aureus can initially aid in the establishment of the P. aeruginosa population [8], production of N-acetylglucosamine from S. aureus augments…..”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors present data identifying the role of the bacterial enhancer binding protein (bEBP) SypG in the regulation of the Qrr1 small RNA, which is known to be a key regulator of Vibrio fischeri bioluminescence production and squid colonization. Previously, only the bEBP LuxO was known to activate Qrr1 expression. LuxO and Qrr1 are conserved in the Vibrionaceae, and the authors show that SypG is conserved in ~half of the Vibrio family, suggesting that this Qrr1 regulatory OR gate controlled by LuxO or SypG may play important roles in physiological processes in other species.

      Successful squid colonization by Vibrio fischeri is a complex process, known to be influenced by several factors, including the formation of and dispersal from cellular aggregates prior to entering squid pores, inoculation of the light organ crypts, and biofilm formation within the crypts. Previously, it was shown that strains lacking qrr1 were at a deficit for crypt colonization in the presence of wild-type V. fischeri. Conversely, cells lacking binK, which encodes a hybrid histidine kinase, were at an advantage for crypt colonization in the presence of wild-type cells. However, the authors identified BinK as a negative regulator of Qrr1 expression in a transposon screen. The authors used genetic epistasis experiments and found that Qrr1 transcription can be activated by either phosphorylated LuxO at low cell densities (in the absence of quorum sensing signals) or by SypG, presumably by binding to the two upstream activation sequences in the promoter of qrr1 to activate transcription by the required alternative sigma factor sigma-54. The competition between these bEBPs has not been tested. The model proposed is an OR gate through which quorum sensing and aggregation signals control Qrr1. However, there are several untested aspects of this model. First, the role of phosphorylation in SypG activity, and the connection to BinK, are not addressed in this manuscript, which may confound the effects observed on qrr1 transcription. Further, the authors did not test whether SypG directly binds to the qrr1 promoter, nor did they assess the individual role of LuxO binding to the two LuxO binding sites in the absence of SypG. The study is lacking an in vivo assessment of SypG and LuxO binding/competition at the Qrr1 promoter based on the authors' model of the OR gate.

      Major comments:

      • What is known about the connection between BinK and SypG? BinK is a hybrid HK (intro states this). Does BinK phosphorylate/dephosphorylate SypG - directly or indirectly? I saw a published paper (Ludvik et al 2021) with a diagram suggesting BinK does inhibit SypG, but the connection is unclear. This diagram also suggested that SypG needs to be phosphorylated. Can the authors comment - does SypG need to be phosphorylated to be active? Because SypG has the same sequence as the LuxO linker (Fig. S2), then I presume that SypG would also need to be phosphorylated to be active (like LuxO)? The authors utilize a phosphomimic of LuxO to test function under constitutive activity (Fig. S3), but they do not use a phosphomimic of SypG (Fig 4). If the authors used a constitutive allele, would those assays reveal more about the competition between SypG and LuxO, in the presence of phosphorylated LuxO at low cell density? The authors should include a putative cartoon model for how BinK HK activity connects to SypG, based on what is already in the literature, to aid the reader.

      We have added information and a corresponding cartoon model in the results section about the signaling pathway involving BinK and SypG, including that SypG must be phosphorylated to be active and that BinK acts as a phosphatase towards SypG. We have also generated a SypGD53E mutant and found increased Pqrr1 activity, which suggests that phosphorylation of SypG has a major impact on SypG-dependent activation of Pqrr1.

      • Line 246: Figure S3: nucleotide substitutions in both UAS regions showed loss of Pqrr1-gfp, but this could be due to binding/activation by SypG or LuxO. This should be tested in a sypG- strain to determine the sole effect of LuxO binding to these two UASs. In Figures 4G and 7, the luxO- sypG- Ptrc-sypG strain backgrounds allow the independent analysis of the two bEBPs. It is important to test which of these two sites is critical for LuxO-dependent activation of Pqrr1, given the conservation of the LuxO-Qrr1 region in other Vibrios (line 327, Fig. S5). Thus, the authors could also discuss whether these two proteins would compete at both sites. Further, the authors should comment that they have not shown biochemical evidence that SypG binds to the two UASs in the Qrr1 promoter. The regulation of this locus by SypG is only shown by genetic assays in this manuscript.

      We have added a paragraph in the discussion highlighting how useful protein-DNA assays would be to address competition, along with the barriers encountered with approaches to purify SypG. Regarding the contribution of each UAS to LuxO-dependent activation, we refer to the phosphomimic data of LuxO (Fig. S4) in the supplement, which highlight that G-131 and G-97 do not affect LuxO-dependent activation (as pointed out by reviewer #2) and which contributed to our decision to test a G-131T mutant in the co-colonization experiment.

      • Examination of the binding of LuxO and SypG (e.g., ChIP-seq) in combination with their transcriptional reporter under varying conditions (low cell density vs high cell density, with or without rscS* overexpression) would be extremely beneficial in testing the model proposed.

      We agree but have not had success in our attempts to perform ChIP due to protein instability. For example, we have tried SypG with a C-terminal TAP tag, which my colleague Dr. Lu Bai at Penn State has used extensively for ChIP, ChIP-seq, and ChIP-exo, but we could not observe a signal even when the RscS* allele was included in the strain.

      Reviewer #2 (Public Review):

      The study by Surrett et al. uncovers a novel regulatory axis in Vibrio fischeri that controls the expression of the qrr1 small RNA, which post-transcriptionally controls various quorum-dependent outputs. This study is timely and addresses a major question about the physiology of this important model symbiosis and potentially other Vibrio species. The results should be of broad interest within the field of microbiology.

      While it was previously believed that qrr1 expression is under the strict control of the LuxO-dependent quorum sensing cascade, the authors demonstrate that qrr1 expression can be induced by another bEBP, SypG, in a manner that is quorum-independent. It was previously shown that qrr1 is important for colonization, and the authors recapitulate and extend this finding here. However, bacteria are likely at high cell density prior to entry into the crypts, which would repress qrr1 expression. Thus, despite the importance of qrr1 expression for crypt colonization, it would counterintuitively be repressed. The discovery of the SypG quorum-independent induction of qrr1 in this study may help resolve this conundrum. The authors take a largely genetic approach to characterize this novel regulatory pathway in combination with a squid colonization model. The experiments performed are generally well controlled and the data are clearly presented. The authors, however, fail to provide experimental evidence to support the physiological relevance of SypG-dependent control of qrr1 expression during host colonization.

      Fig. 2 - It is unclear why there is a disconnect between qrr1 expression and qrr1-dependent effects. Data in 2B indicate that qrr1 is induced in the ∆binK mutant according to the Pqrr1-gfp reporter but this expressed qrr1 does not have any effect on phenotypes like bioluminescence according to the data presented in 2C. While the authors reveal an effect of the binK deletion when rscS is overexpressed, it is unclear why this is necessary since simple deletion of binK without rscS is sufficient to induce qrr1 in 2B. Could this discrepancy be due to the fact that experiments in 2B are done on solid media while the experiments in 2C are performed in liquid media? Do cells in liquid not express qrr1? Or conversely, perhaps testing the bioluminescence of cells scraped off of plates could reveal a phenotype for the binK mutant similar to those seen in the rscS background in liquid. Or alternatively, if cells in a liquid culture still express qrr1, perhaps there is a posttranscriptional mechanism that prevents qrr1 from exerting an effect on bioluminescence? The latter possibility would alter the proposed model.

      To help explain why we chose to overexpress RscS, we have added the cartoon in Fig. 2C, which highlights how BinK dephosphorylates SypG. We believe that the conditions used in the bioluminescence assay do not phosphorylate SypG, which prevents an effect by BinK. However, overexpression of RscS permits phosphorylation of SypG, which enables a phenotype to emerge in a binK mutant. We have tested the bioluminescence of cells within spots but did not detect a difference.

      The authors propose a model in which sypG-dependent activation of qrr1 is required for appropriate temporal regulation of this small RNA and contributes to optimal fitness of V. fischeri during colonization, however, this was not directly tested, and experimental evidence to support a physiological role for sypG-dependent regulation of qrr1 remains lacking. Data in Fig. S3 and Fig. 4G-H suggest that the Gs at -131 and -97 in Pqrr1 are largely dispensable for LuxO-dependent activation, but are important for SypG-dependent activation of Pqrr1. Also, the Pqrr1 mutations at C -130 and -96 completely prevent sypG-dependent activation while only partially reducing LuxO-dependent activation. If SypG-dependent activation of qrr1 is critical for the fitness of V. fischeri, a strain harboring these Pqrr1 promoter mutations should be attenuated in a manner that resembles the qrr1 deletion mutant as shown in Fig. 3C.

      We thank the reviewer for this suggestion, which led us to generate and test a G-131T mutant in vivo.

      Fig. S4 - these data suggest that LuxO cannot enhance transcription of PsypA and PsypP at native expression levels. But sypG-dependent induction of qrr1 was largely tested with Ptrc-dependent overexpression of SypG. Would overexpression of LuxO induce PsypA and PsypP? The authors should at least acknowledge this possibility in the text.

      As requested, we have added text that acknowledges this possibility.

      The authors adopt three distinct strategies to induce sypG-dependent activation of qrr1 in distinct figures throughout the manuscript: deletion of binK, overexpression of rscS (rscS*), and direct overexpression of sypG. It is not entirely clear why distinct approaches are used in different figures. This is particularly true for Fig. 5 since the authors already demonstrated that the direct overexpression of sypG can be used, which is a more direct way of addressing this question. Similarly, sypG overexpression should inhibit bioluminescence in Fig. 2 based on the proposed model, which would have tested the claims made more directly. Additional text to clarify this would be helpful.

      As requested, we have added Fig. 2C and text to describe how SypG is regulated, which provides ways to test SypG-dependent activation of qrr1.

      The Fig. 5D legend indicates that the strains harbor a Ptrc-GFP reporter. However, the text would suggest that these strains should harbor a Pqrr1-GFP reporter to test the question posed.

      This has been corrected.

      The conclusion that SypG and LuxO share UASs in the qrr1 promoter is based on fairly limited genetic evidence where point mutations were introduced into 3 bp of the predicted LuxO UASs within the qrr1 promoter. This conclusion needs to be qualified in the text or additional experimental evidence is needed to support this claim. For example, in vivo ChIP-exo could be used to map the SypG and LuxO binding sites. Or SypG and LuxO could be purified to assess binding to the qrr promoter in vitro (to map binding sites or test competitive interactions of these proteins to the qrr promoter).

      As described above and in the text, we have not been able to construct a functional tagged SypG that would enable these types of studies.

      On a related note, SypG binding to the qrr1 promoter is speculated based on indirect genetic evidence. But the authors do not directly demonstrate this. This should be acknowledged in the text or additional experimental evidence should be provided to support this claim.

      As requested, we have added text in the discussion that highlights this problem.

      Reviewer #3 (Public Review):

      In this manuscript, Surrett and coworkers aimed to identify the mechanism that regulates the transcription of Qrr1 sRNA in the squid symbiont Vibrio fischeri. In many Vibrio species, Qrr1 transcription is regulated by quorum sensing (QS) and activated only at low cell density. Qrr1 is important for V. fischeri to colonize the squid host. In the QS systems that have been studied so far, LuxO is the only known response regulator that activates Qrr sRNA transcription. However, the authors argued that since V. fischeri forms aggregates before entering into the light organ of the squid, Qrr1 would not be made as high cell density QS state is likely induced within the aggregates. Therefore, they hypothesized that additional regulatory systems must exist to allow Qrr1 expression in V. fischeri to initiate colonization of the light organ. In turn, the authors identified that disruption of the function of the sensor kinase BinK allowed Qrr1 expression even at high cell density. Through a series of cell-based reporter assays and an in vivo squid colonization assay, they concluded that BinK is also involved in Qrr1 regulation within the squid light organ. They went on to show that another sigma54-dependent response regulator SypG is also involved in controlling Qrr1 expression. The authors propose dual regulation of LuxO and SypG on Qrr could be a common regulatory mechanism on Qrr expression in a subset of Vibrio species.

      Overall, the experiments were carefully performed and the findings that BinK and SypG are involved in Qrr1 regulation are interesting. This paper is of potential interest to an audience in the field of QS and Vibrio-host interaction. However, experimental deficiencies and alternative explanations of the results have been identified in the manuscript that prevent a thorough mechanistic understanding of the interplay between QS and these new regulators.

      1) The premise that Qrr1 expression in the light organ has to be regulated by systems other than QS is unclear. In lines 108-109, it was stated that "...prior to entering the light organ, bacterial cells are collected from the environment and form aggregates that are densely packed"; however, in lines 184-185, it was stated that "The majority of crypt spaces each contained only one strain type (Fig. 3B), which is consistent with most populations arising from only 1-2 cells that enter the corresponding crypt spaces". So, if the latter case is true (i.e., 1-2 cells/crypt), why could Qrr1 not be made at that time point, as predicted by a QS regulation model?

      We have not changed this section because if Qrr1 is expressed only after the cells have already entered the crypt space, then the Δqrr1 mutant would colonize a number of crypt spaces comparable to that of wild type cells.

      2) The involvement of the rscS allele for the ∆binK mutant to show an altered bioluminescence phenotype is confusing. It is unclear why a WT genetic background was sufficient to show the high Qrr1 phenotype in the original genetic screen that identified BinK (Fig. 2A-B), while the rscS allele is now required for the rest of the experiments to show the involvement of BinK in bioluminescence regulation (Fig. 2C). Is the decreased bioluminescence phenotype observed in the rscS* ∆binK mutant (Fig. 2C) dependent on LuxU/LuxO/Qrr1/LitR? Could it be through another indirect mechanism (e.g., SypK as discussed in line 403)? A better explanation of the connection between RscS/Syp and BinK and perhaps additional mutant characterization are necessary to interpret the observed phenotypes.

      As described above, we have added a cartoon that illustrates the pathway involving BinK (Fig. 2C) and additional justification in the results section, which better explains why RscS overexpression was used.

      3) In squid colonization competition assays (Fig. 3), it was concluded that the ∆qrr1 allele is epistatic to the ∆binK allele (line 204), and the enhanced colonization of the ∆binK mutant is dependent on Qrr1 (section title, line 162). This conclusion is hard to interpret. The results can be interpreted as ∆qrr1 mutation lowers the colonization efficiency of the ∆binK mutant which could imply BinK regulates Qrr1 in vivo. Alternatively, it could be interpreted that the ∆binK mutation increases the colonization efficiency of the ∆qrr1 mutant. Direct competition between single and double mutants in the same animals may resolve the complexity. And direct comparison of Qrr1 expression of WT and ∆binK mutants inside the animals, if possible, will also help interpret these results.

      We thank the reviewer for the suggestion and were able to test the ΔbinK and ΔbinK Δqrr1 mutants directly (Fig. S2). We were unable to interpret the data using the Pqrr1 reporter due to unexpected heterogeneity in Pqrr1 activity throughout the crypt spaces.

      4) Similar concern to above (#2), in Fig. 4, the link between BinK and Qrr1 regulation is not fully explored. What connects BinK and Qrr1 expression? Does BinK function via LuxU (or other HPT) to control SypG like the other QS kinases? And what is the role of other known kinases (e.g., SypF) in the signaling pathway? And did the authors test other bEBPs found in V. fischeri for their role in Qrr1 regulation?

      We have added content to the discussion that highlights examining LuxU as a worthwhile direction for understanding how BinK affects the signaling that activates Qrr1.

      5) In addition to the genetic analysis, additional characterization of SypG is required to demonstrate the proposed regulatory mechanism: a. What is the expression level (and phosphorylation state) of SypG and LuxO at different cell densities? b. Does purified SypG directly bind to the qrr1 promoter region? c. How do these two bEBPs compete with each other if they are both made and active?

      We agree that these are interesting questions, but as described above, we were unable to purify SypG to address the biochemistry.

      6) The molecular OR logic gate is used to describe the relationship between LuxO and SypG, but this logic relationship is not always true in all conditions (if at all). In WT, deletion of luxO completely abolished Qrr1 expression (Fig. 4C). Even in the binK mutant, LuxO still seems to be the more prominent regulator (Fig. 4D) as deletion of luxO already caused a smaller but significant drop in Qrr1 expression. The authors may need to use this term more precisely.

      We note that in wild-type cells, SypG is not active under the conditions tested, so SypG would not contribute to activating Qrr1 expression. The level of Pqrr1 activity by the SypG(D53E) variant surpasses the basal level of LuxO, which suggests that LuxO does not always serve as the prominent regulator. We have added content to the discussion to highlight how LuxO may contribute more to the regulation.

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Berryer et al describe a fully automated, scalable approach to quantify the number of synaptic inputs formed onto human iPSC-derived neurons (hNs) in 2D culture. They validate the sensitivity of their approach by synapsin1 knock-down and test almost 400 small molecules for their effect on synapses, and the role of astrocytes. They identify BET inhibitors as strong modifiers of synapse numbers in hNs and performed follow-up experiments to confirm the finding, characterize the effect further and demonstrate the critical role of astrocytes.

      Every step of the protocol is automated to achieve high reproducibility and homogeneity throughout the experiments. This automated approach has great potential for scaling up drug screening, genetic perturbations, and disease modeling experiments related to synapses.

      The authors successfully identified, in two independent hNs lines, three small-molecule inhibitors of transcription modifiers of the BET family as the strongest positive modifiers of synaptic inputs. The initial study performed with immunofluorescence was then validated by Western blot analysis and mRNA-seq analysis, which showed an increase in the expression of trans-synaptic signaling genes.

      While assessing the molecular mechanisms of BET inhibitors, the authors observed that the increased synaptic inputs occurred only in cocultures of astrocytes and neurons, and not in hNs monoculture. Finally, the authors report that the presence of astrocytes alone is a major driving force to promote synaptic inputs.

      Overall, the experiments are well conducted, and the conclusions are supported by the data. The new approach reaches beyond the current state of the field, especially in the first steps of automation and the identified modulators (BET inhibitors) are interesting and novel, and the subsequent validation is convincing.

      On the other hand, the manuscript does not yet define the exact resolution and power of the new methods and does not convincingly show that the observed synapsin puncta are synapses. The data of the validation experiments can also be improved.

      MAJOR POINTS:

      1) Although the manuscript contains a lot of quantitative data on variance, the current manuscript stops short of an exact definition of the resolution of the assay and its statistical power. With the real (measured) variance of the assay, the power to detect certain effects can be computed. To be relevant for other applications than the current (e.g. genetic perturbations and disease modelling), it is relevant to define this for smaller effects too: can this assay detect a 25% effect with reasonable numbers of observations? Such assessments can also provide important recommendations on when it makes sense to add more repeated measures of the same specimens (wells, ROIs) and when more independent inductions are required (and how much this adds to overall power). The manuscript would also benefit from a short discussion on how to optimize future study designs (repeated measures, independent inductions, number of subjects).

      As mentioned above, we have now calculated Cohen’s d for: (1) the primary screen overall as well as for individual compounds included in the primary screen, (2) validation experiments performed in neuron monocultures and (3) validation experiments performed in neuron + astrocyte co-cultures, and these data have been added to Figure 5, Figure 5-figure supplement 1 and Supplementary File 2. For the validation experiments, we have also added a discussion of study design, given the observed effect sizes. These analyses are discussed in depth on pages 19-20 of the Results section and page 26 of the Discussion section in the PDF. In brief, we obtained a Cohen’s d of -0.18 for the primary screen, where individual small molecules increased as well as decreased synaptic density. Also from the primary screen, we obtained a Cohen’s d of 2.914 for JQ1 and 3.710 for I-BET151, indicating large effects for the BET inhibitors. We also noted large effects for BET inhibitors in the co-culture validation experiments, where we could have scaled down the number of fields and wells analyzed. While we were reasonably powered to detect changes in the monoculture validation experiments, effect sizes there were much smaller and required the 50+ wells that we analyzed in order to achieve 95% power. An example from Figure 5 shows well-level data for the co-culture and monoculture validation experiments.
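
      The relationship between effect size and the number of wells needed can be illustrated with a minimal sketch. It is not the authors' analysis: it assumes a two-sided, two-sample comparison with equal group sizes and uses a normal approximation, which underestimates the required n when groups are very small. The function names and the d = 0.5 example are hypothetical; the other two values of d are those quoted above.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance


def cohens_d(treated, control):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


def wells_per_group(d, alpha=0.05, power=0.95):
    """Approximate n per group for a two-sided two-sample test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / abs(d)) ** 2)


# 2.914 and 3.710 are the effect sizes quoted above; 0.5 is a hypothetical smaller effect.
for d in (2.914, 3.710, 0.5):
    print(f"d = {d}: about {wells_per_group(d)} wells per group")
```

      Under these assumptions, the required number of wells scales with 1/d², which is why small monoculture effects need far more wells than the large co-culture effects.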

      2) It is widely recognized that synapses formed in networks of NGN2-induced excitatory neurons only, may not model synapses in the real human brain very well (yet), especially not at DIV21. First, the authors can be more open/precise about this, e.g., in line 156 the authors indicate they use hNs at DIV21 because they are "electrophysiologically active" based on three references. However, (a) these references indicate that hNs cultures start to mature from DIV21 onwards but are not really mature yet, and (b) being "electrophysiologically active" seems not the most relevant criterion. Synaptic parameters like initial release probability, rise/decay time, and synchronicity are more relevant (none of which indicate synapses are mature at DIV21). Second, especially in the light of the claims the authors make regarding the effects of compounds on "synaptic connectivity" it seems essential to test, at least in a set of validation experiments, the distribution of postsynaptic markers. Synapsin-positive puncta may not be accompanied by a postsynaptic specialization and rather represent (mobile) vesicle clusters and/or release sites without postsynaptic partners. In addition, the authors claim synapsin1 is a pan-neuronal synapse marker. This is not yet validated for human neurons. A few control stainings with synaptic vesicle and active zone markers will secure this claim.

      We thank the reviewer for this comment and have now updated the text to indicate and expand on the fact that we are looking at immature synapses at day 21 in vitro (e.g., please see pages 8 and 12 of the Results section in the PDF).

      As mentioned above, we also tested conditions for four additional postsynaptic antibodies, drawing from those used in published studies of human cellular models (and species that would not cross-react with antibodies used for Synapsin1 and MAP2). Specifically, we tested antibodies against PSD-95, NLGN4, Homer1 and BAIAP2 at a range of concentrations in co-cultures generated from two independent cell lines. Of these antibodies, we only obtained quantifiable signal for PSD-95, while NLGN4, Homer1 and BAIAP2 appeared to be of poor quality in our culture systems (e.g., nonspecific signal, high signal in astrocytes, etc.). As shown below and in Figure 1-figure supplement 1, analysis of PSD-95 revealed that 43.1% of PSD-95 puncta on MAP2 also colocalized with synapsin1, and 28.8% of synapsin1 puncta on MAP2 also colocalized with PSD-95. Discussions of these data and limitations have been significantly elaborated upon on pages 10-11 of the Results section and pages 24 and 29 of the Discussion section in the PDF. For example, we discuss how the partial colocalization could be due both to the relative immaturity of the synapses discussed above (presynaptic assembly preceding postsynaptic assembly at this early stage of neuronal development) as well as the overall poorer quality of the PSD-95 signal in human cellular material (PSD-95 signal was of insufficient quality and consistency for screening applications and was generally quite difficult to resolve as compared to Synapsin1).

      Additionally, we tested two additional presynaptic antibodies, including synaptophysin and SV2A. Of these antibodies, we obtained reasonable quality signal for synaptophysin, which we have quantified in Figure 1-figure supplement 1. While SV2A also gave some signal, it was of poorer quality and difficult to reliably quantify. We observed roughly half of the Synapsin1 signal on MAP2 colocalizing with synaptophysin, and vice versa. Lack of complete colocalization could be due to reports that synapsin1 expression precedes synaptophysin expression in the cortex (e.g., Pinto et al 2013), reports that synaptophysin is also expressed at extra synaptic sites (e.g., Micheva et al 2010), or the reduced quality of staining for synaptophysin that we obtained compared with synapsin1. These data are now elaborated upon on pages 10-11 of the Results section and page 24 of the Discussion section in the PDF.
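
      Colocalization percentages of this kind depend on which channel is used as the denominator, which is why the two directions quoted above differ. The following is a minimal sketch under stated assumptions: puncta are represented as sets of pixel coordinates already restricted to the MAP2 mask, and a punctum counts as colocalized if it shares at least one pixel with any punctum in the other channel. The overlap criterion, function name, and toy data are hypothetical and are not taken from the authors' pipeline.

```python
def colocalized_fraction(puncta_a, puncta_b):
    """Fraction of puncta in channel A sharing at least one pixel with channel B."""
    if not puncta_a:
        return 0.0
    pixels_b = set().union(*puncta_b) if puncta_b else set()
    hits = sum(1 for punctum in puncta_a if punctum & pixels_b)
    return hits / len(puncta_a)


# Toy data: each punctum is a set of (row, col) pixels on the MAP2 mask.
synapsin = [{(0, 0), (0, 1)}, {(5, 5)}, {(9, 9)}, {(12, 3)}]
psd95 = [{(0, 1), (1, 1)}, {(20, 20)}]

print(colocalized_fraction(psd95, synapsin))  # share of PSD-95 puncta overlapping synapsin1
print(colocalized_fraction(synapsin, psd95))  # share of synapsin1 puncta overlapping PSD-95
```

      The same overlapping pair contributes to both numbers, but the fractions differ because the two channels contain different numbers of puncta.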

      We have also expanded our discussion of Synapsin1 as a presynaptic marker including additional references on the use of Synapsin1 to label cortical glutamatergic synapses in rodent (e.g., Micheva 2010) and the use of Synapsin1 on MAP2 as a pan-synaptic marker in human neurons (e.g., Chanda et al 2019, Pak et al 2015, Yi et al 2016; page 10). We have also included the use of Synapsin1 on MAP2 as a specific Limitation on page 29 where we discuss that reliance on this system in developing neurons may be capturing sites which do not then develop into fully functional synapses with postsynaptic partners.

      3) The analysis of the transcriptional effects of BET inhibitors is rather basic, especially given the rather strong claim: "BET inhibitors enhance synaptic gene expression programs". Which programs? Differentially expressed transcripts can at least be analysed further in terms of subcellular localization (pre/post) or synaptic functions, e.g. using SYNGO, also to address point 2 above.

      We thank the reviewer for this comment and have now incorporated SynGO analysis into Figure 6 to examine the synaptic ontology terms. As shown below, Figure 6g now includes the top 5 significantly enriched terms and Figure 6h shows the gene counts by cellular component. Here, we focused on genes upregulated after both JQ1 and Birabresib treatment compared with a background list of expressed genes. The most enriched synaptic ontology terms related to the post-synaptic membrane, so we also validated protein level changes in two postsynaptic proteins (Homer1 and BAIAP2) by Western blot analysis in Figure 6. In addition to Figure 6, these data are now included in Supplementary File 5 and discussed on page 22 of the Results section.
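
      Enrichment against a background list of expressed genes is commonly assessed with a one-sided over-representation test. The sketch below shows a hypergeometric version of such a test; it is an illustration of the general idea, not a description of how SynGO computes its statistics, and all gene counts in the example are made up.

```python
from math import comb


def enrichment_p_value(background_size, term_size, selected_size, overlap):
    """One-sided hypergeometric tail: P(X >= overlap).

    background_size: number of expressed genes used as background
    term_size:       background genes annotated to the ontology term
    selected_size:   number of upregulated genes tested
    overlap:         upregulated genes annotated to the term
    """
    upper = min(term_size, selected_size)
    total = comb(background_size, selected_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts for illustration only.
print(enrichment_p_value(background_size=12000, term_size=300, selected_size=400, overlap=25))
```

      Restricting the background to expressed genes matters because a genome-wide background would inflate enrichment for any term that is neuron-specific.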

    1. Author Response:

      Reviewer #1 (Public Review):

      Roberts et al have developed a tool called "XTABLE" for the analysis of publicly available transcriptomic datasets of premalignant lesions (PML) of lung squamous cell carcinoma (LUSC). Detection of PMLs has clinical implications and can aid in the prevention of deaths by LUSC. Hence efforts such as this will be of benefit to the scientific community in better understanding the biology of PMLs.

      The authors have curated four studies that have profiled the transcriptomes of PMLs at different stages. While three of them are microarray-based studies, one study has profiled the transcriptome with RNA-seq. XTABLE fetches these datasets and performs analysis in an R shiny app (a graphical user interface). The tool has multiple functionalities to cover a wide range of transcriptomic analyses, including differential expression, signature identification, and immune cell type deconvolution.

      The authors have also included three chromosomal instability (CIN) signatures from literature based on gene expression profiles. They showed one of the CIN signatures as a good predictor of progression. However, this signature performed well only in one study. The authors have further utilised the tool XTABLE to identify the signalling pathways in LUSC important for its developmental stages. They found the activation of squamous differentiation and PI3K/Akt pathways to play a role in the transition from low to high-grade PMLs.

      The authors have developed user-friendly software to analyse publicly available gene expression data from premalignant lesions of lung cancer. This would help researchers to quickly analyse the data and improve our understanding of such lesions. This would pave the way to improve early detection of PMLs to prevent lung cancer.

      Strengths:

      1. XTABLE is a nicely packaged application that can be used by researchers with very little computational knowledge.
      2. The tool is easy to download and execute. The documentation is extensive both in the article and on the GitLab page.
      3. The tool is user-friendly, and the tabs are intuitively designed for successive steps of analysis of the transcriptome data.
      4. The authors have properly elaborated on the biological interest in investigating PMLs and their clinical significance.

      Weaknesses:

      The article is focused on the development and the utility of the tool XTABLE. While the tool is nicely developed, the need for a tool focussing only on the investigation of PMLs is not justified. Several shiny apps and online tools exist to perform transcriptomic analysis of published datasets. To list a few examples - i) http://ge-lab.org/idep/ ; ii) http://www.uusmb.unam.mx/ideamex/ ; iii) RNfuzzyApp (Haering et al., 2021); iv) DEGenR (https://doi.org/10.5281/zenodo.4815134); v) TCC-GUI (Su et al., 2019). While some of these are specific to RNA-seq, there are plenty of such shiny apps to perform both RNA-seq and microarray data analysis. Any of these tools could also be used easily for the analysis of the four curated datasets presented in this article. The authors could have elaborated on the availability of other tools for such analysis and provided an explanation of the necessity of XTABLE. Since 3 of the 4 datasets they curated are from microarray technology, another good example of a user-friendly tool is NCBI GEO2R. This is integrated with the NCBI GEO database, and the user doesn't need to download the data or run any tools. iDEP-READS (http://bioinformatics.sdstate.edu/reads/) provides an online user-friendly tool to download and analyse data from publicly available datasets. Another such example is GEO2Enrichr (https://maayanlab.cloud/g2e/). These tools have been designed for non-bioinformatician researchers and do not involve downloading datasets or installing/running other tools.

      Two of these tools (IDEP and TCC-GUI) were covered in a literature review of 20 Shiny apps that we performed two years ago, before work on XTABLE started. Three of the suggested tools (IDEP, RNFuzzyApp, TCC-GUI) process only RNA-seq datasets. IDEAMEX appears to be for RNA-seq data only and is severely limited in its downstream analysis capabilities. DEGenR appears to handle microarray datasets and features an option to retrieve data directly from GEO. However, it appears to be based on GEO2R (with additional downstream analyses) and automatically log-transforms already log-transformed data; unlike GEO2R, it offers no option to skip the log-transformation. A refreshed literature search focusing on microarray datasets highlighted three additional tools: iGEAK, which has not been updated in three years and seems to have compatibility issues on new Windows and Mac machines; sMAP, an upcoming Shiny app for microarray data posted on bioRxiv on 29 May 2022; and MAAP, which has the same issue of log-transforming already log-transformed data. iDEP-READS does not list the datasets used in XTABLE. GEO2Enrichr appears to require the counts table and experimental design in one file, performs a “characteristic direction” DEG test and outputs enriched pathways. These apps require not just downloading of datasets but also reformatting and renaming of expression data files and creation of additional files for setting up the DEG analysis, which is not practical for the number of samples we have (122, 63, 33, 448), even if these apps handled microarray data. XTABLE also incorporates AUC metrics, which are appropriate given the number of samples in each dataset, and a tool known for adequately controlling FDR, features not seen in other apps, as well as an emphasis on individual gene results and interrogation.
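
      The double log-transformation problem mentioned above can be guarded against with a simple range check before transforming. The sketch below is a hypothetical heuristic, not the logic of GEO2R, DEGenR, MAAP, or XTABLE: it assumes that log2-scale microarray intensities rarely exceed about 20, whereas raw intensities reach the thousands, and the threshold of 100 is an arbitrary assumption.

```python
from math import log2


def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 1] on pre-sorted data."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def looks_log_transformed(values, max_plausible_log=100.0):
    """Heuristic: treat data as log-scale if the 99th percentile is small."""
    return percentile(sorted(values), 0.99) <= max_plausible_log


def log2_once(values):
    """Apply log2 only if the data still look like raw intensities."""
    if looks_log_transformed(values):
        return list(values)
    return [log2(v) if v > 0 else float("nan") for v in values]


raw = [50, 200, 1024, 4096, 16384]
once = log2_once(raw)
twice = log2_once(once)  # second call leaves the data unchanged
print(once == twice)
```

      A check like this makes the transformation idempotent, so running it on a dataset that was already log-transformed upstream does not distort the expression values.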

      A new paragraph in the Discussion section (lines 361-370) addresses the potential use of existing applications instead of XTABLE.

      Secondly, XTABLE doesn't provide a solution to integrate the four datasets incorporated in the tool. One can only analyse one dataset at a time with XTABLE. The differences in terms of methodology and study design within these four datasets have been elaborated on in the article. However, attempts to integrate them were lacking.

      We repeatedly considered different strategies of integrating the analysis of the four datasets and we always reached the conclusion that it was hardly going to offer any advantage, or that it might be counterproductive.

      Integration can occur at multiple levels. One possibility is to carry out the same analysis (e.g. expression of a given gene in two groups of samples) in all datasets. Since the design and methodologies of the four studies differ substantially (different stages, different definitions of progression status, etc), a unique stratification for all datasets is not possible. Moreover, interrogating the four datasets simultaneously would slow the analysis without offering any significant advantage. Another possibility is the integration of results in the same output, for instance, obtaining a single chart with the expression of a given gene in multiple subgroups of the four datasets. We think that, because of the differences in design, the results from each cohort should be kept separate and then compared with a similar analysis from other datasets. Scientifically, this is the best way to proceed as it avoids confusion.

      Nevertheless, XTABLE allows the export of data for further analysis. The user can use this option to integrate data using other applications or statistical packages.

      We do understand the attractiveness of integrating the four datasets, and we seriously considered it. But there is a fine balance between user-friendliness, flexibility, and scientific rigour. We think that XTABLE achieves this balance. Increasing integration of datasets might lead to errors and wrong conclusions due to biological and methodological differences between studies. We believe that comparing analyses obtained independently from the four cohorts is the most sensible way to proceed.

      We propose to discuss these aspects accordingly.

      The integrative analysis of two or more datasets has been discussed in a new paragraph (lines 382-391).

      The tool also lacks the flexibility for users to add more datasets. This would be helpful when there are more datasets of PMLs available publicly.

      This was also a permanent topic for discussion while designing XTABLE. Creating a tool that could be used to analyse other cohorts of precancerous lesions, while maintaining the ease of use was certainly a challenge. We had to adapt XTABLE to the characteristics of each one of the four databases: specific stratification criteria, different nomenclatures for the different sample types, etc. Designing a shiny app that can be adapted to other present or future datasets without the need of changing the code is simply not practical.

      The flexibility that these other Shiny apps incorporate to analyse any RNA-seq dataset requires the contrasts used for the differentially expressed gene analysis be manually defined. IDEP requires an experimental design file where sample names in the counts file must match exactly the sample names in this experimental design file and pre-processing visualisation is limited to the first 100 samples. RNFuzzyApp is similar but we could not format the experimental design file in a way that did not result in the app crashing upon upload. TCC-GUI requires all the sample names to be renamed to the contrast group with the addition of the replicate number. Apps that allow datasets to be uploaded do not have a practical or easy way to set up the DEG analysis of more than a couple dozen samples.

      Future versions of XTABLE can be updated to include additional curated PML datasets that would enhance hypothesis generation upon request. Importantly, the code is freely available and can be modified by other scientists to add their cohorts of interest, although we agree that a high level of expertise in coding will be needed. We propose to add these considerations to the text.

      The possibilities for expanding XTABLE to new databases are discussed in lines 392-398.

      Understanding the biology of PML progression would require a multi-omics approach. XTABLE analyses transcriptome data and lacks integration of other omics data. The authors mention the availability of data from whole exome, methylation, etc from the four studies they have selected. However, apart from the CIN scores, they haven't integrated any of the other layers of omics data available.

      Only one dataset (GSE108104) contains whole-exome sequencing and methylation data. We considered that a multi-omics approach in XTABLE would result in an overcomplicated application. As far as early detection and biomarker discovery is concerned, transcriptomic data is the most interesting parameter.

      This is also discussed in lines 382-391.

      Lastly, the authors could have elaborated on the limitations of the tool and their analysis in the discussion.

      We propose to raise these limitations accordingly in the discussion.

      See above.

      Reviewer #2 (Public Review):

      In this manuscript, Roberts et al. present XTABLE, a tool to integrate, visualise and extract new insights from published datasets in the field of preinvasive lung cancer lesions. This approach is critical and to be highly commended; whilst the Cancer Genome Atlas provided many insights into cancer biology it was the development of accessible visualisation tools such as cbioportal that democratised this knowledge and allowed researchers around the world to interrogate their genes and pathways of interest. XTABLE is trying to do this in the preinvasive space and should certainly be commended as such. We are also very impressed by the transparency of the approach; it is quite simple to download and run XTABLE from their Gitlab account, in which all data acquisition and analysis code can be easily interrogated.

      We would however strongly advocate deploying XTABLE to a web-accessible server so that researchers without experience in R and git can utilise it. We found it a little buggy running locally and cannot be sure whether this is due to our setup or the code itself. Some issues clearly need development; Progeny analysis brings up a warning "Not working for GSE109743 on the server and not sure why". GSEA analysis does not seem to work at all, raising an error "Length information for genome hg38 and gene ID ensGene is not available". In such relatively complex software, some such errors can be overlooked, as long as the authors have a clear process for responding to them, for example using Gitlab issue reporting. Some acknowledgement that this is an ongoing development would be helpful.

      We thank the reviewer for these comments. We will inspect the code to address those warnings, implement a system for issue reporting, and add the acknowledgements suggested by the reviewer. Regarding the deployment of XTABLE to a web-accessible server, this could present a challenge in the long term, as computing resources would need to be allocated for years, with the associated economic cost.

      The code has been inspected to remove the warnings and errors pointed out by the reviewer.

      The authors discuss some very important differences between the datasets in the text. Most notably they differ in endpoints and in the presence of laser capture. We would advocate including some warning text within the XTABLE application to explain these. For example, the "persistent/progressive" endpoint used in Beane et al (next biopsy is the same or higher grade) is not the same as the "progressive" endpoint in Teixeira et al (next biopsy is cancer); samples defined as "persistent/progressive" may never progress to cancer. This may not be immediately obvious to a user of XTABLE who wishes to compare progressive and regressive lesions. Similarly, the use of laser capture is important; the authors state that not using laser capture has the advantage of capturing microenvironment signals, but differentiating between intra-lesional and stromal signals is important, as shown in the Mascaux and Pennycuick papers. The authors cannot do much about the different study designs, but as the goal is to make these data more accessible, we think some brief description of these issues within the app would help to prevent non-expert users from drawing incorrect conclusions.
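
      The difference between the two endpoint definitions can be made concrete with a small sketch. The two rules follow the descriptions given above; the grade names, their ordering, and the function names are hypothetical and are not taken from either study or from XTABLE.

```python
# Hypothetical ordered grading scale, lowest to highest.
GRADES = [
    "normal", "hyperplasia", "metaplasia", "mild dysplasia",
    "moderate dysplasia", "severe dysplasia", "carcinoma in situ", "invasive cancer",
]
RANK = {grade: index for index, grade in enumerate(GRADES)}


def persistent_or_progressive(current, next_biopsy):
    """Endpoint as described for Beane et al: next biopsy is the same or a higher grade."""
    return RANK[next_biopsy] >= RANK[current]


def progressive_to_cancer(next_biopsy):
    """Endpoint as described for Teixeira et al: next biopsy is cancer."""
    return next_biopsy == "invasive cancer"


# A lesion that worsens by one grade but does not become cancer.
current, following = "mild dysplasia", "moderate dysplasia"
print(persistent_or_progressive(current, following))  # True under the first definition
print(progressive_to_cancer(following))               # False under the second definition
```

      The same lesion is therefore labelled "progressive" in one cohort and not in the other, which is the source of confusion a warning within the app would address.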

      The authors themselves illustrate this clearly in their analysis of CIN signatures in progression potential. They observe that there is a much clearer progressive/regressive signal in GSE108124 compared to GSE114489 and GSE109743. This does not seem at all surprising, since the first study used a much stricter definition of progression - these samples are all about to become cancer whereas "progressive" samples in GSE109743 may never become cancer - and are much enriched for CIN signals due to laser capture. Their discussion states "CIN scores as a predictor of progression might be limited to microdissected samples and CIS lesions"; you cannot really claim this when "progression" in the two cohorts has such a different meaning. To their credit, the authors do explain these issues but they really should be clearly spelled out within the app.

      This is a very good point. We will add warning text about the differences between studies regarding the definition of progression potential and the differences in sample processing (LCM or not) so that the user is permanently aware of the differences between cohorts.

      A new tab (Dataset) has been added, containing a table with the methodologies used in each study and the differences in progression status definitions. Additionally, we emphasized these differences in the main text of the manuscript (lines 296-300 and 403-409).

      We are not sure we agree with their analysis of CDK4/Cyclin-D1 and E2F expression in early lesions. The authors claim these are inhibited by CDKN2A and therefore are markers of CDKN2A loss of function. But these genes are markers of proliferation and can be driven by a range of proliferative processes. Histologically, low-grade metaplasias and dysplasias all represent proliferative epithelium when compared to normal control, but most never become cancer. It is too much of a leap to say that these are influenced by CDKN2A because that gene is inactivated in LUSC; do the authors have any evidence that this gene is altered at the genomic level in low-grade lesions?

      We are grateful for this comment. There is currently no evidence that CDKN2A mutations occur in low-grade lesions and, therefore, we cannot argue that the CDK4/Cyclin-D1 and E2F expression signatures are the result of CDKN2A inactivation in low-grade lesions. We propose to modify the text to introduce these caveats to our conclusion and make our interpretations more accurate.

      We have modified the discussion (lines 443-454) to address the interpretation of our results regarding the connection between CDKN2A inactivation and the CDK4/cyclin-D1 and E2F signatures. We now focus our conclusions on the pathway itself and we mention Cyclin-D1 and CDKN2A alterations as potential modulators of the changes in the pathway, while leaving the discussion open to other drivers.

      Overall this tool is an important step forwards in the field. Whilst we are a little unconvinced by some of their biological interpretations, and the tool itself has a few bugs, this effort to make complex data more accessible will be greatly enabling for researchers and so should be commended. In the future, we would like to see additional molecular data integrated into this app, for example, the whole genome and methylation data mentioned in line 153. However, we think this is an excellent start to combining these datasets.

    1. Author Response

      Reviewer #2 (Public Review):

      The idea of using fluorescently labeled tandem SH2 domains to target tagged RTKs is brilliant and could potentially provide a powerful new way to assess the activation of RTKs in situ and in multiple physiological contexts. Thus, it was disappointing that there was insufficient characterization of the system to be able to interpret the data it generates. Although the paper shows that tagging the EGFR appears to have minimal impact on its biological activity, the readout for receptor kinase activity is % clearance of the fluorescent reporter tag from the cytosol. Such clearance is likely to depend on a variety of different factors, including the ratio of tagged receptors to probe, the number of functional pools in which the probe exists, the exchange rate between these pools, and the affinity of the probes for the tagged receptor. Without determining how each of these factors impacts % clearance, it is difficult to interpret either the dose-response curves or response kinetics.

      We appreciate the reviewer’s point that the paper would be improved by a thorough analysis of how membrane translocation depends on our biosensor’s expression levels. We have attempted to address this thoroughly in our response to the Editor’s summary comments above. Briefly, we have now added 3 new supplementary figures (Figures S2-S4) in which we quantify ZtSH2 translocation as a function of expression levels. We find that the ratio of EGFR/ZtSH2 expression predicts the extent of ZtSH2 translocation in both NIH3T3 and HEK293T cells, matching results from our computational model. We have also added a new section to the main text to clearly explain these results (Lines 190-235). We hope that these data clarify the design constraints for two-component biosensors of this type.

      For example, the difference in activation kinetics between EGFR and ErbB2 is very interesting, but the almost instantaneous rise (Fig S4B) is very surprising. The kinetics of activation of the EGFR have been extensively studied by mass-spectrometry and are generally limited by ligand binding, which has a characteristic time of several minutes, not seconds (pmid: 26929352; pmid: 1975591). Thus, such a response is suggestive of a freely exchanging ZtSH2 reporter pool that is mostly depleted in seconds with the slow secondary kinetics reflecting a slowly exchanging ZtSH2 reporter pool. Alternately, the cells could be accumulating an intracellular pool of activated receptors over time. That the authors are using concentrations of EGF >100-fold physiological levels (pmid: 29268862) further complicates the interpretation of these experiments.

      We thank the reviewer for bringing these papers to our attention. However, we strongly disagree with their interpretation of the results. In a paper cited by the reviewer (PMID:26929352), phosphotyrosine responses are extremely fast, with phosphorylation occurring within tens of seconds even in response to 20 nM EGF (see Figure 2 from Reddy et al PNAS 2016). Reddy et al further claim in their abstract “Significant changes were observed on proteins far downstream in the network as early as 10 s after stimulation.” While the timescale of EGFR phosphorylation may be of some debate, the response timescale we observe is consistent with previously published observations.

      It is also important to point out that the secondary gradual rise of ZtSH2 recruitment is only observed upon treatment with EGF, not EREG or EPGN (Figure 3A). The gradual rise can also be observed upon treatment with EREG in the presence of a GBM-associated EGFR mutation that alters receptor dimerization (Figure 3E). These data indicate that the secondary rise is not an intrinsic feature of the ZtSH2 reporter, and instead represents a feature of ligand-receptor activation itself.

      The reviewer suggests that perhaps there is some internal pool of ZtSH2 or EGF, but we find no evidence for such a pool in our microscopy imaging. To clarify this point to the reader, we have now added a new supplementary figure (Figure S6) showing representative cells for all stimulation conditions used in Figure 3A, showing consistent, high levels of EGFR and ZtSH2 enrichment at the plasma membrane and uniform cytosolic intensity for at least 30 min after stimulation across all ligands.

      Finally, while the reviewer mentions the use of high EGF doses in our paper, we would like to point out that we performed extensive experiments at other doses in the manuscript, testing 14 total doses of three EGFR ligands in Figure 3, and present additional data at 20 ng/mL EGF throughout Figures 2, S2, and S7. It is also very important to test high input doses for our negative controls to ensure that the ZtSH2 biosensor retains specificity for ITAM sequences and fails to show recruitment to untagged EGFR even under saturating conditions. It is also quite customary in the field: for example, the Erk KTR paper that the reviewer mentions in a later comment (Regot et al, Cell 2014) exclusively tests their biosensors using saturating doses of 50 ng/mL anisomycin, 100 ng/mL FGF, and 10 μM forskolin to characterize p38, Erk and PKA biosensor responses.

      There is also insufficient attention paid to either controlling or measuring important parameters, such as expression levels of tagged receptors or levels of endogenous receptors. 3T3 cells, contrary to the statement of the authors, do not have "negligible" numbers of EGFR: they have ~40K, which is typical for mouse fibroblasts. This is much higher than MCF7 cells, which are frequently used as a model system to study EGFR responses. Yet they do not see transactivation of their ErbB2 construct in 3T3 cells without expressing additional EGFR (Fig. 4C), suggesting low sensitivity of the assay. Conversely, they show a significant response mediated by endogenously tagged EGFR in HEK 293 cells, which are frequently used as an EGFR-negative cell line (PMID: 26368334). This indicates that their assay is extremely sensitive. Which is it? As mentioned above, it likely depends on the expression level and affinity of the different components of their system.

      After extensive searching we have not found any publications with an estimate as high as 40K EGFR receptors/cell in NIH3T3 cells. Livneh et al 1986 report that NIH3T3 cells express as little as 500 EGFR receptors per cell and do not respond mitogenically to EGF, and subsequent Schlessinger lab papers use NIH3T3 cells as an EGFR-null background for introduction of receptor variants. Eierhoff et al PLOS Pathogens 2010 use NIH3T3s as an EGFR-null control, showing immunoblot data of undetectable pEGFR responses. The paper we found with the highest stated EGFR expression per cell in NIH3T3 cells is Verbeek et al, FEBS Lett 1998, which reports a value of 3,000 receptors per cell, but does so without any literature citation or measurement. These references are consistent with our experience: over nearly a decade of MAPK signaling experiments in the lab, we have only seen weak or undetectable EGF-stimulated responses in unmodified NIH3T3s, depending on the assay. We are quite confident that more potent responses are elicited in HEK293T cells, where we observe EGFR expression by fluorescence imaging of CRISPR-tagged cells, immunofluorescence staining, and immunoblotting, and where we observe robust signaling responses using biosensors. We also now cite some of these references to support our claim (Line 144).

      The reviewer makes an excellent point in the last sentence of their comment: indeed, it is essential to match the expression level of our SH2-based biosensor to the expression level of EGFR in any system in order to observe potent membrane translocation! This was imperative for visualizing any translocation in our CRISPR-tagged HEK293Ts: we had to switch to an exceptionally bright fluorophore and select cells with very low ZtSH2 expression to observe translocation. The ZtSH2/EGFR ratio is a crucial design parameter, which we now present extensive data and modeling to support (Figure S2-S4; Lines 190-235). Our data suggests that quite sensitive biosensor responses are possible with appropriate balance between ZtSH2 and EGFR expression levels (Figure 6) and, in general, biosensor responses can be matched to a dynamic range of interest by scaling ZtSH2 expression with EGFR levels.
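
      As an illustration of why the ZtSH2/EGFR ratio matters, the dependence can be sketched with a generic 1:1 equilibrium binding calculation. This is not the authors' computational model: the function name, the dissociation constant, and the concentration units below are hypothetical, and the sketch assumes that every receptor is fully phosphorylated and binds a single probe molecule.

```python
import math


def fraction_probe_bound(receptor_total, probe_total, kd):
    """Fraction of probe bound to receptor at 1:1 equilibrium.

    All three arguments share the same (arbitrary) concentration units.
    Solves R + P <-> RP with Kd = [R][P]/[RP] for the bound complex.
    """
    if probe_total <= 0:
        raise ValueError("probe_total must be positive")
    s = receptor_total + probe_total + kd
    # Smaller root of the quadratic for [RP]; always between 0 and min(R, P).
    bound = (s - math.sqrt(s * s - 4.0 * receptor_total * probe_total)) / 2.0
    return bound / probe_total


# Hypothetical values: probe fixed at 1 unit, Kd assumed to be 0.1 units.
for receptor in (0.1, 1.0, 10.0):
    cleared = fraction_probe_bound(receptor, probe_total=1.0, kd=0.1)
    print(f"receptor/probe = {receptor:>4}: fraction of probe bound = {cleared:.3f}")
```

      Under these assumptions the bound fraction can never exceed the receptor/probe ratio, so a probe expressed in large excess over the receptor shows little cytosolic clearance even when every receptor is occupied.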

      A great advantage of using the EGFR system as a test case for the new system is that thousands of investigations have been performed over the last four decades. This provides a strong foundation for determining whether the new technology is working correctly. For example, the dynamics of EGFR activation and trafficking at the single cell level have been documented in many studies, which show a remarkable consistency (e.g. see pmid: 24259669; pmid: 11408594; pmid: 25650738). Unfortunately, instead of using differences between the new results and previously reported data as a basis for refining their technique, the authors attempt to apply their raw data to address complex questions of EGFR dynamics, with less than satisfactory results.

      For example, they attempt to use their technique to understand the basis of different signaling dynamics between EGFR ligands. Rather than being a relatively recent observation, differences in EGFR ligand signaling have been explored for over 30 years (pmcid: PMC361851), and are generally ascribed to differences in trafficking (pmid: 7876195). Based on these observations and resulting mathematical models, novel EGFR ligands have been designed with enhanced potency (pmid: 8195228, pmid: 9634854). All this work was done over 20 years ago. Since then, new natural ligands for the EGFR have been discovered from sequence analysis and differences in their potency have similarly been ascribed to differences in their intracellular trafficking patterns (pmid: 19531065 - cited by the authors). An alternate hypothesis was proposed more recently by Freed et al (2017) as described by the authors, but that is what it is: an alternative hypothesis.

      We thank the reviewer for pointing out many excellent, classic studies on EGFR endocytosis and trafficking. We agree that this is a well-established field and that EGFR is certainly internalized, recycled, and degraded in a manner that depends on ligand affinity on the cell surface and in endosomes. These seminal studies lead the reviewer to propose an alternative hypothesis to explain our kinetic data in Figure 3: that differences in trafficking and maintenance of EGFR levels at the plasma membrane are the source of the altered kinetics between high- and low-affinity ligands. To address this question, we have now included new supplementary data examining endocytosis and trafficking in multiple contexts.

      First, we examine membrane EGFR levels in 3T3 cells overexpressing our EGFR-pYtag system (or ITAM-less EGFR as a control) after EGF stimulation (Figure S5A-C). We find that EGFR membrane intensity is virtually unchanged after 60 min of saturating EGF stimulation, a response that does not depend on whether ITAMs are appended to the receptor. We also now include still images of cells at every concentration examined in our dose-response experiments for all 3 ligands (Figure S6), which do not show clear differences in the subcellular distribution of EGFR before and after stimulation as a function of ligand identity. We also remind the reviewer that our interpretation is not simply an untested hypothesis – we experimentally tested a GBM-associated EGFR variant whose effect on receptor dimerization has been quantified, and observe EGF-like response kinetics even after EREG stimulation, a result predicted by our model (Figure 3D-E).

      We believe that the sustained membrane-localized signaling we observe might be ascribed to two factors: our choice of cell line and its expression level of EGFR. This conjecture is supported by some data: in contrast to our EGFR-overexpressing NIH3T3 cells, HEK293Ts harboring endogenous or low EGFR levels exhibit a dramatic redistribution of EGFR after EGF stimulation (Figure S3, Figure 6). This is clearly a context where transient versus sustained signaling might depend on the choice of ligand and its consequences on internalization.

      We also note that our data identify ligand-specific signaling differences that are distinct from prior studies, which focused on transient vs sustained signaling downstream of different EGFR ligands. In contrast, we identify a biphasic increase in EGFR activity after stimulation with EGF versus a rapid approach to steady state after stimulation with EREG or EPGN, despite the continued presence of high levels of membrane-localized EGFR in each case.

      Unfortunately, the model that the authors use to test this hypothesis does not even include endocytosis or receptor trafficking but instead uses variable "scaling" factors to see if the data can fit the dimerization hypothesis. In the supplement, they state that "Since our simulations were run on relatively short time scales (~30 min post-stimulation), we did not consider trafficking and degradation of receptors." However, the half-life of EGFR internalization is generally ~3-4min (pmid: 1975591) and degradation ~1hr, so most of the signal shown in Figure 3 is likely to come from internalized rather than surface-associated ligand-EGFR complexes. A further complication is that internalization rates are strongly influenced by receptor expression levels (pmid: 3262110), which are not controlled for here. Thus, the omission of trafficking in their model is not appropriate. This does not mean that the authors are wrong, it simply means that without validation or calibration, their new technology is not ready to resolve current problems in the field.

      We thank the reviewer for pointing out ways to improve our modeling (endocytosis) and discussion of its parameterization (scaling factors). We address both points below:

      Scaling factors: We thank the reviewer for their comments & agree that our discussion of model parameterization was lacking. To clarify: our base-case model for EGF includes 9 parameters, 6 of which are obtained from literature and 3 which reflect lumped kinetic processes of EGFR dimerization and activation and which we set to match our data. We then used experimentally-determined values to change the base-case model to simulate low-affinity ligand stimulation: a fold-change in ligand affinity and a fold-change in receptor dimerization. This is why we simulate EREG with β=50 and γ=100, reflecting the 10-to-100-fold differences in binding affinity and receptor dimerization that have been experimentally measured for this low-affinity ligand. Similar experimentally defined values constrain β and γ in the case of GBM-associated mutations. A more thorough explanation of our model and these scaling parameters is now included in Lines 334-362.
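
      The fold-change parameterization described above can be sketched as follows. This is only a schematic, not the authors' model code: the parameter names and base values are hypothetical, and the sketch assumes that β scales ligand affinity and γ scales receptor dimerization (the order in which the text lists them), with both changes weakening the interaction for a low-affinity ligand.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LigandParams:
    # Hypothetical lumped parameters in arbitrary units.
    kd_ligand: float      # ligand-receptor dissociation constant
    dimer_affinity: float  # lumped strength of receptor dimerization


def scale_for_low_affinity(base, beta, gamma):
    """Derive low-affinity-ligand parameters from the base (EGF) case.

    Assumption: beta raises the ligand Kd (weaker binding) and gamma
    lowers the dimerization strength; all other parameters are unchanged.
    """
    return replace(
        base,
        kd_ligand=base.kd_ligand * beta,
        dimer_affinity=base.dimer_affinity / gamma,
    )


egf = LigandParams(kd_ligand=1.0, dimer_affinity=1.0)  # placeholder base case
ereg = scale_for_low_affinity(egf, beta=50, gamma=100)  # values quoted in the text
print(ereg)
```

      Expressing the low-affinity case as fold-changes on a single base case keeps the number of free parameters small, since only β and γ differ between ligands.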

      Endocytosis: We wholeheartedly agree that our model is quite simplified, and a thorough treatment of endocytosis and trafficking would be essential for capturing nuances associated with these steps of the cascade. However, while we appreciate the 3-4 min rule of thumb for EGFR internalization that the reviewer mentions, it is simply not reflective of the membrane-associated EGFR levels we observe in our cells. Examples can be observed in Figure 1C, Figure 2A, Figure 5F, Figure S1B, Figure S2A-B, Figure S5A, and Figure S6, as well as in the quantification of membrane-associated EGFR at 0 and 60 min in Figure S5B. It is quite likely that endocytosis and trafficking are operating throughout this time course, but are balanced to maintain a similarly high level of EGFR at the cell surface. We wholeheartedly agree with the reviewer’s helpful note that EGFR expression levels heavily influence internalization, which our data also support, and may explain our results. For example, we also see rapid EGFR membrane clearance in HEK293T CRISPR cells (Figure 6) and in HEK293Ts that express low levels of EGFR but not high levels of EGFR (Figure S3A).
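
      The two positions can be compared with a minimal two-pool sketch (surface and internal receptor, first-order internalization and recycling, no synthesis or degradation). This is an illustrative calculation under those stated assumptions, not part of the authors' model; the recycling rate used below is hypothetical and chosen only to show the effect.

```python
import math


def surface_fraction(t_min, half_life_int_min, k_rec_per_min=0.0):
    """Fraction of receptor at the cell surface at time t (minutes).

    Two-pool model with all receptor at the surface at t = 0:
        dS/dt = -k_int * S + k_rec * (1 - S)
    where k_int is derived from the internalization half-life.
    """
    k_int = math.log(2) / half_life_int_min
    k_total = k_int + k_rec_per_min
    steady_state = k_rec_per_min / k_total
    return steady_state + (1.0 - steady_state) * math.exp(-k_total * t_min)


half_life = 3.5  # midpoint of the 3-4 min rule of thumb
k_int = math.log(2) / half_life

# No recycling: the surface pool decays exponentially.
print(f"no recycling, 30 min: {surface_fraction(30, half_life):.4f}")

# Hypothetical recycling ten times faster than internalization.
print(f"fast recycling, 30 min: {surface_fraction(30, half_life, 10 * k_int):.4f}")
```

      Without recycling, a 3.5 min half-life leaves well under 1% of receptor at the surface after 30 min, which is the reviewer's concern. With recycling, the surface pool instead settles at k_rec / (k_int + k_rec), so rapid internalization and a sustained surface signal are compatible if the two rates are balanced as proposed above.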

      In sum, we argue that our inclusion of additional data showing sustained EGFR protein levels and ZtSH2 recruitment at the plasma membrane should help justify our assumption of membrane-associated signaling in our model. However, we happily concede that this is a highly simplified model, and that endocytosis is a very important process that should be accounted for in future studies (e.g., Line 344-346: “However, we expect that internalization and trafficking can play a crucial role in EGFR dynamics in many contexts, which would need to be included in future models to adequately assess those scenarios”).

    1. Author Response

      Reviewer #3 (Public Review):

      Over the past decade, Cryo-EM analysis of assembling ribosomes has mapped the major intermediates of the pathway. Our understanding of the mechanisms by which ATPases drive the transitions between states has been slower to develop because of the transient nature of these events. Here, the authors use cryo-EM and biochemical and molecular genetic approaches to examine the function of the DEAD-box ATPase Spb4 and the AAA-ATPase Rea1 in RNP remodeling. Spb4 works on the pre-60S in an early nucleolar state. The authors find that Spb4 acts to remodel the three-way junction of H62/H63/H63a at the base of expansion segment ES27. Interestingly, Spb4 appears to interact stably with a folding intermediate in the ADP rather than ATP-bound form. This work represents one of the few cases in which an RNA helicase of ribosome biogenesis has been captured and engaged with its substrate. The authors then show that the addition of the AAA-ATPase Rea1 to Spb4-purified particles results in the release of Ytm1, a known target of Rea1. However, they did not observe an efficient release of Ytm1 when particles were affinity purified via Ytm1, suggesting that the recruitment of Spb4 is important for this step. Cryo-EM analysis of Spb4-particles treated with Rea1 revealed the previously characterized state NE particles but no additional intermediates. Consequently, this analysis of Rea1 is less informative about its function than is their work on Spb4 helicase activity. In general, the data support the authors' conclusions and the data are well presented.

      Major points

      1) The Erzberger group has recently published work regarding the function of Spb4. They similarly found that Spb4 is necessary for remodeling the 3-way junction at the base of ES27. Although it was posted to bioRxiv in Feb 2022, it was not formally published until Dec 2022. The authors should cite this work and include a brief discussion comparing conclusions.

      We are now citing this study in the introduction and discussion and are briefly comparing the conclusions.

      2) L311. The heading "Coupled pre-60S dissociation of the Ytm1-Erb1 complex and RNA helicase Has1" should be changed. Coupling implies a mechanistic interplay. Although the release of Ytm1 and Has1 both depend on Rea1, the data do not support the conclusion of mechanistic coupling. In fact, the authors write in lines 328-329 "Thus, the Rea1-dependent pre-60S release of the Ytm1-Erb1 complex occurs before and independently of Has1..." Independently cannot also imply coupling.

      We have changed the heading to “Ytm1–Erb1 release promotes the dissociation of the RNA helicase Has1”.

      3) L339-342 Combining data sets for uniform processing was a great idea! This approach should be used more often in cryo-EM analyses of in vitro maturation reactions.

      We agree with the reviewer that this approach is appropriate to analyse such reactions.

      4) L428 The authors need to amend their comment that this is the first structure of Spb4-bound to the substrate as this has recently been published by the Erzberger group and was first posted as a preprint in early 2022.

      We have removed the statement regarding the first structure of Spb4 and added a citation of the study published by Cruz et al.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript builds on data from the same group showing that Lphn2 functions cell-autonomously as a receptor in CA1 pyramidal axons and cell-non-autonomously as a ligand in the neurons of the subiculum. In either case, binding of teneurin-3 to Lphn2 mediates repulsive events, and since different populations of neurons within each region express differing levels of both proteins, this mechanism allows proximal CA1 pyramidal axons to preferentially project to distal subiculum and distal CA1 pyramidal axons to project to proximal subiculum. The authors now ask mechanistic questions about the role of Lphn2 signaling in these wiring processes.

      The authors demonstrate that G-protein signaling downstream of Lphn2, which is mediated by the tethered agonist, is necessary for the ability of ectopically expressed Lphn2 to redirect proximal CA1 axons from distal to proximal subiculum. Moreover, the authors show that while autoproteolytic activity of Lphn2 facilitates G-protein signaling, possibly by making the tethered agonist more available for signaling, it is not necessary for axonal mistargeting. Thus, the authors conclude that tethered agonist-dependent G-protein signaling is required for Lphn2-mediated hippocampal neural circuit assembly. Most of the data shown in support of these conclusions are convincing, though I have some concerns about the expression levels and/or effects of the tethered agonist mutants in CA1, which is important since the analyses assume that any defects are in the repulsive interactions described above.

      We thank Reviewer 1 for their suggestion to incorporate data on the expression levels of the tethered agonist mutants in CA1. We have now performed additional experiments and included a new Figure 1—figure supplement 2A-B to address this concern.

      The authors also use heterologous cells to determine that Lphn2 couples to Gα12/13, but not other heteromeric G-protein α-subunits. Within the context of heterologous cells, these experiments are well controlled and exhaustive, as every mutant used in vivo is carefully analyzed. One potential criticism of this work, however, is that perhaps the authors assume too much in simply translating their results in heterologous cells to neurons, especially when one of the most interesting conclusions of this paper (see below) is that Lphn2 signaling is context-dependent. Without further data to confirm the results of these experiments in the neuronal populations studied, these data primarily illustrate possibilities, but don't exclude other possibilities.

      We are grateful to Reviewer 1 for bringing this potential criticism to our attention. We have now included clarification of this point in the text and discussion of the manuscript, as noted in our response to Essential Revision #3 above.

      Finally, the authors test the role of Lphn2 functioning as a ligand in the subiculum by driving its expression in the normally Lphn2-low dorsal subiculum. As they reported before, this alteration decreases the ability of proximal CA1 axons to project to this area. Interestingly, and in contrast to the role of Lphn2 as a receptor above, neither Lphn2 autoproteolysis nor tethered agonist function are required for this effect. This finding is very interesting and will merit follow-up, though I agree with the authors that this manuscript does not require this for publication.

      In summary, this is an interesting paper that addresses timely and pressing issues in the adhesion-GPCR field.

      Reviewer #2 (Public Review):

      This is an intriguing study investigating the molecular mechanisms by which the adhesion G-protein coupled receptor latrophilin-2 controls the developmental organization of neural circuits. Using the model CA1 to subiculum hippocampal circuit with its spatially segregated axon targeting, the authors' experiments find that ectopic Lphn2 expression in CA1 neurons that normally do not express it leads to axon mistargeting. The authors detail these circuitry alterations with Lphn2 genetic manipulations, finding that axon targeting is dependent on its GPCR signaling, likely through Galpha12/13 coupling.

      Strengths: Building off the authors' previous studies, the experiments are well designed and analyzed. The advance in this study is finding that Lphn2 expression in CA1 cells that normally do not express it impacts their axon targeting. They go on to show compelling data that implicate Lphn2 GPCR signaling properties in this mistargeting, identified as likely Galpha12/13 dependent.

      Weaknesses: The system used is a "misexpression system". By forcing cells with ordinarily low levels to overexpress Lphn2, circuitry alterations are observed. While this gain-of-function assay demonstrates why it matters that Lphn2 is not expressed in certain cell types, it isn't a physiologically relevant system to investigate Lphn2-dependent circuit development.

      We thank Reviewer 2 for the appreciation of our study. We wish to clarify, in response to the critique of the artificial nature of the misexpression system, that experiments involving loss-of-function of endogenous Lphn2 have been described in our previous study (Pederick et al., 2021). When we conditionally deleted Lphn2 in CA1, Lphn2+ mid-CA1 axons spread to distal, Ten3+ subiculum. Thus, both the gain-of-function experiment described in this study and the loss-of-function experiment described in Pederick et al., 2021 support the notion that Lphn2 acts in axons as a repulsive receptor for the Ten3 ligand.

      To strengthen this study, the following specific points could use addressing:

      1) While the data are strong, some of the terminology used is unclear, including use of the terms "repulsive receptor" and "repulsive ligand". The authors use "repulsive receptor" to describe Lphn2 action for axon targeting, but repulsion and attraction processes are simultaneous. Is Lphn2 really acting as a repulsive receptor, or alternatively, acting to shift axon attraction to Lphn2-expressing subiculum neurons?

      We apologize for the lack of clarity. The terms “receptor” and “ligand” are used to refer to a molecule’s role in axons or target neurons, respectively, a common usage in the axon guidance field (Kolodkin and Tessier-Lavigne, 2011; PMID 21123392). Using a series of loss and gain of function manipulations, our previous data support a role for Lphn2 both as a repulsive receptor in axons and repulsive ligand in target neurons. When Lphn2 is deleted in CA1 axons they invade Ten3 subiculum target neurons. Similarly, deletion of Ten3 in the subiculum results in Lphn2-positive axons invading the Ten3 KO area. Unlike its partner Ten3, which can serve as an attractive receptor when the ligand is Ten3 and repulsive receptor when the ligand is Lphn2, Lphn2 only serves as a repulsive receptor to the Ten3 ligand. We (and others) have shown that Lphn2 does not bind homotypically (Boucard et al., 2014 and Pederick et al., 2021). We have clarified these points in the revised manuscript (2nd paragraph of Introduction).

      2) For their proposed axon guidance model to work, Lphn2 has to be signaling through Ga12/13 proteins near the axon growth cone to induce its collapse and retraction. By using Flag-tagged Lphn2 constructs in their assays, this should be visible. Clear Flag-Lphn2 signal is observed in the dendrites of infected cells (Figure 1-figure supplement 1; Figure 5-figure supplement 1). But does Flag-Lphn2 also localize to the pCA1 axons that are projecting to the subiculum?

      Thank you for this important question. We have added new data to show that FLAG-tagged Lphn2 is indeed found in CA1 axons. Please see our response in “Essential Revision #2” above.

      3) According to their previous work, pCA1 to dSub circuit patterning is dependent on the Ten3+ to Ten3+ homophilic attraction that exists between the two regions. It's unclear how ectopic Lphn2 is able to override this Ten3+ to Ten3+ connection patterning. Does ectopic Lphn2 outcompete Ten3 function in these neurons? Or alternatively, is Ten3 expression/localization impacted by the presence of ectopic Lphn2?

      We believe it is the former. Regarding the latter, please see our response in “Essential Revision #1” above.

    1. Author Response

      Reviewer #1 (Public Review):

      Idiosyncratic drug-induced liver injury is a disease that appears to be linked to mitochondrial DNA (mtDNA), but there is a lack of model cell lines for the study of this link. To help address this problem, the authors developed ten cybrid HepG2 cell lines that have had their mitochondrial DNA replaced with the mitochondrial DNA of ten human donors. Analysis of single nucleotide polymorphisms in all of the patients' mtDNA allowed the authors to assign the donors to two haplogroups (H and J) with five patients each. The authors also present the results of several assays (e.g. oxygen consumption, ATP production) performed on all ten cell lines in the absence and presence of five clinically-relevant drugs (or drug metabolites). Significant attention was paid to differences observed between the cell lines in the H and J haplogroups. The work is methodologically and scientifically rigorous, ethically conducted, and objectively presented according to the appropriate community standards.

      While I feel that the manuscript will be useful to the research field and is an important step towards improving patient outcomes, I feel that the work lacks a broad interest. Much of the paper is spent discussing small and/or statistically insignificant differences between haplogroups H and J. While some interesting interpretations and suggestions are presented in the discussion, the authors didn't perform follow-up experiments to try to nail down any particular mechanistic insights that would be useful to the broader community. I also didn't feel a strong sense that the paper produced any specific suggestions for how clinical outcomes could be improved. Accordingly, any clear insights that would be interesting to a broad scientific community would probably require follow-up studies.

      Again, we strongly believe that the subject is of broad interest to researchers in both academia and the pharmaceutical industry. Evidence of the level of interest in this subject can be quantified by the access metrics of the 3 publications we have recently published on this topic (Biochem Soc Trans, 2020, PMID: 32453388; Arch Toxicol, 2021, PMID: 33585966; Front Genetics, 2021, PMID: 34484295), which have been accessed >6000 times.

      The structure of the paper is also not friendly to a broad audience; the results are presented without interspersed commentary that could help the reader understand the meaning or utility of the results as they are being presented. Accordingly, I often felt unsure about how the results being presented were relevant to solving the broader problem established nicely in the introduction.

      We thank the reviewer for this comment and have revised the manuscript to now contain a combined results and discussion section.

      Finally, it wasn't clear that the generated cell lines were made available for anyone to purchase through a cell bank (perhaps the authors did do this, but I don't recall seeing a mention of it). As these cell lines appear to be the primary output of this work, it seems important to better highlight the extent to which they are being made accessible to the scientific community.

      The cells are currently in the process of being deposited under licence with XimBio. This will allow other researchers to easily access them. They are also available upon request from me. This has been conveyed in the revised manuscript (pg 18, lines 1-2).

      Reviewer #2 (Public Review):

      In this work, Ball et al. investigated the possibility of generating a novel set of HepG2 liver cell lines as "mitochondrial DNA-personalized" models to study idiosyncratic drug-induced liver injury related to mitochondrial variation. This work represents the generation of a comprehensive collection of n=10 HepG2 lines, half reflecting haplogroup H and half reflecting haplogroup J. The authors then assessed their impact on basic mitochondrial function in liver cells. Interestingly, they find a greater respiratory complex activity driven by complex I and II of the haplogroup J lines relative to haplogroup H. Finally, the authors make an attempt at using this novel set of lines to probe the consequential effects of mitochondrial genotype on drug-induced liver toxicity. This work provides an interesting proof-of-concept study and is a starting point towards studying and predicting idiosyncratic drug-induced liver injury in a personalized manner. This technique may be broadly extrapolated to other commonly used liver cell models within the toxicology field.

      Strengths:

      1) This work presents an exciting initiative to study interindividual variability in idiosyncratic drug-induced liver injury focusing on mitochondrial haplotypes. In further follow-ups, this work could be extended to also represent other different haplogroups to establish a thorough "biobank". The established lines allow for future in-depth characterization and testing of many putative hepatotoxic compounds through a variety of toxicity measures that could shed further light on the impact of mitochondrial DNA variation on (idiosyncratic) drug-induced liver injury.

      2) This technique may be broadly extrapolated to other commonly used liver cell lines within the toxicology field (e.g. HepaRG cells or iPSC-derived cells) that are potentially also more metabolically competent. A short discussion on this could be added to the current manuscript.

      We thank the reviewers for this comment, which we agree with. We have now incorporated this into the conclusion (pg 18, lines 23 - 27).

      Weaknesses:

      1) The major weakness of the current manuscript is the rather large variation across sample measurements in the proof-of-concept experiments studying drug effects (fig. 3-6). This makes much of the data rather hard to interpret and to draw conclusions from. As an example, proton leak (fig. 3f/4f) seems to increase 2-fold in the J group even under basal conditions (0 uM flutamide/metabolite), while this is not observed in fig. 2a and this effect also seems to be absent under 0 uM tolcapone (fig. 5f). Unfortunately, the current data do not allow us to draw confident conclusions about whether the tested drugs have effects on the mitochondrial respiration of the different haplogroups. This may well be linked to the methods used for measuring mitochondrial activity, but since this is the predominant method needed in the current paper, either increasing the number of experiments (across more lines) or identifying a more rigorous method to obtain consistent results across experiments would help the authors to make more confident claims about their data.

      The reviewers have noted the inherent variability in the respiratory measurements from plate to plate. To counter this, experiments were designed so that for each cybrid cell line the control and treated cells were always positioned on the same plate. However, we believe that the reporting of such data, and their limitations, is a fundamental aspect of unbiased science reporting feeding into the principles of data reproducibility. In this resubmission, we have updated the methodology of our data analysis, which better accounts for this variability. The new figures plot each cybrid as a distinct point to easily visualise the variation across haplogroups dependent upon each cybrid within the group. We have included this limitation in the conclusion (pg 18, lines 15 – 19).

      2) The data on the effects of inhibition of complex I/II activity are not sufficiently convincing to support the claim that haplogroup J is more susceptible to flutamide/metabolite (fig. 6). Both haplogroups seem to respond nearly identically to flutamide or its metabolite, i.e. at higher concentrations complex I/II activity decreases, with the sole difference that the haplogroups have different basal activity (not influenced by the drug). Estimating fold changes, for example, complex I and II activity decreases ca. 2-fold in both haplogroups at the highest concentration of the metabolite (fig. 6c-d), which suggests that, contrary to the authors' claim, the haplogroups do not differ in susceptibility. It is furthermore unclear what the statistical significance currently represents: it should represent whether, at increasing concentrations, the activity of the complexes differs significantly from the previous/basal conditions within the same haplogroup. If it represents the significance of haplogroup J vs. haplogroup H (which seems to be the case), it is non-informative, as it is obvious that haplogroup J has a higher baseline.

      Thank you for this comment, we agree with the shortcomings of statistical analysis in fig 6 and have reanalysed the dataset using a more appropriate statistical methodology, see response 2.2.
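
      The reviewer's point about comparing each haplogroup with its own baseline can be illustrated with a minimal sketch. The function name and the activity values below are hypothetical and chosen only for illustration; they are not data from the manuscript, and this is not the reanalysis the authors performed.

```python
def fold_change_from_baseline(activities):
    """Express each activity relative to the basal (0 uM) value.

    `activities` is ordered by increasing drug concentration, and the
    first entry is assumed to be the untreated (basal) measurement.
    """
    baseline = activities[0]
    if baseline <= 0:
        raise ValueError("basal activity must be positive")
    return [value / baseline for value in activities]


# Hypothetical complex I activities (arbitrary units) at 0, low and high dose.
haplogroup_h = [10.0, 8.0, 5.0]
haplogroup_j = [20.0, 16.0, 10.0]  # higher baseline, same relative decline

relative_h = fold_change_from_baseline(haplogroup_h)
relative_j = fold_change_from_baseline(haplogroup_j)
print(relative_h, relative_j)
```

      In this made-up case the absolute activities differ at every concentration, yet the baseline-normalised responses are the same, which is the distinction between a difference in basal activity and a difference in susceptibility.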

      3) It would help to mention how many lines per haplogroup H/J were used in the analyses across all figures. This should be clarified, as the error bars for most experiments are rather high and therefore statistical significance is lacking, making data interpretation complex. It could be helpful if the authors present at least for some analyses single plots of data obtained across different lines from the same haplogroup to evaluate the consistency of the effects of the genotypes as supplementary figures. If only 1-2 lines were used per group, it would help to perform additional experiments to assess consistencies across groups.

      We apologise that the number of lines per haplogroup that were employed in the analyses is unclear. In every case, we included 5 cybrid lines per haplogroup. We have further clarified this point in the methods and results. Furthermore, in the new figures, each cybrid is now represented as a single data point.

    1. Author Response

      Reviewer #2 (Public Review):

      1) A major point of the manuscript is the description of Hrc+ fibroblasts (Fibroblast 3) as profibrogenic in diabetes. However, fibroblast 3 expresses several cardiomyocyte markers Nppa, Ryr2, Ttn alongside Hrc which is described to play a role in Ca2+ handling at the sarcoplasmic reticulum in cardiomyocytes (Fig. 4C) and shows a low correlation with other fibroblast clusters (Fig. 4B). A possible explanation is technical, e.g. if two nuclei (one fibroblast, one cardiomyocyte) were captured together in one droplet (barcode collisions or doublets). Unfortunately, this uncertainty makes interpretation of all following snRNA-seq analyses based on this fibroblast subpopulation impossible.

      Thank you very much for the reviewer's valuable comments. We went over the scRNA-seq results carefully. First, to ensure cell quality, we used relatively stringent thresholds to filter out most barcodes associated with empty partitions or doublet cells. We quantified the number of genes and UMIs, and kept high-quality cells within the detection thresholds of 500-2,500 genes and 600-8,000 UMIs. Cells with an unusually high proportion of mitochondrial gene expression (≥10%) were then excluded from this study. To address the possible doublets you mentioned, we identified doublet cells using DoubletFinder (v2.0.3), which generates artificial doublets, uses the PC distance to find each cell’s proportion of artificial k nearest neighbors (pANN), and ranks cells according to the expected number of doublets. We found that 3.20% of cells (19 cells) in fibroblast-3 (594 cells) were labeled as doublets. After these 19 doublet cells were removed, the trends in cell proportion and Hrc gene expression in fibroblast-3 were the same as before. Therefore, doublet removal does not affect the conclusions of this study, which were also validated by Hrc and vimentin double immunostaining experiments (Figure 4E). Thanks again to the reviewer for these professional comments.
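
      The quality-control thresholds quoted above can be written down as a small filter. This is a sketch only: the `CellQC` record and the function names are hypothetical, the threshold ranges are assumed to be inclusive, and it does not reproduce DoubletFinder itself, only the arithmetic behind the reported doublet fraction.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-barcode QC summary (hypothetical record, for illustration)."""
    barcode: str
    n_genes: int
    n_umis: int
    pct_mito: float


def passes_qc(cell, gene_range=(500, 2500), umi_range=(600, 8000),
              max_pct_mito=10.0):
    """Keep cells inside the gene/UMI ranges and below the mito cutoff.

    Range endpoints are assumed inclusive; cells with >= 10%
    mitochondrial expression are excluded, as stated in the response.
    """
    return (gene_range[0] <= cell.n_genes <= gene_range[1]
            and umi_range[0] <= cell.n_umis <= umi_range[1]
            and cell.pct_mito < max_pct_mito)


def doublet_percentage(n_doublets, n_cells):
    """Percentage of cells in a cluster flagged as doublets."""
    return 100.0 * n_doublets / n_cells


# Counts reported for fibroblast-3: 19 doublets among 594 cells.
print(f"{doublet_percentage(19, 594):.2f}%")
```

      With the reported counts, 19 of 594 cells corresponds to the 3.20% stated in the response.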

      2) To follow the study and be able to appreciate the data quality, individual sample metadata and UMAPs colored based on a sample and/or condition (diabetes or control) would be helpful. The paper would benefit from an analysis to show if the differences in the number of detected genes are due to the number of nuclei per cluster or if the bigger clusters are really also the ones with the most dramatic changes. Instead of showing expression levels of differentially regulated genes in distinct clusters (Fig1 S2), the differential expression could be displayed with violin plots or heatmaps that illustrate values for both conditions. Clusters that did not reveal any differential expressed genes, e.g. Adipo can be removed. Fig 1F these KEGG enrichments are hard to interpret since they can be confounded by highly expressed cardiomyocyte genes that are detected in all clusters (1B) and thus drive the GO enrichment of e.g. "cardiac muscle contraction" in T cells.

      Thanks to the reviewer for these comments. Fig1 S2 concisely shows the top 10 upregulated genes in different cell populations and their expression characteristics. More detailed expression levels of differentially regulated genes in distinct clusters can be found in supplemental files 2-5. If we used violin plots or heat maps to show the differential expression of the top 10 upregulated genes, we would need too many additional figures in the main text, which would take up too much space. On the other hand, cell populations without differentially expressed genes in Figure 1E have been removed as you suggested.

      3) The study looks into the pathogenesis of cardiac fibrosis in diabetic mice. The authors show that downregulation of Itgb1 with siRNA (Fig 6I) leads to less fibrosis in diabetic mice. This effect might be expected since Itgb1 is an extracellular matrix-linked gene and might indicate that downregulation could be beneficial. Given this, it is confusing to see the following analysis which links several genetic variants associated with Type 2 Diabetes to Itgb1 (one leading to premature stop) and its ligand. This analysis seems out of place in relation to the remainder of the study which focuses on identifying the downstream effects of diabetes on cardiac fibrosis.

      Thank you very much for the reviewer's valuable comments. We have deleted the results of the association of Itgb1 variants with diabetic cardiac fibrosis in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Han et al use sophisticated genetic approaches to investigate leptin-responsive neural circuits. Overall, this is an impressive series of studies that provide fairly convincing evidence for a key inhibitory pathway downstream of AGRP neurons. A few data sets require additional validation or explanation.

      We appreciate the reviewer’s strong interest in and support of this manuscript, as well as the valuable comments below. We have revised the manuscript accordingly to incorporate the reviewer’s suggestions and critiques.

      Reviewer #2 (Public Review):

      Using a novel genetic system to conditionally ablate Lepr from Agrp neurons in adults, the authors discovered that leptin-AgRP neuron signaling strongly modulates the DMH and sought to understand the DMH targets and mechanisms of action in the response to AgRP neuron signaling. GABA signaling likely underlies the effects of AgRP neuron-mediated hyperphagia (etc). DMH Mc4R neurons appear to lie downstream of Agrp neurons. GABA in the DMH appears to mediate many of the effects of AgRP neurons on feeding and body weight. Furthermore, deletion of Lepr from AgRP neurons increases DMH GABA-ARa3, and modulation of this receptor in the DMH alters food intake and the response to leptin.

      Unfortunately, there is little quantification or other validation data from many of the systems deployed, and the analysis jumps around a fair amount, without really uniting the results in a way that paints a convincing picture of the final model that they build.

      Thanks for these positive comments on our studies. In the revised manuscript, we have added a substantial amount of new experimental data, more controls, and data validation that significantly strengthen our proposed model.

      Reviewer #3 (Public Review):

      The manuscript by Han et al characterizes a pathway from AgRP(LepR) neurons to DMH(MC4R) neurons that is involved in energy balance control. They use a conditional knockout strategy to show that AgRP(LepR) knockout increases body weight and this effect was reversible by blocking GABA signaling. They also showed that activation of AgRP-DMH projection increases food intake, and highlighted a role for alpha3-GABAA receptor signaling in the DMH for regulating feeding behavior. While these data highlight a potential circuit that modulates feeding, there are concerns about the paper in its current form that diminish enthusiasm. The lack of proper controls in many of the experiments raises doubts about the findings.

      Strengths: The authors use new tools to characterize a new circuit for leptin-mediated energy balance control. The conditional knockout has several advantages over previous techniques that are described within the manuscript. Further, the authors use combinations of different techniques (gene knockout, optogenetic manipulation, in vivo activity monitoring) to make observations at multiple levels of analysis.

      Weaknesses: Several experiments within the paper have worrisome caveats or lack proper controls, raising concerns about the overall conclusions made.

      We appreciate the reviewer’s positive comments. We added more control and validation data in our updated manuscript to support our conclusion.

    1. Author Response

      Reviewer #1 (Public Review):

      Demographic inference is a notoriously difficult problem in population genetics, especially for non-model systems in which key population genetic parameters are often unknown and where the reality is always a lot more complex than the model. In this study, Rose et al. provided an elegant solution to these challenges in their analysis of the evolutionary history of human specialization in Ae. aegypti mosquitoes. They first applied state-of-the-art statistical phasing methods to obtain haplotype information in previously published mosquito sequences. Using this phased data, they conducted cross-coalescent and isolation-with-migration analyses, and they innovatively took advantage of a known historical event, i.e., the spread of Ae. aegypti to South America, to infer the key model parameters of generation time and mutation rate. With these parameters, they were able to confirm a previous hypothesis, which suggests that human specialists evolved at the end of the African Humid Period around 5,000 years ago when Ae. aegypti mosquitoes in the Sahel region had to adapt to human-derived water storage as their breeding sites during intense dry seasons. The authors further carried out an ancestry tract length analysis, showing that human specialists have recently introgressed into Ae. aegypti population in West African cities in the past 20-40 years, likely driven by rapid urbanization in these cities.

      Given all the complexities and uncertainties in the system, the authors have done an outstanding job coming up with well-informed research questions and hypotheses, carrying out analyses that are most appropriate to their questions, and presenting their findings in a clear and compelling fashion. Their results reveal the deep connections between mosquito evolution and past climate change as well as human history and demonstrate that future mosquito control strategies should take these important interactions into account, especially in the face of ongoing climate change and urbanization. Methodologically, the analytical approach presented in this paper will be of broad interest to population geneticists working on demographic inference in a diversity of non-model organisms.

      In my opinion, the only major aspect that this paper can still benefit from is more explicit and in-depth communication and discussion about the assumptions made in the analyses and the uncertainties of the results. There is currently one short paragraph on this in the discussion section, but I think several other assumptions and sources of uncertainties could be included, and a few of them may benefit from some quantitative sensitivity analyses. To be clear, I don't think that most of these will have a huge impact on the main results, but some explicit clarification from the authors would be useful.

      Below are some examples:

      Thank you very much for your kind words and your feedback! We have expanded our discussion of assumptions and uncertainties – we have responded to each point below:

      1) Phasing accuracy: statistical phasing is a relatively new tool for non-model species, and it is unclear from the manuscript how accurate it is given the sample size, sequencing depth, population structure, genetic diversity, and levels of linkage disequilibrium in the study system. If authors would like to inspire broader adoption of this workflow, it would be very helpful if they could also briefly discuss the key characteristics of a study system that could make phasing successful/difficult, and how sensitive cross-coalescent analyses are to phasing accuracy.

      We agree that this is an important topic to expand on. We have clarified as follows:

      Results, Page 4, last paragraph: “Over 95% of prephase calls had maximal HAPCUT2 phred-scaled quality scores of 100 and prephase blocks (i.e. local haplotypes) were 728bp long on average (interquartile range 199-1009bp). We then used SHAPEIT4.2 to assemble the prephase blocks into chromosome-level haplotypes, using statistical linkage patterns present across our panel of 389 individuals (25).”

      Discussion, Page 8, last paragraph: “Overall linkage disequilibrium is relatively low in Ae. aegypti, dropping off quickly over a few kilobases and reaching half its maximum value within about 50kb (37); this is likely sufficient for assembling shorter, high-confidence prephase blocks into longer haplotypes in many cases. However, phase-switch errors may be common across longer distances – potentially affecting inferences in the most recent time windows. Nevertheless, the similar results we obtain using different proxy populations (and thus different input haplotype structures) for human-specialist and generalist lineages (see Figure S1) suggest that our results are robust to potential mistakes in long-range haplotype phasing.”

      Discussion, Page 9, paragraph 2: “Here, we take advantage of a continent-wide set of genomes, combined with read-based prephasing and population-wide statistical phasing to develop a phasing panel that should enable future studies in Ae. aegypti with a lower barrier to entry. The same approach may work for other study organisms with similar population genomic properties; high levels of diversity are helpful for prephasing and at least moderate levels of linkage disequilibrium are important for the assembly of prephase blocks.”

      2) Estimation of mutation rate and generation time: the estimation of these important parameters is made based on the assumption that they should maximize the overlap between the distribution of estimated migration rate and the number of enslaved people crossing the Atlantic, but how reasonable is this assumption, and how much would the violation of this assumption affect the main result? Particularly, in the MSMC-IM paper (Wang et al. 2020, Fig 2A), even with a simulated clean split scenario, the estimated migration rate would have a wide distribution with a lot of uncertainty on both sides, so I believe that the exact meaning and limitations of such estimated migration rate over time should be clarified. This discussion would also be very helpful to readers who are thinking about using similar methods in their studies. Furthermore, the authors have taken 15 generations per year as their chosen generation time and based their mutation rate estimates on this assumption, but how much will the violation of this assumption affect the result?

      This is a great point. We have expanded our discussion of how this assumption affects our conclusions (see Discussion page 9, first paragraph): “Furthermore, we chose a scaling factor that maximized overlap between the peak of estimated Ae. aegypti migration and the peak of the Atlantic Slave Trade (Fig. 2B). If we instead consider alternative scenarios where peak migration occurred at the very beginning of the slave trade era, around 1500, then our inferred mutation rate would be lower (about 2.4e-9, assuming 15 generations per year), pushing back the split of human-specialist lineages to about 10,000 years before present. This scenario seems less plausible, in part because our isolation-with-migration analyses suggest a gradual onset of migration between continents rather than a single, early-pulse model. It would also make it harder to explain the timing of the bottleneck we see in invasive populations; the first signs of this bottleneck occur at the beginning of the slave trade (~500 years ago) with our current calibration (Fig. S1A), but would be pushed to a pre-trade date in this alternative scenario. We can also consider a scenario in which peak Ae. aegypti migration occurred more recently, perhaps around 1850, corresponding to increased global shipping traffic outside the slave trade alone. In this case, our inferred mutation rate would be higher (or generation time lower), and the split of human-specialist lineages would be placed at about 3,000 years ago. Overall, the best match between the existing literature and our data corresponds to our main estimates, but alternative scenarios could gain support if future research finds evidence for a different time course of invasion than is suggested by the epidemiological literature.”

      We have slightly expanded our description of calibration in Results, page 5, last paragraph: “The fact that we see good overlap between the two distributions (yellow–white color) across a wide range of reasonable mutation rates and generation times for Ae. aegypti is consistent with our understanding of the species’ recent history and supports our approach. For example, if we take the common literature value of 15 generations per year (0.067 years per generation) (17, 20), the de novo mutation rate that maximizes correspondence between the two datasets is 4.85 × 10⁻⁹ (black dot in Figure 2A, used in Figure 2B), which is on the order of values documented in other insects. We chose to carry forward this calibrated scaling factor (corresponding to any combination of mutation rate and generation time found along the line in Figure 2A) into subsequent analyses.”
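
      The calibration arithmetic in the two passages above can be checked with a short sketch. It assumes, as is standard for coalescent-based dating, that with the generation time held fixed an inferred date scales inversely with the assumed mutation rate. The function names are hypothetical, and the 5,000-year split date is taken from the reviewer's summary rather than recomputed here.

```python
def years_per_generation(generations_per_year):
    """Convert generations per year into years per generation."""
    return 1.0 / generations_per_year


def rescale_date(date_years, mu_calibrated, mu_alternative):
    """Rescale an inferred date to a different assumed mutation rate.

    Assumes a fixed generation time, so the date is inversely
    proportional to the mutation rate.
    """
    return date_years * mu_calibrated / mu_alternative


def mutation_rate_for_date(date_years, mu_calibrated, target_years):
    """Mutation rate that would move `date_years` to `target_years`."""
    return mu_calibrated * date_years / target_years


MU_MAIN = 4.85e-9     # calibrated rate quoted in the response
SPLIT_MAIN = 5000.0   # approximate split date (years), per reviewer summary

print(round(years_per_generation(15), 3))             # years per generation
print(rescale_date(SPLIT_MAIN, MU_MAIN, 2.4e-9))      # early-migration scenario
print(mutation_rate_for_date(SPLIT_MAIN, MU_MAIN, 3000.0))  # late scenario
```

      Under these assumptions, the lower rate of 2.4e-9 moves the split to roughly 10,000 years, in line with the alternative scenario described in the response, and a split at about 3,000 years would imply a rate of roughly 8e-9.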

      We have also expanded on the uncertainty of our analyses (see Discussion page 8, last paragraph): “First, the temporal resolution of our inferences is relatively low, and both previously published simulations (39) and our own bootstrap replicates (Figure 2B–D, grey lines) suggest relatively wide bounds for the precise timing of events.”

      3) The effect of selection: all analyses in this paper assume that no selection is at play, and the authors have excluded loci previously found to be under selection from these analyses, but how effective is this? In the ancestry tract length analysis, in particular, the authors have found that the human-specialist ancestry tends to concentrate in key genomic regions and suggested that selection could explain this, but doesn't this mean that excluding known loci under selection was insufficient? If the selection has indeed played an important role at a genome-wide level, how would it affect the main results (qualitatively)?

      We have clarified that we excluded those loci from our timing estimates for both MSMC and ancestry tract analyses, but then re-ran the ancestry tract analysis with all regions included to visualize and assess how tracts were distributed along chromosomes. See Methods, page 12, paragraph 2: “Since selection associated with adaptation to urban habitats could shape lengths of admixture tracts, we masked regions previously identified as under selection between human-specialists and generalists when estimating admixture timing—namely, the outlier regions in (2). However, we used an unmasked analysis to determine and visualize the genome-wide distribution of ancestries (Fig. 3).”

      We have also added additional discussion of the expected effects of selection on our analyses (see Discussion, page 9, last paragraph): “Positive selection during adaptive introgression can increase tract lengths and make admixture appear to be more recent than it actually is. For this reason, we masked regions of the genome thought to underlie adaptation to human habitats before running our analysis. Nevertheless, if selection has acted outside these regions, admixture may be somewhat older than we estimate.”

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors present evidence from studies of biopsies from human subjects and muscles from young and older mice that the enzyme glutathione peroxidase 4 (GPx4) is expressed at reduced levels in older organisms associated with elevated levels of lipid peroxides. A series of studies in mice established that genetic reduction of GPx4 and hindlimb unloading each elevated lipid peroxide levels and reduced muscle contractility in young animals. Overexpression of GPx4 or N-acetylcarnosine blocked atrophy and loss of force generating capacity resulting from hindlimb unloading in young mice. Cell culture experiments in C2C12 myotubes were used to develop evidence linking elevated lipid peroxide levels to atrophy using genetic and pharmacologic approaches. Links between autophagy and atrophy were suggested.

      Experiments on GPx4 expression levels, lipid peroxide levels, muscle mass and muscle force generating capacity were internally consistent and convincing. I thought the experiments supporting the view that autophagy contributed to atrophy were convincing. The hypothesis that altered lipidation of autophagy factors contributed was tested or supported in my view. Evidence for muscle atrophy in response to genetic or pharmacologic manipulations is a bit inconsistent throughout the paper, possibly because the small N of some experiments does not provide sufficient power to detect observed numeric differences in the means. The pattern of muscle fiber atrophy by fiber type is consistent throughout the paper but there is variability in which comparisons reached the threshold for significance, again, possibly because of the small N of the experiments. I agree with the authors that altered activity of enzymes in the contractile apparatus provides one explanation for the observed weakness but respectfully wish to point out there are others such as impaired excitation-contraction coupling which is well known to occur in aging.

      We thank Dr. Cardozo for taking the time to carefully review our manuscript and for providing enthusiastic feedback on the significance of our work. We are grateful for the additional suggestions and have modified our manuscript accordingly.

      Reviewer #2 (Public Review):

      This is a well-written paper that reports that the accumulation of LOOH with age and disuse contributes to the loss of skeletal muscle mass and strength. Moreover, the authors report that LOOH neutralization attenuates muscle atrophy and weakness. The mechanism via which LOOH contributes to these phenotypes remains unclear but seems to be mediated by the autophagy-lysosomal axis. In addition, the paper also reports the efficacy of N-acetylcarnosine treatment in ameliorating muscle atrophy in mice.

      We thank Reviewer 2 for their positive response to our manuscript. Very much appreciated! Please find our responses to your specific comments below.

      The authors should consider the following points to improve the manuscript:

      • The authors showed that inhibition of the autophagy-lysosome axis by ATG3 deletion or BafA1 was sufficient to reduce LOOH levels induced by GPx4 deletion, erastin, or RSL3. Moreover, they found that 4-HNE co-localizes with LAMP2. However, it remains unclear the precise mechanism via which LOOH contributes to muscle atrophy and how it is amplified by the autophagy-lysosomal axis. The authors could further test the functional interaction of 4-HNE with LAMP2 with additional experiments such as immunoprecipitation.

      Thank you for these comments. We agree with the reviewer that our observations on the autophagy-lysosomal axis are not yet backed by a tangible mechanism. To clarify, we show 4-HNE and LAMP2 colocalization only to demonstrate that they are in proximity to each other. We do not necessarily claim that LAMP2 is the protein that becomes 4-HNE-ylated. We are currently developing a proteomic platform to detect 4-HNE conjugations on peptides, which should hopefully shed light on the nature of the interaction between LOOH and the autophagy-lysosomal axis. We now include additional discussion of the autophagy-lysosomal axis and LOOH in lines 280-291.

      • A weak point of the paper is not having performed the experiments on 24-month-old-mice. At 20 months of age, the mice do not display any muscle wasting and myofiber atrophy compared to young mice that have completed postnatal muscle growth (=6-month-old-mice). It would be interesting to see the levels of 4-HNE in 24- or 30-month-old mice, and if N-acetylcarnosine treatment in older mice is able to rescue muscle atrophy induced by aging.

      This is a nuanced but very important point. We initially set out to study 24-month-old mice, but these mice did not tolerate the hindlimb unloading procedure well, so we ended up using 20-month-old mice instead. While mice at this age tolerated our HU procedure well, they did not manifest a significant reduction in muscle mass compared to young mice. We included additional discussion in lines 298-300 and 310-314. To address this point, we are currently performing a 6-month N-acetylcarnosine intervention in 24-month-old mice to examine the effect of this compound on aging (without HU) in multiple organ systems. We have thus completed 2 cohorts for this preclinical trial. Results on the effects of long-term N-acetylcarnosine treatment on muscle will be included in a separate manuscript.

      Previous studies have shown that inhibition of autophagy accelerates (rather than protects from) sarcopenia, and that autophagy is required to maintain muscle mass (Masiero 2009, PMID: 19945408; Castets 2013, PMID: 23602450; Carnio 2014, PMID: 25176656). On this basis, the authors should test whether their findings are valid only in the context of disuse atrophy or also in the context of sarcopenia (24-30-month-old mice).

      We agree with the reviewer that the relationship between autophagy and muscle mass is likely complex. In the current study, we only showed that a SHORT-TERM inhibition of autophagy by ATG3 deletion prevents muscle atrophy induced by a SHORT-TERM disuse intervention. Long-term inhibition of the autophagic machinery will likely be detrimental and, as shown in the references provided by the reviewer, accelerates sarcopenia. We now include this discussion in lines 280-287. We respectfully request that the experiments in 24-30-month-old ATG3-MKO mice be considered beyond the scope of the study. As discussed above, there is much more to study regarding the nature of the interaction between the autophagy-lysosomal axis and LOOH.

      • In Fig.2 the authors report that GPx4 KD, erastin, and RSL3 reduce the diameter of myotubes. For how long and when was the treatment done? Looking at the images, it seems that there are some myoblasts in the cultures treated with GPx4 KD, erastin, and RSL3. Is it possible that these compounds reduce myotube size by inhibiting myoblast fusion rather than by inducing myotube atrophy?

      Thank you for pointing this out. We now provide further details in the Methods section (lines 439-443). For KD experiments, we treat myoblasts with virus simultaneously with differentiation, due to lower infection efficiency in myotubes. This is certainly a caveat. However, erastin and RSL3 experiments were done on fully differentiated myotubes. It is common to have non-differentiated myoblasts under differentiated myotubes.

      • MDA quantification was done in the gastrocnemius although all the experiments in this paper were performed in the soleus and EDL. It would be good if the authors could explain the reason for this.

      MDA and 4-HNE WB were done on gastroc for all mouse models because some soleus and EDL muscles are below 7 mg and provided insufficient material to perform the MDA or 4-HNE assays. Soleus and EDL were used for contractile experiments (gastroc cannot be used for this experiment) and for histological analyses.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Jigo et al. measured the entire contrast sensitivity function and manipulated eccentricity and stimulus size to assess changes in contrast sensitivity and acuity for different eccentricities and polar angles. They found that CSFs decreased with eccentricity, but to a lesser extent after M scaling, while compensating for striate-cortical magnification around the polar angle of the visual field did not equate contrast sensitivity.

      In this article, the authors used classic psychophysical tests and a simple experimental design to answer the question of whether cortical magnification underlies polar angle asymmetries of contrast sensitivity. Contrast sensitivity is considered to be the most fundamental aspect of spatial vision and is important for both normal individuals and clinical patients in ophthalmology. The parametric contrast sensitivity model and the extraction of key CSF attributes help to compare the effect of M scaling at different angles. This work can provide a new reference for the study of normal and abnormal spatial vision.

      The conclusions of this paper are mostly well supported by data, but some aspects of data collection and analysis need to be clarified and extended.

      1) In addition to the key CSF attributes used in this paper, the area under the CSF curve is a common, global parameter to figure out how contrast sensitivity changes under different conditions. An analysis of the area under the CSF curve is recommended.

      – We have added the area under the CSF (AULCSF) [lines 305-319, Fig 5 E-F; lines 339-343, Fig 6 E-F]. Differences for non-magnified and magnified stimuli are not eliminated.
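
      For readers unfamiliar with the measure, the sketch below illustrates one common way an area under the log CSF can be computed: trapezoidal integration of log10 sensitivity over log10 spatial frequency. It is a minimal illustration, not the analysis code used in the manuscript; the function name `aulcsf` and all numerical values are hypothetical.

```python
import math

def aulcsf(spatial_freqs, sensitivities):
    """Trapezoidal area under log10(sensitivity) versus log10(spatial frequency).

    Assumes both lists are the same length, sorted by increasing spatial
    frequency, and contain strictly positive values.
    """
    xs = [math.log10(f) for f in spatial_freqs]
    ys = [math.log10(s) for s in sensitivities]
    area = 0.0
    for i in range(1, len(xs)):
        # Mean height of the two neighbouring points times the log-frequency step.
        area += 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1])
    return area

# Hypothetical values for illustration only (not data from the study).
example_sf = [0.5, 1.0, 2.0, 4.0, 8.0]        # cycles per degree
example_cs = [10.0, 100.0, 100.0, 10.0, 1.0]  # contrast sensitivity (1 / threshold)
example_area = aulcsf(example_sf, example_cs)
```

      Because the area summarizes the whole curve in a single number, it can be compared across locations or between fixed-size and magnified stimuli without committing to one specific CSF attribute.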

      2) In Figure 2, CRFs are given for several SFs, but were the CRFs at the cutoff SF well fitted? The authors should have provided the CRF results and corresponding fits to make their results more solid.

      – As reported in Fig 4A,C,E, the group data fits were very high (≥.98).

      3) The authors suggested that the apparent decrease in HVA extent at high SF may be due to the lower cutoff-SF of the perifoveal VM. Analysis of the correlation between the change in HVA and cutoff SF after M scaling may help to draw more comprehensive conclusions.

      – We have rephrased our explanation [lines 453-460]. As per your suggestion, we correlated the change in HVA with the cutoff SF after M scaling and found these correlations to be non-significant.

      4) In Figure 6, it would be desirable to add panels of exact values of HVA and VMA effects for key CSF attributes at different eccentricities, as shown in Figures 4B, D, and F, to make the results more intuitive.

      – We have added these panels [FIG 6] and the corresponding analysis in the text [lines 321-343]

      5) More discussions are needed to interpret the results. 1) Due to the different testing distances in VM and HM, their retinae will be in a different adaptation state, making any comparison between VM and HM tricky. The author should have added a discussion on this issue.

      – Note that the mean luminance of the display (from retina to monitor) was 23 cd/m2 at 57cm and 19 cd/m2 at 115 cm. The pupil size difference for these two conditions is relatively small (< 0.5 mm) and should not significantly affect contrast sensitivity (Rahimi-Nasrabadi et al., 2021) [lines 483-491]. Moreover, the differences we get here are consistent with the asymmetries we (e.g., Carrasco, Talgar & Cameron, 2001; Cameron, Tai & Carrasco, 2002; Fuller, Park & Carrasco, 2009; Abrams, Nizam & Carrasco, 2012; Corbett & Carrasco, 2012; Himmelberg, Winawer & Carrasco, 2020) and many others (e.g., Baldwin et al., 2012; Pointer & Hess, 1989; Regan and Beverley, 1983; Rijsdijk et al., 1980; Robson and Graham, 1981; Rosén et al., 2014; Silva et al., 2008) have observed for contrast sensitivity when the vertical and horizontal meridian are tested simultaneously at the same distance.

      6) In Figure 4, the HVA extent appears to change after M-scaling, although the analysis shows that M-scaling only affects the HVA extent at high SF. In contrast, the range of VMA was almost unchanged. The authors could have discussed more how the HVA and VMA effects behave differently after M-scaling.

      – We had commented on this pattern and have further clarified it [lines 436-451]

      7) The results in Figure 4 also show that at 11.3 cpd, the measurement may be inaccurate. This might lead to an inaccurate estimate of the M scaling effect at 11.3 cpd. The authors should discuss this issue more.

      – We have explained why this data point is at chance [FIG 4 caption]

      8) The different neural image-processing capabilities among locations, which is referred to as the "Qualitative hypothesis", is the main hypothesis explaining the differences around the polar angle of the visual field. To help the reader better understand this concept, the author should provide further discussions.

      – We have expanded the discussion of the qualitative hypothesis of differences in polar angle (lines 86-92; lines 476-481).

      9) The authors should also provide more details about their measures. For example, high grayscale is crucial in contrast sensitivity measurements, and the authors should clarify whether the monitor was calibrated with high grayscale or only with 8-bit. Since the main experiment was measuring CS at different locations, it should also be clarified whether the global uniformity of the display was calibrated.

      – The monitor was calibrated with 8-bit at the center of the display [line 607].

      – Regarding global uniformity, although we only calibrated at the center of the display, please note that the asymmetries are not due to the particular monitor we used. We have obtained these asymmetries in contrast sensitivity in numerous studies using multiple monitors over 20 years (e.g., Carrasco, Talgar & Cameron, 2001; Cameron, Tai & Carrasco, 2002; Fuller, Park & Carrasco, 2009; Abrams, Nizam & Carrasco, 2012; Corbett & Carrasco, 2012; Hanning et al., 2022a; Himmelberg et al., 2020) and other groups have reported these visual asymmetries as well (Baldwin et al., 2012; Pointer and Hess, 1989; Rosén et al., 2014). Also important, as we had mentioned in the Introduction [lines 55-59], the HVA and VMA asymmetries shift in-line with egocentric referents, corresponding to the retinal location of the stimulus, not with the allocentric location (Corbett & Carrasco, 2011).

      10) In addition, their method of data analysis relies on parametric contrast sensitivity model fitting. One of the concerns is whether there are enough trials for each SF to measure the threshold. The authors should have included in their method the number of trials corresponding to each SF in each CSF curve.

      – We have specified number of trials [lines 637-644]

      Reviewer #2 (Public Review):

      This is an interesting manuscript that explores the hypothesis that inhomogeneities in visual sensitivity across the visual field are not solely driven by cortical magnification factors. Specifically, they examine the possibility that polar angle asymmetries are subserved by differences not necessarily related to the neural density of representation. Indeed, when stimuli were cortically magnified, pure eccentricity-related differences were minimized, whereas applying that same cortical magnification factor had less of an effect on mitigating polar angle visual field anisotropies. The authors interpret this as evidence for qualitatively distinct neural underpinnings. The question is interesting, the manuscript is well written, and the methods are well executed.

      1) The crux of the manuscript appears to lean heavily on M-scaling constants, to determine how much to magnify the stimuli. While this does appear to do a modest job compensating for eccentricity effects across some spatial frequencies within their subject pool, it of course isn't perfect. But what I am concerned about is the degree to which the M-scaling that is then done to adjust for presumed cortical magnification across meridians is precise enough to rely on entirely to test their hypothesis. That is, do the authors know whether the measures of cortical magnification across a polar angle that are used to magnify these stimuli are as reliable across subjects as they tend to be for eccentricity alone? If not, then to what degree can we trust the M-scaled manipulation here? In an ideal world, the authors could have empirically measured cortical surface area for their participants, using a combination of retinotopy and surface-based measures, and precisely compensated for cortical magnification, per subject. It would be helpful if the authors better unpacked the stability across subjects for their cortical magnification regime across polar angles.

      –– We note that the equations by Rovamo and Virsu are commonly used to cortically magnify stimulus size. This paper has many citations, and the conclusions of many studies are based on those calculations [lines 115-128].

      –– In response to Reviewer 3’s comment, “In lieu of carrying out new measurements, it could also suffice to compare individual cortical magnification factors to the performance to quantify the contribution to the psychophysical performance”, we found a significant correlation between the surface area and contrast sensitivity measures at the horizontal, upper-vertical and lower-vertical meridians. However, we found no significant correlation between the cortical surface area and the difference in contrast sensitivity for fixed-size and magnified stimuli at 6 deg at each meridian. These findings suggest that surface area plays a role but that individual magnification is unlikely to equalize contrast sensitivity [lines 366-380; Fig 7; lines 511-529].

      2) Related to this previous point, the description of the cortical magnification component of the methods, which is quite important, could be expanded on a bit more, or even placed in the body of the main text, given its importance. Incidentally, it was difficult to figure out what the references were in the Methods because they were indexed using a numbering system (formatted for perhaps a different journal), so I could only make best guesses as to what was being referred to in the Methods. This was particularly relevant for model assumptions and motivation.

      –– We now detail M-scaling in the Introduction [lines 115-135], and we have fixed the references in the Methods section.

      3) Another methodological aspect of the study that was unclear was how the fitting worked. The authors do a commendably thorough job incorporating numerous candidate CSF models. However, my read on the methods description of the fitting procedure was that each participant was fitted with all the models, and the best model was then used to test the various anisotropy models afterwards. What was the motivation for letting each individual have their own qualitatively distinct CSF model? That seems rather unusual.

      Related to this, while the peak of the CSF is nicely sampled, there was a lack of much data in the cutoff at higher spatial frequencies, which at least in the single subject data that was shown made the cutoff frequency measure seem like it would be unreliable. Did the authors find that to be an issue in fitting the data?

      –– We have further clarified that we fit all 9 models to the grouped data [lines 177-178] and in Methods [lines 693, 716, 725], and that the fit in Figure 3 corresponds to the grouped data [Fig 3 caption]. As reported in Fig 4A,C,E, the group data fits were very high (≥.98). Please note that the cutoff spatial frequency is reliable. The data point (11.3 cpd) in the differences which does not follow the same function (Fig 4D,F) reflects the fact that for both magnified and not-magnified stimuli, performance was at chance, consistent with the fact that high SF are harder to discriminate at peripheral locations [Fig 4 caption].

      4) The manuscript concludes that cortical magnification is insufficient to explain the polar angle inhomogeneities in perceptual sensitivity. However, there is little discussion of what the authors believe may actually underlie these effects then. It would be productive if they could offer some possible explanation.

      –– We have expanded the discussion of qualitative hypothesis of differences in polar angle [lines 86-92; lines 476-481].

      –– We have expanded the discussion of possible mechanisms [lines 496-529].

      –– We have explained why having assessed the VM and HM at different distances does not significantly influence our measures [lines 483-491].

      –– We have expanded the discussion of how the HVA and VMA effects behave differently after M-scaling [lines 435-450].

      –– We have clarified that the fits are reliable and made explicit that the highest SF data point is at chance in both conditions [FIG 4 caption].

      Reviewer #3 (Public Review):

      Jigo, Tavdy & Carrasco used visual psychophysics to measure contrast sensitivity functions across the visual field, varying not only the distance from fixation (eccentricity) but also the angular position (meridian). Both parameters have been shown to affect visual sensitivity: spatial visual acuities generally fall off with eccentricity, it is now widely accepted that it is superior along the horizontal than the vertical meridian, and there may also be differences between the upper and lower visual field, although this anisotropy is typically less pronounced. The eccentricity-dependent decrease in performance is thought to be due to reduced cortical magnification in peripheral compared to central vision; that is, the amount of brain tissue devoted to mapping a fixed amount of visual space. The authors, therefore, include a crucial experimental condition in which they scale the size of their stimuli to account for reduced cortical magnification. They find that while this corrects for reduced performance related to stimulus eccentricity, it does not fully explain the variation in performance at different visual field meridians. They argue that this suggests other neural mechanisms than cortical magnification alone underlie this intra-individual variability in visual perception.

      The experiments are done to an extremely high technical standard, the analysis is sound, and the writing is very clear. The main weakness is that as it stands the argument against cortical magnification as the factor driving this meridional variability in visual performance is not entirely convincing. The scaling of stimulus size is based on estimates in previous studies. There are two issues with this: First, these studies are all quite old and therefore used methods that cannot be considered state-of-the-art anymore. In turn, the estimates of cortical magnification may be a poor approximation of actual differences in cortical magnification between meridians.

      –– We note that the equations by Rovamo and Virsu are commonly used to cortically magnify stimulus size. This paper has many citations, and the conclusions of many studies are based on those calculations [lines 115-128].

      –– In response to Reviewer 3’s comment, “In lieu of carrying out new measurements, it could also suffice to compare individual cortical magnification factors to the performance to quantify the contribution to the psychophysical performance”, we found a significant correlation between the surface area and contrast sensitivity measures at the horizontal, upper-vertical and lower-vertical meridians. However, we found no significant correlation between the cortical surface area and the difference in contrast sensitivity for fixed-size and magnified stimuli at 6 deg at each meridian. These findings suggest that surface area plays a role but that individual magnification is unlikely to equalize contrast sensitivity [lines 366-380; Fig 7; lines 511-529].

      Second, we now know that this intra-individual variability is rather idiosyncratic (and there could be a wider discussion of previous literature on this topic). Since these meridional differences, especially between upper and lower hemifields, are relatively weak compared to the variance, a scaling factor based on previous data may simply not adequately correct these differences. In fact, the difference in scaling used for the upper and lower vertical meridian is minute, 7.7 vs 7.68 degrees of visual angle, respectively. This raises the question of whether such a small difference could really have affected performance.

      That said, there have been reports of meridional differences in the spatial selectivity of the human visual cortex (Moutsiana et al., 2016; Silva et al., 2017) that may not correspond one-to-one with cortical magnification. This could be a neural substrate for the differences reported here. This possibility could also be tested with their already existing neurophysiological data. Or perhaps, there could be as-yet undiscovered differences in the visual system, e.g., in terms of the distribution of cells between the ventral and dorsal retina. As such, the data shown here are undoubtedly significant and these possibilities are worth considering. If the authors can address this critique either by additional experiments, analyses, or by an explanation of why this cannot account for their results, this would strengthen their current claims; alternatively, the findings would underline the importance of these idiosyncrasies in the visual cortex.

      We now include discussion of the different points that the reviewer raised here in our new section 'What mechanism might underlie perceptual polar angle asymmetries' [lines 497-530].

    1. Author Response

      Reviewer #1 (Public Review):

      • The statistical procedures used are not completely described and may not be appropriate.

      We revised the text in Methods and Results sections to give more details about the methods used.

      • As only two levels of delay were tested, it is not possible to directly test whether the subjective discounting function is hyperbolic or exponential and hence whether the delay is encoded subjectively or objectively.

      We agree with the reviewer. A higher number of task parameters may offer a better resolution to evaluate the discounting functions. Fortunately, this does not affect our main results.

      • The task has several variable interval lengths (hold in: 1.2-2.8 s, short delay: 1.8-2.3 s, long delay: 3.5-4s) that frustrate interpretation. The distribution of these delays is not described, for example as it reads it seems possible that some long delay rewards are delivered with shorter latency between cue and reward than some short delay rewards (1.2 + 3.5 = 4.7s vs. 2.8+2.3 = 5.1 s).

      We revised the text to address that ambiguity. In the new version of the manuscript, we describe short versus long delays considering the total delay intervals between instruction cue onset and reward delivery [short delay (3.5-5.6s) and long delay (5.2-7.3s)]. Within each delay category, individual delays were distributed in a gaussian fashion such that the two delay ranges overlapped for 9% of trials. These details are now described in the revised Methods section (pg. 22).
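
      The reviewer's arithmetic on the originally described intervals can be checked with a short sketch. It assumes, as the reviewer did, that the cue-to-reward latency is simply the hold interval plus the delay interval; the function names are hypothetical, and the sketch concerns only the original interval description, not the revised ranges or the Gaussian sampling described above.

```python
def total_range(hold, delay):
    """Range of summed latencies when two independent intervals are added."""
    return (hold[0] + delay[0], hold[1] + delay[1])

def overlap(a, b):
    """Return the overlapping part of two (low, high) ranges, or None."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low < high else None

# Interval bounds in seconds, as quoted in the reviewer's comment.
HOLD = (1.2, 2.8)
SHORT_DELAY = (1.8, 2.3)
LONG_DELAY = (3.5, 4.0)

short_total = total_range(HOLD, SHORT_DELAY)   # about 3.0 to 5.1 s
long_total = total_range(HOLD, LONG_DELAY)     # about 4.7 to 6.8 s
shared = overlap(short_total, long_total)      # about 4.7 to 5.1 s
```

      Under that reading, the two categories share a window of roughly 0.4 s, which is the ambiguity the reviewer pointed out.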

      • The authors have not considered that if delay value is being encoded, then that value, both objectively and subjectively, may be changing as the delay elapses. The variation of these task intervals may have an effect on the value of delay.

      In the present study, we report a dynamic integration between the desirability of the expected reward and the imposed delay to reward delivery across the waiting period. Our results (e.g. see Fig. 6) do not fit with simple linear (or logarithmic) effects corresponding to continuous regular changes as the delay elapses. We found different types of interactions (Discounting± and Compounding±) at different periods of the hold period and in different single units. We did not find a way to model all these types of interactions with this type of approach.

      Reviewer #2 (Public Review):

      • Plots of "rejection rate" (trials where the monkeys failed to wait until the rewards) as a function of delay and reward size seem to indicate that the monkeys understood the visual cue. The rejection rates were very low (less than 4% for almost all conditions) which indicates that the monkeys did not have a hard time inhibiting their behavior. It also meant that the authors could not compare trials where the monkeys successfully waited with trials where they failed to wait. This missing comparison weakens the link between the neurophysiological observations and the conclusions the authors made about the signals they observed.

      Here, our main goal was to describe the dynamic STN signals engaged during the waiting period without studying action-related activities. In the discussion (pg. 20), we clearly wrote ‘Further research is needed to determine whether the neural signals identified here causally drive animals’ behavior or rather just participate to reflect or evaluate the current situation.’ Consequently, our conclusions were already tempered by that point.

      In addition, we address the same limitation by writing (pg. 20): “An important avenue for future research will be to determine how STN signals, such as those described here, change when animals run out of patience and finally decide to stop waiting. To do this, however, smaller reward sizes and longer delays might be used to promote more escape behaviors during the delay interval.”

      • The authors examined the STN activity aligned to the start of the delay and also aligned to the reward. Most of the "delay encoding" in the STN activity was observed near the end of the waiting period. The trouble with the analysis is that a neuron that responded with exactly the same response on short and long trials could appear to be modulated by delay. This is easiest to see with a diagram, but it should be easy to imagine a neural response that quickly rose at the time of instruction and then decayed slowly over the course of 2 seconds. For long trials, the neuron's activity would have returned to baseline, but for short trials, the activity would still be above baseline. As such, it is not clear how much the STN neurons were truly modulated by delay.

      We agree with the reviewers. Our original analyses using two time windows had the potential to introduce biases in the detection of neuronal activities modulated by the delay. To overcome this issue, we modified the time frame of all of our analyses (neuronal activity, eye position, EMG). Now, the revised version of the manuscript only reports activities across a single time window aligned to the time of instruction cue delivery (i.e., -1 to 3.5s relative to instruction cue onset). This time frame corresponds to the minimum possible interval between instruction cues and reward delivery. We have revised all of the figures and we re-calculated all of the statistics using that one analysis window. Despite these major modifications, our key findings were not changed substantially. We found the same pattern in STN activities, with a strong encoding of reward (48% of neurons) preceding a late encoding of delay (39% of neurons). We also updated the text in Methods and Results sections to reflect the revised analyses.

      • Another concern is the presence of eye movement variables in the regressions that determine whether a neuron is reward or delay encoding. If the task variables modulated eye movements (which would not be surprising) and if the STN activity also modulated eye movements, then, even if task variables did not directly modulate STN activity, the regression would indicate that it did. This is commonly known as "collider bias". This is, unfortunately, a common flaw in neuroscience papers.

      Because the presence of eye variables did not influence how neurons were selected by the GLM, we do not think it likely that our analysis was susceptible to “collider bias”. Nonetheless, to control for that possibility directly, we have now repeated the GLM analyses with eye movement variables excluded. Results are shown in a new figure (Fig.4 – supplementary 1). Exclusion of eye parameters produced results that are very similar to those from the GLM that included eye parameters (differences <3 degrees). We have added text to the manuscript describing this added control analysis.
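
      The reviewer's concern can be illustrated with a toy simulation. This is a sketch under strong assumptions, not the GLM used in the manuscript: a task variable and a neural signal are generated independently, an eye variable is constructed to depend on both, and the neural signal is then regressed on the task variable with and without the eye variable as a covariate. All variable names, variances, and the linear model are hypothetical.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def _cov(xs, ys):
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def slope_one(y, x):
    """Ordinary least squares slope of y on a single predictor x."""
    return _cov(x, y) / _cov(x, x)

def slopes_two(y, x1, x2):
    """Ordinary least squares slopes of y on two predictors (with intercept)."""
    s11, s22, s12 = _cov(x1, x1), _cov(x2, x2), _cov(x1, x2)
    s1y, s2y = _cov(x1, y), _cov(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate(seed=0, n=5000):
    rng = random.Random(seed)
    task = [rng.gauss(0.0, 1.0) for _ in range(n)]
    neural = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent of task by construction
    # The eye variable is the collider: it is driven by both task and neural activity.
    eye = [t + s + rng.gauss(0.0, 0.5) for t, s in zip(task, neural)]
    b_task_alone = slope_one(neural, task)
    b_task_with_eye, b_eye = slopes_two(neural, task, eye)
    return b_task_alone, b_task_with_eye, b_eye

b_task_alone, b_task_with_eye, b_eye = simulate()
```

      With these assumed variances, the task slope is near zero when the eye variable is left out, whereas including it yields a clearly non-zero task slope (the population value is -0.8) even though the task never influenced the simulated neural signal. Comparing fits with and without the suspected collider, as in the control analysis described above, is one way to check whether results are sensitive to this effect.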

    1. Author Response

      Reviewer #2 (Public Review):

      The work integrated genomic and transcriptomic data to reconstruct the origin of the svPDE gene from the ancestral ENPP3 gene. The authors also analyzed the expression of svPDE along different snake lineages and different tissues in three species of venomous snakes. Finally, they purified an svPDE from the venom of Naja atra and analyzed its crystallographic structure and enzymatic function. The experiments are adequately designed and carefully planned and the conclusions made by the authors are well supported by evidence.

      I have the following suggestions:

      1) I could not find a section where the authors provided information regarding the origin of the analyzed venom and tissues. i.e. muscle tissue from Naja atra and venom for purification of svPDE. It is important to include this information.

      We thank the reviewer for mentioning this.

      The information for the venom purification has been described in Results (LINE 116) as “This svPDE was directly purified from the crude venom of Naja atra captured in Taiwan”. The information for the tissues of sequencing data has been included in Results (LINE 117) as “… with publicly available RNA-Seq data and compared them with the corresponding genomes available in the NCBI Assembly database (SI Appendix, Table S1)”, and Material and Methods (Line 403) as “DNA was extracted from the muscle tissue of a male Naja atra …”.

      Also, the SI Appendix Table S1 summarized all samples used for sequence analysis with their tissue origins.

      We are still grateful for this comment and have updated the text to make it clearer as follows:

      “The target genomes included the draft one of Naja atra sequenced from a muscle tissue (ongoing internal project, see Material and Methods for detail) and the complete one of its sister species, Naja naja, from the public data (Suryamohan et al., 2020).”

      We have also updated the text where the comparative genomic and transcriptomic analysis is first mentioned to indicate where this information is described.

      “To test our hypothesis, we comprehensively de novo assembled transcriptomes from the species across 13 clades of Toxicofera (Fig. 1B) with publicly available RNA-Seq data and compared them with the corresponding genomes available in the NCBI Assembly database (see SI Appendix, Table S1 for sample details).”

      2) The authors mention (Line 156) that "the genomic sequences of svPDE-E1a were present in all species of Serpentes but not in the species of Dactyloidae, Varanidae, and Typhlopidae.". As I understand it, the family Typhlopidae is included in the Suborder Serpentes. The conclusions stand of course, but I believe it is worth revising, for accuracy.

      We thank the reviewer for noticing this issue.

      We have updated the text as follows to avoid confusion:

      From “the genomic sequences of svPDE-E1a were present in all species of Serpentes but not in the species of Dactyloidae, Varanidae, and Typhlopidae. This suggests an early emergence of svPDE-E1a in the common ancestor of Serpentes and became …”

      To

      “the genomic sequences of svPDE-E1a were present in all species of Serpentes except for the earliest diverged Typhlopidae. This suggest an early emergence of svPDE-E1a in the Serpentes evolution and became …”

      3) During the discussion (Line 315), it is stated that the expression of svPDE in Lamprophiidae is probably associated with the adaptation of prey selection as a dietary generalist compared to Viperidae and Elapidae. Provided that both of these clades have several species considered dietary generalists, I believe this statement is not strongly supported.

      We agree with the reviewer’s comment that we overstated this point without solid support. However, we believe it is worth mentioning, as a hint for future studies, that Lamprophiidae, a less-known clade, expresses svPDE at levels not lower than those of several species of Elapidae. Therefore, we have revised this paragraph to include the finding without further speculation.

      “Comparative transcriptomics is a powerful tool to reveal species-specific or tissue-specific novel transcripts, providing new insights for further studies. For example, the svPDE expression of Lamprophiidae, even higher than several species of Elapidae, indicates the worth of further study for this less-known clade to fill the knowledge gap.”

      4) Also in the discussion (Line 320), the authors mention that Colubridae is traditionally regarded as a non-venomous clade. This statement is far from accurate given that Colubridae is a very diverse clade and several species within it have been shown to be at least moderately venomous. Various species have been shown to produce secretions comparable to those of front-fanged snakes. Furthermore, despite their difference in morphology, I believe there is little to no evidence that suggests Duvernoy's glands in colubrids have any functions differing from the venom glands of front-fanged snakes.

      We thank the reviewer for this comment, which helped us revise the interpretation. This paragraph has been rewritten as follows:

      “Interestingly, svPDE is expressed in the Duvernoy’s glands of Colubridae, although at low levels, and several species within the diverse Colubridae clade have been shown to be moderately venomous. The expression of svPDE in the Duvernoy’s glands also highlights its potential function, even though Duvernoy’s glands differ morphologically from the venom glands of front-fanged snakes.”

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript "Interplay between PML NBs and HIRA for H3.3 dynamics following type I interferon stimulus" by Kleijwegt and colleagues describes a study that set out to explore the details of the PML-HIRA axis in H3.3 deposition at ISGs upon IFN-I stimulation. First, the authors establish that HIRA colocalized at PML NBs upon TNFa and TNFb treatment. This process is SUMO-dependent and facilitated by at least one of the identified SIM domains of HIRA. Next, the authors set out to determine whether interferon responsive genes (ISGs) are dependent on HIRA or PML. When either HIRA or PML was knocked down, an effect on ISGs was observed only upon PML knock-down. In fact, immuno-FISH showed that PML NBs are in close proximity of ISGs upon TNFb treatment. To address the histone chaperone function of HIRA, the deposition of the replication-independent H3.3 on ISGs is tested, specifically the enrichment of H3.3 across the ISG gene body. ChIP-seq data (Fig 5B) showed an enrichment around the TES, whereas qPCR (Fig 5A) showed less convincing enrichment (for details see below). When either HIRA or PML is knocked down, a mild loss of H3.3 enrichment was observed (Fig 5E). Interestingly, when HIRA is sequestered away from PML NBs by Sp100, an increased enrichment of H3.3 was observed. To understand the interplay between H3.3 deposition and HIRA's role in this process in the presence of PML NBs, H3.3 was overexpressed. Two populations of cells were observed, with low or high levels of H3.3. In the former, HIRA formed foci; in the latter, HIRA did not form foci. Surprisingly, when HIRA is overexpressed, PML NBs form in the absence of TNFb. Finally, a two-sided model is proposed, in which PML NBs are required for ISG transcription, promoting H3.3 loading. The second side is that PML NBs function as a "storage center" for HIRA to regulate its availability.

      Overall, the model is intriguing, but the data presented seem insufficient to support the current claims.

      We thank the reviewer for his/her constructive comments. We want to point out that there is a confusion in the reviewer's statement (highlighted in red above) between TNFb and IFNb, because it is IFNb that was mostly used in our study. We assume this is a typo. The sentence "when HIRA is overexpressed, PML NBs form in the absence of TNFb" is inaccurate. Indeed, PML NBs are present in our cells with or without IFNb treatment. Overexpression of HIRA triggers accumulation of the ectopic HIRA in the PML NBs in the absence of IFNb, probably as part of a buffering mechanism.

      Major concerns:

      • The suggested function of HIRA at the PML NBs as storage is interesting. Ideally, this would be tested by real-time single molecule tracking.

      While certainly interesting, we believe that real-time single-molecule tracking is beyond the scope of our article. In addition, under our hypothesis that PML NBs act as buffering places for HIRA, HIRA might move in and out of PML NBs depending on its concentration and/or the availability of free binding sites, so single-molecule tracking might not be informative about possible long-term storage functions of PML NBs.

      • The link between PML NBs containing HIRA and H3.3 deposition is very intriguing and indeed the ChIP-seq data shown in Figure 5B shows a clear increase in the H3.3 signal around the TES. This distribution is very intriguing as recent work (Fang et al 2018 Nat Comm) showed that H3.3 deposition across the gene body was diverse and dynamic. Ideally, the qPCR of some select ISGs would confirm the ChIP-seq data. Here a more complex picture emerges. Just as with the ChIP-seq, a modest decrease of H3.3 at the TSS was observed, but only in 2 of the 3 genes shown is H3.3 enriched at the TES and only in 1 gene (ISG54) is H3.3 enriched at the gene body. As qPCR is later used in the manuscript (Fig 5E and 5G), it is essential that the results of two different techniques give similar results. With regards to Fig 5E and 5G, it is unclear why certain gene regions are shown, but not others.

      We agree with the reviewer that the distribution of H3.3 on active genes follows a diverse and dynamic pattern. H3.3 is enriched on gene bodies, but several papers have shown an important increase of H3.3 loading on the TES region of actively transcribed genes (Tamura et al. 2009; Sarai et al. 2013). Our ChIP-qPCR data (Figure 6A) and our ChIP-Seq data (Figure 6B) are consistent and show a moderate increase of H3.3 on gene bodies, e.g., on the MX1 mid or ISG54 mid regions shown by qPCR in Figure 6A (this enrichment is reproducible but not necessarily statistically significant) and on gene bodies of the 48 core ISGs as shown in our ChIP-Seq data (see the light blue line between TSS and TES in Figure 6B). In addition, our ChIP-qPCR and ChIP-Seq data also consistently show a higher enrichment of H3.3 on the TES regions of ISGs (see the significant enrichment found by ChIP-qPCR in the TES regions of MX1, OAS1 and ISG54, as well as the strong increase in H3.3 deposition with IFN seen in the light blue line for the ChIP-Seq data in Figure 6B).

      Since the strongest enrichment for H3.3 was found on the TES region, we focused on this region to evaluate the impact of HIRA or PML knock-down. Our ChIP-Seq data (now added in main Figure 6F for the whole ISG region, or with a zoom on the TES region in Figure 6G) show that the strongest effect of HIRA or PML knock-down is indeed visible in the TES region of ISGs. Our ChIP-qPCR data presented in Figure 6E fully support this effect.

      Overall, the link between HIRA and PML in H3.3 loading is only mildly affected (Fig 5E and 5F). The conclusion that HIRA and PML are essential (Page 12, line 8) is not represented by the presented data. The authors propose that DAXX could play a role. Indeed, work on another H3 variant, CENP-A, showed that non-centromeric localization is dependent on both HIRA and DAXX (Nye et al 2018 PLoS ONE). It would be interesting to learn if a double knock-down of HIRA and DAXX can prevent the enrichment of H3.3 at TES of ISGs upon TNFb treatment.

      To address the first part of the comment, we have now added three things:

      (1) we have toned down our conclusion by saying that HIRA and PML are 'important' for the long-lasting deposition of H3.3 on ISGs,

      (2) we provide new data from time-ChIP qPCR experiments suggesting that HIRA is important for H3.3 recycling during transcription of ISGs. We believe that these results strengthen the importance of HIRA for the global H3.3 enrichment on ISGs (by acting in the de novo deposition and/or recycling of H3.3).

      We agree with the reviewer that it could be interesting to study the impact of the double knock-down of DAXX and HIRA on H3.3 enrichment at ISGs. However, we decided to focus our attention on SP100 since it could help us to better tease apart the role of HIRA localization in PML NBs versus its role in H3.3 deposition at ISGs. In addition, since SP100 knock-down unleashes ISG transcription, it also provided us with the opportunity to study the impact of elevated ISG transcription on H3.3 deposition and whether this is also mediated by HIRA.

      (3) we thus now also provide data of the double knock-down of SP100 and HIRA showing that the increase in H3.3 loading on ISGs seen upon SP100 knock-down is mediated by HIRA. This new result also strengthens the importance of HIRA for H3.3 enrichment on ISGs upon transcription.

      • In Figure 6B, two versions of HIRA are overexpressed and the authors conclude that the number of PML NBs goes up. Earlier in the manuscript, the authors showed that PML NB formation upon IFNb exposure brings HIRA into the PML NBs via a SUMO-dependent mechanism. Is overexpression of HIRA and its accumulation in PML NBs also SUMO-dependent or SUMO-independent? Overexpressing the SIM mutants from Figure 3F would address this question. In addition, the link between the proposed HIRA being stored at PML NBs could be strengthened by overexpressing HIRA and see at both short and late time points whether H3.3 is enriched on ISG genes.

      We want to clarify the first point: we do not conclude that the number of PML NBs goes up upon overexpression of HIRA. The number of PML NBs seems stable, although we have not quantified it. The aim of Figure 4A (previously Figure 6B) is to show that upon overexpression, ectopic forms of HIRA localize in PML NBs without IFN-I treatment, as part of a buffering mechanism.

      The SIM mutant of HIRA from Figure 3F is indeed overexpressed and does not localize in PML NBs upon IFN-I treatment. We have now added an IF (Figure 3- figure supplement 1C) showing that it does not localize in PML NBs in non-treated cells either. Thus, accumulation of ectopic HIRA in PML NBs is SUMO-SIM-dependent regardless of the IFN-I treatment.

      • BJ cells are known to senesce rather easily. Did the authors double-check what fraction of their cells were in senescence and whether this correlated with the high or low expression of ectopic H3.3?

      BJ cells can indeed enter into senescence, but they are less prone to senesce than other human primary cells such as IMR90. Nevertheless, we checked EdU incorporation both in BJ cells (Figure 1 - Figure supplement 1F) and in BJ eH3.3i cells with expression of ectopic H3.3, with or without IFN-I treatment (Figure R2 for reviewer). We could clearly see that in our conditions (Dox addition for 24h maximum, IFNb at 1000U/mL for 24h), there is no significant difference in the number of EdU+ cells (i.e., proliferating cells), thus excluding effects due to senescence entry. As a positive control, we treated BJ cells with etoposide, a known senescence-inducing drug (Kosar et al., 2013; Tasdemir et al., 2016), which indeed reduces the number of EdU-positive cells. We have now added a sentence in the main text as well to underscore that cells are not senescent.

      • In Figure 6 - figure supplement D, it appears that the levels of HIRA go up upon TSA and IFNb treatment. Rather than relying on visual inspection, ideally, all Western blots should be quantified to confirm the assessment that protein levels are not affected by different experimental procedures.

      We now provide quantification of all WBs below each WB. In addition, we have removed data on TSA since it could appear too preliminary.

      Reviewer #2 (Public Review):

      The HIRA chaperone complex has been previously shown to localize at PML Nuclear Bodies upon various stresses or stimuli (senescence, viral infections, interferon treatment). The authors have previously unraveled an anti-viral role of PML NBs through the chromatinization of the HSV-1 viral genome by H3.3 chaperones. Here, the authors identify SUMOylation, as well as a SIM-like sequence in HIRA, as drivers for HIRA recruitment at PML Nuclear Bodies upon interferon-I treatment. These HIRA-containing PML NBs localize close to interferon-stimulated gene (ISG) loci. Although the functional role of the HIRA/PML interaction is not yet resolved, HIRA and PML regulate H3.3 loading at transcriptional end sites of ISGs upon Interferon-I treatment. The authors propose that PML NBs play a buffering role for HIRA, regulating its availability depending on H3.3 level or chromatin dynamics.

      Strength:

      The authors used primary human diploid BJ fibroblasts, a relevant cell line for studying physiological regulation upon inflammatory cytokines. The role of SUMO/SIM on HIRA localization upon interferon beta treatment was assessed using interesting - but already described - tools, such as SUMO-specific affimers. The authors provide convincing results on the requirement of PML SUMOylation and a putative SIM sequence on HIRA for its localization at PML Nuclear Bodies. Other interesting observations are described, such as some PML or HIRA-dependent long-lasting H3.3 loading at transcription end site of ISGs upon interferon beta treatment, as shown by ChIP analyses of ISG loci, but also by endogenous H3.3 ChIPseq analysis.

      Weakness:

      The authors claim HIRA partitioning at PML NBs correlates with an increase in "PML valency" upon interferon-I. The "valency" refers to the number of interaction domains, but the number of SUMOs conjugated on PML is not explored here (nor the number of SIMs on HIRA). Although the authors have proposed interesting hypotheses and discussion, the inhibitory role of H3.3 overexpression or acetylation inhibition on HIRA localization at PML Nuclear Bodies clearly deserves further investigation.

      More generally, the manuscript explores many directions, but the links between the various observations remain unclear and limit firm conclusions.

      We thank the reviewer for his/her constructive comments.

      We have now addressed these 3 weaknesses pointed out by the reviewer.

      • Our claims on PML valency have been removed. We have now underscored the link between HIRA accumulation in PML NBs and the increase in PML and SP100 protein levels, without dwelling on the valency aspects, which were not the focus of our paper.

      • The role of H3.3 overexpression in inhibiting HIRA localization in PML NBs has been moved to the first part of the paper, which describes the mechanism of HIRA accumulation in PML NBs. We feel that these data are still of importance and support the role of PML NBs as a buffering place for HIRA depending on DAXX levels (new data) as well as H3.3 levels.

      We agree that the acetylation inhibition would deserve further investigations and we have thus removed the part on TSA treatment.

      • Thanks to the reviewer's comments, we have now restructured the article to better convey two main conclusions: (1) PML NBs serve as a buffering site for HIRA. Accumulation of HIRA in PML NBs depends both on PML and SP100 concentration (and on PML SUMOylation) as well as on DAXX and H3.3 levels, and (2) upon IFN-I treatment, PML regulates ISG transcription and thus indirectly regulates HIRA loading on ISGs, which controls H3.3 deposition and recycling during transcription. HIRA-mediated H3.3 deposition/recycling is highly dependent on ISG transcription levels and is thus increased upon SP100 knock-down, which unleashes ISG transcription.
    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript provides the first cellular analysis of how neuronal activity in axons (in this case the optic nerve) regulates the diameter of nearby blood vessels and hence the energy supply to neuronal axons and their associated cells. This is an important subject because, in a variety of neurological disorders, there is damage to the white matter that may result from a lack of sufficient energy supply, and this paper will stimulate work on this important subject.

      Axonal spiking is suggested to release glutamate which activates NMDA receptors on myelin-making oligodendrocytes wrapped around the axons: the oligodendrocytes - either directly or indirectly via astrocytes - then generate prostaglandin E2 which relaxes pericytes on capillaries, thus decreasing the resistance of the vascular bed and (presumably) increasing blood flow in the nerve.

      Strengths of the paper

      The paper identifies some important characteristics of axon-vascular coupling, notably its slow temporal development and long-lasting nature, the involvement of PgE2 in an oxygen-dependent manner, and a role for NMDARs. Rigorous criteria (constriction and dilation of capillaries by pharmacological agents) are used to select functioning pericytes for analysis.

      Weaknesses of the paper

      The study focuses exclusively on pericytes. It would have been interesting to assess whether arteriolar SMCs also contribute to regulating blood flow.

      We thank reviewer #1 for his/her positive comments on our manuscript. We share the interest in studying the optic nerve’s arteriole in the future (there is only one main arteriole covered by SMCs). However, it is not always visible in the preparation because of the orientation of the nerve: if the arteriole is not on the surface and directly under the microscope, it cannot be imaged.

      Reviewer #2 (Public Review):

      This paper describes a new concept of "axo-vascular coupling" whereby action potential traffic along white matter axons induces vasodilation in the mouse optic nerve. This is an initial report dissecting some of the mechanisms that are undoubtedly complex as in gray matter NVC. I like the novel AVC concept.

      We really appreciate the reviewer’s positive comments.

    1. Author Response

      Reviewer #2 (Public Review):

      Understanding the molecular mechanism of obesity-associated OA is in high clinical demand. Overall, the current study is well designed and illustrates that down-regulated GAS6 impairs synovial macrophage efferocytosis and promotes obesity-associated osteoarthritis. Based on the patient samples, the data indicated that synovial tissues are highly hyperplastic in obese OA patients and infiltrated with more polarized M1 macrophages than in non-obese OA patients. Further, the authors proved that obesity promotes synovial M1 macrophage accumulation and that GAS6 was inhibited in synovitis during OA development in mouse models. The sample size, data collection, and quality of the IHC and immunofluorescent histological sections are outstanding. The results were well presented with appropriate interpretation. But the following major questions should be addressed.

      Major:

      1) Animal model: Ten-week-old animals received DMM surgery and were fed a standard/HFD diet for 4 or 8 weeks prior to specimen harvest. Wang J and others have shown that, when male ApoE(-/-) and C57BL/6J wild-type (WT) mice were fed a high-fat diet for 12 or 24 weeks, the ApoE(-/-) mice gained less body weight and had less fat mass and lower triglyceride levels, with better insulin sensitivity and lower levels of inflammatory markers in skeletal muscle, than WT (Wang J, et al. Atherosclerosis. 2012 Aug;223(2):342-9. PMID: 22770993; Hofmann SM, et al. Diabetes. 2008 Jan;57(1):5-12. PMID: 17914034; Kypreos KE et al. J Biomed Res. 2017 Nov 1;32(3):183-90. PMID: 29770778). Thus, it is very important to provide the data on the final body weight gained in your groups and to provide relevant background on the chosen animal model in the introduction or discussion. Please explain why the ApoE-/- mouse model was chosen and how this animal model is clinically relevant. Is a high-fat diet-induced obese OA model available in C57BL/6 WT?

      Thank you for your valuable comment. We have added the body weight change data for each group of mice in Revised Figure 2-figure supplement 3. We have also provided relevant background on the animal model in paragraph 2 of the Discussion section, which reads, “ApoE plays an important role in maintaining the normal levels of cholesterol and triglycerides in serum by transporting lipids in the blood. Mice lacking ApoE function develop hypercholesterolemia, increased very low-density lipoprotein (VLDL) and decreased high-density lipoprotein (HDL), exhibiting chronic inflammation in vascular disease and nonalcoholic steatohepatitis.”

      Epidemiological studies suggest that obesity is an independent risk factor for the pathological progression of OA. Gierman et al. found that increased plasma cholesterol levels play a vital role in the development of OA [1,2]. ApoE-deficient mice showed naturally high levels of LDL cholesterol independent of gender and age, which could additionally be increased by a cholesterol-rich diet [3,4]. Moreover, recent studies found that ApoE-/- mice fed an HFD gained more body weight than those fed a standard chow diet [5–7]. We have re-analyzed the body weight statistics and found that ApoE-/- mice fed an HFD (19.81±1.33g) gained more body weight than the controls (16.89±0.75g). These studies indicate that feeding an HFD to ApoE-/- mice for a short period can accelerate the increase in LDL cholesterol levels and cause more body weight gain. ApoE-/- mice may be partially clinically relevant to pathological progression in obese osteoarthritis patients with elevated plasma LDL cholesterol levels. Regarding the question raised by Reviewer #2, HFD-induced obesity can also be established in C57BL/6 WT mice according to our weight gain data. However, the effect of obesity on OA progression in these two kinds of animals deserves further study.

      References:

      1. Gierman LM, Kühnast S, Koudijs A, et al. Osteoarthritis development is induced by increased dietary cholesterol and can be inhibited by atorvastatin in APOE*3Leiden.CETP mice—a translational model for atherosclerosis. Ann Rheum Dis. 2014;73(5):921-927.

      2. Gierman LM, van der Ham F, Koudijs A, et al. Metabolic stress-induced inflammation plays a major role in the development of osteoarthritis in mice. Arthritis Rheum. 2012;64(4):1172-1181.

      3. Wu D, Sharan C, Yang H, et al. Apolipoprotein E-deficient lipoproteins induce foam cell formation by downregulation of lysosomal hydrolases in macrophages. J Lipid Res. 2007;48(12):2571-2578.

      4. Naura AS, Hans CP, Zerfaoui M, et al. High-fat diet induces lung remodeling in ApoE-deficient mice: an association with an increase in circulatory and lung inflammatory factors. Lab Invest. 2009;89(11):1243-1251.

      5. Tung MC, Lan YW, Li HH, et al. Kefir peptides alleviate high-fat diet-induced atherosclerosis by attenuating macrophage accumulation and oxidative stress in ApoE knockout mice. Sci Rep. 2020;10(1):8802.

      6. Bao M hua, Luo H qing, Chen L hua, et al. Impact of high fat diet on long non-coding RNAs and messenger RNAs expression in the aortas of ApoE(−/−) mice. Sci Rep. 2016;6(1):34161.

      7. Cao X, Guo Y, Wang Y, et al. Effects of high-fat diet and Apoe deficiency on retinal structure and function in mice. Sci Rep. 2020;10(1):18601.

      2) Control group: The DMM surgery was performed on the right leg, and the contralateral knee joint should be used as a baseline to show the level of M1 macrophage infiltration under the obese microenvironment.

      Thank you for this insightful comment. We used the right lower limb as the control in our experiment mainly because we considered the impact of right knee surgery on the left lower limb. A book chapter published in 2014 described a series of methods for inducing osteoarthritis in mice; its authors noted that sham-operated left knee joints would develop OA-like symptoms after right knee joints received DMM. Lorenz et al. therefore strongly recommend using a separate control group for sham surgeries.

      References:

      1. Lorenz, J., Grässel, S. (2014). Experimental Osteoarthritis Models in Mice. In: Singh, S., Coppola, V. (eds) Mouse Genetics. Methods in Molecular Biology, vol 1194. Humana Press, New York, NY.
    1. Author Response

      Reviewer #1 (Public Review):

      The goal of this study was to investigate the mechanisms that lead to the release of photosynthetically fixed carbon from symbiotic dinoflagellate algae to their coral host. The experimental approach involved culturing free-living Breviolum sp. dinoflagellates under "Normal" and "Low pH" conditions (respective target pH of 7.8 and 5.5) and measuring the following parameters: (Fig. 1) cell growth rate over ~28 days, photosynthetic activity, glucose and galactose secretion at day 1; (Fig. 2) cell clustering, external morphology (using SEM), and internal morphology (using TEM) after 3 weeks; (Fig. 3) transcriptomic analyses at days 0 and 1; and (Fig. 4) glucose and galactose concentration in Normal culturing medium after 24h incubation with a putative cellulase inhibitor (PSG).

      The paper reports decreased growth at Low pH coupled with decreased photosynthetic rates and increased glucose and galactose release in 1-day Breviolum sp. cultures. At this same time point, genes related to cellulase were upregulated, and after 3 weeks morphological changes on the cell wall were reported. The addition of the cellulase inhibitor PSG to cells in pH 7.8 media decreased the release of glucose and galactose.

      The paper concludes that acidic conditions mimicking those reported for the coral symbiosome -the intracellular organelle that hosts the symbiotic algae- upregulate algal cellulases, which in turn degrade the algal cell wall releasing glucose and galactose that can be used as a source of food by the coral host. However, there are some methodological issues that hamper the interpretation of results and conclusions.

      We appreciate your helpful comments and apologize for the confusion caused by insufficient descriptions in the previous manuscript. In the revised manuscript, we clarify what we originally intended to demonstrate, including the following:

      (1) Most analyses including SEM and TEM were done at day 0 and 1, except for a few, i.e. growth rate over 28 days and cell clumping assay done 3 weeks after the inoculation, which is summarized as a schematic panel and clarified in the revised manuscript.

      (2) The inhibitor assay for secreted cellulases was done at pH 5.5.

      (3) We do not intend to suggest that low pH medium mimics symbiosomes, as these organelles are far more complex than simple culture media, and how symbiosomes are maintained and what their interior environment is like are not fully understood in general. Based on previous studies, they are presumably characterized by low pH, high CO2, and host-derived nutrients. Among these, we focus on low pH, a stressor that dinoflagellates encounter not only in symbiosomes but also in natural environments, e.g., the animal gut.

      In this study, we clarified how algae respond to low pH as an environmental stressor, which can also provide insights into how they interact with the host inside the guts as well as symbiosomes.

      Reviewer #2 (Public Review):

      Ishii and colleagues investigated the process of monosaccharide release from algae in low-pH environmental conditions, mimicking the acidic lysosomal-like intracellular compartment where the algae reside symbiotically and transfer nutrients to their hosts, namely corals and other animals. Upon exposure of cultured algae to low pH, subsequent physiological changes as well as the increased presence of glucose and galactose were measured in the surrounding media. Concurrently, photosynthetic activity was decreased, and further experiments employing the photosynthetic inhibitor DCMU to cultures also replicated the increased monosaccharide release. Transcriptomic comparison of algae in low pH to controls showed differential expression in glycolytic pathways and, interestingly, a strong upregulation of signal-peptide-containing isoforms of cellulases. Finally, the elegant use of a cellulase inhibitor on the cultured algae revealed a decrease in monosaccharides in the media. This led the authors to propose a pathway of sugar release in which acidic conditions trigger a cellulase-driven cascade of cell wall degradation in the algae and their consequent release of monosaccharides. These results have interesting implications on the molecular mechanisms of coral-algae symbiosis, contributing to the understanding of how these important symbioses function on the cellular level.

      Overall the conclusions of this manuscript are supported by the data presented, but clarification and elaboration are needed to fully justify the proposed mechanisms and better situate the results in a broader context of the field.

      We thank the reviewer for the positive comments. In the revised manuscript, we show that the results can be better explained by the proposed mechanisms in a broader context.

    1. Author Response

      Reviewer #2 (Public Review):

      1) Mechanistic details of how FCA regulates FLC have been extensively studied, and both transcriptional and co-transcriptional regulation occur. I understand that FCA affects the 3' end processing of antisense COOLAIR RNAs, which regulate FLC. FCA also physically interacts with COOLAIR RNAs and other proteins, including chromatin-modifying complexes, which establish epigenetic repression of FLC regardless of vernalisation. In addition, FCA appears to function to resolve R-loops at the 3' end of FLC, and FLC preferentially interacts with m6A-modified COOLAIR by forming liquid condensates. FCA is also alternatively spliced in an autoregulatory manner, and the fca-1 mutant was reported to be a null allele, as fca-1 cannot produce the functional form of FCA transcripts (r-form).

      However, I could not find any information on the fca-3 allele, which was reported to exhibit a weaker phenotype in terms of flowering time (Koornneef et al., 1991). In this manuscript, the authors showed that the level of FLC expression is lower than fca-1 and higher than Ler WT, but I could not find any other relevant information on the nature of the fca-3 allele. Given the known details on the function of FCA, the authors should explain how fca-3 shows an "intermediate" phenotype, which is highly relevant to the argument for an "analog" mode of regulation in fca-3. Therefore, the nature of the fca-3 mutant should be described in detail.

      We thank the reviewers for pointing out this omission. We have added much more information on the genotypes in the methods of the manuscript. We emphasise, however, that the rationale for selecting fca-3 as an intermediate mutant was empirical: namely, it generates an intermediate level of FLC expression (Fig. 1C and Fig. 1S1).

      2) The authors used a transgene (FLC-venus) in which an FLC fragment from ColFRI was used. Both fca-1 and fca-3 are in the Ler background, where FLC sequence variations are known. I understand that the authors introgressed the transgene into the Ler background to avoid the transgene effect, but it is not known whether the fca-1 or fca-3 mutations have the same impact on Col-FLC.

      We tested the expression of both endogenous (Ler) and FLC-Venus (Col-FLC) copies in these mutants by qPCR and found similar results (Fig. 1S1C,D), indicating that the fca-1 and fca-3 mutations have similar effects in both cases.

      3) Fig. 3A: I understand that Fig 3A is the qRT-PCR data using whole seedlings, and the gradual reduction of FLC from 7 DAG to 21 DAG was used to test the "analog" vs. "digital" mode of gene regulation in fca-1 and fca-3. I am not sure whether this is biologically relevant.

      Indeed, Ler is the only line that has transitioned to flowering during the experiment, with both fca lines being late flowering mutants. We totally agree that for Ler, later timepoints may be biologically irrelevant. It is used in this case as a negative control for the imaging, since FLC in Ler was already mostly OFF from the first timepoint and no biological conclusions are drawn from the later times. We have added a comment to this effect in the results section, also clarifying in the discussion that our focus is on the early regulation of FLC. Therefore, by looking at the young seedling in wildtype Ler, as we and others have previously, we are already looking too late to capture the switching of FLC to OFF. However, we expect that this combination of analog and digital regulation will be highly relevant to FLC regulation in wild-type plants in different accessions, partly leading to the differences in autumn FLC levels that were shown to be so important in the wild (Hepworth et al. 2020).

      3-a) The authors wrote that "This experiment revealed a decreasing trend in fca-3 and Ler (Fig. 3A)". But, I do also see a "decreasing trend" in fca-1 as well (although I understand that they may not be statistically significant). I also noticed that the level of FLC in fca-1 at 7 day has a greater variation. Is there any explanation?

      The level of FLC in fca-1 at 7 days is indeed more variable in these experiments. However, in a new second experiment, this is not the case (Fig. 3S2). In addition, a similar effect has not been observed in the ColFRI genotype (Fig. S9F of Antoniou-Kourounioti et al. 2018). Therefore, we believe this greater variation in one data set may simply be due to random fluctuations.

      For the decreasing trend in fca-1 in Fig. 3A, as the reviewer says, this is not significant. However, in the second experiment, we again see a decrease, which is now slow but significant. The decrease could be due to a subset of fca-1 ON cells switching off (in tissue that we have not imaged) and we comment on this slow decrease in the text.

      3-b) The decreasing trend observed in Ler (although the expression of FLC is already relatively low in Ler) may be the basis for the biological relevance. But Fig. 3D shows that the FLC-venus intensity in Ler root is not "decreasing". The authors interpreted that "root tip cells in Ler could switch off early, while ON cells still remain at the whole plant level that continue to switch off, thereby explaining the decrease in the qPCR experiment." Does this mean that the root tip system with FLC-venus cannot recapitulate other parts of plants (especially at the shoot tip where FLC function is more relevant)?

      The authors utilize the root system with transgenes in mutant backgrounds to observe and model the gene repression (transgene repression, to be exact). If the root tip cells behave differently from other parts of plants, how could the authors use data obtained from the root tip system?

      We now show that FLC-Venus in Ler, fca-3 and fca-1 has similar expression patterns in young leaves and in roots, thus validating the root system as an appropriate one to study the switching dynamics; see response to Essential comment 3. Nevertheless, in Fig. 3A, we show that FLC expression declines even in Ler. However, the levels here are low, so if it is indeed a subfraction of late-switching cells that is responsible, these cells cannot form a large proportion of the plant. We now make this clear in the text.

      4) I do see both fca-1 and fca-3 can express FCA at a comparable level (Fig. 3B); thus, I guess that the authors are measuring total FCA transcripts and that fca-3 may result in different levels of "functional form" of FCA. But this is not clearly discussed.

      We have now added yellow boxes in Fig. 2S3 to show additional examples of short files of ON cells in fca-3 and fca-4. To further improve the interpretation of this image (and all others in the manuscript) we have changed the presentation of the imaging using a different colourmap to enhance clarity.

      5) Quantification based on image intensity needs to be carefully controlled. Ideally, a threshold to call "ON" or "OFF" state should be based on the comparison to internal control and it is not clear to me how the authors determined which cells are ON or OFF based on image intensity (especially in fca-3).

      For the wild-type and fca-1 situations there is no switching in the model, and hence no dynamical changes in the FLC protein levels. As the FLC levels in the ON or OFF states are simply fit to the data using log-normal distributions, this would simply be a fitting exercise for fca-1 and Ler, and little would be learnt. Hence, we have not pursued this line of analysis.

      6) In many parts, I had to guess how the experiments were performed with what kind of tissues/samples. The methods section can benefit from a more thorough description.

      We have now gone through and added the missing information.

      Related to Public review #2. What is the phenotype (flowering time) of FLC-venus in fca-1 and fca-3? In addition, how many independent lines were used? Do they behave similarly?

      It was observed that with the additional FLC gene (in the form of the FLC-Venus), flowering is delayed as expected. However, this was not quantified in this work. Instead, we validated that the expression of the transgene was equivalent to that of the endogenous gene across genotypes, as shown in Fig. 1S1, supporting that this is an appropriate readout for FLC expression. One line for each genotype was selected and used in this work. In addition, we also now use fca-4, which has similar expression to fca-3, and where FLC-Venus also behaves similarly to the fca-3 case (Fig. 1S1, 2S3).

      Reviewer #3 (Public Review):

      1) The way the authors define ON and OFF cells sounds a bit arbitrary to me and, in my understanding, can affect a lot the outcomes and derived conclusions. The authors define ON cells to those cells having more than one transcript, or when they are above the value of 0.5 of the Venus intensity measure - what would it happen if the thresholds are slightly above these levels? And why such thresholds should be the same for the studied lines Ler, fca-3 and fca-1? By looking at the distributions of mRNAs and Venus intensities in Ler and fca-3 plants, one could argue that all cells are in an OFF, 'silent' state, and that what is measured is some 'leakage', noise or simply cell heterogeneity in the expression levels. If there is a digital regulation, I would expect to see this bimodality more clearly at some point, as it was captured in Berry et al (2015) - perhaps cells in fca-1 show at a certain level of bimodality? When seeing bimodality, one could separate ON and OFF states by unmixing gaussians, or something in these lines that makes the definition less arbitrary and more robust.

      As explained in Essential comment 5, we have removed arbitrary thresholding from the manuscript and only used absolute thresholds from smFISH (now changed to >3, and shown that our results are robust to varying these thresholds, Fig. 2S2). If all cells are in the OFF state and fca-3 just has higher noise/heterogeneity, then this does not explain the reduction in expression over time. Nor can such heterogeneity explain the short files of ON cells and longer files of OFF cells in Fig. 2S3: the cells should just be a random mix of varying FLC levels. Our results are much more compatible with switching into a heritable silenced state. Finally, with bimodality, this is difficult to see as clearly as before due to the wide levels of expression in fca-3, but we believe it is present: a well-defined OFF state together with a broad ON state. This broadness makes extracting the ON cells quite difficult as a completely rigorous unmixing of the two states is just not possible.
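
      The kind of threshold robustness check described above can be illustrated with a minimal Python sketch. It assumes that per-cell smFISH transcript counts are available as a list of integers. The function names `fraction_on` and `threshold_scan`, the example counts, and the range of thresholds are hypothetical and are not taken from the manuscript.

```python
from typing import Dict, Iterable, List


def fraction_on(counts: Iterable[int], threshold: int) -> float:
    """Return the fraction of cells whose transcript count is strictly above `threshold`."""
    cells: List[int] = list(counts)
    if not cells:
        raise ValueError("no cells supplied")
    return sum(c > threshold for c in cells) / len(cells)


def threshold_scan(counts: Iterable[int], thresholds: Iterable[int]) -> Dict[int, float]:
    """Compute the ON fraction for each candidate threshold."""
    cells = list(counts)
    return {t: fraction_on(cells, t) for t in thresholds}


# Hypothetical per-cell transcript counts (illustrative only, not real data).
example_counts = [0, 0, 1, 0, 2, 12, 15, 0, 1, 9, 0, 20]

scan = threshold_scan(example_counts, thresholds=[1, 2, 3, 4, 5])
for t, frac in scan.items():
    print(f"ON if count > {t}: fraction ON = {frac:.2f}")
```

      If the ON fraction barely changes across neighbouring thresholds, as in this toy example for thresholds of 2 and above, the classification is insensitive to the exact cut-off chosen.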

      2) The authors use means in all their plots for histograms and data, and perform tests that rely on these means. However, many of these plots are skewed right distributions, meaning that mean is not a good measure of center. I think using median would be more appropriate, and statistical tests should be rather done on medians instead of means. If tests using medians were performed, I believe that some of the pointed results will be less significant, and this will affect the conclusions of this work.
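
      The reviewer's point about right-skewed data can be illustrated with a short Python sketch. The intensity values below are hypothetical and chosen only to show how a single high-expressing cell pulls the mean away from the median; they are not data from the manuscript.

```python
import statistics

# Hypothetical per-cell intensities with one high outlier (right-skewed).
intensities = [1, 1, 2, 2, 3, 4, 50]

mean_value = statistics.mean(intensities)      # sensitive to the long right tail
median_value = statistics.median(intensities)  # robust to the long right tail

print(f"mean = {mean_value}, median = {median_value}")
```

      Here most cells lie between 1 and 4, yet the mean is well above that range, whereas the median stays within it.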

      Highly expressing FLC lines and mutants, such as ColFRI and fca-9, often used for vernalization studies, are late flowering, but do eventually flower even with no decrease in FLC levels (and so no switching). This is not an artifact of using roots versus shoots, and presumably arises from there being multiple inputs into the flowering decision which can allow the FLC-mediated flowering inhibition to eventually be overcome.

      3) Some data might require more repeats, together with its quantification. For instance, the expression levels for fca-1 in Fig 2E and Fig 3D at 7 days after sowing look qualitatively different to me - not just the mean looks different, but also the distribution; fca-1 in Fig 3D looks more monomodal, while in Fig 2E it looks it shows more a bimodal distribution. Having these two different behaviours in these two repeats indicates that, more ideally, three repeats might be needed, together with their quantification. Fig. 2C would also need some repeats. In Fig 1S1 C and D, it would be good to clarify in which cases there are 2 or more repeats -3 repeats might be needed for those cases in Fig 1S1 C-D that have large error bars.

      The data in Figs. 2C and 2E are both based on two independent experiments, with the results combined. The data in Fig. 3D is almost entirely based on three independent experiments. We have now stated this in the legend. The Venus imaging was performed on separate microscopes for Fig. 2 and Fig. 3 and this possibly accounts for some of the observed differences. However, we do not think that the data in Fig 2E for fca-1 supports a bimodal distribution: the slight peak at higher levels is, we believe, much more likely to be a statistical fluctuation. For Fig. 1S1 C and D, we now clarify in the legend that n=2 biological replicates for fca-3 and n=3 for others.

      Also, when doing the time courses, I find it would be very beneficial to capture an earlier time point for all the lines, to see whether it is easier to capture the digital nature of the regulation. Note that the authors have already pointed that 7 days after sowing might be too late for Ler line to capture the switch.

      We agree that capturing earlier time points for Ler in particular is interesting and important. However, we have found that this requires specialist imaging in the embryo and we feel that this is really beyond the scope of this manuscript and will instead form the basis of a future publication.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use what is potentially a novel method for bootstrapping sequence data to evaluate the extent to which SARS-CoV-2 transmissions occurred between regions of the world, between France and other European countries, and between some distinct regions within France. Data from the first two waves of SARS-CoV-2 in Europe were considered, from 2020 into January 2021. The paper provides more detail about the specific spread of the virus around Europe, specifically within France, than other work in this area of which I am aware.

      First of all, we would like to thank reviewer #1 for their evaluation and their various comments which, in our opinion, have allowed us to considerably improve the manuscript.

      An interesting facet of the methodology used is the downsampling of sequence data, generating multiple bootstraps each of around 500-1000 sequences and conducting analysis on each one. This has the strength of sampling, in total, a large number of sequences, while reducing the overall computational cost of analysis on a database that contains in total several hundred thousand sequences. A question I had about the results concerns the extent of downsampling versus the rate of viral migration: If between-country movements are rapid, a reduced sample could be misleading, for example characterising a transmission path from A to B to C as being from A to C by virtue of missing data. I acknowledge that this would be a problem with any phylogeographic analysis relying on limited data. However, in this case, how does the rate of migration between locations compare to the length of time between samples in the reduced trees? Along these lines, I was unclear to what extent the reported proportions of intra- versus inter-regional transmissions (e.g. line 223) would be vulnerable to sampling effects.

      This question is indeed a very important one. The between-country movement rate can be high, but the contagious period for a SARS-CoV-2-infected individual is short (a bit less than two weeks on average). In our subsamples, the dated trees have a median branch length of around 20 days. To ensure that our subsamples did not introduce errors in estimating the exchange events between locations, we conducted a simulation. Briefly, we generated a tree of 1,000,000 tips with a five-state discrete trait. We then took 100 subsampled 1000-leaf trees, reconstructed the ancestry for the discrete trait and assessed transitions between states. The error rate is less than 3% on average: it comprises the missing data, as you pointed out, and the errors in reconstructing the ancestry for the trait deeper in the tree.

      We think that overall, less than 3% is a satisfying error rate.

      The results of this specific simulation were added to the paper (lines 150-157) and as Figure 2—figure supplement 1.

      A further question around the methodology was the use of an artificially high fixed clock rate in the phylogenetic analysis so as to date the tree in an unbiased way. Although I understood that the stated action led to the required results, given the time available for review I was unable to figure out why this should be so. Is this an artefact of under-sampling, or of approximations made in the phylogenetic inference? Is this a well-known phenomenon in phylogenetic inference?

      We thank reviewer #1, who, like reviewer #2 and the editor, was concerned by the use of an artificially fast and fixed molecular clock. It was a workaround for a mistake in our code, which has since been fixed. See the answer to point (3) of the editor.

      The value of this kind of research is highlighted in the paper, in that genomic data can be used to assess and guide public health measures (line 64). This work elucidates several facts about the geographical spread of SARS-CoV-2 within France and between European countries. The more clearly these facts can be translated into improved or more considered public health action, through the evaluation of previous policy actions, or through the explication of how future actions could lead to improved outcomes, the more this work will have a profound and ongoing impact.

      This is a very interesting point to emphasize indeed. We are currently discussing with public health specialists in our institution how to assess past public health actions using phylodynamics data in a statistically valid manner.

      Reviewer #2 (Public Review):

      This study represents an important contribution to our understanding of SARS-CoV-2 transmission dynamics in France, Europe and globally during the early pandemic in 2020 and the authors should be congratulated for tackling this important question. Through evaluation of the contributions of intra- and inter-regional transmission at global, continental, and domestic levels, the authors provided compelling, although as of yet correlative and incomplete, evidence towards how international travel restrictions reduced inter-regional transmission while permitting increased transmission intra-regionally. Unfortunately, however this work suffers from a number of serious analytical shortcomings, all of which can be overcome in a major revision and re-analysis.

      We would like to thank reviewer #2 for their evaluation and their various comments. We want to point out that reviewer #2 was contacted for advice on strategy for the molecular clock since she performed a study on a similar topic describing SARS-CoV-2 epidemics in Canada during 2020. We strongly believe that all of reviewer #2's comments contributed substantially to improving the quality of this work.

      With this genomic epidemiology analysis, the authors disentangled the relative contributions of different geographic levels to transmission events in France and in Europe in the first two COVID-19 waves of 2020. By partitioning the analysis into three complementary, but distinct, geographic levels, the migration flows in and out of continents, countries in Europe, and regions in France were inferred using maximum likelihood ancestral state reconstruction. The major strengths of this paper were the inclusion of multiple geographic levels, the comparison of different rate symmetries in the ancestral character estimation, and the comprehensive qualitative descriptions of comparisons over time and geographies. However, there were also major weaknesses that need to be addressed and are described in more detail below. They include summing across replicates that were drawn with replacement and were not independent; inadequate justification for excluding underrepresented geographies; the assertion that positive correlation between intra-regional transmission and deaths validates the accuracy of the analysis; considering the framework the authors have chosen for this analysis the analysis would accommodate and benefit strongly from increasing the size of the sequence sets selected for analysis in each replicate; and the sparsity of quantitative (over qualitative or exploratory) comparisons and statistics in the reporting of results. In particular, it would greatly strengthen the paper if the authors could better evaluate the effect of travel restrictions on importations and exportations by testing hypotheses, quantifying changes in the presence of restrictions, or estimating inflection points in importation rates.

      We are grateful for this comprehensive listing of the strengths and weaknesses of our study. The limitations of this study are addressed specifically in our response to each of the reviewer's remarks. We would like to emphasize that all the remarks and limitations reported here by reviewer #2 are in our opinion fully justified. We have therefore added further analyses (study of the Pango lineages, averaging of the subsamples, simulation study to justify the size of the sampling), modified the methodology (in particular concerning the molecular clock) and thoroughly rewritten the “Results” section.

      General comments on the Background: Need to elaborate on how this study fits into the big picture in the first paragraph. Should discuss how phylodynamics contributes to understanding of viral outbreaks, SARS-CoV-2 epidemiology and viral evolution.

      We have added in the “Introduction” section some elements to better understand why phylodynamics is an important field in the epidemiology of SARS-CoV-2 and its evolution.

      The authors should consider a hypothesis driven framework for their analyses, for example considering the geographically central position of France what hypotheses stem from this considering sources of viral importations and destinations of exportations from/to Europe vs other international? Or other a priori expectations.

      We agree with reviewer #2 about this remark. Indeed, given the central position of France, we can hypothesize that it has strongly participated in the dissemination of the virus within Europe. This hypothesis has been included in the "Introduction" section of the revised version (lines 102-105).

      To address the computational limits of phylogenetic reconstruction, 100 replicates of fewer than 1000 sequences each were sampled for each epidemic wave at each level. The inter- and intra-regional transmissions were averaged and then summed across replicates in order to compare the relative roles played by each geography towards transmission. While we see the logic in using the sum across replicates, this is highly likely to bias results, especially since in the methods, this is described as sampling with replacement between replicates (LX). The validity of summing replicates needs to be discussed and are likely most appropriately presented as mean or median. Also, these samples are quite small considering the computational capacity of the maximum likelihood tools being used. We recommend repeating the analysis with a substantially larger number of sequences per sample.

      We thank reviewer #2 for this relevant remark. We initially summed the subsamples, a strategy that may bias the results. In the new version of the manuscript, we averaged the subsamples by region and by week as recommended (and stated in the methods, lines 536-537).
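
      The difference between summing and averaging across subsamples can be sketched as follows. This is a minimal Python illustration under stated assumptions: each replicate is represented as a mapping from a hypothetical (region, week) key to an inferred number of transmission events, and the function name `average_replicates` and the example values are invented for illustration rather than taken from the manuscript's pipeline.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical key: (region name, week number).
Key = Tuple[str, int]


def average_replicates(replicates: List[Dict[Key, float]]) -> Dict[Key, float]:
    """Mean count per (region, week) over all replicates.

    A key that is absent from a replicate is treated as a count of 0 there.
    """
    if not replicates:
        raise ValueError("at least one replicate is required")
    totals: Dict[Key, float] = defaultdict(float)
    for rep in replicates:
        for key, value in rep.items():
            totals[key] += value
    n = len(replicates)
    return {key: total / n for key, total in totals.items()}


# Three hypothetical replicates (illustrative numbers only).
replicates = [
    {("France", 12): 4, ("Europe", 12): 10},
    {("France", 12): 6, ("Europe", 12): 8},
    {("France", 12): 5},
]

means = average_replicates(replicates)
print(means)
```

      Unlike a sum, the per-key mean does not grow with the number of replicates, so results obtained with different numbers of subsamples remain comparable.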

      Regarding the size of our subsamples, it made no difference whether we used 1,000, 2,000 or 5,000 genomes in each subsample. To get a more definitive and scientifically sound answer, we performed a simulation study that has been included in the manuscript and is shown in what is now Figure 2 (and Figure 2—figure supplement 1). These simulations show that our subsampling strategy allows for an accurate estimate of transition rates for a discrete parameter (lines 107-160).

    1. Author Response

      Reviewer #1 (Public Review):

      The paper addresses an interesting question - how genetic changes in Y. pestis have led to phenotypic divergence from Y. pseudotuberculosis - and provides strong evidence that the frameshift mutation in rcsD is involved. Overall, I found the data to be clearly presented, and most of the conclusions well supported by the data. The authors convincingly show that (i) the frameshift mutation in rcsD alters the regulation of biofilm formation, (ii) this effect depends upon expression of a small protein that corresponds to the C-terminal portion of RcsD, and (iii) the frameshift mutation in rcsD prevents loss of the pgm locus. I felt that the discussion/conclusions about what phosphorylates/dephosphorylates RcsB and how this impacts biofilm formation are overstated, as there are no experiments that directly address this question. I also felt that the authors' model for what phosphorylates/dephosphorylates RcsB in Y. pestis should be more clearly articulated, even if it is only presented as speculation. Lastly, the authors propose that full-length RcsD is made in Y. pestis and contributes to phosphorylation of RcsB, but the evidence for this is weak (faint band in Figure 2d). It may be that the N-terminal domain of RcsD is functional. I recommend either softening this conclusion or testing this hypothesis further, e.g., by introducing an in-frame stop codon early in rcsD after the frame-shift.

      Thanks for your comments. We have provided a model and revised the discussion about phosphorylation/dephosphorylation of RcsB and how this impacts biofilm formation (Figure 8 and Supplementary Figure 4). In addition, we have introduced an in-frame stop codon in rcsD before the frameshift and showed that full-length RcsD is only made in wildtype Y. pestis but not in the rcsDpe-stop mutant (Supplementary Figure 1g).

      Reviewer #2 (Public Review):

      Guo et al. have investigated the consequences of a frameshift mutation in the rcsD gene in the Yersinia pseudotuberculosis progenitor that is conserved in modern Y. pestis strains. Interestingly, they identify a start codon with a ribosome binding site that enables production of an Hpt-domain protein from the C-terminus in Y. pestis. Targeted deletion of this Hpt-domain increased biofilm production in Y. pestis. They find that the ancestral RcsDpstb (full length) is a positive regulator of biofilm in Y. pestis while the Hpt-domain version (RcsDYP) represses biofilm in vitro. When fleas were infected with Y. pestis expressing the ancestral RcsDPSTB protein, there was no difference in bacterial survival or rate of proventricular blockage. This strain also killed mice at the same rate (in a different Y. pestis strain background). However, replacing RcsDYP with RcsYPTB dramatically increases the frequency of pgm locus deletion (containing Hms ECM and yersiniabactin genes) during flea infection. The authors predict that this would reduce the invasiveness of the bacteria in mammals and/or flea blockage in subsequent flea-rodent-flea transmission cycles. They also measured global gene expression differences between RcsDPSTB compared to the wild-type strain. They argue that the frameshift of RcsD maintaining the Hpt-domain (RcsDYP) was needed to regulate biofilm while limiting loss of the pgm locus.

      Loss of the pgm locus was not tested in the Y. pestis rcsD mutant strain (lacking the entire gene or just the C-terminal Hpt domain). Therefore, the claim that maintaining the Hpt-domain protein was important lacks convincing evidence. Additionally, it is possible that the population of rcsDpe::rcsDpstb after in vitro growth for 6 days would still be proficient at infecting and blocking fleas, even though many of the bacteria would have lost the pgm locus. Production of Hms polysaccharide by pgm+ could trans-complement those that are pgm-. The nature of the pgm locus loss is assumed to be due to recombination between IS elements. This is certainly the likeliest explanation but not the only one. The authors checked for pgm loss by phenotype (CR binding) and by two sets of primers, one targeting the hmsS gene and another set that is unspecified. Loss of the entire pgm (especially yersiniabactin genes) should be clarified.

      Thanks for your comments. We have now provided the data to show that deletion of RcsD-Hpt resulted in increased loss of the pgm locus (Figure 5d) to strengthen the claim that maintenance of the Hpt-domain is significant for retention of the pgm locus. We also agree that 6-day old cultures of a mixture of pgm+ and pgm- rcsDpe::rcsDpstb will still be capable of infecting and blocking fleas. However, these strains will be less efficient at causing disease in the vertebrate host in the absence of the pgm locus. We agree that recombination between IS elements might not be the only cause of loss of the pgm locus. To verify the loss of the pgm locus, we have used two sets of primers. One set targets the hmsS gene and another set targets the upstream and downstream sequences of the pgm locus (Supplementary Table 3). We have clarified this in the revised manuscript (Lines 610-613).

      Reviewer #3 (Public Review):

      The Rcs phosphorelay plays an important role in regulating gene expression in bacteria; most of the current knowledge about the Rcs proteins is from E. coli. Yersinia pestis, carrying mutations in two central components of the Rcs machinery, provides an interesting example of how evolution has shaped this system to fit the life cycle of this bacteria. In bacteria other than Y. pestis, most Rcs activating signals are sensed via the outer membrane lipoprotein RcsF; from there, signalling depends on inner membrane protein IgaA, a negative regulator of RcsD. Histidine kinase RcsC is the source of the phosphorylation cascade that goes from the histidine kinase domain of RcsC to the response regulator domain of RcsC, from there to the histidine phosphotransfer (Hpt) domain of RcsD, and finally to the response regulator RcsB. RcsB, alone or with other proteins, regulates transcription of many genes, both positively and negatively. These authors have previously shown that RcsA, a co-regulator that acts with RcsB at some promoters, is functional in Y. pseudotuberculosis but mutant in Y. pestis, and that this leads to increased biofilm in the flea. The authors also noted that rcsD in Y. pestis contains a frameshift after codon 642 in this 897 aa protein; in theory that should eliminate the Hpt domain from the expressed protein. However, they found evidence that the frame-shifted gene had a role in regulation. This paper investigates this in more depth, providing clear evidence for expression of the Hpt domain (without the N-terminal domain), and demonstrating a critical role for this domain in repressing biofilm formation. The Y. pseudotuberculosis RcsD does not express a detectable amount of the Hpt domain nor does it repress biofilm formation. The ability of the Hpt domain protein to keep biofilm formation low explains most of what is observed for the full-length frame-shifted protein.

      1) The authors provide a substantial amount of data supporting the expression of the C-terminus of RcsD is sufficient and necessary for low biofilm levels, and that this is dependent upon the active site His in the RcsD Hpt domain (H844A) as well as other components of the basic phosphorelay (RcsC and RcsB). However, it is only possible to see this protein by Western blot in 100-fold "Enriched" lysates (Figure 2). No small protein was detected in the RcsDpstb strain, although the enriched lysate was not shown for this. Without that experiment, it is not possible to evaluate whether the small protein is also made from the rcsDpstb gene. Either answer would be interesting, and would allow other conclusions to be drawn. Is the RBS and start codon the same for the HPT region of this rcsD gene (it could be added to Supplementary Table 6). If the small protein is made, is its ability to function blocked by the excess full length protein in terms of interactions with RcsC? Or is the expression of the small protein dependent upon loss of overlapping translation from the upstream start?

      The small Hpt protein may be produced from expression of the epitope tagged rcsDpstb gene as it can be detected in an enriched isolation of this sample (Supplementary Figure 1f). Because only a small amount of the RcsD-Hpt is produced from the rcsDpstb substitution, it might only function at low levels in the presence of large amounts of RcsDpstb. The RBS and start codon are the same for the RcsD-Hpt in Y. pestis and Y. pseudotuberculosis; we have added them to Supplementary Table 6. In addition, we have provided a model to show the function and regulation of RcsD and Hpt (Supplementary Figure 4).

      2) In many phosphorelays, the protein kinase also acts as a phosphatase, and which direction P flows is critical for regulation. It is often difficult to follow what the model for this is in this paper, and that is important to understand for evaluating the results. Most of this paper uses two assays, biofilm formation and crystal violet staining (also related to biofilm formation) to assess the functioning of the Rcs phosphorelay. Based on the behavior of the rcsB mutant, it would seem that functional Yersinia pestis Rcs (RcsDpe) represses this behavior, and this correlates with RcsB phosphorylation (Figure 4). What is the basis (Line 443-44) for saying that RcsD phosphorylates RcsB while RcsDHpt dephosphorylates? Yersinia pseudotuberculosis RcsD(pstb) shows no difference with the rcsB mutant. Doesn't that suggest that RcsDpstb is no longer repressing (phosphorylating)? In the presence of the RcsDpstb as well as multicopy RcsF, an activating signal in other organisms, RcsDpstb seems able to phosphorylate. This all suggests that the full-length protein, like the Hpt domain, is capable of phosphorylating, but that it may be doing nothing in the absence of signal (or dephosphorylating). Given these results, saying that RcsDpstb is positively regulating biofilm formation (Fig. 1 title, and elsewhere) is somewhat misleading. What it presumably does is prevent the Hpt domain, expressed from the chromosomal locus in Figure 1b, from signalling to RcsB. By itself, it is not clear it is doing anything. Understanding this clearly is important for interpreting this system and the tested mutants. A clear model and how phosphate is flowing in the various situations would help a lot. Currently Supplementary Figure 3 seems to reflect the appropriate directional arrows, but the text does not. Moving the rcsB data earlier in the paper (after Figure 1, 2, or maybe earlier, before Figure 3) would certainly help.

      RcsD dephosphorylates RcsB while RcsD-Hpt phosphorylates RcsB. Expression of RcsDpstb in the wild type strain and the N-term deletion mutant resulted in increased biofilm, indicating RcsB is less phosphorylated (Figure 1b and 1c). In contrast, over-expression of RcsD-Hpt resulted in decreased biofilm formation, indicating RcsB is more phosphorylated. In addition, the Phos-tag experiments showed that the RcsDpstb strain has a lower level of phosphorylated RcsB (Figure 4b). Expression of RcsDpstb in the wild type strain showed similar results to an rcsB mutant, indicating a lower level of phosphorylated RcsB in the presence of RcsDpstb.

      It is possible that RcsDpstb interferes with the ability of RcsD-Hpt to phosphorylate RcsB. However, plasmid expression of the rcsDpstb-H844A mutant in the Y. pestis rcsDN-term deletion mutant formed significantly less biofilm than wild type rcsDpstb, indicating H844 might be important for RcsD to dephosphorylate RcsB (Supplementary Figure 2b and Lines 180-183). In addition, it is known that RcsD plays a dual role in phosphorylation and dephosphorylation of RcsB in other organisms (Majdalani N, et al., 2005, J. Bacteriol., https://doi.org/10.1128/JB.187.19.6770-6778.2005; Wall EA, et al., 2020, PLoS Genetics, https://doi.org/10.1371/journal.pgen.1008610; Takeda S, et al., 2001, Mol. Microbiol., https://doi.org/10.1046/j.1365-2958.2001.02393.x). We therefore think it is safe to say that the full length RcsD might function to dephosphorylate RcsB. We have modified the model in the revised manuscript (Supplementary Figure 4 and Figure 8). Regulation of RcsB has been investigated previously. The main finding of our manuscript is regulation of RcsB by the mutated RcsD (RcsD-Hpt). Thus, we have moved the known rcsB deletion mutant data to Figure 1 in the revised manuscript as suggested. We kept the rest of the data in Figure 4 the same. We think it might be better to first show that the mutation of rcsD alters Rcs signaling and then show how this occurs (by affecting RcsB phosphorylation).

      3) The authors show (in their pull-down) that there is a bit of full-length RcsD even in the frame-shifted protein. Is there any clear evidence this does anything here? Does the N-terminus (truncated after the frame-shift) have a function?

      We have introduced a stop codon in rcsDpe and showed that full-length RcsD is made by rcsDpe but not by rcsDpe with the stop codon (Supplementary Figure 1g). RcsDN-term does not seem to have a function under our tested conditions (Figure 1e).

      4) While the RNA seq data is a useful addition here, it is difficult to interpret without a bit more data on the strain used for the RNA seq, including the biofilm phenotypes of the WT and mutant derivatives, as well as the relevant rcsD sequences, and maybe expression of a few genes or proteins (Hms or hmsT). Are these similar in the parallel strains used earlier in the paper and the one for RNA seq, in WT, rcsB- and the RcsDpstb derivative? It would appear that rcsB- and rcsDpstb have opposite effects, at least at 25°C, while in Figure 4, these two derivatives have similar effects on biofilm. Is this due to temperature, strains, or biofilm genes that are not shown here? It is certainly possible that the ability of the full-length RcsD changes its kinase/phosphatase balance as a function of temperature, or dependent on other differences in these Y. pestis strains.

      The strain used for RNA seq is a derivative of the biovar Microtus strain 201, which has an in vitro phenotype similar to that of strain KIM6+ (Lines 297-298). We used this strain for RNA seq because it has the virulence plasmid pCD1, which is required for virulence, and we wanted to analyze the gene expression of this plasmid as well. RNAseq data showed that rcsB- and rcsDpstb have opposite effects on the mRNA level of some genes. However, no significant change in expression of biofilm genes was noted in the RNAseq data set. In fact, our previous data have shown that the biofilm-related genes (hmsT and hmsD) are only moderately regulated by RcsB (less than 2-fold change between wild type and rcsB mutant) based on RT-PCR and β-gal analysis (Sun YC, et al., 2012, J. Bacteriol., https://doi.org/10.1128/JB.06243-11; Guo XP, et al., 2015, Sci. Rep., https://doi.org/10.1038/srep08412; and Figure 4c).

    1. Author Response

      Reviewer #1 (Public Review):

      Sex determination and dosage compensation are two fundamental mechanisms in organisms with distinct sexes. These mechanisms vary greatly across the various model organisms in which they have been studied. Comparisons across more closely related members of the same genus have already proven productive in the past, to understand how these essential mechanisms evolve. In this study, the authors compare some aspects of the dosage compensation and sex determination mechanisms across two Caenorhabditis species that diverged ~15-30 MYA.

      Previously, the authors have studied dosage compensation and sex determination extensively in C. elegans. Here, they first identify the homologs of some key factors in C. briggsae, a species that independently evolved hermaphroditism. The authors show that some of the key players in these processes play the same roles in C. briggsae as they do in C. elegans. Namely, they show that the nematode-specific SDC-2 protein plays a role in both dosage compensation and sex determination also in C. briggsae, they find the homologs of some of the SMC protein complex that performs dosage compensation also in C. elegans and they study the binding specificity on the X chromosome.

      Overall, the work is thorough and compelling and is very clearly presented. The authors generate a number of genetic tools in C. briggsae and the careful genetic analyses together with a number of binding assays in vivo and in vitro, support the authors' main conclusions: that the main players and genetic regulatory hierarchy are conserved between these two nematodes, but the binding sites for the DCC on the X chromosome have diverged and the mode of binding has changed as well. Whereas in C. elegans the DCC binds sites in the X chromosome that contain multiple sequence motifs in a synergistic manner, in briggsae they seem to do so additively. This latter point is supported by the data, but it could be explored a bit more deeply using the available ChIP-seq data that the authors have generated. In addition, it would be interesting to discuss the possible implications of this difference.

      One minor weakness of this work is that it could be better put in the context of other related comparisons of these mechanisms. For example, the comparison of sex determination pathway by Haag et al. in Genetics 2008, and the comparison of dosage compensation across Drosophila species (Ellison and Bachtrog, Plos Genetics, 2019), and possibly others. The other point that the authors could provide deeper insight into, is the rate of divergence of proteins like SDC-2 (which is thought to be the protein that contacts DNA), versus some other proteins in the DCC and in general other proteins not involved in sex determination or dosage compensation (this doesn't need to be limited to comparing elegans and briggsae as there are numerous Caenorhabditis genomes available). This would provide a more complete view of the evolution of these processes.

      Regarding the comparison of our studies to those of the C. briggsae sex determination pathway described by Haag and others, we have included the following in our revised manuscript:

      Pages 8-9. "Within the Caenorhabditis genus, similarities and differences occur in the genetic pathways governing the later stages of sex determination and differentiation (Haag, 2005). For example, three sex-determination genes required for C. elegans hermaphrodite sexual differentiation but not dosage compensation, the transformer genes tra-1, tra-2, and tra-3, are conserved between C. elegans and C. briggsae and play very similar roles. Mutation of any one gene causes virtually identical masculinizing somatic and germline phenotypes in both species (Kelleher et al., 2008). Moreover, the DNA binding motif for both Cel and Cbr TRA-1 (Berkseth et al., 2013), a Ci/GLI zinc-finger transcription factor that acts as the terminal regulator of somatic sexual differentiation (Zarkower and Hodgkin, 1992), is conserved between the two species.

      At the opposite extreme, the mode of sexual reproduction, hermaphroditic versus male/female, dictated the genome size and reproductive fertility of Caenorhabditis species diverged by only 3.5 million years (Yin et al., 2018; Cutter et al., 2019). Species that evolved self-fertilization (e.g. C. briggsae or C. elegans) lost 30% of their DNA content compared to male/female species (e.g. C. nigoni or C. remanei), with a disproportionate loss of male-biased genes, particularly the male secreted short (mss) gene family of sperm surface glycoproteins (Yin et al., 2018). The mss genes are necessary for sperm competitiveness in male/female species and are sufficient to enhance it in hermaphroditic species. Thus, sex has a pervasive influence on genome content. In contrast to these later stages of sex determination and differentiation, the earlier stages of sex determination and differentiation had not been analyzed in C. briggsae."

      Regarding the comparison to Drosophila dosage compensation, including the work of Ellison and Bachtrog (2019), we included the following in the Discussion of our revised manuscript (page 22) and included related remarks in the abstract.

      "In contrast to the divergence of X-chromosome target specificity between Caenorhabditis species, X-chromosome target specificity has been conserved among Drosophila species. A 21-bp GA-rich sequence motif on X is utilized across Drosophila species to recruit the dosage compensation machinery, although it may not be the sole source of X target specificity (Alekseyenko, 2008; Kuzu, 2016; Ellison, 2008; Alekseyenko, 2013)."

      Regarding a comparison of our work to that of other rapidly evolving processes, we have made the following revision to our Discussion (page 22):

      "Conservation of DNA target specificity among species is also a common theme among developmental regulatory proteins that participate in multiple, unrelated developmental processes, such as Drosophila Dorsal in body-plan specification (Schloop et al., 2020) or Caenorhabditis TRA-1 in hermaphrodite sexual differentiation and male neuronal differentiation (Berkseth et al., 2013; Bayer et al., 2020). Typically, for such multi-purpose proteins, target-site specificity is evolutionarily constrained: protein function is changed far more by changes in the number and location of conserved cis-acting target sequences than by changes in the target sequences themselves (Carroll, 2008; Nitta et al., 2015). Hence, the divergence in X-chromosome target specificity across the Caenorhabditis genus is atypical among developmental regulatory complexes with highly diverse target genes and could have been an important factor for establishing reproductive isolation between species. Our finding is reminiscent of the discovery that centromeric sequences and their corresponding centromere-binding proteins have co-evolved rapidly as a consequence of hybrid incompatibilities (Malik and Henikoff, 2001; Henikoff et al., 2001; Talbert and Henikoff, 2022). Occurrence of rapidly changing DNA targets and their corresponding DNA-binding proteins (see also Lienard et al., 2016; Ting et al., 1998; Ting et al., 2004; Sun et al., 2004) is an increasingly dominant theme contributing to reproductive isolation."

      A brief comment about all three comparisons is also made in the beginning of the Discussion on page 18.

    1. Author Response

      Reviewer #1 (Public Review):

      Following previous publications showing that NR2F2 controls atrial identity in the mouse and human iPS cells, the authors address in the fish the role of the transcription factor Nr2f1a, which is specific to the atrial chamber. This had been initiated in a previous publication (Duong et al, 2018) and is extended in this manuscript. In mutant fish, the atrial chamber is smaller and mispatterned. Markers of the atrioventricular canal and of the pacemaker are expanded. Transcriptomic analyses and electrophysiological measures further support this observation. A putative enhancer of nkx2.5 is identified by ATAC-seq and shown to be repressed in nr2f1a mutants, suggesting that Nkx2.5, a known repressor of pacemaker identity, may be a mediator of Nr2f1a. Overexpression of nkx2.5 delays the appearance of pacemaker cells, and is proposed to partially rescue the absence of nr2f1a.

      Overall, this work provides novel insight into the mechanism of atrial chamber patterning in the fish and discusses the conservation of the role of nr2f1a. However, the claim that atrial cells switch their identity into ventricular and pacemaker cells is currently not demonstrated. Alternative hypotheses of mispatterning, cell number changes by proliferation, survival, or ingression are not ruled out by the data presented. The claim that "Nr2f1a maintains atrial nkx2.5 expression" or of a "progressive loss of Nkx2.5 within the ACs" needs to be further supported. The definition of "atrial cells (AC)" varies between figures.

      Major comments:

      1) The definition of "AC" varies from figure to figure: amhc+ in Fig 1A, amhc+vmhc- in Fig.1S1A, amhc+fgf13a- in Fig. 2 and 5, morphological area in Fig. 3. Please clarify how the atrial chamber is delineated in mutants in Fig. 3 since the avc constriction is not obvious.

      a. As stated in the response to Essential Revisions comment 1.B, we have tried to clarify the definitions of the cardiomyocyte populations in the revised text by indicating the specific markers used in the text and the figures. We then provide our interpretation for what this means regarding the different cardiomyocyte populations.

      b. Since the electrophysiological analysis cannot be performed together with markers or in the GFP transgenic zebrafish embryos, we chose areas for analysis closer to the middle of the morphological atrium in the nr2f1a mutant and WT sibling control embryo hearts. Based on our immunohistochemistry analysis, these areas would be consistent with Amhc+ expression and with the observed fgf13a:EGFP transgene and Isl1 marker expression. This strategy was schematized in Figure 3A and is now explicitly stated on lines 266 and 267 of the revised manuscript.

      2) The claim of a switch in cell identity or transdifferentiation is not demonstrated. This would require cell tracking or single-cell transcriptomics. I don't see how "AVC (..) [is] resolving to ventricular identity", since amhc seems to be maintained throughout the atrial chamber at all stages. The claim that "the number of vmhc+ only cardiomyocytes progressively increased" is not supported by Fig1S1. The expansion of pacemaker cells may result from cell ingression at the arterial pole. This hypothesis is in keeping with the expression of nr2f1a outside the heart tube in putative atrial progenitors (Duong, 2018). The phenotype upon nkx2.5 overexpression may also be interpreted along this line: ingression of pacemaker cells is delayed. The claim that "PC identity progressively expands throughout nr2f1a mutant atria" is not supported by the quantifications of a mean of 12 fgf13a+amhc+ cells at 96hpf (Fig. 2H), which is as many as fgf13a-amhc+ cells (Fig. 2G) and a quarter of the total amhc+ cells in Fig. 1J. The schema in Fig 6 does not reflect quantifications at 96hpf, which indicate the persistence of amhc+vmhc+ cells, amhc+ only, or amhc+fgf13a- in Fig 1S1 and 2G.

      "We did not observe effects on cell death or proliferation in the hearts of nr2f1a mutants": please provide the data, since proliferation was shown to be affected in mouse mutants (Wu, 2013).

      a. As indicated above in our response to the Essential Revisions comment 1.D, our quantification of cardiomyocytes indicates there are progressively fewer Amhc+/Vmhc+ cardiomyocytes in the nr2f1a mutant hearts (Figure 1J-L). The total number of Vmhc+ cardiomyocytes (Amhc+/Vmhc+ and Amhc-/Vmhc+) is increased in the nr2f1a mutant hearts relative to the WT sibling hearts. However, the number of Vmhc+-only (Amhc-/Vmhc+) cardiomyocytes, which reflect the ventricles, does not increase significantly in the nr2f1a mutants and is not statistically different from that of their WT siblings at each of the stages, despite trending that way (Figure 1 – figure supplement 2C). The total number of cardiomyocytes in the nr2f1a mutant hearts also is not increasing during these stages (Figure 1L). Along with the lack of cardiomyocyte death or proliferation (Figure 1 – figure supplements 3 and 4), this suggests that these hearts have more total Vmhc+ cardiomyocytes and the addition of Vmhc+-only cardiomyocytes is primarily coming from the cardiomyocytes in the Vmhc+/Amhc+ atrioventricular canal progressively losing Amhc expression. As indicated in the response to Essential Revisions comment 1.D, we have provided the individual image channels in a revised Figure 1 – figure supplement 1 and proportions of Vmhc+ cardiomyocytes in Figure 1 – figure supplement 2D to help clarify this issue.

      b. The question of transdifferentiation vs ingression of newly-differentiating cardiomyocytes as the explanation for the expansion of pacemaker markers was addressed in the response to Essential Revisions comment 2. Please see that comment for how we addressed this concern.

      3) The claim that "Nr2f1a maintains atrial nkx2.5 expression" or of a "progressive loss of Nkx2.5 within the ACs" needs to be further supported by quantification of the number of nkx2.5 positive cells in nr2f1a mutants. It seems that some cells in Fig. 4 co-express nkx2.5 and pacemaker markers in the mutant, which questions the repressive role of Nkx2.5. Following the observation of an nkx2.5 enhancer active next to pacemaker cells in control heart but absent in nr2f1a mutants, shouldn't we expect a gap of nkx2.5 expression next to pacemaker cells in mutants? It is unclear why pacemaker cells express nr2f1a (Fig. 6S1) but not nkx2.5. This needs clarification.

      a. The repressive role of Nkx2.5 with respect to pacemaker identity has been well documented in zebrafish and mice (Colombo et al., 2018). Nkx2.5 and Isl1 expression at the venous pole of zebrafish hearts are predominantly mutually exclusive, although there are a few cardiomyocytes at their borders that express both Nkx2.5 and pacemaker markers. We recognize that there are still some Nkx2.5-expressing cardiomyocytes that overlap with the pacemaker marker cardiomyocytes in the nr2f1a mutant hearts, as shown in Figure 4F. However, the majority of these cardiomyocytes have lower expression than the adjacent cardiomyocytes that form a border and do not have overlapping expression. Furthermore, as shown in Figure 4D-F and Figure 4 – figure supplement 2, the overall effect appears to be a regression of Nkx2.5+ expression in cardiomyocytes and corresponding expansion of pacemaker markers from the venous pole from 48 through 96 hpf in the nr2f1a mutant hearts, consistent with the established role of Nkx2.5 in repressing pacemaker identity. In the revised manuscript, we have provided each of the individual channels for the images in Figure 4 to better allow visualization of the different cardiomyocyte markers and a new supplemental figure showing the predominantly mutually exclusive expression of Nkx2.5 and Isl1 at the venous pole of zebrafish embryo hearts (Figure 4 – figure supplement 1).

      b. The expression of Nkx2.5 within the heart, like any gene, is likely controlled by multiple different regulatory elements. It is not clear to us why Reviewer #1 feels one would expect to see a gap in expression between Nkx2.5+ and pacemaker cardiomyocytes in the nr2f1a mutant hearts, unless Nkx2.5 was not required to repress pacemaker identity or there was a significant delay between loss of Nkx2.5 and gain of pacemaker markers. As indicated in the response to Essential Revisions comment 3.C, in the revised manuscript, we show experiments in which we have deleted the putative nkx2.5 enhancer element and found there is a loss of Nkx2.5+ and gain of fgf13a:EGFP+ cardiomyocytes in the atrium, as one might expect if the enhancer promotes or maintains Nkx2.5 expression in atrial cardiomyocytes that border the pacemaker cardiomyocytes. In the revised manuscript, this experiment is described in the Results (lines 348-364) and included in a revised Figure 6 and new Figure 6 – figure supplement 2.

      c. Please see our response to Essential Revision comment 3.A regarding the issue of Nr2f1a expression in pacemaker cardiomyocytes.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Warren et al., presents evidence suggesting that aberrant Yap signaling plays a role in epithelial progenitor cell dysregulation in lung fibrosis. This work builds on a body of work in the literature that Hippo signaling is aberrantly regulated in idiopathic pulmonary fibrosis. They use a combination of single nuclear and spatial transcriptomics, together with in vivo conditional genetic perturbations of Hippo signaling in mice, to investigate roles for Yap/Taz signaling in alveolar epithelial homeostasis and remodeling associated with exposure to a fibrosing agent, bleomycin. They show that Taz and Tead1/4 are most abundantly expressed by alveolar type 1 (AT1) cells, but Nf2 immunoreactivity (upstream activator of Hippo) is observed predominantly within airway and AT2 cells. Bleomycin exposure was associated with reduced p-Mst in regenerating alveolar epithelium, that inactivation of Yap/Taz arrested AT2>AT1 differentiation, and inactivation of either Nf2 or Mst1/2 promoted AT1 differentiation after bleomycin exposure and reduced matrix deposition/fibrosis. They go on to show that compromised alveolar regeneration resulting from inactivation of Yap/Taz results in enhanced bronchiolization of injured alveoli. Experiments are well designed and include quantitative endpoints where appropriate, data of high quality, and results are generally supportive of conclusions. These studies provide valuable new data relating to roles for the Hippo pathway in regulation of alveolar homeostasis and epithelial regeneration/remodeling in injury/repair and fibrosis.

      We thank the reviewer for their enthusiastic and constructive comments.

      Reviewer #2 (Public Review):

      The authors explored non-redundant, and potentially contrasting, roles of the Hippo effector transcription factors, YAP and TAZ, in the epithelial regenerative response to non-infectious lung injury. The strength of the work is the use of genetic mouse models that explored inducible loss of function of YAP and/or TAZ in an alveolar epithelial type 2 (AT2) specific manner. The main weakness of the work is that gene(s) inactivation was performed prior to lung injury and, therefore, does not take into account the contextual and dynamic nature of YAP/TAZ signaling; for example, work by other groups have shown that YAP/TAZ is activated early following injury followed by a decrease in activity, thus balancing proliferation and differentiation of AT2 cells (for review, see PMID: 34671628).

      We thank the reviewer for their enthusiastic and constructive comments.

      We agree that knocking out genes prior to injury might not take into account the contextual and dynamic nature of YAP/TAZ signaling. However, the Hippo pathway allows cells to sense changes in their environment. We have published that, in the airway epithelium, the Hippo pathway becomes inactivated upon naphthalene injury in surviving airway epithelial cells that sense the loss of their neighbors; this induces Wnt7b expression, which in turn induces Fgf10 expression in airway smooth muscle cells to drive airway epithelial regeneration. Normally, when regeneration is complete and cell density is restored, the Hippo pathway reactivates and the repair cascade is inactivated. Knocking out Mst1/2 in airway epithelium chronically activates this cascade and leads to overproliferation of the airway epithelium. Interestingly, upon inactivation of Mst1/2 in the airway epithelium some airway epithelial cells also turn into AT1 cells.

      However, AT1 cells do not proliferate. As such, we believe that inactivation of Mst1/2 or Nf2 in AT2 cells will not result in overproliferation but will mainly promote AT1 cell differentiation. That being said, there are other pathways and molecules that affect Yap/Taz nuclear localization. Inactivation of Mst1/2 or Nf2 in AT2 cells therefore most likely primes/activates AT2 cells to regenerate AT1 cells, but this decision is likely not binary.

      Reviewer #3 (Public Review):

      The manuscript entitled "Hippo signaling impairs alveolar epithelial regeneration in pulmonary fibrosis" is a rigorous and timely report detailing the significance of Hippo signaling, Taz and Yap in AT2/AT1 differentiation and the subsequent impact on the progression of lung fibrosis versus repair/regeneration. The authors' experimental design and results support their conclusions. The identification of the distinct effects of Taz and Yap in these processes highlights the pathway and specific molecules as potential therapeutic targets.

      The major strengths of these studies lie in the rigor of the elegant transgenic developmental/adult injury-repair mouse models combined with spatial transcriptomics and analyses. The weaknesses include a lack of detail presented in the methods, some legends and discussion.

      We thank the reviewer for their enthusiastic and constructive comments and have addressed the issues raised.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper showing that during amino acid starvation of Neurospora, the general amino acid control factors CPC-1 and CPC-3 are crucial to maintaining circadian rhythm at the levels of rhythmic growth and transcription of the FRQ gene. They show that deleting both genes leads to reduced and arrhythmic cell growth and FRQ transcription that can be accounted for by severely reduced occupancy of the FRQ promoter by the key transcription factor WCC. This defect in turn appears to result from diminished H3 acetylation of the FRQ promoter that was observed at least in the cpc-1 mutant, which is mediated by Gcn5. Thus, they show that Gcn5 occupancy at FRQ is rhythmic and impaired by cpc-1 knock-out, that CPC-1 occupies the FRQ promoter, and provide coIP evidence that Cpc-1 interacts with Gcn5 and Ada2 and, hence, could act directly to recruit these cofactors to the FRQ promoter. Importantly, they show that knock out of GCN5 eliminates rhythmic cell growth and FRQ expression (although surprisingly not FRQ mRNA abundance), as well as reducing H3ac levels and WCC binding at FRQ. They further show that TSA treatment can reverse the effects of histidine starvation on the circadian period in WT cells, and can partially restore rhythmic growth to histidine-starved cpc-3 cells, and that elimination of HDAC Hda1 increases H3ac at FRQ in WT cells. They provide some evidence that transcriptional activation of certain aa biosynthetic genes by CPC-1 is also rhythmic, although the evidence for this is not strong and it's unclear whether CPC-1 occupancy or its activation function would be periodic. They also did not address whether CPC-1 occupancy at FRQ is rhythmic.

      This work is important in providing convincing evidence that CPC-1-mediated induction of transcription factor CPC-3 in starved Neurospora cells mediates CPC-1-mediated recruitment of Gcn5 and acetylation of the FRQ promoter, which counteracts the function of histone deacetylase HDA1 to maintain high occupancy of the transcription factor WCC and attendant circadian rhythm of FRQ transcription. The work does not identify new regulatory circuits: rhythmic transcription of FRQ, the role of Gcn5, Hda1, and promoter histone acetylation in supporting transcriptional activation, and the general amino acid control response to amino acid starvation are all well-established mechanisms. Nevertheless, the work is significant in showing how these pathways and mechanisms are integrated to maintain circadian rhythm in the face of amino acid limitation.

      There is an abundance of convincing experimental evidence provided to support the key claims just summarized above. However, there are a few instances in which additional experiments might be required to resolve a discrepancy in the data or provide stronger evidence to support a claim.

      Thanks for the comments. We have revised the manuscript as suggested.

      Reviewer #2 (Public Review):

      This study by Liu et al. investigates the mechanism that enables the Neurospora circadian clock to maintain robust molecular and physiological rhythms under conditions of nutrient stress. The authors showed that the nutrient-sensing GCN2 signaling pathway is required to maintain robust circadian clock function and output rhythms under amino acid starvation in the filamentous fungus Neurospora. Specifically, they observed that under amino acid starvation conditions, knocking out GCN2 pathway components GCN4 (CPC-1) and GCN2 (CPC-3) severely disrupts rhythmic transcription of core clock gene frequency (frq) and clock-regulated conidiation rhythm. They provided data to indicate that the observed disruptions are due to reduced binding of the White Collar (WC) complex to the frq promoter stemming from lower histone H3 acetylation levels. This prompted the authors to propose a model in which GCN2 (CPC-3) and GCN4 (CPC-1) are activated upon sensing amino acid starvation, recruit GCN-5 containing SAGA acetyltransferase complex to maintain robust histone acetylation rhythm at the frq promoter. They then performed a battery of assays to show that both GCN-5 and ADA-2 are necessary for maintaining robust H3ac, frq mRNA, and conidiation rhythms under normal conditions. To support that low H3ac level at the frq promoter is the cause for impaired WC binding and frq transcription, they demonstrated they can partially rescue the observed rhythm defects of the knockout mutants under amino acid starvation using an HDAC inhibitor. Finally, the authors used RNA-seq to identify genes and pathways that are differentially activated by GCN4 (CPC-1) under amino acid starvation conditions. Many of these genes are involved in amino acid metabolism and they showed that 3 of them exhibit rhythmic expression in WT but low and non-rhythmic expression in the CPC-1 KO strain.

      Strength: The 24-hour period length of the circadian clock is known to be stable over a range of environmental and metabolic conditions because of circadian compensation mechanisms. Whereas temperature compensation (maintenance of circadian period length over a physiological range of temperature) has been studied extensively in multiple model organisms, the phenomenon of nutritional compensation and its underlying mechanisms are poorly understood. This study provides new insights into this important yet understudied area of research in chronobiology. In addition to advancing our understanding of fundamental mechanisms governing clock compensation mechanisms, this study also adds to our understanding of metabolic regulation of rhythmic biology and the relationship between nutrition and healthy biological rhythms. Given that the GCN2 nutrient-sensing pathway is broadly conserved beyond Neurospora, findings from this study will likely be relevant to other eukaryotic systems.

      The authors provided strong evidence supporting their claims that the GCN2 signaling pathway is important for maintaining the robustness of the Neurospora clock under conditions of amino acid starvation. The authors performed parallel experiments in normal (no 3-AT) vs amino acid-starved conditions (+3-AT). Their observations of relatively minor disruptions of molecular and conidiation rhythms in cpc-3 and cpc-1 KO strains in normal nutrient conditions compared to starvation conditions support their model that sensing of amino acid starvation by GCN2 pathway-induced changes at the chromatin and transcriptional level that are necessary to maintain a robust frq oscillator. Without the comparison between normal vs amino acid starved conditions, this part of their model will not be as strong.

      Previously, Karki et al. (2020) showed that rhythmic activation of GCN2 kinase is regulated by the clock, resulting in clock-controlled rhythmic translation initiation. This study uncovers an additional mechanism through which the GCN2 pathway modulates circadian rhythms by regulating histone acetylation of rhythmic genes. RNA-seq as described in Figure 7 provides some potential targets.

      Thanks for the comments and suggestions. We have revised the manuscript as suggested.

      Weakness:

      (1) The authors propose a model (Figure 8) in which the GCN2 pathway is activated by amino acid starvation and recruits the SAGA complex to promote histone acetylation at the frq promoter. There is, however, no data in this study showing that the GCN2 pathway is activated in amino acid-starved conditions, only that it is required to maintain robust frq and conidiation rhythms. The authors should clarify how they are defining "activation of the GCN2 pathway" in this study. For example, is it recruitment of GCN-5 and SAGA complex to the frq promoter?

      Thanks for the question. CPC-3, the GCN2 homolog in Neurospora, is the only eIF2α kinase responsible for eIF2α phosphorylation at serine 51 (Karki S et al. 2020, PMID: 32355000). As shown in the revised Figure 1-figure supplement 1A, eIF2α phosphorylation and CPC-1 were induced by 3-AT treatment in the WT but not in the cpc-3KO strain. These results demonstrate that the GCN2 pathway is activated by amino acid starvation and that, as a result, CPC-1 expression is activated to recruit the SAGA complex to the frq promoter.

      (2) The experiments to examine the involvement of GCN-5 and ADA-2 were performed in normal conditions (no amino acid starvation). Unlike cpc-1 and cpc-3 KO strains, gcn-5 and ada-2 KO strains showed severely disrupted frq rhythms in normal nutrient conditions, suggesting they are normally required for robust circadian rhythms. If GCN-5 and the SAGA complex are normally involved in regulating H3ac rhythms at the frq locus, how does the GCN2 pathway modulate the activity of GCN-5 and the SAGA complex in conditions of amino acid starvation? Are the interactions of GCN2/4 with GCN-5 and the SAGA complex different in normal vs amino acid-starved conditions? The authors should clarify their model.

      As mentioned above, our data suggested that GCN-5 and ADA-2 are required for robust circadian rhythms under normal conditions. As suggested, we did detect dampened rhythmic expression of frq in the gcn-5KO and ada-2KO strains under amino acid starvation (Figure 5D and 5E and Figure 5–figure supplement 1E and 1F). We also performed Co-IP to compare the interactions of CPC-1 with ADA-2 and of GCN-5 with ADA-2 under normal and amino acid-starved conditions. The results showed that although the Myc.GCN-5, MYC.CPC-1 and Flag.ADA-2 protein levels were repressed by 3 mM 3-AT treatment (likely due to global translational inhibition by induced eIF2α phosphorylation) (Karki S et al. 2020, PMID: 32355000), the interactions of CPC-1 with ADA-2 and of GCN-5 with ADA-2 were almost the same under normal and amino acid-starved conditions (IP was normalized to Input) (Figure 4B and 4C). These results indicated that amino acid starvation had little impact on the interactions of CPC-1 and GCN-5 with the SAGA complex.

      In our model, we proposed that amino acid starvation resulted in compact chromatin structure (due to decreased H3ac) in the frq promoter in the WT strain (Figure 3B), likely due to activation of histone deacetylases or inhibition of histone acetyltransferases. Amino acid starvation activates GCN2 pathway and induces CPC-1 expression. The induced CPC-1 can recruit GCN5-containing SAGA complex to the frq promoter to loosen the chromatin structure, promoting frq rhythmic transcription under starvation conditions. However, in the cpc-3KO mutants, CPC-1 could not effectively recruit GCN5 containing SAGA complex to frq promoter, resulting in arrhythmic frq transcription. We have now clarified our model in the revised discussion.

      (3) Given that the GCN2 pathway is important for nutrient sensing, the authors should not disregard the alternative hypothesis that the GCN2 pathway may be important for nutrient compensation and plays a role in maintaining the robustness of rhythms in a range of nutrient conditions.

      Thanks for the suggestion. We have now discussed the alternative hypothesis in the revised manuscript: “Because GCN2 signaling pathway is important for nutrient sensing, it may be important for nutrient compensation and plays a role in maintaining the robustness of rhythms in a range of nutrient conditions”.

      (4) The authors should use circadian statistics to compute the phase and amplitude of the mRNA, DNA binding of the WC complex, and H3Ac rhythms. This will allow them to compare between rhythms and provide statistical significance values, rather than just providing qualitative descriptions. This will be valuable when comparing rhythms between strains and between nutrient conditions.

      As suggested, we used CircaCompare to analyze our data.
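
      The kind of rhythm statistics requested above (phase and amplitude) can be illustrated with a minimal sketch. It is a plain single-component cosinor regression, not a reproduction of CircaCompare, and it assumes the period is known in advance. The function name, the sampling times and the period value are hypothetical choices for illustration only.

```python
import numpy as np

def cosinor_fit(t_hours, y, period=24.0):
    """Least-squares fit of y = mesor + A * cos(2*pi*t/period - phi).

    Returns (mesor, amplitude, peak_time_hours). The period is an
    assumption supplied by the caller, not estimated from the data.
    """
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(y, dtype=float)
    omega = 2.0 * np.pi / period
    # Linearised model: y = mesor + beta*cos(omega*t) + gamma*sin(omega*t)
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    (mesor, beta, gamma), *_ = np.linalg.lstsq(X, y, rcond=None)
    amplitude = float(np.hypot(beta, gamma))
    # A*cos(omega*t - phi) peaks at t = phi/omega, with phi = atan2(gamma, beta)
    peak_time = float((np.arctan2(gamma, beta) / omega) % period)
    return float(mesor), amplitude, peak_time

# Hypothetical time course sampled every 4 h over two cycles
t = np.arange(0, 48, 4)
y = 10.0 + 3.0 * np.cos(2.0 * np.pi * (t - 6.0) / 24.0)
print(cosinor_fit(t, y, period=24.0))
```

      Fitting each strain and condition separately gives per-group amplitude and phase estimates. A formal comparison between groups additionally needs standard errors for the difference, which this sketch does not compute.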

      Reviewer #3 (Public Review):

      This is an important paper anchored by the observation that cultures of Neurospora undergoing amino acid starvation lose circadian rhythmicity if orthologs in the classic GCN2/CPC-3 cross-pathway control system are absent. Data convincingly show that Neurospora orthologs of Saccharomyces GCN2 and GCN4 (CPC-3 and CPC-1 respectively) are needed to promote histone acetylation at the core clock gene frequency to facilitate rhythmicity. While the binding of CPC-1 and thereby GCN-5 are plainly rhythmic, the explanation of exactly where rhythmicity enters the pathway is incomplete.

      Figure 1 shows that inhibition of the HIS-3 activity affected by 3-AT, which should trigger cross-pathway control, is correlated with a graded reduction in the amplitude of the rhythm, and eventually to arrhythmicity at 3 mM 3-AT. While normalized data are shown in Figure 1B, raw data should also be provided in the Supplement as sometimes normalization hides aspects of the data. Ideally, this would be on the same scale in wt and in mutant strains.

      We revised as suggested and added the raw data. The results are now shown in Figure 1–figure supplement 2A and 2B and Figure 5–figure supplement 1B and 1C.

      Figure 2. The logical conclusion from Fig 1 is that circadian frq expression driven by the WCC has been impacted by amino acid starvation in the mutants. If so, either WC-1/WC-2 levels might be low, or else they might not be able to bind to DNA. When this was assessed, ChIP assays showed a loss of DNA binding. Although documented, an interesting result is that WCC protein amounts are sharply increased, especially for WC-1. The authors could comment on possible causes for this.

      Line 176, "hypophosphorylation of WC-1 and WC-2 (which is normally associated with WC activation . . . )". While the authors are correct that this is often the case it is not always the case and this introduces a potentially interesting caveat. That is, the overall phosphorylation status of WCC does not always reflect its activity in driving frq transcription. This was first noticed by Zhou et al., (2018 PLOS Genetics) who reported that even though WCC is always hyperphosphorylated in ∆csp-6, the core clock maintains a normal circadian period with only minor amplitude reduction. This should be noted, cited, and discussed.

      Thanks for the suggestion. We revised the manuscript as suggested, “It should be noted that the overall phosphorylation status of WCC does not always reflect its activity in driving frq transcription, possibly due to the unknown function of multiple key phosphosites on WCC (Wang et al., 2019; X. Zhou et al., 2018)”.

      Figure 2 and Figure 2 Suppl. report different gel conditions and show that the sharply increased WC1/WC-2 levels seen in Fig 2 resulting from 3-AT treatment of the cpc pathway mutants are due to the accumulation of hypophosphorylated WC-1/2. The conclusion would be stronger if the gels in the Supplement showed the same degree of difference between wt and mutants as seen in Fig 2. In any case, these hypophosphorylated WC should be active and able to bind DNA but plainly are not based on Fig 2.

      Thanks for the comments. It’s correct that WC-1/WC-2 were hypo-phosphorylated and their protein levels were increased (Figure 2 and Figure 2-figure supplement 1). However, the reduced binding of WC-1/WC-2 at the frq promoter explains the reduced frq transcription in the cpc-1KO or cpc-3KO mutants under amino acid starvation.

      Figure 3 correlates the unexpected loss of DNA binding by hypophosphorylated WCC with reduced histone H3 acetylation at frq. The 3 mM 3-AT reported to result in arrhythmicity in cpc mutants in Figures 1 and 2 results in a small (~20%?) and not statistically significant reduction in H3 acetylation in wt, compatible with the sustained rhythms seen in wt in Figure 1, but in a substantial (~5 fold) loss of binding in the ∆cpc-1 background; so CPC-1 is needed for H3 acetylation at frq to sustain the rhythm during amino acid starvation. The simplest explanation here then is that the hypophosphorylated WCC cannot bind to DNA because the chromatin is closed due to decreased AcH3.

      Thanks for the comments.

      Figure 4. Title:" Figure 4. CPC-1 recruits GCN-5 to activate frq transcription in response to amino acid starvation"; the conditions of amino acid starvation should be mentioned here for the reader's benefit. (In the unlikely case that there was no amino acid starvation here then many things about the manuscript need to be reconsidered.)

      Based on the model from yeast where amino acid starvation activates GCN2 (aka CPC-3 in Neurospora) kinase which activates the transcriptional activator GCN4 (aka CPC-1) which recruits the SAGA complex containing the histone acetylase GCN5 to regulated promoters, CPC-1 was tagged and shown by ChIP to bind rhythmically at frq. Co-IP experiments establish the interaction of components of the SAGA complex in Neurospora and Neurospora GCN-5 indeed is bound to frq, likely recruited by CPC-1. This part all follows the Saccharomyces model with the interesting twist that the binding CPC-1 is weakly rhythmic and GCN-5 strongly rhythmic in a CPC-1-dependent manner. Based on the figure legend title, these cultures should always be starved for amino acids (although as noted this should be made explicit in the figure legend). In any case, given this, from where does the rhythmicity in GCN-5-binding arise? This question is developed more below.

      Line 224, "low in the cpc-1KO strain, suggesting that CPC-1 rhythmically recruit GCN-5". Because ChIP was done only for a half circadian cycle (DD10-22), it is hard to conclude "rhythmically". The statement should be modified.

      To address the concern, we performed the ChIP assay using the CPC-1 antibody instead of Myc antibody (revised Figure 4A). Analysis of the ChIP results with CircaCompare showed that CPC-1 binding at the frq promoter was rhythmic without 3-AT (Figure 4A) or with 3 mM 3-AT treatment (Figure 4-figure supplement 1A). Due to the ADA-2-GCN5 and CPC-1-ADA-2 interactions with/without 3-AT treatment (Revised Figure 4B-C), CPC-1 should be able to recruit GCN-5-containing SAGA complex to activate frq transcription in response to amino acid starvation. We have now clarified this model in the revised manuscript. Please also see response to Reviewer 2/point 5.

      It was previously reported that the CPC-3/CPC-1 signaling pathway was rhythmically controlled by circadian clock, as indicated by CPC-3-mediated rhythmic eIF2α phosphorylation at serine 51 (Karki S et al. 2020, PMID: 32355000). Our data showed rhythmic CPC-1 and GCN-5 binding at the frq promoter in the WT strain and decreased GCN-5 binding in the cpc-1KO mutant (Figure 4A and 4D). These results suggested that the circadian clock controlled the CPC-3/CPC-1 signaling pathway rhythmically, which in turn promoted the rhythmic frq transcription through recruiting GCN5 containing SAGA complex under amino acid starvation. We clarified the model and description in the discussion.

      As suggested by the reviewer, we modified the statement to "suggesting that CPC-1 recruits GCN-5-containing SAGA complex to the frq promoter".

      Figure 5 shows that rhythmicity in general and of frq/FRQ specifically requires GCN-5 even under conditions of normal amino acid sufficiency, and that normal levels of H3 acetylation and its rhythm at frq require GCN-5. Not surprisingly, high H3 acetylation at frq correlated with high WC-2 DNA binding, and ADA-2 is required for SAGA functions.

      As earlier, raw bioluminescence data corresponding to panel B should be provided in the figure or Supplement.

      Also, if CPC-3 and CPC-1 regulate frq transcription through GCN-5, why is the frq level extremely low in the cpc-3KO or cpc-1KO(Fig.1D) but remains normal in gcn-5KO (Fig. 5D)?

      Raw bioluminescence data are listed in Figure 5–figure supplement 1B and 1C. For frq transcription in the WT and gcn-5KO mutant, please see response to Essential Revisions point 4.

      Figure 6 documents the counter effects of TSA which inhibits histone deacetylation and shortens the period versus 3-AT which decreases (via CPC-3 to CPC-1 to GCN-5) histone acetylation and causes period lengthening or arrhythmicity. HDA-1 is necessary for histone deacetylation at frq.

      Thanks for the comments.

      Figure 7 documents extensive changes in gene expression associated with 3-AT-induced amino acid starvation and the CPC-3 to CPC-1 pathway. How do these results compare with other previously studied systems, particularly Saccharomyces, where similar experiments have been done? Are the same genes regulated to the same extent or are there some interesting differences?

      Thanks for the suggestion. We revised our manuscript by comparing the difference of these genes in Saccharomyces. GCN4/CPC-1 targets are similar. “Similar to Saccharomyces cerevisiae (Natarajan et al., 2001), genes in amino acid biosynthetic pathways, vitamin biosynthetic enzymes, peroxisomal components, and mitochondrial carrier proteins were also identified as CPC-1 targets”.

      Figure 8 provides a model consistent with the role of the CPC-3/GCN2 pathway in regulating genes in response to amino acid starvation. It seems this could be any gene responding to amino acid starvation.

      Not accounted for in the model are the data from Fig 4, which show the rhythmic binding of CPC-1 and stronger rhythmic binding of GCN-5 to frq, both under amino acid starvation. In the presence of 3-AT, amino acid starvation is constant, which should mean that CPC-3 and CPC-1 would always be "on". Why doesn't CPC-1 recruit GCN5 at the same level at all times, leading to constant high H3 acetylation rather than rhythmic H3 acetylation as seen in Figure 3? Perhaps, unlike the statement in lines 345-34, it is WCC that regulates rhythmic GCN-5 binding and facilitates rhythmic histone acetylation at frq. Or perhaps the clock introduces rhythmicity upstream from GCN5. Without an answer to the question of where rhythmicity comes into the pathway, the story is only about how the CPC-3/GCN2 pathway regulates genes in response to amino acid starvation; without explaining the rhythmicity the story seems incomplete.

      It was previously reported that the CPC-3/CPC-1 signaling pathway was rhythmically controlled by circadian clock, as indicated by CPC-3-mediated rhythmic eIF2α phosphorylation at serine 51 (Karki S et al. 2020, PMID: 32355000). Our data showed rhythmic CPC-1 and GCN-5 binding at the frq promoter in the WT strain and decreased GCN-5 binding in the cpc-1KO mutant (Figure 4A and 4D). These results suggested that the circadian clock controlled the CPC-3/CPC-1 signaling pathway rhythmically, which in turn promoted the rhythmic frq transcription through recruiting GCN5 containing SAGA complex under amino acid starvation. We clarified the model and description in the discussion.

    1. Author Response

      Reviewer 2 (Public review):

      A quasi-experimental before and after design as the methodological intention should be stated in the article. Although there are equally powerful alternatives with arguably less-stringent requirements that are appropriate and well-tested for natural experiments such as that intervened by the COVID-19 pandemic given the simulation methods, as of now obtaining the actual stage distribution of cancer and the cancer-specific mortality rates before and after the pandemic is possible for making scientifically valid conclusions based on observed data to support the simulation study.

      We agree with the reviewer that a modelled before-and-after analysis would have been informative. However, the required mortality and cancer stage distribution data to inform this analysis is not yet available for Australia. In future, our modelled predictions can be compared to emergent observed national stage and mortality data. The current paper presents estimates that were modelled during rapid-response modelling commissioned by the Australian Government early in the pandemic. Findings therefore demonstrate what could be done with the information available at that time. We have amended, as shown in bold below, the end of the introduction as follows:

      “We demonstrate what could be estimated by a rapid response evaluation based on information available early in the pandemic, and discuss how these estimates relate to subsequent observed disruptions to screening. In future, our modelled predictions can be compared to emergent observed national stage and mortality data.”

      The screening disruption is the only concerned parameter in modelling the change of cancer progression in this study. But delayed diagnosis after screening as another concern could be possibly affected by the pandemic. This should be taken into consideration in the simulation. The authors also claimed the cancer treatment could also be affected by the pandemic, the evaluation on mortality is therefore not feasible. However, the impacts of COVID-19 pandemic on the delayed treatment and cancer treatment are important issues which should be covered by simulation study.

      We clearly state that this is a limitation of the current study. We have added the following sentence to the discussion, lines 377-379.

      ‘These effects will be incorporated in future modelled evaluations, after careful calibration and validation to observed data, with a view to extending the modelled outcomes to mortality estimates.’

      Because the results are obtained by simulation, confidence intervals for the outcomes should be provided so that the reliability of the estimates can be assessed.

      The manuscript aims to present indicative estimates for a range of scenarios, with numerous simplifying assumptions as indicated. In this context, generating meaningful uncertainty intervals is not feasible or appropriate.

    1. Author Response

      Reviewer #1 (Public Review):

      There has been a lot of work showing that multi-peaked tuning curves contain more information than single peaked ones. If that's the case, why are single-peaked tuning curves ubiquitous in early sensory areas? The answer, as shown clearly in this paper, is that multi-peaked tuning curves are more likely to produce catastrophic errors.

      This is an extremely important point, and one that should definitely be communicated to the broader community. And this paper does an OK job doing that. However, it suffers from two (relatively easily fixable) problems:

      I) Unless one is an expert, it's very hard to extract why multi-peaked tuning curves lead to catastrophic errors.

      II) It's difficult to figure out under what circumstances multi-peaked tuning curves are bad. This is important, because there are a lot of neurons in the sensory cortex, and one would like to know whether multi-peaked tuning curves are really a bad idea there.

      And here are the fixes:

      I) Fig. 1c is a missed opportunity to explain what's really going on, which is that on any particular trial the positions of the peaks of the log likelihood can shift in both phase and amplitude (with phase being more important). However, Fig. 1c shows the average log likelihood, which makes it hard to understand what goes wrong. It would really help if Fig. 1c were expanded into its own large figure, with sample log likelihoods showing catastrophic errors for multi-peaked tuning curves but not for single-peaked ones. You could also indicate why, when multi-peaked tuning curves do give the right answer, the error tends to be small.

      We thank the reviewer for this suggestion. We have now split the first figure into two.

      In the new Figure 1, we provide an intuitive explanation of local vs catastrophic errors and single-peaked / periodic tuning curves. We have also added smaller panels to illustrate how the distribution of errors changes with decoding time (using a simulated single-peaked population).

      The new Figure 2 shows sampled likelihoods for 3 different populations. We hope this provides some intuitive understanding of the phase shifts. Unfortunately, it proved difficult not to normalize the “height” of each module’s likelihood as they can differ by several orders of magnitude across the modules. However, due to the multiplication, the peak likelihood values can (approximately) be disregarded in the ML-decoding. Lastly, we have also added more simulation points (scale factors) compared to what we had in the earlier version of the figure (see panels d-e).
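
      The decoding argument in this response can be illustrated with a minimal sketch, assuming independent Poisson neurons with von Mises-shaped tuning curves on a circular stimulus in [0, 1). All function names and parameter values (peak rate, baseline, kappa, the number of peaks k) are hypothetical choices for illustration and are not taken from the manuscript. Module log-likelihoods add because the likelihoods multiply, and the maximum-likelihood estimate is the argmax over a stimulus grid.

```python
import numpy as np

def tuning_curves(s, centers, k, peak=20.0, base=1.0, kappa=4.0):
    """Firing rates (Hz) of tuning curves with k peaks on the circular stimulus [0, 1).

    Returns an array of shape (n_neurons, n_stimuli).
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    centers = np.asarray(centers, dtype=float)
    phase = 2.0 * np.pi * k * (s[None, :] - centers[:, None])
    return base + peak * np.exp(kappa * (np.cos(phase) - 1.0))

def log_likelihood(counts, rates, T):
    """Poisson log-likelihood on the stimulus grid for a decoding window T (s).

    The log(n!) term is dropped because it does not depend on the stimulus.
    """
    return counts @ np.log(rates * T) - T * rates.sum(axis=0)

def ml_decode(counts, rates, T, grid):
    """Maximum-likelihood stimulus estimate on the grid."""
    return grid[int(np.argmax(log_likelihood(counts, rates, T)))]

# Hypothetical population: one single-peaked module and one two-peaked module
grid = np.linspace(0.0, 1.0, 200, endpoint=False)
centers = np.linspace(0.0, 1.0, 8, endpoint=False)
rates = np.vstack([tuning_curves(grid, centers, k=1),
                   tuning_curves(grid, centers, k=2)])

rng = np.random.default_rng(0)
T, true_idx = 0.05, 57
counts = rng.poisson(T * rates[:, true_idx])   # one noisy trial
print(grid[true_idx], ml_decode(counts, rates, T, grid))
```

      With the two-peaked module alone, stimuli half a period apart give identical likelihoods, which is the ambiguity behind catastrophic errors. Stacking the single-peaked module onto it removes the tie in the noise-free case, while short decoding windows can still let noise favour the wrong peak.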

      II) What the reader really wants to know is: would sensory processing in real brains be more efficient if multi-peaked tuning curves were used? That's certainly hard to answer in all generality, but you could make a comparison between a code with single-peaked tuning curves and a good code with multi-peaked tuning curves. My guess is that a good code would have lambda_1=1 and c around 0.5 (you could use the module ratio the grid cell people came up with -- I think 1/sqrt(2) -- although I doubt if it matters much). My guess is that it's the total number of spikes, rather than the number of neurons, that matters. Some metric of performance (see point 1 below) versus the contrast of the stimulus and the number of spikes would be invaluable.

      We thank the reviewer for this comment and the suggestions. We agree that, ideally, such an expression would be useful. However, as you note, it is a very challenging task due to the large parameter space (number of neurons, peak amplitude, spontaneous firing rate, width of tuning, stimulus dimensionality, etc.) and is beyond the scope of the present study. We have instead included a new figure (see Figure 7 in the manuscript) detailing the minimal decoding times for various choices of parameter values. We believe this gives an indication of how minimal decoding time scales with various parameters.

    1. Author Response:

      Reviewer #1 (Public Review):

      […] This novel system could serve as a powerful tool for loss-of-function experiments that are often used to validate a drug target. Not only this tool can be applied in exogenous systems (like EGFRdel19 and KRASG12R in this paper), the authors successfully demonstrated that ARTi can also be used in endogenous systems by CRISPR knocking in the ARTi target sites to the 3'UTR of the gene of interest (like STAG2 in this paper).

      We thank the referee for highlighting the novelty and potential of the ARTi system.

      ARTi enables specific, efficient, and inducible suppression of these genes of interest, and can potentially improve therapeutic target validations. However, the system cannot be easily generalized as there are some limitations in this system:

      • The authors claimed in the introduction section that CRISPR/Cas9-based methods are associated with off-target effects; however, the authors' system requires the use of CRISPR/Cas9 to knock out a given endogenous gene or to knock in ARTi target sites to the 3' UTR of the gene of interest. Though the authors used a transient CRISPR/Cas9 system to minimize the potential off-target effects, the advantages of ARTi over CRISPR are likely less than claimed.

      We thank the reviewer for raising these very valid concerns about potential off-target effects related to the CRISPR/Cas9-based gene knockout or engineering of endogenous ARTi target sites. In contrast to conventional RNAi- and CRISPR-based approaches, such off-target effects can be investigated prior to loss-of-function experiments through comparison between parental and engineered cells, which in the absence of CRISPR-induced off-target events are expected to be identical. Subsequent ARTi experiments provide full control over RNAi-induced off-target activities through comparison of target-site engineered and parental cells. However, we agree that undetected CRISPR/Cas9-induced off-target events cannot be ruled out in a definitive way, which we will point out in our revised manuscript.

      • Instead of generating gene-specific loss-of-function triggers for every new candidate gene, the authors identified a universal and potent ARTi to ensure standardized and controllable knockdown efficiency. It seems this would save time and effort in validating loss-of-function siRNAs/sgRNAs for each gene. However, users will still have to design and validate the best sgRNA to knock out endogenous genes or to knock in ARTi target sites by CRISPR/Cas9. The latter is by no means trivial. Users will need to design and clone an expression construct for their cDNA replacement construct of interest, which will still be challenging for big proteins.

      We fully agree that the required design of gene-specific sgRNAs and subsequent CRISPR-engineering steps are by no means trivial. However, we believe that decisive advantages of the method, in particular the robustness of LOF perturbations and additional means for controlling off-target activities, can make ARTi an investment that pays off. In our experience, much time can be lost in the search for effective LOF reagents, and even when these are found, questions about off-target activity remain. While ARTi overcomes many of these challenges by providing a standardized experimental workflow, we do not propose to replace all other LOF approaches by this method. Instead, we would position ARTi as a unique orthogonal approach for the stringent validation and in-depth characterization of candidate target genes, as we will highlight in our revised discussion.

      • The approach of knocking-out an endogenous gene followed by replacement of a regulatable gene can also be achieved using regulated degrons, and by tet-regulated promoters included in the gene replacement cassette. The authors should include a discussion of the merits of these approaches compared with ARTi.

      We thank the reviewer for pointing out these alternative LOF methods. We had already included a brief discussion of chemical-genetic LOF methods based on degron tags. While we certainly share the current excitement about degron technologies, they inevitably require changes to the coding sequence of target proteins, which can alter their regulation and function in ways that are hard to control for. In our revised discussion, we will add a brief comparison to conventional tet-regulatable expression systems, which unlike ARTi require the use of ectopic tet-responsive promoters. Overall, we would position ARTi as an orthogonal tool that enables inducible and reversible LOF perturbations without changing the coding sequence and the endogenous transcriptional control of candidate target genes.

      Reviewer #2 (Public Review):

      […] The system is very innovative, likely easy to be established and used by the scientific community and thus very meaningful.

      We thank the reviewer for their enthusiasm about ARTi.

  4. Feb 2023
    1. Author Response

      Reviewer #1 (Public Review):

      Starrett, Gabriel et al. investigated 43 bladder cancers (primary tumors), 5 metastases and 14 normal tissues from 43 solid organ transplant recipients of 5 Transplant Cancer Match Study participating registries (US) for the presence of viral genetic signatures, their host genome integration and possible contribution in carcinogenesis. They isolated DNA and RNA from FFPE tissues to perform state of the art whole genome and transcriptome sequencing. They find that 20 of the primary tumors, 3 of the metastases and 7 of the normal tissues harbor viral signatures with BKPyV and JCPyV being the most prevalent viruses detected. The bulk of the experiments focuses on the 9 BKPyV-positive primary tumors. They report that several of the BKPyV-positive tumors show host genome integration of BKPyV with associated focal amplifications of adjacent host chromosome regions, with chromosome 1 being the most prevalent. Furthermore, BKPyV-positive tumors show a distinct transcriptomic signature with gene expression changes related to DNA damage responses, cell cycle progression, angiogenesis, chromatin organization, mitotic spindle assembly, chromosome condensation/separation and neuronal differentiation. The authors only touch on the features of other virus-positive tumors, e.g. those with JCPyV and HPV signals, without offering further detail or thought. The overall mutation signature analysis reveals no clear correlation between presence of viral sequences and tumor mutation burden suggesting that many different, virus-unrelated, factors possibly contribute to bladder cancer genesis and progression. Most striking are cases potentially linked to aristolochic acid, APOBEC3 and SBS5. Thus, while the approach is state-of-the-art, the causality of viral signatures and oncogenesis and vice versa remains unsolved.

      Strengths:

      1) The study assesses 43 primary tumors, 5 metastases and 14 normal tissues from 43 solid organ transplants of different kinds (24x kidney, 4x liver, 14x heart and/or lung, 1x pancreas) rather than just focusing on a particular organ.

      2) The study makes use of whole genome sequencing and transcriptomics and the assayed material is extracted from FFPE tissue, which shows a high level of practical, technical and computational skills and expertise.

      Weaknesses:

      1) There have been multiple inconsistencies in sample number and figure references throughout the publication. Is it 19 or 20 cases that have viral sequences detected? A comprehensive checker board table showing all cases, the available tissue samples and respective analyses would be in order.

      We would like to thank the reviewer for their detailed assessment of the manuscript. A checkerboard table of all samples tissues and analysis has been added as supplemental table 1 (Supplementary file 1a).

      2) The overall low coverage of the whole genome sequencing, which the authors mention, and the relatively big variation in coverage in both datasets (WGS, transcriptomics) are major limitations of the study. Possibly, this was done to increase specificity, but sorting out and discarding reads may also be problematic. Please comment.

      Besides performing quality and adapter trimming as described in the methods, we did not discard any reads. Experimental design and analysis were conducted to be as inclusive as possible considering the rarity of these specimens.

      Reviewer #2 (Public Review):

      Starrett et al performed whole genome and transcriptome sequencing of bladder cancers from 43 organ transplant recipients. They found that most of these tumors contained DNA from one of four viruses (BKPyV, JCPyV, HPV, and TTV). Viral genomes are most often integrated into the genomes of these tumor cells and the authors provide evidence that the integration utilized the POL theta-mediated end joining pathway. In most cases, viral RNA was detected in tumors with viral DNA. This suggests that the viruses are actively altering the cellular environment. Frequently, this resulted in similarities for overall gene expression patterns in the tumors that were grouped by the type of virus present in the tumor. Moreover, the changes in expression linked with viral gene expression were found in genes relevant to tumorigenesis. Immunohistochemical detection of viral proteins in these tumors also demonstrated active viral gene expression. However, the presence of viral proteins was heterogenous within the tumor, with between 1 and 100% of the tumor staining positive for BKPyV large T antigen. An analysis of mutational signatures in these tumors indicate that the viruses are also shaping the tumor genome by inducing mutations. Evidence that specific viruses are contributing to tumorigenesis in organ transplant patients has fundamental implications for preventing tumorigenesis in these patients.

      The conclusions of this paper are generally well supported by the data provided. Indeed, there is little doubt that viral infections are more likely in these tumors. However, there are aspects of the paper that could be improved and/or clarified. Most importantly, despite the strong evidence that the viruses are altering the tumor cell environment, it is unclear if these changes are necessary for tumorigenesis or, less excitingly, the result of an even more immune suppressive environment within the tumor. The heterogeneity of the LT expression suggests that the presence of the viral DNA and RNA may not be enough to assess whether it is actively contributing to the tumor. Is an increased frequency of viral protein staining linked with any evidence of an active contribution to tumorigenesis (for example, fewer tumor-suppressor/oncogene mutations)? This might be easiest to assess with the tumors that have oncogenic HPV DNA. If those tumors lacked p53 and RB mutations, it would support a causative role for the virus.

      We thank the reviewer for their thoughtful review. Indeed, in Figure 6 we show that no BKPyV-positive or HPV-positive tumor harbored mutations in RB1. Additionally, only one BKPyV-positive tumor and none of the HPV-positive tumors had a mutation in TP53. We have added further emphasis to this point on page 14, “None of the HPV-positive tumors with WGS harbored mutations in TP53 or RB1. Similarly, none of the polyomavirus-positive tumors harbored mutations in RB1 and only TBC08 had a frameshift mutation in TP53.”

    1. Author Response

      Reviewer #1 (Public Review):

      In mammals, a small subset of genes undergoes canonical genomic imprinting, with highly biased expression depending on the parent-of-origin allele. Recent studies, using polymorphic mouse embryos and tissues, have reevaluated the number of allele-specific expressed genes (ASE) to 3 times more than previously thought, although most of these novel genes show a very low ASE (50%-60% bias toward one parental allele). Here, the authors undertake a comparison of 4 datasets and a complete bioinformatic reanalysis of 3 recent allele-specific RNAseq datasets to study potential novel imprinted genes, using the recently released ISoLDE pipeline. Very few genes have been confirmed with true ASE in the different studies and/or validated by pyrosequencing analysis. However, the authors show that most of the newly discovered ASE genes lie in close proximity to already known imprinted loci and could be co-regulated by these imprinted clusters. This is important to understand how and to what extent imprinting control regions control gene expression.

      This manuscript highlights the number of potential falsely discovered imprinted genes in previous datasets, which could result from a lack of replicates, a weak allelic ratio, or low gene expression and a lack of read depth. But the lack of overlap in the ASE called genes (with the exception of the known imprinted genes) between the different datasets is worrying and important to discuss, as the authors did. I would have appreciated more details on the differences between the datasets that could explain the lack of reproducibility: library preparation protocol, sequencer technology, SNP calling, number of reads per SNP, bioinformatics pipeline.

      We agree and a comparison of all the studies is included in the methods section. In particular, we have now included more information on SNP calling and sequencer technology.

      Studying allele-specific expression of lowly expressed genes is difficult with technologies based on PCR amplification (library preparation, pyrosequencing) and could result in biased expression due only to the random amplification of a small pool of molecules. Could the authors compare the level of expression of their different classes of genes? Could the more robust ASE genes in their study be the more highly expressed ones? Several genes were identified in only one or two of the previous studies; were they expressed in the other studies when not defined as ASE? This would also allow defining a threshold of expression to study allelic bias in the future. To conclude, this study is an important resource for the epigenetic field and helps to better understand genomic imprinting.

      We thank you for this suggestion. We have now taken all RNAseq data that we had run through the ISoLDE pipeline and extracted the transcripts per million (TPM) expression levels for each of the genes called in the original studies. We find no overrepresentation of lowly expressed genes in the novel biased genes compared with known imprinted genes. We also looked specifically at the expression levels of the genes tested by pyrosequencing in these datasets and saw no relationship between validation and expression levels. Expression levels are consistent between studies, especially in the same tissue, indicating that the lack of reproducibility between studies is not due to differing expression. These observations have been added to the manuscript.
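
      As an illustration of the kind of expression comparison described in this response, the sketch below converts raw read counts to TPM and reports the fraction of genes in a group that fall below an expression cutoff. It is a minimal sketch only: the function names, the input layout, and the cutoff value are assumptions made for illustration and are not taken from the manuscript or from the ISoLDE pipeline.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts and lengths_bp are parallel lists (hypothetical input layout).
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Reads per kilobase: normalise each count by transcript length.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    # Scale so that the values sum to one million.
    return [r / total * 1e6 for r in rpk]


def fraction_lowly_expressed(tpm_values, cutoff=1.0):
    """Fraction of genes with TPM below an illustrative cutoff (assumed value)."""
    if not tpm_values:
        return 0.0
    return sum(1 for v in tpm_values if v < cutoff) / len(tpm_values)
```

      Comparing `fraction_lowly_expressed` between a group of novel biased genes and a group of known imprinted genes would give a simple check for overrepresentation of lowly expressed genes in either group.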

      Reviewer #2 (Public Review):

      This work aims to understand genomic imprinting in the mouse and provide further insight to challenges and patterns identified in previous studies.

      Firstly, genomic imprinting studies have been surrounded by controversy, especially ~10 years ago when the explosion of sequencing data combined with immature methods to analyze it led to highly exaggerated claims of widespread imprinting. While the methods have improved, clear standards are not set and results still have some inconsistencies between studies. The authors first do a meta-analysis of previous studies, comparing their results and doing a useful reanalysis of the data. This provides some valuable insights into the reasons for inconsistencies and guides towards better study designs. While this work does not exactly set a common standard for the field, or provide a full authoritative catalog of imprinted loci in mouse tissues, it provides a step in that direction. I find these analyses relatively simple and straightforward, but they seem solid.

      Previous studies have described a relatively common pattern of subtle expression bias towards one parental allele, rather than the classical imprinting pattern of fully monoallelic expression. This work digs deeper into this phenomenon, using first the meta-analysis data and then also targeted pyrosequencing analysis of selected loci. The analysis is generally well done, although I did not understand why gDNA amplification bias was not systematically corrected in all cases but only if it was above a given (low) threshold. I doubt this would affect the results much though. To some extent the results confirm previously observed patterns (bimodal distribution of either subtle or full bias, and effect of distance from the core of the imprinted locus). The novel insights mostly concern individual loci, with discovery and validation of some novel genes, typically with a subtle or context-specific parental bias.

      The study also provides some insights into mechanisms, especially by analysis of existing mouse models with a deletion of the ICR of specific loci. The change in the parental bias pattern was then used to infer potential methylation and chromatin-related mechanisms in these imprinted loci, including how the subtle bias further away is achieved. There are interesting novel findings here, as well as hypotheses for further research. However, this is an area where the conclusions rely quite heavily on published research especially as this study doesn't include single-cell resolution, and it's not entirely clear how much of e.g. the Figure 7 mechanisms part is based on discoveries of this study.

      We agree that Figure 7 does not illustrate models based exclusively on data generated in this study; instead, it presents hypotheses to be tested in the coming years.

      Imprinting is a fascinating phenomenon that can be informative of mechanisms of genome regulation and parental effects in general. It is a bit of a niche area though, and the target audience of this study is likely going to be limited to specialists doing research on this specific topic. As the authors point out, the functional importance of the findings is unknown.

    1. Author Response:

      Reviewer #1 (Public Review):

      1) All feeding data presented in the manuscript are from the interactions of individual flies with a source of liquid food, where interaction is defined as 'physical contact of a specific duration.' It would be helpful to approach the measurement of feeding from multiple angles to form the notion of hedonic feeding since the debate around hedonic feeding in Drosophila has been ongoing for some time and remains controversial. One possibility would be to measure food intake volumetrically in addition to food interaction patterns and durations (e.g. via the modified CAFE assay used by Ja).

      We acknowledge that our FLIC assays address only one dimension of feeding behavior, physical interaction with liquid food. However, there is clear evidence that interactions are strongly predictive of consumption, and it would be technically difficult to measure feeding durations at the resolution of milliseconds using a CAFE assay. Nevertheless, we appreciate the spirit of this comment and agree that expanding our inference to other measures of feeding, as well as feeding environments, is an important next step. To this end, we will include measures of feeding on more traditional solid food, using the ConEx assay; we find that flies in the hedonic environment consume twice as much sucrose volume as flies in the control environment. These will be added as supplemental data (Figure 1 – Figure Supplement 1A), and the text will be updated to reflect our findings.

      2) Some of the statistical analyses were presented in a way that may make understanding the data unnecessarily difficult for readers. Examples include:

      a) In Table I the authors present food interaction classifications based on direct observation. These are helpful. However, the classification system is updated or incompletely used as the manuscript progresses, most importantly changing from four categories with seven total subcategories to three categories and no subcategories. In subsequent data analyses, only one or two of these categories are assessed. It would be helpful, especially when moving from direct observation to automated categorization, to quantify the exact correspondences between all of the prior and new classifications, as well as elaborate on the types of data that are being excluded.

      We appreciate the feedback on our usage of the behavioral classification system and will make several adjustments to improve it. We will rename some of the behaviors to make them more intuitive (see Reviewer #2, comment #1), and update the main text and Table 1 to reflect these changes. We will update the text and figures to be more transparent about when we group subcategories into main categories for quantification and when we quantify all subcategories separately. Because these videos required manual scoring by an experimenter, after our initial characterizations we opted to score only main categories (which contain subcategories). We agree that it would be useful to quantify correspondence between subcategories and the automated FLIC signal. However, we believe this task is better suited for more advanced and automated video tracking software, and, incidentally, more sophisticated analysis of FLIC data, which has a very high-dimensional character that has yet to be properly exploited. At the moment, therefore, we are not confident in the ability to understand the data at the desired resolution.

      b) The authors switch between a variety of biological and physiological conditions with varying assays, which makes the train of reasoning nearly impossible to follow. For example, the authors introduce us to circadian aspects of feeding behavior to introduce the concept of 'meal' and 'non-meal' periods of the day. It is then not clear in which of the subsequent experiments this paradigm is used to measure food interactions. Is it the majority of the subsequent figure panels? However, the authors also use starved flies for some assays, which would be incompatible with circadian-locked meals. The somewhat random and incompletely reported use of males and females, which the authors show behave differently, also makes the results more difficult to parse. Finally, the authors are comparing within-fly for the 'control environment' and between flies for their 'hedonic environment' (Figure 3A and subsequent panels), which I believe is not a good thing to do.

      We apologize for our difficulties conveying our inference, which was also noted by Reviewer #2.  We will work hard to improve this component in the revision. With respect to the confusion about circadian feeding, we introduced circadian meal-times to complement starvation as a second (perhaps more natural) way to measure behaviors associated with hunger. Importantly, we do not use circadian meal-times beyond Figure 1; all subsequent FLIC experiments were conducted during non-meal times of day for 6 hours, which avoids confounding our data with circadian-locked meals even when we use starved flies. We will clarify this point in the revision.

      The reviewer also points out that we make both within-fly and between-fly comparisons, which is a point that we note. Perhaps some concern arises, again, from the challenges that we faced in properly delineating our inferences about different types of feeding measures (and motivations). Inference about homeostatic feeding was made using within-fly measures, comparing events on sucrose vs. those on yeast. Inference about hedonic feeding was made using between-fly measures (average durations of different flies on 2% vs. 20% sucrose). Treatment comparisons to control always used measures of the same type, such that inference was not made using between-fly measures for treatment and within-fly for control (i.e., all of our figure panels were either within-fly or between-fly). We will clarify this in the revision.

      Importantly, our approach to all experiments avoided confounding by using randomized designs at multiple levels (e.g., randomizing control and hedonic environments to FLIC DFMs, alternating food choice sidedness in the DFMs), by ensuring that flies in both environments are sibling flies that came from the same vial environment before being tested, and by performing each experiment multiple times.

      c) Statistical analyses are not always used consistently. For example, in Figures 3B and C, post hoc test results are shown for sucrose vs. yeast interactions, but no such statistics are given for 3E and 3F, preventing readers from assessing if the assay design is measuring what the authors tell us it is measuring.

      We report p-values for two-way ANOVA interaction terms for all appropriate experiments. If (and only if) the interaction term is significant, we conduct post-hoc tests for more detailed statistical analysis and report the p-values. The reviewer points out that we do not perform post-hoc tests in figures 3E and 3F. These figures had a non-significant interaction term, and thus, we did not feel a post-hoc test was warranted.

      Reviewer #2 (Public Review):

      1) The dissection of feeding into distinct behavioral elements and its correlation with electrical FLIC signals that allow interpreting feeding types is a fundamental new method to dissect feeding in flies. However, the categories of micro-behaviors in Table 1 are not intuitive.

      We agree and will update the Table, figures, and main text. Please see also our response to Reviewer #1, comment #1.

      2) The details for the behavioral data analysis are not clear and should be made more obvious. For example, how many males and females were used in each experiment? Were any of the females mated or were they all virgins? If all virgins, why not use mated females? Mating status may have an effect on the feeding drive. If mated and virgin females were used, are there any differences between them? Similarly, for diurnal feeding experiments, it is not immediately clear from the graphs how many animals were used and how the frequencies were obtained (Fig. 1F, presumably averages for each category per fly but that is inconsistent with the legend in the supplement for this figure). Why does the transition heat map not include all micro-behaviors (Fig. 1E, no LQ data which are significant in diurnal feeding)?

      We will clarify the number of flies and events for each behavioral experiment in Figure 1, and we will update the figure legend appropriately. We note that these behavioral datasets are non-overlapping, and each time we mention the number of events scored in the text, that number includes only “new” videos. Female and male flies for all experiments were mated, and we will clarify this in the main text and methods.

      For the diurnal experiment in Figure 1F, we scored over 700 events from new (non-overlapping) video compilations and updated the number of flies and event number in the figure legend. The diurnal data we present in the supplement for this figure is a separate experiment conducted on 38 flies, intended only to demonstrate the circadian nature of fly feeding.

      For the transition heat map, analysis of this sort seems to require a large amount of data to have sufficient power to return a transition matrix. LQ events are relatively low in frequency, so we opted to combine them with L events for this analysis. We have updated the figure and figure legend to reflect this.
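
      The pooling step described here can be sketched as follows, as a minimal illustration rather than the analysis actually used. The sketch counts transitions between consecutive scored events and row-normalises them into transition probabilities, with an optional mapping that merges rare labels (here LQ into L) before counting. The function name and the label "F" in the usage note are hypothetical.

```python
from collections import Counter


def transition_matrix(events, merge=None):
    """Row-normalised transition probabilities between consecutive events.

    events: ordered list of behaviour labels for one recording.
    merge:  optional dict mapping rare labels onto a pooled label,
            e.g. {"LQ": "L"} (applied before counting).
    """
    merge = merge or {}
    seq = [merge.get(e, e) for e in events]
    states = sorted(set(seq))
    # Count every adjacent (from, to) pair in the relabelled sequence.
    counts = Counter(zip(seq, seq[1:]))
    matrix = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in states
        }
    return matrix
```

      For example, `transition_matrix(["L", "LQ", "F", "L", "F"], merge={"LQ": "L"})` treats the sequence as L, L, F, L, F. Note that pooling turns an L to LQ step into an L to L self-transition, which is one consequence of merging categories that a reader of the heat map may want to keep in mind.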

      3) The CaMPARI images do not look great, particularly in the pan-neuronal condition (Fig. 5A). It would be useful to include the movie of the stack. Did any other brain regions show activity differences, such as SEZ or PI? These regions are known to be involved in feeding so it seems surprising they show no effect.

      We find that CaMPARI imaging is subject to high levels of noise and background, especially when using a broad driver as the reviewer has pointed out. This is why we opted to follow-up our pan-neuronal CaMPARI experiment using a more specific mushroom body driver and to test our correlational findings of increased MB activity in hedonic environments with genetic approaches in the remainder of Figure 5. We will include movies of the confocal stacks for both CaMPARI experiments, as requested.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes the accrual of RSV mutations in a severely immunocompromised child with persistent infection and demonstrates that ribavirin increases the observed mutation rate with base pair changes (C to U and G to A) compatible with its known mechanism. The paper utilizes a mathematical model to explain the counterintuitive finding that viral load does not decrease despite loss of viral fitness and clinical improvement. Positive selection is observed but does not keep pace with deleterious mutations induced by ribavirin. Overall, though the data is restricted and limited to a single person, the analysis is rigorous and supports the paper's interesting conclusions.

      The paper is fascinating, but its generalizability is somewhat limited by the single study participant. Nevertheless, comparisons of therapy-induced deleterious mutations versus adaptive mutations over time is potentially important for multiple viruses.

      We thank the reviewer for their comments. Although we acknowledge that this is only a single case of infection, we believe that it is an interesting case, and we are keen to share our findings with the broader scientific community.

      Reviewer #2 (Public Review):

      In this work, Illingworth et al. investigate the effectiveness of ribavirin and favipiravir on the treatment of a paediatric patient with chronic RSV. These drugs cause mutations and the authors tested whether they could observe this effect through deep sequencing viruses from nasal aspirates over the course of treatment. They found an increase in mutations caused by ribavirin but favipiravir appeared to have no additional mutagenic effect. Despite the lack of change in viral load, the authors suggest that the ribavirin reduced viral fitness and did not lead to adaptive escape mutations. The authors modelled how generation time and fitness interacted with mutational load. They also estimated fitness for different haplotypes generated from the mutational data.

      Strengths of the paper:

      Using mutagenic drugs to treat viruses is generally accepted but results have been mixed with severe viral infections and specific evidence of the precise effects of the drugs is often lacking. This paper is especially valuable for demonstrating that despite in vitro evidence that favipiravir had some effect against RSV, there was no evidence for favipiravir having an effect in a patient. This differs from the authors' previous work showing a clear clinical benefit to favipiravir in treating influenza. This paper also appears to be the first to sequence RSV from a patient having been exposed to ribavirin, which is important for demonstrating that the drug is having a measurable effect.

      Weaknesses in the paper:

      I think there is a conceptual problem with the paper. Ribavirin is supposed to increase the mutational rate of the virus which would increase the mutational load. Mutational load has been calculated by summing up the frequencies of minor alleles. However, if a particular mutation rises in frequency, it does not mean that ribavirin has caused additional mutations at the same site but rather viruses containing the mutation have risen in frequency. If a subpopulation containing mutations rises through drift or selection to a relatively high percentage that will bias the mutational load. The authors provide ~75 mutations which were at significant percentages across multiple different timepoints. It seems that these mutations contribute significantly to the mutational load but changes in mutation percentages between samples do not reflect changes in mutational events but changes in viral haplotypes/subpopulations. In a previous study Lumby et al. 2020, the authors removed mutations at >5% from their analysis but there is no indication that they performed this step similarly here. Summing many small changes will give an indication of background mutational rate (though counting only a single mutation at each locus is perhaps the only method to remove the effect of viral clonal expansion).

      The mutational load is defined as the mean number of mutations per virus with respect to the consensus, equal to the sum of minor allele frequencies across the genome. We filter variant frequencies prior to calculating mutational load to compensate for noise arising from genome sequencing.
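
      The definition above can be written as a short calculation. The following is a minimal sketch under stated assumptions: each site is given as a dictionary of allele frequencies, the consensus is taken to be the most frequent allele at that site, and `min_freq` is a hypothetical noise filter whose value is chosen for illustration only. The filtering actually applied to the sequencing data is not reproduced here.

```python
def mutational_load(site_freqs, min_freq=0.01):
    """Sum of minor (non-consensus) allele frequencies across sites.

    site_freqs: list of dicts, one per genome position, mapping
                allele -> frequency (hypothetical input layout).
    min_freq:   illustrative threshold below which variants are
                treated as sequencing noise (assumed value).
    """
    load = 0.0
    for freqs in site_freqs:
        if not freqs:
            continue
        # Assumption: consensus is the majority allele at this site.
        consensus = max(freqs, key=freqs.get)
        for allele, f in freqs.items():
            if allele != consensus and f >= min_freq:
                load += f
    return load
```

      Under this definition a variant at 10% frequency contributes 0.1 mutations per virus on average, regardless of whether it reached that frequency through repeated mutation or through the expansion of a lineage carrying it.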

      We use a deterministic model of mutation-selection balance to describe the overall dynamics of mutational load, but are conscious that the dynamics of individual variants are complex. Genetic drift could contribute to these dynamics, as might hidden structure in the viral population, with stochastic observations of viruses from distinct subpopulations. As we make clear, our key assumption regarding mutational load is that all variants from the consensus are at least mildly deleterious; under this assumption calculating the sum of allele frequencies is an appropriate measurement of mutational load. Our model accounts for the possible presence of variants under stronger and weaker selection being observed at lower and higher frequencies respectively.

      We note that, in a case where distinct variants occurred in subpopulations, these variants would be observed in a mixture at lower frequencies than they existed in the subpopulations. This would lead to the observation of more variants overall, with each variant being at a reduced frequency. While stochastic effects would alter the frequencies of mutations in individual samples, if mutational load acted equally on each subpopulation, the total mutational load would be preserved across samples. The existence of subpopulations would not of itself invalidate the calculation of mutational load as we have performed it.
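
      A small numerical illustration of this argument, offered as a sketch with made-up values: two subpopulations carry distinct variants but the same total load, and sampling them as a mixture yields more variants at lower frequencies with an unchanged sum. The variant names, frequencies, and mixing weights are all hypothetical, and the sketch assumes the subpopulations share a consensus so that every variant remains a minor allele in the mixture.

```python
def mix_subpopulations(subpops, weights):
    """Variant frequencies observed when subpopulations are pooled.

    subpops: list of dicts mapping variant -> frequency within
             that subpopulation (relative to a shared consensus).
    weights: mixing proportions, assumed to sum to 1.
    """
    mixed = {}
    for freqs, w in zip(subpops, weights):
        for variant, f in freqs.items():
            mixed[variant] = mixed.get(variant, 0.0) + w * f
    return mixed


def total_load(freqs):
    """Mutational load as the sum of variant frequencies."""
    return sum(freqs.values())


# Hypothetical example: both subpopulations have load 0.6.
subpop_a = {"v1": 0.4, "v2": 0.2}
subpop_b = {"v3": 0.6}
mixture = mix_subpopulations([subpop_a, subpop_b], [0.5, 0.5])
```

      Here `mixture` contains three variants at frequencies 0.2, 0.1 and 0.3, and its total load matches that of each subpopulation. If the subpopulations had unequal loads, the mixture load would instead be the weighted average of the two.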

      Our previous study Lumby et al, 2020 considered a case where favipiravir was given for a short period of time in a case of influenza B infection. In that case we did not make an assessment of the total mutational load in a population, although we did remove mutations at >5% when considering the spectrum of mutations i.e. the proportion of mutations of each type C to T, G to A, etc. We are still working on different approaches to measuring mutational load, but we are not convinced that removing high frequency mutations is always a good idea when evaluating the total mutational load. Cutting out higher frequencies is potentially a useful means to study mutational spectra under viral mutagenesis, but in a measurement of mutational load it could exclude deleterious mutations.

      While ribavirin appears to have shown an effect, many questions remain. Why does the mutational load only increase for 3 points before plateauing? The authors would likely argue that this is the new saturation point for mutation load but they don't test it. Sequencing points from after the cessation of treatment would be expected to show lower mutational load but this data was not collected. Furthermore, questions remain over the methodology. It is thought that Ribavirin should only increase transitions and a transition/transversion ratio for the different samples would have been helpful. The absolute numbers of many mutation classes appear to have increased including transversions e.g AU. There isn't a good reason why nucleoside analogues should have caused this effect and perhaps it is an artefact.

      Ribavirin has been shown to increase C to T and G to A mutations; these are both transitions, but T to C and A to G mutations are also transitions; the proportion of these was found to decrease under treatment. We have included a new figure showing Ts/Tv ratios but we do not find a significant pattern of change in these statistics over time.
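
      For readers unfamiliar with the statistic, a Ts/Tv ratio can be computed as in the sketch below. This is an illustrative helper, not the code used to produce the new figure: the function names and the input format (a list of reference/alternate base pairs) are assumptions. T is normalised to U so that DNA-style and RNA-style base calls are classified identically.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def _norm(base):
    # Treat T and U as the same base (RNA virus, DNA-style calls).
    return base.upper().replace("T", "U")


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = _norm(ref), _norm(alt)
    if ref == alt:
        return False
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(mutations):
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    changes = [(r, a) for r, a in mutations if _norm(r) != _norm(a)]
    ts = sum(1 for r, a in changes if is_transition(r, a))
    tv = len(changes) - ts
    return ts / tv if tv else float("inf")
```

      Because C to U and U to C both count as transitions, a drug-induced rise in one class alongside a fall in the other can leave the overall ratio roughly unchanged, which is consistent with the observation reported above.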

      The plateauing of the observed mutational load is consistent with the theory of mutation-selection balance. Following a change in the mutation rate we would expect a shift to a new equilibrium U/s.
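
      The expected rise and plateau can be illustrated with a textbook simplification, in which the mean load L changes as dL/dt = U - sL per generation and therefore relaxes exponentially toward U/s. The sketch below evaluates the closed-form solution. It is an assumption-laden illustration, not the model fitted in the manuscript: a single selection coefficient s is applied to all mutations, and the parameter values in the usage note are arbitrary.

```python
import math


def load_trajectory(l0, u, s, times):
    """Mean mutational load over time under dL/dt = U - s*L.

    l0:    load at time 0 (e.g. the pre-treatment equilibrium).
    u:     deleterious mutation rate per genome per generation.
    s:     selection coefficient against each mutation (single
           value assumed for all mutations; a simplification).
    times: iterable of times in generations.
    """
    if s <= 0:
        raise ValueError("s must be positive for an equilibrium to exist")
    equilibrium = u / s
    # Exponential relaxation from l0 toward the new equilibrium U/s.
    return [equilibrium + (l0 - equilibrium) * math.exp(-s * t) for t in times]
```

      For example, `load_trajectory(l0=1.0, u=0.2, s=0.1, times=range(0, 100, 10))` starts at 1.0 and levels off near 2.0. In this simplified picture the time taken to approach the plateau is set by 1/s generations, so a plateau reached within a few samples is compatible with a shift to a new balance.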

      Sequencing was conducted as part of an investigation that was secondary to treatment of the patient: All of the samples that were collected were sequenced. We agree that upon the cessation of mutagenic drugs we would expect to see a fall in mutational load.

      I don't think that the authors can reasonably determine how many haplotypes there are in the population from short read sequencing data. I think that the sequencing data very clearly shows subpopulations due to the large changes in mutation frequencies between different time points. The authors say that their analysis assumes a well-mixed population which is clearly not the case. Therefore, determining fitness of different haplotypes or mutations is likely not accurate.

      Although we have short read sequencing data, some of the reads we have span more than one locus, providing some information about linkage between variants. As noted in the Methods section our inference approach provides a minimal reconstruction of haplotypes: Our reconstruction details the smallest set of distinct haplotypes necessary to explain the data.

      Where we use a haplotype-based model to reconstruct the within-host evolution of the population, we neglect the potential presence of subpopulations by assuming a well-mixed population, then fully discuss the implications of this assumption for our result.

      Our basic question is whether within-host adaptation leads to a gain in viral fitness in excess of the loss of fitness imposed by an increase in mutational load. In this comparison we make a conservative (i.e. low) estimate for the extent of the loss of fitness through mutational load.

      When we look at within-host evolution our assumption of a well-mixed population attributes all of the systematic change in the viral population to the effects of selection. If some of this change arises through stochastic differences in emissions from a structured population, the influence of selection would be less than our inference. Thus, our estimate of the gain in fitness through within-host adaptation is a high estimate. As our high estimate of within-host fitness gain is less than a low estimate of the fitness lost through mutational load, our result is robust to our assumption.

      The authors construct a model to estimate viral fitness and suggest that viral fitness decreased with the drug. This is somewhat problematic to me as viral load has not changed so it would be reasonable to say that viral fitness was likely unaffected by the drug. The authors define fitness in terms of the number of mutations that each virus likely has and assumes that these mutations are deleterious. The authors then use this to claim that mutagenic drugs reduce fitness. This seems very circular to me. If the drugs reduce fitness, it should be observed as a property of the virus population. As the only measure was viral load, which didn't change, it is difficult to claim ribavirin reduced viral fitness. There are other reasons why there could be an increase in the number of mutations e.g. sequencing more subpopulations which would have nothing to do with fitness.

      We have discussed our assumption that variants in the viral population are deleterious; this lies behind the use of a model of mutation-selection balance. Under this assumption, the accumulation of a greater number of mutations following ribavirin treatment is indicative of a loss of viral fitness, although we cannot precisely quantify the magnitude of this loss. The link between an increased mutation rate and lower viral fitness is intrinsic to the concept of mutagenic drugs; our data show an increase in mutational load coincident with the therapeutic use of ribavirin.

      A change in viral fitness does not necessarily lead to a substantial and clearly observable drop in viral load; we say more about this in the response to comments below.

      At various points, the paper assumes that there is no selection taking place but immunoglobulin was being applied weekly and palivizumab monthly. The timing of when these drugs were given should be included. How did the palivizumab affect selection? The K272E mutation seems to go up and down but it is not clear if this was in response to drug infusion timing or if this mutation was present in a subpopulation.

      Our approach assumes that selection could act at two distinct levels: Firstly, we assume that the observed increase in mutational load correlates to a reduction in viral fitness; the link between viral fitness and mutational load is intrinsic to the equation of Haldane. Secondly, we use a haplotype-based model to infer how selection is acting on the level of higher-frequency mutations; under the assumption of a well-mixed model we identify a signal of within-host adaptation.

      We have added details of the timing of palivizumab treatment to Figure 1. Immunoglobulin was given throughout; details of treatment have been given in Supporting Data. As we have now clarified in the Methods, our identification of potentially selected alleles was a two stage process, with the first assessing the level of noise in the data. Our model of noise envisages nonuniformity arising from multiple sources, including a situation whereby the viral population was divided in subpopulations, and in which reads comprised stochastic samples from these subpopulations. Given our model for noise, the observation of the K272E mutation at generally higher frequencies in earlier samples and generally lower frequencies in later samples was sufficient to call this as a potentially selected variant. We did not explore more complex models of drug-dependent selection.

      I think the main impact of the paper will be that favipiravir will not be used in the future to treat RSV. Given that the EC50 of favipiravir against RSV is ~100x that of influenza, favipiravir was unlikely to reach a therapeutic level in the patient. Nucleoside analogues have a mixed record at treating serious viral infections. Hopefully, this work will spur on future studies to precisely measure the effect that ribavirin has on RSV.

      Favipiravir was used in this patient following its successful experimental use against a case of influenza B infection (Lumby et al., 2020). We would be happy if our work inspires future research in this area.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript explores how biliary epithelial cells respond to excess dietary lipids, an important area of research given the increasing prevalence of NAFLD. The authors utilize in vivo models complemented with cultured organoid systems. Interestingly, E2F transcription factors appear important for BEC glycolytic activation and proliferation.

      We thank this reviewer for his/her comments and for finding the E2F-mediated mechanism of interest.

      Much of the work utilizes the BEC-organoid model, which is complicated by the fact that liver cell organoid models often fail to maintain exclusive cell identity in culture. The method used by the authors (Broutier et al., 2016) can lead to organoids with a mixture of ductal and hepatocyte markers. It would be helpful for the authors to further demonstrate the cholangiocyte identity of the organoid cells.

      We understand the concern of this reviewer. Indeed, this method can give rise to biliary cells or more hepatocyte-like cells. However, this choice depends on the culture media used. Our experiments used BEC-organoids in an undifferentiated state with a biliary expression profile. Please see point 1 above for a detailed answer.

      The authors suggest that BECs form lipid droplets in vivo by detecting BODIPY immunofluorescence of liver cryosections. While confocal microscopy would ensure that the BODIPY fluorescence signal is within the same plane as the cell of interest, the authors use a virtual slide microscope that cannot exclude fluorescence from a different focal plane. The conclusion that BECs accumulate lipids does not seem to be fully supported by this analysis.

      We fully agree with this criticism. To address this concern, we decided to use FACS analysis, a quantitative and independent method, to further confirm our initial findings. To this end, we stained sorted EPCAM+ BECs isolated from livers of CD- or HFD-fed mice with BODIPY, quantified the number of BODIPY+/EPCAM+ BECs in each experimental condition, and confirmed that these cells accumulate more lipids after HFD feeding (New Figure 1I, page 5, lines 112-115; see also our reply to point 4 of the rebuttal).

      Several mouse experiments rely heavily on rare BEC proliferation events with the median proliferation event per bile duct being 0-1 cell. While the proliferative effect appears consistent across experiments, a more quantitative approach, such as performing Epcam+ BEC FACS and flow cytometry-based cell cycle analyses, would be helpful.

      Following this suggestion, we quantified proliferative EdU+ BEC cells by FACS in a new cohort of C57BL/6J mice fed CD or HFD. These data, now included in the revised manuscript (New Figure 2G, page 7, lines 143-147), strongly confirm that immunofluorescence quantification mirrors the FACS quantification and reinforce the initial finding that EPCAM+ BECs proliferate more in the livers of HFD-fed mice. Please see point 6 above for a detailed answer.

      Finally, it is not yet clear how relevant the findings in this study are to ductular reaction, which is a non-specific histopathologic indicator of liver injury in the context of severe liver disease. In NAFLD, the ductular reaction is uncommon in benign steatosis, and if seen at all, occurs in the setting of substantial liver inflammation and fibrosis (Gadd et al., Hepatology 2014). The authors use a dietary model containing 60 kcal% fat, which causes adipose lipid accumulation as well as subsequent liver lipid accumulation. This diet does not cause overt inflammation or fibrosis that would represent experimental NASH, which typically requires the addition of cholesterol in dietary lipid NASH models (Farrell et al., Hepatology, 2019). While the E2F-driven proliferation may be important for physiologic bile duct function in the setting of obesity, the claim that E2Fs mediate DR initiation would require an additional pathophysiologic model or human data to demonstrate relevance. The authors could clarify this point in their discussion.

      We agree with this reviewer that 15 weeks of HFD feeding in C57BL/6J mice are insufficient to trigger a ductular reaction. For this reason, we used the term “BEC activation” in our manuscript, which refers to the first mandatory step for the ductular reaction to initiate. We apologize if our initial manuscript did not sufficiently emphasize this point. However, as suggested by the reviewer, we investigated the ductular reaction in our model. In order to further characterize the livers after 15 weeks of CD or HFD feeding, we stained the bile ducts for pancytokeratin (PANCK) and osteopontin (OPN) and asked a pathologist (Dr. Christine Gopfert at EPFL) to evaluate these sections with a particular focus on the bile ducts. She concluded that the livers of HFD-fed mice showed steatosis and inflammation but no apparent fibrosis (New Figure 1 – figure supplement 1E). The shape of bile ducts was similar in the livers of CD- and HFD-fed mice (New Figure 1 – figure supplement 1I), concomitant with the absence of portal fibrosis and inflammation. In addition, we checked the expression levels of several established markers of ductular reaction in our RNA sequencing data and observed that, of all these genes, only Ncam1 was significantly upregulated with HFD feeding in EPCAM+ BECs (New Figure 2 – figure supplements 1D and 1E, page 6, lines 127-131). Overall, these data support our conclusion that HFD triggers BEC activation without signs of an established ductular reaction and might suggest Ncam1 as a marker for this initial BEC activation process. Please see point 3 above for a detailed answer.

      Reviewer #2 (Public Review):

      The manuscript by Yildiz et al investigates the early response of BECs to high fatty acid treatment. To achieve this, they employ organoids derived from primary isolated BECs and treat them with a FA mix followed by viability studies and analysis of selected lipid metabolism genes, which are upregulated indicating an adjustment to lipid overload. Both organoids with lipid overload and BECs in mice exposed to a HFD show increased BEC proliferation, indicating BEC activation as seen in DR. Applying bulk RNA-sequencing analysis to sorted BECs from HFD mice identified four E2F transcription factors and target genes as upregulated. Functional analysis of knock-out mice showed a clear requirement for E2F1 in mediating HFD induced BEC proliferation. Given the known function of E2Fs the authors performed cell respiration and transcriptome analysis of organoids challenged with FA treatment and found a shift of BECs towards a glycolytic metabolism. The study is overall well-constructed, including appropriate analysis. Likewise, the manuscript is written clearly and supported by high-quality figures.

      We appreciate that this reviewer finds our study well-constructed, clear, and with high-quality figures.

      My major point is the lack of classification of the progression of DR, since the authors investigate the early stages of DR associated with lipid overload reminiscent of stages preceding late NAFLD fibrosis. How are early stages distinguished from later stages in this study? Molecularly and/or morphologically? While the presented data are very suggestive, a more substantial description would support the findings and resulting claims.

      We thank the reviewer for the suggestion. We would like to emphasize that instead of ductular reaction, we used the term “BEC activation” in our revised manuscript, referring to the first mandatory step for initiating the ductular reaction. Both reviewers criticized the poor characterization of the ductular reaction process in the first version of our study; we put substantial effort into further clarifying this point. Our response to this point can be read in our reply to the last comment of reviewer 1 and point 3 of the rebuttal.

    1. Author Response

      Reviewer #1 (Public Review):

      IRF8 is a key transcription factor in the differentiation of hematopoietic cell lineages including dendritic cell (DC) and monocyte/macrophage lineages. The promoter and enhancer regions of Irf8 have been a focus of intense research in recent times. In the submitted study, Xu H. et al. have for the first time reported a lncRNA transcribed specifically in the pDC subtype from the +32 kb region, which is also the enhancer region for Irf8 specifically in the cDC1 subtype. The authors have employed modern-day tools for an in-depth understanding of the role of lncIrf8, its promoter region, and crosstalk with the Irf8 promoter to identify that it is not lncIRF8 itself but its promoter region that is crucial for pDC and cDC1 differentiation, conferring feedback inhibition of Irf8 transcription. In the attempt to decipher the crosstalk between the promoter regions of IRF8 and lncIRF8 by employing various in vitro artificial systems, the study falls short of identifying the real significance of lncIRF8, which is specifically expressed in the pDC subtype.

      We appreciate the public review made by the reviewer. We agree with the reviewer that most of the experiments on the identification of the negative feedback regulation of IRF8 via the lncIRF8 promoter element were carried out in vitro. However, we would also like to point out our in vivo work: (i) transplantation of lncIRF8 promoter KO cells into mice demonstrates that pDC and cDC1 development were compromised (Figure 3); (ii) lncIRF8 is expressed in BM and spleen pDC in vivo (new Figure 1-figure supplement 3). We would also like to emphasize that (i) studies on the identification of the negative feedback regulation of IRF8 via the lncIRF8 promoter element and (ii) mechanistic studies with CRISPR activation and CRISPR interference would have been difficult to perform in vivo with the tools currently available in mice.

      According to our current understanding, lncIRF8 acts as an indicator of +32 kb enhancer activity, and we agree with the reviewer that further potential functions of lncIRF8 still need to be explored. We added a sentence on page 13, lines 427 and 428, on potential additional functions of lncIRF8:

      "However, lncIRF8 might have additional functions in DC biology, which are not revealed in the current study and remain to be identified."

      Reviewer #2 (Public Review):

      The manuscript of Xu and colleagues examines in detail the regulation of the important transcription factor IRF8 in dendritic cell (DC) subsets. They identify a long noncoding RNA that arises from the +32 kb enhancer of IRF8 specifically in plasmacytoid DCs (pDCs) and show clearly that this lncIRF8 marks the activity of a region of this enhancer but the RNA itself does not appear to have any function. Deletion of the promoter of the lncIRF8 ablated cDC1 and pDC differentiation using an in vitro cell differentiation model. The authors propose an innovative model that the lncIRF8 promoter sequences act to limit IRF8 expression in cDC1, but are inactive in pDCs, resulting in their characteristically very high IRF8 expression.

      This is a conceptually interesting study that makes excellent use of an extensive set of genomic data for the DC subsets. There has been a lot of recent research investigating the regulation of the IRF8 gene in hematopoiesis and this study provides an important new aspect to the work. The use of an in vitro model of DC differentiation is a powerful practical approach to investigating IRF8 regulation, as is the innovative use of CRISPR technology. Perhaps the biggest limitation of this study is that the authors have not confirmed the in-cell system data by creating a mouse strain lacking the lncIRF8 element. Such approaches by others, most notably the Murphy lab, have been instrumental in pushing this field forward. Nevertheless, Xu et al. significantly add to our current knowledge of the regulation of IRF8, a critical step in forming the dendritic cell network.

      We appreciate the public review made by the reviewer and the positive assessment of our work. We agree with the reviewer that extending our in-cell system data to lncIRF8 promoter KO mice will further strengthen our data, and this will be the subject of our future work.

    1. Author Response:

      We thank the reviewers and editor for their feedback, which we will carefully consider as we revise the manuscript. We aim to provide more detail on how this technique could be used with other probes, ideally showing experimental data to support this use. We will add further detail of the histology from our ex vivo ovine and porcine and in vivo porcine testing. We will also provide a more thorough comparison of our technique to other recently developed lesioning techniques. In order to provide more complete evidence that our technique perturbs local neuron populations, we will refine the action potential analysis presented before and after lesions in non-human primates. In addition to providing further clarity of the method, we will include more non-human primate data where possible.

    1. Author Response:

      We are very glad that the reviewers found our paper of broad interest to the community of population, evolutionary, and ecological genetics. We thank them for their positive feedback and insightful comments and suggestions. We are preparing a revision of the preprint that will address these points. 

      One issue raised by the reviewers was that it is important to acknowledge possible limitations of the demographic model used in simulation in capturing different aspects of genomic variation. In particular, different demographic models inferred for the same species using different methods or sets of samples may have different strengths and weaknesses, and this should be considered when selecting a demographic model for simulation. This is an important point that we intend to discuss in the revised version of our manuscript. We also plan to expand the documentation of the stdpopsim catalog to include more information about the type of data used to fit every demographic model. Below we provide an outline of our thoughts on the topic.

      First of all, it is important to acknowledge that demographic models inferred from genomic data cannot fully capture all aspects of the true demographic changes in the history of a species. As a result, these models do a good job in capturing some aspects of genetic variation, but not all of them. This is primarily determined by two factors: the method used for demographic inference, and the samples whose genomes were used in inference. Regardless of the method applied, the inferred demographic model can only reflect the genealogical ancestry of the sampled individuals, and this will typically make up a small portion of the complete genealogical ancestry of the species (albeit the genealogy of any set of sampled individuals includes many ancestors). Thus, demographic models inferred from larger sets of samples from diverse ancestry backgrounds may provide a more comprehensive depiction of genetic variation within a species, as long as a sufficiently realistic demographic model can be fit. That said, the choice of samples used for inference will mostly influence recent changes in genetic variation. This is because the genealogy of even a single individual consists of numerous ancestors in each generation in the deep past (which is the premise behind PSMC-style inference methods).

      The computational method used for inference also affects the way genetic variation is reflected by the demographic model, because different methods derive their inference from different features of genomic variation. Some methods make use of the site frequency spectrum at unlinked single sites (e.g., dadi, Stairway plot), while other methods use haplotype structure (e.g., PSMC, MSMC, IBDNe). This, in turn, may influence the accuracy of different features in the inferred demography. For example, very recent demographic changes, such as recent admixture or bottlenecks, are difficult to infer from the site frequency spectrum, but are more easily inferred by examining shared long haplotypes (as demonstrated by the demographic model inferred for Bos taurus by MacLeod et al. (2013)). There have been several studies that compare different approaches to demography inference (e.g., Beichman et al. (2017); Harris and Nielsen (2013)), but unfortunately, there is currently no succinct handbook that describes the relative strengths and weaknesses of different methods. Indeed, we hope that the standardized simulations provided by stdpopsim will facilitate systematic comparisons between methods, which will, in turn, provide valuable insights for researchers when selecting demographic models for simulation.

      It is important to note that inclusion of a demographic model in the stdpopsim catalog does not involve any judgment as to which aspects of genetic variation it captures. Any model that is a faithful implementation of a published model inferred from genomic data can be added to the stdpopsim catalog. Thus, potential users of stdpopsim should use the implemented models with the appropriate caution, keeping in mind the limitations discussed above. Scientists contributing a new model to the catalog are required to write a brief summary, which is added to the documentation page of the catalog: https://popsim-consortium.github.io/stdpopsim-docs/latest/catalog.html. This summary includes a graphical description of the model (such as the one shown for Anopheles gambiae in Fig. 2B of the paper), as well as a description of the data and method used for inference. We will mention this in the revised manuscript to help users of stdpopsim navigate through this resource.
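
      To make concrete what the site frequency spectrum summarizes, and why it discards the haplotype information that other methods exploit, here is a minimal sketch in plain Python. It does not use stdpopsim, dadi, or any inference tool; the function name and the toy data are hypothetical, and the input is assumed to be a list of sampled chromosomes coded 0 (ancestral) or 1 (derived) at each site.

```python
from collections import Counter

def site_frequency_spectrum(haplotypes):
    """Return sfs where sfs[k] is the number of sites at which exactly
    k of the n sampled chromosomes carry the derived allele (k = 0..n).

    haplotypes: list of equal-length sequences of 0/1 values,
    one sequence per sampled chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        return []
    num_sites = len(haplotypes[0])
    if any(len(h) != num_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")
    # Derived-allele count at each site, tallied across sites.
    counts = Counter(
        sum(h[j] for h in haplotypes) for j in range(num_sites)
    )
    return [counts.get(k, 0) for k in range(n + 1)]

# Toy data: 4 chromosomes, 5 sites (values are illustrative only).
toy = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
]
print(site_frequency_spectrum(toy))  # [1, 3, 0, 0, 1]
```

      Because each site is counted independently, shuffling the order of sites leaves the spectrum unchanged, which is one way to see why signals carried by long shared haplotypes are invisible to it.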

    1. Author Response:

      First of all, we would like to thank the reviewers for their work. We appreciate the constructive review comments and useful suggestions to further improve our article.

      The main criticism of our manuscript, from both reviewers, is that the cryo-EM structures are of low resolution and that the fit of the crystallographic structures of the PAD and the stalk domain into these low-resolution structures is questionable. We would like to point out that the cryo-EM data, and the conclusions drawn from them, are not essential for the main conclusions of the article. All mutants that we made in this study were designed based on the structural data obtained from the high-resolution X-ray structures, with no input from the low-resolution cryo-EM docked models. We chose to include the cryo-EM data since they allowed us to speculate about the interaction between the PAD and the stalk domain of PrgB, domains whose structures we have determined separately via X-ray crystallography. We agree with the reviewers that further experiments are needed to verify this potential interaction. Therefore, we will perform additional biochemical assays to investigate the proposed interaction. We will also try to optimize the cryo-EM data to hopefully allow for a more reliable fit of our high-resolution crystallographic structures. Once that is done, we will submit a revised version of the manuscript.

      On behalf of all authors,

      Ronnie Berntsson

    1. Author Response:

      We’d like to thank the three reviewers for reviewing our work in depth and providing insightful comments and suggestions.

      Reviewer 1

      1. The in vivo efficacy of MS023 does not seem to be very great. The mice treated with MS023 display a very small reduction in ADMA levels and a small increase in SDMA levels (Fig S6A).

      REPLY: We have quantified proteins with ADMA and SDMA by Western blotting tail clippings from mice treated with vehicle (n=6) and MS023 (n=6). These were normalized for equal loading to β-actin levels. The average ADMA relative expression was 0.92 for vehicle-treated mice and 0.86 for MS023-treated mice (p < 0.044). The average SDMA relative expression was 0.89 for vehicle-treated mice and 0.98 for MS023-treated mice (p < 0.000019). These whole-body measurements show that MS023 promotes a decrease in proteins with ADMA and an increase in proteins with SDMA, as observed before with inhibition of PRMT1 (Dhar et al, 2013).
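
      For scale, the group means quoted above can be converted to relative changes versus vehicle. This is simple arithmetic on the reported averages, not a reanalysis of the underlying blots; the helper name is hypothetical.

```python
def relative_change(control, treated):
    """Fractional change of treated relative to control."""
    return (treated - control) / control

# Group means as reported in the reply (relative expression vs β-actin).
adma = relative_change(0.92, 0.86)
sdma = relative_change(0.89, 0.98)

print(f"ADMA: {adma:+.1%}")  # ADMA: -6.5%
print(f"SDMA: {sdma:+.1%}")  # SDMA: +10.1%
```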

      Reviewer 2

      1. Two weaknesses are noted which lie in overstatements of the findings. There are six type I PRMTs (PRMT1, 2, 3, 6, 8, and CARM1), all of which are inhibited by MS023. While the authors demonstrate that their observations are not due to the inhibition of CARM1, they do not demonstrate that it is due to the inhibition of PRMT1, as they suggest. 

      REPLY: MS023 has been shown to have in vitro activity against several type I enzymes (Eram et al, 2016), and the same goes for GSK3368715 (Fedoriw et al, 2019). The in vitro IC50 values of MS023 are 30 nM for PRMT1, 119 nM for PRMT3, 83 nM for CARM1, 4 nM for PRMT6, and 5 nM for PRMT8 (Eram et al, 2016). It was documented early that PRMT1 is the major cellular type I enzyme (Pawlak et al, 2000), and this is why loss of PRMT1 or of PRMT5, the major type II enzyme, is embryonic lethal in mice (Guccione & Richard, 2019). In vivo data obtained with MS023 are paralleled by data obtained with siPRMT1 (Gao et al, 2019; Plotnikov et al, 2020; Wu et al, 2022; Zhu et al, 2019). Thus in vivo, MS023 targets the main type I PRMT, PRMT1. Further support for our claim that MS023 targets PRMT1 in MuSCs comes from our previous observation that deleting PRMT1 stimulates MuSC proliferation. Since this effect was irreversible (Blanc et al, 2016), we pursued studies with the reversible MS023, the only compound to have significant activity towards PRMT1 in vivo. For these reasons, we are convinced that the effect of MS023 is mainly mediated by inhibiting PRMT1 in the MuSC.

      To be thorough, we should test all other type I PRMT inhibitors as they become available. CARM1 was shown to be a player in MuSCs (Kawabe et al, 2012), but we excluded it using the CARM1 inhibitor TP-064 (Nakayama et al, 2018). The PRMT6-deficient mice that we generated are perfectly viable without overt phenotypes, suggesting PRMT6 is not involved (Neault et al, 2012), and PRMT8 is brain specific (Taneda et al, 2007).

      2. Furthermore, this study suggests that the switch and elevated cellular metabolism in muscle stem cells due to MS023 enhanced self-renewal and engraftment capabilities but does not demonstrate this fact directly as stated. 

      REPLY: Agreed. The link between cellular metabolism and MS023 enhanced self-renewal and engraftment capabilities is correlative and we will edit the revised text to reflect this.

      Reviewer 3

      1. However, the proposed underlying mechanism, which is claimed to rely on the expansion of MuSC and 'reprograming' of MuSCs towards a "unique and previously uncharacterized identity" is not sufficiently supported. The extent of the description of scRNA-seq data is inappropriate. Some conclusions from the scRNA-seq data appear to be overinterpreted or are rather trivial.

      REPLY: We presented the top marker genes for each subpopulation that was identified in our scRNAseq to aid the reader in establishing a broad view of whether a given subpopulation was quiescent-like, proliferating, or differentiating. M1-M5 clusters were all enriched for cell cycle markers (Mki67, Cdk1, etc), indicating a proliferative identity. The unique finding in our data is that treatment with MS023 resulted in a shift in identity as compared to the DMSO-treated proliferating MuSCs (M1, M2 and M4), creating transcriptionally distinct M3 and M5 clusters. M3 and M5 had elevated markers for metabolism (e.g. Eno1, Atp5k, etc) and early activation (e.g. Fos, Jun), while the untreated MuSCs in clusters M1, M2 and M4 did not. Furthermore, M3 and M5 had higher baseline levels of Pax7 expression when compared to untreated cells. Together, these findings describe a transitional subpopulation of MuSCs unique to MS023 treatment which not only harbours the stem-like/early activation markers Pax7, Fos and Jun, but also shows elevated proliferative markers related to cell cycle and energy metabolism. This particular combination of characteristics is unique to the MS023-treated MuSCs, thus identifying a novel subtype of MuSC identity. In accordance with our scRNAseq data, we validated experimentally that MS023-treated cells have higher energy metabolism and increased self-renewal potential, thereby confirming that the unique transcriptomic signature of these cells also leads to a different cell fate decision.

      2. It remains completely unclear whether the MS023-stimulated increase of metabolic pathway activity (OXPHOS, glycolysis) plays any role for preserving stem cell properties of MuSC during expansion and improves engraftment. Additional functional and mechanistic studies are required to explore the underlying molecular processes.

      REPLY: Agreed. The link between cellular metabolism and MS023 enhanced self-renewal and engraftment capabilities is correlative and we will edit the revised text to reflect this.

      3. Furthermore, it remains completely unclear whether the acclaimed increase in grip and tetanic strength of mdx mice after MS023 treatment relies on enhanced expansion of MuSC mediated by PRMT1 inhibition. 

      REPLY: Agreed. We cannot determine whether the effect is mediated by an expansion of the MuSC pool or by an effect on other cell types, such as a direct impact on the myofibers. The goal of this figure was to provide a therapeutic perspective for the use of type I PRMT inhibitors for the treatment of DMD. Muscle wasting/weakness in DMD is a complex and multifactorial process (e.g., myofiber fragility, MuSC defects, chronic inflammation, fibrofatty accumulation). If MS023 can target multiple aspects of the physiopathology of the disease, it would increase its therapeutic applicability. Further studies will be needed to determine the exact mechanism by which MS023 mediates its beneficial effect. The manuscript will be modified to reflect this.

      References

      • Blanc RS, Vogel G, Li X, Yu Z, Li S, Richard S (2016) Arginine methylation by PRMT1 regulates muscle stem cell fate. Mol Cell Biol 37: e00457-00416

      • Dhar S, Vemulapalli V, Patananan AN, Huang GL, Di Lorenzo A, Richard S, Comb MJ, Guo A, Clarke SG, Bedford MT (2013) Loss of the major Type I arginine methyltransferase PRMT1 causes substrate scavenging by other PRMTs. Scientific reports 3: 1311

      • Eram MS, Shen Y, Szewczyk M, Wu H, Senisterra G, Li F, Butler KV, Kaniskan HU, Speed BA, Dela Sena C et al (2016) A Potent, Selective, and Cell-Active Inhibitor of Human Type I Protein Arginine Methyltransferases. ACS Chem Biol 11: 772-781

      • Fedoriw A, Rajapurkar SR, Brien SO, Gerhart SV, Lorna H, Pappalardi B, Shah N, Laraio J, Liu Y, Butticello M et al (2019) Anti-tumor activity of the first-in-class type I PRMT inhibitor, GSK3368715, synergizes with PRMT5 inhibition through MTAP loss. Cancer cell XX: XX

      • Gao G, Zhang L, Villarreal OD, He W, Su D, Bedford E, Moh P, Shen J, Shi X, Bedford MT et al (2019) PRMT1 loss sensitizes cells to PRMT5 inhibition. Nucleic acids research 47: 5038-5048

      • Guccione E, Richard S (2019) The regulation, functions and clinical relevance of arginine methylation. Nat Rev Mol Cell Biol 20: 642-657

      • Kawabe Y, Wang YX, McKinnell IW, Bedford MT, Rudnicki MA (2012) Carm1 regulates Pax7 transcriptional activity through MLL1/2 recruitment during asymmetric satellite stem cell divisions. Cell Stem Cell 11: 333-345

      • Nakayama K, Szewczyk MM, Dela Sena C, Wu H, Dong A et al (2018) TP-064, a potent and selective small molecule inhibitor of PRMT4 for multiple myeloma. Oncotarget 9: 18480-18493

      • Neault M, Mallette FA, Vogel G, Michaud-Levesque J, Richard S (2012) Ablation of PRMT6 reveals a role as a negative transcriptional regulator of the p53 tumor suppressor. Nucleic acids research 40: 9513-9521

      • Pawlak MR, Scherer CA, Chen J, Roshon MJ, Ruley HE (2000) Arginine N-Methyltransferase 1 Is Required for Early Postimplantation Mouse Development, but Cells Deficient in the Enzyme Are Viable. Mol Cell Biol 20: 4859-4869

      • Plotnikov A, Kozer N, Cohen G, Carvalho S, Duberstein S, Almog O, Solmesky LJ, Shurrush KA, Babaev I, Benjamin S et al (2020) PRMT1 inhibition induces differentiation of colon cancer cells. Scientific reports 10: 20030

      • Taneda T, Miyata S, Kousaka A, Inoue K, Koyama Y, Mori Y, Tohyama M (2007) Specific regional distribution of protein arginine methyltransferase 8 (PRMT8) in the mouse brain. Brain Res 1155: 1-9

      • Wu Q, Nie DY, Ba-Alawi W, Ji Y, Zhang Z, Cruickshank J, Haight J, Ciamponi FE, Chen J, Duan S et al (2022) PRMT inhibition induces a viral mimicry response in triple-negative breast cancer. Nature chemical biology 18: 821-830

      • Zhu Y, He X, Lin YC, Dong H, Zhang L, Chen X, Wang Z, Shen Y, Li M, Wang H et al (2019) Targeting PRMT1-mediated FLT3 methylation disrupts maintenance of MLL-rearranged acute lymphoblastic leukemia. Blood 134: 1257-1268

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes a relatively novel approach to discovering combinations of herbal medications that may help modulate immune responses, and in turn help treat diseases such as cancer. The authors use plasma cell mastitis of the breast as a disease in which they present results from a non-blinded clinical trial with modest results. The main shortcomings are a lack of rigor around standardizing the control group given steroids versus the treatment group given the combinations of herbal medications. There needs to be a detailed statistical analysis of the comparison in tumor size, stage, invasiveness, etc. as well as consideration of confounding disease states (autoimmune disease, prior cancers, diabetes, etc.). While the results are interesting in that the use of herbal medications is often overlooked in Western medicine, the manuscript needs much greater detail in the clinical comparison in order to provide convincing evidence for an effect.

      Many thanks for your very kind words about our work. We are excited to hear that you think our manuscript is relatively novel with considerable translational impact to the field of herbal medications. We are grateful for your valuable time and efforts you have spent to provide your very insightful comments, which are of great help for our revision.

      Reviewer #2 (Public Review):

      The work is rather interesting and novel because for the first time, the authors employed knowledge graph, a cutting-edge technique in the domain of artificial intelligence, to identify a novel herbal drug combination for the treatment of PCM. The results of the clinical trial study clearly demonstrated that the drug combination is effective to ameliorate the symptoms of PCM patients and improve the general health status of the patients. Overall, the strategy of this manuscript may provide a paradigm for the design of drug combination towards many other human disorders.

      We are truly grateful for your very kind words about our work. It is very encouraging to know that you think our work is novel and of significance for the field. We sincerely appreciate the valuable time and kind efforts that you have spent on the thorough review of our manuscript.

      Reviewer #3 (Public Review):

      The major merit of the manuscript is that the authors introduced the concept of knowledge graph into the domain of herbal drugs or TCM. Namely, the authors designed a knowledge graph towards systematic immunity or immunotherapy based on massive data mining techniques. The authors successfully identified an herbal drug combination for PCM with the help of a scoring system. Moreover, the authors conducted a clinical trial study and the clinical data showed that the herbal drug combination holds great promise as an effective treatment for PCM. The weakness of the manuscript is that some details for the herbal drug combination and the clinical trial study are missing.

      Many thanks for your very kind words about our work. We are excited to hear that you think our work is relatively novel and holds great promise as an effective remedy for PCM. We are truly thankful for the valuable time and effort you have spent providing your very insightful comments, which are of great help for our revision.

    1. Author Response

      Reviewer #1 (Public Review):

      After giving a very accessible introduction to cellular processes during brain development, the authors present the computational model used in this study. It combines the kinematics of cell proliferation with the mechanics of brain tissue growth and is essentially equal to their model presented in Zarzor et al. (2021), but extended for the outer subventricular zone (OSVZ), see for example Figs. 2 in the present manuscript and in Zarzor et al. (2021). This zone, which is specific to humans, provides a second zone of cell proliferation. The division rate in the OSVZ is smaller and at most equal to that in the ventricular zone.

      The authors present two main findings: The distance between sulci in the cortex is decreased whereas the cell density in the ventricular zone is increased in presence of the OSVZ. Furthermore, the "folding evolution", which is the ratio between the outer perimeter at time t and the initial perimeter increases in presence of the OSVZ. The strongest effect is seen, when division rates in both proliferating zones are equal. The authors compare the cases of varying and constant cortical stiffness, which they had also done in Zarzor et al (2021). Finally, they consider the feedback of cortical folding on OSVZ thickness.
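
      The "folding evolution" measure summarized above (outer perimeter at time t divided by the initial perimeter) can be illustrated with a minimal sketch. The function names and the two toy outlines are hypothetical and are not taken from the authors' code; the outer surface is assumed to be available as an ordered list of 2D points.

```python
import math

def perimeter(points):
    """Length of an open polyline given as an ordered list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def folding_evolution(surface_t, surface_0):
    """Ratio of the outer perimeter at time t to the initial outer perimeter."""
    return perimeter(surface_t) / perimeter(surface_0)

# Hypothetical toy outlines: a flat initial surface and a single fold over the same span
flat = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
folded = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
ratio = folding_evolution(folded, flat)  # greater than 1 once the surface has folded
```

      A ratio of 1 means the outline has not lengthened; larger values indicate more folding over the same span.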

      The computational model provides a sound description of how cell proliferation and migration combined with tissue mechanics yield cortical folding patterns. However, only a few parameter values are varied in a limited range. Also, it remains unclear to me, how important the specific functional dependencies of, for example, the cell division rate on the radial coordinate are. This point seems of particular importance because the effect of the presence of the OSVZ on the folding patterns seems rather minute, see Fig. 5. The authors do not propose experiments that could be used to test their description and results. Finally, the analysis is restricted to 2 dimensions.

      Thank you very much for the valuable suggestions. We agree that we are only able to show limited parameter studies in the manuscript. Therefore, we have now implemented a user interface that can be downloaded from Github (https://github.com/SaeedZarzor/BFSimulator) and will allow interested readers to directly change the parameter values and run the simulations.

      To better emphasize the effect of the presence of the OSVZ on the folding patterns, we have edited the corresponding section and figure in the revised manuscript to include a quantification of the distance between sulci:

      “In general, the distance between neighboring sulci decreases with increasing G_osvz, as marked in Figure 7. For the displayed cases, the distance decreases from d = 8.796 mm for G_osvz = 0 to d = 8.67 mm for G_osvz = 10 and finally d = 8.2 mm for G_osvz = 20. Interestingly, the cortical thickness and effective stiffness ratio at the first instability point (denoted by w in Figure 5) are the same for all these cases. Therefore, we attribute the observed differences to the faster increase in the cell density and thus cortical growth, cortical stiffness and the effective stiffness after the instability has been initiated.”

      In addition, we have added a new figure to show that the observed trends also hold true for 3D simulations:

      “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Reviewer #2 (Public Review):

      Weaknesses

      • To account for the complexity of biological phenomena, the model relies on a large number of ad hoc choices whose consequences are difficult to predict.

      We fully agree that there are quite a number of model assumptions that we have to make. Still, we have achieved great agreement with the data from fetal brain sections, which in our opinion justifies the assumptions made.

      To better explain the choice of parameters, we have now included the following paragraph in the manuscript: “The mechanical and diffusion parameters are adapted from the literature Budday et al. (2020); de Rooij and Kuhl (2018), while the geometry parameters are estimated based on histologically stained human brain sections and magnetic resonance images. For instance, to determine the MST factor, we measured the relative distance between the ISVZ and OSVZ in histologically stained images. The final value adopted is the result of dividing the measured distance by the expected time. When determining the growth problem parameters, numerical stability and algorithm convergence were major criteria.”

      • The physical model description is highly technical and out of reach for a non-specialist.

      Thank you for making this point! We have now adapted the model description to better emphasize the main features of the model and the feedback mechanisms between the mechanical growth problem and the cell density problem:

      “...is the Cauchy stress tensor formulated in terms of the elastic deformation tensor, as only the elastic deformation induces stresses. The Cauchy stress describes the three dimensional stress state in the spatial (grown and deformed) configuration and is computed by deriving the strain energy function…”

      “Through Equation 6, the cell density problem controls the effective stiffness ratio between cortex and subcortex (as the cortical stiffness changes while the subcortical stiffness remains constant) and thus also the emerging cortical folding pattern Budday et al. 2014; Zarzor et al. 2021.”

      “Through Equation 8, the amount of growth is directly related to the cell density - the higher the cell density, the more growth.”

      “The vector n represents the normalized orientation of radial glial cell fibers in the spatial configuration and controls the migration direction of neurons. As the brain grows and folds, the fiber direction changes. Through this feedback mechanism, the mechanical growth problem affects how neurons migrate and the cell density evolves locally.”

      “By applying Equation 16 for the VZ, we ensure that the division rate decreases from its initial value G_vz to a smaller value as the maximum stretch value s in the domain increases, i.e., with increasing gestational age. This constitutes an additional feedback mechanism between the mechanical growth problem and the cell density problem: As the maximum stretch and thus the deformation increases due to constrained cortical growth, the division rate in the VZ decreases, resulting in less newborn cells” and “G^s_osvz is the division rate in the OSVZ that decreases with increasing maximum stretch s in the domain”

      • The description of neurogenesis shows three zones of cell proliferation, each inhabited by a specific cell type. Despite its realism, the proposed model does not take into account the ISVZ where the intermediate progenitors operate.

      Indeed, in our model we have focused on the two original sources of the cells, which are radial glial cells and ORGCs. As far as is currently known, the intermediate progenitor cells are produced from those two cell types, so they are indirectly included in the model as a resulting cell density.

      • The experiment of comparing several regimes derived from the relative importance of proliferation in the VZ and OSVZ is not very clear. It leads to the observation of the evolution of cell density maxima over time, which seems insufficient to conclude the importance of the OSVZ for folding. One wonders whether the key parameter that leads to folding is the rate of OSVZ proliferation or simply the total quantity of neurons generated by the two or even the three zones.

      Thank you for this remark. We fully agree with the Reviewer that a key factor is the total quantity of neurons generated. However, the major question we intend to address here is where these neurons originate from and how the different proliferating zones interact. In other words, we do not question the existence of the OSVZ, but we are trying to build a computational model that can mimic all relevant cellular processes during brain development - to then study their individual effect on cortical folding. Therefore, we do not argue that the OSVZ is necessary for folding, but that it plays a crucial role in the speed of generating these folds and their complexity in the Conclusion section:

      “Our results show that the existence of the OSVZ particularly triggers the emergence of secondary mechanical instabilities leading to more complex folding patterns. Furthermore, the proliferation of outer radial glial cells (ORGCs) reduces the time required to induce the mechanical instability and thus cortical folding.”

      • The experiment on the heterogeneity of proliferation in the OSVZ is a bit frustrating. I would like to see a set-up corresponding to the mosaics found in ferrets and closely associated with folding patterns.

      This is a valuable point, thank you! We have now added new results showing a more distinct regional variation of the OSVZ and have adapted our conclusions regarding this point:

      “Also in the ferret brain, where a region close in structure to the primate's OSVZ was found, this region shows a unique mosaic-like structure Fietz et al. (2010b); Reillo and Borrell (2012). In this section, we aim to assess the effect of regional proliferation variations in the OSVZ on the emerging cortical folding pattern. We discuss two different heterogeneous patterns here, but have included more variations online through our user interface on GitHub, as described in the Data availability section. In the first case, the OSVZ division rate gradually decreases along the circumferential direction. In the second case, the division rate varies in a more random pattern. Figures 13 and 14 show how cortical folds develop in both cases for the varying cortical stiffness case, a division rate in the VZ of G_vz = 120 and an initial division rate in the OSVZ of G_osvz = 20. As expected, the evolving folding patterns slightly differ. In both cases, the first folds appear, where the cell proliferation rate is highest. Expectedly, those regions also show a higher cell density in the cortex than regions nearby. However, both cases lead to final patterns with similar distances between sulci and folding complexity (one period doubling pattern). In addition, gyri and sulci are distributed equally -- regardless of the division rate. Therefore, we may conclude that inhomogeneous cell proliferation in the OSVZ controls the location of first gyri and sulci but does not necessarily affect the distance between sulci (also referred to as folding wavelength) and the overall complexity of the emerging folding pattern. This agrees well with our previous finding that the characteristic wavelength of folding remains relatively stable for inhomogeneous cortical growth patterns Budday and Steinmann (2018). 
      The simulation results are also consistent with the previously found remarkable surface expansion above the regions with higher proliferation in the OSVZ Llinares-Benadero and Borrell (2019).”
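
      The two heterogeneous OSVZ patterns described in the quoted passage (a division rate that gradually decreases along the circumferential direction, and a more random variation) can be sketched as follows. This is only an illustration under stated assumptions: the linear decay, the uniform random draw, the quarter-circle angular range, and all function names are hypothetical and are not taken from the authors' simulator. Only the value G_osvz = 20 comes from the text.

```python
import math
import random

G_OSVZ = 20.0  # initial OSVZ division rate quoted in the text

def gradual_profile(theta, theta_max=math.pi / 2, g_max=G_OSVZ):
    """Assumed linear decay: g_max at theta = 0, falling to 0 at theta = theta_max."""
    return g_max * (1.0 - theta / theta_max)

def random_profile(n_regions, g_max=G_OSVZ, seed=0):
    """Assumed patchy pattern: one uniformly drawn rate per circumferential region."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, g_max) for _ in range(n_regions)]

# Nine sample positions along an assumed quarter-circle domain
angles = [i * (math.pi / 2) / 8 for i in range(9)]
gradual = [gradual_profile(a) for a in angles]
patchy = random_profile(len(angles))
```

      In the first profile the highest rate sits at one end of the domain; in the second, local maxima are scattered along the circumference.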

      “Finally, our simulations reveal that inhomogeneous cell proliferation patterns in the OSVZ can control the location of first gyri and sulci but do not necessarily affect the distance between sulci and the overall complexity of the emerging folding pattern.”

      Furthermore, in our code, we have added a user interface with multiple options for different OSVZ regional variations. The link to the code with the user interface is now updated in the Data availability section.

      • It would be interesting to elaborate a little on the possibility of extending the model in 3D, which seems imperative to evaluate the nature of the folding pattern generated. Comparing them to reality is an essential step in gauging the credibility of the model. For instance, it would be interesting to test to which extent the model can father the type of variability observed in the general population (Mangin et al.). It will also be particularly interesting to work on the inverse model between the real folding patterns and the heterogeneous proliferation maps that can generate them.

      We fully agree with the Reviewer. Unfortunately, to the best of the authors' knowledge, there is currently no data set providing both the 3D evolution of the folding pattern and the corresponding distribution of the cell density. Therefore, the validation of 3D results is difficult. Promisingly, our model achieved good agreement with data from histologically stained fetal brain sections regarding the local gyrification index, final cortical thickness, and cell density distribution, as presented in Zarzor et al. (2021). We have indeed initiated the collection of additional data, ideally for the 3D validation. However, this will take some time and is out of the scope of the current work. It is also a great suggestion to compare our 3D simulation results with the variability found in the general population. Indeed, we plan to do such work in the future but consider this out of the scope of the current work, which focuses more on the OSVZ.

      To still show that our model can be extended to 3D, we have now included the following results: “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Reviewer #3 (Public Review):

      Zarzor et al. developed a new multifield computational model, which couples cell proliferation and migration at the cellular level with biological growth at the organ level, to study the effect of OSVZ on cortical folding. Their approach complements the classical experimental approach in answering open questions in brain development. Their simulation results found the existence of OSVZ triggers the emergence of secondary mechanical instabilities that leads to more complex folding patterns. Also, they found that mechanical forces not only fold the cortex but also deepen subcortical zones as a result of cortical folding. Their physics-based computational modeling approach offered a novel way to predictively assess the links between cellular mechanisms and cortical folding during early human brain development, further shedding light on identifying the potential controlling parameters for reverse brain study.

      Strengths:

      The newly developed physics-based computational model has several advantages compared to previous existing computational brain models. First, it breaks the traditional double-layer computational brain model, gray matter layer and white matter layer, by introducing the outer subventricular zone. Second, it develops multiscale computational modeling by bringing the cellular level features, cell diffusion, and migration, into the macroscale biological growth model. Third, it could provide a cause-effect analysis of cortical folding and axonal fiber development. Finally, their approach could complement, but not substitute, sophisticated experimental approaches to answer some open questions in brain science.

      Weaknesses:

      The cellular diffusion and migration seem determined and controlled by a single variable, cell density, which is one-way coupled with the deformation gradient of the brain model. However, cell migration and diffusion should be potentially coupled with stress and vice versa. Also, the current computational model can be improved by extending it to a 3D model. Finally, they can further improve the study of regional proliferation variation by introducing fully-randomized heterogeneous cell density and growth in their model.

      Thank you. We apologize for the lack of clarity in the original submission. There are indeed more coupling mechanisms, which we have now better emphasized when introducing the model:

      “Through Equation 6, the cell density problem controls the effective stiffness ratio between cortex and subcortex and thus also the emerging cortical folding pattern Budday et al. 2014; Zarzor et al. 2021.”

      “Through Equation 8, the amount of growth is directly related to the cell density - the higher the cell density, the more growth.”

      “The vector n represents the normalized orientation of radial glial cell fibers in the spatial configuration and controls the migration direction of neurons. As the brain grows and folds, the fiber direction changes. Through this feedback mechanism, the mechanical growth problem affects how neurons migrate and the cell density evolves locally.”

      “By applying Equation 16 for the VZ, we ensure that the division rate decreases from its initial value G_vz to a smaller value as the maximum stretch value s in the domain increases, i.e., with increasing gestational age. This constitutes an additional feedback mechanism between the mechanical growth problem and the cell density problem: As the maximum stretch and thus the deformation increases due to constrained cortical growth, the division rate in the VZ decreases, resulting in less newborn cells” and “G^s_osvz is the division rate in the OSVZ that again decreases with increasing maximum stretch s in the domain”

      In addition, we have added a new figure to show that the observed trends also hold true for 3D simulations:

      “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Finally, we have added new results showing a more distinct regional variation of the OSVZ. Furthermore, in our code, we have added a user interface with multiple options for different OSVZ regional variations. The link to the code with user interface is available in the paper:

      “Also in the ferret brain, where a region close in structure to the primate's OSVZ was found, this region shows a unique mosaic-like structure Fietz et al. (2010b); Reillo and Borrell (2012). In this section, we aim to assess the effect of regional proliferation variations in the OSVZ on the emerging cortical folding pattern. We discuss two different heterogeneous patterns here, but have included more variations online through our user interface on GitHub, as described in the Data availability section. In the first case, the OSVZ division rate gradually decreases along the circumferential direction. In the second case, the division rate varies in a more random pattern. Figures 13 and 14 show how cortical folds develop in both cases for the varying cortical stiffness case, a division rate in the VZ of G_vz = 120 and an initial division rate in the OSVZ of G_osvz = 20. As expected, the evolving folding patterns slightly differ. In both cases, the first folds appear, where the cell proliferation rate is highest. Expectedly, those regions also show a higher cell density in the cortex than regions nearby. However, both cases lead to final patterns with similar distances between sulci and folding complexity (one period doubling pattern). In addition, gyri and sulci are distributed equally -- regardless of the division rate. Therefore, we may conclude that inhomogeneous cell proliferation in the OSVZ controls the location of first gyri and sulci but does not necessarily affect the distance between sulci (also referred to as folding wavelength) and the overall complexity of the emerging folding pattern. This agrees well with our previous finding that the characteristic wavelength of folding remains relatively stable for inhomogeneous cortical growth patterns Budday and Steinmann (2018). 
      The simulation results are also consistent with the previously found remarkable surface expansion above the regions with higher proliferation in the OSVZ Llinares-Benadero and Borrell (2019).”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors developed a new concept: Skeletal age, which is chronological age + years lost due to suffering a low-energy fracture. There seem to be conceptual problems with this concept: It is not known if the years lost are lost due to the fracture or co-morbidities.

      The Reviewer raises an important point, and we are happy to discuss it as follows. While it is not possible to show the causal relationship between a fragility fracture and excess mortality, it has been shown repeatedly that a fracture is associated with an increased risk of premature mortality after accounting for comorbidities and frailty. Indeed, we and others have found that comorbidities contribute little to the increased risk10,11. Moreover, in a previous study using the ‘relative survival analysis’ technique12, we have shown that hip and proximal fractures were associated with reduced life expectancy after accounting for time-related changes in background mortality in the population, suggesting that hip and proximal fractures are an independent clinical risk factor for mortality.

      In this study, we used a multivariable Cox’s proportional hazards model to adjust for confounding effects of age and severity of comorbidities, and our result clearly indicated that a fracture is associated with years of life lost. Moreover, comorbidities were considered a factor in an individual's risk profile for estimating skeletal age. As a result, skeletal age reflects the common real-world scenario that the combination of comorbidities and proximal or lower leg fractures compounded post-fracture excess mortality, much greater than each alone13.

      Technically, there are two steps to individualise skeletal age for each individual with a specific risk profile. First, we used the statistical approach recommended for the individualisation of survival time prediction using statistical models14 to individualise specific mortality risk for each participant with a specific risk profile. Specifically, we calculated the prognostic risk index as a single-number summary of the combined effects of his/her specific risk profile of a specific fracture site and the severity of comorbidity. His/her individualised fracture-mortality association was then computed as the difference between his/her prognostic index and the mean prognostic index of “typical” people in the general population. In the second step, we used the Gompertz law of mortality and the Danish national lifetable data to transform the individualised association into life expectancy loss as a result of a fracture15.

      We have modified part of the description of the methodology as follows:

      “For the second aim, we determined skeletal age for each individual based on the individual’s specific risk profile. First, we calculated the prognostic risk index as a single-number summary of the combined effects of his/her specific fracture site and the severity of comorbidity51. The prognostic index is a linear combination of the risk factors with weights derived from the regression coefficients. The individualised fracture-mortality association for an individual with a specific risk profile is then the difference between the individual's prognostic index and the mean prognostic index of 'typical' people in the general population51. In the second step, we used the Gompertz law of mortality and the Danish national lifetable data to transform the excess mortality into life expectancy loss as a result of a fracture49.”.
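
      The two steps described above can be sketched in a few lines. This is a simplified illustration, not the study's implementation: the coefficient values, the variable names, and the reference prognostic index of 0 are hypothetical. The life-table step is replaced by a pure Gompertz hazard h(age) = a·exp(b·age), under which multiplying the hazard by HR is equivalent to ageing by ln(HR)/b years. The slope b is an assumption chosen so that a 2-fold hazard corresponds to 6 years, matching the illustration used elsewhere in this response.

```python
import math

# Assumed Gompertz slope: a 2-fold hazard is taken to equal 6 years of ageing
GOMPERTZ_SLOPE = math.log(2.0) / 6.0

def prognostic_index(coefficients, risk_profile):
    """Linear combination of risk factors weighted by regression coefficients."""
    return sum(coefficients[name] * value for name, value in risk_profile.items())

def years_of_life_lost(log_hazard_ratio, slope=GOMPERTZ_SLOPE):
    """Age shift equivalent to a hazard ratio under a pure Gompertz hazard."""
    return log_hazard_ratio / slope

def skeletal_age(age, log_hazard_ratio, slope=GOMPERTZ_SLOPE):
    """Chronological age plus the years of life lost associated with the fracture."""
    return age + years_of_life_lost(log_hazard_ratio, slope)

# Hypothetical Cox coefficients (log hazard ratios) and one individual's profile
coefficients = {"hip_fracture": math.log(2.0), "comorbidity_score": 0.2}
individual = {"hip_fracture": 1, "comorbidity_score": 0}
mean_index_typical = 0.0  # assumed prognostic index of a 'typical' person

# Step 1: individualised fracture-mortality association
log_hr = prognostic_index(coefficients, individual) - mean_index_typical
# Step 2: transform it into an effective age (about 66 under these assumptions)
age_60_result = skeletal_age(60, log_hr)
```

      With actual life-table data the conversion would depend on age and sex rather than on a single fixed slope.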

      In addition, with the possible exception of zoledronate after hip fracture, we have no evidence that this increased risk of mortality can be changed with interventions.

      We agree that there is a lack of strong evidence from randomised controlled trials supporting the benefit of anti-resorptive therapy on post-fracture survival. As mentioned above, the mention of zoledronic acid was simply for illustrating the use of skeletal age to convey a treatment benefit. We have decided to remove the section related to the benefit of pharmacological treatment on post-fracture mortality.

      Furthermore, it is not clear why the authors think that patients and doctors will better understand the implications of older "skeletal age", on future fracture risk and the need for prevention, for example, the 10-year risk of MOF? Knowing that my bones are older than me, could make a patient feel even more fragile and afraid of being physically active. The treatment will reduce the risk of future fractures, but this study provides no information about the effect on mortality of preventing the subsequent fracture or the risk of mortality associated with recurrent fractures.

      The risk of fracture is typically conveyed to patients and the public in terms of absolute risk metric (e.g., probability) or relative risk metrics (e.g., risk ratio). However, patients and doctors often struggle to comprehend probabilistic statements such as 'Your risk of death over the next 10 years is 5% if you have suffered from a bone fracture'. The underappreciation of post-fracture mortality's gravity has caused patients to be hesitant towards treatment and prevention, contributing to the current crisis of osteoporosis treatment.

      We consider that skeletal age will make doctor-patient risk communication more intuitive and probably more effective. For example, for a 2-fold increased mortality risk after hip fracture, telling a 60-year-old man that his skeletal age is 66 years, equivalent to a 6-year loss of life, is much more intuitive than quoting the relative risk. The patient might thus be more likely to accept the recommended pharmacological treatment, ultimately improving health outcomes. However, we do not yet have RCT evidence for the effectiveness of skeletal age, and this will be one of our future research foci. We would like to point out that there is RCT evidence that effective age (such as 'Heart Age', 'Lung Age') could improve the uptake of preventive actions. For example, Lopez-Gonzalez et al16 found that informing patients of their heart age improved their cardiovascular risk more than informing them of their Framingham probabilistic risk score.

      Introduction:

      The statement that treatment reduces the risk of dying, needs modification as the majority of clinical trials have not demonstrated reduced mortality with treatment.

      We have modified the statement as follows: “In randomised controlled trials, treating high-risk individuals with bisphosphonates or denosumab reduces the risk of fracture4, though whether the reduction translates into reduced mortality risk remains contentious5, 6.”

      It is not clear how the skeletal age captures the risk of a future fracture. The other difference between the idea of "skeletal age" and for example "heart age" is that there are treatments available for heart disease that reduce the risk of mortality, as mentioned above this has not been shown consistently in clinical trials in osteoporosis.

      We take the Reviewer's point, but we would like to point out that there are at least two RCTs on zoledronic acid showing that treating patients with a fragility fracture reduces their risk of mortality17,18.

      Because the risk profile that is associated with a post-fracture mortality is also associated with the risk of fracture, skeletal age can be seen as a measure of the decline of the skeleton due to a fracture or exposure to risk factors that raise the risk of fracture. Thus, a 60-year-old with a skeletal age of 66 is in the same risk category as a 66-year-old with 'favourable risk factors' or at least the ones that are potentially modifiable. Hence, an older skeletal age means a greater risk of fracture.

      Neither the “Skeletal Age” nor the “Heart Age”16,19,20 has the treatment intervention incorporated into its calculator. We have added details to explain how the assessment of skeletal age would provide the conceptual risk of both fracture and post-fracture mortality as follows:

      “Unlike the current fracture risk assessment tools17 which estimate the probability of fracture over a period of time using probability-based metrics, such as relative risk and absolute risk, skeletal age quantifies the consequence of a fracture using a natural frequency metric. A natural frequency metric has been consistently shown to be easier and more friendly to doctors and patients than the probability-based metrics9 11 30. It is not straightforward to appreciate the importance of the two-fold increased risk of death (i.e., relative risk = 2.0) without knowing the background risk (i.e., 2 folds of 1% would remarkably differ from 2 folds of 10%). By contrast, for the same 2-fold mortality risk of hip fracture, telling a 60-year-old man with a hip fracture that his skeletal age would be 66 years old, equivalent to a 6-year loss of life, is more intuitive. The skeletal age can also be interpreted as the individual being in the same risk category as a 66-year-old with 'favorable risk factors' or at least the ones that are potentially modifiable. Hence, an older skeletal age means a greater risk of fracture.”.
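
      The point about background risk in the quoted passage can be made concrete with simple arithmetic. The helper below is a hypothetical illustration; only the relative risk of 2.0 and the two baseline risks (1% and 10%) come from the text.

```python
def absolute_risk(baseline_risk, relative_risk):
    """Absolute risk implied by a relative risk applied to a baseline, capped at 1."""
    return min(1.0, baseline_risk * relative_risk)

# The same relative risk of 2.0 applied to two different background risks
low = absolute_risk(0.01, 2.0)    # 1% baseline
high = absolute_risk(0.10, 2.0)   # 10% baseline

# Excess risk attributable to the doubling in each case
excess_low = low - 0.01
excess_high = high - 0.10
```

      The same relative risk adds one percentage point in the first case and ten in the second, which is why a relative risk alone is hard to interpret.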

      Discussion:

      The prevalent comorbidities; cardiovascular diseases, cancer, and diabetes, suggest that fracture patients die from their comorbidities and not their fractures.

      Please refer to the above response for more detail. Briefly, the multivariable Cox’s proportional hazards regression adjusted for the confounding effect of age and the severity of comorbidities, indicating the association between fracture and mortality was independent of aging and comorbidity severity. On the other hand, skeletal age is a measure of excess mortality related to either fracture or co-morbidities or both.

      The discussion should be more balanced as there are a number of clinical trials demonstrating reductions in vertebral and non-vertebral fractures without effect on mortality. There may be specific effects of zoledronate on mortality, but that has not been shown for the vast majority of treatments.

      Please refer to the above response for more detail. Specifically, as the study primarily aimed at introducing skeletal age as a new metric for risk communication, we have decided to omit the paragraph discussing the potential benefit of zoledronic acid on post-fracture mortality risk in order to maintain the clarity and focus of the study.

      It is not correct that FRAX does not take mortality into account? It does not tell you specifically how high the risk of dying and how high the risk of a fracture is but integrates the two. "Skeletal age" does not provide either information, it just tells you that your skeleton is older than your chronological age - most patients and doctors will not associate that with an increased risk of dying - only of frailty.

      Although it is commonly believed that FRAX accounts for competing risk of death, it does not provide the risk of post-fracture mortality. Indeed, none of the current fracture risk assessment tools was designed to provide post-fracture mortality risk5. Skeletal age fills the gap by providing the excess mortality following a fracture for an individual with a specific risk profile.

      The statement that zoledronate reduces the "skeletal age" by 3 years, has not been demonstrated and it is not clear how this can be demonstrated by the analysis reported here. As the reduced mortality has only been shown for the Horizon RFT, this cannot be inferred for other treatments and other fracture types. The information provided by the "skeletal age" is only that the fracture you already had took x years of your remaining lifetime. With the exception of perhaps zoledronate after hip fracture, we have no indication from clinical trials that the treatment of osteoporosis will change this.

      The current study was not designed to examine the effectiveness of an intervention. The statement related to the survival benefit of zoledronate is used to illustrate how skeletal age is used to convey the treatment benefit in real-world doctor-patient risk communication. Given the hazard ratio of 0.72 for the zoledronate-mortality association (17), a patient might find the statement “Zoledronic acid treatment helps a patient with a hip fracture gain (back) 3 years of life” much easier to understand and probably more persuasive than the traditional statement of “Zoledronic acid treatment reduced the risk of death by 28%”.
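      The arithmetic behind the two statements can be sketched as follows. This is only an illustration and not the estimation procedure used in the study: it assumes a Gompertz-type mortality pattern in which the annual hazard of death rises by roughly 10% per year of age (the `annual_hazard_growth` default is an assumption introduced here, as are the function names), so that a hazard ratio can be re-expressed as an equivalent shift in "effective age".

```python
import math


def relative_risk_reduction(hazard_ratio: float) -> float:
    """Percent reduction in the hazard of death implied by a hazard ratio."""
    return (1.0 - hazard_ratio) * 100.0


def effective_age_shift(hazard_ratio: float, annual_hazard_growth: float = 1.1) -> float:
    """Years of 'effective age' equivalent to a hazard ratio.

    Assumes (hypothetically) that the hazard of death is multiplied by
    `annual_hazard_growth` for every additional year of age, so a hazard
    ratio HR corresponds to log(HR) / log(annual_hazard_growth) years.
    A negative value means years gained back.
    """
    return math.log(hazard_ratio) / math.log(annual_hazard_growth)


hr = 0.72  # hazard ratio quoted in the response above
print(f"Risk reduction: {relative_risk_reduction(hr):.0f}%")
print(f"Effective age shift: {effective_age_shift(hr):.1f} years")
```

      Under this assumed 10% yearly increase in hazard, a hazard ratio of 0.72 corresponds to a shift of a little over 3 years, which is in line with the "gain (back) 3 years" wording; a different assumed growth rate would give a different figure.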

      Reviewer #2 (Public Review):

      The paper of Tran et al. introduces the concept of 'skeletal age' as a means of conveying the combined risk of fracture and fracture-associated mortality for an individual. Skeletal age is defined as the sum of chronological age and the number of years of life lost associated with a fracture. Using the very comprehensive Danish national registry and employing Cox's proportional hazards model they estimated the hazard of mortality associated with a fracture. Skeletal age was estimated for each age and fracture site stratified by gender. The authors propose to replace the fracture probability with skeletal age for individualized fracture risk assessment.

      Strengths of the study lie in the novelty of the concept of 'skeletal age' as an informative metric to internalize the combined risks of fracture and mortality, the very large and well-described Danish National Hospital Discharge Registry, the sophisticated statistical analysis and the clear messages presented in the manuscript. The limitations of the study are acknowledged by the authors.

      We appreciate your positive remark that captures the essence of our work.

      References:

      1. Lujic S, Simpson JM, Zwar N, Hosseinzadeh H, Jorm L. Multimorbidity in Australia: Comparing estimates derived using administrative data sources and survey data. PloS one 2017; 12(8): e0183817.
      2. Andersen TF, Madsen M, Jorgensen J, Mellemkjoer L, Olsen JH. The Danish National Hospital Register. A valuable source of data for modern health sciences. Dan Med Bull 1999; 46(3): 263-8.
      3. Vestergaard P, Mosekilde L. Fracture risk in patients with celiac Disease, Crohn's disease, and ulcerative colitis: a nationwide follow-up study of 16,416 patients in Denmark. Am J Epidemiol 2002; 156(1): 1-10.
      4. Hundrup YA, Hoidrup S, Obel EB, Rasmussen NK. The validity of self-reported fractures among Danish female nurses: comparison with fractures registered in the Danish National Hospital Register. Scand J Public Health 2004; 32(2): 136-43.
      5. Beaudoin C, Moore L, Gagne M, et al. Performance of predictive tools to identify individuals at risk of non-traumatic fracture: a systematic review, meta-analysis, and meta-regression. Osteoporos Int 2019; 30(4): 721-40.
      6. Spiegelhalter D. How old are you, really? Communicating chronic risk through 'effective age' of your body and organs. BMC Med Inform Decis Mak 2016; 16: 104.
      7. Vestergaard P, Rejnmark L, Mosekilde L. Osteoporosis is markedly underdiagnosed: a nationwide study from Denmark. Osteoporos Int 2005; 16(2): 134-41.
      8. Roerholt C, Eiken P, Abrahamsen B. Initiation of anti-osteoporotic therapy in patients with recent fractures: a nationwide analysis of prescription rates and persistence. Osteoporos Int 2009; 20(2): 299-307.
      9. Cummings SR, Lui LY, Eastell R, Allen IE. Association Between Drug Treatments for Patients With Osteoporosis and Overall Mortality Rates: A Meta-analysis. JAMA Int Med 2019; 179(11): 1491-500.
      10. Chen W, Simpson JM, March LM, et al. Comorbidities Only Account for a Small Proportion of Excess Mortality After Fracture: A Record Linkage Study of Individual Fracture Types. J Bone Miner Res 2018; 33(5):795-802
      11. Vestergaard P, Rejnmark L, Mosekilde L. Increased mortality in patients with a hip fracture-effect of pre-morbid conditions and post-fracture complications. Osteoporos Int 2007; 18(12): 1583-93.
      12. Tran T, Bliuc D, Hansen L, et al. Persistence of Excess Mortality Following Individual Nonhip Fractures: A Relative Survival Analysis. J Clin Endocrinol Metab 2018; 103(9): 3205-14.
      13. Tran T, Bliuc D, Ho-Le T, et al. Association of Multimorbidity and Excess Mortality After Fractures Among Danish Adults. JAMA Netw Open 2022; 5(10): e2235856.
      14. Henderson R, Keiding N. Individual survival time prediction using statistical models. J Med Ethics 2005; 31(12): 703-6.
      15. Kulinskaya E, Gitsels LA, Bakbergenuly I, Wright N. Calculation of changes in life expectancy based on proportional hazards model of an intervention. Insur Math Econ 2020; 93: 27-35.
      16. Lopez-Gonzalez AA, Aguilo A, Frontera M, et al. Effectiveness of the Heart Age tool for improving modifiable cardiovascular risk factors in a Southern European population: a randomized trial. Eur J Prev Cardiol 2015; 22(3): 389-96.
      17. Lyles KW, Colon-Emeric CS, Magaziner JS, et al. Zoledronic acid and clinical fractures and mortality after hip fracture. N Engl J Med 2007; 357(18): 1799-809.
      18. Reid IR, Horne AM, Mihov B, et al. Fracture Prevention with Zoledronate in Older Women with Osteopenia. N Engl J Med 2018; 379(25): 2407-16.
      19. Bonner C, Batcup C, Cornell S, et al. Interventions Using Heart Age for Cardiovascular Disease Risk Communication: Systematic Review of Psychological, Behavioral, and Clinical Effects. JMIR Cardio 2021; 5(2): e31056.
      20. Svendsen K, Jacobs DR, Morch-Reiersen LT, et al. Evaluating the use of the heart age tool in community pharmacies: a 4-week cluster-randomized controlled trial. Eur J Public Health 2020; 30(6): 1139-45.
      21. Suissa S. Immortal time bias in pharmaco-epidemiology. Am J Epidemiol 2008; 167(4): 492-9.
    1. Author Response

      Reviewer #1 (Public Review):

      I noticed 2 weaknesses, the first related to the killing assays: considering that WT IgG less efficiently promotes complement-mediated phagocytosis of bacteria, one would assume that the ingested bacteria (to be killed) would be lower in neutrophils exposed to this IgG, to begin with - which is not accounted for in the analyses shown.

      We now included a better explanation of our opsonophagocytic killing assay.

      A second weakness in my mind pertains to the in vivo experiment: the model used obviously requires a very high number of bacteria (the inoculum), somehow indicating that this specific bacterial strain does not lead to progressive infection (i.e. with replicating bacteria) but mice experience a severe acute inflammatory response followed by the rapid elimination of bacteria. This explains the high mortality - and indicates that mice succumb to acute inflammation, rather than the progressive replication of bacteria. To conclusively prove the therapeutic value of those modified antibodies, a clinically more relevant S. pneumoniae model would be helpful.

      The inoculum used in our mouse model was based on a dose-finding study. Although the initial starting dose was 5x10^6 bacteria (based on previously published mouse infection models with S. pneumoniae serotype 6A), we needed a higher dose (1x10^8 bacteria) to reach 80-100% mortality. While we agree that the final dose was relatively high, this does not mean that capsule type 6 is not a clinically relevant strain. It is well known that clinically relevant serotypes in humans are not always invasive in mice (doi: 10.1128/iai.60.1.111-116.1992). This is the exact reason why we chose to perform in vivo experiments with serotype 6A, which is known to be more invasive in mice (while serotype 6B is more virulent in humans). Of course, while our in vivo data provide an important proof-of-concept for the capacity of hexamer-enhancing mutations to improve protection by anti-capsular antibodies, future studies are needed to verify the potential use of mAbs against other serotypes.

      A third aspect, which should be addressed in the discussion, unless tested and not shown, is how anti-pneumococcal IgM antibodies compare to hexamerized IgGs. Is there any advantage, or do they perform similarly with regards to complement activation?

      We have now generated and tested IgM against CPS6 (Figure 2g). Although anti-CPS6 IgM can induce complement-dependent phagocytosis to some extent, IgM was less potent than IgG variants with hexamer-enhancing mutations. This suggests that complement activation via pre-assembled IgM oligomers was less effective than via IgG hexamers that are formed after target binding.

      These new data are now included in the revised manuscript as figure 2g, supplemental figure 9 and commented in results section lines 172-179.

      Reviewer #2 (Public Review):

      The results are intriguing, and one consideration is whether enhancing complement activation is beneficial or harmful for a therapeutic antibody. Based on these results is there the possibility of a natural selection against strong levels of complement activation?

      We appreciate the positive feedback on our work. Indeed, it is believed there is a natural selection against these mutations to avoid uncontrolled complement activation by naturally occurring IgGs in solution. It is important to realize that formation of IgG hexamers is a surface-dependent process. When IgG molecules bind to surface-bound antigens (via Fab), they can organize into higher-ordered hexamers via Fc-Fc interactions. The specific point mutations used in this paper increase hexamer formation after antigen binding on the cell surface. However, at high concentrations of IgG (such as those occurring in our blood, >10 mg/ml), IgG hexamers might be formed independent of target binding (van Kampen et al., Journal of Pharmaceutical Sciences Volume 111, Issue 6, June 2022, Pages 1587-1598). If naturally occurring IgGs had hexamer-enhancing mutations, IgG hexamers could be formed in solution, resulting in massive complement activation and depletion of the complement system.

      The study clearly shows that the introduction of the hexamerisation mutations affects the ability of the antibodies to bind and activate complement. The studies in Fig 2 examining the role of Fc are particularly elegant. One issue is that it is surprising that the WT IgG1 and IgG3 monoclonals have a minimal capacity to fix and activate complement, despite IgG1/3 to other antigens being efficient isotypes at fixing complement. In the absence of data showing whether IgG1/3 from normal human sera against capsule fixes complement then it is difficult to contextualise these results or to assess if other changes, such as in glycosylation, contribute to the results presented. Related to this, there is reasonable evidence that antibodies induced to capsules can be protective yet the data in Fig 5 suggests that without the mutations then the monoclonals are not effective at all for 6B and only effective at the highest concentration for 19A.

      As mentioned in Essential revision 3, our data with S. aureus antibodies demonstrate that this is not a consequence of how these mAbs are produced or differences in their Fc glycosylation profile. We agree that there is reasonable evidence that antibodies induced to capsules can be protective. However, not all vaccine serotypes are able to induce a strong immune protection. Serotype 6B, for instance, which is covered by current vaccines, is found to be poorly immunogenic (manuscript lines 101-103). For further studies, it would be really interesting to find out what makes this difference between mAbs and, specifically in our case, between anti-CPS antibodies.

      The adoptive transfer experiments demonstrate that the antibodies can moderate bacteraemia. The mechanism of this is not explored and the importance of hexamerisation and complement activation not demonstrated, especially as it is not clear if human antibodies and mouse complement are a productive combination in this context.

      We have now included additional phagocytosis assays with mouse sera (supplemental figure 15) that demonstrate that human antibodies and mouse complement are a productive combination.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Silva et al. "Evaluation of the highly conserved S2 hairpin hinge as a pan-coronavirus target" seeks to evaluate a new epitope target on the S2 domain of SARS-CoV2 Spike protein and evaluate its potential as a pan-coronavirus target. This is an impressive combination of extensive structural, HDXMS-based dynamics and antibody engineering approaches. What is missing is a detailed correlation of HDXMS with Spike dynamics. The authors have not examined the allosteric effects of 3A3 binding to the Spike trimer, specifically cooperativity in antibody binding. Does binding of one Fab positively or negatively impact the subsequent binding of antibody? In this regard, readers would benefit from HDXMS spectral envelopes in figures, at least for the epitope locus peptides. Further, what is the effect of the intrinsic ensemble behavior of the Spike protein on 3A3 interactions? In a broader sense antibody binding is assisted by intrinsic trimer ensemble behavior, as observed by the lowered binding to the omicron variant- but are there induced binding effects? It would help to better integrate HDXMS with cryo-EM and antibody engineering. It is a novel, less explored epitope target on the S2 domain. Overall, a more definitive mechanistic conclusion for how targeting the S2 hinge can advance future pan-coronavirus strategies is missing.

      1) Given that the authors have demonstrated ensemble switching behavior from 4 ℃ to 37 ℃ (Costello et al. (2021)) why is this not factored in how the HDXMS is carried out? The samples were stored, frozen at -80 ℃, thawed, and equilibrated for 20 min at 20 ℃ with or without antibody present and analyzed by HDXMS. However, the reported t1/2 for trimer tightening at 37 ℃ is t1/2 = 2.5 h (Supplementary Fig. 7). The samples should ideally be analyzed under standardized conditions with the stable conformer. Sample heterogeneity from HDXMS is likely due to any of the following contributing factors:

      i) Intrinsic ensemble heterogeneity (Costello et al. (2021)), Kinetics of RBD- up and down conformational switching

      ii) Cooperativity of Fab binding.

      iii) Partial occupancy of trimer epitopes with bivalent IgG.

      iv) Combination of cooperativity effects and partial binding effects

      For any of the above reasons, I would predict bimodal kinetics of deuterium exchange, so it is intriguing that none are reported. Partial occupancy should be evident from HDXMS paratope analysis.

      2) Pan-coronavirus neutralization potential is clearly evident. It is intriguing that the antibodies were isolated after immunization with an authentic MERS S2 domain but showed better selectivity to full-length 6P-engineered Spike. How is cooperativity built into antibody binding, given that the epitope site is occluded to various extents by the S1 domain and access is contingent upon RBD up-down kinetics?

      3) I am surprised that there is no allostery described for 3A3 (Supplementary figures 5, 6).

      The HDX-MS experiments presented in this work were carried out by the D’Arcy lab and published in a preprint on bioRxiv (originally posted on February 1, 2021) prior to publication of Costello et al. (first posted to bioRxiv July 11, 2021, epub March 2, 2022). Indeed, our bioRxiv posting inspired the Marqusee lab to request 3A3 for inclusion in their work focused on the conformational heterogeneity of the spike protein. Without prior knowledge of the conformational heterogeneity, we carried out these epitope mapping experiments at 25 °C, which allowed us to successfully map the epitope without determining which conformation the antibody prefers.

      The data presented in Costello et al. further confirm the location of 3A3’s epitope presented here and provide additional information about its preference for different conformational states within the spike protein. We have included an additional comment in the methods section (lines 660-661) stating, “The location of the 3A3 epitope was confirmed in a separate experiment carried out over the temperature range of 4 to 37 °C (Costello et al. 2022).”

      This is a clear example of the value of pre-prints to stimulate timely scientific collaboration. While Costello et al. used 3A3 as a tool to probe spike dynamics, here we highlight the original work that identified the epitope.

      Spectral envelopes have been provided (Supplementary Fig. 4b and Supplementary Table 3).

      The HDX-MS data provides limited insight into possible cooperative or allosteric binding of the 3A3 antibody because of other sources of heterogeneity such as spike dynamics and partial occupancy of the spike epitopes. However, no difference in occupancy was detected when HDX-MS with 3A3 Fab was compared to the same experiment with bivalent 3A3 IgG. It should be noted that in this HDX system, the antibody is not bound so tightly that the spectra are bimodal, showing the exchange of bound and unbound populations separately. Though HDX-MS experiments were performed in slight Fab or IgG excess of 1:1 Fab:spike monomer stoichiometry, the absolute stoichiometry in the context of the spike trimer is unclear.

      Reviewer #2 (Public Review):

      The authors report a conserved spike S2 hinge epitope and two conformationally selective antibodies that help elucidate spike behavior. This work defines a third class of S2 antibody and provides insights into the potency and limitations of targeting this S2 epitope for future pan-coronavirus strategies.

      Thank you for your review of this manuscript.

      Reviewer #3 (Public Review):

      The study by Silva et al. details the discovery and evaluation of a third class of broadly cross-reactive anti-Spike antibody that binds a conserved hinge region in the S2 domain. After immunizing mice with a stabilized S2 protein from MERS and generating scFv phage libraries, the authors were able to identify antibody 3A3, which showed broad cross-reactivity with SARS2 (including Omicron BA.1), SARS1, MERS, and HKU1 spike proteins. Using a combination of a low-resolution cryo-EM structure and HDX mass spectrometry, the authors were able to map amino acids in the antibody paratope and spike epitope, the latter of which is the hinge region of the Spike S2 domain (residues 980-1005) that plays a critical role in pre- to post-fusion conformational changes. Through well-executed and comprehensive mutagenesis, binding, and functional assays, the authors further validated critical residues that lead to antibody escape, which centered around the 2P residues and diminished viral entry. While 3A3 and an affinity-enhanced engineered version, RAY53, did not show potent in vitro neutralization against the authentic virus, the antibody was shown to recruit Fc effector functions for viral clearance in vitro.

      Overall, the conclusions of this paper are well supported by the data, but the usefulness of such antibodies is likely limited. The work can be strengthened by extending the analysis of 3A3-like antibodies in the context of human immune responses and in vivo effectiveness.

      1) Isolation of 3A3 was achieved after the generation of scFv-phage libraries following immunization with a MERS S2-domain immunogen in a mouse model. The fact that 3A3 binds well to 2P-stabilized sequences and binding/neutralization is diminished upon reversion of 2P mutations back to the native spike sequence (Figures 3a, 4c, and 5b), suggest that such antibodies would likely not arise from natural infection. This contrasts the isolation of fusion peptide and stem helix-directed antibodies, which were isolated from both immunized animals and convalescent individuals. To make their results more solid regarding the use of such antibodies in future vaccine strategies, the authors should provide evidence that 3A3-like antibodies can be identified in human donors. For example, they could enrich donor-derived S2-specific antibodies that bind both MERS and SARS2 S2 domains and evaluate the fraction of antibodies that recognize the hinge-epitope using competition binding assays (either ELISA or BLI), which have commonly been used to map epitope-specific sera responses. This could also be achieved with nsEMPEM of polyclonal IgGs bound to S2 proteins.

      2) The authors speculate in the discussion that strategies to enhance access to the hinge epitope, which may include ACE2-mimicking antibodies, could promote enhanced viral clearance. In addition to ACE2-mimicking antibodies, several antibodies have been described that bind the RBD and promote S1 shedding (see for instance mAb S2A4 - Piccoli et al, 2020, Cell). Several 2nd generation vaccine platforms utilize RBD-only immunogens that are likely to induce high titers of ACE2-mimicking and cross-reactive S1-shedding antibodies. Thus, adding in vitro neutralization and ADCC experiments to assess synergy between 3A3/RAY53 and such antibodies would bolster this speculative claim and be of interest to many in the field developing strategies for pan-coronavirus therapies.

      3) The authors provide in vitro evidence in Figure 5c,d for Fc-mediated viral clearance. While in vivo data to show effectiveness in animal models is ideal, additional in vitro data that utilize engineered constructs that modulate effector function (e.g., DLE (+) or LALA (-)) would boost the authors' claims regarding Fc-mediated viral clearance mechanisms by 3A3/RAY53.

      1) Though we do not plan to isolate 3A3-like antibodies from human donors, there is evidence that these antibodies are elicited in infected humans via analysis of polyclonal responses in Claireaux et al 2022. We also know of several studies on naturally occurring S2 hinge targeting antibodies from colleagues that are in preparation. Understanding the therapeutic role of this antibody class is relevant to the study of broadly-reactive S2 antibodies, even if that role is limited.

      2) We agree that synergy between S2 hinge epitope binding antibodies and ACE2 mimicking antibodies will be very interesting to investigate. We hope to pursue this in future work.

      3) We agree these are excellent controls to include, in addition to isotype controls already shown. In accordance with the eLife COVID research policy, we minimized our claims around Fc-effector functions elicited by RAY53 and stated that further experiments to confirm our preliminary findings are needed.

      The existing description of the effector function experiments states in lines 392-392 “These results indicate that RAY53 binding is compatible with ADCP and ADCC,” which is already a very limited claim.

      We also added in line 450 that S2 core-binding antibodies “require further validation” of their ability to recruit effector functions.

      We appreciate the importance of controls providing effector function modulation and will include the LALAPG mutations as a standard component of our future ADCC evaluation. However, given our focus on the relevance of the epitope and consistency of the Fc regions across the antibodies, we felt that the isotype and positive control antibodies (target binding controls) were the most relevant controls to include in this study.

    1. Author Response

      eLife assessment

      Germline inactivation of NPHP2, which encodes a protein that localizes to the transition zone at the base of the primary cilium, results in infantile kidney cysts and fibrosis. In this study, the authors provide solid evidence that increased cell proliferation and fibrosis precede cyst formation in Nphp-2 mouse models, that mutant renal epithelial cells are responsible for the phenotype, and that genetic inhibition of ciliogenesis in this model reduces disease severity. They also show that valproic acid, a drug that affects a number of cellular targets and is used to treat other human conditions, slows disease progression. One limitation of the study is that it provides limited insights into the mechanisms responsible for any of its interesting observations.

      To our knowledge, our study is the first to pinpoint defective epithelial cells as the main driver for both epithelial cysts and interstitial fibrosis in an NPHP model. The discovery that abnormal signaling from epithelial cells triggered a profibrotic response in the absence of cyst formation is also novel. Our Ift88 Nphp2 double mutant results, combined with tissue-specific function of NPHP2, suggest that NPHP2 functions as a negative regulator of a profibrotic and pro-cystic pathway that interacts with cilia-mediated signaling in epithelial cells and that abnormal signaling from epithelial cells triggers interstitial fibrosis. Moreover, we identified the HDAC inhibitor VPA as a potential candidate drug for treating NPHP. Although the precise molecular function of NPHP2 remains undefined, our results suggest that epithelial-specific function and epithelial-stromal crosstalk underlie NPHP-like phenotypes in Nphp2 mutant kidneys. Furthermore, although whether NPHP2 interacts with polycystin-mediated signaling remains an outstanding question, our results ruled out the involvement of NPHP2 in ciliary localization of PC2.

      Reviewer #1 (Public Review):

      Nephronophthisis (Nphp) is a multigenic, recessive disorder of the kidney presenting in childhood that is characterized by cysts predominantly at the cortico-medullary junction and progressive fibrosis. An infantile form of the disease presents earlier with more diffuse cystic change. The condition is considered a ciliopathy because most of the genes linked to the condition encode proteins involved in ciliary biogenesis or function. Germline mutations in NPHP2 are associated with a particularly severe, infantile form of the disease. Given that interstitial fibrosis is a more prominent feature of Nphp compared to many other forms of polycystic kidney disease, the authors sought to determine the mutant cell types responsible for the phenotype.

      In the current study, the authors generated and characterized mouse lines with Nphp2 selectively inactivated in either renal epithelial cell or stromal cell lineages and found that inactivation in renal epithelial cells was both necessary and sufficient to cause disease. They further showed that markers of interstitial fibrosis and proliferation increase in mutants prior to the onset of histologically evident cystic disease, suggesting that aberrant epithelial-stromal cell signaling is an early and primary feature of the condition (Figures 1-4). The study design was straightforward and appropriate to address the question, and the results support their conclusions.

      They next tested whether the cilia-dependent cyst-activating pathway (CDCA) that is "unmasked" by loss of other PKD-related genes is similarly active in Nphp2 mutants by generating Nphp2/Ift88 double mutants. Their studies found that the severity of cystic disease and markers of proliferation and fibrosis was attenuated in double-mutants (Fig 5, 6). These studies were also appropriate for testing the hypothesis and the results were similarly consistent with their interpretation.

      In the last set of studies, they tested whether valproic acid (VPA), a drug that has multiple modes of action including acting as a broad inhibitor of HDACs and previously used by the investigators in other forms of polycystic kidney disease, would have similar effects in Nphp2 mutants. The authors tested daily injection from days P10 through P28 in both control and Nphp2 mutant mice with VPA or an appropriate vehicle control and found that VPA was beneficial (Fig 7). The study design was acceptable and the results generally support their conclusions. The one perplexing result is shown in Fig 7B. The Nphp2 mutants, regardless of treatment status, have body weights (BW) that are significantly lower than the controls, with treated mutants even trending lower than their untreated mutant counterparts. This is unexplained and should be addressed. In the mutants with more widespread epithelial cell knock-out of Nphp2 (Ksp-Cre, Fig 1), total body weight decreased as mice became more severely cystic with renal impairment. In the milder form of disease produced with the Pkhd1- Cre (Fig 7), total body weight is inexplicably approx. 2g lower on average despite having much more modestly elevated KBWs and BUNs. Moreover, one might have expected that mutants treated with VPA would have had BWs intermediate between untreated mutants and controls since the severity of the disease was moderately attenuated. These differences raise the question as to whether body weight differences are due to factors independent of disease status, the most likely of which would be that the controls were not littermates. This prompted a careful review of the text for descriptions of the control mice. Throughout the study, the investigators describe selecting animals from the same "cohort", but this term is imprecise.

      There is little information provided about background strains, whether any of the lines were congenic, or whether any of the studies were done using littermate controls. This must be addressed. It would help if the investigators identified the litter status in their plots. This would clearly show relationships between animals and the number of litters that had animals with these properties. If littermates were not used for each study, the authors must explain both why they didn't do so and how they then selected which animals to use. This information is especially important for interpreting the results of their genetic interaction (fig 5) and drug treatment studies (fig 7).

      We thank the reviewer for the multiple positive comments.

      To address the issue of body weight, we examined the time course of body weight change more carefully and added Figure 7-figure supplement 1 to present the results. Although Nphp2flox/flox;Pkhd1-Cre mice displayed reduced body weight at P28 in comparison to controls, this reduction was more moderate than that of Nphp2flox/flox;Ksp-Cre mice (Figure 7-figure supplement 1A). Notably, the trend of body weight difference started at around P21 in both Nphp2flox/flox;Pkhd1-Cre and Nphp2flox/flox;Ksp-Cre mice, coinciding with weaning (Figure 7-figure supplement 1B). It is possible that mutants with compromised kidney function were less able to thrive and gain weight around this transition time. In terms of VPA treatment, body weight trended down in both wild type and mutant mice subjected to the treatment, although the difference did not reach statistical significance (Fig. 7B). We cannot rule out the possibility that side effects of VPA contributed to weight loss in treated mice. In addition, VPA may affect body weight gain through HDAC inhibition: the HDAC inhibitor Trichostatin A was shown to inhibit adipogenesis (PMID: 34232916) and 4-hexylresorcinol, another HDAC inhibitor, reduced body weight in treated rats (PMID: 34445640). To include the additional data and references, we added the following in the Results section:

      "We analyzed body weight change of Nphp2flox/flox;Pkhd1-Cre mice in more detail and compared it to Nphp2flox/flox;Ksp-Cre mice. At P28, the reduction of body weight in Nphp2flox/flox;Pkhd1-Cre mice in comparison to control mice was more moderate than that in Nphp2flox/flox;Ksp-Cre mice (Figure 7-figure supplement 1)."

      " However, the reduced body weight phenotype in mutant mice was not suppressed by VPA treatment (Fig. 7B). We cannot rule out the possibility that the side effects of VPA contributed to weight loss in treated mice. In addition, VPA may reduce body weight through inhibiting HDAC during the growth period: the HDACI Trichostatin A was shown to inhibit adipogenesis (51)."

      Regarding genetic background, all mice analyzed in figures 5 and 7 are in the same genetic background (C57BL/6J). We added a more detailed description of the genetic background in the Materials and Methods section. Littermate status is now also indicated in figure legends.

      In Figure 5, multiple genotypes (i.e., Nphp2flox/flox;Ksp-Cre, Nphp2flox/flox;Ift88flox/flox;Ksp-Cre and Ift88flox/flox;Ksp-Cre) were analyzed. Because of the limited number of animals per litter and low yield of desired genotypes, non-littermates had to be included in some cases. Littermate status is now highlighted by colors in the data tables of Figure 5 source data.

      In Figure 7, because of the limited number of animals per litter and the need to subject each genotype to VPA and vehicle treatment, non-littermates had to be included in some cases. Littermate status is now indicated by highlight colors in the data tables of Figure 7 source data.

      Several other considerations. The authors state that the effects of VPA are mediated through the drug's inhibition of HDACs and suggest that future studies could be directed at refining the specific HDAC. While this is certainly possible, the authors should acknowledge that VPAs have been reported to have numerous pharmacologic effects and targets and which of these is mediating the effects in their model is unknown (text). They would need mechanistic studies to show this, though it doesn't discount their possible efficacy as a therapy for PKD.

      We agree that it is an important point to clarify and added in Discussion: "It is also worth noting that VPA could affect targets other than HDACs and testing newly approved HDACIs will provide useful insight."

      The authors also state in their abstract that their double knock-out studies "support a significant role of cilia in Nphp2 function in vivo." It is not clear to me how their studies show this nor how they can exclude that ciliary activity is operating in an Nphp2-independent, parallel fashion that modulates some common downstream pathways.

      We agree with the reviewer that our results do not exclude the possibility that NPHP2 and ciliary activity feed into a common downstream pathway, i.e., a cilia-dependent cyst-activating pathway could operate outside of cilia. We changed the sentence in abstract to "supporting a significant interaction of cilia and Nphp2 function in vivo." In addition, we added "Although cilia-dependent, the downstream pathway could potentially operate outside of cilia and receive parallel signals from both ciliary activity and Nphp2." to Discussion to clarify and reflect the results and model more precisely.

      Reviewer #2 (Public Review):

      The manuscript by Li et al demonstrates the role of Nphp2/Invs in renal epithelia in preventing NPHP-like phenotypes, such as epithelial/stromal proliferation and stromal fibrosis, in mice. Previously, mutants of the Nphp2 allele in mice, generated by insertional mutagenesis, showed severe cystic kidney disease and fibrosis in neonates.

      The authors nicely show that the NPHP-like phenotypes in mutant kidneys arise from abnormal signaling specifically within and from renal epithelial cells. Furthermore, the fibrotic response and abnormal increase of cell proliferation precede cyst formation and could be initiated independently of cyst formation. The authors also show that the removal of cilia reduces the severity of Nphp2 phenotypes. The authors suggest that similar to polycystins, NPHP2 inhibits a cilia-dependent cyst and fibrosis-activating pathway. Finally, the histone deacetylase (HDAC) inhibitor valproic acid (VPA) reduces these phenotypes and preserves kidney function in Nphp2 mutant mice, supporting HDAC inhibitors as potential candidate drugs for treating NPHP.

      Overall, understanding the mechanisms driving NPHP phenotypes is important and drugging relevant pathways in treating this disease is an important unmet need in patients. The authors have provided insights into both these aspects in this study. The manuscript is nicely written, and the assays shown are rigorous and insightful.

      We thank the reviewer for the positive comments.

      Reviewer #3 (Public Review):

      In this manuscript, Li et al. investigate whether epithelial or stromal loss of Nphp2, a gene causative of nephronophthisis (NPHP), drives polycystic kidney disease (PKD) and kidney fibrosis in a novel floxed model of Nphp2. The authors found that only epithelial and not stromal Nphp2 loss results in NPHP-like phenotypes in their mouse model. In addition, the authors show that concurrent loss of cilia (via Ift88 loss) and Nphp2 within the kidney epithelium, as well as HDAC inhibition, results in less severe PKD/kidney fibrosis, as has been shown in mouse models of other non-syndromic forms of PKD, such as autosomal dominant PKD caused by mutations to Pkd1 or Pkd2.

      The authors aimed to understand (1) whether the published NPHP phenotype (kidney cysts and fibrosis), known from the global Nphp2 knockout mouse, is driven by the function of NPHP2 in the kidney epithelium or stromal cells; (2) if kidney fibrosis in NPHP is linked to kidney damage caused by cysts, or is independent of and precedes the PKD phenotype; (3) whether cilia are required, causative, or prohibitive of NPHP cystogenesis; and (4) if a broad-spectrum HDAC inhibitor is a potential therapeutic approach for NPHP.

      With the provided results, the authors established that epithelial Nphp2 loss is likely a predominant driver of PKD in their model; however, they cannot exclude that stromal NPHP2 plays a role in cyst growth post-initiation because the authors failed to directly compare their cell type-specific models to a global cre knockout (e.g. Cagg-cre).

      We agree with the reviewer that we cannot rule out the possibility that stromal NPHP2 plays a role post cyst initiation and added "However, our result does not rule out functional significance of interstitial cells once a pro-cystic and fibrotic response is triggered in mutant epithelial cells." to the Discussion section.

      A direct comparison between epithelial-specific and global knockout models is an attractive idea, but technically challenging. For an interpretable comparison, it is essential that the stage and knockout efficiency in epithelial cells are equivalent between the two models. However, Ksp-Cre is expressed specifically in the distal nephron, sparing epithelial cells in other segments, while epithelial cells in all segments would be affected by Cagg-Cre. In addition, global knockout of Nphp2 leads to perinatal lethality. Inducible Cagg-Cre could potentially be used to bypass earlier functional requirements, but matching stage and knockout efficiency in renal epithelial cells between Ksp-Cre and inducible Cagg-Cre mediated knockout remains challenging. These factors make a direct comparison problematic. Finally, our results revealed the role of defective epithelial cells in triggering the phenotypes but did not rule out a role of interstitial cells once abnormal signaling is initiated in epithelial cells. To clarify this point, we added "However, our result does not rule out functional significance of interstitial cells once a pro-cystic and fibrotic response is triggered in mutant epithelial cells." to the Discussion section.

      In addition, it is possible that cyst initiation/growth upon stromal Nphp2 loss occurs substantially slower compared to epithelial Nphp2 loss. However, it seems the authors did not look at kidney phenotypes beyond 28 days of age. Publications from the ADPKD field suggest that stromal Pkd1 loss initiates cystogenesis much slower than epithelial Pkd1 loss.

      We have expanded our analysis to 8-week-old mice. We now show that Nphp2flox/flox;Foxd1-Cre mice show normal kidney weight, kidney/body weight ratio, kidney function and histology at P56, supporting our original conclusion that deletion of Nphp2 in interstitial cells fails to trigger obvious renal phenotypes up to the young adult stage. These results are presented in Figure 4-figure supplement 1 and the Results section.

      Further, while the authors suggest that kidney fibrosis precedes cyst development, the results supporting this conclusion are limited to one time point, analyzing IF staining of a single marker that can be compared between non-cystic and cystic time points. These analyses need to be extended to make any firm conclusions.

      At the precystic kidney stage (P7), we analyzed SMA and vimentin levels via immunostaining. Their mRNA levels were additionally quantified via RT-qPCR. We have now analyzed vimentin levels at multiple timepoints (P9, 14 and 21) and results were added to Figure 2. Combined, these data support the initiation of a fibrotic response prior to cyst formation.

      The most interesting finding of the manuscript, and likely the most impactful to the field, is that loss of cilia within the setting of epithelial Nphp2 loss reduces PKD severity. This finding parallels published findings for Pkd1 and Pkd2, which are suggested to function in a cilia-dependent cyst-activation mechanism. Unfortunately, the studies shown here do not add mechanistic insight beyond the descriptive finding. Most importantly, it remains unclear whether NPHP2 functions in the same pathway as polycystin-1 or -2 (the Pkd1, Pkd2 gene products) or in a separate independent pathway.

      Our Ift88 Nphp2 double mutant results, combined with the tissue-specific function of NPHP2, which to our knowledge is completely novel in an NPHP model, suggest that NPHP2 functions as a negative regulator of a profibrotic and pro-cystic pathway that interacts with cilia-mediated signaling in epithelial cells and that abnormal signaling from epithelial cells triggers interstitial fibrosis. We agree with the reviewer that whether NPHP2 functions in the same pathway as polycystins is an interesting question. However, we feel it is beyond the scope of this manuscript and will pursue this research direction in our future studies.

      With respect to the HDAC preclinical studies, the authors show supporting data that a broad-spectrum HDAC inhibitor may be suitable for slowing cyst growth in their model of NPHP. Overall, these studies are not novel to the field, as HDAC inhibition has been shown to slow PKD progression in various models of PKD, albeit not in NPHP specifically. Further, the studies seem like an add-on, which does not directly link to the prior cell type-specific studies of NPHP2, and no mechanisms linking the two concepts are provided.

      Although we and others showed that HDACIs slow cyst progression in other PKD models, this study is the first to show their impact on an NPHP model. Given the current lack of treatment for NPHP, we feel it is important to communicate the results to the research community even though the molecular mechanism remains to be defined.

    1. Author Response

      Reviewer #1 (Public Review):

      The article "Identification of a weight loss-associated causal eQTL in MTIF3 and the effects of MTIF3 deficiency on human adipocyte function" explored the functional roles of MTIF3 during adipocyte differentiation. In persons living with obesity, genetic variation at the MTIF3 locus associates with body mass index and responses to weight loss interventions. MTIF3 regulates mitochondrial protein expression and gene knockouts cause cardiomyopathy in mice. This paper provides insight into the impacts of MTIF3 knockout on adipocyte differentiation and the expression effects of the eQTL on MTIF3 levels. The authors implement a CRISPR/Cas9 gene editing approach coupled with an in vitro platform to detect influences of MTIF3 on adipocyte glucose metabolism and gene expression. This method may serve as a platform to explore knockouts in human cell lines, so it may allow the discovery of new gene x environment influences on in vitro outcomes related to differentiation, growth, and metabolism.

      The conclusions of this paper are mostly well supported by data, but some experimental conditions and data analyses need to be clarified and extended.

      1) The authors use CRISPR/Cas9 to generate the rs1885988 variant in the human white adipocyte cell line and performed a comprehensive validation analysis of gene editing (Figure 1). qPCR analysis showed reduced MTIF3 expression during human adipocyte differentiation (Figure 1E, F). To expand the importance of the rs1885988 variant, the authors should have provided target gene measurements to verify the canonical differentiation profile (e.g., FABP4, ADIPOQ) and help readers understand the overall impact of gene editing at the MTIF3 locus.

      Thank you for your suggestions. As you requested, we have quantified several adipocyte differentiation markers in the allele-edited cells after 12 days of adipogenic differentiation. The data (Figure 1-figure supplement 1) shows no significant difference between cells with the different genotypes. We have added more information about this in lines 100-101, and also in another context in lines 105-116.

      Notably, the intra-group variation of the marker gene expression is large (Figure 1-figure supplement 1), which makes it difficult to clearly state how much the allele editing, as opposed to random variation resulting from single cell cloning, contributes to the differentiation outcome. However, if we also consider MTIF3 knockout cells (that do not need to be single-cell cloned), their differentiation marker expression also appears unaffected (Figure 3-figure supplement 1). Taken together then, it is unlikely the allele editing with the consequent effect on MTIF3 expression affects adipogenic differentiation in our experiments. We mention the absence of effect of MTIF3 knockout on differentiation in the paragraph starting on line 137.

      2) The direct mechanistic influences of MTIF3 on adipocyte function remain unclear. MTIF3 regulates the translation initiation of mitochondrial protein synthesis. Western blots of OXPHOS proteins do not per se underscore supercomplex formation, which is also a process mediated by MTIF3. Blue native gel electrophoresis may prove a better method to establish the effects of MTIF3 loss-of-function on supercomplex formation.

      As suggested, we have run blue native gel electrophoresis to detect the formation of OXPHOS respiration complexes. In the revised manuscript (lines: 158-168 and Figure 4 E,F), we show how MTIF3 knockout indeed interferes with the complex formation, with lower abundance of complexes V/III2+IV1, III2/IV2 and IV1. Additionally, although the blot signal for complex I+III2+IVn is diffuse, it appears higher in scrambled control cells than in MTIF3 knockout cells. Interestingly, complex II content is slightly higher in MTIF3 knockouts, which may result from a compensatory regulation mechanism, as none of the subunits of complex II is encoded by mitochondrial DNA. We also found several faster-migrating bands (“undefined bands” in the figure) in the MTIF3 knockout samples, although it is hard to determine whether those are single chain proteins, or degradation or mistranslation products. Overall though, the native gel blots show impaired OXPHOS complex assembly in MTIF3 knockout samples.

      In addition, we performed western blots for other mitochondrial proteins, including COX II (subunit of OXPHOS complex IV), ND2 (subunit of OXPHOS complex I), ATP8 (subunit of OXPHOS complex V), and CYTB (subunit of OXPHOS complex III). The data (Figure 4 A,B), show decreased ND2 and COX II, trending decrease of CYTB, and unaffected ATP8 content in MTIF3 knockout adipocytes.

      The methods (paragraph starting at line 479), results (paragraph starting at line 145), and discussion (lines: 261-263, 274-277) were incorporated in the revised manuscript.

      3) Based on the findings, the authors argue that MTIF3 knockout alters the function of adipocytes. However, many of the experiments show fairly small effect sizes (Figure 5A, Figure 6A). How does the MTIF3 knockout explicitly perform functions related to body weight regulation? Gene editing in vivo would have helped to substantiate the authors' conclusions.

      In the paper we are looking at the consequences of MTIF3 deficiency in one cell type, over a short time, in vitro. The outcome of body weight regulation, e.g. during weight loss, would result from long-term effects of MTIF3-altered metabolism in more than one tissue. We envisage that small changes in energy metabolism, not only in fat but also in e.g. muscle, would make a substantial difference over time in vivo (this we cannot capture in in vitro models). We have added this discussion to lines 294-311.

      As for in vivo genomic editing, the alleles of interest are specific to the human genome. Ideally, a genotype-based recall study in humans would be appropriate, but due to time and resource limitation, we are not able to conduct such a study at the moment (although we certainly hope to perform such a study in the future). As for modeling the MTIF3 deficiency in mice – the MTIF3 knockout mice are not viable [1], and certainly other options (e.g. overexpression, tissue-specific knockouts) are possible and tempting to investigate. This, however, would require considerable additional work which we could only perform in a future project.

      4) In several instances, the authors refer to 'feeding' cells with glucose (line 206, line 171). Feeding experiments often imply complex nutrient interventions in animal models and people, which cannot be easily recapitulated in cell culture. The in vitro experiments simply alter levels of glucose and more precise language would state the specific challenges accurately.

      In the revised manuscript, we have replaced “feeding” with the exact glucose concentration, or with “glucose concentration” where appropriate (paragraph starting at line 215, and lines 577-578, 597, 873-879).

      Reviewer #2 (Public Review):

      Huang Mi, et al. investigated the role of MTIF3, the mitochondrial translation initiation factor 3, in the function of adipocytes. They first examined the expression of the obesity-related MTIF3 variants based on the GTEx database and found that two variants lead to an increase in MTIF3 expression. They then knocked out MTIF3 in differentiated hWAs adipocytes and characterized the mitochondrial function. They found that loss of MTIF3 decreases mitochondrial respiration and fatty acid oxidation. They further treated cells with low glucose medium to mimic weight loss intervention and found that MTIF3 knockout adipocytes lose fewer triglycerides than control adipocytes. This paper provides new information about MTIF3 in adipocytes and the potential functional role of MTIF3 in mitochondrial function.

      1) The authors provided sufficient data to show those two genetic variants increase MTIF3 expression. Their CRISPR/Cas9 knockin cell line is also convincing. But they didn't show if the genetic variants affect adipogenesis. Adipogenesis is an important process for weight gain and fat deposition. In lines 103-107, the authors mentioned that the "allele-edited cells have some problem in differentiated state, e.g. triglyceride or mitochondrial content", so they used an inducible Cas9 system. However, the issue of differentiated allele-edited cells may be the functional effect of MTIF3 genetic variants, such as interrupting adipogenesis, decreasing triglyceride, or affecting mitochondrial number. The authors should provide that information.

      Thank you for all your suggestions. We think we were not clear regarding this issue. We did not mean that the allele-edited cells have a problem in the differentiated state, which could then definitely be (as you point out) due to the functional effect of MTIF3 genetic variants. The problem relates to the process of single-cell cloning itself, which inherently introduces random variation. As a consequence, the data on adipogenic differentiation in allele-edited cells have relatively high intra-group variation. We have added more clarifying text in lines 104-116.

      To provide the data on this, per your request, in the revised manuscript we include the results for the rs67785913-edited cells in Figure 1-figure supplement 1. As shown, we observed no differences in the expression of adipogenic markers (ADIPOQ, PPARG, CEBPA, SREBF1 and FABP4) or in mitochondrial content between the two rs67785913 genotypes. Since the intra-group variation is often high, it is hard to conclude how much the rs67785913 eQTL affects the quantified variables. Much of the variation could instead be ascribed to the effects of single cell cloning.

      The cloning per se introduces random variation, but is required to obtain homozygous allele-edited cells. Because of this dilemma, and to clarify how much MTIF3 expression can actually influence adipogenic differentiation, we have, during the revision, also used the hWAs-iCas9 cells to generate MTIF3 knockouts at the preadipocyte stage and then tested their differentiation capacity. As we show in Figure 3-figure supplement 1, we found no apparent differences in adipogenic marker gene expression between scrambled control and MTIF3 knockout cells (we mention that in lines 137-144). Taken together, our results may indicate that the rs67785913 genotype, through affecting MTIF3 expression, is unlikely to regulate adipogenic differentiation.

      2) In Figure 4, the author mentioned that MTIF3 knockout does not affect the expression of adipogenic differentiation markers. They need to provide more evidence to prove their point. Oil-red O staining is a clearer way to quantify adipocyte differentiation in cell culture. In addition, in Fig. 4B western blot, the author should include MTIF3 as a control to show the knockout efficiency. It is not clear the meaning of plus and minus in that panel. The author should also compare the total triglyceride levels in MTIF3 knockout cells and control cells.

      We have now included Oil-red O staining results and total triglyceride levels (Figure 3 F,G), which show no apparent differences between scrambled control and MTIF3 knockout cells (method: lines 427-431; results: lines 137-144). We also added the MTIF3 blots to figure 4A as a control, showing high and consistent MTIF3 knockout efficiency in independent experiments. In the original manuscript, the plus and minus referred to control and knockout, respectively. To clarify that, we have changed the expression to SC and KO in the revised manuscript.

      With regards to Oil-red O vs. quantification of adipogenic markers, we actually prefer the latter method, as it gives more accurate and less variable results than Oil-red O (at least in the cell line we use). We have, however, performed Oil-red O as well to address your question.

      3) MTIF3 is a translation initiation factor in mitochondria and is involved in the protein synthesis of mitochondrial DNA-encoding genes. The authors should check protein levels rather than the mRNA levels of mitochondrial DNA-encoding genes (Fig. 6E). It's interesting to see the increase of mRNA levels of ND1 and ND2, which might be feedback of lower translation. Since ND1 and ND2 are in OXPHOS complex I, the expression levels of complex I in MTIF3 KO cells would be worth checking. Additionally, the author should also check the mitochondria copy number.

      As suggested, we have detected several mitochondrially encoded proteins that are subunits of the mitochondrial OXPHOS complexes. As shown in figure 4A, ND2 (subunit of OXPHOS complex I) and COX II (subunit of OXPHOS complex IV) expression were significantly reduced, CYTB (subunit of OXPHOS complex III) expression tended to decrease, and ATP8 expression was not affected in the MTIF3 knockout adipocytes. We also examined the formation of the OXPHOS respiration complexes in extracted mitochondrial proteins and found that MTIF3 perturbation affects mitochondrial complex assembly. The detailed methods (lines: 479-490), results (lines: 145-169) and discussion (lines: 260-262, 274-277) were incorporated in the revised manuscript.

      We have also added the mitochondrial copy number data (Figure 3A), showing that MTIF3 knockout cells have lower mitochondrial content (methods: lines 491-500; results: 156-157).

      4) The finding that MTIF3 knockout adipocytes retain more triglycerides under glucose restriction is interesting. It may link to the previous result of lower fatty acid oxidation in MTIF3 knockout adipocytes. However, the authors then showed there is no difference in lipolysis. The authors should discuss those results in the manuscript. The authors could also check lipolysis in glucose restriction conditions. It is also necessary to include the triglyceride levels of KO cell lines in full medium.

      We have now examined the glycerol release under glucose restriction, and found no differences between control and MTIF3 knockouts (Figure 6-figure supplement 1). Interestingly, in 1 mM glucose, both genotypes released less glycerol than at 25 mM glucose, and this has been observed before in the SGBS cell line [2]. According to your suggestion, we have added the total triglyceride content at 25 mM glucose (Figure 6C), which also was not different between control and MTIF3 knockout cells. We speculate the higher retention of triglycerides in the knockouts could be due to higher re-esterification of lipolytically released fatty acids, since, as we observed, fatty acid oxidation is impaired in the knockouts. In the revised manuscript, we added that to the discussion (lines: 289-293).

      References

      1. Rudler, D.L., et al., Fidelity of translation initiation is required for coordinated respiratory complex assembly. Sci Adv, 2019. 5(12): p. eaay2118.
      2. Renes, J., et al., Calorie restriction-induced changes in the secretome of human adipocytes, comparison with resveratrol-induced secretome effects. Biochim Biophys Acta, 2014. 1844(9): p. 1511-22.
    1. Author Response

      Reviewer #2 (Public Review):

      The idea that decidualization is related to or evolved from wound healing, including fibroblast activation, is old, going back all the way to Creighton 1878 who pointed to the similarity between granulation tissue and decidual tissue, and is supported by the fact that embryo implantation is a compensated form of the endometrial lesion. Nevertheless, the mechanistic connection between FB activation and decidualization is an important fact necessary for understanding decidualization, a fact that is reflected in previous work, for instance, Kim et al., 1999 (Hum Reprod 14 Suppl 2), their reference 20, and Oliver et al., 1999 (Hum Reprod 14), their reference 56 a.o.m. More specifically, a recent single-cell study of in vitro decidualization has shown that a myofibroblast-like cell state is a transient state in the process of decidualization, i.e. decidual cells themselves are not so much activated fibroblasts, but rather decidual cells differentiate after endometrial stromal fibroblasts undergo a FB activation-like process, and the decidual re-programming happens from these activated FB-like states (Stadtmauer et al., 2021, Biol. of Reprod. 1-18).

      Yes, the paper from Stadtmauer DJ and Wagner GP (2022) is cited in the revised version.

      The above assessment of how the current study fits into the conceptual landscape of mammalian reproductive biology does not diminish the importance of the paper under consideration. The study contributes a large amount of observational and experimental facts to the understanding of how FB activation and decidualization are related. The authors suggest, in particular, that blastocyst-derived TNF activates the cLPA-producing Arachidonic acid (AA), activating PGI2 and PPARd signaling pathway (more about this later).

      Other major comments:

      The authors suggest that luminal epithelial cells signal through the release of arachidonic acid (AA) in response to TNF. That is interesting and supported by in vitro experiments inducing decidualization and FB activation by AA. What makes this conclusion a little problematic is that luminal epithelial cells are also known to express COX2/PTGS2; thus the synthesis of prostaglandins already starts in the LE, and the LE can also signal to the stroma via PGE2, PGI2 as well as PGL2 rather than AA directly. The in vitro experiments cannot exclude the possibility that the ESF is producing some prostaglandin that then has an autocrine effect.

      Yes, we agree with you. It is possible that PGI2 and PGE2 from luminal epithelial cells may also induce fibroblast activation. Based on the data from in situ hybridization, COX-2, mPGES, PGIS and PPARδ are mainly expressed in subluminal stromal cells at the mouse implantation site on day 5 of pregnancy (Lim et al, 2000; Ni et al, 2002; Wang et al, 2004). Therefore, PGI2 from stromal cells should be the dominant one compared to that from luminal epithelial cells. In the future, we will examine the effects of AA on COX-2, mPGES and PGIS in luminal epithelial cells.

      Lim H, Dey SK. PPAR delta functions as a prostacyclin receptor in blastocyst implantation. Trends Endocrinol Metab. 2000 May-Jun;11(4):137-42.

      Ni H, Sun T, Ding NZ, Ma XH, Yang ZM. Differential expression of microsomal prostaglandin e synthase at implantation sites and in decidual cells of mouse uterus. Biol Reprod. 2002 Jul;67(1):351-8.

      Wang H, Ma WG, Tejada L, Zhang H, Morrow JD, Das SK, Dey SK. Rescue of female infertility from the loss of cyclooxygenase-2 by compensatory up-regulation of cyclooxygenase-1 is a function of genetic makeup. J Biol Chem. 2004 Mar 12;279(11):10649-58.

      344: here the authors report that PGE2 has no effect on FB activation marker expression, but the problem with that is, that (at least in human ESF), progesterone is causing a change in the expression of the PGE2 receptors from EP4 to EP2, and it is only the EP2 receptor that activates cAMP/PKA pathway.

      Yes, we agree with you. PGES is highly expressed in stromal cells at implantation site. Previous studies also show that PGE2 is important during decidualization. In our study, PGES showed no significant changes after stromal cells were treated with AA. PGE2 also had no significant effects on fibroblast activation. Therefore, we focused on PGI2-PPAR pathway. It is possible that PGE2 may regulate decidualization through an alternative way rather than fibroblast activation.

      The fact that the authors show an effect of PGI2 is interesting because PGI2 receptors are among the strongest expressed PTG receptors in mammalian ESF. Prostacyclin receptor is a GPCR rather than a nuclear receptor. So the question is really why the authors have not pursued the role of prostacyclin receptor and instead have focused on PPARd?

      Yes, we agree with you. When mouse stromal cells were treated with AA, there was no significant change in the protein level of the prostacyclin receptor (Figures 4E, 4F). When mouse stromal cells were treated with the prostacyclin receptor agonist selexipag, the markers of fibroblast activation showed smaller changes compared with treatments with PPARδ (Figure 3D). Therefore, we focused on PPARδ. Although the prostacyclin receptor is less responsive than PPARδ in inducing fibroblast activation, it should contribute to fibroblast activation. In the future, we will pursue the effect of the prostacyclin receptor on fibroblast activation. Thank you very much for your suggestion.

      Reviewer #3 (Public Review):

      This manuscript postulates that uterine stroma cells undergo a stage of activation between the resting state and the differentiated decidual state in order to support embryo implantation. Using in vivo mouse and in vitro mouse and human stroma cells they demonstrate that during decidualization the stroma cells express the marker genes for activated stroma. They then trace an axis from the embryo-producing TNF to prostaglandin production and activin A that is required for this process. They propose data to show that activation of the stroma is altered in infertility due to fetal trisomy 16.

      The strengths of this manuscript are:

      1) This is a comprehensive study using both in vivo and in vitro studies and in both mouse and human stroma cells.

      2) The experiments use a combination of ligands, agonists, and inhibitors to map the signaling axis regulating stroma activation.

      3) The data shown support the conclusions in this manuscript.

      The weaknesses of this manuscript are:

      1) The conclusion that Activin A is the regulator of stroma activation, as stated in this manuscript, is correlative. What is needed is a knockdown of Activin A followed by an assessment of stroma activation, to prove that Activin A is the major regulator and not one of many TGFb family members.

      Yes, the data from Activin A knockdown were provided.

      2) The use of uterine epithelial cells is problematic. The in vitro co-culture approach is not a state-of-the-art co-culture. Removal of epithelial cells from the uterus results in loss of the epithelial phenotype. If the manuscript used an epithelial organoid stroma cell coculture approach it may better reflect the role of the epithelial cells in this process. Otherwise, it is not clear that the epithelial cells are actual participants in the signaling axis. The treatments could be directly on the stroma cells.

      Yes, we agree with you. Following your suggestion, we established a culture system for epithelial organoids. When the epithelial organoids were treated with TNF, the response was similar to that of in vitro cultured mouse epithelial cells.

      3) Ishikawa cells are endometrial cancer cells. They do not really reflect uterine epithelium and it is not clear that any epithelial cell could be substituted for these cells.

      Thank you very much for your comments. It is true that Ishikawa cells differ from in vivo endometrial epithelial cells. However, several studies have shown that the Ishikawa cell line possesses apical adhesiveness to JAR trophoblast cells and expresses many of the same enzymes and structural proteins found in normal human endometrium (Castelbaum AJ et al, 1997). Because both estrogen and progesterone receptors are expressed in Ishikawa cells, they respond well to both estrogen and progesterone (Castelbaum AJ et al, 1997). Therefore, Ishikawa cells are used as a model for receptive endometrial epithelial cells (Hannan NJ et al, 2010).

      Castelbaum AJ, Ying L, Somkuti SG, Sun J, Ilesanmi AO, Lessey BA. Characterization of integrin expression in a well differentiated endometrial adenocarcinoma cell line (Ishikawa). J Clin Endocrinol Metab 1997; 82:136-142.

      Hannan NJ, Paiva P, Dimitriadis E, Salamonsen LA. Models for study of human embryo implantation: choice of cell lines? Biol Reprod. 2010; 82:235-245.

      Lessey BA, Ilesanmi AO, Castelbaum AJ, Yuan L, Somkuti SG, Chwalisz K, Satyaswaroop PG. Characterization of the functional progesterone receptor in an endometrial adenocarcinoma cell line (Ishikawa): progesterone-induced expression of the alpha1 integrin. J Steroid Biochem Mol Biol. 1996; 59:31-39.

      4) The activation of stroma cells in the fetal trisomy 16 experiments at the end is very superficial. Data should show that these cells decidualize with decidual markers. This appears to be an experiment to show the translational value of the signaling axis. This experiment, again, is not well developed, does not add much to the manuscript, and should be omitted.

      Yes, we agree with you. The description on human trisomy 16 was deleted.

      In summary, the concept of stroma cell activation as part of decidualization is nicely developed and will add to the field. Normally investigators consider decidualization a mesenchymal to epithelial transition while some consider it stromal activation. This manuscript demonstrates that stroma cell activation is a critical part of the process of decidualization.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors screen large libraries of small proteins to identify three proteins of <50 aa that rescue the growth of an auxotrophic serB deletion Escherichia coli strain. They convincingly show that the growth rescue is due to the small proteins increasing expression of the his operon by reducing transcriptional attenuation. The authors argue that the small proteins function by directly binding the his RNA 5' UTR to alter RNA secondary structure.

      The conclusion that the three small proteins reduce his operon attenuation is well supported by the data. A previous study suggested this mechanism for a somewhat larger, randomly selected protein, but the current study extends this prior work by firmly establishing that the proteins modulate attenuation. The suggestion that the small proteins function by directly binding the his RNA is less well supported by the data. The RNase T1 mapping data are not straightforward to interpret, and there is no assessment of protein-RNA interactions in vivo.

      Major comments:

      1) The RNase T1 probing data are not straightforward to interpret, and hence are insufficient to conclude that Hdp1 binding to the his 5' UTR is the mechanism by which it reduces attenuation. Specifically, G96 has reduced cleavage in the presence of Hdp1, inconsistent with the antiterminator conformation. The authors argue that G96 could be within the site of Hdp1 binding. This is certainly possible but would require additional experimental evidence to draw a confident conclusion. Also, the increased cleavage of bases around the start codon and Shine-Dalgarno sequence is inconsistent with a shift from the terminator to the antiterminator conformation. One confounding issue here is the lack of replicates and the lack of quantification. Additional probes could be tested, which would provide complementary structural information.

      We agree that the RNase T1 probing data alone does not provide sufficient resolution to fully assess changes in terminator/anti-terminator conformations. Therefore, we have clarified our interpretation of the data, addressed its limitations, and have softened the conclusions that can be drawn from it in the text (lines 419-431). We have also included two additional T1 probing experimental replicates in Supplementary Fig. S11 which are in agreement with the cleavage patterns presented in the main text Figure 3D. Based on the revised conclusions and the consistency of the cleavage patterns between the experimental replicates, we do not think that quantification of the probing data would provide any additional information.

      2) There are no experiments to test whether Hdp1 binds the his RNA in vivo. The in vitro data show that Hdp1 can bind the his RNA, but they do not show that this occurs in vivo, or that this is the mechanism by which Hdp1 regulates the expression of the his operon.

      As addressed in the Essential Revisions section, we have now performed and included data from co-immunoprecipitation assays, in which we were able to successfully detect and demonstrate enrichment of his operator-regulated RNA transcripts in HA-tagged Hdp1 pull-down samples. We were also able to demonstrate less enrichment (i.e. reduced interaction/specificity) for thr operator-regulated RNA transcripts in the Hdp1 pull-downs as well as lower enrichment for all his operator-regulated target RNA transcripts in pull-downs performed with the HA-tagged Hdp1 L27Q mutant. These data are presented in Fig. 3A and discussed in lines 313-337.

      Reviewer #2 (Public Review):

      In this work, Babina et al. address a central question in molecular evolution that is only partially answered: how does cellular novelty emerge in evolution? The authors focus here on small proteins, whose importance to various cellular functions has become more appreciated recently. Babina et al. ask if functional small proteins can emerge from random sequences, a question that is mostly unresolved with only a small number of examples in the published literature for such functions. In this study, the authors demonstrate that proteins selected from random, synthetic libraries can rescue auxotrophy in E. coli. Namely, the authors find three small, random proteins (<50 amino acids) that allow E. coli cells with a ΔserB genetic background to grow in a medium without the amino-acid serine. They then show that this rescue is based on the up-regulation of HisB, an enzyme that can compensate for the serB deletion. Finally, using different molecular biology techniques, the authors propose a model in which up-regulation of HisB is achieved by physical interactions between the random proteins and the his operator that regulates the transcription of the his operon in E. coli.

      Notably, as the authors themselves point out, a previous study has already shown that semi-random proteins can result in up-regulation of HisB levels to rescue ΔserB cells. Thus, most of the novelty comes from the attempt to figure out the molecular mechanism of the three random proteins. The idea that a random protein binds the 5' of an mRNA which results in up-regulated expression levels is interesting and can benefit the field. However, some clarification on existing data and additional control experiments are needed to support the authors' claims:

      1) Growth data are not presented in the current form of the manuscript, which makes it impossible to evaluate many of its claims. Especially, the extent of rescue and fitness gain achieved by these random proteins compared to cells harboring the serB gene.

      We thank the reviewer for pointing out this discrepancy. We have now added all relevant growth data under non-permissive conditions (Figure 1G, Supplementary Figures S2, S3, S5) and have also included data on the fitness effects exerted by Hdp expression in cells harboring serB under permissive conditions (LB medium), to allow for comparison with the empty plasmid control strain (Supplementary Figure S1).

      2) The authors have screened their library on other auxotrophic strains, however, they could only find random proteins that rescue growth in the ΔserB background. Currently, they do not address this point, but it might be relevant to the molecular mechanism of those random proteins.

      The reviewer raises an interesting point. We have added a paragraph to our Discussion addressing why we believe that the serB-model with a complementary enzyme is an ideal target for the selection of de novo genes (lines 536-543).

      3) Central to the authors' claims is the up-regulation of HisB, however, they mostly work with an alternative LacZ system to assess the effects of the random proteins on expression. The paper will benefit from some more work measuring actual HisB levels as expressed by the various constructs used along the paper. The authors did provide an important proteomic analysis to show that HisB (along with other proteins in the his operon) is up-regulated as a result of the expression of one of the random proteins. However, it is unclear if the reported ~3-fold increase in HisB levels is enough to allow the growth of ΔserB cells in a medium without serine.

      We thank the reviewer for raising this concern and allowing the opportunity to clarify. It is well established that upregulation of HisB can rescue growth of a SerB-deficient strain on minimal medium (for examples, see Patrick, et al. PMID: 17884825, Digianantonio and Hecht PMID: 26884172). We have now performed additional proteomics analyses that show a specific upregulation of the his operon upon expression of Hdp1 and Hdp3. We have also added a control experiment overexpressing HisB from our expression vector, showing that it restores growth of the auxotrophic ΔserB mutant. It is also clear that histidine starvation itself does not de-repress HisB sufficiently to allow growth of a ΔserB mutant, as this strain does not grow on minimal medium lacking histidine (such as M9 minimal medium that was used for the functional selection in our study). In addition to upregulation of HisB, we show that the rescue is dependent on the presence of HisB and provide additional experiments showing specific interactions of Hdp1 with the his operator RNA in vitro and in vivo. Our results clearly show that rescue depends on HisB and that Hdp expression upregulates HisB, and we do believe our central claim is substantiated beyond reasonable doubt. The reviewer’s main concern, that it is unclear if expression levels of HisB are high enough to allow growth, is, in our opinion, resolved by the observation that Hdp-dependent upregulation of HisB does restore growth.

      We respectfully disagree with the reviewer’s suggestion that an exact determination of the level of upregulation is relevant and needed, as outlined above. In addition, we would like to point out that it is not possible to measure HisB upregulation compared to an empty plasmid control strain under non-permissive conditions. Comparing HisB levels in a ΔserB strain expressing Hdp vs. the empty plasmid control in minimal medium is not possible, since the empty plasmid control strain is not able to grow, and the corresponding baseline of HisB expression cannot be determined in a non-growing strain. To circumvent this, we determined HisB levels in rich medium, which does not necessarily reflect the exact amount of upregulation occurring under non-permissive conditions, but still allows us to detect a physiological activity. Alternative experimental setups, such as comparing HisB levels in a strain carrying serB in minimal medium, also suffer severe shortcomings, as they no longer reflect the cellular physiology of the auxotroph under non-permissive conditions, where growth is dependent on HisB upregulation.

      4) It is unclear how noisy and statistically significant some of the critical experiments in the manuscript are, especially the EMSA and T1-digestion experiments. The authors should try to find a different operator with a similar RNA structure and attenuation function to the his operator, but a different nucleotide sequence, and show that this control operator is unaffected by the random proteins. Demonstrating the lack of phenotypes using the LacZ system, EMSA experiments, and T1-digestion patterns would strongly support the authors' claims.

      We thank the reviewer for suggesting this important control and agree that its inclusion significantly strengthens our claims. We used the threonine operon (thr) operator, which is regulated by terminator/anti-terminator formation similar to that of the his operon with the his operator. We show that Hdp1 does not cause de-repression of this operator using a lacZ reporter construct. Strongly supporting this is the fact that our whole proteome analysis showed specific upregulation of the his operon. Any other off-target de-repression would be detected in this assay. Furthermore, we now include the thr operator RNA as a control in the EMSAs, which demonstrates reduced binding with Hdp1 in comparison to the his operator RNA. We also added an in vivo pull-down experiment using tagged Hdp1, showing marked enrichment of his operator-regulated RNA transcripts, whereas the observed enrichment of the control thr RNA transcripts is substantially less.

    1. Author Response

      Reviewer #1 (Public Review):

      Thakkar et al describe the immune effects of 3rd and 4th doses of COVID-19 monovalent vaccines in a diverse cohort of immunocompromised cancer patients. They describe augmentation of anti-Spike antibodies after dose 3, especially seroconversion in 57% of patients, followed by a durable response over six months. The fourth dose was associated with increased anti-Spike antibodies in 67% of patients. T-cell responses were seen in 74% and 94% of patients after the third and fourth doses respectively. Strikingly, neutralization of Omicron was absent in all patients after the third dose but increased to 33% after the fourth dose.

      Strengths:

      Diverse cohort (34% Caucasian, 31% AA, 25% Hispanic, 8% Asian) including 106 cancer patients after dose 3, of which 47 patients were longitudinally assessed for six months, as well as eighteen patients assessed after the fourth dose. Seronegative as well as seropositive patients benefit from a third dose of vaccination. Assessment of cellular (T cell) immune responses and viral neutralization against wild-type as well as Omicron variant is commendable.

      Weaknesses:

      The efficacy of the bivalent vaccine (Omicron specific) is not studied here, since the fourth dose of vaccine was a monovalent vaccine. This should be clarified in the discussion.

      We have added text in the discussion section regarding this comment (lines 470-472)

      “The bivalent COVID-19 vaccine was introduced after the enrollment for our study was closed however it is reassuring to see that the bivalent vaccine has better neutralization activity against Omicron sub-variants”

      The authors describe an increase in anti-S titers after monoclonal antibodies. Were any of the patients receiving IVIG, and what was the effect, if any on Anti-S antibodies? Characteristics of breakthrough infections, particularly if they had prolonged duration, would be important to include.

      We have added text in the results section for IVIG (lines 382-383) and characteristics of breakthrough infections (lines 341-344)

      “No patients were on intravenous immunoglobulin (IVIG) at the time of study participation”

      “Of the 4 breakthrough infections, 1 patient had no symptoms, and 3 had mild symptoms”

      Reviewer #2 (Public Review):

      In this manuscript, Thakkar and colleagues evaluate the immunogenicity of 3rd and 4th doses of SARS-CoV2 vaccinations in patients with cancer. The authors find that additional vaccine doses are able to seroconvert a subset of patients and that antibody levels correlate with T-cell responses and viral neutralization.

      The main strengths of this manuscript are:

      1) The authors systemically performed a broad array of immunological assessments, including assessments of antibody levels, T cell activity, and neutralization assays, in a large cohort of patients with cancer receiving 3rd and 4th doses of COVID vaccines.

      2) The authors recruited an ethnically diverse cohort of patients with diverse cancer types, though enrolled participants were enriched for hematological malignancies.

      3) Prior to FDA/CDC guidance supporting a 4th vaccine dose, the authors recruited participants with no or inadequate responses into a prospective clinical trial of a 4th dose, the results of which are outlined here.

      4) The authors' findings that patients with hematologic malignancies and those receiving anti-CD20/BTK inhibitors have lower immunological responses to SARS-CoV-2 vaccines are consistent with multiple prior studies, including prior studies from these authors.

      5) The authors also find that 3rd and 4th COVID vaccine doses are able to seroconvert a subset of patients with no or "inadequate" responses, though it's unclear whether seroconversion is enough for true protection from SARS-CoV-2 infection.

      The main weaknesses of the manuscript include:

      1) The study cohorts disproportionately enrolled patients with hematological malignancies who have been previously shown to mount lower immunological responses to COVID-19 vaccines; thus, the findings may not be representative of a typical oncology patient population.

      We have clarified this in the discussion (lines 465-466)

      “However, caution should be exercised in generalizing these results to the broader immunosuppressed population given the small sample size of our cohort and the disproportionately high representation of hematologic malignancy patients”

      2) The subgroup analyses were relatively small.

      The discussion text in lines 464-465 is in concordance with this observation

      “However, caution should be exercised in generalizing these results to the broader immunosuppressed population given the small sample size of our cohort and the disproportionately high representation of hematologic malignancy patients”

      3) The nomenclature used in the manuscript was confusing when it came to "baseline" assessments and boosters versus additional doses of vaccines.

      We have clarified the nomenclature throughout the manuscript

      4) Ultimately, the major limitation of this manuscript is that antibody levels/T-cell responses/neutralization are surrogates for immune protection against SARS-CoV-2, but it's unclear what defines the ideal cutoffs for protection. Simply seroconverting may still be insufficient. The authors don't provide data showing antibody levels as relates to breakthrough infection, likely because they are underpowered for this analysis.

      We have added text to expand on this further (lines 475-482)

      “Further efforts are also needed to better determine cut-off values at which anti-S antibody levels provide protection from symptomatic COVID-19. At the present time, this data exists only for neutralizing antibody titers[36, 44] and the commercially available anti-S antibody assays are quite heterogenous with efforts being made to improve equivalency in titer reporting[45]. Our study while providing a correlation between anti-S antibody titer and neutralizing antibody titer supports that the higher the titer, the better neutralization is expected and by extrapolation, less likelihood of symptomatic infection however this needs to be confirmed in larger, systematic studies”.

    1. Author Response

      Reviewer #3 (Public Review):

      Zhang, Q. et al. developed a two-photon fluorescence microscope (2PFM) by incorporating direct wavefront sensing adaptive optics (AO), which is optimized for mouse in vivo retinal imaging. By using the same 2PFM with the option of using or not using the incorporated AO system, this team compared the in vivo retinal images and convincingly demonstrated that AO correction acquired brighter and higher resolution images of retinal ganglion cells (RGCs) and their axons in both densely and sparsely labeled transgenic mouse lines, normal and defective capillary vasculature, and RGC spontaneous activities detected by a genetic Ca2+ sensor. Interestingly and importantly, this team found that a global correction by removing the common aberration from the entire FOV enhances imaging signals throughout the entire large FOV, indicating a preferable AO imaging strategy for large FOVs. The potential applications of the in vivo retinal imaging techniques and strategies developed by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal vasculatures and neurons during disease progression and before and after treatments. It would be beneficial to the manuscript and the readers if the authors could elaborate a little more on the optical design. For example, does the incorporation of AO adversely affect the 2PFM optical design? If the 2PFM were further optimized with an uncompromised optical design without incorporating AO, would the quality of its in vivo images be comparable to that of the AO-2PFM?

      We thank the reviewer for these thoughtful questions.

      Whether the incorporation of AO adversely affects 2PFM optical design may be a matter of perspective. As we demonstrated in the retina and elsewhere, AO substantially improves the achievable spatial resolution. Its incorporation does not reduce the temporal resolution of the system, as the ocular aberrations are temporally stable in the anesthetized mouse due to the lack of eye movement and do not require repeated aberration measurements throughout the imaging session. Signal enhancement by AO can increase the frame rate by reducing pixel dwell time required to achieve desired signal-to-noise ratio (SNR). The deformable mirror used for wavefront correction has high reflectivity, thus does not reduce the power throughput of the 2PFM. Using similar lenses for conjugation of the AO path to those employed by the 2PFM itself, we also maintain the scanning field of view size.
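
      The link between signal enhancement and frame rate can be illustrated with a minimal sketch. It assumes shot-noise-limited detection, where the SNR of a pixel is approximately the square root of the number of detected photons; the function name `dwell_time_for_snr` and all numerical values are hypothetical and chosen only for illustration, not taken from the manuscript.

```python
def dwell_time_for_snr(target_snr: float, photon_rate_hz: float) -> float:
    """Pixel dwell time (s) needed to reach target_snr.

    Assumption: shot-noise-limited detection, so SNR = sqrt(N),
    where N = photon_rate_hz * dwell_time is the photon count per pixel.
    """
    if target_snr <= 0 or photon_rate_hz <= 0:
        raise ValueError("target_snr and photon_rate_hz must be positive")
    return target_snr ** 2 / photon_rate_hz


# Hypothetical values, for illustration only.
target_snr = 5.0         # desired per-pixel SNR
rate_without_ao = 1.0e6  # detected photons per second without AO
signal_gain = 3.0        # assumed signal enhancement from AO

dwell_without_ao = dwell_time_for_snr(target_snr, rate_without_ao)
dwell_with_ao = dwell_time_for_snr(target_snr, rate_without_ao * signal_gain)

# Under this model the dwell time shrinks by the same factor as the signal gain.
speedup = dwell_without_ao / dwell_with_ao
print(f"dwell without AO: {dwell_without_ao:.2e} s")
print(f"dwell with AO:    {dwell_with_ao:.2e} s")
print(f"possible frame-rate gain: {speedup:.1f}x")
```

      Under this simplified model, the achievable frame-rate gain equals the signal gain; scanner overhead and detector noise, which are ignored here, would reduce it in practice.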

      However, the incorporation of AO, including the direct wavefront sensing module (the “L10-L11-SH-sensor” path in Fig. 1A) and the deformable mirror (together with a pair of lenses for optical conjugation), does increase the complexity of the imaging system. Maintaining the optimal performance of AO also requires advanced optical knowledge that may not be possessed by most biological users.

      For this reason, we carefully designed the 2PFM path for optimal imaging performance without AO, characterized its performance (“AO two-photon fluorescence microscope (AO-2PFM)” and “System correction” sections of Materials and Methods, Fig. S1), and optimized sample preparation including designing our own contact lens (“In vivo imaging” section of Materials and Methods, Fig. S2). Our efforts, which we believe to have led to the best possible performance of a 2PFM sans AO, allowed us to resolve retinal capillaries and cell bodies (in 2D) in vivo. Therefore, our 2PFM (sans AO) design and sample preparation procedure should benefit users who do not plan to implement AO.

      Hypothetically, if the ocular aberrations of all mouse eyes were similar, it would be possible to add a static corrective element to a conventional 2PFM to improve image resolution (in the same spirit as the non-prescription reading glasses for far-sighted human eyes). However, as shown in Fig. S6 (“Zernike decompositions and corrective wavefronts for all experiments”), ocular aberrations are variable. These variabilities may arise from alignment differences (e.g., different angles between the optical axis of the ocular optics and the optical axis of the 2PFM), which can be minimized by establishing a procedure to reproducibly position the eyes of different mice in similar ways. In this case, a static corrective element may be designed for substantial aberration reduction. However, the variations also arise from optical differences in the ages [1] or strains [2] of the mice. To have a 2PFM that always performs at the diffraction limit, an adaptive element as employed by AO is necessary to maintain optimal performance regardless of the specifics of the sample.

      References

      1. C. Cheng, J. Parreno, R. B. Nowak, S. K. Biswas, K. Wang, M. Hoshino, K. Uesugi, N. Yagi, J. A. Moncaster, W.-K. Lo, B. Pierscionek, and V. M. Fowler, "Age-related changes in eye lens biomechanics, morphology, refractive index and transparency," Aging (Albany NY) 11(24), 12497–12531 (2019).
      2. C. Tan, H. na Park, J. Light, K. Lacy, and M. Pardue, "Strain differences in mouse lens refractive indices when measured with OCT," Invest. Ophthalmol. Vis. Sci. 54(15), 1917 (2013).
    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript investigates the question of how polylysogeny impacts competition with a sensitive non-lysogen, and how this is shaped by phage resistance. This is an important and timely question, as lysogeny can be a strategy to invade new niches, and prophages are important vehicles for the acquisition of a range of virulence factors by pathogens including Klebsiella. The authors use a polylysogenic Klebsiella clone in competition with a non-lysogen that is sensitive to at least some of the prophages produced by the polylysogen. They compete these strains over a 30-day period and measure host population dynamics and evolution of phage resistance and lysogenic conversion in the (initially) sensitive competitor. Overall, the experiment shows that lysogen formation is relatively rare and short-lived. Instead, phage resistance through complete loss of the capsule is the primary mechanism evolving, but other resistant capsule mutants, with more subtle mutations affecting capsule expression, emerge as well. The authors have collected a very impressive amount of data and made some very interesting observations.

      My main problem with this paper is that the manuscript lacks a clear narrative, making it very hard to extract the key message this paper wants to convey. Related to this, (some of) the conclusions that the authors make do not appear to be well supported by the data. For example, the authors conclude that selection favours more subtle capsule mutations because they are less costly than capsule-loss mutants (lines 497-500). However, there are no data to support this conclusion, as fitness costs of the various resistance phenotypes analysed were not measured. Apart from the genotypes, the data presented show that these subtle mutants have more subtle decreases in capsule production compared to the mutants that show a complete loss of capsule. But this does not tell us their relative cost. It also doesn’t tell us how the emergence of these different mutants relates to phage pressure, because whilst bacterial population dynamics data are monitored meticulously, phage dynamics data are missing (I have not found them in the supplemental information either). This makes it impossible to directly relate the emergence of the various resistance mechanisms to phage infection pressure during the coevolution experiment, even though this appears to be a hypothesis the authors wish to test.

      Overall I think the overarching question of the manuscript is important and the model system is a very relevant one to study this question, but in my view, the current data don’t support the conclusions of the paper. Apart from these criticisms, the manuscript is very well written and the figures are overall easy to interpret.

      We thank the reviewer for the critical assessment of our work and the time invested in the process. We have modified our manuscript following the recommendations, provided new data and we are convinced that our main results are now fully supported by the data.

      Reviewer #2 (Public Review):

      This manuscript presents data on multiple experiments regarding the co-evolution of poly-lysogenic and phage-susceptible Klebsiella pneumoniae strains. In particular, the manuscript aimed to determine the mechanisms of resistance that would shape bacterial competition over co-evolutionary timescales. The major finding is that the potential for lysogenization as a phage resistance mechanism is narrow and only likely to occur given certain circumstances. Moreover, the manuscript again reinforces the importance of receptor changes -initially loss, but modification in structure or expression over longer time scales- as a major mechanism of phage resistance that influences bacterial competition.

      Strengths

      A major strength of this manuscript is the care in designing experiments and conducting follow-up experiments to isolate the essential elements to support each of the conclusions. This includes using orthogonal methods such as sequencing and modeling to support or expand the findings from culturing and experimental evolution. The study features results that were beautifully replicated (e.g. Figure 3) lending confidence to the findings.

      Weaknesses

      Two weaknesses of the manuscript in its current form are: 1) a need to discuss other studies that also have found context-dependent results and 2) more focus on delivering the key overall "message" of the paper to the reader. Finally, not a weakness, but a (necessary) limitation is the study system, but this manuscript sets a bar for other groups to test in their systems to probe the generality of the findings.

      The support for the conclusions is compelling. The findings were counter to the initial expectation (lysogenization as a major feature) and the manuscript does an admirable job of supporting the unexpected conclusion with thorough experimental work, supplemented with modeling.

      This manuscript will be of great significance in microbial evolution, both for its implications in limiting the scope of lysogenization as a viable phage resistance mechanism in the long term and for its significant experimental rigor, particularly with regard to the co-evolutionary timescale studied. The study has very important implications for the evolution of antimicrobial resistance and phage therapy.

      We thank the reviewer for the time spent and enthusiasm towards our experimental set-up.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors conducted a thorough analysis of the correlation between height and measures of cognitive abilities (what are essentially IQ test components) across four cohorts of children and adolescents in the UK measured between 1957 and 2018. The authors find the strength of the association between height and cognitive measures declined over this time frame--for example, among 10- and 11-year-olds born in 1958, height explained roughly 3% of the variation in verbal reasoning scores; this dropped to approximately 0.6% among those born in 2001. These associations were further attenuated after accounting for proxy measures of social class.
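
      As a rough aid to reading these figures, the sketch below converts "variance explained" into the implied correlation magnitude. It assumes the percentages come from a one-predictor linear model, where variance explained equals the squared Pearson correlation; the function name is hypothetical and this is not the authors' code.

```python
import math


def r_from_variance_explained(r_squared: float) -> float:
    """Magnitude of Pearson r implied by R^2 in a one-predictor linear model.

    Assumption: the reported 'variance explained' is a simple-regression R^2.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie in [0, 1]")
    return math.sqrt(r_squared)


# Figures quoted in the review: ~3% (born 1958) and ~0.6% (born 2001).
r_1958 = r_from_variance_explained(0.03)
r_2001 = r_from_variance_explained(0.006)
fold_change = r_1958 / r_2001

print(f"implied |r| 1958: {r_1958:.3f}")
print(f"implied |r| 2001: {r_2001:.3f}")
print(f"fold change in |r|: {fold_change:.2f}")
```

      Under that assumption, a five-fold drop in variance explained corresponds to a drop of a little over two-fold in the correlation itself.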

      The authors' analyses were performed carefully and their observations regarding declining height / cognitive measure associations are likely to be robust if we interpret their results with an important caveat: these results reflect measurements aimed at assessing cognition rather than cognition itself. The importance of this distinction is evidenced by the changing correlation structure of the cognitive measures over time. For example, age 11 verbal / math scores were correlated at >= 0.75 at the first two time points but dropped to 0.33 at the most recent time point. Similar patterns are present for the other cognitive measures and time points. The authors conclude that such changes are unlikely to impact their primary findings, but I'm less certain. For example, one interpretation of this finding is that older cognitive measures were simply worse at indexing distinct cognitive domains and instead reflected a combination of cognitive ability together with non-specific factors relating to opportunity, health, class, etc. Further, height was historically a stronger proxy for class and economic status than it is today (e.g., by capturing adequate nutritional intake, risk for childhood disease, etc.). Together, then, previously high height / cognitive measure correlations might reflect the fact that both phenotypes previously indexed socio-economic factors to a greater extent than they might today (which is still non-negligible).

      We agree that our results could in principle be explained by changes to the measures. We have provided further analysis to inform the likelihood of this suggestion and have expanded our discussion of this issue (Discussion, explanation of findings section; copied below).

      First, we conducted an additional sensitivity analysis, repeating our main analysis using cognition measures in which the number of response options was set to be the same for each test (the lowest common denominator across all cohorts). This was tested using two separate approaches: 1) reducing the number of categories to the same number in each cohort; and 2) picking a random sample of question items for each category. Our main findings were unchanged, as described in the “Additional and sensitivity analyses” section, Figs S20-S21.

      Regarding the suggestion that “high height / cognitive measure correlations might reflect the fact that both phenotypes previously indexed socio-economic factors to a greater extent than they might today” – we sought to account for this by adjustment for measured indicators of socioeconomic position, and found the trend remained after adjustment (Fig 1 panel 2). As in other observational studies, however, we cannot fully rule out the possibility of residual confounding (Discussion, Explanation of findings paragraph 2).

      “The multi-purpose and multidisciplinary cohorts used cognition tests which differed slightly in each cohort. It is therefore possible that differences in testing could have either: 1) entirely generated the pattern of results we observed, such that if identical tests were used the association between cognition and height would otherwise have been identical in each cohort; in contrast to previous findings which reported using identical tests20; or 2) biased our results, such that if identical tests were used the decline in association between cognition and height would have been less marked than we reported. While we cannot directly falsify this alternative hypothesis given our reliance on historical data sources, a number of lines of reasoning suggest that the first scenario is unlikely. First, our results were similar when using 4 different cognitive tests (spanning mathematical and verbal reasoning); any bias which generated the results we observed should be similarly present across all 4 tests. Other things being equal, one would expect that more discriminatory tests (i.e., those with a greater number of responses) would have higher accuracy and thus better index cognition. Our results were similar when the youngest cohort had similar numbers of unique scores in cognitive tests compared with the oldest cohort (Verbal @ 11 years: n=41 in 1946c, n=40 in 2001c) and fewer unique scores (Maths @ 7/11: n=51 in 1946c, n=21 in 2001c). Our results were also similar in sensitivity analyses in which the number of response options was set to be the same in each cohort. Higher random measurement error in the independent variable (cognition) would lead to weakened observed associations with the outcome (height),52 yet we do not a priori anticipate that such error was higher in younger cohorts across all tests in a manner that would have led to the correlation we observed. Ensuring comparability of exposure is a major challenge across such large timespans.
      Reassuringly, our results are consistent with those from a previous study which reported consistent tests being used (from 1939-1967).20 However, even seemingly identical tests require modification across time (e.g., for verbal reasoning/vocabulary there is typically a need to adapt question items due to societal and cultural changes over time in vocabulary and numerical use); further, changes to education such as increases in testing may have led to greater preparedness for and familiarity with testing than in the past, even where identical tests are used.

      Interestingly, we observed a marked reduction in the correlation between cognitive tests across time (e.g., between verbal and maths scores). This trend has been reported in previous studies53 54 and warrants future investigation; it is consistent with evidence that IQ gains across time seemingly differ by cognitive domain,45 potentially capturing differences across time in cognitive skill use and development in the population. Previous studies using three (1958-2001c) of the included cohorts have also reported changing associations between cognition (verbal test scores at 10/11 years) and other traits: a declining negative association with birth weight19 and a change in direction of association with maternal age (from negative to positive);55 each finding has plausible explanations based on changes across time in relevant societal phenomena (improved medical conditions19 and changes in parental characteristics,55 respectively), yet also cannot conclusively falsify the notion that differences in tests used influence the results obtained. In this paper, we used multiple tests and sensitivity analyses to attempt to address this.”
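
      The measurement-error argument in the quoted Discussion text can be illustrated with the classical attenuation formula, under which random error in a measure shrinks an observed correlation by the square root of that measure's reliability. The sketch below is illustrative only: the reliability values, sample size, and function names are assumptions, not quantities from the paper.

```python
import numpy as np


def attenuated_correlation(true_r: float, reliability_x: float,
                           reliability_y: float = 1.0) -> float:
    """Classical attenuation: r_obs = r_true * sqrt(rel_x * rel_y)."""
    return true_r * np.sqrt(reliability_x * reliability_y)


def simulate_observed_r(true_r: float, reliability: float,
                        n: int = 200_000, seed: int = 0) -> float:
    """Simulate the correlation when one variable carries random error.

    'cognition' is the error-free score, 'height' is correlated with it at
    true_r, and 'observed' adds independent noise so that
    var(cognition) / var(observed) equals the requested reliability.
    """
    rng = np.random.default_rng(seed)
    cognition = rng.standard_normal(n)
    height = true_r * cognition + np.sqrt(1 - true_r**2) * rng.standard_normal(n)
    noise_sd = np.sqrt((1 - reliability) / reliability)
    observed = cognition + noise_sd * rng.standard_normal(n)
    return float(np.corrcoef(observed, height)[0, 1])


# Hypothetical values: a true correlation of 0.3 and a test reliability of 0.5.
expected = attenuated_correlation(0.3, 0.5)
simulated = simulate_observed_r(0.3, 0.5)
print(f"formula: {expected:.3f}, simulation: {simulated:.3f}")
```

      For measurement error alone to account for a halving of the correlation under this model, the reliability of the newer tests would need to be roughly a quarter of that of the older ones.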

      Additionally, their findings add an interesting data point to a collection of recent results suggesting that the relationship between cognitive and anthropometric measures is complex and difficult to interpret. For example, studies using genetic markers to examine shared genetic bases have virtually all relied on methods assuming mating is random, which is not the case empirically. Howe et al. (doi.org/10.1038/s41588-022-01062-7) recently reported that the ostensible genetic correlation of -.32 between years of education and BMI attenuates to -.05 when using direct-effect estimates, which should theoretically be immune to the effects of non-random mating and other confounding variables. Likewise, Keller et al. (doi.org/10.1371/journal.pgen.1003451) and Border et al. (doi.org/10.1101/2022.03.21.485215) used very different approaches to arrive at the same conclusion that ~50% of the nominal genetic correlation between IQ and height could be attributed to bivariate assortative mating rather than shared causal biological factors. Given that assortative mating on both IQ measures and height involves many other traits (not just two as assumed in such bivariate models), the true extent to which height / IQ correlations reflect causal factors is plausibly even lower than these estimates suggest. For these reasons, I do not entirely agree with the authors' review of previous findings in the introduction, where they write "recent studies have suggested that links between higher cognition and taller height can be largely explained by genetic factors", though it is certainly true that this claim has been made.

      We have revised our introduction to better reflect the complexity of previous findings and to qualify this claim.

      Reviewer #2 (Public Review):

      The authors use birth cohorts with extensive cognitive assessments and height measurements along with data on parental height and socioeconomic status. The authors estimate that the correlation between height and cognitive ability has approximately halved in the last 60 years.

      Quantile regression results suggest that this is due to a stronger association between low cognitive ability and short stature in older cohorts, potentially due to environmental factors that cause both and that have been removed by improvements in the environment in the last 60 years.

      While this is a plausible hypothesis, the evidence presented in the manuscript is unable to rule out alternative hypotheses, such as changes in assortative mating.

      The results in the manuscript will be of interest to researchers investigating how genetics and environment lead to correlations between cognitive and physical/health traits, and to researchers interested in the relationship between social and health inequalities.

      While my sense of the evidence presented is that there is fairly solid statistical evidence for a trend where the correlation between cognitive ability and height declines over time, there is no formal quantification of this trend nor measurement of the uncertainty in the trend.

      We now include additional statistical tests to compare estimates in each cohort (Fig S6). We have opted to include this in supplemental material given the large number of tests included already.
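
      One common way to compare correlations from independent cohorts is Fisher's r-to-z transformation. The sketch below shows that test as an illustration only; the response does not state which tests appear in Fig S6, and the correlations and sample sizes used here are hypothetical.

```python
import math


def compare_independent_correlations(r1: float, n1: int,
                                     r2: float, n2: int) -> tuple[float, float]:
    """Fisher r-to-z test for the difference between two independent correlations.

    Returns (z statistic, two-sided p-value).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical inputs: r = 0.17 in an older cohort, r = 0.08 in a newer one,
# with 10,000 participants each.
z_stat, p_value = compare_independent_correlations(0.17, 10_000, 0.08, 10_000)
print(f"z = {z_stat:.2f}, p = {p_value:.2e}")
```

      The same transformation gives a confidence interval for each cohort's correlation, which is one way to express the uncertainty in the trend that the reviewer asks about.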

      Similarly, the quantile regression plots in Figure 2 appear to show a trend across the height deciles for the two oldest cohorts, but no quantification of how strong this is nor what uncertainty exists is calculated. Furthermore, if the apparent trend in the quantile regression plots is true, wouldn't this imply a non-linear association between height and cognitive ability for the older cohorts? Can this be seen in the scatterplots or in a non-linear regression?

      We included 95% confidence intervals in our quantile regression analyses, which provide an indication of uncertainty. Given the substantial number of analyses already included (across 4 historical cohorts and 4 cognition tests; 23 supplemental results), we believe that additional statistical exploration of both quantile regression and non-linear associations would be best undertaken in further work. We would be happy to reconsider this if requested.

      I think the authors could have done more with their data to investigate the contribution of assortative mating to the observed trend. Looking at Figure S4, it looks like the correlation between mother's education and father's height in the 2001 cohort is substantially lower than for previous cohorts. While cognitive ability may not be available for parents, one could look at, for example, father's education and mother's height across the cohorts and see if there is a downward trend in correlation.

      We now include in Figure S5 a cross-cohort investigation of the correlation between parental height and maternal education. We find that the correlation is similar across 1946c, 1958c, and 1970c, yet is weaker in 2001c (Fig S5). We comment on this in the paper (see revised discussion, explanation of findings section). Interpretation of these results is complicated by measurement error in parental education (typically reported for both parents by mothers). Interpretation may be further complicated by reductions in the socioeconomic patterning of height across time (see https://www.thelancet.com/journals/lanpub/article/PIIS2468-2667(18)30045-8/fulltext). Future work which focuses on assortative mating could investigate these issues.

      Reviewer #3 (Public Review):

      A difficulty with the paper is the different cognitive tests used in the different cohorts; the authors address this at some length in the discussion. However, I am afraid that this matter makes the results hard or impossible to interpret along the lines of their research question. One would need to know that, if these cognitive tests were administered in a single cohort at one time, they would have the same correlation with height.

      Please see our responses to Reviewer 1 and our revised Discussion. We are reliant upon imperfect historical data to make inferences on long-run trends, in the absence of ideal data for this paper (e.g., the same tests used in all cohorts born in 1946, 1958, 1970 and the millennium; even in this instance some changes would be required, e.g., to the words chosen in verbal reasoning tasks; see Discussion, explanation of findings section).

      I judge that the main limitation of the method is the fact that different cognitive tests are used in the different cohorts. The tests in themselves are valid tests of cognitive functions. However, given that the focus of the study is on the change in correlations across time, then it is a worry that the tests are different; that is, the authors have the burden of proving to us that, if the environmental/social changes had NOT been operative across time, then the height-cognitive test correlations would be the same. What can the authors do to prove to us that if, say, all of these different-cohort verbal tests had been given to a single cohort on a single occasion, then they would have the same correlations with height? The same goes for the mathematics based tests. I note the tests' somewhat different distributions in Figure 1, but that is not the only thing that could lead to different correlations with, say, height. I am aware that all cognitive tests tend to correlate positively and that they all have loadings on general intelligence; however, different tests will not necessarily have the same correlations with outside variables (e.g. height). This will depend on things such as their content, their reliability/internal consistency etc.

      In the Results the authors state: "Cognitive test scores were strongly-moderately positively correlated with each other, with the size of the correlation weakening across time." That's true, but perhaps also a major concern for this study. One possible reason for the decline in verbal-maths test correlations across cohorts (old to recent) is that the nature of these tests has changed across time, either/both in terms of content (what capabilities are assessed) or something such as reliability/internal consistency/ceiling-or-floor effects (how well the capabilities are assessed). That is, given that the height-cognitive test correlations show a similarly declining pattern of correlations over cohorts, it could be that the differing content of the tests is partly or wholly responsible. I raise that as a possibility only, and I appreciate that it might be correct, as the authors prefer, that there is an inherent lowering of intelligence-height correlations over time, but I do not think that one can rule out, with the present study's design, that it might have been due to the change in tests. For example, a reading-math correlation of 0.74 in 1946 lowered to a correlation of .32 in 2001, in the face of different tests. To show that this is not due to the different tests being used would require more information. If this is a true result, it is big news.

      Please see our responses to Reviewer 1. This includes additional analysis and an expanded discussion of this possible cause of bias. We hope our manuscript now provides further evidence and discussion to inform the likelihood of this possibility.

      I have a suggestion: if the authors wish to rule out the possibility that the lowering intelligence-height correlations across cohorts are due to different cognitive tests being used, they should take all the cognitive tests used here and apply them cross-sectionally to single-year-born samples (of 11- and 16-year olds) that have also been measured for height. If the cognitive tests all correlate at the same level with height within each of these two samples (they needn't do so across the 11- and 16-year olds), then one could proceed more safely with between-cohorts (1946, 1958, 1970, 2001) comparisons of the correlations.

      We thank the reviewer for this suggestion. However, we are unsure whether we have understood the suggested analysis or whether it is tractable given our data: the cohorts we used were born in 1946, 1958, 1970, or around 2000. We do not have cross-sectional samples of 11- and 16-year-olds measured at the same time.

    1. Author Response:

      Dear eLife Editorial Board, dear reviewers, dear readers,

      We very much thank the eLife editors and reviewers for their overall very positive review and encouraging assessment of our manuscript, and for highlighting our study’s innovation and relevance for using genomic approaches for the conservation of biodiversity.

      We very much thank the reviewers for pointing out parts of the manuscript that could be described more clearly or in more detail to make the study fully reproducible, and have therefore rewritten parts of the manuscript. We importantly follow reviewer 1’s specific recommendation to focus the main text on clearly understandable results, and therefore now only showcase the application of selective nanopore sequencing (aka adaptive sampling) to one soil sample, which we hope will make the flow of the manuscript easier to understand.

      We further agree that parts of the study could have been conducted more extensively (e.g. by including more samples and thereby showcasing the broad applicability of the approach), which was unfortunately not feasible because I, the lead author, left New Zealand to take up another position abroad. We are, however, following up on this work with another controlled large-scale study.

      We further agree that both qPCR and metabarcoding have their advantages and disadvantages. Metabarcoding approaches, however, importantly deliver more information about the biodiversity of a location than just the presence of a single species; this, in our case, includes other endangered species and evidence of kākāpō predators. We further show that the chosen marker gene region (12S rRNA) is species-specific enough to distinguish kākāpō from its two closest relatives. While qPCR has been shown to be more sensitive for some species, the difference is often minimal (see e.g., Harper et al., Ecol Evol. 2018 Jun; 8(12): 6330–6341), and for some species metabarcoding has been shown to be equally sensitive (Schneider et al., PLoS ONE 2016, 11, e0162493). qPCR approaches further require the careful design of species-specific primers, and therefore access to samples and DNA of the target species and of closely related species, none of which is necessarily at hand, especially not for conservationists who want to use these approaches regularly in the future, and in countries like New Zealand where genomic work with material from any “treasured” species has to be approved in a long and detailed process according to national regulations and the Nagoya Protocol. Given all these reasons, and the generally good performance of our metabarcoding approach (also in detecting our species of interest), we do not see the necessity of applying a qPCR approach in this study.

      To avoid any confusion, we now also describe the sampling sites in more detail and use their labels consistently throughout the manuscript. Briefly, each site was sampled directly at the site and at 4 m and 20 m distance, all in replicates, as described in detail in the manuscript. Specifically, the “abandoned nests” had only been abandoned ~30 days before sampling, as described in the Methods, and this is why kākāpō DNA is still present.

      We further thank reviewer 2 for suggesting that we discuss the impact of selective nanopore sequencing on pore efficiency in more depth, and have added a corresponding sentence to the Discussion. More generally, we have added more references and broader scientific context to the Discussion.

      Thank you again for this very helpful review of our work.

      With best regards,
      Lara Urban

    1. Author Response:

      We are grateful for the detailed feedback provided by the two anonymous reviewers. We provide a point-by-point response to their reviews below:

      Reviewer #1 (Public Review):

      Medwig-Kinney et al perform the latest in a series of studies unraveling the genetic and physical mechanisms involved in the formation of the C. elegans gonad. They have paid particular attention to how two different cell fates are specified, the ventral uterine (VU) or anchor cell (AC), and the behaviors of these two cell types. This cell fate choice is interesting because the anchor cell performs an invasive migration through a basement membrane, a process that is required for correct C. elegans gonad formation and that can act as a model for other invasive processes, such as malignant cancer progression. The authors have identified a range of genes that are involved in the AC/VU fate choice, and that impart the AC with its ability to arrest the cell cycle and perform an invasive migration. Taking advantage of a range of genetic tools, the authors show that the transcription factor NHR-67 is strongly expressed in the AC. The authors also present evidence that NHR-67 could function as a transcriptional repressor through interactions with a Groucho and also a TCF homolog, and they also suggest that these proteins are forming repressive condensates through phase separation.

      The authors have produced an extensive dataset to support their two primary claims: that NHR-67 expression levels determine whether a cell is invasive or proliferative, and also that NHR-67 forms a repressive complex through interactions with other proteins. The authors should be commended for clearly and honestly conveying what is already known in this area of study with exhaustive references. But absent data unambiguously linking the formation and dissolution of NHR-67 condensates with the activation of downstream genes that NHR-67 is actively repressing, the novelty of these findings is limited.

      Response 1.1: We thank the reviewer for recognizing the extensive dataset we provide in this manuscript in support of our claims that, (1) NHR-67 expression levels are important for distinguishing between AC and VU cell fates, and (2) NHR-67 interacts with transcriptional repressors in VU cells. We acknowledge that a complete mechanistic understanding of the functional significance of NHR-67 puncta is not possible without knowing direct targets of NHR-67 in the AC. Unfortunately, tools to identify transcriptional targets in individual cells or lineages in C. elegans do not exist, and generation of such tools would be beyond the scope of this work. This is evidenced by the fact that the first successful attempt to transcriptionally profile the AC was only posted as a preprint one month ago (Costa et al., doi: 10.1101/2022.12.28.522136). It is our hope that the findings we present here can be integrated with future AC- and VU-specific profiling efforts to provide a more complete picture of the functional significance of NHR-67 subnuclear organization.

      Reviewer #2 (Public Review):

      Medwig-Kinney et al. explore the role of the transcription factor NHR-67 in distinguishing between AC and VU cell identity in the C. elegans gonad. NHR-67 is expressed at high levels in AC cells where it induces G1 arrest, a requirement for the AC fate invasion program (Matus et al., 2015). NHR-67 is also present at low levels in the non-invasive VU cells and, in this new study, the authors suggest a role for this residual NHR-67 in maintaining VU cell fate. What this new role entails, however, is not clear. The model in Figure 7E shows NHR-67 switching from a transcriptional activator in ACs to a transcriptional repressor in VUs by virtue of recruiting transcriptional repressors. In this model, NHR-67 actively suppresses AC differentiation in VU cells by binding to its normal targets and acting as a repressor rather than an activator. Elsewhere in the text, however, the authors suggest that NHR-67 is "post-translationally sequestered" (line 450) in nuclear condensates in VU cells. In that model, the low levels of NHR-67 in VU cells are not functional because they are inactivated by sequestration in condensates away from DNA. Neither model is fully supported by the data, which may explain why the authors seem to imply both possibilities. This uncertainty is confusing and prevents the paper from arriving at a compelling conclusion. What is the function, if any, of NHR-67 and so-called "repressive condensates" in VU cells?

      Response 2.1: As the reviewer correctly notes, we present two possible models in this manuscript. The interaction between NHR-67 and the Groucho/TCF complex in the VU cells could (1) switch the role of NHR-67 from a transcriptional activator to a transcriptional repressor, or (2) sequester NHR-67 away from its transcriptional targets. Indeed, we cannot definitively exclude the possibility of either model. In our resubmission, we will attempt to make this more clear in the text and by presenting both possible models in the summary figure (Fig. 7E).

      Below we list problems with data interpretation and key missing experiments:

      1) The authors report that NHR-67 forms "repressive condensates" (aka. puncta) in the nuclei of VU cells and imply that these condensates prevent VU cells from becoming ACs. Fig. 3A, however, shows an example of an AC that also assembles NHR-67 puncta (these are less obvious simply due to the higher levels of NHR-67 in ACs). The presence of NHR-67 puncta in the AC seems to directly contradict the authors' assumption that the puncta repress the AC fate program. Similarly, Figure 5-figure supplement 1A shows that UNC-37 and LSY-22 also form puncta in ACs. The authors need to analyze both AC and VU cells to demonstrate that NHR-67 puncta only form in VUs, as implied by their model.

      Response 2.2: The puncta formed by NHR-67 in the AC differ in appearance from those observed in the VU cells and furthermore do not exhibit strong colocalization with those of UNC-37 or LSY-22. The Manders’ overlap coefficient between NHR-67 and UNC-37 is 0.181 in the AC, whereas it is 0.686 in the VU cells. Likewise, the Manders’ overlap coefficient between NHR-67 and LSY-22 is 0.189 in the AC compared to 0.741 in the VU cells. We speculate that the areas of NHR-67 subnuclear enrichment in the AC may represent concentration around transcriptional targets, but testing this would require knowledge of direct targets of NHR-67.
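
      For readers unfamiliar with the metric, the sketch below computes the standard Manders' overlap coefficient for two intensity images: the sum of pixel-wise products divided by the square root of the product of the summed squared intensities. It is a minimal illustration; the response does not describe the authors' preprocessing (background subtraction, thresholding, region of interest), so none is applied here, and the function name is hypothetical.

```python
import numpy as np


def manders_overlap(channel_a, channel_b) -> float:
    """Manders' overlap coefficient for two same-shaped intensity images.

    R = sum(a * b) / sqrt(sum(a**2) * sum(b**2)); for non-negative
    intensities it ranges from 0 (no overlap) to 1 (proportional signal).
    """
    a = np.asarray(channel_a, dtype=float).ravel()
    b = np.asarray(channel_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("channels must have the same shape")
    denom = np.sqrt(np.sum(a**2) * np.sum(b**2))
    if denom == 0:
        return float("nan")  # undefined when either channel is empty
    return float(np.sum(a * b) / denom)


# Toy 2x2 images (illustrative values only).
spot = np.array([[0.0, 5.0], [0.0, 0.0]])
same_place = np.array([[0.0, 2.0], [0.0, 0.0]])
elsewhere = np.array([[3.0, 0.0], [0.0, 0.0]])

print(manders_overlap(spot, same_place))
print(manders_overlap(spot, elsewhere))
```

      Because the coefficient is insensitive to a uniform scaling of either channel, differences in overall NHR-67 level between the AC and VU cells would not by themselves change it, although background signal can.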

      2) While a pool of NHR-67 localizes to "repressive condensates", it appears that a substantial portion of NHR-67 also exists diffusively in the nucleoplasm. This would appear to contradict a "sequestration model" since, for such a model to work, a majority of NHR-67 should be in puncta. What proportion of NHR-67 is in puncta? Is the concentration of NHR-67 in the nucleoplasm lower in VUs compared to ACs and does this depend on the puncta?

      Response 2.3: The proportion of NHR-67 localizing to puncta versus the nucleoplasm is dynamic, as these puncta form and dissolve over the course of the cell cycle. However, we estimate that approximately 25-40% of NHR-67 protein resides in puncta based on segmentation and quantification of fluorescence intensity of sum Z-projections. We also measured NHR-67 concentration in the nucleoplasm of VU cells and found that it is only 28% of what is observed in ACs (n = 10). We disagree with the notion that the majority of NHR-67 protein should be located in puncta to support the sequestration model. As one example, previously published work examining phase separation of endogenous YAP shows that it is present in the nucleoplasm in addition to puncta (Cai et al., 2019, doi: 10.1038/s41556-019-0433-z). In our system, it is possible that the combination of transcriptional downregulation and partial sequestration away from DNA is sufficient to disrupt the normal activity of NHR-67.
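
      The kind of estimate described here can be sketched as follows: sum the stack along Z, restrict to the nucleus, and report the share of total intensity that falls in pixels classified as puncta. This is an illustration under stated assumptions, not the authors' pipeline: a fixed intensity threshold stands in for their unspecified segmentation, no background subtraction is applied, and the function and parameter names are hypothetical.

```python
import numpy as np


def puncta_intensity_fraction(stack, nucleus_mask, threshold: float) -> float:
    """Fraction of nuclear fluorescence located in above-threshold pixels.

    stack:        3-D array ordered (z, y, x)
    nucleus_mask: 2-D boolean array (y, x) marking the nucleus
    threshold:    intensity cut-off on the sum projection (assumed, not
                  taken from the manuscript)
    """
    projection = np.asarray(stack, dtype=float).sum(axis=0)  # sum Z-projection
    nuclear = projection[np.asarray(nucleus_mask, dtype=bool)]
    total = nuclear.sum()
    if total == 0:
        return float("nan")
    return float(nuclear[nuclear > threshold].sum() / total)


# Synthetic example: two Z-slices of uniform signal with one bright pixel.
stack = np.ones((2, 4, 4))
stack[:, 1, 1] = 10.0
mask = np.ones((4, 4), dtype=bool)

fraction = puncta_intensity_fraction(stack, mask, threshold=5.0)
print(f"fraction of signal in puncta: {fraction:.2f}")
```

      The result depends strongly on the threshold and on whether diffuse nucleoplasmic signal underneath each punctum is counted as puncta signal, so figures such as the quoted 25-40% range are best read alongside the segmentation rule used.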

      3) The authors do not report whether NHR-67, UNC-37, LSY-22, or POP-1 localization to puncta is interdependent, as implied in the model shown in Fig. 7.

      Response 2.4: It is difficult to test whether localization of these proteins to puncta is interdependent, as perturbation of UNC-37, LSY-22, or POP-1 results in ectopic ACs. Trying to determine whether loss of puncta results in VU-to-AC transdifferentiation or vice versa becomes a chicken-and-egg argument. It is also possible that UNC-37 and LSY-22 are at least partially redundant in this context. We based our model, shown in Fig. 7E, on known or predicted protein-protein interactions, which we confirmed through yeast two-hybrid analyses (Fig. 7D; Fig. 7-figure supplement 1).

      4) The evidence that the "repressor condensates" suppress AC fate in VUs is presented in Fig. 4D where the authors deplete the presumed repressor LSY-22. First, the authors do not examine whether NHR-67 forms puncta under these conditions. Second, the authors rely on a single marker (cdh-3p::mCherry::moeABD) to score AC fate: this marker shows weak expression in cells flanking one bright cell (presumably the AC) which the authors interpret as a VU-to-AC transformation. The authors, however, do not identify the cells that express the marker by lineage analyses and dismiss the possibility that the marker-positive cells could arise from the division of an AC-committed cell. Finally, the authors did not test whether marker expression was dependent on NHR-67, as predicted by the model shown in Fig. 7.

      Response 2.5: For the auxin-inducible degron experiments, strains contained labeled AID-tagged proteins, a labeled TIR1 transgene, and a labeled AC marker. Thus, we were limited by the number of fluorescent channels we could co-visualize and therefore could not also visualize NHR-67 (to assess for puncta formation) or another AC marker (such as LAG-2). We could have generated an AID-tagged LSY-22 strain without a fluorescent protein, but then we would not be able to quantify its depletion, which this reviewer points out is important to measure. We did visualize NHR-67::GFP expression following RNAi-induced knockdown of POP-1 and observed consistent loss of puncta in ectopic ACs. However, this again becomes a chicken-and-egg argument as to whether the cell fate change causes the loss of puncta or vice versa.

      5) Interaction between NHR-67 and UNC-37 is shown using Y2H, but not verified in vivo. Furthermore, the functional significance of the NHR-67/UNC-37 interaction is not tested.

      Response 2.6: We attempted to remove the intrinsically disordered region found at the C-terminus of the endogenous nhr-67 locus, using CRISPR/Cas9, as this would both confirm the NHR-67/UNC-37 interaction in vivo and allow us to determine the functional significance of this interaction. However, we were unable to recover a viable line after several attempts, suggesting that this region of the protein is vital.

      6) Throughout the manuscript, the authors do not use lineage analysis to confirm fate transformation as is the standard in the field.

      Response 2.7: The timing between AC/VU cell fate specification and AC invasion (the point at which we look for differentiated ACs) is approximately 10-12 hours at 25 °C. With our imaging setup, we are limited to approximately 3-4 hours of live-cell imaging. Therefore, lineage tracing was not feasible for our experiments. Instead, we relied on visualization of established markers of AC and VU cell fate to determine how ectopic ACs arose. In Fig. 6B,C we show that the expression of two AC markers (cdh-3 and lag-2) turns on while a VU marker (lag-1) is downregulated within the same cell. In our opinion, live-imaging experiments that show changes in cell fate in real time via reporters were the most definitive way to observe the phenotype.

      There are 4 multipotential gonadal cells with the potential to differentiate into VUs or ACs. Which ones contribute to the extra ACs in the different genetic backgrounds examined was not determined, which complicates interpretation. The authors should consider and test the following possibilities: disruption of NHR-67 regulation causes 1) extra pluripotent cells to directly become ACs early in development, 2) VU cells to gradually trans-fate to an AC-like fate after VU fate specification (as implied by the authors), or 3) an AC to undergo extra cell division(s). In Fig. 1F, 5 cells are designated as ACs, which is one more than the 4 precursors depicted in Fig. 1A, implying that some of the "ACs" were derived from progenitors that divided.

      Response 2.8: When trying to determine the source of the ectopic ACs, we considered the three possibilities noted by the reviewer: (1) misspecification of AC/VU precursors, (2) VU-to-AC transdifferentiation, or (3) proliferation of the AC. We eliminated option 3 as a possibility, as the ectopic ACs we observed here were invasive and all of our previous work has shown that proliferating ACs cannot invade and that cell cycle exit is necessary for invasion (Matus et al., 2015; Medwig-Kinney & Smith et al., 2020; Smith et al., 2022). Specifically, NHR-67 is upstream of the cyclin dependent kinase CKI-1 and we found that induced expression of NHR-67 resulted in slow growth and developmental arrest, likely because of inducing cell cycle exit. For our experiment using hsp::NHR-67, we induced heat shock after AC/VU specification. For POP-1 perturbation, we explicitly acknowledged that misspecification of the AC/VU precursors could also contribute to ectopic ACs (Fig. 6A; lines 364-402). We could not achieve robust protein depletion through delayed RNAi treatment, so instead we utilized timelapse microscopy and quantification of AC and VU cell markers (Fig. 6B,C; see response 2.7 above).

      In conclusion, while the authors report on interesting observations, in particular the co-localization of NHR-67 with UNC-37/Groucho and POP-1 in nuclear puncta, the functional significance of these observations remains unclear. The authors have not demonstrated that the "repressive condensates" are functional and play a role in the suppression of AC fate in VU cells as claimed. The colocalization data suggest that NHR-67 interacts with repressors, but additional experiments are needed to demonstrate that these interactions are specific to VUs, impact VU fate, and sequester NHR-67 from its targets or transform NHR-67 into a transcriptional repressor.

      Response 2.9: We agree that, at this time, we cannot pinpoint the precise mechanism through which NHR-67 puncta function (i.e., by sequestering NHR-67 from DNA or switching the role of NHR-67 from activating to repressing). However, identification of NHR-67 puncta and their colocalization with UNC-37, LSY-22, and POP-1 in VU cells allowed us to discover an undescribed role for the Groucho/TCF complex in maintaining VU cell fate. This, combined with our evidence demonstrating that NHR-67 transcriptional regulation is important for distinguishing between AC and VU cell fate, are the main contributions of our study.

    1. Author Response:

      Reviewer #1 (Public Review):

      Vaparanta et al propose a new bioinformatic algorithm for pathway discovery from multi-omics data sources at one time point, and validate some of their algorithm's predictions using functional experiments. The authors should be commended for their detailed experimental work and comprehensive data collection around TYRO3 signaling in melanoma, which will likely be of value to that field. They also provide a mature software package that is well documented for implementing their bioinformatic methods. The reviewer's experience with the software was that it is computationally efficient/fast with well written code. The biological data (both multiomics and functional validation studies) will be of interest to melanoma research as well as scientists interested in TYRO3 signaling.

      The authors wish to thank the Reviewer for the positive comments.

      At this time, however, the bioinformatics algorithm proposed is of unclear utility to the broader multiomics community for the following reasons:

      First, the algorithm itself has numerous hyperparameters, which can make it challenging to use and potentially highly sensitive to these user inputs. Just the regulatory complex inference step has 10 hyperparameters/settings required to be selected.

      We have now reduced the number of parameters in the code by automating the choice for 2 of the parameters. The manuscript is now accompanied by a sensitivity analysis on all the key parameters in the code (new Supplementary Figures 5-11) and we have created a script to inform the choice of the key parameter S (suggest parameter S value for regulatory complex inference, new Supplementary Figure 10). We have additionally thoroughly revised the accompanying documentation to help the user choose the right settings for their datasets (available in Mendeley data: https://data.mendeley.com/datasets/m3zggn6xx9/draft?a=71c29dac-714e-497e-8109-5c324ac43ac3).

      Second, the algorithm is presented in an ad hoc manner without mathematical/statistical justifications of the many design decisions and steps in the analysis. For example, the authors write "The inference of regulatory complexes from the combined score follows the nearest neighbor principle, assuming that while a single high combined score can be random chance, the combination of combined scores between 3 cell signaling molecules would be predictive". It is mathematically unclear that this is true…

      We have now tested the effect of the design decisions of the algorithm on the ability to discover known associations in omics datasets (new Supplementary Figure 4). Adhering to the design decisions of the algorithm greatly increases the number of known associations found in real omics data.

      …and thus this reviewer attempted to test the algorithm using simulated uncorrelated Gaussian noise (see code/outputs at end of the review) in 10K genes and 10 samples using a best attempt at hyperparameter selection per the code comments and documentation. It appears that nearly 1/3 of all genes (i.e., 3205 of 10K) were erroneously grouped into complexes (assuming no mistakes in reviewer's usage of the code). In general, "unbiased" pathway analysis in multiomics that is not relying on prior knowledge will require solving the extraordinarily challenging task of estimating a very large covariance matrix from statistically small sample sizes. This puts the method at high risk of producing spurious results.

      The Reviewer raises an important topic that should be considered in de novo analyses. However, the test dataset the reviewer used is not truly representative of the omics datasets that should be used to evaluate the performance of the algorithm. First, the algorithm should be used only with positive expression values due to the way the stoichiometry score is calculated. This is now more clearly indicated in the accompanying documentation (available in Mendeley data: https://data.mendeley.com/datasets/m3zggn6xx9/draft?a=71c29dac-714e-497e-8109-5c324ac43ac3). The Gaussian noise used by the reviewer does not represent the positive expression values found in omics datasets.

      Second, the algorithm is constructed such that it will try to find an association for every feature in the dataset if so instructed by the parameters. To this end, we have now added a new parameter (parameter S) to the algorithm to better control this setting. If correctly used on the test dataset used by the reviewer, the algorithm now returns 0 complexes. The authors also wish to point out that they strongly believe that the number of features that have no real association with other features in real omics data is very low, since most intracellular molecules have common upstream regulators. This poses a problem only if the dataset has a very limited number of features.

      Third, it seems to the authors that instead of testing the limits of the algorithm with totally randomized data, it would be more valuable to assess whether the algorithm can find true positives among randomized data. To this end, we estimated the true positive and false positive rates with normally, negative binomial and beta distributed simulated data (new Supplementary Figures 7-9). Indeed, the algorithm can discover only the true positives among the false positives as long as the S parameter is not set too low. We now provide a separate script (suggest parameter S value for regulatory complex inference, new Supplementary Figure 10) that will help the user to choose the parameter S for their data so that the number of false positives in the inference is minimized.

      Fourth, data drawn from the standard normal distribution have a relatively low variance: 68% of values fall between -1 and 1, and 95% between -2 and 2. If you simulate 10000 random rows with a sample size of 10 from such a low-variance distribution, there is a high chance of creating highly correlated rows that would actually be representative of true positives in the dataset, given the generally high variation within omics data. Therefore, it is exceedingly hard to interpret whether the features were erroneously assigned into complexes or not, because the chosen simulation method could have by chance created associations that represent true positives in the dataset.
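
      The two numerical claims in this point can be checked with a short, standard-library-only sketch. It is an illustration under stated assumptions, not the code used by either the reviewer or the authors: the number of rows (300 instead of 10000, to keep the pairwise loop fast), the correlation threshold of 0.8, and the random seed are all arbitrary choices made here.

```python
import math
import random


def standard_normal_mass(k: float) -> float:
    """Probability that a standard normal value lies within [-k, k]."""
    return math.erf(k / math.sqrt(2.0))


def standardize(row):
    """Center a row and scale it to unit length, so dot products equal Pearson r."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def fraction_strongly_correlated(n_rows=300, n_samples=10, threshold=0.8, seed=0):
    """Fraction of row pairs of pure noise whose |Pearson r| exceeds the threshold.

    n_rows, threshold and seed are illustrative assumptions, not values from the review.
    """
    rng = random.Random(seed)
    rows = [
        standardize([rng.gauss(0.0, 1.0) for _ in range(n_samples)])
        for _ in range(n_rows)
    ]
    hits = 0
    pairs = 0
    for i in range(n_rows):
        for j in range(i + 1, n_rows):
            r = sum(a * b for a, b in zip(rows[i], rows[j]))
            pairs += 1
            if abs(r) > threshold:
                hits += 1
    return hits / pairs


if __name__ == "__main__":
    print(f"mass within 1 SD: {standard_normal_mass(1.0):.4f}")
    print(f"mass within 2 SD: {standard_normal_mass(2.0):.4f}")
    print(f"fraction of noise pairs with |r| > 0.8: {fraction_strongly_correlated():.4f}")
```

      With only 10 samples per row, a small but non-zero fraction of unrelated row pairs exceeds a high correlation threshold, so among 10000 rows each row is expected to have some strongly correlated partners by chance alone. Whether such pairs should be read as spurious or as stand-ins for true positives is exactly the point under discussion.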

      Fifth, we also analyzed the standard normal distributed simulated data with WGCNA, which is still the most widely used module discovery method. WGCNA assigned almost all the features into modules. However, we think it is clear from its wide use that the analysis can still offer valuable insight into biological processes. Therefore, the authors are not sure how concerned they should be about the results of this test.

      Third, pathway analysis has long been a bioinformatic goal in the literature, with the authors citing a landmark paper for the WGCNA method from 2008. As such, there are numerous and long-standing discussions in the literature regarding challenges of pathway analysis (i.e., omics data often has dimensionality D far larger than sample size N, and correlation matrix estimation requires D^2 >> N parameters to be estimated) and its potential for spurious correlations. Some authors use sophisticated statistical tools (e.g., "Biological network inference using low order partial correlation" 2014, "Learning Large‐Scale Graphical Gaussian Models from Genomic Data" 2005, "Incorporating prior knowledge into Gene Network Study" 2013) to attempt to deal with this issue.
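
      The scale of the estimation problem mentioned here can be made concrete with a small sketch. The values D = 10000 and N = 10 are borrowed from the reviewer's simulation described earlier in this review; the function name is hypothetical.

```python
def pairwise_correlations(d: int) -> int:
    """Number of distinct off-diagonal entries in a d x d correlation matrix."""
    return d * (d - 1) // 2


if __name__ == "__main__":
    d, n = 10_000, 10  # features and samples, as in the reviewer's simulated dataset
    pairs = pairwise_correlations(d)
    print(f"{pairs:,} pairwise correlations to estimate from {n} samples per feature")
```

      The count grows quadratically in D, which is why estimating a full correlation matrix from a handful of samples is prone to spurious entries.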

      The authors agree that if by spurious the Reviewer means non causal indirect associations like in the paper by Zuo et al. (Zuo et al., 2014. Biological network inference using low order partial correlation. Methods 69:266-73. doi: 10.1016/j.ymeth.2014.06.010.), then, indeed, the algorithm has not been designed to find directed networks. Instead, the algorithm has been designed to find common upstream regulators.

      Furthermore, the authors indicate that their approach is the first to attempt pathway analysis in multi-omics setting, stating "Integrative approaches combining more than one robust molecular association measure, however, have not been explored", but one can find attempts such as "An Integrative Transcriptomic and Metabolomic Study of Lung Function in Children With Asthma" to build on WGCNA for work in multiomics datasets.

      Indeed, the Reviewer is correct that correlation networks and WGCNA have been previously used with multi-omics datasets. What the authors meant to convey is that these previous approaches rely on only one measure of molecular association, which is correlation in the case of correlation networks and covariation in the case of WGCNA, while our method is the first that combines two measures of molecular association, the correlation and stoichiometry score. We have now amended the sentence in the manuscript (lines 51-52).

      The 2020 review paper "Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources" seems to identify multiple published methods dealing with pathway estimation in multiomics datasets. As the paper stands, this reviewer cannot adequately assess the impact of the proposed bioinformatic algorithm and its results against the existing body of literature for pathway inference.

      We have now benchmarked our method against existing module discovery, network and multi-omics integration methods and provide evidence that our method outperforms these methods (new Figure 4).

      Reviewer #2 (Public Review):

      The authors describe a bioinformatic platform that allows for unbiased pathway analysis from multiomics data. The concept is based on correlation, stoichiometry scores and their combination to evidence interaction between two proteins, transcripts or phosphosites in an omic dataset. This platform was developed and validated on both previously published and in house omics data. I really appreciate that the paper is well written and clear, and I would like to acknowledge the amount of work generated to produce the in-house dataset.

      The authors wish to thank the Reviewer for the encouraging words.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors' conclusions presented herein are supported by a well-established mouse genetic conditional approach and an extensive array of phenotypic analyses.

      Strengths:

      1. The authors utilized well-described genetic tools, AdipoQCre, to target preadipocyte-like progenitor cell populations in bone marrow, as well as Csf1 floxed alleles. They further sifted through the cell population by showing that mature lipid-laden adipocytes express Csf1 at a much lower level, and determined that the AdipoQCre-marked progenitor cell population represents a major cellular source of M-CSF.

      2. The reanalysis of published scRNAseq datasets in Figure 1, as well as the following phenotypic analyses of the mutant mice are well-conducted. The analyses include a broad range of experiments both in vivo (3DmicroCT, histology, flow cytometry) and ex vivo (osteoclastogenesis assay in bone marrow cell culture). The confidence of the reported findings is high.

      3. The data presented in this manuscript are of very high quality.

      Weaknesses:

      1. The role of AdipoQ-lineage progenitors as a source of M-CSF is overstated. The authors claim in many instances that "mature bone adipocytes do not express M-CSF", "These cells however do not produce Csf1", "...these peripheral AdipoQ+ cells nearly do not produce M-CSF". However, the authors' qPCR experiments only show four times differences in Csf1 expression. Therefore, the claim that AdipoQ-lineage progenitors are an exclusive source of M-CSF is not well substantiated. In line with this, some of the recent literature reporting conditional deletion of M-CSF in other bone cells (JBMR Plus. 4:e10080., Nature. 590:457-462) are not included.

      We thank the reviewer for this important question. We have performed the below experiments to further clarify and support our conclusion:

      1) We increased the number of replicates for each group of cells in Fig. 3A (the old Fig. 1E) to five per group and, following Reviewer 3’s recommendation on housekeeping gene usage, found that the mRNA expression of Csf1 in bone marrow AdipoQ-lineage progenitor cells is 20-30 fold higher than that in mature adipocytes. This result has been updated in Fig. 3A.

      2) We further performed immunofluorescence staining of M-CSF on bone slices, and found that the majority of bone marrow AdipoQ-expressing progenitor cells express M-CSF (Fig. 3B, 1865 cells out of 2001 cells counted, n=3 mice, 93.2%). In contrast, M-CSF expression was not detected in mature bone marrow adipocytes (Perilipin1+) (Fig. 3C, 0 cells out of 115 cells counted, n=3 mice, 0%), indicating that mature bone marrow adipocytes are unlikely a significant source of M-CSF.

      3) We performed western blot to analyze M-CSF protein expression in peripheral adipose. As shown in Fig. 3D, the stromal vascular fraction (SVF) cells in adipose, which contain multiple cell populations including adipogenic progenitors, express M-CSF. On the contrary, M-CSF was nearly undetectable in the peripheral mature adipocytes isolated from adipose (Fig. 3D).

      These data collectively support that mature adipocytes are not a significant source of M-CSF, as evidenced by nearly undetectable M-CSF expression compared to the Adipoq-lineage progenitors. The results are described on pg. 5. However, the reviewer’s comment on ‘exclusive source’ is well taken, as osteocytes and the osteo lineage also express certain levels of M-CSF. We deleted ‘exclusive source’ from the manuscript and have added relevant literature and discussion in the Results and Discussion sections on pp. 5 and 9.

      2. Some of the phenotypic analyses are still incomplete. The authors did not report whether CHet (AdipoQCre Csf1(flox/+)) showed any bone phenotype. Further, the authors did not show that Csf1 mRNA or M-CSF protein is expressed in AdipoQ-lineage progenitors using histological methods. Current evidence is only based on scRNAseq and qPCR of isolated cells. Whether there was any change in circulating bone resorption markers in CKO mice was not shown. Cortical bone parameters were not included in the 3D-microCT analyses. These missing pieces of information would be important to correctly interpret the phenotypes.

      The het mice (Csf1f/+;AdipoQ Cre) do not show abnormal bone phenotype, which is now shown in Fig. 4-figure supplement 4. We performed immunofluorescence staining of M-CSF on bone slices, and found that the majority of bone marrow AdipoQ-expressing progenitor cells express M-CSF (Fig. 3B, 1865 cells out of 2001 cells counted, n=3 mice, 93.2%). We tested serum TRAP level in mice, and found that the Csf1 deficiency in Csf1∆AdipoQ mice significantly decreased the TRAP level in serum, compared to that in the WT control mice (Fig. 5B). Csf1∆AdipoQ mice do not exhibit abnormal cortical bone phenotype. The cortical bone parameters are now included in Fig. 4G.

      3. Which bone marrow cell population(s) are marked by AdipoQCre remain largely unclear. It is possible that AdipoQCre also marks at least part of MSPC-osteo cluster in addition to MSPC-adipo. Adipo-lineage progenitors may not stay entirely as adipoprogenitors and drift toward osteoblasts or their precursor cells.

      We thank the reviewer for the insightful comment on this interesting mystery and complicated question, which is drawing more attention in the field.

      In addition to Adipoq-lineage progenitors, Adipoq Cre also labels other clusters. However, the expression levels of Adipoq and frequency of Adipoq+ cells in other cell populations are relatively low. For example, the integrated scRNAseq dataset we analyzed shows that Adipoq is expressed at a low level (with scaled mean expression at 0.68, (27)) in a small proportion of MSPC-osteo cells (Fig. 1), and small amounts (31, 37) (about 4%) of osteoblasts in 8 or 12-week-old mice are Adipoq-lineage. A recent report found that in 24-week-old mice, about 15-40% of osteoblasts are marked with Adipoq Cre (37). This raises a few important possibilities that will need to be distinguished in future work. One possibility is that the Adipoq-lineage cells (adipo-CAR cells/MALPs) have minor or latent osteogenic potential that may become more evident under specific conditions, such as in older animals. However, balanced against this is the alternative that Adipoq-cre could primarily target a population of solely adipogenic adipo-CAR cells but that its specificity is imperfect, leading to progressive low levels of deletion in a separate population expressing very low levels of Adipoq, such as osteo-CAR cells. An additional possibility is that the Adipoq-lineage cells may themselves actually be further subdivided into multiple component cell types, including a major adipogenic and a separate minor osteogenic subpopulation. Ultimately, at the root of these issues is that Adipoq cre primarily defines one or possibly more lineages of cells rather than a cell type within those lineages. Therefore, application of further markers to fractionate the adipoq-lineage into its component cell types will be needed to resolve these possibilities, focusing on whether any potential osteogenic activity present can be fractionated away from the primary adipogenic activity present.

      Of note, the Adipoq expression level and positive cell proportion are much higher in bone marrow Adipoq lineage progenitors than the levels seen in osteoblast lineage (Fig. 1, Fig. 2, (22, 27, 31)) or endothelial cells in bone marrow (38, 39). For example, the MSPC-Adipo cluster (Adipoq-lineage progenitors) has 6441 cells with the highest level (scaled mean expression level at 3.01 per (27) at Single Cell Portal) of Adipoq seen among bone marrow cells analyzed. In contrast, the MSPC-osteo cluster consists of 2247 cells with a very low Adipoq expression level (scaled mean expression level at 0.68 per (27) at Single Cell Portal). Taking both the average expression level and the cell numbers in each cluster into account, the relative overall contribution to Adipoq expression by MSPC-osteo vs the Adipoq-lineage progenitors is 7.8% ((2247 x 0.68)/(6441 x 3.01)). Therefore, the expression of Adipoq in the MSPC-osteo cluster is marginal compared to that in the Adipoq-lineage progenitors. These data make Adipoq an important marker to identify bone marrow Adipoq lineage progenitors. Overall, our work not only validates prior research identifying adipoq-lineage cells, identified as MALPs (22, 31), as a key osteoclast regulatory population, but also further extends the scope of their functions to encompass M-CSF production and regulation of macrophages.
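
      The relative-contribution estimate in the preceding paragraph is a ratio of (cell count x scaled mean expression) between the two clusters. A minimal sketch of that arithmetic follows, using only the rounded values quoted in the text; the variable and function names are illustrative. With these rounded inputs the ratio comes out just under 8%, in line with the stated figure of roughly 7.8%.

```python
# Cluster name -> (number of cells, scaled mean Adipoq expression), as quoted in the response.
clusters = {
    "MSPC-Adipo": (6441, 3.01),
    "MSPC-osteo": (2247, 0.68),
}


def total_expression(n_cells: int, mean_expression: float) -> float:
    """Aggregate expression of a cluster: cell count times scaled mean expression."""
    return n_cells * mean_expression


osteo_total = total_expression(*clusters["MSPC-osteo"])
adipo_total = total_expression(*clusters["MSPC-Adipo"])
relative_contribution = osteo_total / adipo_total

if __name__ == "__main__":
    print(f"MSPC-osteo relative to MSPC-Adipo: {relative_contribution:.2%}")
```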

      These points have been added to the Discussion sections on pp. 9-10.

      4. The OVX data in Figure 5 are not very well explained. The data do not seem to support the authors' conclusion that M-CSF deficiency in AdipoQ-lineage progenitors alleviates estrogen-deficiency induced osteoporosis. The CKO mice lose bone mass almost to the same extent as WT mice upon OVX.

      To address the reviewer’s question, we calculated the changes of the μCT parameter values between Sham and OVX groups in the WT control and Csf1∆AdipoQ mice. Significant differences were identified between the control and Csf1∆Adipoq mice in several μCT parameters. For example, a decrease in trabecular BV/TV after OVX: 35.1% in the control vs 20.9% in Csf1∆Adipoq mice; a decrease in Tb. N after OVX: 11.34% in the control vs 7.97% in Csf1∆Adipoq mice; a decrease in Conn-Dens after OVX: 39.7% in the control vs 14.56% in Csf1∆Adipoq mice; an increase in Tb. Sp after OVX: 12.51% in the control vs 1.97% in Csf1∆Adipoq mice. These results support our conclusion that M-CSF deficiency in AdipoQ-lineage progenitors alleviates estrogen-deficiency induced osteoporosis. These value changes have been included in Fig. 7C and discussed on pg. 7.
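
      The response does not spell out how the Sham-to-OVX change was computed. A conventional choice, assumed here, is the change relative to the Sham value. The sketch below applies that formula to hypothetical raw values (the Sham and OVX numbers are invented for illustration) and then compares the reported percentages, which are taken from the text above.

```python
def percent_change(sham: float, ovx: float) -> float:
    """Signed percent change of the OVX value relative to the Sham value (assumed formula)."""
    return (ovx - sham) / sham * 100.0


# Reported magnitudes of OVX-induced change in percent: (WT control, CKO), from the response.
reported = {
    "BV/TV decrease": (35.1, 20.9),
    "Tb.N decrease": (11.34, 7.97),
    "Conn-Dens decrease": (39.7, 14.56),
    "Tb.Sp increase": (12.51, 1.97),
}

# For each parameter, how large the CKO change is relative to the control change.
cko_to_control = {name: cko / control for name, (control, cko) in reported.items()}

if __name__ == "__main__":
    # Hypothetical raw BV/TV values, chosen only so the formula yields a 35.1% decrease.
    print(f"example change: {percent_change(sham=10.0, ovx=6.49):.1f}%")
    for name, ratio in cko_to_control.items():
        print(f"{name}: CKO change is {ratio:.2f} x the control change")
```

      In every reported parameter the OVX-induced change in the knockout is smaller than in the control, which is the comparison the response relies on.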

      Reviewer #3 (Public Review):

      Macrophage colony-stimulating factor (M-CSF) plays key roles in the differentiation of myeloid-lineage cells, including monocytes, macrophages and osteoclasts. The latter mediate bone resorption, which is important for physiological bone remodelling but, unrestrained, contributes to bone loss in conditions such as in post-menopausal osteoporosis. M-CSF production within the bone marrow is implicated in the maintenance of myeloid and skeletal homeostasis, but the cellular source of bone marrow M-CSF has remained elusive. In this study, Inoue et al address this issue through advanced transcriptomic and gene targeting approaches. They conclude that a population of Adipoq-expressing progenitors within the bone marrow, designated "AdipoQ-lineage progenitors", is the key cellular source of M-CSF. Consistent with this, they find that transgenic deletion of M-CSF from these cells disrupts macrophage and osteoclast development, leading to osteopetrosis and possibly preventing bone loss following ovariectomy. However, they have not adequately addressed the possibility that M-CSF production from other cell types, particularly adipocytes in peripheral adipose tissues, may also be influencing these phenotypes. Specific strengths and weaknesses are as follows:

      Strengths:

      1. The manuscript is written in a clear, succinct manner and the data are generally nicely presented. It is therefore a pleasure to read.

      2. The analysis of single-cell transcriptomic data is clear and convincing, and the skeletal phenotyping has been done to a high standard.

      Weaknesses:

      1. The authors underplay the potential contribution of M-CSF production from other cell types, particularly from adipocytes in peripheral adipose tissues. They show that M-CSF expression from these cells is lower than from the bone marrow progenitors that they focus on; however, based on this they allude to "no expression" of M-CSF from these other adipocytes. This overlooks the findings of other studies showing that peripheral adipocytes produce M-CSF and that this has biological functions. Whether their knockout model alters M-CSF expression in peripheral adipose tissue, whether for whole tissue or for isolated adipocytes, has not been tested.

      We performed western blot to analyze M-CSF protein expression in peripheral adipose. As shown in Fig. 3D, the stromal vascular fraction (SVF) cells in adipose, which contain multiple cell populations including adipogenic progenitors, express M-CSF. On the contrary, M-CSF was nearly undetectable in the peripheral mature adipocytes isolated from adipose (Fig. 3D). These data collectively support that mature adipocytes are not a significant source of M-CSF, as evidenced by nearly undetectable M-CSF expression compared to the Adipoq-lineage progenitors. However, we understand that current techniques may have limitations in identifying trace amounts of M-CSF. We thus deleted descriptions such as ‘exclusive’ or ‘do not produce/express…’ in the revised manuscript.

      2. The decreases in M-CSF have been assessed at the transcript level, but not for M-CSF protein. Whether their knockout model

      We performed immunofluorescence staining of M-CSF on bone slices, and found a drastic decrease in M-CSF protein in bone marrow AdipoQ+ cells in Csf1∆AdipoQ mice compared to the WT control mice. The results are shown in Fig. 4B, and Fig. 3B-D.

      3. It is also unclear if the Adipoq-lineage progenitors consist exclusively of adipogenic cells, or if osteogenic progenitors are also part of this population.

      We thank the reviewer for the insightful comment on this interesting mystery and complicated question, which is drawing more attention in the field.

      In addition to Adipoq-lineage progenitors, Adipoq Cre also labels other clusters. However, the expression levels of Adipoq and frequency of Adipoq+ cells in other cell populations are relatively low. For example, the integrated scRNAseq dataset we analyzed shows that Adipoq is expressed at a low level (with scaled mean expression at 0.68, (27)) in a small proportion of MSPC-osteo cells (Fig. 1), and small amounts (31, 37) (about 4%) of osteoblasts in 8 or 12-week-old mice are Adipoq-lineage. A recent report found that in 24-week-old mice, about 15-40% of osteoblasts are marked with Adipoq Cre (37). This raises a few important possibilities that will need to be distinguished in future work. One possibility is that the Adipoq-lineage cells (adipo-CAR cells/MALPs) have minor or latent osteogenic potential that may become more evident under specific conditions, such as in older animals. However, balanced against this is the alternative that Adipoq-cre could primarily target a population of solely adipogenic adipo-CAR cells but that its specificity is imperfect, leading to progressive low levels of deletion in a separate population expressing very low levels of Adipoq, such as osteo-CAR cells. An additional possibility is that the Adipoq-lineage cells may themselves actually be further subdivided into multiple component cell types, including a major adipogenic and a separate minor osteogenic subpopulation. Ultimately, at the root of these issues is that Adipoq cre primarily defines one or possibly more lineages of cells rather than a cell type within those lineages. Therefore, application of further markers to fractionate the adipoq-lineage into its component cell types will be needed to resolve these possibilities, focusing on whether any potential osteogenic activity present can be fractionated away from the primary adipogenic activity present.

      Of note, the Adipoq expression level and positive cell proportion are much higher in bone marrow Adipoq lineage progenitors than the levels seen in osteoblast lineage (Fig. 1, Fig. 2, (22, 27, 31)) or endothelial cells in bone marrow (38, 39). For example, the MSPC-Adipo cluster (Adipoq-lineage progenitors) has 6441 cells with the highest level (scaled mean expression level at 3.01 per (27) at Single Cell Portal) of Adipoq seen among bone marrow cells analyzed. In contrast, the MSPC-osteo cluster consists of 2247 cells with a very low Adipoq expression level (scaled mean expression level at 0.68 per (27) at Single Cell Portal). Taking both the average expression level and the cell numbers in each cluster into account, the relative overall contribution to Adipoq expression by MSPC-osteo vs the Adipoq-lineage progenitors is 7.8% ((2247 x 0.68)/(6441 x 3.01)). Therefore, the expression of Adipoq in the MSPC-osteo cluster is marginal compared to that in the Adipoq-lineage progenitors. These data make Adipoq an important marker to identify bone marrow Adipoq lineage progenitors. Overall, our work not only validates prior research identifying adipoq-lineage cells, identified as MALPs (22, 31), as a key osteoclast regulatory population, but also further extends the scope of their functions to encompass M-CSF production and regulation of macrophages.

      These points have been added to the Discussion section on pp. 9-10.

      If these weaknesses are addressed then this work has potential to yield firm conclusions and new insights into the regulation of myeloid and skeletal homeostasis, both in normal physiology and in clinically relevant conditions.

      Yes, we have addressed the above 3 major questions.

    1. Author Response

      Reviewer #1 (Public Review):

      The current study proposed a drug discovery pipeline to accelerate the process of identifying drug candidates for LCA10 patients using cells from mouse retinal organoid for initial screening, human patient iPSC-derived retinal organoid for further testing, and then mouse mutants for in vivo validation. Reserpine was identified as the top candidate, possibly through modulating proteostasis and autophagy to promote cilium assembly. The study was with high translational value. However, the rationale using dissociated cells from mouse retinal organoid for initial drug screening needs to be justified. In addition, the consistency of phenotypic characteristics in human patient iPSC-derived retinal organoid needs to be reported. It was unclear if the rescued phenotypic changes were from the drug effects or a result of phenotypic variations in organoids.

      We thank the reviewer for the comments and suggestions. Please see the response provided in the “Essential Revisions” earlier. Briefly, we used single-cell cultures for screening to compensate for the variation of the Nrl-GFP signal in rd16 organoids, so that each compound was presented to a homogeneous cell population. In addition, we performed a large-scale screen of over 6000 compounds at 11 concentrations in duplicate, which was not feasible to perform manually. We used a semi-automatic electronic dispenser to set up the screens in 1536-well plates and a liquid handling system to add the compounds. Intact mouse retinal organoids are too big to be dispensed and would be damaged during the process. They are also too big to fit into one well of a 1536-well plate or even a 384-well plate. Therefore, single-cell cultures are preferable to intact organoids in this application. We understand the potential pitfalls, and the positive hits were therefore verified in intact organoids in the secondary assays.
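
      To give a sense of the scale described above, the following sketch estimates the number of wells and 1536-well plates such a screen needs. It is a back-of-the-envelope illustration only: it assumes exactly 6000 compounds (the text says "over 6000", so this is a lower bound) and that every well holds a compound, ignoring control wells. The function name `wells_and_plates` is hypothetical.

```python
import math


def wells_and_plates(n_compounds, n_concentrations, n_replicates, wells_per_plate):
    """Return (total wells, minimum number of plates) for a dose-response screen.

    Assumes every well is used for a compound; control wells are ignored.
    """
    wells = n_compounds * n_concentrations * n_replicates
    plates = math.ceil(wells / wells_per_plate)
    return wells, plates


# 6000 compounds (lower bound), 11 concentrations, duplicates, 1536-well format.
wells, plates = wells_and_plates(6000, 11, 2, 1536)
print(f"{wells} wells, at least {plates} plates")
```

      Under these assumptions the screen needs 132,000 wells, or at least 86 plates, which illustrates why manual dispensing was not practical.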

      We have now tested reserpine on retinal organoids derived from two clones each (four in total) of the LCA1 and LCA2 patients. As suggested by the reviewers, we quantified the fluorescence intensity of staining for the rod marker rhodopsin in multiple sections from at least two batches of differentiation (Figure 3C and Figure 3—figure supplement 2). Despite the expected variability, reserpine treatment significantly increased the fluorescence intensity of rhodopsin in retinal organoids differentiated from multiple lines (Figure 3C), further validating the rescue effect of reserpine.

      Reviewer #2 (Public Review):

      In this manuscript, a drug discovery pipeline was developed using a human iPSC derived organoid-based high-throughput screening platform to be used to identify drug candidates for maintaining photoreceptor survival in LCA10 retinopathies. Reserpine proved effective in patient organoids and in mutant mouse retina in vivo to improve photoreceptor survival and outer segment structure. Protein homeostasis was restored after reserpine treatment by increasing p62 levels, decreasing the 20S proteasome, and increasing proteasome activity. The manuscript is clearly written, contains a large amount of valuable and high-quality data and demonstrates that rebalancing proteostasis can stabilize photoreceptor overall homeostasis in the presence of a mutation that causes retinal degeneration.

      The manuscript may lack functional in vivo data on the treatment by reserpine in RD16 mice such as ERG measurements or other functional tests (the authors also refer to it as future direction). Nevertheless, in my view, the study provides a solid and convincing set of data and substantially advances our understanding on the neuroprotective effects of reserpine beyond the scope of the retina and therefore can be expected to have widespread influence on a readership interested in the principles of neuroprotection rebalancing proteostasis.

      We sincerely thank the reviewer for the positive comments and suggestions. This study has taken many years to materialize. We agree and have now performed full-field electroretinography (ERG) on untreated and reserpine-treated rd16 retina (as stated in response to an earlier comment). The scotopic a-wave was only marginally increased, yet the scotopic b-wave displayed a significantly higher amplitude, suggesting improved rod photoreceptor function (Figure 6D).

      Reviewer #3 (Public Review):

      Chen et al. perform an innovative screen using retinal organoids derived from rd16 mice to identify small molecules to treat CEP290 hypomorphic mutations linked to ciliopathies such as LCA. The authors identify reserpine which promotes photoreceptor development and viability in retinal organoids derived from LCA patient iPSCs and rd16 mouse retinas. The authors finally propose a mechanistic model where reserpine restores proteostasis thereby improving ciliogenesis.

      The authors present a highly effective drug screen that utilizes the benefits of retinal organoids while also accounting for the inherent variability of retinal organoids by performing a screen on 2D cultures derived from the organoids. This is an innovative approach to using retinal organoids in drug screens and is of interest to the greater community. The success of the screen is reflected in the effectiveness of reserpine in the in vivo rd16 mouse retinal model, where it promotes photoreceptor survival. However, there are multiple issues with the LCA patient organoid screen that must be resolved.

      We are grateful to the reviewer for generous comments. We have incorporated the suggestions and performed additional work to resolve the issues, as mentioned earlier in this response as well as below.

      The patient-derived iPSC lines are not sufficiently controlled to support the conclusions stated in the manuscript. The authors rely on single iPSC clones from disease patients to perform experiments, and it is not clear whether karyotyping to validate normal chromosomal integrity was performed. In the case of the RNAseq experiment, one patient clone does not show any differences, calling into question the findings from the other clone. Patient-derived iPSC studies would benefit from the use of multiple independently derived iPSC clones per patient, or from rescuing the LCA10 mutation using CRISPR editing to validate the correlation of the mutation with the differences observed.

      This study could be strengthened by parallel RNAseq studies in the rd16 mouse retina and patient iPSC retinal organoids.

      Thanks for the suggestions. As mentioned earlier in “Essential Revisions” and response to other reviewers, we have performed additional experiments using multiple iPSC clones and from three patients (2 each from LCA1 and LCA2). These iPSC lines have been characterized previously (Shimada et al. 2017). We have now provided more details on iPSC derivation, iPSC maintenance, and differentiation. Karyotypes of all human and mouse iPSC lines were provided in Figure 1—figure supplement 1. Retinal organoids were generated using iPSC lines within 10 passages of test cells.

      The purpose of the RNA-seq data is to provide initial leads on the signaling pathways modulated by reserpine treatment. The rescue effect of reserpine suggests that these pathways might be implicated in disease pathogenesis. Based on our RNA-seq data, we have validated the dysregulation of the proteostasis pathway in patient-derived retinal organoids and in the rd16 retina in vivo. Further investigations are needed to validate other pathways but are beyond the scope of this manuscript. Although RNA-seq studies have advantages, more detailed molecular and functional assays are needed to validate their findings, and we therefore argue that performing additional RNA-seq on different clones, cell lines or mouse retina would not by itself provide more solid information.

      According to our quantification of rhodopsin staining intensity (Figure 3C and Figure 3—figure supplement 2), LCA1 organoids are more responsive to reserpine than LCA2 organoids, which is not surprising given the variation in patient responsiveness to drug treatments in previous clinical studies. We note that reserpine is not a transcription factor; the genes differentially expressed upon reserpine treatment therefore reflect secondary effects, and the change in gene expression profiles could vary in time and intensity, which could explain the few differentially expressed genes observed in LCA2. Nevertheless, the mechanisms of action of reserpine that we identified in LCA1 could be validated in LCA2 (Figure 5—figure supplement 3), further strengthening our findings.

      The reason why we performed RNA-seq on treated organoids but not treated mice was to identify the signaling pathways modulated by reserpine in a well-controlled manner in order to catch the small changes. Compared to reserpine treatment on organoid cultures, in which the organoids have stable and constant contact with reserpine, intravitreal injection of reserpine into P7 mice is technically challenging and leads to substantial variations. In this case, some small changes might be missed and masked by the variations.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors sought to be able to examine what cellular mechanisms underlie increases in mature blood cell production upon immune challenge. To this end they devised a new in vitro organ culturing system for the lymph gland, the main hematopoietic organ of the fruit fly Drosophila melanogaster; the fly serves as an excellent model for studying fundamental questions in immunology, as it allows live imaging combined with genetic manipulation, and the molecular pathways and cellular functions of its innate immune system are highly conserved with vertebrates.

      The authors provide compelling evidence that the cultured lymph gland shows a similar time scale, dynamics, and capacity for cell division as was observed in vivo, and does not undergo undue oxidative stress in their optimized culture conditions. This technique will prove extremely useful to the large community studying the fly lymph gland, and potentially vertebrate immunologists seeking to expand the models they utilize.

      In these cultured glands, the authors identify progenitors undergoing symmetric cell divisions and provide some evidence that is consistent with, but does not prove, that these two cells maintain their proliferative capacity. They detect equivalent levels in the two equally sized daughter cells of dome-Meso-GFP, a marker for JAK-STAT activity; however, this could be due to an equal inheritance of the protein from the mother, not an equivalent maintenance of a proliferative capacity.

      This is an interesting question. A close look at our movie (Video 4) of the dome-Meso-GFP marker shows the following sequence of events: the marker is nuclear, the mother cell divides and the nuclear envelope breaks down, cell division is completed, and dome-Meso-GFP re-accumulates in the nuclei of the daughter cells. This sequence of events implies that JAK-STAT is still active in the daughter cells. But, as the reviewer points out, there is a possibility of inheritance of the signal from the mother. If one of the cells were to differentiate, we would expect two things to occur: a differentiation marker would turn on in one of the daughter cells, and the dome-Meso-GFP signal in one of the cells would likely decrease slowly over time. We failed to mention that we accounted for both of those possibilities in our experiments, such as the one shown in Video 5. First, we included eater-dsRed in the genetic background in which these experiments were undertaken (see Figure 2 figure legend); if differentiation took place, the dsRed level would go up, an occurrence which we did not observe. Second, long-term tracking of dome-Meso-GFP levels for extended periods of time after completion of cell division did not show divergence or a significant decrease of protein levels in the two daughter cells (Figure 2 - figure supplement 2). In any case, to make readers directly aware of this important caveat raised by the reviewer, we added to the Results section (lines 225-230) an explanation mentioning the possibility of inheritance of the marker and why we do not think this is the case.

      The authors develop a technique to conduct tracking of progenitor cell size over time in the cultured lymph glands and identify a switch increase in growth after division, as well as two orientations of the divisions, with the main one occurring 90% of the time.

      They show that bacterial infection results in a significant decrease in the division of Blood progenitors and the elimination of the minor orientation of division, but no obvious change in the rate of division.

      By imaging two markers, Dome-GFP for the progenitor state and Eater dsRed for the differentiated one, they examine the trajectories by which differentiation occurs in the wild-type lymph gland. They describe two main categories of fate transitions. In one that they call linear, the blood cells express high levels of the differentiation marker along with the progenitor marker before turning off the progenitor marker. The dynamics of how these progenitor cells get to the state of expressing both the differentiation and progenitor marker at high levels is not described. In the other, which they call sigmoidal, cells express only high levels of the progenitor marker, and the differentiation marker increases after or as the progenitor marker decreases. The authors show that upon infection there is a large increase in the amount of the linear type of differentiation. But how this change in the type of differentiation upon infection explains the increased amount of differentiation is not clear.

      A potential explanation comes from an aspect of their data that the authors don't comment upon. In their live analysis of lymph glands at a distinct time point in the uninfected state (Fig 7M-N), 95% of the cells they analyze traversing the sigmoidal path are in the intermediate step. This would predict that the cells on this path spend a much longer time stuck in this intermediate state before traversing to the final differentiated one, or that only a small fraction of the cells that become sigmoidal intermediate cells progress onwards to full differentiation. But this does not match the trajectories observed in the real-time analysis for uninfected cultured lymph glands (Fig 7A'-D'). Perhaps their algorithm discarded traces from the live imaging in which the differentiation marker did not come up quickly, so that these were not analyzed in the trajectories.

      If my interpretation of the single time point analysis is true, this would argue that the linear path is actually much faster/more fruitful than the sigmoidal one, and this would explain why the higher level of total progenitor differentiation upon infection is the result of infection inducing more differentiation by the linear path. Otherwise, I don't understand how their data explain that observation.

      We understand the reviewer's concern here and would like to state categorically that we did not use an algorithm to “discard” traces. As the reviewer outlines, there is a large concentration of cells in the Dome-Meso-GFP (low expressing), eater-dsRed (low expressing) state. This is an intermediate state for the sigmoid differentiation trajectory. The reviewer suggests two scenarios to explain this. The first scenario is that this is the slowest (and thus rate-limiting) step in the sigmoid differentiation trajectory. But, as the reviewer also notes, our tracking of individual cell trajectories doesn't show that cells spend a lot of time in this state. This leaves the second scenario the reviewer outlines, that only a small fraction of the cells that are in the Dome-Meso-GFP (low expressing), eater-dsRed (low expressing) state go on to differentiate (at least in the larval stage). We favor this model because it is consistent with our observations, mainly that manipulating the sigmoid pathway is not a good way to induce the production of mature blood cells following infection, compared to manipulating the linear pathway. As the reviewer correctly points out, the linear pathway provides a powerful way to change the rate of production of mature blood cells: within a few hours of infection, the number of cells found in the intermediate state for this trajectory (Dome-Meso-GFP (high expressing), eater-dsRed (high expressing)) increases 5-6 times. We now mention this specifically in the Discussion in lines 532-539.

    1. Author Response

      Reviewer #1 (Public Review):

      Single-cell sequencing technologies such as 10x, in conjunction with DNA-barcoded multimeric peptide MHCs (pMHCs), have enabled high-throughput pairing of T cell receptor transcripts with antigen specificity. However, the data generated through this method often suffer from relatively high background due to ambient DNA barcodes and TCR transcripts leaking into "productive" GEMs that contain a 10X bead and a T cell decorated with antigen-specific barcoded proteins. Such contamination can affect data analysis and interpretation and has the potential to lead to spurious results such as an incorrect assessment of antigen-TCR pairs or TCR cross-reactivity. To address this problem, Povlsen and colleagues have described a data-driven algorithm called "Accurate T cell Receptor Antigen Pairing through data-driven filtering of sequencing information from single-cells" (ATRAP) that supplies a set of filtering approaches that significantly reduces background and allows for accurate pairing of T cell clonotypes with cognate pMHC antigens.

      This paper is rigorously conducted and will be useful for the field - there are some areas where further clarifications and comparisons will benefit the reader.

      Strengths:

      1) Povelsen and colleagues have systematically evaluated the extent to which parameters in the experimental metadata can be used to assess the likelihood of a GEM to correctly identify the antigen specificity of the associated T cell clonotype.

      2) Povelsen and colleagues have provided elegant data-driven scoring metrics in the form of concordance score, specificity score, and an optimal ratio of pMHC UMI counts between different pMHCs on a GEM, which allows for easy identification of poor quality data points.

      3) Based on the experimental goals, ATRAP allows for customizable filters that could achieve appropriate data quality while maximizing data retention.

      Weakness:

      1) The authors mention that 100% of the 6,073 "productive" GEMs contained more than one sample hashing barcode, and 65% contained pMHC multiplets. While the rest of the paper elaborates on the steps taken to deal with pMHC multiplets issue, not much is said about the extent of multiplet hashing issue and how was it dealt with when assigning cells to individual donors. How is this accounted for? Even a brief explanation would be beneficial.

      We agree that the issue of multiplet hashing was only very briefly discussed in the manuscript. The reason for this is that although cell hashing multiplets exist for every GEM, it is generally a much simpler issue to solve than pMHC multiplets, because one hashing entry most often has much higher counts compared to the others (see supplementary fig. 3). Moreover, in the experimental design, only one hashing antibody is added to each sample. It is therefore given that only a single hashing signal should be associated with each GEM, i.e. this does not mirror the complex nature of the pMHC data, where cross-reactivity could result in more than one pMHC being a true binder to a given TCR. Given the simplicity associated with the hashing signal, we have here opted for utilizing an existing tool to annotate cell hashing. We have elaborated the description of this in the revised manuscript (line 384).
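
      The reasoning above, that one hashing entry usually dominates the others within a GEM, can be illustrated with a minimal sketch. This is not the existing tool referred to in the response; it is a simplified, hypothetical rule (assign the top hashtag when it exceeds the runner-up by a chosen ratio), and the function name, hashtag labels and the `min_ratio` threshold are assumptions made for this example.

```python
def assign_hashtag(counts, min_ratio=2.0):
    """Return the dominant hashtag of one GEM, or None if the call is ambiguous.

    counts: dict mapping hashtag name -> UMI count for a single GEM.
    min_ratio: hypothetical threshold; the top count must be at least this many
    times larger than the second-highest count.
    """
    if not counts:
        return None
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top_tag, top_count = ranked[0]
    if top_count == 0:
        return None
    second_count = ranked[1][1] if len(ranked) > 1 else 0
    if second_count == 0 or top_count / second_count >= min_ratio:
        return top_tag
    return None


# A GEM with several hashing barcodes, one of which clearly dominates.
gem_counts = {"HTO_1": 412, "HTO_2": 9, "HTO_3": 3}
print(assign_hashtag(gem_counts))
```

      Because only one hashing antibody is added per sample, a simple dominance rule of this kind is plausible for hashing data, whereas it would not be appropriate for pMHC barcodes, where more than one signal can be genuine.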

      2) It would be helpful for the authors to describe how experimental factors such as the quality of the input MHC protein may affect the outputted data (where different proteins may have different degrees of non-specific binding), and to what degree the ATRAP approach is robust to these changes. As an example, the authors mention that RVR/ A03 was present at high UMI counts across all GEMs and RPH/ B07 was consistently detected at low levels. Are these observations the property of the pMHCs or the barcoded dextran reagent? Furthermore, are there differences in the frequency of each of these multimers in the starting staining library which manifests in consistent high vs low read counts for the pMHC barcodes?

      We understand the reviewer's concern. We have extensive experience from staining with large libraries of different pMHCs in a bulk setting (Bentzen et al 2016), where it is part of the routine analyses to include an aliquot of the barcoded pMHC library taken prior to incubation with cells (input sample). From these data, we know that even if pMHCs are present in uneven amounts prior to cell incubation, this unevenness is not translated to the final output. That is, if a given barcode (associated with a specific pMHC) is present at levels up to 2x higher than the remaining barcodes, this does not result in that barcode also being enriched after cell incubation if T cells do not recognize the corresponding pMHC. Conversely, a barcode present at lower levels in the input can still be enriched after incubation with cells. From the same type of data, we also have experience with differences in the background associated with different MHC/HLA molecules, i.e. a generally higher level of background related to a certain MHC irrespective of the peptide bound to it. We agree that this could potentially be a confounding factor influencing our results (as it would influence any other results related to the potentially different background signal associated with different MHC/HLA molecules). We are currently investigating in other studies, in a broader sense, whether these differences reflect an inherent biological MHC association or are experimental artifacts. In the current work, we have opted not to define pHLA-specific UMI count thresholds, to ensure that any biological relevance remains unmasked while still allowing us to filter the data to identify the most likely true pMHC-specific interaction.

      3) It would be helpful for the authors to further explain how ATRAP handles TCRs that may be present in only one (or a small number) of GEMs, as seen in Figure 7b, and potentially for the large number of relatively small clonotypes observed for the RVR/A03 peptide in Figure 6 (it is difficult to know if the long tail of clonotypes for RVR is in the range of 1 or 10 GEMs based on the scale bar). Beyond that, is there any effect on expected (or observed) clonal expansion on these data analyses, for example, if samples are previously expanded with a peptide antigen ex vivo or not?

      ITRAP removes any GEM that does not meet the criteria of the selected filters. Small clones are removed only if all GEMs in the clone fail to meet the selected filter criteria. Because ITRAP is based on user-defined combinations of filters, one can choose to filter away singlet specificities, i.e. a TCR-pMHC pair observed in only a single GEM. However, this might not be relevant in all cases. We believe it is a strength of the method that it is flexible and adaptable to the needs of individual users. This also allows the user to impose additional filters, for instance to remove clones with fewer than a certain number of GEMs. With respect to figure 6, we agree that it was difficult to estimate the number of clonotypes within a given peptide plateau, and have updated the figure to include a clonotype count on the x-axis.

      Regarding the effect of clonotype expansion, we would first like to refer to figure 7. Panels a) and b) display the observed T cell frequencies towards the individual pMHCs as obtained by the two experimental approaches: a) conventional fluorescent multimer staining, and b) GEM counts obtained using the single-cell pipeline described here. This analysis demonstrates a very high concordance between the two approaches, with the vast majority of the responses detected by fluorescent multimer staining also being captured in the single-cell screening (recall of 0.95). This result suggests that the sensitivity of the single-cell approach, in the context of the current pMHC epitope set, is comparable to that of conventional fluorescent multimer staining. We would next like to refer back to figure 3. Even though we have not expanded the clones in vitro, this figure shows that the specificity of a TCR clone can be assigned more confidently when more GEMs are mapped to that clone. Hence, to identify a single TCR-pMHC match, it could in many cases be valuable to expand a given clone prior to the experiments. However, since the 10x pipeline can only include a limited number of cells, we argue that it is valuable to identify pMHC-TCR pairs on unexpanded/unmanipulated material to include as many different pairs as possible.
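
      The filtering behaviour described in this response (GEMs are dropped individually, a clone disappears only when all of its GEMs fail, and an optional minimum clone size can be imposed) can be sketched as follows. This is an illustrative sketch and not the ITRAP implementation: the function `filter_gems`, the field names `pmhc_umi` and `hla_match`, and the threshold values are hypothetical.

```python
from collections import defaultdict


def filter_gems(gems, filters, min_clone_size=1):
    """Keep GEMs passing all filters and group the survivors by clonotype.

    gems: list of dicts, each with at least a 'clonotype' key.
    filters: list of predicates taking a GEM dict and returning True to keep it.
    min_clone_size: optional extra filter on the number of surviving GEMs per clone.
    """
    by_clone = defaultdict(list)
    for gem in gems:
        if all(keep(gem) for keep in filters):
            by_clone[gem["clonotype"]].append(gem)
    # A clone is absent from the result only if none (or too few) of its GEMs survive.
    return {c: gs for c, gs in by_clone.items() if len(gs) >= min_clone_size}


# Toy data with hypothetical fields and thresholds.
gems = [
    {"clonotype": "c1", "pmhc_umi": 25, "hla_match": True},
    {"clonotype": "c1", "pmhc_umi": 2, "hla_match": True},
    {"clonotype": "c2", "pmhc_umi": 1, "hla_match": False},
    {"clonotype": "c3", "pmhc_umi": 12, "hla_match": True},
]
filters = [
    lambda g: g["pmhc_umi"] >= 10,  # UMI count filter (threshold is an assumption)
    lambda g: g["hla_match"],       # donor-HLA matching filter
]

kept = filter_gems(gems, filters)
print(sorted(kept))
```

      In this toy example clone c1 is retained with one of its two GEMs, while c2 is lost because its only GEM fails the filters. Passing `min_clone_size=2` would additionally remove the singlet clones, which corresponds to the optional user-imposed filter mentioned above.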

      4) The authors mention a second method, ICON, for conducting these types of analyses, and that the approach leads to significantly more data loss. However, given that there could be differences in the quality of the datasets themselves, and given that the ICON dataset is publicly available, it would be helpful for a more explicit cross-comparison to be conducted and presented as a figure in the paper.

      We have conducted such a comparative analysis in a separate manuscript (available at bioRxiv doi.org/10.1101/2023.02.01.526310). The overall conclusion is that both methods allow for effective denoising of the provided data, with an overall advantage in favor of ITRAP. We have extended the discussion in the current manuscript with a brief summary of the main findings from this study.

      Reviewer #2 (Public Review):

      The study by Povlsen, Bentzen et al. describes certain computational pipelines authors used to analyze the results from a single-cell sequencing experiment of pMHC-multimer stained T cells. DNA-barcoded pMHC multimers and single-cell sequencing technologies provide an opportunity for the high-throughput discovery of novel antigen-specific TCRs and profiling antigen-specific T-cell responses to multiple epitopes in parallel from a single sample. The authors' goal was to develop a computational pipeline that eliminates potential noise in TCR-pMHC assignments from single-cell sequencing data. With several reasonable biological assumptions about underlying data (absence of cross-reactivity between these epitopes, same specificity for different T-cells within a clonotype, more similarity for TCRs recognizing the same epitope, HLA-restriction of T cell response) authors identify the optimal strategy and thresholds to filter out artifacts from their data.

      It is not clear if the identified thresholds are optimal for other experiments of this kind, and how the violation of the authors' assumptions (for example, inclusion of several highly similar pMHC-multimers recognized by the same clone of cross-reactive T cells) will impact the algorithm's performance and threshold selection. The authors do not discuss several recent papers featuring highly similar experimental techniques and the same data filtering challenges:

      https://www.science.org/doi/10.1126/sciimmunol.abk3070

      https://www.nature.com/articles/s41590-022-01184-4

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9184244/

      As described above, we have investigated the use of ITRAP on the large data set provided by 10X Genomics, and further compared the result to that obtained by ICON in an independent publication [bioRxiv doi.org/10.1101/2023.02.01.526310]. We have included a brief summary of the findings of that study in the current manuscript. The overall results and conclusions of the two studies align very well. In both cases, UMI count filtering and donor-HLA matching are the main drivers of the denoising. The identified UMI thresholds, however, were found to differ between the two data sets. As stated above, we believe this to be a strength of the ITRAP framework, since it demonstrates that the tools can be robustly applied to data originating from very different technical and/or biological settings.

      We acknowledge that ITRAP is highly dependent on the data containing a set of “large” clonotypes for which a single pMHC target can be assigned using the statistical approach outlined in the manuscript. This is because the UMI filtering thresholds are defined based on these clonotypes and their associated peptide annotations. Other than this, however, the method does not exclude identification of cross-reactive TCRs (in contrast to, for instance, ICON). We have expanded the discussion to make this point clearer.

      When it comes to the papers mentioned by the reviewer, these are clearly of high interest to us, and we are currently in the process of analyzing these data using the ITRAP framework. However, we believe these analyses are beyond the scope of the current publication, in particular since we have conducted the parallel benchmark study on the 10X Genomics data mentioned above.

      Unfortunately, I was unable to validate the method on other datasets or apply other approaches to the authors' data because neither code nor raw or processed data were available at the moment of the review.

      All data sets and code have been made publicly available at https://services.healthtech.dtu.dk/suppl/immunology/ITRAP

      One of the weaknesses of this study is that the motivation for the experiment and underlying hypothesis is unclear from the manuscript. Why these particular epitopes were selected, why these donors were selected, are any of the donors seropositive for EBV/CMV/influenza is unclear. Without particular research questions, it is hard to evaluate pipeline performance and justify a particular filtering strategy: for some applications, maximum specificity (i.e. no incorrect TCR specificity assignments) is crucial, while for others the main goal is to retain as many cells as possible.

      We understand this concern and have elaborated on our motivation for the experimental design in the text. The overall motivation for this study was to generate TCR-pMHC data complementing what was available in the public domain at the start of the project, with the purpose of generating novel data for training of TCR specificity prediction models. This is also the reason why we explicitly “deselected” T cells specific for the 3 negative control peptides, since these are already covered by large amounts of TCR sequences in the public databases.

      We do not know the serostatus of the donors included, but we determined the antigen specificities present in the donors prior to initiating the study (evaluated for T cell recognition against 945 common viral specificities, using barcoded pMHC multimers in a bulk setting). The 945 peptides were selected from prevalent epitopes within IEDB. This means that the T cell specificities of the donors selected for inclusion in the current study were known a priori. We have updated the motivation for performing the study (lines 122-126).

    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript "Optimal Cancer Evasion in a Dynamic Immune Microenvironment Generates Diverse Post-Escape Tumor Antigenicity Profiles" by George and Levine describes TEAL - a mathematical model for the dynamics of cancer evolution in response to immune recognition. The authors consider a process in which tumor cells from one clone are characterized by a set of neoantigens that may be recognized by the immune system with a certain probability. In response to the recognition, the tumor may adapt to evade immune recognition, by effective removal of recognizable neoantigens. The authors characterize the statistics of this adaptive process, considering, in particular, the evasion probability parameter, and a possibility of an adaptive strategy when this parameter is optimized in each step of the evolution. The dynamics of the latter process are solved with a dynamic programming approach. In the optimal case, the model captures the tradeoff between a cancer population's need for adaptability in hostile immune microenvironments and the cost of such adaptability to that population. Additionally, immune recognition of neoantigens is incorporated. These two factors, antitumor vs pro-tumor IME as quantified by the Beta penalty term, and the level of immune recognition as quantified by the rate q, form the basis of a characterization of tumors as 'hot' or 'cold'.

      I think this framework is a valuable attempt to formally characterize the processes and conditions that result in immunologically hot vs cold tumors. The model and the analytical work are sound and potentially interesting to a major audience. However, certain points require clarification for evaluation of the relevance of the model:

      1) Tumor clonality

      My main concern is about the lack of representation of the evolutionary process in the model and that the heterogeneity of the tumor is just glossed over.

      The single mention of the problem occurs in Section 2, p2: "Our focus is on a clonal population, recognizing that subclonal TAA distributions in this model may be studied by considering independent processes in parallel for each clone."

      I don't think this assumption resolves the impact of tumor heterogeneity on the immune evasion process. Furthermore, I would claim that the process depicted in Fig 1A is very rare and that cancers rarely lose recognizable neoantigens - typically it would be realized via subclonal evolution, with an already present cancer clone without the neoantigens picking up. Similarly, the adaptation of a tumor clone is an evolutionary process - supposedly the subclones that manage to escape recognition via genetic or epigenetic changes are the ones that persist. It is not clear what the authors assume about the heterogeneity of the adapting/adapted population between different generations, n->(n+1). Is the implicit assumption that the n+1 generation is again clonal, i.e. that the fitness advantage of the resulting subclone was such that the remaining clones were eliminated? Or does the model just focus on the fittest subclone? A discussion on whether these considerations are relevant to the result would clarify the relevance of the result.

      We thank the reviewer for these helpful clarifying points. Empirical evidence in lung cancer exists for genomic changes manifesting as lost neoantigens in treatment-resistant clones (Anagnostou et al. Cancer Discovery 2017), and those lost antigens were also shown to generate functional immune responses. Similar results have been shown for melanoma (Verdegaal et al. Nature 2016), with loss of neoantigens associated with reactivity in TILs. Recent observations (Jaeger et al. Clinical Cancer Research 2020) even show that mutated peptides may be hidden by protein stabilization, in addition to reduced expression patterns. We do, however, wish to clarify that our model implicitly treats antigen loss and the progression of a subpopulation currently adapted to evade immune targeting as equivalent, whether that progression occurs by direct pruning of the fittest subclone or by stochastic emergence and subsequent growth of a new one lacking the targeted antigens.

      Because we studied, for foundational understanding, the case where a single clonal signature was tracked in time, we under-explained the implementation of such a model in more complicated cases. As mentioned previously, the next most complicated scenario involves a heterogeneous population of cancer cells with disjoint neoantigen profiles. In this case, a parallel process can be studied wherein the effects of recognition in one environment are decoupled from the other (relevant to, for example, spatially distinct sub-populations). This description, however, misses the case where such disparate populations evolve to express shared antigens, and the case where there are both clonal and subclonal antigen targets. Here, our model can still be applied in parallel to study distinct clones but requires additional structure. Namely, in this case we would need to incorporate non-trivial coupling between the possible recognition/selection against certain antigens shared across clones. For example, control of a population with clonal antigens {a,b} but with unique subclones having either antigens {w,x} or {y,z} could be considered by studying the process in parallel, and control in the next periods would require recognition/selection against either 1) at least one of {w,x} and at least one of {y,z}, or 2) at least one of {a,b}. In this more general framework, the arrival of new subclones with distinct features from the parent clone in question could also be incorporated and studied across time periods. This strategy of subdividing more complicated evolutionary structures has now been further elaborated on in the Methods section, and we have expounded these points in the discussion (see additions given under Editor Comment 2).
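      The {a,b} / {w,x} / {y,z} example above can be made concrete with a small probability calculation. The sketch below is illustrative only and is not taken from the manuscript: it assumes that each antigen is recognized independently with the same probability q within one period, and the function names are hypothetical. It computes the chance that the control condition stated above is met, once in closed form and once by enumerating all recognition outcomes as a cross-check.

```python
from itertools import product


def control_probability(q: float) -> float:
    """Closed form, assuming independent recognition of each antigen with probability q."""
    hit_pair = 1.0 - (1.0 - q) ** 2          # at least one of two antigens is recognized
    clonal = hit_pair                         # condition 2: at least one of {a, b}
    both_subclones = hit_pair * hit_pair      # condition 1: one of {w, x} AND one of {y, z}
    # The two conditions involve disjoint antigens, so they are independent events.
    return 1.0 - (1.0 - clonal) * (1.0 - both_subclones)


def control_probability_enumerated(q: float) -> float:
    """Same quantity by brute-force enumeration of all 2**6 recognition outcomes."""
    antigens = ["a", "b", "w", "x", "y", "z"]
    total = 0.0
    for outcome in product([False, True], repeat=len(antigens)):
        rec = dict(zip(antigens, outcome))
        controlled = (rec["a"] or rec["b"]) or (
            (rec["w"] or rec["x"]) and (rec["y"] or rec["z"])
        )
        if controlled:
            weight = 1.0
            for hit in outcome:
                weight *= q if hit else 1.0 - q
            total += weight
    return total


if __name__ == "__main__":
    for q in (0.1, 0.3, 0.5):
        print(q, control_probability(q), control_probability_enumerated(q))
```

      Under these assumptions the shared clonal antigens dominate at low q, because recognizing a single clonal antigen suffices whereas the subclonal route needs two independent recognition events.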

      2) Time scales

      Section 2, p2: "We assume henceforth that the recognition-evasion pair consists of the T cell repertoire of the adaptive immune system and a cancer cell population, recognizable by a minimal collection of s_n TAAs present on the surface of cancer cells in sufficient abundance for recognition to occur over some time interval n.".

      How do the results depend on the duration of interval n? The duration should be long enough to allow for recognition and, up to some limiting duration, proportional to the TAA recognition probability q. However, it should not be so long that the state of the system can change significantly. A clarification on this point is needed.

      We agree with the reviewer that these points should be elaborated upon when discussing the time interval. Very briefly, we opted for a discrete-time model tracking a cancer population under selective immune pressure. In order for 𝒒 to represent the total recognition probability of an immune system against a particular TAA, the time interval 𝚫𝒏 in question is a coarse-grained feature representing the time between the earliest chance that the adaptive immune system may identify a cancer clone and the latest point after which such a recognition event would no longer be able to prevent cancer escape. This time period may vary substantially across cancer subtypes and depends on the cancer per-cell division rate, for example (George, Levine. Can Res 2020). As the reviewer pointed out, in implementing such a model there is an asymmetric risk to considering 𝚫𝒏 too large, as the future state of the system may not be well-reflected by the simple loss and addition of new TAAs. On the other hand, considering small time intervals 𝚫𝒏, while possible, would require the incorporation of additional intermediate states ending in neither cancer elimination nor cancer escape.

      We have clarified the points that the reviewer has brought up by adding them to the discussion section: In this discrete-time evolutionary model, the intertemporal period considered represents the time period between the earliest moment that the adaptive immune system may identify a cancer clone and the latest point after which such a recognition event would no longer be able to prevent cancer escape (George, Levine. Can Res 2020). This effectively gives 𝒒 a probabilistic representation for the total rate of opportunity to recognize a given TAA during cancer progression. Implementing this model in cancer subtype-specific contexts thus requires a consideration of the per-cell division rates, for example.

      Reviewer #3 (Public Review):

      Cancer cell populations co-evolve under the pressure exerted by the recognition of tumor-associated antigens by the adaptive immune system. Here, George and Levine analyze how cancers could dynamically adapt the rate of tumor-associated antigen loss to optimize their probability of escape. This is an interesting hypothesis that if confirmed experimentally could potentially inform treatments. The authors analyze mathematically how such optimally adapting tumors gain and lose tumor-associated antigens over time. By simplifying the complex interplay of immune recognition and tumor evolution in a toy model, the authors are able to study questions of practical interest analytically or through stochastic simulations. They show how different model parameters relating to the tumor microenvironment and immune surveillance lead to different dynamics of tumor immunogenicity, and more immunologically hot or cold tumors.

      Simple models are important because they allow an exhaustive study of dynamical regimes for different parameters, such as has been done elegantly in this study. However, in this quest for simplification, the authors have not considered biological features that are likely to be of importance for understanding the process of cancer immune co-evolution in generality: tumor heterogeneity and immune recognition that only stochastically results in cancer elimination. In this sense, this paper might be seen as the opening act in a series of more sophisticated models, and the authors discuss avenues towards such further developments.

      We share the reviewer’s credence in foundational modeling for comprehensive predictions on available dynamical behavior for the important problem at hand. The reviewer also correctly points out that future refinement will be needed to extend the foundational model developed in this work. To illustrate one of the more reasonable generalizations, which is to include nontrivial sub-clonal heterogeneity in tumor antigens, we now describe how one would go about enhancing the existing model to address this. This description has been added to the Methods and Discussion sections (see additions given under Editor Comment 2).

    1. Author Response

      Reviewer #1 (Public Review):

      N1-methyladenosine (m1A) is a rather intriguing RNA modification that can affect gene expression and RNA stability etc. The manuscript presented the exploration of RNAs m1A modification in normal and OGD/R-treated neurons and the effects of m1A on diverse RNAs. The authors showed that m1 modification can mediate circRNA/LncRNA-miRNA-mRNA mechanism and 3'UTR methylation of mRNAs can disturb miRNA-mRNA binding.

      The manuscript provides evidence for the following,

      1) The OGD/R can have impacts on various functions of m1A mRNAs and neuron fates.

      2) The m1A methylation of mRNA 3'UTRs disturbs the miRNA-mRNA binding.

      3) The authors identified three possible patterns of m1A modification regulation in neurons.

      The main merit of the manuscript is that the authors identified some critical features and patterns of m1A modification in neurons and OGD/R-treated neurons. Moreover, the authors identified m1A modifications on different RNAs and explored the possible effects of m1A modification on the functions of different RNAs and the overall posttranscriptional regulation mechanism via an integrated approach of omics and bioinformatics. The major weakness of the manuscript is that technical details for many results are missing. Moreover, language inconsistencies can be found throughout the manuscript. My general feeling about the manuscript is that some conclusions are rather superficial and therefore require validation and discussion.

      We appreciate your endorsement and constructive opinion concerning our work. Our study provides a comprehensive exploration of the characteristics of m1A modifications in neurons. According to your suggestions, we have specified the technical details in the revised manuscript and have included our perspectives on some of the conclusions in the Discussion section. In addition, we have corrected language inconsistencies throughout the manuscript. We hope that the revisions made are acceptable and meet your requirements.

      Reviewer #2 (Public Review):

      In this manuscript, investigators explore the m1A modification, an important post-transcriptional regulatory mechanism, in primary normal neurons and OGD/R-treated neurons. As far as I know, the regulatory m1A modification remains poorly characterized in neurons. This is an interesting topic in the context of epitranscriptomics. This paper not only provided us with a landscape of m1A modifications in neurons, but also explored the impact of m1A modifications on the biological functions of different RNAs (mRNA, lncRNA, circRNA). In addition, the argument that m1A modification affects miRNA binding to other RNAs is of interest to readers, and the authors have performed a dual luciferase validation here to add feasibility to this conclusion.

      Thank you for your careful review of our study, and thank you for your appreciation on our work. The aim of this work was to explore the characteristics of m1A modification in neurons. We believe that incorporating your advice into the revised manuscript has enhanced the quality of our article.

      Reviewer #3 (Public Review):

      Overall, this is an interesting and well performed study that described a comprehensive landscape of m1A modification in primary neurons and investigated the role of m1A in the circRNA/lncRNA‒miRNA-mRNA regulatory network following OGD/R. The focus on the two different complex regulatory networks for differential expression and differential methylation is important and it will be a valuable resource for the research community that focuses on epitranscriptomics and central nervous system diseases. Collectively, the authors present an exciting piece of work that certainly adds to the literature regarding epitranscriptomic features in neurons. While interesting results were obtained and the paper is nicely written, I have the following suggestions for minor revisions to improve the paper.

      We are grateful for your many positive comments and recognition of the potential of our work. Due to your suggestion, we found some shortcomings in our current manuscript. These suggestions were introduced and added value to our article. Our future research will continue to explore some conclusions obtained from this work. And we will continue to contribute our research outcomes in this field. Thank you again for your excellent suggestions!

      1) The authors have explored the role of m1A modification in neuron, but it would have been helpful if the authors described the significance of these findings in depth in some sections (Figure 5 and Figure 6) to enhance the value of the article.

      Thank you for your insightful suggestion. We agree with the comment that the significance of these findings should be described in detail. As such, we have added corresponding content to the Results (lines 407-424) and Discussion (lines 532-550) sections, respectively.

      2) The authors should describe in detail the current research state of m1A modification and the significance of this study to the field of epitranscriptomics in the introduction and Discussion section.

      Thank you for your insightful suggestion. There is relatively little knowledge in the m1A modification area. It is really important to summarize the existing knowledge and research progress in a comprehensive and detailed manner. We conducted a comprehensive latest literature search and added corresponding content to the Introduction (line 78-83) and Discussion section (line 505-511, line 532-562) as you suggested.

    1. Author Response

      Reviewer 1 (Public Review):

      Protein oligomerization is essential to their in vivo function, and it is generally challenging to determine the distribution of oligomeric states and the corresponding conformational ensembles. By combining coarse-grained molecular dynamics simulations and experimental small-angle X-ray scattering profiles at different protein concentrations, the authors have established a robust approach to self-consistently determine the oligomeric state(s) and the conformational ensemble. The approach has been applied specifically to the speckle-type POZ protein (SPOP) and generated new insights into the conformational ensemble and structural features that determine the ensemble. The model was further tested by the analysis of several relevant mutants as well as models with different types of structural restraints. The results also support the isodesmic self-association model, with KD values comparable to those measured from independent experiments in the literature. The approach is potentially applicable to a broad set of systems.

      We thank the reviewer for taking the time to assess our work.

      Reviewer 2 (Public Review):

      This manuscript applied SAXS data analysis to protein self-assembly by implementing the simultaneous fitting of intra- and intermolecular motions/conformations against SAXS data at a series of oligomerization states/concentrations. Despite several major assumptions hinted, a diverse pool of conformational and oligomeric candidates was generated from CG simulations, and more importantly, these candidates were fitted to these SAXS data to reach a reasonable agreement, suggesting some degree of convergence (even if the ensemble fitting could well be at a local minimum). This is considered a technical advance, given the fairly large numbers of both the oligomer fraction phi_i (i=1, ..., N) and the conformational weight w_k (k=1, ..., n), where N is the number of oligomers and n is the number of internal conformational states.

      We thank Prof. Yang for taking the time to assess our work.

      Central is optimizing phi_i and w_k, simultaneously. The former has been illustrated in Fig. 4 and SI-Fig. 7 for the total number of 60mers. The latter relies on an overfitting-preventing strategy, as shown in SI_Fig. 1, where an effective fraction cutoff was used from 0.1 to 1.0, as opposed to the number of conformational states. What are the numbers of conformational states for these oligomers? This should be quantifiable, e.g., defining the conformational differences by chi_2.

      The reviewer is correct that the entropy-based term for preventing overfitting is a key aspect of the method. In contrast to some of the other methods to combine experiments with simulations, our approach does, however, not require us to define individual conformational states. Instead, the weights in the entropy term refer to individual configurations rather than states, and we can thus integrate the SAXS experiments and simulations without, for example, clustering the conformations. Indeed, for most of the collective variables that we have calculated from the ensembles, such as the radii of gyration, end-to-end distances, and MATH-MATH distances, we observe continuous monomodal probability distributions, which suggests that it might be difficult to define a few distinct conformational states. For the MATH-BTB/BACK distance, we observe a trimodal distribution, and these distinct conformational states are shown as overlaid structures in Fig. 4i. Thus, while these “states” change populations during reweighting, this is the result from changing weights of the individual configurations.
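      To illustrate how weights on individual configurations relate to the "effective fraction" mentioned by the reviewer, the sketch below computes that quantity under an assumption that is not stated in the response itself: that the effective fraction is defined, as is common in maximum-entropy reweighting, as exp(S_rel) with S_rel = -sum_j w_j log(w_j / w_j^0), where w^0 are the prior (simulation) weights. The function name and the example weights are hypothetical.

```python
import math
from typing import Optional, Sequence


def effective_fraction(weights: Sequence[float],
                       prior: Optional[Sequence[float]] = None) -> float:
    """Return exp(S_rel) for per-configuration weights (assumed definition, see lead-in).

    weights: reweighted weights, one per configuration (need not be normalized).
    prior:   initial weights; uniform over configurations if omitted.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("at least one configuration is required")
    if prior is None:
        prior = [1.0 / n] * n
    w_sum = float(sum(weights))
    p_sum = float(sum(prior))
    s_rel = 0.0
    for w, w0 in zip(weights, prior):
        w_norm = w / w_sum
        w0_norm = w0 / p_sum
        if w_norm > 0.0:  # terms with zero weight contribute nothing
            s_rel -= w_norm * math.log(w_norm / w0_norm)
    return math.exp(s_rel)


if __name__ == "__main__":
    print(effective_fraction([0.25, 0.25, 0.25, 0.25]))  # unchanged ensemble
    print(effective_fraction([0.7, 0.1, 0.1, 0.1]))      # ensemble skewed toward one frame
```

      With this definition, a value of 1 means the reweighted ensemble is identical to the prior, and values near 0 mean that only a few configurations carry most of the weight, which is the overfitting regime the cutoff is meant to avoid.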

      Reviewer 3 (Public Review):

      Molecular-level interpretations of SAXS data are challenging, especially for oligomeric systems of variable length with intrinsic flexibility and the possibility of multiple association interfaces. In order to make this challenge tractable, a number of assumptions are made here: 1) There is a single pathway by which individual domains associate first into homodimers and then into longer oligomers; 2) the association kinetics is isodesmic, which allows the direct calculation of oligomer distributions based on the given value of a single dissociation constant; 3) the internal dynamics within dimers is restricted essentially to relative domain-domain motions, that are sampled comprehensively via MD simulations. As a result, excellent fits to the SAXS data are obtained and the underlying conformational ensembles are highly plausible. The resulting models are useful to further understand SPOP function, especially in the context of liquid-liquid phase separation.

      We thank the reviewer for taking time to read our work and for their various suggestions.
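      The second assumption listed by the reviewer, that an isodesmic model gives the oligomer distribution from a single dissociation constant, can be illustrated with a short calculation. This is a generic textbook sketch rather than the authors' code: it assumes identical subunits (for SPOP, the self-associating unit may itself be a dimer), an unbounded chain length, and the same K_D for every association step; the function name is hypothetical. With x = c_1 / K_D, mass conservation gives c_total = c_1 / (1 - x)^2, and the mass fraction of the i-mer is i x^(i-1) (1 - x)^2.

```python
import math
from typing import List


def isodesmic_mass_fractions(c_total: float, kd: float, max_size: int) -> List[float]:
    """Mass fractions of 1-mers .. max_size-mers for an isodesmic model.

    c_total: total subunit concentration (same units as kd).
    kd:      dissociation constant shared by every association step.
    """
    if c_total <= 0.0 or kd <= 0.0:
        raise ValueError("c_total and kd must be positive")
    load = c_total / kd
    # Physical root (0 < x < 1) of load * (1 - x)**2 = x.
    x = ((2.0 * load + 1.0) - math.sqrt(4.0 * load + 1.0)) / (2.0 * load)
    return [i * x ** (i - 1) * (1.0 - x) ** 2 for i in range(1, max_size + 1)]


if __name__ == "__main__":
    for c_total in (0.1, 1.0, 10.0):  # in units of kd
        fractions = isodesmic_mass_fractions(c_total, kd=1.0, max_size=12)
        print(c_total, [round(f, 3) for f in fractions[:5]])
```

      Because the series is truncated at max_size, the returned fractions sum to slightly less than one when large oligomers are populated; the shift toward longer oligomers with increasing concentration is what makes a concentration series of SAXS profiles informative about K_D.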

    1. Author Response

      Reviewer #1 (Public Review):

      This work provides a new general framework for estimating missing data on cervical cancer epidemiology, including sexual behavior, HPV prevalence, and cervical cancer incidence. These data are useful to determine impact projections of cervical cancer prevention. The authors suggest a three-step approach: 1) a clustering method applied on registries with an intermediate level of data availability to cluster cervical cancer incidence based on a Poisson-regression-based CEM algorithm, 2) a classification method applied on registries with a low level of data availability to classify cervical cancer incidence based on a Random Forest, 3) a projection method applied on missing data based on the mean of available data. The authors use India as a case study to implement this new methodology. Results indicate that two patterns of cervical cancer incidence are identified in India (high and low incidence), classifying all Indian states with missing data to a low incidence. From this classification, missing data is approximated using the mean of the available data within each cluster.

      A strength of this approach is that this methodology can be applied to regions with missing data, although a minimum set of information is needed. This makes it possible to have individual data for each unit in the region.

      One of the weaknesses of this methodology is the need for a minimum set of epidemiological data to enable impact projections. It is true that when epidemiological cervical cancer data is not available, authors mentioned that general indicators (e.g., human development index, geography) can be used but projections will be probably less realistic. As observed with other techniques, countries with fewer resources have less data available and cannot benefit from these types of techniques to have more adequate guidelines.

      Imputation of missing data is always a challenging issue. The technique proposed in this manuscript is an interesting new approach to missing data imputation that could be applied with a minimum set of available data. However, we must focus on obtaining reliable data from each region of the world to help local health authorities implement better preventive measures for the local population.

      We thank the reviewer for the considerate comments and suggestions and have tried to incorporate them as much as possible in the revised manuscript.

      As the reviewer has pointed out, the applicability of the proposed methodology depends on the available data. In our opinion, it is a general challenge for approximating missing data, rather than a weakness particular to our methodology. In fact, we believe that our framework is flexible enough to address missing data in many situations. To clarify this point, we have included the following sentences in the Discussion (lines 363-376, page 18): “It is important to note that, in general, the applicability of the proposed framework depends on the actual amount of data available. However, in our opinion, it is a general challenge for approximating missing data, rather than a weakness particular to our methodology. By allowing possible adaptations, we believe that our framework is sufficiently flexible to address missing data in many situations.”

      Finally, we fully agree with the reviewer that we should continue our effort to collect more data for countries where these are not available. The proposed framework should be considered as a solution to the situation in which collection of additional data is not or not yet possible.

      Reviewer #2 (Public Review):

      The burden of cervical cancer worldwide is well recognized. While prevention strategies, including vaccination against human papillomavirus (HPV), cervical cancer screening, and pre-cancer treatment, can reduce the burden of cervical cancer, access to these measures is still limited, especially in low- and middle-income countries. Since the impact of prevention strategies is heavily dependent on the disease's burden on a particular population, we need to know the latter to assess the impact of these context-specific prevention strategies.

      However, epidemiological data on cervical cancer are not always available for all geographical areas. This paper uses India as a case study to propose a framework called "Footprinting" to comprehensively evaluate the burden of cervical cancer. The authors applied a three-step analytical strategy to impute cervical cancer epidemiological data in states where this information was unavailable using data from cervical cancer incidence, HPV prevalence, and sexual behaviour from other regions. The findings suggest a high and low incidence of cervical cancer incidence in different parts of India; all Indian states with missing data were classified as low incidence.

      The proposed analytical strategy presents an important solution for imputing data from geographic areas of a country where data are missing.

      We thank the reviewer for the considerate comments and suggestions and have tried to incorporate them as much as possible in the revised manuscript.

      One conceptual limitation of this work is the lack of explanation or evidence that sexual behaviour can be used to approximate cervical cancer and/or HPV rates.

      A similar comment was raised by Reviewer #1. It is well established that sexual contact is the only transmission route of carcinogenic HPV infection, and hence necessary for the occurrence of cervical cancer [ref #26 Vaccerella 2006, Muñoz 1992 Int J Cancer 52, 743-749].

      We have included sexual behaviour variables that have previously been shown to be risk factors of HPV infection and cervical cancer risk, e.g., age of sexual debut and number of sexual partners [ref #26 Vaccerella 2006, ref #27 Schulte-Frohlinde 2021]. Furthermore, we used variables that are commonly available so that the analyses can be easily applied to other settings.

      As far as we know, there is no established set of sexual behaviour variables for predicting the patterns of HPV prevalence and cervical cancer incidence. The good prediction performance in the India case study shows that using the selected set is sufficient. As sexual behaviour variables are highly correlated, including more variables might even risk overfitting.

      To clarify these points we have included the following paragraph in the Discussion (lines 319-325, page 16): “In our analysis of classifying clusters of cervical cancer incidence, we only included some of the sexual behaviour variables available in the NACO report [15]. We selected variables that were previously shown to be risk factors of HPV infection and cervical cancer risk and that are commonly available so that the analyses can be easily applied to other settings, e.g., age of sexual debut and number of sexual partners [26, 27]. As far as we know, there is no established set of sexual behaviour variables for predicting the patterns of HPV prevalence and cervical cancer incidence. The good prediction performance shows that using the selected set is sufficient. As sexual behaviour variables are highly correlated, including more variables might even risk overfitting.”

      Also, full information on the three main indicators is only available in two states. This is used to impute the values for the other states.

      Indeed, HPV prevalence data were only available for two states. While we acknowledge that this affects the certainty in the imputed HPV prevalence, we considered the imputed results to be satisfactory based on the good accordance with the cervical cancer incidence data we found in the validation step (lines 286-23, page 14). We verified that the ratio of HPV prevalence between the high- and low-incidence clusters (1.7-fold) was very similar to the ratio of age-standardized cervical cancer incidence (1.9-fold).

      Furthermore, we note that previous modelling works on India relied on even less data, namely one source of HPV prevalence and cervical cancer incidence data [ref #29 Brisson 2020, Diaz 2008 Br J Cancer].

      Moreover, the available data used in this study also present some limitations; for example, cervical cancer incidence data were from 2012 to 2016, while sex behaviour data were from 2006. This large gap is likely to have a significant cohort effect, especially given changes in sexual norms in Western countries over the last few decades, which may have gradually influenced other countries, especially in this age of the internet and social media.

      In our opinion, for the purpose of modelling the natural history of cervical cancer, it is not necessarily more adequate to use the most recent sexual behaviour data. Arguably, as sexual behaviour is the “exposure” for the “outcome” cervical cancer, calibration of HPV transmission and cervical cancer models is best done with data on sexual behaviour and cervical cancer from the same cohorts, hence with sexual behaviour data from an earlier period than the cervical cancer data.

      In addition, if changes of sexual behaviour occur across the country, it should not affect the clustering much.

      Finally, due to delay in reporting, cervical cancer incidence from the period 2012-2016 is the most recent edition at the moment of writing. Regarding sexual behaviour data, there is at the moment no later edition of the NACO report published after that of year 2006.

      Finally, it would be interesting to validate this methodology to confirm its utility.

      We agree that it would be very interesting to validate this proposed methodology in other regions. Unfortunately, it was beyond the scope of this work. Currently, we are working on a project in which we try to apply footprinting to a collection of low- and middle-income countries.

      The proposed framework's strength is difficult to evaluate because the steps and justification for the model variables were not clearly presented, nor were the models validated.

      We acknowledge that the framework could be more clearly presented and have added additional explanation in the following places to do so:

      • Concerning the framework steps, in Methods (lines 144-163, pages 7-8): “For convenience of explanation, we assumed earlier that data availability occurs hierarchically. However, the framework can also be applied with less stringent data requirements. First, the source of Footprint data need not necessarily cover all geographical units. It is still possible to train a classifier in the classification step with Footprint data available for only a part of clustered geographical units. Second, if none of the key cervical cancer epidemiological data (sexual behavior, HPV prevalence, and cervical cancer incidence data) have large enough coverage to serve as Footprint data, alternative indicators of similarity, such as human development index and geographical distance, could also be used as substitutes. However, the resulting classification performance might be suboptimal, as we expect these indicators to correlate less well with cervical cancer risk. Third, for the projection step, data of cervical cancer incidence, sexual behavior, and HPV prevalence needed for calibration of projection models need not necessarily belong to the same geographical unit. Calibration can be performed as long as the three types of data are available within each cluster.

      With these less stringent data requirements, the proposed framework should be sufficiently flexible to be applied to many situations. However, one should still be cautious in applying the framework when there are little data. This means that, in some cases, we might need to exclude from the analysis some geographical units with too little data or redefine bigger geographical units if the data are not granular enough. Furthermore, we should assess the goodness-of-fit of the obtained clustering, performance of classification, correlation of data within different clusters, and calibration fits to ensure the validity of the final impact projections.”

      • Concerning selection of model variables (lines 319-325, page 16): “In our analysis of classifying clusters of cervical cancer incidence, we only included some of the sexual behaviour variables available in the NACO report [15]. We selected variables that were previously shown to be risk factors of HPV infection and cervical cancer risk and that are commonly available (e.g., age of sexual debut and number of sexual partners) so that the analyses can be easily applied to other settings [26, 27]. In the India case study, the good classification performance shows that using the selected set is sufficient. As sexual behaviour variables are highly correlated, including more variables might even risk overfitting.”
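      The projection step described in this response (approximating missing values by the mean of the available data within each cluster) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the unit labels, and all numbers are hypothetical, and a unit whose cluster has no observed data is deliberately left as missing rather than filled in.

```python
from statistics import mean
from typing import Dict, Optional


def impute_by_cluster_mean(
    values: Dict[str, Optional[float]],
    clusters: Dict[str, str],
) -> Dict[str, Optional[float]]:
    """Fill missing values with the mean of observed values in the same cluster.

    values:   geographical unit -> observed value, or None if missing.
    clusters: geographical unit -> cluster label (e.g. from the classification step).
    """
    observed: Dict[str, list] = {}
    for unit, value in values.items():
        if value is not None:
            observed.setdefault(clusters[unit], []).append(value)
    cluster_means = {label: mean(vals) for label, vals in observed.items()}

    imputed: Dict[str, Optional[float]] = {}
    for unit, value in values.items():
        if value is not None:
            imputed[unit] = value
        else:
            # Stays None when the cluster has no observed data at all.
            imputed[unit] = cluster_means.get(clusters[unit])
    return imputed


if __name__ == "__main__":
    # Hypothetical units and values, for illustration only.
    prevalence = {"A": 10.0, "B": 14.0, "C": None, "D": 5.0, "E": None}
    cluster_of = {"A": "high", "B": "high", "C": "high", "D": "low", "E": "low"}
    print(impute_by_cluster_mean(prevalence, cluster_of))
```

      When a cluster contains data from only one or two units, as is the case for HPV prevalence here, the imputed value inherits all of the uncertainty of those few observations, which is why a consistency check against an independent indicator is useful.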

      Based on the authors' interpretation of the framework findings, this framework may help extrapolate data from one country to another. I'm curious as to whether this framework could be applied across states and countries.

      We thank the reviewer for this comment. Currently, we are working on a multi-year project in which we try to apply the framework to all low- and middle-income countries.

    1. Author Response:

      eLife assessment

      This work is an attempt to establish conditions that accurately and efficiently mimic a drought response in Arabidopsis grown on defined agar-solidified media - an admirable goal as a reliable experimental system is key to conducting successful low water potential experiments and would enable high-throughput genetic screening (and GWAS) to assess the impacts of environmental perturbations on various genetic backgrounds. The authors compare transcriptome patterns of plant subjected to water limitation imposed using different experimental systems. The work is valuable in that it lays out the challenges of such an endeavor and points out shortcomings of previous attempts. However, a lack of water relations measurements, incomplete experimental design, and lack of critical evaluation of these methods in light of previous results render the proposed new methodology inadequate.

      We thank eLife for the initial assessment and comments on our work. In our revised manuscript we plan to address the main concerns raised by reviewers. Specifically, we plan to perform water relations measurements for all our treatment assays, as well as explore the separate effects agar hardening and nutrient concentration have in our low-water agar assay. We will also provide a more in-depth critical review of our results compared to previously published results.

      Reviewer #1 (Public Review):

      High-throughput genetic screening is a powerful approach to elucidate genes and gene networks involved in a variety of biological events. Such screens are well established in single-celled organisms (i.e. CRISPR-based K/O in tissue culture or unicellular organisms; screens of natural variants in response to drugs). It is desirable to extend such methodology, for example to Arabidopsis where more than 1000 ecotypes from around the Northern hemisphere are available for study. These ecotypes may be locally adapted and are fully sequenced, so the system is set up for powerful exploration of GxE. But to do so, establishing consistent "in vitro" conditions that mimic ecologically relevant conditions like drought is essential. 

      The authors note that previous attempts to mimic drought response have shortcomings, many of which are revealed by 'omics type analysis. For example, three treatments thought to induce osmotic stress (the addition of PEG, mannitol, or NaCl) fail to elicit a transcriptional response that is comparable to that of bona fide drought. As an alternative, the authors suggest using a low-water agar assay which, for the variables they measure, does a better job of mimicking osmotic stress responses. The major issue with this assay, however, is that it introduces another set of problems; for example, changing agar concentration can lead to mechanical effects, as illustrated nicely in the work of Olivier Hamant's group.

      We thank the reviewer for their comments. We hypothesize that our low-water agar assay is able to replicate drought gene expression patterns through a combination of hardened agar and higher nutrient concentration. However, we did not explore the separate effects each of these factors may play in eliciting such responses. Thus, in our revised manuscript, we will explore what role the mechanical effects of changing agar concentration have on root gene expression. However, we suspect that the mechanical effects introduced by hard agar do not introduce another issue per se, but in fact may help with replicating the transcriptional effects seen under drought.

      Reviewer #2 (Public Review):

      […] The authors have not always considered literature that would be relevant to their topic. For example, there are a number of studies that have reported (and deposited in public databases) transcriptome analyses of plants on PEG plates or plants exposed to well-controlled, moderate-severity soil drying assays (for the latter, check the paper of Des Marais et al. and others; for the former, Verslues and colleagues have published a series of studies using PEG-agar plates). They also overlook studies that have recorded growth responses of wild type and a range of mutants on properly prepared PEG plates and found that those results agree well with results when plants are exposed to a controlled, partial soil drying to impose a similar low water potential stress. In short, the authors need to make such comparisons to other data and think more about what may be wrong with their own experimental designs before making any sweeping conclusions about what is suitable or not suitable for imposing low water potential stress.

      To solve the problem of using these other systems to impose low water potential stress, the authors propose the seemingly logical (but overly simplistic) idea of adding less water to the same mix of nutrients and agar. Because the increased agar concentration does not substantially influence water potential (the agar polymerizes and thus is not osmotically active), what they are essentially doing is using a concentrated solution of macronutrients in the growth media to impose stress. This is a rediscovery of an old proposal that concentrated macronutrient solutions could be used to study the osmotic component of salt stress (see older papers of Rana Munns). There are also effects of using very hard agar whose relationship to actual drought stress and low water potential is unclear. Thus, I see no reason to think that this would be a better method to impose low water potential.

      We thank the reviewer for their comments. In our revised manuscript, we will address points regarding plant and soil water potential; similar concerns were also raised by Reviewer 1 and 3. We note that we report vermiculite water content in Supplementary Table 4.

      We would like to clarify that both the PEG media and overlay solution were buffered - we did not include this within the written description in the methods, but will do in our revised manuscript.

      We agree with the reviewer’s concern that it may be problematic to compare the transcriptomic profiles of seedlings and mature plants. In light of this, we plan to explore what effects our treatment media have on mature rosettes.

      We note that we do not claim that PEG is unable to produce low-water potential responses similar to partial soil drying. Indeed, we indicate that it is a good technique for eliciting phenotypes comparable to drought at the physiological level (line 48). Rather, we claim that PEG is unable to produce gene expression responses that are sufficiently similar to partial vermiculite drying.

      Reviewer #3 (Public Review):

      […] The authors observed that gene expression responses of roots in their 'low-water agar' assay resembled more closely the water deficit in pots compared to the PEG, mannitol, and salt treatments (all at the highest dose). In particular, 28 % PEG led to the down-regulation of many genes that were up-regulated under drought in pots. Through GO term analysis, it was pointed out that this may be due to the negative effect of PEG on oxygen solubility since downregulated genes were over-represented in oxygen-related categories. The data also show that the treatment with abscisic acid on plates was very good at simulating drought in roots. Gene expression changes in shoots showed generally a high concordance between all treatments at the highest dose and water deficit in pots, with mannitol being the closest match. This is surprising, since plants grow in plates under non-transpiring conditions, while a mismatch between water loss by transpiration and water supply via the roots leads to drought symptoms such as wilting in pot and field-grown plants. The authors concluded that their 'low-water agar' assay provides a better alternative to simulate drought on plates.

      Strengths: 

      The development of a more robust assay to simulate drought on plates to allow for high-throughput screening is certainly an important goal since many phenotypes that are discovered on plates cannot be recapitulated on the soil. Adding less water to the media mix and thereby increasing agar strength and nutrient concentration appears to be a good approach since nutrients are also concentrated in soils during water deficit, as pointed out by the authors. To my knowledge, this approach has not specifically been used to simulate drought on plates previously. Comparing their new 'low-water agar' assay to popular treatments with PEG, mannitol, salt, and abscisic acid, as well as plants grown in pots on vermiculite led to a comprehensive overview of how these treatments affect gene expression changes that surpass previous studies. It is promising that the impact of 'low-water agar' on the shoot size of 20 diverse Arabidopsis accessions shows some association with plant fitness under drought in the field. Their methodology could be powerful in identifying a better substitute for plate-based high-throughput drought assays that have an emphasis on gene expression changes. 

      Weaknesses: 

      While the authors use a good methodological framework to compare the different drought treatments, gene expression changes were only compared between the highest dose of each stress assay (Fig. 2B, 3B). From Fig. 1F it appears that gene expression changes depend significantly on the level of stress that is imposed. Therefore, their conclusion that the 'low-water agar' assay is better at simulating drought is only valid when comparing the highest dose of each treatment and only for gene expression changes in roots. Considering how comparable different levels of stress were in this study leads to another weakness. The authors correctly point out that PEG, mannitol, and salt are used due to their ability to lower the water potential through an increase in osmotic strength (L. 45/46). In soils, water deficit leads to lower water potential, due to the concentration of nutrients (as pointed out in L. 171), as well as higher adhesion forces of water molecules to soil particles and a decline in soil hydraulic conductivity for water, which causes an imbalance between supply and demand (see Juenger and Verslues, The Plant Cell 2022 for a recent review). While the authors selected three different doses for each treatment that are commonly used in the literature, these are not necessarily comparable on a physiological level. For example, 200 mM mannitol has an approximate osmotic potential of around -5 bar (Michel et al. Plant Physiol. 1983) whereas 28 % PEG has an osmotic potential closer to -10 bar (Michel et al. Plant Physiol. 1973). It also remains unclear how the increase in agar concentration versus the increase in nutrient concentration in the 'low-water agar' affect water potentials. 
      For these reasons it cannot be known whether a better match of the 'low-water agar' at the 28% dose to water deficit in pots for roots in comparison to the other treatments is due to a good match in stress levels with the 'low-water agar' or to adverse side effects of PEG, mannitol, or salt on gene regulation. Lastly, since only two biological replicates for RNA sequencing were collected per treatment, it is not possible to know how much variance exists and if this variance is greater than the treatments themselves.
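
      As a rough illustration of why nominal doses are not directly comparable, the osmotic potential of a dilute, ideal, non-dissociating solute can be approximated with the van't Hoff relation, Ψs = -i·C·R·T. The sketch below is an assumption-laden back-of-the-envelope check, not a method from the manuscript or the reviews: the function name and the 25 °C default are hypothetical choices. For 200 mM mannitol it gives a value close to the -5 bar cited above. It should not be applied to PEG, whose osmotic potential departs strongly from ideal behaviour and is normally taken from empirical calibrations.

```python
# Gas constant expressed in L·bar/(mol·K), so the result comes out in bar.
R_L_BAR = 0.083145


def osmotic_potential_bar(molarity, temp_c=25.0, vant_hoff_i=1.0):
    """Approximate osmotic potential (bar) of an ideal dilute solution.

    molarity     -- solute concentration in mol/L
    temp_c       -- temperature in degrees Celsius (assumed default: 25)
    vant_hoff_i  -- van't Hoff factor (1 for a non-dissociating solute)
    """
    temp_k = temp_c + 273.15
    return -vant_hoff_i * molarity * R_L_BAR * temp_k


if __name__ == "__main__":
    # 200 mM mannitol, treated as an ideal non-electrolyte.
    print(round(osmotic_potential_bar(0.200), 2))
```

      Direct measurement (for example by osmometry or psychrometry) remains the appropriate benchmark; the calculation only shows the order of magnitude expected for a small, non-dissociating solute.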

      We thank the reviewer for their comments. In our statistical analyses, we found that dose-responsive genes (as fit by a linear model) were very similar to those genes found differentially expressed at the highest dose. Thus, for clarity, we decided to simply present the genes differentially expressed at the highest dose. We see now that this might have been an oversimplification. In our revised manuscript, we will present genes that are dose responsive across the range of treatment doses, thus providing more evidence that lower doses of low-water agar are also capable of simulating drought (as is suggested by overlap analysis of Figure 2A).

      Additionally, we will explore the osmotic potential of each of our different assays to provide a better benchmark of how comparable our treatments are (as similarly requested by Reviewers 1 and 2). Lastly, to address concerns regarding the size of variance in gene expression, we will sequence a third replicate of RNA.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript reassesses the strength of evidence for rapid human germline mutation spectrum evolution, using high coverage whole genome sequencing data and paying particular attention to the potential impact of confounders like biased gene conversion. The authors also refute some recently published arguments that historical changes in the age of reproduction might explain the existence of such mutation spectrum changes. My overall impression is that the paper presents a useful new angle for studying mutation spectrum evolution, and the analysis is nicely suited to addressing whether a particular model such as the parental age model can explain a set of observed polymorphism data. My main criticism is that the paper overstates certain weaknesses of previously published papers on mutation spectrum evolution as well as the generation time hypothesis; correcting these oversimplifications would more accurately capture what the paper's new analyses add to the state of knowledge in these areas.

      As part of the motivation for the current study, the introduction states in lines 97-99 that "it thus remains unclear if the numerous observed [mutation spectrum] differences across human populations stem from rapid evolution of the mutation process itself, other evolutionary processes, or technical factors." This seems to overstate the uncertainty that existed prior to this study, given that Speidel, et al. 2021 found elevated TCC>TTC fractions in ancient genomes from a specific ancient European population, which seems like pretty airtight evidence that this historical mutation rate increase really happened. In addition, earlier papers (Harris 2015, Mathieson & Reich 2016, Harris & Pritchard 2017) already presented analyses rejecting the hypothesis that biased gene conversion or genetic drift could explain the reported patterns. In fact, the Mathieson & Reich paper reports one mutation spectrum difference between populations that they conclude is an artifact caused by the Native American population bottleneck, but they conclude that other mutation spectrum differences appear more robust.

      We completely agree with the reviewer that there has been compelling evidence from multiple independent groups supporting transient elevation of TCC>TTC mutation rate in Europeans. Beyond the TCC signal, however, the mechanisms underlying the observed differences in mutation spectrum across populations remain unclear. In particular, several biological and technical factors impact the mutation spectrum and none of the previous studies have investigated their effects, independently or altogether. Thus, it remains unclear if the mutation rate is evolving rapidly across populations, or if one or more factors (like biased gene conversion) differ across groups or over evolutionary time. Our analysis framework attempts to control these effects together to more reliably investigate the effects of various factors and examine when and how often there has been evolution of mutation rate over the course of human evolution.

      As the authors acknowledge in the discussion of their own results, biased gene conversion and non-equilibrium demography are difficult confounders to deal with, and neither previous papers nor the current paper are able to do this in a way that is 100% foolproof. The current manuscript makes a valuable contribution by presenting new ways of dealing with these issues, particularly since previous papers' work on this topic was often confined to supplementary material, but it seems appropriate to acknowledge that earlier papers discussed the potential impacts of biased gene conversion and demographic complexity and presented their own analyses arguing that these phenomena were poor explanations for the existence of mutation spectrum differences between populations.

      For the most part, I found the paper's introduction to be a useful summary of previous work, but there are a few additional places where the limitations of previous work could be described more clearly. I'd suggest noting that the data artifacts discovered by Anderson-Trocmé, et al. were restricted to a few old samples and that the large differences the current manuscript focuses on were never implicated as potential cell line artifacts. In addition, when the authors mention that their new approach includes "minimiz[ing] confounding effects of selection by removing constrained regions and known targets of selection" (lines 106-107), they should note that earlier papers like Harris & Pritchard 2017 also excluded conserved regions and exons.

      We agree with the reviewer that some of the previous work also attempted to account for the contributions of selection or other factors in post hoc ways; we now acknowledge this in the Results section more explicitly. However, we note that our contribution is in introducing a framework to account for these effects a priori and then assess if there are differences in mutation spectrum across populations and over the course of human evolution. In particular, an innovation of our framework is to better control for the effect of gBGC, which has not been done in previous studies.

      One innovative aspect of the current paper's approach is the use of allele ages inferred by Relate, which certainly has advantages over using allele frequencies as a proxy for allele age. Though the authors of Relate previously used this approach to study mutation spectrum evolution, they did not perform such a thorough investigation of ancient alleles and collapsed mutation type ratios. I like the authors' approach of building uncertainty into the use of Relate's age estimates, but I wonder about the validity of assuming that the allele age posterior probability is distributed uniformly between the upper and lower confidence bounds. Can the authors address why this is more appropriate than some kind of peaked distribution like a beta distribution?

      The lower and upper bounds of the allele age reported by Relate reflect the start and end points of the branch that the mutation falls on in the reconstructed genealogical tree. If Relate does a perfect job in reconstructing the tree and estimating the branch lengths, the mutation age should be uniformly distributed in the inferred interval. It is unrealistic to expect Relate to perform perfectly in tree building, and there is likely considerable uncertainty and even bias in the time to endpoints of the branch. Unfortunately, Relate does not report the uncertainty in the lower and upper bounds of the mutation age, so we were not able to model the posterior distribution of the allele age properly. However, assuming a uniform distribution of the mutation age between the upper and lower confidence bounds should be valid to a first approximation.
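
      To make the uniform assumption concrete, one simple way to use it is to spread each mutation across time bins in proportion to how much of its age interval overlaps each bin. The sketch below only illustrates that idea; the function name, the bin edges, and the example ages are hypothetical and are not taken from the manuscript or from Relate's output format.

```python
def bin_weights(lower, upper, edges):
    """Fraction of a Uniform(lower, upper) allele age falling in each time bin.

    lower, upper -- lower and upper bounds of the allele age (e.g. in generations)
    edges        -- increasing bin boundaries; bin i spans edges[i]..edges[i + 1]
    Returns one weight per bin; weights sum to 1 when the bins cover the interval.
    """
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    width = upper - lower
    weights = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Length of the overlap between the age interval and this bin.
        overlap = max(0.0, min(hi, upper) - max(lo, lower))
        weights.append(overlap / width)
    return weights


if __name__ == "__main__":
    # A mutation dated to between 100 and 300 generations ago,
    # split over two hypothetical bins of 0-200 and 200-400 generations.
    print(bin_weights(100, 300, [0, 200, 400]))
```

      A peaked prior such as a beta distribution would replace the overlap length with the difference of the distribution's cumulative function at the bin boundaries; the uniform case is simply the version that needs no extra shape parameters.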

      I would also argue that the statement on line 104 about Relate's reliability is not yet supported by data. There is certainly value in using Relate ages to investigate mutation spectrum change over time and compare this to what has been seen using allele frequencies, but I don't think we know enough yet to say that the Relate ages are definitely more reliable. Relate's estimates might be biased by the same processes like selection and demography that make allele frequencies challenging to interpret. The paper's statements about the limitations of allele frequencies are fair, but there is always a tradeoff between the clear drawbacks of simple summary statistics and the more cryptic possible blind spots of complicated "black box" algorithms (in the case of Relate, an MCMC that needs to converge properly). DeWitt, et al. 2021 noted that the demographic history inferred by Relate doesn't accurately predict the underlying data's site frequency spectrum, indicating that the associated allele ages might have some problems that need to be better characterized. While testing Relate for biases is beyond the scope of this work, the introduction should acknowledge that the accuracy and precision of its time estimates are still somewhat uncertain.

      We agree with the reviewer and have now added a paragraph in the Discussion highlighting some issues of Relate regarding mutation age estimation and ancestral allele polarization.

      The paper's results on C>T mutations in Europeans versus Africans are a nice confirmation of previous results, including the observation from Mathieson & Reich that neither SBS7 nor SBS11 is a good match for the mutational signature at play. More novel is the ancient mutational signature enriched in Africa and the interrogation of the ability of parental age to explain the observed patterns. I just have a few minor suggestions regarding these analyses:

      1) I like the idea of using maternal age C>G hotspots to test the plausibility of the maternal age as an explanatory factor, but I think this would be more convincing with the addition of a power analysis. Given two populations that have average maternal ages of 20 and 40, and the same population sample sizes available from 1000 Genomes, can the authors calculate whether the results they'd predict are any different from what is observed (i.e. no significant differences within the maternal hotspots and significant differences outside of these regions)?

      We thank the reviewer for this suggestion. We performed simulations to estimate the power of observing significant inter-population differences within and outside the maternal C>G mutation hotspots, under the assumption that all differences in the mutation spectrum between the two populations are related to the parental age (i.e., generation time). We found that, because of the extraordinarily strong maternal age effects in the maternal mutation hotspots, the power for detecting variation in C>G/T>A ratio due to change in generation time is much greater within maternal hotspots than outside, despite the smaller total size of the maternal hotspot regions (and hence fewer SNPs; Figure 3 – figure supplement 4). For example, even with an age difference of five years, there is nearly 100% power to detect significant differences in the maternal hotspots, compared to <12% for regions outside the maternal hotspots. In other words, if inter-population differences in the mutation spectrum are driven by differences in maternal age across populations, we should have enough power to observe a signal in the maternal hotspot regions alone, the lack of which (Figure 2C) strongly suggests that maternal age is not driving these signals.
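
      The general logic of such a power analysis can be sketched with a small Monte Carlo simulation. The code below is not the authors' simulation: the function names, the proportions, the number of sites, and the use of a simple chi-square test on a 2x2 table are all assumptions chosen for illustration. It draws C>G versus T>A counts for two populations under assumed true proportions and reports how often the difference is detected at a given significance level.

```python
import math
import random


def chi2_pvalue_2x2(a, b, c, d):
    """P-value of a chi-square test (1 d.f., no continuity correction)
    on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def estimate_power(p1, p2, n_sites, n_sims=200, alpha=0.05, seed=0):
    """Fraction of simulated datasets in which the two populations differ
    significantly in their proportion of C>G among (C>G + T>A) sites.

    p1, p2  -- assumed true C>G proportions in populations 1 and 2
    n_sites -- number of informative sites sampled per population
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        k1 = sum(rng.random() < p1 for _ in range(n_sites))
        k2 = sum(rng.random() < p2 for _ in range(n_sites))
        if chi2_pvalue_2x2(k1, n_sites - k1, k2, n_sites - k2) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Hypothetical numbers: a strong effect in a small region versus
    # a weak effect in a region with ten times as many sites.
    print("small region, strong effect:", estimate_power(0.40, 0.55, 500))
    print("large region, weak effect:  ", estimate_power(0.40, 0.41, 5000))
```

      With settings like these, a region with few sites but a strong age effect can yield higher power than a much larger region with a weak effect, which is the qualitative pattern described in the response.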

      2) Is it possible that the T>C/T>G ratio is elevated in all variants above a certain age but shows up as an African-specific signal because the African population retains more segregating variation in this age range, whereas non-African populations have fixed or lost more of this variation? Since Durvasula & Sankararaman identified putative tracts of super-archaic introgression within Africans, is it possible to test whether the mutation spectrum signal is enriched within those tracts?

      The observation that the T>C / T>G signal is driven by TpG>CpG mutations (which might be mis-polarized CpG transitions) casts doubt on the signal. Given the unresolved technical issue, we have now removed any discussion of the biological explanations behind the signal and instead focus on describing the challenges with ancestral allele polarization under context-dependent mutation rate variation.

      3) Although Coll Macià, et al. argued that generation time is capable of explaining all mutation spectrum differences between populations, including the excess of TCC>TTC in Europeans, Wang et al. argue something slightly different. They exclude TCC>TTC and the other major components of the European signature from their analysis and then argue that parental age can explain the rest of the differences between populations. I think the analysis in this paper convincingly refutes the Coll Macià, et al. argument, but refuting the Wang, et al. version would require excluding the same mutation types that are excluded in that paper.

      Although we did not present an analysis that explicitly excludes TCC>TTC mutations, our analysis still shows that generation time alone cannot explain the remaining variations in the mutation spectrum observed (Figure 4). Specifically, the temporal trend of T>C/T>G ratio would suggest a decreasing generation time of Europeans with time, whereas the C>G/T>A ratio suggests the opposite. In addition, the power analysis for C>G maternal hotspots (suggested by the reviewer) further supports that the inter-population differences observed cannot be entirely driven by differences in parental ages. These observations, which do not involve TCC>TTC mutations, strongly suggest that generation time is not the sole or primary driver of differences in mutation spectrum across populations. Further, our analysis shows that several technical issues and biological processes, in addition to changes in life history traits, can lead to changes in the mutation spectrum of polymorphisms. Therefore, inferring generation time using changes in mutation spectrum is not as straightforward as Wang et al. proposed, because generation time is not the only or dominant factor impacting mutation spectrum.

    1. Author Response

      Reviewer #2 (Public Review):

      This study identifies the neural circuits inhibited by activation of opioid receptors using complex experimental approaches such as electrophysiology, pharmacology, and optogenetics, combined with retrograde and anterograde tracing. The authors characterize two key regions of the brainstem, the preBötzinger Complex and the Kolliker-Fuse, and how these neuronal populations interact. Understanding the interactions of these circuits substantially increases our understanding of the neural circuits sensitive to opioid drugs, which is critical for understanding how opioids act on breathing and for potentially designing new therapies.

      Major strengths.

      This study maps the excitatory projections from the Kolliker-Fuse to the preBötzinger Complex and rostral ventral respiratory group and shows that these projections are inhibited by opioid drugs. These Kolliker-Fuse neurons express FoxP2, but not the calcitonin gene-related peptide, which distinguishes them from parabrachial neurons. In addition, the preBötzinger Complex is also hyperpolarized by opioid drugs. The experiments performed by the authors are challenging, complex, and the most appropriate types of approaches to understanding pre- and post-synaptic mechanisms, which cannot be studied in vivo. These experiments also used complex tracing methods using adeno-associated virus and cre-lox recombinase approaches.

      Limitations.

      (1) The roles of the mechanisms identified in this study have not been established in models recording opioid-induced respiratory depression or respiratory activity. This study does not record, modulate, or assess respiratory activity in-vitro or in-vivo, without or with opioid drugs such as fentanyl or morphine.

      (2) Experiments are performed in-vitro which do not mimic the effects of opioids observed in-vivo or in freely-moving animals. However, identification of pre- and post- synaptic mechanisms, as well as projections, cannot be performed in-vivo, so the authors use the right approaches for their experiments.

      We agree with both of these points. We hope this study lays the groundwork for future studies assessing the impact of these projections on respiratory activity in vitro and in vivo.

      (3) The types of neurons projecting from KF to the preBötzinger Complex or ventral respiratory group have not been identified. Although some of these cells are glutamatergic, optogenetic experiments could have been performed in other cre-expressing cell populations, such as neurons expressing neurokinin-1 receptors.

      There are indeed many different cell populations that could be interrogated. In addition to the optogenetic identification of glutamatergic projections, we identified immunohistochemically that at least some opioid receptor-expressing, medullary-projecting KF neurons express FoxP2, and not CGRP. Further dissection of other cell populations, such as Lmx1b and Phox2b, is an excellent future direction.

      Reviewer #3 (Public Review):

      This manuscript reveals that opioid suppression of breathing could occur via multiple mechanisms and at multiple sites in the pontomedullary respiratory network. The authors show that opioids inhibit an excitatory pontomedullary respiratory circuit via three mechanisms: 1) postsynaptic MOR-mediated hyperpolarization of KF neurons that project to the ventrolateral medulla, 2) presynaptic MOR-mediated inhibition of glutamate release from dorsolateral pontine terminals onto excitatory preBötC and rVRG neurons, and 3) postsynaptic MOR-mediated hyperpolarization of the preBötC and rVRG neurons that receive pontine glutamatergic input.

      This manuscript describes in detail a useful method for dissecting the relationship between the dorsolateral pons and the rostral medulla, which will be useful for various researchers. It's also great to see how many different methods have been applied to improve the accuracy of the results.

      1. Relationship between the dorsolateral pons and rostral ventrolateral medulla.

      The method used in this paper is well suited to showing a very precise relationship between the presence of opioid receptors and the dorsolateral pons and rostral ventrolateral medulla. For opioid receptors, based on the expression of Oprm1, genetically modified mice were used together with anterograde or retrograde viruses carrying additional fluorescent colors to show both anterograde and retrograde projections, revealing a relationship between the dorsolateral pons and rostral ventrolateral medulla.

      For example, to visualize dorsal pontine neurons expressing Oprm1, Oprm1Cre/Cre mice were crossed with Ai9tdTomato Cre reporter mice to generate Ai9tdT/+ Oprm1Cre/+ mice (Oprm1Cre/tdT mice) expressing tdTomato in neurons that also express MOR at any point during development. A retrograde virus encoding Cre-dependent expression of GFP (retrograde AAV-hSIN-DIO-eGFP) was injected into the respiratory center of Oprm1Cre/+ mice and into the ventral respiratory neuron group, showing that KF neurons expressing Oprm1 project to the respiration-related nucleus of the ventrolateral medulla.

      However, although the authors have also accounted for this, the virus may spread to locations other than the intended injection site. It is therefore important that the injection site was marked by co-injecting an anterograde virus encoding a different fluorescent color, mCherry, and that the extent of the injection was quantified, which is excellent as a control experiment.

      In addition, the respiratory center has recently been linked not only to the preBötC but also to the pFRG, so describing the relationship with the pFRG would be important for understanding the effects on the respiratory center and on the rhythm.

      Our injections centered in preBötC, rVRG or BötC did not spread extensively to slices containing 7N/pFRG (Figure 2C and Figure 2-supplement 1D, Bregma -6.0 to -6.4, shaded region labeled 7N).

    1. Author Response:

      eLife assessment

      This manuscript analyzes large-scale Neuropixels recordings from visual areas and hippocampus of mice passively viewing repeated clips of a movie and reports that neurons respond with elevated firing activities to specific, continuous sequences of movie frames. The important results support a role of rodent hippocampal neurons in general episode encoding and advance understanding of visual information processing across different brain regions. The strength of evidence for the primary conclusion is solid, but some technical limitations of the study were identified that merit further analyses.

      We thank the editors and reviewers for the assessment and reviews. We have provided clarifications and updated the manuscript to address the apparent technical limitations, which are perhaps due to some misunderstanding; please see below. We provide additional results that isolate the contributions of pupil diameter, sharp-wave ripples and theta power to show that movie tuning cannot be explained by these nonspecific effects. Nor are these mere time cells or other internally generated patterns, owing to the many differences highlighted below.

      Reviewer #1 (Public Review):

      Taking advantage of a publicly available dataset, neuronal responses in both the visual and hippocampal areas to passive presentation of a movie are analyzed in this manuscript. Since the visual responses have been described in a number of previous studies (e.g., see Refs. 11-13), the value of this manuscript lies mostly on the hippocampal responses, especially in the context of how hippocampal neurons encode episodic memories. Previous human studies show that hippocampal neurons display selective responses to short (5 s) video clips (e.g. see Gelbard-Sagiv et al, Science 322: 96-101, 2008). The hippocampal responses in head-fixed mice to a longer (30 s) movie as studied in this manuscript could potentially offer important evidence that the rodent hippocampus encodes visual episodes.

      We have now included citations to the Gelbard-Sagiv et al. Science 2008 paper and many other references; thank you for pointing that out. There are major differences between that study and ours.

      • The movies used in the previous study contained very familiar, famous people and famous events, and the experiment was about the patient’s ability to recall those famous movie episodes. In our case, the mice had seen this movie clip only twice before.

      • They did not look at the fine structure of neural responses below half a second, whereas we looked at the mega-scale representations from 30ms to 30s.

      • The movie clips in that study were in full color with audio; we used an isoluminant, black-and-white, silent movie clip.

      • Their movie clips contained humans and were observed by humans, whereas in our study mice observed a movie clip with humans and no mice or other animals.

      The analysis strategy is mostly well designed and executed. A number of factors and controls, including baseline firing, locomotion, frame-to-frame visual content variation, are carefully considered. The inclusion of neuronal responses to scrambled movie frames in the analysis is a powerful method to reveal the modulation of a key element in episodic events, temporal continuity, on the hippocampal activity. The properties of movie fields are comprehensively characterized in the manuscript.

      Thank you.

      Although the hippocampal movie fields appear to be weaker than the visual ones (Fig. 2g, Ext. Fig. 6b), the existence of consistent hippocampal responses to movie frames is supported by the data shown. Interestingly, in my opinion, a strong piece of evidence for this is a "negative" result presented in Ext. Fig. 13c, which shows higher than chance-level correlations in hippocampal responses to same scrambled frames between even and odd trials (and higher than correlations with neighboring scrambled frames). The conclusion that hippocampal movie fields depend on continuous movie frames, rather than a pure visual response to visual contents in individual frames, is supported to some degree by their changed properties after the frame scrambling (Fig. 4).

      Yes, hippocampal selectivity is not entirely abolished with scrambled movie, as we show in several figures (Fig 4d,g and Extended Data Fig. 16), but it is greatly reduced, far more than in the afferent visual cortices. The fraction of tuned cells for scrambled movies dropped to 4.5% in hippocampus, which is close to the chance level of 3%. In contrast, in visual areas selectivity was still above 80%.

      Significant overlap between even and odd trials is to be expected for the tuned cells. Without a significant overlap, i.e. a stable representation, they would not be tuned. Despite this, the correlation between even and odd trials for the tuned cells in the hippocampus (only 4.5% of cells) was more than 2-fold smaller than that for the tuned cells in visual cortices (more than 80% of cells). This strongly supports our hypothesis that, unlike visual cortices, hippocampal subfields depended very strongly on the continuity of visual information. We will clarify this in the main text.

      However, there are two potential issues that could complicate this main conclusion.

      One issue is related to the effect of behavioral variation or brain state. First, although the authors show that the movie fields are still present during low-speed stationary periods, there is a large drop in the movie tuning score (Z), especially in the hippocampal areas, as shown in Ext. Fig. 3b (compared to Ext. Fig. 2d). This result suggests a potentially significant enhancement by active behavior.

      There seems to be some misunderstanding here. There was no major reduction in movie tuning during immobility or active running. As we wrote in the manuscript, the drop in selectivity during purely immobile epochs is because of a reduction in the amount of data, not a reduction in selectivity per se. Specifically, as the amount of data reduces, the statistical strength of tuning (z-scored sparsity) reduces. For example, if we split the total of 60 trials' worth of data into two parts, the amount of data reduces to about half in each part, leading to a seeming reduction in selectivity in both halves. Extended figure 2B shows nearly identical tuning in all brain regions during immobility and in equivalent subsamples chosen randomly from the entire data, including mobility and immobility. We will include additional data in the revised manuscript to demonstrate this more clearly. Please see below for more details.
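
      The dependence of z-scored sparsity on the amount of data can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the analysis used in the manuscript: the sparsity formula (1 - <r>^2 / <r^2>), the circular-shift null distribution, the simulated cell and all function names are hypothetical choices made for illustration.

```python
import random
import statistics

def sparsity(rates):
    """Assumed form: 1 - <r>^2 / <r^2>. 0 for a flat profile, near 1 for one sharp peak."""
    n = len(rates)
    mean_sq = sum(r * r for r in rates) / n
    if mean_sq == 0:
        return 0.0
    mean = sum(rates) / n
    return 1.0 - mean * mean / mean_sq

def trial_average(trials):
    """Average spike counts across trials, bin by bin."""
    n_trials = len(trials)
    return [sum(col) / n_trials for col in zip(*trials)]

def z_scored_sparsity(trials, n_shuffles=200, rng=random):
    """Z-score the observed sparsity against a null built by circularly shifting each trial."""
    observed = sparsity(trial_average(trials))
    null = []
    for _ in range(n_shuffles):
        shifted = []
        for trial in trials:
            k = rng.randrange(len(trial))
            shifted.append(trial[k:] + trial[:k])
        null.append(sparsity(trial_average(shifted)))
    sd = statistics.pstdev(null)
    return (observed - statistics.mean(null)) / sd if sd > 0 else 0.0

def simulate_trials(n_trials, n_bins=60, rng=random):
    """Hypothetical weakly tuned cell: spike probability 0.10 per bin, 0.20 in bins 20-29."""
    return [
        [1 if rng.random() < (0.20 if 20 <= b < 30 else 0.10) else 0
         for b in range(n_bins)]
        for _ in range(n_trials)
    ]

rng = random.Random(0)
all_trials = simulate_trials(60, rng=rng)
z_full = z_scored_sparsity(all_trials, rng=rng)       # all 60 trials
z_half = z_scored_sparsity(all_trials[:30], rng=rng)  # same cell, half the data
print(f"z (60 trials) = {z_full:.2f}, z (30 trials) = {z_half:.2f}")
```

      The underlying tuning of the simulated cell is identical in both calls; only the number of trials differs. Because trial averaging shrinks the noise in the shuffled profiles, the z-score is expected to be larger with more trials, although the exact values depend on the random seed.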

      Second, a general, hard-to-tackle concern is that neuronal responses could be greatly affected by changes in arousal or brain state (including drowsy or occasional brief slow-wave sleep state) in head-fixed animals without a task. Without the analysis of pupil size or local field potentials (LFPs), the arousal states during the experiment are difficult to know.

      In the revised manuscript we will show that the behavioral state effects cannot explain movie tuning. Specifically:

      • a. We compare sessions in which the mouse was mostly immobile versus sessions in which the mouse was mostly running. Movie tuned cells were found in both these cases (Extended Data Fig. 7).

      • b. We detect and remove all data around sharp-wave ripples (SWR). Movie tuning was unchanged in the remaining data.

      • c. As a further control, we quantified arousal by two standard metrics. First, within a session, we split the data into two groups: segments with high theta power and segments with low theta power. Significant movie tuning persisted in both.

      • d. Finally, pupil dilation is another common method to estimate arousal, so data within a session were split into two parts: those with pupil dilation versus constriction. Movie tuning remained significant in both parts. See the new Extended Data Fig. 7.

      Many example movie fields in the presented raw data (e.g., Fig. 1c, Ext. Fig. 4) are broad with low-quality tuning, which could be due to broad changes in brain states. This concern is especially important for hippocampal responses, since the hippocampus can enter an offline mode indicated by the occurrence of LFP sharp-wave ripples (SWRs) while animals simply stay immobile. It is believed that the ripple-associated hippocampal activity is driven mainly by internal processing, not a direct response to external input (e.g., Foster and Wilson, Nature 440: 680, 2006). The "actual" hippocampal movie fields during a true active hippocampal network state, after the removal of SWR time periods, could have different quantifications that impact the main conclusion in the manuscript.

      We included the broadly tuned hippocampal neurons to demonstrate the movie-field broadening compared to those in visual areas. We will include more examples with sharp movie fields in the hippocampal regions (Main figure 1a-d right column, 2d and h, Extended Data Fig 5 and 8). Further, as stated above, we detected sharp-wave ripples and removed one second of data around each SWR. Movie tuning was unchanged in the remaining data. Thus, movie tuning is not generated internally via SWR (Extended Data Fig. 6). See also Extended Data Fig. 7 and 8 and the response above.

      Another issue is related to the relative contribution of direct visual response versus the response to temporal continuity in movie fields. First, the data in Ext. Fig. 8 show that rapid frame-to-frame changes in visual contents contribute largely to hippocampal movie fields (similarly to visual movie fields).

      There seems to be some misunderstanding here. That figure showed that the frame-to-frame changes in the visual content had the highest effect on MSUA in visual areas and a much weaker effect in the hippocampus (Extended Data Fig. 8, as per previous version). For example, the depth of modulation (max – min) / (max + min) for MSUA was 21% and 24% for V1 but below 6% for hippocampal regions. Similarly, the MSUA was more strongly (negatively) correlated with F2F correlation for visual areas (r=0.48 to 0.56) than for hippocampal areas (0.07 to 0.3). Similarly, comparing the number of peaks or their median widths, visual regions showed stronger correlation with F2F, and larger depth of modulation, than hippocampal regions, barring a handful of exceptions (like the CA3 correlation between F2F and median peak duration). This strongly supports our claim that visual regions generated a far greater response to the frame-to-frame changes in the movie than hippocampal regions.
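
      For concreteness, the depth-of-modulation measure quoted above, (max – min) / (max + min), can be written as a short function. The rate values below are hypothetical numbers chosen only to illustrate the arithmetic; they are not data from the study.

```python
def depth_of_modulation(rates):
    """(max - min) / (max + min) of a non-negative response profile."""
    hi, lo = max(rates), min(rates)
    if hi + lo == 0:
        return 0.0  # silent profile: define the modulation as zero
    return (hi - lo) / (hi + lo)

# Hypothetical multiunit rates (Hz) across movie segments, for illustration only.
strongly_modulated = [8.0, 10.0, 12.0, 9.0]
weakly_modulated = [9.5, 10.0, 10.5, 10.0]

print(depth_of_modulation(strongly_modulated))  # (12 - 8) / (12 + 8) = 0.2
print(depth_of_modulation(weakly_modulated))    # (10.5 - 9.5) / (10.5 + 9.5) = 0.05
```

      Both profiles have the same mean rate, so the measure isolates how strongly the response swings around that mean.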

      Interestingly, the data show that movie-field responses are correlated across all brain areas including the hippocampal ones.

      The changes in multiunit activity are strongly correlated only between visual areas and some of the hippocampal region pairs. The correlation is much weaker for hippocampal areas, or hippocampal-visual area pairs. This will be quantified explicitly in the revised text (Extended Data Fig. 11) with an additional correlation matrix. Further, in Fig 3c we compared the MSUA responses with normalization between brain regions. Amongst the 21 possible brain region pairs, 5 were uncorrelated, 7 were significantly negatively correlated and 9 were significantly positively correlated.

      This could be due to heightened behavioral arousal caused by the changing frames as mentioned above, or due to enhanced neuronal responses to visual transients, which supports a component of direct visual response in hippocampal movie fields.

      As shown in Extended Data Fig. 7 and 8 and described above, the effect of arousal as quantified by theta power or pupil diameter cannot explain the results in hippocampal areas, and the correlations in multiunit responses are unrelated across many brain areas.

      Second, the data in Ext. Fig. 13c show a significant correlation in hippocampal responses to same scrambled frames between even and odd trials, which also suggests a significant component of direct visual response.

      This is plausible. The fraction of hippocampal cells which were significantly tuned for the scrambled presentation (4.5%) was close to chance level (3%), and this small subset of cells was used to compute the population overlap between even and odd trials in Ext Fig. 13 (old numbering). As described above, this significant but small amount of tuning could generate significant population overlap, which is to be expected by construction.

      Is there a significant component purely due to the temporal continuity of movie frames in hippocampal movie fields? To support that this is indeed the case, the authors have presented data that hippocampal movie fields largely disappear after movie frames are scrambled. However, this could be caused by the movie-field detection method (it is unclear whether single-frame field could be detected).

      As described in the methods section, the movie-field detection algorithm had a resolution of 3.3ms, which ensured that we could detect single-frame fields. As reported, we did find such short movie fields in several cells in the visual areas. The sparsity metric used is agnostic to the ordering of the responses, and hence single-frame fields, and the resultant significant movie tuning, if present, can be detected by our methods.
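
      The statement that sparsity is agnostic to the ordering of responses can be checked directly. The sketch below assumes sparsity has the form 1 - <r>^2 / <r^2> and uses a hypothetical 900-bin response with a single-bin peak; neither the formula nor the numbers are taken from the manuscript.

```python
import math
import random

def sparsity(rates):
    """Assumed form: 1 - <r>^2 / <r^2>, computed over all bins of the response."""
    n = len(rates)
    mean_sq = sum(r * r for r in rates) / n
    if mean_sq == 0:
        return 0.0
    mean = sum(rates) / n
    return 1.0 - mean * mean / mean_sq

# Hypothetical response: flat baseline with one single-bin peak.
rates = [1.0] * 900
rates[417] = 25.0

# Randomly reorder the bins, as scrambling the frames would reorder the responses.
rng = random.Random(1)
scrambled = rates[:]
rng.shuffle(scrambled)

print(math.isclose(sparsity(rates), sparsity(scrambled)))  # ordering does not matter
print(sparsity(rates) > sparsity([1.0] * 900))             # a single-bin peak raises sparsity
```

      Because the measure depends only on the distribution of rates across bins, a field confined to one bin contributes the same sparsity wherever it falls in the sequence.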

      Another concern in the analysis is that movie-fields are not analyzed on re-arranged neural responses to scrambled movie frames. The raw data in Fig. 4e seem quite convincing. Unfortunately, the quantifications of movie fields in this case are not compared to those with the original movie.

      We saw very few (3.6-4.9%) cells with significant movie tuning for scrambled presentation in the hippocampus. Hence, we did not quantify this earlier. This is now provided in new Extended Data Fig. 16. The amount of movie tuning for the scrambled presentation taken as-is, or after rearranging the frames is below 5% for all hippocampal brain regions.

      Reviewer #2 (Public Review):

      […] The authors have concluded that the neurons in the thalamo-cortical visual areas and the hippocampus commonly encode continuous visual stimuli with their firing fields spanning the mega-scale, but they respond to different aspects of the visual stimuli (i.e., visual contents of the image versus a sequence of the images). The conclusion of the study is fairly supported by the data, but some remaining concerns should be addressed.

      1) Care should be taken in interpreting the results since the animal's behavior was not controlled during the physiological recording.

      This was done intentionally since plenty of research shows that task demand (e.g., Aronov and Tank, Nature 2017) can not only modulate hippocampal responses but also dramatically alter them. We have now provided additional figures (Extended Data Fig. 6 and 7) where we quantified the effects of the behavioral states (sharp wave ripples, theta power and pupil diameter), as well as the effect of locomotion (Extended Data Fig. 4). Movie tuning remained unaffected with these manipulations. Thus, movie tuning cannot be attributed to behavioral effects.

      It has been reported that some hippocampal neuronal activities are modulated by locomotion, which may still contribute to some of the results in the current study. Although the authors claimed that the animal's locomotion did not influence the movie-tuning by showing the unaltered proportion of movie-tuned cells with stationary epochs only, the effects of locomotion should be tested in a more specific way (e.g., comparing changes in the strength of movie-tuning under certain locomotion conditions at the single-cell level).

      Single cell analysis of the effect of locomotion and visual stimulation is underway, and beyond the scope of the current work. As detailed in Extended Data Fig. 4, we have ensured that movie tuning persists in spite of the removal of running or stationary epochs, as well as the removal of sharp wave ripple events (Extended Data Fig. 6). Further, we will provide examples of strongly tuned cells from sessions with predominantly running or predominantly stationary behavior (Extended Data Fig. 7).

      2) The mega-scale spanning of movie-fields needs to be further examined with a more controlled stimulus for reasonable comparison with the traditional place fields. This is because the movie used in the current study consists of a fast-changing first half and a slow-changing second half, and such varying and ununified composition of the movie might have largely affected the formation of movie-fields. According to Fig. 3, the mega-scale spanning appears to be driven by the changes in frame-to-frame correlation within the movie. That is, visual stimuli changing quickly induced several short fields while persisting stimuli with fewer changes elongated the fields.

      Please note that the speed at which the movie scene changed across frames was strongly correlated with movie-field width in the visual areas, but that correlation was much weaker in the hippocampal areas (see above). Please see Extended Data Fig. 11 and the quantification of correlation between frame-to-frame changes in the movie and the properties of movie fields.

      The presentation of persisting visual input for a long time is thought to be similar to staying in one place for a long time, and the hippocampal activities have been reported to manifest in different ways between running and standing still (i.e., theta-modulated vs. sharp wave ripple-based). Therefore, it should be further examined whether the broad movie-fields are broadly tuned to the continuous visual inputs or caused by other brain states.

      As shown in Extended Data Fig. 6, movie field properties are largely unchanged when SWR are removed from the data, or when the effects of pupil diameter or theta power are factored out (Extended Data Fig. 7).

      3) The population activities of the hippocampal movie-tuned cells in Fig. 3a-b look like those of time cells, tiling the movie playback period. It needs to be clarified whether the hippocampal cells are actively coding the visual inputs or just filling the duration.

      Tiling patterns would be observed when the maxima are sorted in any data, even for random numbers. This alone does not make them time cells. The following observations suggest that movie fields cannot be explained as being time cells.

      • a. Time cells mostly cluster at the beginning of a running epoch (Pastalkova et al. Science 2008, MacDonald et al. Neuron 2011) and they taper off towards the end. Such large clustering is not visible in these tiling plots for movie tuned cells.

      • b. Time fields become wider as the encoded temporal duration increases (Pastalkova et al. Science 2008, MacDonald et al. Neuron 2011). This is not evident in any movie fields.

      • c. Widths of movie fields in visual areas, and to a smaller extent in the hippocampal areas, were clearly modulated by the visual content, like the change from one frame to the next (F2F correlation, Extended Data Fig. 11).

      • d. A tiling pattern of movie fields was found in visual areas too, qualitatively similar to that in the hippocampus. Clearly, visual area responses are not time cells, as shown by the scrambled stimulus experiment. Here, neural selectivity could be recovered by rearranging the responses based on the visual content of the continuous movie, and not the passage of time.
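
      The point that sorting by maxima produces a tiling pattern even for random numbers is easy to reproduce. The following is a minimal sketch with hypothetical dimensions (50 "cells", 30 bins) and pure noise as input; it illustrates the sorting artifact only and is not a reanalysis of the data.

```python
import random

rng = random.Random(2)
n_cells, n_bins = 50, 30

# Pure noise: by construction, no "cell" has any real tuning.
responses = [[rng.random() for _ in range(n_bins)] for _ in range(n_cells)]

def peak_bin(row):
    """Index of the bin with the largest response."""
    return max(range(len(row)), key=row.__getitem__)

# Sort cells by the location of their maximum, as done for tiling plots.
sorted_rows = sorted(responses, key=peak_bin)
peaks = [peak_bin(row) for row in sorted_rows]

# Crude text rendering of every fifth cell: '#' marks the peak bin.
for row in sorted_rows[::5]:
    p = peak_bin(row)
    print("." * p + "#" + "." * (n_bins - p - 1))
```

      The printed peaks march from left to right purely as a consequence of the sort, which is why a tiling plot on its own cannot distinguish time cells from any other sorted responses.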

      The scrambled condition in which the sequence of the images was randomly permutated made the hippocampal neurons totally lose their selective responses, failing to reconstruct the neural responses to the original sequence by rearrangement of the scrambled sequence. This result indirectly addressed that the substantial portion of the hippocampal cells did not just fill the duration but represented the contents and temporal order of the images. However, it should be directly confirmed whether the tiling pattern disappeared with the population activities in the scrambled condition (as shown in Extended Data Fig. 11, but data were not shown for the hippocampus).

      As stated above for the continuous movie, a tiling pattern alone does not mean that those are time cells. Further, tuning and the tiling pattern remained intact with the scrambled movie in the visual cortices but not in the hippocampus.

      Reviewer #3 (Public Review):

      […] The paper is conceptually novel since it specifically aims to remove any behavioral or task engagement whatsoever in the head-fixed mice, a setup typically used as an open-loop control condition in virtual reality-based navigational or decision making tasks (e.g. Harvey et al., 2012). Because the study specifically addresses this aspect of encoding (i.e. exploring effects of pure visual content rather than something task-related), and because of the widespread use of video-based virtual reality paradigms in different sub-fields, the paper should be of interest to those studying visual processing as well as those studying visual and spatial coding in the hippocampal system. However, the task-free approach of the experiments (including closely controlling for movement-related effects) presents a Catch-22, since there is no way that the animal subjects can report actually recognizing or remembering any of the visual content we are to believe they do.

      Our claim is that these are movie scene evoked responses. We make no claims about the animal’s ability to recognize or remember the movie content. That would require an entirely different set of experiments. Meanwhile, we have shown that these results are not an artifact of brain states such as sharp wave ripples, theta power or pupil diameter (Extended Data Fig. 6 and 7) or running behavior (Extended Data Fig. 4). Please see above for a detailed response.

      We must rely on above-chance-level decoding of movie segments, and the requirement that the movie is played in order rather than scrambled, to indicate that the hippocampal system encodes episodic content of the movie. So the study represents an interesting conceptual advance, and the analyses appear solid and support the conclusion, but there are methodological limitations.

      It is important to emphasize that these responses could constitute episodic responses but do not prove episodic memory, just as place cell responses constitute spatial responses but do not prove spatial memory. The link between place cells and place memory is not entirely clear. For example, mice lacking NMDA receptors have intact place cells, but are impaired in a spatial memory task (McHugh et al. Cell 1996), whereas spatial tuning was virtually destroyed in mice lacking GluR1 receptors, but they could still do various spatial memory tasks (Resnik et al. J. Neuro 2012). Testing episodic memory would require an entirely different set of experiments that involve task demand and behavioral response, which in turn would modify hippocampal responses substantially, as shown by many studies. Our hypothesis here is that, just like place cells, these episodic responses without task demand would play a role, to be determined, in episodic memory. We will emphasize this point in the main text (Ln 432-436 in the revised manuscript).

      Major concerns:

      1) A lot hinges on the cells having a z-scored sparsity >2, the cutoff for a cell to be counted as significantly modulated by the movie. What is the justification of this criterion?

      The z-scored sparsity (z>2) corresponds to p<0.03. This would mean that 3% of the results could appear by chance. Hence, z>2 is a standard method used in many publications. Another advantage of z-scored sparsity is that it is relatively insensitive to the number of spikes generated by a neuron (i.e. the mean firing rate of the neuron and the duration of the experiment). In contrast, sparsity is strongly dependent on the number of spikes which makes it difficult to compare across neurons, brain regions and conditions (See Supplement S5 Acharya et al. Cell 2016). To further address this point, we compared our z-scored sparsity measure with 2 other commonly used metrics to quantify neural selectivity, depth of modulation and mutual information (Extended Data Fig. 3). Comparable movie tuning was obtained from all 3 metrics, upon z-scoring in an identical fashion.
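
      The correspondence between z>2 and p<0.03 can be verified with the standard normal tail. This sketch assumes that the null distribution of sparsity is approximately Gaussian and that the test is one-sided (only larger-than-chance sparsity counts); both are assumptions made for illustration, and the function name is hypothetical.

```python
import math

def one_sided_p(z):
    """Upper-tail probability of a standard normal variable exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p = one_sided_p(2.0)
print(f"{p:.4f}")  # 0.0228, i.e. below the quoted 3% chance level
```

      Under these assumptions, roughly 2.3% of untuned cells would cross the z>2 threshold by chance, consistent with the stated upper bound of 3%.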

      It should be stated in the Results. Relatedly, it appears the formula used for calculating sparseness in the present study is not the same as that used to calculate lifetime sparseness in de Vries et al. 2020 quoted in the results (see the formula in the Methods of the de Vries 2020 paper immediately under the sentence: "Lifetime sparseness was computed using the definition in Vinje and Gallant").

      The definition of sparsity we used is commonly used by most hippocampal scientists (Treves and Rolls 1991, Skaggs et al. 1996, Ravassard et al. 2013). The lifetime sparseness equation used by de Vries et al. 2020 differs from ours by just one constant factor (1-1/N), where N=900 is the number of frames in the movie. This constant factor equals (1-1/900)=0.999. Hence, there is virtually no difference between the sparsity obtained by these two methods. Further, z-scored sparsity is entirely unaffected by such constant factors. We will clarify this in the methods of the revised manuscript.
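
      The invariance of the z-score to a constant factor follows because the observed value, the null mean and the null standard deviation are all scaled identically. The sketch below uses hypothetical sparsity values and assumes that lifetime sparseness equals sparsity divided by (1-1/N); the direction of the scaling is an assumption, and the invariance holds either way.

```python
import math
import statistics

N = 900                      # number of movie frames
factor = 1.0 - 1.0 / N       # constant relating the two definitions
print(round(factor, 4))      # 0.9989

def z_score(observed, null):
    """Z-score of an observed value against a null distribution."""
    return (observed - statistics.mean(null)) / statistics.pstdev(null)

# Hypothetical sparsity of one cell and of its shuffled controls.
observed = 0.30
null = [0.10, 0.12, 0.09, 0.11, 0.13]

z_sparsity = z_score(observed, null)
# Rescale every value by the same constant, as the alternative definition would.
z_lifetime = z_score(observed / factor, [x / factor for x in null])

print(math.isclose(z_sparsity, z_lifetime, rel_tol=1e-9))
```

      Any cell classified as tuned at z>2 under one definition is therefore classified identically under the other.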

      To rule out systematic differences between studies beyond differences in neural sampling (single units vs. calcium imaging), it would be nice to see whether calculating lifetime sparseness per de Vries et al. changed the fraction of "movie" cells in the visual and hippocampal systems.

      As stated above, the two definitions of sparsity are virtually identical and we obtained similar results using two other commonly used metrics, which are detailed in Extended Data Fig. 3.

      2) In Figures 1, 2 and the supplementary figures, the sparseness scores should be reported along with the raw data for each cell, so the readers can be apprised of what types of firing selectivity are associated with which sparseness scores, as would be shown for metrics like gridness or Rayleigh vector lengths for head direction cells. It would be helpful to include this wherever there are plots showing spike rasters arranged by frame number & the trial-averaged mean rate.

      As shown in several papers (Aghajan et al Nature Neuroscience 2015, Acharya et al., Cell 2016), raw sparsity (or information content) is strongly dependent on the number of spikes of a neuron. This makes the raw values impossible to compare across cells, brain regions and conditions. (Please see Supplement S5 from Acharya et al., Cell 2016 for details). Including the raw sparsity values would thus cause undue confusion. Hence, we provide z-scored sparsity. This metric is comparable across cells and brain regions, and is now provided above each example cell in Figure 1 and Extended Data Fig. 2.

      3) The examples shown on the right in Figures 1b and c are not especially compelling examples of movie-specific tuning; it would be helpful in making the case for "movie" cells if cleaner / more robust cells are shown (like the examples on the left in 1b and c).

      We did not put the most strongly tuned hippocampal neurons in the main figures so that these cells are representative of the ensemble and not the best possible ones, so as to include examples with broad tuning responses. We have clarified in the legend that these cells are some of the best tuned cells. Although not the cleanest looking, the z-scored sparsity mentioned above the panels now indicates how strongly they are modulated compared to chance levels. Additional examples, including those with sharply tuned responses are shown in Extended Data Fig. 5 and 8.

      4) The scrambled movie condition is an essential control which, along with the stability checks in Supplementary Figure 7, provides the most persuasive evidence that the movie fields reflect more than a passive readout of visual images on a screen. However, in reference to Figure 4c, can the authors offer an explanation as to why V1 is substantially less affected by the movie scrambling than its main input (LGN) and the cortical areas immediately downstream of it? This seems to defy the interpretation that "movie coding" follows the visual processing hierarchy.

      This is an important point, one that we find very surprising as well. Perhaps this is related to other surprising observations in our manuscript, such as more neurons appeared to be tuned to the movie than the classic stimuli. A direct comparison between movie responses versus fixed images is not possible at this point due to several additional differences such as the duration of image presentations and their temporal history. The latency required to rearrange the scrambled responses (60ms for LGN, 74ms for V1, 91ms for AM/PM) supports the anatomical hierarchy. The pattern of movie tuning properties was also broadly consistent between V1 and AM/PM (Fig 2). However, all metrics of movie selectivity (Fig 2) to the continuous movie showed a consistent pattern that was the exact opposite pattern of the simple anatomical hierarchy: V1 had stronger movie tuning, higher number of movie fields per cell, narrower movie-field widths, larger mega-scale structure, and better decoding than LGN. V1 was also more robust to the scrambled sequence than LGN. One possible explanation is that there are other sources of inputs to V1, beyond LGN, that contribute significantly to movie tuning. This is an important insight and we will modify the discussion to highlight this.

      Relatedly, the hippocampal data do not quite fit with visual hierarchical ordering either, with CA3 being less sensitive to scrambling than DG. Since the data (especially in V1) seem to defy hierarchical visual processing, why not drop that interpretation? It is not particularly convincing as is.

      The anatomical organization is well established and an important factor. Even when observations do not fit the anatomical hierarchy, it provides important insights about the mechanisms. All properties of movie tuning (Fig 2), namely the strength of tuning, number of movie peaks, their width and decoding accuracy, firmly put visual areas upstream of hippocampal regions. But, just as in the visual cortex, there are consistent patterns that do not support a simple feed-forward anatomical hierarchy. We have pointed out these patterns so that future work can build upon them.

      5) In the Discussion, the authors argue that the mice encode episodic content from the movie clip as a human or monkey would. This is supported by the (crucial) data from the scrambled movie condition, but is nevertheless difficult to prove empirically since the animals cannot give a behavioral report of recognition and, without some kind of reinforcement, why should a segment from a movie mean anything to a head-fixed, passively viewing mouse?

      We emphasize once again that our claim is about the nature of encoding of the movie across these neurons. We make no claims about whether this forms a memory or whether the mouse is able to recognize the content or remember it. Despite decades of research, similar claims are difficult to prove for place cells, with plenty of counter examples (See the points above). The important point here is that despite any cognitive component, we see remarkably tuned responses in these brain areas. Their role in cognition would take a lot more effort and is beyond the scope of the current work.

      Would the authors also argue that hippocampal cells would exhibit "song" fields if segments of a radio song-equally arbitrary for a mouse-were presented repeatedly? (reminiscent of the study by Aronov et al. 2017, but if sound were presented outside the context of a task). How can one distinguish between mere sequence coding vs. encoding of episodically meaningful content? One or a few sentences on this should be added in the Discussion.

      Aronov et al. (2017) found encoding of an audio sweep in hippocampus when the animals were doing a task (release the lever at a specific frequency to obtain a reward). However, without a task demand they found that hippocampal neurons did not encode the audio sequence beyond chance levels. This is at odds with our findings with the movie, where we see strong tuning in the absence of any task demand or reward. These results are consistent with, but go far beyond, our recent findings that hippocampal (CA1) neurons can encode the position and direction of motion of a revolving bar of light (Purandare et al. Nature 2022). Please see Ln 414-420 for related discussion.

      These responses are unlikely to be mere sequence responses, since the scrambled sequence was also a fixed sequence that was presented many times, and it elicited reliable responses in visual areas but not in hippocampus. Hence, we hypothesize that hippocampal areas encode temporally related information, i.e. episodic content. We will modify the discussion to address these points.

    1. Author Response:

      We thank the eLife editorial board and the reviewers for the assessment of our article. We look forward to thoroughly addressing their comments and concerns. We would like to correct one factual error in the consensus public review:

      “Importantly, the authors do not present evidence that value itself is stably encoded across days, despite the paper's title. The more conservative in its claims in the Discussion seems more appropriate: "these results demonstrate a lack of regional specialization in value coding and the stability of cue and lick [(not value)] codes in PFC."

      The imaging sessions in which we identify value coding cells were in fact performed on separate days: Experimental Days 6 and 7 (see Figure 1b), which is evidence of the stability of value coding across consecutive days. Days 6 and 7 correspond to the third day of Odor Set 1 and the third day of Odor Set 2, respectively, which is why we referred to them both as “Day 3” in the manuscript, and this may have led to the confusion about the temporal relationship between these sessions. We will clarify this terminology in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this well-written manuscript, Afshar et al demonstrated the significant transcriptional and proteomic differences between cultured human umbilical vein endothelial cells (HUVECs) and those freshly isolated from the cords. They showed that TGFbeta and BMP signaling target genes were enriched in cord cells compared to those in culture. Extracellular matrix (ECM) and cell cycle-related genes were also different between the two conditions. Because master regulators of EC shear stress response genes, KLF2 and KLF4, were downregulated in culture, the authors sought to restore the in vivo transcriptional profile with the application of shear stress in an orbital shaker and dextran-containing media for various time periods. They showed that after 48 hours of shear stress the transcriptional profile of sheared cells correlated with in vivo transcriptional profile more significantly than static cultures. They also showed, using single cell RNAseq, that EC-smooth muscle cell cocultures resulted in changes in TGFbeta and NOTCH signaling pathways and rescued 9% of the in vivo transcriptional signatures.

      This is an important study that was elegantly executed. The authors should also be commended for making their data public; thereby, creating a valuable resource for vascular biologists.

      We much appreciate the comments and thank the reviewer for the time and effort evaluating the study.

      Reviewer #2 (Public Review):

      The authors profiled the transcriptome and proteome of human umbilical vein endothelial cells freshly isolated from in vivo and compared that with the same cells exposed to in vitro culture under different conditions, including static culture, flow, and co-culture with smooth muscle cells. The experiments were properly designed and performed. The authors also provided a reasonable and sound interpretation of their findings. This study provides valuable insights into how the culturing conditions impact on gene expression, encouraging the field to select their in vitro work setting appropriately. Overall, the manuscript is well-written and easy to follow.

      Several notable strengths include:

      1. Parallel transcriptome- and proteome-wide profiling of endothelial cells enabling the unbiased interrogation of gene expression and a genome-wide view of the impact of in vitro culture on endothelial transcriptome.

      2. The innovative experimental design and comparisons were done with genetically identical ECs (from the same donors) in vivo and in vitro.

      3. The analyses were robust and provided novel information on flow-dependent and cell context-dependent gene regulation, with the native freshly isolated cells as a baseline.

      4. The donor samples used in this study were diverse including Asian, White, Black, Latino, and American Indian samples which reduce racial background bias.

      Some points that can strengthen the study:

      A clear description of experimental and analytical details (e.g. how the comparisons were made) and more in-depth interpretation and discussion of the results, e.g. the complete genes that are rescued by flow and co-culture and potential synergy of these factors.

      We thank the reviewer for highlighting the strengths and appreciate the comments on experimental and analytical details, which have now been addressed in this revised manuscript. Specifically, we now include a clearer description of the experimental and analytical details (e.g. how the comparisons were made), and we have expanded the discussion to cover the potential synergy of flow and co-culture, with additional comments on the genes rescued by these factors.

      Reviewer #3 (Public Review):

      Afshar et al. performed RNA-seq and LC-MS of in vivo and in vitro HUVECs to identify the role of culture conditions on gene expression. Given the widespread use of HUVECs to study EC biology, these findings are interesting and can help design better in vitro experiments. There have been previous papers that compared in vivo and in vitro HUVECs, however, the depth of sequencing and analysis in this manuscript identifies some novel effects which should be accounted for in future in vitro experiments using ECs.

      Strengths:

      1. Major findings of distinct pathways affected by cell culture are novel and interesting. The authors identify major effects on TGFb and ECM gene expression. They also corroborate previous findings of flow response pathways, namely KLF2/4 and Notch pathway regulation.

      2. Use of multiple genomic methods to profile effects of culture conditions. The LC-MS data showed a significant correlation with RNA-seq, however, the data were not as strong so not used for subsequent analyses.

      3. Use of scRNA-seq to show the dynamic effects of co-culture and shear stress on ECs is very novel. However, the heterogeneity in the EC populations is not discussed in this manuscript.

      We would like to thank the reviewer for the in-depth analysis of our study and for highlighting the novelty and strength of the data. Note that we included comments in relation to EC heterogeneity as part of the limitations of this study (in the Discussion).

      Weaknesses:

      1. The physiological relevance of these changes in gene expression is not demonstrated in the manuscript. The authors claim the significance of their data is to improve in vitro culture to better represent in vivo biology. Is this the case with orbital shear stress? Do they rescue some functional effects in ECs with long-term shear stress? An angiogenesis, barrier function, or migration assay for HUVECs exposed to different conditions would help answer this question. A similar assay for cells after EC-VSMC co-culture would validate the importance of these stimuli.

      The reviewer is correct: our manuscript did not expand into physiological readouts, and we have now clearly acknowledged this as part of the limitations of the study. Notably, there is already extensive literature on the effects of different types of flow on several physiological parameters. For example, others have shown that laminar shear stress (by orbital or other means) reduces proliferation and migration (PMID: 31831023; PMID: 22012789; PMID: 12857765; PMID: 21312062; PMID: 15886673; PMID: 17323381), reduces inflammation (PMID: 34747636; PMID: 32951280), and improves barrier function (PMID: 20543206; PMID: 32457386; PMID: 12577139; PMID: 27246807; PMID: 31500313).

      From the outset, our objective was to bring granularity to transcriptional changes associated with the transition from in vivo to in vitro. Further, it was our goal to identify the cohorts of transcripts that could and those that could not be rescued by altering culture conditions. Because we had transcriptional information from the identical samples at a time that they were in the vessel, we have been able to fulfill our goal. We feel that these are important and currently missing data that will be of value to many investigators.

      2. One explanation for the increased expression of ECM genes in vivo is that these cells are contaminated with VSMCs/fibroblasts. This could be very likely given that cells were not sorted or purified upon isolation. Expression of other VSMC or fibroblast-specific markers (i.e. CNN1, MYH11, SMTN, DCN, FBLN1) would help determine if there is some level of non-EC contamination.

      We thank the reviewer for this comment. Prompted by it, we have included a new figure (Supplemental Figure 1) and new panels in Supplemental Figure 5 that directly address this concern.

      Among the several pieces of data, we included scRNAseq from cells obtained immediately from the umbilical vein: three independent experiments sequenced together and shown in one UMAP (Supplemental Figure 1C). As can be appreciated, the very large majority of cells are endothelial, and the only other cell types present were blood cells (erythrocytes and CD45+ cells). No smooth muscle cells or fibroblasts were detected. These three examples are indeed representative of a large number of scRNAseq datasets (35 from cords and cultures for this and other projects). Furthermore, our cultures are also routinely evaluated by FACS (one example has been provided in Supplemental Figure 1E). As illustrated in that example, we do not find cells that are not positive for CD31 and VE-Cadherin.

      We hope this information reveals the rigor of our studies and convinces the reviewer that the transcriptional changes observed are from endothelial cells.

      3. The use of scRNA-seq in Figure 4 is interesting. There appear to be 2 distinct EC populations in the co-cultured ECs. What are the marker genes for the 2 populations?

      Indeed, we and others (Kalluri et al., 2019) have noticed two distinct populations in vivo and also in cultured ECs, as pointed out by the reviewer. Determining whether these two subpopulations reflect two transcriptionally distinct groups or different states of cyclic expression patterns requires more thorough analysis and lineage-tracing studies, and is distinct from the focus of this manuscript. Nonetheless, we have made a point in the revised manuscript to highlight these possibilities.

      Reference: Kalluri, AS, Vellarikkal, SK, Edelman, ER, Nguyen, L, Subramanian, A, Ellinor PT, Regev, A, Kathiresan, S, Gupta, RM. Single Cell Analysis of the Normal Mouse Aorta Reveals Functionally Distinct Endothelial Cell Populations. Circulation, 2019. 140:147-163.

      4. The modest shifts in gene expression with shear stress and co-culture could be attributed to the batch effect. The authors describe 1 batch correction method (ComBat) in the bulk RNA-seq, but no mention of batch correction was noted in the scRNA-seq methods. The authors should ensure that batch effect correction in all data is adequate, and these results should be added to the manuscript.

      We thank the reviewer for this comment. Indeed, batch effects are a particularly important consideration when samples are prepared separately and/or sequenced at distinct times. Note that this was not the case in this study.

      For the scRNA-seq analysis, we removed the low-quality cells but did not use batch-effect correction methods, because the samples were prepared and run at the same time. That is, isolation was performed in parallel, cDNA libraries were generated concurrently, and sequencing was run in the same gel. The quality of the data (and lack of batch effect) was subsequently verified when the two mono-culture biological replicates were evaluated by Seurat and were found to overlap on the UMAP (Figure 4); the same applies to the two co-culture biological replicates. These results clearly indicate that there is no batch effect among these samples (as the samples were not processed in distinct batches).
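
      The overlap of replicates on a UMAP, described above, can also be summarized numerically. The following is a minimal sketch, not the authors' Seurat workflow: it assumes a hypothetical list of 2D embedding coordinates and replicate labels, and the function name `replicate_mixing_score` and the synthetic data are illustrative only. For two equally sized replicates, a score near 0.5 suggests good mixing, while a score near 0 suggests the replicates separate.

```python
import math
import random

def replicate_mixing_score(embedding, replicate_labels, k=15):
    """Average fraction of each cell's k nearest neighbours that belong
    to a different replicate (hypothetical helper, for illustration)."""
    n = len(embedding)
    fractions = []
    for i in range(n):
        # All other cells, sorted by Euclidean distance to cell i.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(embedding[i], embedding[j]),
        )[:k]
        n_other = sum(replicate_labels[j] != replicate_labels[i] for j in neighbours)
        fractions.append(n_other / len(neighbours))
    return sum(fractions) / n

# Synthetic demonstration only (not data from the study).
rng = random.Random(0)
labels = ["rep1"] * 100 + ["rep2"] * 100
overlapping = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in labels]
separated = [
    (rng.gauss(0 if lab == "rep1" else 50, 1), rng.gauss(0, 1)) for lab in labels
]

mixed_score = replicate_mixing_score(overlapping, labels)  # replicates intermingle
split_score = replicate_mixing_score(separated, labels)    # replicates form two islands
```

      A score of this kind only describes how replicates sit in the embedding; it does not by itself establish the absence of technical variation.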

      5. Table 1 shows ATAC-seq was done, however, no data from these experiments are provided in the manuscript.

      As mentioned (reviewer 2), we had performed ATACseq but decided to remove it from the manuscript for several reasons, and we apologize for overlooking the reference in Table 1. We have now corrected this error.

      6. Shear stress was achieved with an orbital shaker, which the accompanying citation states introduces significant heterogeneity in the ECs. This is based on the location of the culture dish. Was this heterogeneity seen in the scRNA-seq data?

      Correct. We only use the 2/3 peripheral area of the plates and discard the central aspect of the plate. We have added clarifying language to the Methods > Shear stress application to reflect this: “Orbital shear stress (130 rpm) was applied to confluent cell cultures by using an orbital shaker positioned inside the incubator as previously discussed (32). The shear stress within the cell culture well corresponds to arterial magnitudes (11.5 dynes/cm2) of shear stress. To reduce issues associated with uniformity of shear stress, the endothelial cell monolayers in 6-well plates were lysed after removing center region using cell scraper (BD Falcon #35-3085) and washing with 1X HBSS (Corning #21-022-CV). The 1.8cm blade was circumferentially used in the center of the 6-well plate to remove the center of the monolayer that did not see the higher shear stress.”
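
      For orientation, a commonly used approximation for the maximal wall shear stress on an orbital shaker is tau = a * sqrt(rho * eta * (2*pi*f)^3), where a is the orbit radius, rho the medium density, eta its viscosity, and f the rotation frequency. The sketch below implements only that approximation. It is not taken from the authors' Methods, and the orbit radius, viscosity, and density used here are placeholder values, not the study's parameters.

```python
import math

def orbital_shear_stress(rpm, orbit_radius_cm, viscosity_poise, density_g_per_cm3):
    """Approximate maximal wall shear stress (dyn/cm^2) for an orbital shaker.

    Uses tau = a * sqrt(rho * eta * omega^3) in CGS units, with omega in rad/s.
    """
    omega = 2.0 * math.pi * rpm / 60.0  # angular frequency in rad/s
    return orbit_radius_cm * math.sqrt(density_g_per_cm3 * viscosity_poise * omega ** 3)

# Placeholder parameters (assumptions, not values reported in the study).
tau_water_like = orbital_shear_stress(
    rpm=130, orbit_radius_cm=0.95, viscosity_poise=0.01, density_g_per_cm3=1.0
)
# Quadrupling the viscosity (e.g. by adding a thickener) doubles the estimate,
# because the stress scales with the square root of viscosity.
tau_viscous = orbital_shear_stress(
    rpm=130, orbit_radius_cm=0.95, viscosity_poise=0.04, density_g_per_cm3=1.0
)
```

      Because the estimate depends on the shaker's orbit radius and on the viscosity of the medium, the same rotation speed can correspond to quite different shear magnitudes across setups.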

      7. It would be important to know whether the authors reproduce the findings from other papers that CD34 expression is reduced in cultured HUVECs:

      Muller AM, Cronen C, Muller KM, Kirkpatrick CJ: Comparative analysis of the reactivity of human umbilical vein endothelial cells in organ and monolayer culture. Pathobiology 1999;67:99-107. Delia D, Lampugnani MG, Resnati M, Dejana E, Aiello A, Fontanella E, Soligo D, Pierotti MA, Greaves MF: Cd34 expression is regulated reciprocally with adhesion molecules in vascular endothelial cells in vitro. Blood 1993;81:1001-1008.

      Thank you for this suggestion. Supplemental Excel 4 allows the reader to review single genes that are modulated by condition and in fact, consistent with all previous literature, CD34 expression is one of the most significantly decreased genes in cultured HUVECs (0.9, p=1E-5).

    1. Author Response

      Reviewer #1 (Public Review):

      1) I was confused about the nature of the short-term plasticity mechanism being modeled. In the Introduction, the contrast drawn is between synaptic rewiring and various plasticity mechanisms at existing synapses, including long-term potentiation/depression, and shorter-term facilitation and depression. And the synaptic modulation mechanism introduced is modeled on STDP (which is a natural fit for an associative/Hebbian rule, especially given that short-term plasticity mechanisms are more often non-Hebbian).

      Indeed, because of its associative nature, the modulation mechanism was envisioned to be STDP-like, i.e. on faster time scales than the complete rewiring of the network (via backpropagation) but slower time scales than things like STSP which, as the reviewer points out, are usually not considered associative. One thing we do want to highlight is that backpropagation and the modulation mechanism are certainly not independent of one another. During training, the network’s weights that are being adjusted by backpropagation are experiencing modulations, and said modulations certainly factor into the gradient calculation.

      We have edited the abstract and introduction to try to make the distinction of what we are trying to model clearer.

      1) cont: On the other hand, in the network models the weights being altered by backpropagation are changes in strength (since the network layers are all-to-all), corresponding more closely to LTP/LTD. And in general, standard supervised artificial neural network training more closely resembles LTP/LTD than changing which neurons are connected to which (and even if there is rewiring, these networks primarily rely on persistent weight changes at existing synapses).

      Although we did not highlight this particular biological mechanism because we wanted to keep the updates as general as possible, one could view the distinction in terms of early- versus late-phase LTP. We have added an additional discussion of how the associative modulation mechanisms and backpropagation might biologically map onto this mechanism in the discussion section.

      1) cont: Moreover, given the timescales of typical systems neuroscience tasks with input coming in on the 100s of ms timescale, the need for multiple repetitions to induce long-term plasticity, and the transient nature/short decay times of the synaptic modulations in the SM matrix, the SM matrix seems to be changing on a timescale faster than LTP/LTD and closer to STP mechanisms like facilitation/depression. So it was not clear to me what mechanism this was supposed to correspond to.

      We note that although the structure of the tasks certainly resembles known neuroscience experiments that happen on shorter time scales (and with the introduction of the 19 new NeuroGym tasks, even more so), we did not have a particular time scale for task effects in mind. So each piece of “evidence” in the integration tasks may indeed occur over significantly slower time scales and could abstractly represent multiple repetitions in order to induce (say) early phase LTP.

      Given that the separation between the two plasticity mechanisms may be clearer for STSP, and indeed many of the tasks we investigate may more naturally be mapped to tasks that occur on time scales more relevant to STSP, we have introduced a second modulation rule that is only dependent upon the presynaptic firing rates. See our response to the Essential Revisions above for additional details on these new results.

      2) A number of studies have explored using short-term plasticity mechanisms to store information over time and have found that these mechanisms are useful for general information integration over time. While many of these are briefly cited, I think they need to be further discussed and the current work situated in the context of these prior studies. In particular, it was not clear to me when and how the authors' assumptions differed from those in previous studies, which specific conclusions were novel to this study, and which conclusions are true for this specific mechanism as opposed to being generally true when using STP mechanisms for integration tasks.

      We have added additional works to the related works sections and expanded the introduction to try to better convey the differences between our work and previous studies. Briefly, our assumptions differed from previous studies mainly in that we considered a network that relied only on synaptic modulations to do computations, rather than a network with both recurrence and synaptic modulations. This allowed us to isolate the computational power and behavior of computing using synaptic modulations alone.

      It is hard to say which of the conclusions are generally true when using STP mechanisms for integration tasks without a comprehensive comparison of the various models of STP on the same tasks we investigated here. That being said, we believe we have presented in this work conclusions that are not present in other works (as far as we are aware), including: (1) a demonstration of the strength of computing with synaptic connections on a large variety of sequential tasks, (2) an investigation into the dynamics of such computations and how they might manifest in neuronal recordings, and (3) a brief look at how these different dynamics might be computationally beneficial in neuroscience-relevant areas. We also note that one reason for the simplicity of our mechanism is that we believe it captures many effects of synaptic modulations (e.g. gradual increase/decrease of synaptic strength that eventually saturates) with a relatively simple expression, and so we believe other STP mechanisms would yield qualitatively similar results. We have edited the text to try to clarify when conclusions are novel to this study and when we are referencing results from other works.

      Reviewer #2 (Public Review):

      On the other hand, the general principle appears (perhaps naively) very general: any stimulus-dependent, sufficiently long-lived change in neuronal/synaptic properties is a potential memory buffer. For instance, one might wonder whether some non-associative form of synaptic plasticity (unlike the Hebbian-like form studied in the paper), such as short-term synaptic plasticity which depends only on the pre-synaptic activity (and is better motivated experimentally), would be equally effective. Or, for that matter, one might wonder whether just neuronal adaptation, in the hidden layer, for instance, would be sufficient. In this sense, a weakness of this work is that there is little attempt at understanding when and how the proposed mechanism fails.

      We have tried to address whether the simplicity of the tasks considered in this work may be a reason for the MPN’s success by training it on 19 additional neuroscience tasks (see response to Essential Revisions above). Across all these additional tasks, we found the MPN performs comparably to its RNN counterparts.

      To address whether associativity is necessary in our setup we have introduced a version of the MPN that has modulation updates that are only presynaptic dependent. We call this the “MPNpre” and have added several results across the paper addressing its computational abilities (again, additional details are provided above in Essential Revisions). We find the MPNpre has dynamics that are qualitatively the same as its MPN counterpart and has very comparable computational capabilities.
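
      The difference between an associative and a presynaptic-only modulation update can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the multiplicative form `W * (1 + M)`, the tanh activation, the exact update expressions, and all function names and parameter values below are hypothetical choices made for the example.

```python
import math

def modulated_forward(W, M, x):
    """Hidden activity with each weight scaled by (1 + modulation); assumed form."""
    return [
        math.tanh(sum(W[i][j] * (1.0 + M[i][j]) * x[j] for j in range(len(x))))
        for i in range(len(W))
    ]

def update_associative(M, h, x, lam=0.95, eta=0.1):
    """Hebbian-like rule: decay, plus eta * postsynaptic * presynaptic activity."""
    return [[lam * M[i][j] + eta * h[i] * x[j] for j in range(len(x))]
            for i in range(len(h))]

def update_presynaptic(M, h, x, lam=0.95, eta=0.1):
    """Presynaptic-only rule: the increment ignores the postsynaptic activity h."""
    return [[lam * M[i][j] + eta * x[j] for j in range(len(x))]
            for i in range(len(h))]

# Toy network: 3 inputs, 2 hidden units, modulations start at zero.
W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
M0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
x = [1.0, 0.0, -1.0]

h = modulated_forward(W, M0, x)
M_assoc = update_associative(M0, h, x)  # rows differ because h differs per unit
M_pre = update_presynaptic(M0, h, x)    # every row receives the same increment
```

      In this toy form, the presynaptic-only rule writes the same input-dependent pattern into every row of the modulation matrix, whereas the associative rule scales that pattern by each hidden unit's activity.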

      Certainly, some of the tasks we consider may also be solvable by introducing other forms of computation such as neuronal adaptation. Indeed, we believe the ability of the brain to solve tasks in so many different ways is one of the things that makes it so difficult to study. Our work here has attempted to highlight one particular way of doing computations (via synapse dynamics) and compared it to one particular other form (recurrent connections). Extending this work to even more forms of computation, including neuronal dynamics, would be very interesting and further help distinguish these different computational methods from one another.

      Reviewer #3 (Public Review):

      Because the MPN is essentially a low-pass filter of the activity, and the activity is the input - it seems that integration is almost automatically satisfied by the dynamics. Are these networks able to perform non-integration tasks? Decision-making (which involves saddle points), for instance, is often studied with RNNs.

      We have tested the MPN on 19 additional supervised learning tasks found in the NeuroGym package (Molano-Mazon et al., 2022), which includes several decision-making-based tasks, and added these results to the main text (see response to Essential Revisions above, and also Figs. 7i & 7j). Across all tasks we investigated, we found the MPN performs at comparable levels to its RNN counterparts.

      Manuel Molano-Mazon, Joao Barbosa, Jordi Pastor-Ciurana, Marta Fradera, Ru-Yuan Zhang, Jeremy Forest, Jorge del Pozo Lerida, Li Ji-An, Christopher J Cueva, Jaime de la Rocha, et al. “NeuroGym: An open resource for developing and sharing neuroscience tasks”. (2022).

      The current work has some resemblance to reservoir computing models. Because the M matrix decays to zero eventually, this is reminiscent of the fading memory property of reservoir models. Specifically, the dynamic variables encode a decaying memory of the input, and - given large enough networks - almost any function of the input can be simply read out. Within this context, there were works that studied how introducing different time scales changes performance (e.g., Schrauwen et al 2007).

      Thank you for pointing out this resemblance and work. In our setup, the fact that lambda is the same for the entire network means all elements of M decrease uniformly (though the learned modulation updates may allow for the growth of M to be non-uniform). One modification that we think would be very interesting to explore is the effect on the dynamics of non-uniform learning rates or decays across synapses. In this setting, the M matrix could have significantly different time scales and may even further resemble reservoir computing setups. We have added a sentence to the discussion section discussing this possibility.

      Another point is the interaction of the proposed plasticity rule with hidden-unit dynamics. What will happen for RNNs with these plasticity rules? I see why introducing short-term plasticity in a "clean" setting can help understand it, but it would be nice to see that nothing breaks when moving to a complete setting. Here, too, there are existing works that tackle this issue (e.g., Orhan & Ma, Ballintyn et al, Rodriguez et al).

      Thank you for pointing out these additional works, they are indeed very relevant and we have added them all to the text where relevant.

      Here we believe we have shown that either recurrent connections or synaptic dynamics alone can be used to solve a wide variety of neuroscience tasks. We don’t believe a hybrid setting with both synaptic dynamics and recurrence (e.g. a Vanilla RNN with synaptic dynamics) would “break” any part of this setup. Since the network could learn to suppress either of the computational mechanisms, it could simply solve the task by relying on only one of the two. For example, it could use a strictly non-synaptic solution by driving eta (the learning rate of the modulations) to zero, or it could use a non-recurrent solution by driving the influence of recurrent connections to be very small. Orhan & Ma mention they have a hard time training a Vanilla RNN with Hebbian modulations on the recurrent weights for any modulation effect that goes back more than one time step, but unlike our work they rely on a fixed modulation strength.

      Indeed, we think how networks with multiple computational mechanisms will solve tasks is a very interesting question to be further investigated, and a hybrid solution may be likely. We believe our work is valuable in that it illuminates one end of the spectrum that is relatively unexplored: how such tasks could be solved using just synaptic dynamics. However, what type of solution a complete setup ultimately lands on is likely largely dependent upon both the initialization and the training procedure, so we felt exploring the dynamics of such networks was outside the scope of this work.

      One point regarding biological plausibility - although the model is abstract, the fact that the MPN increases without bounds is hard to reconcile with physical processes.

      Note that although the MPN expression does not have explicit bounds, in practice the exponential decay eventually does balance with the SM matrix updates, and so we observe a saturation in its size (Fig. 4c, except for the case of lambda=1.0, which is not considered elsewhere in the text). However, we explicitly added modulation bounds to the M matrix update expression and did not find it significantly changed the results (see comments on Essential Revisions above for details).
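
      The balance between decay and updates can be illustrated with a single modulation value. This is a minimal sketch assuming an update of the form m <- lambda * m + eta * drive with a constant drive, which is a simplification introduced here and not the full network update; the function name and parameter values are hypothetical. For lambda < 1 the value approaches eta * drive / (1 - lambda), while for lambda = 1 it grows linearly with the number of steps.

```python
def modulation_trace(lam, eta=0.1, drive=1.0, steps=200):
    """Iterate m <- lam * m + eta * drive from m = 0 and record each value."""
    m = 0.0
    trace = []
    for _ in range(steps):
        m = lam * m + eta * drive
        trace.append(m)
    return trace

decaying = modulation_trace(lam=0.9)   # decay balances the updates
no_decay = modulation_trace(lam=1.0)   # no decay: grows by eta * drive every step

# Fixed point of the decaying case: eta * drive / (1 - lam).
fixed_point = 0.1 * 1.0 / (1.0 - 0.9)
```

      Here `decaying` levels off near `fixed_point`, whereas `no_decay` keeps increasing for as long as the drive persists. In a full network the drive depends on activity and is not constant, but if it stays bounded the same geometric-series argument bounds the modulation whenever lambda is below 1.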

    1. Author Response

      Reviewer #2 (Public Review):

      Here I will mainly comment on the biology of adipocytes, which is my specialty.

      In this manuscript, it has been very convincingly shown that O-GlcNAc acts as an important regulator of MSC differentiation in mice, and given previous studies in which O-GlcNAc is regulated by aging and nutritional status, it makes sense that this PTM determines differentiation and BM niche.

      The point that O-GlcNAc regulates adipocyte differentiation is convincing, but there are already previous studies using 3T3-L1 (e.g., Biochemical and Biophysical Research Communications 417 (2012) 1158-1163), and a more step-by-step demonstration of the molecular mechanism would make this an excellent paper that can be extended to adipocyte research in general, not just BM.

      While O-GlcNAc has been shown to regulate many aspects of metabolic physiology, our understanding of its role in adipogenesis has been limited so far. As the reviewer pointed out, there was an in vitro report on its inhibition of adipogenesis in 3T3-L1 cells (Ji et al., 2012). Two recent publications from Dr. Xiaoyong Yang’s group revealed the profound role of OGT in mature white adipocytes in regulating lipolysis and obesity (Li et al., 2018; Yang et al., 2020). To our knowledge, our manuscript is the first attempt to address the regulation of adipogenesis by O-GlcNAc in vivo. While using the BMSCs as a non-conventional model, we speculate that our molecular mechanisms (i.e., O-GlcNAc inhibition of C/EBPβ) could be conserved in peripheral adipose organs, including white and brown adipose tissues. Future experiments are warranted in the lab to extend the current knowledge to these adipocyte progenitors. Nonetheless, we would also like to point out that, due to the broad actions of OGT and the current lack of adipocyte progenitor-specific Cre animal tools, such efforts might be futile, as results can be confounded by defects in other organs/cells.

      It is somewhat unclear whether or not the authors' in vitro experiments using 10T1/2 cells accurately reflect what is happening in vivo in knockout mice. The PDGFRa+VCAM1+ population of adipocyte progenitors shown by the authors is upregulated by about 30% by knockout of Ogt (Figure 4C). How significant is this difference? Rather, might the expression of Pparg, which indicates lineage commitment, be the underlying mechanism? In any case, this manuscript is highly impactful in the sense that the differentiation of adipocytes forming the BM niche can be controlled using tissue-specific knockouts of the Ogt gene.

      We agree with the reviewer that the role of OGT in BMSC fate determination and adipogenesis might be multifaceted. The 30% increase in PDGFRa+VCAM1+ BM adipose progenitors cannot fully explain the massive adipogenesis observed in OgtΔOsx animals (Fig. 4A). Indeed, we provided in vitro evidence that genetic deletion or chemical inhibition of OGT activates adipogenesis (Fig. 4D-I). Mechanistically, we found that O-GlcNAcylation of the C/EBPβ protein (but not PPARγ) is responsible for the inhibition, which leads to reduced expression of adipogenic genes, including Pparg (Fig. 4H).

    1. Author Response

      Reviewer #1 (Public Review):

      The paper states that they observed a combined total of 77,017 single-nucleotide variants (SNVs) and 12,031 insertion/deletions (In/Dels) across all tissue, age, and intervention groups. Collectively, these data represent the largest collection of somatic mtDNA mutations obtained in a single study to date. However, a study with more somatic mtDNA mutations by the LostArc method (PMID 32943091) revealed 35 million deletions (~ 470,000 unique spans) in skeletal muscle from 22 individuals with and 19 individuals without pathogenic variants in POLG. Thus, the authors should reword this part to say that this study represents the largest collection of mouse mtDNA point mutations detected, but not the largest number of mutations (deletions exceed this number).

      Thank you for pointing this out. When we wrote that sentence, we were more referring to small polymerase-based errors, as opposed to larger structural variants that likely arise from a different mechanism. However, the distinction between these two event classes is poorly defined. We have amended our statement and have added a citation to Lujan et al. Our statement now reads “We observed a combined total of 77,017 single-nucleotide variants (SNVs) and 12,031 small insertion/deletions (In/Dels) (≲15bp in size) across all tissue, age, and intervention groups. Collectively, these data represent the largest collection of somatic mtDNA point mutations obtained in a single study to date and is second only to Lujan et al. in terms overall In/Del counts (Lujan et al., 2012).” (Lines 252-256)

      What is the theoretical limit of pt mutations in the mitochondrial genome, assuming only one pt mutation per genome? Doesn't 77000 detected independent pt mutations approach that limit? Can the authors estimate how many molecules contained two or more pt mutations? Did the analysis reveal any un-mutated regions implying an essential function? For example, on p.9 can the authors provide an explanation of why OriL and other G/C-rich regions were not uniformly covered as compared to the rest of the genome?

      This is an interesting question and one we’ve given some thought to. In fact, this basic question was the inspiration for our recent Nucleic Acids Research paper (PMC8565317) where we asked how mutations were distributed in the genome. The short answer is that we likely exceed the limit for only dG site mutations (and only for G>A mutations, at that), but not the other reference sites. The reason is that there are only 2013 dG sites and the mutation spectrum is heavily skewed toward G>X (there are 47,680 dG site mutations, 42,924 of which are G>A). In comparison, we observe only 4,421 A>X, 9,277 T>X, and 15,632 C>X mutations, but with 5,629, 4,681, and 3,976 dA, dT, and dC genomic sites, respectively. Assuming the mutations are uniformly distributed along the genome (which they are not; see our NAR paper), then random binomial sampling would require a fair amount more mutations in order to reach saturation for the other genomic sites. The uneven distribution increases this number further.
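      The saturation argument above can be made concrete with a small back-of-the-envelope calculation. The sketch below is illustrative only: it uses the site and mutation counts quoted in the response, assumes every mutation lands on a site of its reference base uniformly at random (which the response notes is not true in practice), and ignores the fact that each site has three possible substitutions. The function name `expected_fraction_hit` is hypothetical.

```python
# Site counts and mutation counts quoted in the response above.
sites = {"G": 2013, "A": 5629, "T": 4681, "C": 3976}
mutations = {"G": 47680, "A": 4421, "T": 9277, "C": 15632}


def expected_fraction_hit(n_sites, n_mutations):
    """Expected fraction of sites hit at least once when n_mutations
    are dropped uniformly at random onto n_sites (simplifying assumption)."""
    return 1.0 - (1.0 - 1.0 / n_sites) ** n_mutations


coverage = {
    base: expected_fraction_hit(sites[base], mutations[base]) for base in sites
}

for base in sites:
    per_site = mutations[base] / sites[base]
    print(f"d{base}: {per_site:.2f} mutations per site, "
          f"expected fraction of sites hit = {coverage[base]:.3f}")
```

      Under these assumptions only the dG sites have many more mutations than sites, which is consistent with the statement that saturation is plausible for dG sites but not for the other reference bases. A non-uniform distribution would lower the expected coverage further.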

      With regard to the second question, we can’t actually do this estimation with this data set. The reason is that the ~77,000 mutations aren’t found in a single sample, but are distributed across many independent or semi-independent samples (i.e., different organs within a mouse), which means that most, if not all, of the mutations are necessarily on different mtDNA molecules.

      With regard to the OriL and G/C rich regions, these presumably have some sort of secondary structure that prevents the sequencer from obtaining any useful information. However, this is all speculative and we don’t know why. Interestingly, human mtDNA doesn’t show this dip at the OriL, despite a similar function and location in the mtDNA.

      Given that mitochondrial disease usually doesn't present until >60% of the genomes are affected, the very low level of detected pt mutations observed in the mouse (and presumably similar to human) would mean that they are well below a physiological level. Thus, these low-level pt mutations are well tolerated. Can the authors estimate a theoretical age of the mouse (well beyond their life span) where over 50% of the genomes carry at least one pt mutation?

      The reviewer brings up a frequently noted point in mitochondrial biology that is very much worth addressing in this manuscript. The often-cited statistic that mitochondrial disease doesn’t present until ~60% of genomes are affected is, while true, only pertinent to overt mitochondrial diseases, such as LHON, MERRF, etc., where all or nearly all cells in an individual are affected by the mutation. However, the impact of mtDNA mutations is contingent not only on how many cells have the mutation, but also on the fraction of mtDNA molecules within a cell that harbor the variant. Because the deleterious effects of an mtDNA mutation act at the level of individual cells, it is important to know both how many cells harbor a mutation and what the heteroplasmic level is within the cell before making claims about their pathological impact.

      To date, nearly all studies on mtDNA mutations rely on bulk DNA analysis from thousands to millions of cells, which necessarily decouples variant phasing information between any two reads, resulting in a loss of important biological information such as the heteroplasmic level within any given cell. As such, with bulk sequencing it is impossible to tell the difference between a homoplasmic mutation in a small subset of cells and heteroplasmic mutation in all cells. In the first case, the cells harboring this mutation would be negatively impacted, whereas in the second example, it is unlikely. One can imagine a scenario where every cell contains a different homoplasmic pathogenic mutation which would negatively affect cellular function for every cell. In this case, mutations would be highly prevalent (100% of cells), yet individually rare. However, bulk sequencing would give the appearance that no mutation comes close to exceeding the phenotypic threshold. We highlight this issue in a recent review (Sanchez-Contreras and Kennedy, 2022; PMC8896747).
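      A toy calculation illustrates why bulk sequencing cannot separate these cases. The sketch below is a minimal illustration, not an analysis from the study: the copy number per cell, the number of cells, and the function name `bulk_variant_fraction` are all hypothetical choices made for the example.

```python
def bulk_variant_fraction(cells):
    """Pool per-cell (mutant_copies, total_copies) pairs the way bulk
    sequencing does and return the overall variant fraction."""
    mutant = sum(m for m, _ in cells)
    total = sum(t for _, t in cells)
    return mutant / total


COPIES = 1000  # hypothetical mtDNA copies per cell
N_CELLS = 100  # hypothetical number of cells in the sample

# Scenario A: 5 cells are homoplasmic for the variant, 95 are wild type.
homoplasmic_subset = [(COPIES, COPIES)] * 5 + [(0, COPIES)] * 95

# Scenario B: every cell carries the variant at 5% heteroplasmy.
heteroplasmic_all = [(50, COPIES)] * N_CELLS

vaf_a = bulk_variant_fraction(homoplasmic_subset)
vaf_b = bulk_variant_fraction(heteroplasmic_all)

# Scenario C: every cell is homoplasmic for a *different* variant, so each
# individual variant is seen at 1/N_CELLS even though all cells are affected.
per_variant_vaf_c = COPIES / (COPIES * N_CELLS)

print(vaf_a, vaf_b, per_variant_vaf_c)
```

      Scenarios A and B give the same pooled variant fraction, and in scenario C each variant appears rare even though every cell is affected, which is the ambiguity described above.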

      The point that the reviewer brings up is extremely important, so we have added a section in the discussion related to heteroplasmy versus clones.

      Also, the problem with this low level of pt mutations is that they are not physiological: the effect of the drug treatment causing a reduction in ROS-mediated transversions would not be expected to have a detectable effect on mitochondria. The improvement in mitochondrial function seen by others is most likely independent of the mutations in the genome. There needs to be a cause and effect here and I don't see one.

      It is important to note that we do not make the claim (nor do we want to imply) that the reduction of mutations is the reason behind the improvements in mitochondrial function by these interventions. Instead, we believe that loss of ROS-linked mutations is a consequence of the mechanism by which these interventions work. We do hypothesize that the reduction in ROS-linked mutations suggests that “there is tissue specificity in how cells repair and/or destroy oxidatively damaged mitochondria and/or mtDNA resulting in a steady-state of ROS-linked mutations.” (Lines 551-553) and that “We propose that rather than the incidence and impact of ROS damage on mtDNA being minimal, recognition and removal of ROS-linked mutations are maintained at a steady state during aging.” (Lines 572-574).

      In addition, as noted above, how “low level” these mutations are and their impact on cellular function is not easily determined in bulk sequencing studies, so a strong link between cause and effect is not an answerable relationship with this data set.

      There's no mention in this paper and methodology about how point mutations in nuclear-encoded mtDNA (NUMTs) are excluded from the reads and I'm worried that these errors are being read as rare errors in the mtDNA genome. While NUMTs have been documented for decades, a recent report in Science (PMID: 36198798) documents how frequently and fluidly NUMTs occur. Can the authors provide a clear explanation of how mutations in NUMTs are excluded?

      The reviewer is absolutely correct to call attention to this important aspect of mitochondrial biology. We don’t believe NUMTs are an important confounder in our data set for several reasons.

      1) We used isogenic inbred C57BL/6J mice, which were frequently littermates (siblings). Therefore, any NUMT-derived mutations that are present would be expected to be uniform across samples, especially between tissues from a single animal. Unknown variation in NUMTs would certainly be a potentially strong confounder in an outbred population, but the use of one isogenic inbred line for this study likely eliminates this confounder.

      2) We used the mm10 reference genome, which is based on the C57BL/6J strain, so any NUMT-derived variants present in our mtDNA data should preferentially align against the NUMT. Therefore, we perform a BLAST step of all reads containing at least one variant against mm10. BLAST is much more sensitive to sequence variation than bwa but is far slower, so it is impractical to run as the initial aligner. We then reassign each read to whichever genomic location has the lower E-value. Typically, only around a dozen reads are removed as a result, demonstrating that NUMTs are not likely a major source of false mutations.

      3) Because NUMTs are inherited, any NUMT-derived variants would be found across all the tissues and animals we used in this study. As part of our processing, we mark and remove variants shared between multiple individual samples.
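      The logic of steps 2 and 3 can be sketched in a few lines. This is an illustrative sketch only and not the authors' pipeline: the input structures (a mapping from read ID to BLAST hits, and a mapping from sample to variant tuples), the contig label "chrM", and both function names are assumptions made for the example.

```python
from collections import defaultdict


def keep_mito_reads(blast_hits, mito_contig="chrM"):
    """Keep reads whose best (lowest E-value) hit is the mitochondrial genome.

    blast_hits: dict mapping read_id -> list of (contig, e_value) tuples
    (a hypothetical, pre-parsed representation of BLAST output).
    """
    kept = set()
    for read_id, hits in blast_hits.items():
        best_contig, _ = min(hits, key=lambda hit: hit[1])
        if best_contig == mito_contig:
            kept.add(read_id)
    return kept


def drop_shared_variants(variants_by_sample):
    """Remove any variant observed in more than one sample."""
    seen_in = defaultdict(set)
    for sample, variants in variants_by_sample.items():
        for variant in variants:
            seen_in[variant].add(sample)
    return {
        sample: {v for v in variants if len(seen_in[v]) == 1}
        for sample, variants in variants_by_sample.items()
    }


# Toy inputs: read2 matches a nuclear location better than the mtDNA.
blast_hits = {
    "read1": [("chrM", 1e-50), ("chr1", 1e-20)],
    "read2": [("chrM", 1e-30), ("chr2", 1e-45)],
}
kept = keep_mito_reads(blast_hits)

# Toy inputs: the G>A variant at position 100 is shared between samples.
variants_by_sample = {
    "mouse1_liver": {("chrM", 100, "G", "A"), ("chrM", 200, "C", "T")},
    "mouse2_liver": {("chrM", 100, "G", "A"), ("chrM", 300, "T", "C")},
}
private = drop_shared_variants(variants_by_sample)
print(kept, private)
```

      In this toy example the read with a better nuclear hit is discarded, and the variant present in both samples is removed from each, leaving only sample-private variants.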

      We have made edits to the Methods section (Lines 198-206) to more explicitly highlight the filtering steps and the logic behind them. In addition, we have added a paragraph in the discussion that addresses NUMTs (Starting on line 642).

      Reviewer #2 (Public Review):

      A common problem in mutation analysis is that DNA damage (present in one strand) is difficult to separate from real mutations (present in both strands). One of the approaches to solve this problem based on independent tagging of the two strands by different unique molecular identifiers was developed by the authors about 10 years ago. This study summarizes the application of this method to a wide range of mouse tissues, ages, and drug treatment regimes. Much of the results confirm previous conclusions from this laboratory. This involves overall mutational levels of somatic mtDNA mutations (~10^-6 to 10^-5), their accumulation with age, the prevalence of GA/CT transitions, and their clonality. Although these results were not new, it is important that these were confirmed in a single study with high confidence in a huge number of independent mutations.

      We thank the reviewer for the comment and really hope this data set will be of significant use to other researchers given its breadth of sample types and large number of mutations.

      What really sets this study apart from other studies is the detection of a large proportion of transversion mutations, primarily of the C>A/G>T and C>G/G>C types. Transversions are traditionally considered 'persona non grata' in mtDNA mutational spectra and are typically associated with errors of mutational analysis (which they in fact are). The presence of these mutations in both strands of the duplex makes a good case that these mutations are real, rather than converted damage. However, because this is such a novel discovery and because regular controls do not work (I mean, for example, that these mutations never clonally expand. If there is a clonal expansion, then the mutation is real, only real mutation can expand. But in the case of non-expandable C>A/G>T and C>G/G>C this control does not help to validate these mutations), it would be nice to provide extra assurances that this is not some kind of artifact that somehow slipped through the ds sequencing procedure. I would recommend including in the supplement the data on the abundance of single-stranded base changes as detected by ds sequencing (i.e., changes confirmed in one and not in the other strand of a given molecule). An unusually high presence of such single-stranded changes of the C>A/G>T and C>G/G>C type would be a red flag for me. If ratios of single and double-stranded mutations were similar for transitions and transversions - that would reassure me and hopefully the reader.

      Furthermore, a similar excess of C>A/G>T and C>G/G>C has been observed in a recent paper by Abascal 2021 (cited in the manuscript). In that paper, a UMI-free, but otherwise very similar ds sequencing approach in nuclear DNA (BotSeqS) was demonstrated to suffer from an artifact causing (among other effects) an excess of C>A/G>T and C>G/G>C transversions. This artifact is related to end repair and nick-translation of DNA fragments during library preparation. Because BotSeqS is very similar to ds sequencing, we expect that the same artifact may be taking place in the study under review. We recommend running checks similar to those undertaken by Abascal et al., which include, at the very minimum, checking the distribution of the C>A/G>T and C>G/G>C transversions within the reads (artifacts tend to be concentrated towards the ends of the reads).
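      One simple way to look at the within-read distribution mentioned above is to compare the share of variants near read ends with the share expected if positions were uniform. The sketch below is a hypothetical illustration and not the procedure used by Abascal et al. or by the authors: the function name `end_enrichment`, the 0-based position convention, and the 10-base window are assumptions.

```python
def end_enrichment(positions, read_length, window=10):
    """Ratio of observed to expected fraction of variants that fall within
    `window` bases of either read end, assuming 0-based positions and a
    uniform expectation of 2 * window / read_length."""
    if not positions:
        raise ValueError("no variant positions supplied")
    near_end = sum(
        1 for p in positions if p < window or p >= read_length - window
    )
    observed = near_end / len(positions)
    expected = 2 * window / read_length
    return observed / expected


# Variants spread evenly along a 100-base read: ratio close to 1.
uniform_ratio = end_enrichment(list(range(100)), read_length=100)

# Variants piled up at the read ends: ratio well above 1.
skewed_ratio = end_enrichment([1, 2, 3, 97, 98, 50], read_length=100)

print(uniform_ratio, skewed_ratio)
```

      A ratio near 1 for a given mutation class is what an evenly distributed signal would produce, whereas a ratio well above 1 for the C>A/G>T and C>G/G>C classes specifically would be the kind of red flag the reviewer describes.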

      The reviewer is absolutely correct to bring up this extremely important point. We have addressed these concerns in two ways, described on Lines 332-361. 1) We performed an analysis of the single-stranded consensus data, which is a measure of PCR artifacts that frequently arise as a function of DNA damage, across all the tissues of the aged cohort. We noted no differences between tissues, which indicates that the amount of ROS-induced PCR artifacts is no different between the tissues. Thus, explaining our results by artifacts would require a tissue-specific rate at which ROS artifacts lead to false “Duplex consensus” variants. The analysis is presented in Figure 3-figure supplement 2. 2) We have included an experiment in which we show that treatment of post-fragmented DNA with FPG, a glycosylase that targets Fapy-dG and 8-oxo-dG, does not differ from untreated control DNA. Because Duplex-Seq requires that both strands of a parent DNA molecule be present to form a final Duplex Consensus Sequence, the scission of one strand by the lyase activity of FPG would prevent the formation of this final consensus and prevent this sort of error from “bleeding through”. This analysis can now be found in Figure 3-figure supplement 3.
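      The two-strand requirement described above can be illustrated with a toy consensus function. This is a simplified sketch under stated assumptions, not the Duplex-Seq software: both single-strand consensus sequences are assumed to be already expressed in reference orientation, disagreements are masked as "N", and the function name `duplex_consensus` is hypothetical.

```python
def duplex_consensus(top_sscs, bottom_sscs):
    """Combine two single-strand consensus sequences into a duplex consensus.

    Returns None when either strand is missing (for example, after one strand
    has been cleaved and lost), and masks positions where the strands disagree
    with 'N' so that single-strand changes are never called.
    """
    if top_sscs is None or bottom_sscs is None:
        return None
    if len(top_sscs) != len(bottom_sscs):
        raise ValueError("strand consensus sequences differ in length")
    return "".join(a if a == b else "N" for a, b in zip(top_sscs, bottom_sscs))


both_agree = duplex_consensus("ACGT", "ACGT")   # change-free or true ds variant
one_strand = duplex_consensus("ACTT", "ACGT")   # single-strand change is masked
strand_lost = duplex_consensus("ACGT", None)    # no duplex consensus at all

print(both_agree, one_strand, strand_lost)
```

      In this simplified picture, a lesion confined to one strand either gets masked or, if that strand is lost, yields no duplex consensus at all, which is the reasoning behind the FPG control.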

      Of note, even if transversions detected in this study prove to be artifacts of the Abascal type (likely) they still may reflect real ss damage in mtDNA (not instrumental artifacts, like sequencing errors or in vitro DNA damage). This is supported by the strong variation in the levels of transversions across tissues and as a result of the ameliorating drug intervention. Artifacts, in contrast, would be expected to be at a constant level. This logic, however, does not differentiate between real ds mutations and ss damage. So UMI-based ds sequencing evidence remains the only (though very strong) independent proof. So, in my view, whereas the jury may be still out on whether the observed transversions are true ds mutations or some kind of single-stranded damage, this is a critically important observation. Evidence of ss damage that varies greatly between tissues and is detected with such precision at the single-molecule level is a very important finding as well.

      Out of caution, I would recommend mentioning the above-stated uncertainty and noting that more research is needed to fully confirm that C>A/G>T and C>G/G>C changes detected in this study are indeed double-stranded mutations.

      We agree. Together with comments from Reviewer #1 regarding NUMTs (Comment #5), we have added a paragraph in the Discussion about potential alternative explanations for our observations.

    1. Author Response

      Reviewer #1 (Public Review):

      Reviewer 1 confirmed the view that your paper provides new insight into YTHDC1 function in regulating SC activation/proliferation but added that some of the data could be improved to fully support the conclusions. Specifically:

      The title "Nuclear m6A Reader YTHDC1 Promotes Muscle Stem Cell Activation/Proliferation by Regulating mRNA Splicing and Nuclear Export" seems a bit overstated. Their data are not sufficient to show YTHDC1 regulating nuclear export. From figure 6 we could see some mRNAs export was inhibited upon YTHDC1 loss but intron retention also occurs on these mRNAs, for example, Dnajc14. Since intron retention could lead to mRNA nuclear retention, the mRNA export inhibition may be caused by splicing deficiency. From the data they provided we could not draw the conclusion that YTHDC1 directly affects mRNA export. I think they could not emphasize this point in the title.

      Thanks for the suggestion. It is true that in our initial submission, we had more data to support YTHDC1 regulation of mRNA splicing but not enough on nuclear export. It will take a substantial amount of time and effort to thoroughly dissect both mechanisms. Nevertheless, we argue that our data do provide evidence of YTHDC1 regulation of nuclear export. For example, in Figures 6 C, H, and M, only ~20% of the target mRNAs (such as Dnajc14) showed alteration in both splicing and export upon YTHDC1 loss, while the majority of the export targets showed no splicing deficiency. For example, Btbd7 and Tiparp in Figure 6 N showed no intron retention. In addition, we have now performed Co-IP experiments to validate the interaction between YTHDC1 and THOC7 (new result added in Figure 7L), which provides extra evidence to support YTHDC1 function in regulating mRNA nuclear export. We thus would like to keep the original title in order to reflect the multifaceted function of YTHDC1 in muscle stem cells.

      The mechanism of YTHDC1 promoting muscle stem cell activation/proliferation is not solidified. The authors could strengthen their evidence through bioinformatics analysis or give more discussion. Besides, the previous work done by Zhao and colleagues (Zhao et al., Nature 542, 475-478 (2017)) reported that another m6A reader, Ythdf2, promotes m6A-dependent maternal mRNA clearance to facilitate zebrafish maternal-to-zygotic transition. Does YTHDC1 regulate mRNA clearance during SC activation/proliferation? The authors should explore this possibility by deep-seq data analysis and give some discussion.

      Thanks for the critical comment. For the first concern, we think YTHDC1 promotes muscle stem cell activation/proliferation through its multi-level gene regulatory capabilities in both transcriptional and post-transcriptional processes and through the myriad targets it regulates. In addition, with the newly added data, we believe that YTHDC1’s function is largely dependent on its synergism with hnRNPG (Figure 7 K). We have added the discussion in lines 421-427 of the revised text. For the second question, our data showed that YTHDC1 predominantly localizes in the nucleus of SCs and myoblasts (Figure 1 F&G), thus it may not have a role in regulating mRNA clearance in the cytoplasm like YTHDF2. Nevertheless, there are a few existing reports (1, 2) suggesting its possible role in mRNA degradation and stability, which may arise from its transient shuttling to the cytoplasm. We have now added this point in lines 469-472 of the revised text.

      Reviewer #2 (Public Review):

      Reviewer 2 was similarly positive stating that several tour-de-force techniques were used to examine m6A and the biological consequence in satellite cells and that there was a large amount of data supporting the conclusions with only a few minor weaknesses.

      General points: The main body is lengthy, and some content can be reduced or condensed. For example, RNA-seq was used to determine gene expression in WT and cKO cells, but the purpose of this is not well justified given that YTHDC1 mainly functions to regulate splicing and nuclear export of mRNA rather than controlling their expression levels. Does the RNA-seq data suggest that YTHDC1 may also regulate gene expression independent of m6A reader function?

      Thanks for the comment. We have now revised the entire text to condense the content. Nevertheless, we must point out that the purpose of the RNA-seq is to provide extra evidence for the proliferation defect of the YTHDC1 KO cells but not to search for the underlying mechanism. We have now revised in lines 159-160 to clarify this.

      References:

      1. Shima, H., Matsumoto, M., Ishigami, Y., Ebina, M., Muto, A., Sato, Y., Kumagai, S., Ochiai, K., Suzuki, T. & Igarashi, K. S-Adenosylmethionine Synthesis Is Regulated by Selective N(6)-Adenosine Methylation and mRNA Degradation Involving METTL16 and YTHDC1. Cell Rep 21, 3354-3363 (2017).
      2. Zhang, Z., Wang, Q., Zhao, X., Shao, L., Liu, G., Zheng, X., Xie, L., Zhang, Y., Sun, C. & Xu, R. YTHDC1 mitigates ischemic stroke by promoting Akt phosphorylation through destabilizing PTEN mRNA. Cell Death Dis 11, 977 (2020).
      3. He, P.C. & He, C. m(6) A RNA methylation: from mechanisms to therapeutic potential. EMBO J 40, e105977 (2021).
      4. Widagdo, J., Anggono, V. & Wong, J.J. The multifaceted effects of YTHDC1-mediated nuclear m(6)A recognition. Trends Genet 38, 325-332 (2022).
      5. Sheng, Y., Wei, J., Yu, F., Xu, H., Yu, C., Wu, Q., Liu, Y., Li, L., Cui, X.L., Gu, X., Shen, B., Li, W., Huang, Y., Bhaduri-Mcintosh, S., He, C. & Qian, Z. A Critical Role of Nuclear m6A Reader YTHDC1 in Leukemogenesis by Regulating MCM Complex-Mediated DNA Replication. Blood (2021).
      6. Cheng, Y., Xie, W., Pickering, B.F., Chu, K.L., Savino, A.M., Yang, X., Luo, H., Nguyen, D.T., Mo, S., Barin, E., Velleca, A., Rohwetter, T.M., Patel, D.J., Jaffrey, S.R. & Kharas, M.G. N(6)-Methyladenosine on mRNA facilitates a phase-separated nuclear body that suppresses myeloid leukemic differentiation. Cancer Cell 39, 958-972 e958 (2021).
      7. Chen, C., Liu, W., Guo, J., Liu, Y., Liu, X., Liu, J., Dou, X., Le, R., Huang, Y., Li, C., Yang, L., Kou, X., Zhao, Y., Wu, Y., Chen, J., Wang, H., Shen, B., Gao, Y. & Gao, S. Nuclear m(6)A reader YTHDC1 regulates the scaffold function of LINE1 RNA in mouse ESCs and early embryos. Protein Cell 12, 455-474 (2021).
      8. Xiao, W., Adhikari, S., Dahal, U., Chen, Y.S., Hao, Y.J., Sun, B.F., Sun, H.Y., Li, A., Ping, X.L., Lai, W.Y., Wang, X., Ma, H.L., Huang, C.M., Yang, Y., Huang, N., Jiang, G.B., Wang, H.L., Zhou, Q., Wang, X.J., Zhao, Y.L. & Yang, Y.G. Nuclear m(6)A Reader YTHDC1 Regulates mRNA Splicing. Mol Cell 61, 507-519 (2016).
      9. Webster, M.T., Manor, U., Lippincott-Schwartz, J. & Fan, C.M. Intravital Imaging Reveals Ghost Fibers as Architectural Units Guiding Myogenic Progenitors during Regeneration. Cell Stem Cell 18, 243-252 (2016).
      10. Yankova, E., Blackaby, W., Albertella, M., Rak, J., De Braekeleer, E., Tsagkogeorga, G., Pilka, E.S., Aspris, D., Leggate, D., Hendrick, A.G., Webster, N.A., Andrews, B., Fosbeary, R., Guest, P., Irigoyen, N., Eleftheriou, M., Gozdecka, M., Dias, J.M.L., Bannister, A.J., Vick, B., Jeremias, I., Vassiliou, G.S., Rausch, O., Tzelepis, K. & Kouzarides, T. Small-molecule inhibition of METTL3 as a strategy against myeloid leukaemia. Nature 593, 597-601 (2021).
      11. Otto, A., Schmidt, C., Luke, G., Allen, S., Valasek, P., Muntoni, F., Lawrence-Watt, D. & Patel, K. Canonical Wnt signalling induces satellite-cell proliferation during adult skeletal muscle regeneration. J Cell Sci 121, 2939-2950 (2008).
      12. Liu, J., Gao, M., He, J., Wu, K., Lin, S., Jin, L., Chen, Y., Liu, H., Shi, J., Wang, X., Chang, L., Lin, Y., Zhao, Y.L., Zhang, X., Zhang, M., Luo, G.Z., Wu, G., Pei, D., Wang, J., Bao, X. & Chen, J. The RNA m(6)A reader YTHDC1 silences retrotransposons and guards ES cell identity. Nature 591, 322-326 (2021).
      13. Xu, W., Li, J., He, C., Wen, J., Ma, H., Rong, B., Diao, J., Wang, L., Wang, J., Wu, F., Tan, L., Shi, Y.G., Shi, Y. & Shen, H. METTL3 regulates heterochromatin in mouse embryonic stem cells. Nature 591, 317-321 (2021).
      14. Roberson, P.A., Romero, M.A., Osburn, S.C., Mumford, P.W., Vann, C.G., Fox, C.D., McCullough, D.J., Brown, M.D. & Roberts, M.D. Skeletal muscle LINE-1 ORF1 mRNA is higher in older humans but decreases with endurance exercise and is negatively associated with higher physical activity. J Appl Physiol (1985) 127, 895-904 (2019).
      15. Mumford, P.W., Romero, M.A., Osburn, S.C., Roberson, P.A., Vann, C.G., Mobley, C.B., Brown, M.D., Kavazis, A.N., Young, K.C. & Roberts, M.D. Skeletal muscle LINE-1 retrotransposon activity is upregulated in older versus younger rats. Am J Physiol Regul Integr Comp Physiol 317, R397-R406 (2019).
    1. Author Response

      Reviewer #1 (Public Review):

      Laurent et al. generate genotyping data from 259 individuals from Cabo Verde to investigate the histories and patterns of admixture in the set of islands that make up Cabo Verde. The authors had previously studied admixture in an earlier study but in a smaller set of individuals from two cities on one island (from Santiago) in Cabo Verde. Here, the authors sample from all the islands of Cabo Verde to study admixture in these islands and reveal that there is a varied picture of admixture in that the demographic histories are distinct amongst this set of islands.

      I found the article interesting and clearly written, and I like that it highlights that admixture is a dynamic process that has manifested differently in distinct geographical regions, which will be of broad interest. It also highlights how genetic ancestry patterns are correlated with the populations that were in power/enslaved during colonial times and proposes that certain social practices (e.g. legally enforced segregation) might have affected the distribution/length of runs of homozygosity.

      We thank the reviewer for this positive and encouraging appreciation of our work.

      My main suggestion is that the authors provide a set of hypotheses regarding admixture that may explain their observations, and it would be nice to see if at least one of these has some support using simulations. Could the authors run simulations under their proposed demographic model for populations in Cabo Verde vs what we would expect in a pseudo-panmictic population with two sources of admixture? The authors probably already have simulations they could use. And then see how pre/post admixture founding events change patterns of ancestry.

      As suggested by the reviewer, in the revised version of the manuscript, we conducted the same MetHis-ABC scenario-choice and posterior parameter inference considering the 225 Cabo Verde-born individuals as a single random-mating population, in addition to our main results considering each island of birth separately. Most interestingly, we find that our ABC inferences fail to accurately reconstruct the detailed admixture history of Cabo Verde when it is considered as a whole instead of for each island of birth separately. This is because admixture histories differ substantially across individuals' islands of birth, which is also consistent with the significantly differentiated genetic patterns within Cabo Verde obtained from ADMIXTURE, local-ancestry inferences, ROH, and isolation-by-distance analyses. These results are now implemented throughout the revised version of the manuscript and in supplementary figures and tables. See in particular Results L758-769, and Appendix1-figures and tables, Figure7-figure supplement 1-3, and Appendix 5-table 10.

      Reviewer #2 (Public Review):

      In this article, the authors leveraged patterns in the empirical genomic data and the power of simulations and statistical inferences, and aimed to address a few biologically and culturally relevant questions about the Cabo Verde population's admixture history during the TAST era. Specifically, the authors provided evidence on which specific African and European populations contributed to the population of each island, on whether the genetic admixture history parallels language evolution, and on the best-fitting admixture scenario, which answers questions on when and which continental populations admixed on which island, and how that influenced the island population dynamics since then.

      Strengths

      1) This study sets a great example of studying population history through the lens of genetics and linguistics, jointly. Historically most of the genetic studies of population history either ignored the sociocultural aspects of the evidence or poorly (or wrongly) correlated that with genetic inference. This study identified components in language that are informative about cultural mixture (strictly African-origin words versus shared European-African words), and carefully examined the statistical correlation between genetic and linguistic variation that occurred through admixture, providing a complete picture of genetic and sociocultural transformation in the Cabo Verde islands during TAST.

      We thank the reviewer for this very enthusiastic and encouraging comment on our work.

      2) The statistical analyses are carefully designed and rigorously done. I especially appreciate the careful goodness-of-fit checking and parameter error rates estimation in the ABC part, making the inference results more convincing.

      Again, we thank the reviewer for this positive comment.

      Weaknesses

      1) Most of the methods in the main analyses here were previously developed (e.g., MDS, MetHis, RF/NN-ABC). However, when introducing and applying them here, the authors didn't restate the necessary background (strengths and weaknesses, limitations and usage) of these methods to justify them over other methods. For example, why is ASD-MDS used here to examine the genetic relationship between Cabo Verde populations and other worldwide populations, rather than classic PCA and F-statistics?

      As mentioned in the answer to the general comments, we extensively modified our manuscript in both Results and Material and Methods, to clarify and justify our reasoning for each one of the analyses conducted, and to discuss pros and cons of the methods used. We warmly thank the reviewers for this request, as we believe it allowed us to strongly improve the accessibility of our work in particular for the less specialized audience, as well as equally crucially improve replicability of our work for specialists. See in particular Results L185-193, L245-250, L368-371, L380-386, L495-511, L567-571, L606-621, and the corresponding Material and Methods sections.

      For the particular example of PCA raised by the reviewer: see Results L185-193.

      For that of F-statistics, see Results L368-386. Note that we added the F-stat analysis suggested by the reviewer to the revised version of our manuscript (see detailed answers below), Figure 3-figure supplement 2.

      We believe that these changes strongly strengthen our manuscript and enlarged its potential readership, and we thank, again, the reviewer for this request.

      2) The senior author of this paper has an earlier published article (Verdu et al. 2017 Current Biology) on the same population, using a similar set of methods and drawing similar conclusions on the source of genetic and linguistic variation in Cabo Verde. Although additional island-level samples are added here and additional analyses on admixture history were performed, half of the main messages from this paper don't seem to provide new knowledge beyond what we already learned from the 2017 paper.

      We substantially modified the text of the revised version of the manuscript to address the concern raised by the reviewer in numerous locations of the Abstract, Introduction and Results and Discussion sections, thus hoping to highlight better what we think is the profound novelty brought by this study. In particular, see Introduction L128-153.

      3) Furthermore, there are a few essential factors that could confound different aspects of the major analyses in this article that I believe should be taken into account and discussed. Such factors include the demographic history of source populations prior to admixture, different scenarios of the recipient population size changes, differences in recombination rates across the genome and between African and European populations, etc.

      We thank the reviewer for these comments which allowed us to improve the clarity of our manuscript and raise very interesting discussion points that we had overlooked. As indicated in part in the general answer to reviewers above:

      1) We clarified our methods’ design and discussed extensively its limitations with respect to mis-specification of ancestral population sizes. Indeed, ancestral source population sizes are not modeled in our MetHis-ABC approach. Instead, we consider that the observed proxy source populations from Africa and Europe are at the drift-mutation equilibrium and have been large since the initial and recent founding of Cabo Verde in the 1460’s, and thus use observed genetic variation patterns in these populations to build virtual gamete reservoirs for the admixture history of Cabo Verde with the MetHis-ABC framework. Therefore, while we cannot evaluate explicitly the influence of differences in ancestral source population sizes on our inferences in Cabo Verde, as we now state in the revised version of our manuscript: “we nevertheless implicitly take the real demographic histories of these source populations into account in our simulations, as we use observed genetic patterns themselves the product of this demographic history to create the virtual source populations at the root of the admixture history of each Cabo Verdean island.” We then discuss the outcome of such an approach which mimics satisfactorily the real data for ABC inference. See in particular the revised versions of the Material and Methods L1454-1491 novel section “Simulating the admixed population from source-populations for 60,000 independent SNPs with MetHis”, and Results L637-649.

      2) Concerning the possibilities for population-size changes in the admixed population in our simulations and ABC inferences, we clarified our Material and Methods and explanations of our Results to better show that we readily consider various possible scenarios (for each island separately). Indeed, with our MetHis simulation design, given values of model-parameters correspond to either a constant, a linearly increasing, or a hyperbolically increasing reproductive size in the admixed population over time. We further clarified our Results and Discussion, pointing out that we indeed find, a posteriori, different demographic regimes among islands.

      Nevertheless, reviewers are right that we did not test the possibility for bottlenecks. We thus substantially expanded the Results and Discussion sections in multiple locations to highlight this limitation and the challenges involved in overcoming it in future work. See in particular Material and Methods L1386-1404 section “Hyperbolic increase, linear increase, or constant reproductive population size in the admixed population”, Results L739-742, and Discussion L934-941, and Perspectives.

      3) Finally, concerning recombination rate, we considered only independent SNPs in our simulation and inference process, as is now clarified in multiple locations throughout the text. Otherwise, we further discuss matters of recombination concern regarding specifically our ROH analyses, as suggested in the detailed reviewer’s comments. In brief, we note that in Figure 8 Pemberton 2012 (AJHG 91:275-292) shows that occurrence of long ROH at the same genomic location across individuals is correlated with low recombination rates, although the effect is relatively weak unless in extreme recombination cold spots. Unless there were many extreme recombination cold spots that were different among the islands or ancestral populations, we anticipate fine-scale recombination rate differences not to matter very much for total ROH levels in these data. Similarly, we do not expect large genome-wide differences in mutation rate, and therefore we don’t anticipate minor local variation in mutation rates to make a systematic difference in total ROH levels. We now refer to these important points in the revised version of our Results L414-415.

      Overall, the paper is of interest to the field of human evolutionary genetics - that not only does it tell the story of a historically important population, but also the methodology behind this paper sets a great example for future research to study genetic and sociocultural transformations under the same framework.

      We would like to thank the reviewer for this very encouraging conclusion and for the detailed revision of our work which, we believe, helped us to substantially improve our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The heat shock effect in the drosophila lines was not understood in the study. Why did some lines show phenotypes only at 29C but not 22C? The study showed data that ubiquilin 2 expression was not impacted by 29C, then what caused the phenotypic differences? In addition, the method section did not describe clearly whether a temperature sensitive promoter was used in the flies.

      The heat inducibility of the UBQLN2 transgenes is likely attributable to heat shock elements in the UAS promoter, as noted on page 6, lines 4-14. The heat inducibility of dUbqln is interesting and may reflect transcriptional and/or posttranscriptional mechanisms. While it is possible that increased UBQLN2 contributes to the severe phenotypes in UBQLN24XALS flies reared at 29C, this is not seen for UBQLN2WT and UBQLN2P497H flies. Instead, we postulate that heat stress synergizes with the misfolded UBQLN24XALS protein to disrupt proteostasis and/or endolysosomal function. This clarification has been added to paragraph 2 of the Discussion (page 16, lines 15-25) section of the revised MS: “The reason for enhanced toxicity of UBQLN24XALS is unclear; however, its enhanced aggregation potential may overwhelm cellular proteostasis machinery and/or accelerate disease mechanisms that are slow to manifest in neurons harboring ALS point mutations. This is consistent with the fact that UBQLN24XALS toxicity in flies was unmasked by HS, which is a well-known inducer of proteotoxicity.” We have also explicitly stated the HS inducibility of the UAS-Gal4 in the revised Materials and methods (page 20, lines 24-25).

      2) The study showed data on male and female flies separately in some but not all experiments. In addition, the manuscript largely avoided discussing whether there was a sex difference in those experiments.

      We showed separate male and female eye phenotypes in Figure 1 to clearly demonstrate that UBQLN24XALS toxicity is not sex dependent. Subtle sex differences were seen in the longevity and climbing assays and were reported in Figures 4A and 4D. In Figure 4D, Unc-5 silencing extended the lifespan of Elav>Gal4 female control flies but not Elav>Gal4 male control flies. In Figure 4A, an Unc-5 KK RNAi line rescued climbing of D42>UBQLN24XALS male flies, but not female flies (a second Unc-5 RNAi line rescued both males and females). The reasons for sex differences in these specific experiments are unclear.

      3) Some data appear to be peripheral with no significant contribution to the main findings. Moreover, some data were introduced but were not explained. For instance, the RNA-Seq analysis (Fig 2) did not contribute much to the study. The rescue effect of UBA* (F594A mutant) in Fig 1-Supplemental 1B was interesting but was not elaborated or followed up. FUS flies in Fig 6-Supplement 2 were abruptly introduced with little discussion.

      The reviewer’s point is well taken. In response, we moved both figures to the supplementary data.

      RNA-Seq (Fig. 2)

      Although not essential, the RNA-Seq adds experimental rigor to the study by providing strong molecular correlates to eye degeneration phenotypes across different UBQLN2 genotypes. It shows the unique toxicity of UBQLN24XALS and reinforces phenotypic similarity between UBQLN2WT and UBQLN2P497H flies, which likely reflects non-specific toxicity of overexpressed UBQLN2 proteins. We have carried out additional data analyses requested by the reviewer and moved the RNA-Seq data to Figure 1-figure supplement 2.

      UBA mutant (Figure1-figure supplement 1)

      Both aggregation and toxicity of UBQLN24XALS were abolished by an inactivating F594A mutation in the UBA domain. While this implicates Ub binding in the biochemical mechanism of UBQLN2 toxicity, we have not followed up on the finding in either fly or iMN models and have chosen to remove the data (Figure1-figure supplement 1) from the revised MS.

      Lack of genetic interaction between FUS and Unc-5 (Figure 3-figure supplement 1).

      This data was included to show that shUnc-5 is not a general suppressor of eye toxicity in Drosophila. This contrasts with lilliputian, whose mutation rescues toxicity phenotypes elicited by FUS, TDP-43, and UBQLN2. We believe that the FUS control data enhances experimental rigor and have retained the data in the revised MS, with some additional clarification on page 10, line 5-8.

      4) The main quadruple (4XALS) mutation used in the study was not found in patients. The relevance of the findings needs to be thoroughly justified.

      The use of combinatorial mutants—either in the same gene or same pathway—can sometimes be used to enhance neurodegenerative phenotypes in cellular and rodent models for neurodegenerative diseases, most notably, Alzheimer’s Disease. In the case of the 4XALS mutant, we reasoned that its enhanced aggregation might drive stronger phenotypes than those elicited by UBQLN2 clinical alleles, whose toxicity is barely discernible in flies (relative to overexpressed UBQLN2WT) or in iMNs. We have clarified the rationale for testing the 4XALS mutant and articulated its potential strengths and weaknesses in Results (page 5, line 14-page 6, line 2) and Discussion (page 16, line 15-25) sections.

      5) ALS and FTD are age-related neurodegenerative diseases, whereas the involvement of axon guidance genes is indicative of disruptions during the developmental stage. The manuscript did not discuss this potential caveat.

      We have inserted the following sentence in the discussion to note this caveat: “Consistent with this notion, UNC5B has been linked to neurodegeneration in the 6-OHDA model of Parkinson’s Disease (PD) and UNC5C has been nominated as a risk allele in late-onset Alzheimer’s Disease. Defining the contributions of pathologic UNC5 signaling to the development or progression of ALS-dementia awaits further study.” on Page 20, line 2-6. We have added a similar sentence to the Limitations paragraph at the end of the Discussion: “Third, it is possible that axon guidance genes are only relevant to UBQLN2 toxicity in the context of the developing nervous system”.

    1. Author Response

      Reviewer #1 (Public Review):

      This work describes a new method, Proteinfer, which uses dilated neural networks to predict protein function, using EC terms and GO terms. The software is fast and the server-side performance is fast and reliable. The method is very clearly described. However, it is hard to judge the accuracy of this method based on the current manuscript, and some more work is needed to do so.

      I would like to address the following statement by the authors: (p3, left column): "We focus on Swiss Prot to ensure that our models learn from human-curated labels, rather than labels generated by electronic annotation".

      There is a subtle but important point to be made here: while SwissProt (SP) entries are human-curated, they might still have their function annotated ("labeled") electronically only. The SP entry comprises the sequence, source organism, paper(s) (if any), annotations, cross-references, etc. A validated entry does not mean that the annotation was necessarily validated manually: but rather that there is a paper backing the veracity of the sequence itself, and that it is not an automatic generation from a genome project.

      Example: 009L_FRG3G is a reviewed entry, and has four function annotations, all generated by BLAST, with an IEA (inferred by electronic annotation) evidence code. Most GO annotations in SwissProt are generated that way: a reviewed Swissprot entry, unlike what the authors imply, does not guarantee that the function annotation was made by non-electronic means. If the authors would like to use non-electronic annotations for functional labels, they should use those that are annotated with the GO experimental evidence codes (or, at the very least, not exclusively annotated with IEA). Therefore, most of the annotations in the authors' gold standard protein annotations are simply generated by BLAST and not reviewed by a person. Essentially the authors are comparing predictions with predictions, or at least not taking care not to do so. This is an important point that the authors need to address since there is no apparent gold standard they are using.

      The above statement is relevant to GO. But since EC is mapped 1:1 to GO molecular function ontology (as a subset, there are many terms in GO MFO that are not enzymes of course), the authors can easily apply this to EC-based entries as well.

      This may explain why, in Figure S8(b), BLAST retains such a high and even plateau of the precision-recall curve: BLAST hits are used throughout as the gold standard, and therefore BLAST performs so well. This is in contrast to, say, CAFA assessments, which use as a gold standard only those proteins that have experimental GO evidence codes, and in which BLAST therefore performs much more poorly.

      We thank the reviewer for this point. We regret if we gave the impression that our training data derives exclusively, or even primarily, from direct experiments on the amino acid sequences in question. We had attempted to address this point in the discussion with this section:

      "On the other hand, many entries come from experts applying existing computational methods, including BLAST and HMM-based approaches, to identify protein function. Therefore, the data may be enriched for sequences with functions that are easily ascribable using these techniques which could limit the ability to estimate the added value of using an alternative alignment-free tool. An idealised dataset would involve training only on those sequences that have themselves been experimentally characterized, but at present too little such data exists for a fully supervised deep-learning approach."

      We have now added a sentence early in the manuscript reinforcing this point:

      "Despite its curated nature, SwissProt contains many proteins annotated only on the basis of electronic tools."

      We have also removed the phrase "rather than labels generated by a computational annotation pipeline" because we acknowledge that this could be read to imply that computational approaches are not used at all for SwissProt which would not be correct.

      While we agree that SwissProt contains many entries inferred via electronic means, we nevertheless think its curated nature makes an important difference. Curators as far as possible reconcile all known data for a protein, often looking for the presence of key residues in the active sites. There are proteins where electronic annotation would suggest functions in direct contradiction to experimental data, which are avoided due to this curation process. As one example, UniProt entry Q76NQ1 contains a rhomboid-like domain typically found in rhomboid proteases (IPR022764) and therefore inputting it into InterProScan results in a prediction of peptidase activity (GO:0004252). However this is in fact an inactive protein, as discovered by experiment, and so is not annotated with this activity in SwissProt. ProteInfer successfully avoids predicting peptidase activity as a result of this curated training data. (For transparency, ProteInfer is by no means perfect on this point: there are also cases in which UniProt curators have annotated single proteins as inactive but ProteInfer has not learnt this relationship, due to similar sequences which remain active).

      We had also attempted to address this point by comparing with phenotypes seen in a specific high-throughput experimental assay ("Comparison to experimental data" section).

      We have now added a new analysis in which we assess the recall of GO terms while excluding IEA annotation codes. We find that at the threshold that maximises F1 score in the full analysis, our approach is able to recall 60-75% (depending on ontology) of annotations. Inferring precision is challenging due to the fact that only a very small proportion of the possible function*gene combinations have in fact been tested, making it difficult to distinguish a true negative from a false negative.
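
      The recall calculation described above can be sketched as follows. This is an illustrative sketch only, not the ProteInfer code: the tuple layout, the `non_iea_recall` helper, the toy scores and the 0.5 threshold are all hypothetical. Precision is deliberately not computed, for the reason given above.

```python
# Illustrative sketch only: hypothetical data layout, not the ProteInfer codebase.

def non_iea_recall(annotations, predictions, threshold):
    """Fraction of non-IEA (protein, term) annotations recovered at a threshold.

    annotations: iterable of (protein_id, go_term, evidence_code) tuples.
    predictions: dict mapping (protein_id, go_term) -> predicted score in [0, 1].
    """
    # Keep only annotations supported by something other than electronic inference.
    truth = {(p, t) for p, t, code in annotations if code != "IEA"}
    if not truth:
        raise ValueError("no non-IEA annotations to evaluate")
    # An annotation counts as recalled if its score reaches the threshold.
    recalled = sum(1 for key in truth if predictions.get(key, 0.0) >= threshold)
    return recalled / len(truth)


# Toy data (made up for illustration).
annotations = [
    ("P1", "GO:0003677", "IDA"),
    ("P1", "GO:0005634", "IEA"),  # excluded from the evaluation
    ("P2", "GO:0004252", "EXP"),
    ("P3", "GO:0006260", "IMP"),
    ("P3", "GO:0005739", "TAS"),
]
predictions = {
    ("P1", "GO:0003677"): 0.91,
    ("P1", "GO:0005634"): 0.99,
    ("P2", "GO:0004252"): 0.12,
    ("P3", "GO:0006260"): 0.80,
    ("P3", "GO:0005739"): 0.55,
}
recall = non_iea_recall(annotations, predictions, threshold=0.5)
```

      In practice the threshold would be the one that maximises F1 in the full analysis, and the calculation would be run separately for each ontology.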

      "We also tested how well our trained model was able to recall the subset of GO term annotations which are not associated with the "inferred from electronic annotation" (IEA) evidence code, indicating either experimental work or more intensely-curated evidence. We found that at the threshold that maximised F1 score for overall prediction, 75% of molecular function annotations could be successfully recalled, 61% of cellular component annotations, and 60% of biological process annotations."

      Pooling GO DAGs together: It is unclear how the authors generate performance data over GO as a whole. GO is really 3 disjoint DAGs (molecular function ontology or MFO, Biological Process or BPO, Cellular component or CCO). Any assessment of performance should be over each DAG separately, to make biological sense. Pooling together the three GO DAGs which describe completely different aspects of the function is not informative. Interestingly enough, in the browser applications, the GO DAG results are distinctly separated into the respective DAGs.

      Thank you for this suggestion. To answer the question of how we were previously generating performance data: this was simply by treating all terms equivalently, regardless of their ontology.

      We agree that it would be helpful to the reader to split out results by ontology type, especially given clear differences in performance.

      We now provide PR-curve graphs split by ontology type.

      We have also added the following text:

      "The same trends for the relative performance of different approaches were seen for each of the directed acyclic graphs that make up the GO ontology (biological process, cellular component and molecular function), but there were substantial differences in absolute performance (Fig S10). Performance was highest for molecular function (max F1: 0.94), followed by biological process (max F1: 0.86) and then cellular component (max F1: 0.84)."

      Figure 3 and lack of baseline methods: the text refers to Figures 3A and 3B, but I could only see one figure with no panels. Is there an error here? It is not possible at this point to talk about the results in this figure as described. It looks like Figure 3A is missing, with Fmax scores. In any case, Figure 3(b?) has precision-recall curves showing the performance of predictions is the highest on Isomerases and lowest in hydrolases. It is hard to tell the Fmax values, but they seem reasonably high. However, there is no comparison with a baseline method such as BLAST or Naive, and those should be inserted. It is important to compare Proteinfer with these baseline methods to answer the following questions: (1) Does Proteinfer perform better than the go-to method of choice for most biologists? (2) does it perform better than what is expected given the frequency of these terms in the dataset? For an explanation of the Naive method which answers the latter question, see: ( https://www.nature.com/articles/nmeth.2340 )

      We apologise for the errors in figure referencing in the text here. This emerged in part from the two versions of text required to support an interactive and legacy PDF version. We had provided baseline comparisons with BLAST in Fig. 5 of the interactive version (correctly referenced in the interactive version) and in Fig. S7 of the PDF version (incorrectly referenced as Fig 3B).

      We have now moved the key panel of Fig S7 to the main-text of the PDF version (new Fig 3B), as suggested also by the editor, and updated the figure referencing appropriately. We have also added a Naive frequency-count based baseline. This baseline would not appear in Fig 3B due to axis truncation, but is shown in a supplemental figure, new Fig S9. We thank the reviewer and the editor for raising these points.
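
      For readers unfamiliar with this kind of baseline, the sketch below shows one common frequency-based formulation: every query protein receives the same score for a term, equal to that term's relative frequency in the training set. It is an assumption that this matches the baseline used in the manuscript; the function names and toy labels are hypothetical.

```python
from collections import Counter


def naive_scores(training_labels):
    """Score each term by the fraction of training proteins annotated with it.

    training_labels: list of sets of terms, one set per training protein.
    """
    counts = Counter(term for labels in training_labels for term in labels)
    n_proteins = len(training_labels)
    return {term: count / n_proteins for term, count in counts.items()}


def naive_predict(scores, threshold):
    """Return the terms predicted for any query; the sequence is ignored."""
    return {term for term, score in scores.items() if score >= threshold}


# Toy training labels (made up for illustration).
training_labels = [
    {"EC:3.-.-.-", "EC:3.4.-.-"},
    {"EC:3.-.-.-"},
    {"EC:5.-.-.-"},
    {"EC:3.-.-.-", "EC:3.1.-.-"},
]
scores = naive_scores(training_labels)
predicted = naive_predict(scores, threshold=0.5)
```

      Because the prediction does not depend on the query sequence, such a baseline shows how much of a method's performance could be explained by label frequency alone.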

      Reviewer #2 (Public Review):

      In this paper, Sanderson et al. describe a convolutional neural network that predicts protein domains directly from amino acid sequences. They train this model with manually curated sequences from the Swiss-Prot database to predict Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. This paper builds on previous work by this group, where they trained a separate neural network to recognize each known protein domain. Here, they train one convolutional neural network to identify enzymatic functions or GO terms. They discuss how this change can deal with protein domains that frequently co-occur and more efficiently handle proteins of different lengths. The tool, ProteInfer, adds a useful new tool for computational analysis of proteins that complements existing methods like BLAST and Pfam.

      The authors make three claims:

      1) "ProteInfer models reproduce curator decisions for a variety of functional properties across sequences distant from the training data"

      This claim is well supported by the data presented in the paper. The authors compare the precision-recall curves of four model variations. The authors focus their training on the maximum F1 statistic of the precision-recall curve. Using precision-recall curves is appropriate for this kind of problem.

      2) "Attribution analysis shows that the predictions are driven by relevant regions of each protein sequence".

      This claim is very well supported by the data and particularly well illustrated by Figure 4. The examples on the interactive website are also very nice. This section is a substantial innovation of this method. It shows the value of scanning for multiple functions at the same time and the value of being able to scan proteins of any length.

      3) "ProteInfer models create a generalised mapping between sequence space and the space of protein functions, which is useful for tasks other than those for which the models were trained."

      This claim is also well supported. The print version of the figure is really clear, and the interactive version is even better. It is a clever use of UMAP representations to look at the abstract last layer of the network. It was very nice how each sub-functional class clustered.

      The interactive website was very easy to use with a good user interface. I expect it will be accessible to experimental and computational biologists.

      The manuscript has many strengths. The main text is clearly written, with high-level descriptions of the modeling. I initially printed and read the static PDF version of the paper. The interactive form is much more fun to read because of the ability to analyze my favorite proteins and zoom in on their figures (e.g. Figure 8). The new Figure 1 motivates the work nicely. The website has an excellent interactive graphic showing how the number of layers in the network and the kernel size change how data is pooled across residues. I will use this tool in my teaching.

      We are grateful for these comments. We are excited that the reviewer hopes to use this figure for teaching, which is exactly the sort of impact we hoped for this interactive manuscript. We agree that the interactive manuscript is by far the most compelling version of this work.

      The manuscript has only minor weaknesses. It was not clear if the interactive model on the website was the Single CNN model or the Ensemble CNN model.

      We thank the reviewer for pointing out the ambiguity here. The model shown on the website is a Single CNN model, and is chosen with hyperparameters that achieve good performance whilst being readily downloadable to the user's machine for this demonstration without use of excessive bandwidth. We have added additional sentences to address this better in the manuscript.

      "When the user loads the tool, lightweight EC (5MB) and GO model (7MB) prediction models are downloaded and all predictions are then performed locally, with query sequences never leaving the user's computer. We selected the hyperparameters for these lightweight models by performing a tuning study in which we filtered results by the size of the model's parameters and then selected the best performing models. This approach uses a single neural network, rather than an ensemble. Inference in the browser for a 1500 amino-acid sequence takes < 1.5 seconds for both models."

      Overall, ProteInfer will be a very useful resource for a broad user base. The analysis of the 171 new proteins in Figure 7 was particularly compelling and serves as a great example of the utility and power of ProteInfer. It complements leading tools in a very valuable way. I anticipate adding it to my standard analysis workflows. The data and code are publicly available.

      Reviewer #3 (Public Review):

      In this work, the authors employ a deep convolutional neural network approach to map protein sequence to function. The rationales are that (i) once trained, the neural network would offer fast predictions for new sequences, facilitating exploration and discovery without the need for extensive computational resources, (ii) that the embedding of protein sequences in a fixed-dimensional space would allow potential analyses and interpretation of sequence-function relationships across proteins, and (iii) predicting protein function in a way that is different from alignment-based approaches could lead to new insights or superior performance, at least in certain regimes, thereby complementing existing approaches. I believe the authors demonstrate i and iii convincingly, whereas ii was left open-ended.

      A strength of the work is showing that the trained CNNs perform generally on par with existing alignment-based methods such as BLASTp, with a precision-recall tradeoff that differs from BLASTp. Because the method is more precise at lower recall values, whereas BLASTp has higher recall at lower precision values, it is indeed a good complement to BLASTp, as demonstrated by the top performance of the ensemble approach containing both methods.

      Another strength of the work is its emphasis on usability and interpretability, as demonstrated in the graphical interface, use of class activation mapping for sub-sequence attribution, and the analysis of hierarchical functional clustering when projecting the high-dimensional embedding into UMAP projections.

      We thank the reviewer for highlighting these points.

      However, a main weakness is the premise that this approach is new. For example, the authors claim that existing deep learning "models cannot infer functional annotation for full-length protein sequences." However, as the proposed method is a straightforward deep neural network implementation, there have been other very similar approaches published for protein function prediction. For example, Cai, Wang, and Deng, Frontiers in Bioengineering and Biotechnology (2020), the latter also being a CNN approach. As such, it is difficult to assess how this approach differs from or builds on previous work.

      We agree that there has been a great deal of exciting work looking at the application of deep learning to protein sequences. Our core code has been publicly available on GitHub since April 2019, and our preprint has now been available for more than a year. We regret the time taken to release a manuscript and for it to reach review: this was in part due to the SARS-CoV-2 pandemic, as the first author was heavily involved in the scientific response to it. Nevertheless, we believe that our work has a number of important features that distinguish it from much other work in this space.

      ● We train across the entire GO ontology. In the paper referenced by the reviewer, training is with 491 BP terms, 321 MF terms, and 240 CC terms. In contrast, we train with a vocabulary of 32,102 GO labels, and the majority of these are predicted at least once in our test set.

      ● We use a dilated convolutional approach. In the referenced paper, the network used instead has fixed dimensions. Such an approach means there is an upper limit on how large a protein can be input into the model, and also means that this maximum length defines the computational resources used for every protein, including much smaller ones. In contrast, our dilated network scales to any size of protein, but when used with smaller input sequences it performs only the calculations needed for this size of sequence.

      ● We use class-activation mapping to determine regions of a protein responsible for predictions, and therefore potentially involved in specific functions.

      ● We provide a TensorFlow.js implementation of our approach that allows lightweight models to be tested without any downloads.

      ● We provide a command-line tool that provides easy access to full models.
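
      The point in the list above about handling proteins of any length can be illustrated with a minimal pure-Python sketch. It is not the ProteInfer architecture: the layer sizes, random weights, ReLU activation and the use of mean pooling are all assumptions made for illustration. Dilated convolutions are applied at every residue, and pooling over positions then yields a vector whose size does not depend on sequence length.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a protein sequence as one 20-dimensional vector per residue."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def dilated_conv1d(x, kernel, dilation):
    """Dilated 1D convolution with zero 'same' padding, followed by ReLU.

    x: list of per-position feature vectors.
    kernel[k][i][o]: weight from input channel i to output channel o at tap k.
    """
    width, n_out = len(kernel), len(kernel[0][0])
    centre = (width - 1) // 2
    out = []
    for pos in range(len(x)):
        acc = [0.0] * n_out
        for k in range(width):
            # Neighbouring taps are `dilation` positions apart.
            src = pos + (k - centre) * dilation
            if 0 <= src < len(x):  # positions outside the sequence contribute zero
                for i, value in enumerate(x[src]):
                    if value:
                        for o in range(n_out):
                            acc[o] += value * kernel[k][i][o]
        out.append([max(a, 0.0) for a in acc])
    return out


def embed(seq, kernels, dilations):
    """Per-residue features from stacked dilated convolutions, then mean pooling."""
    h = one_hot(seq)
    for kernel, dilation in zip(kernels, dilations):
        h = dilated_conv1d(h, kernel, dilation)
    return [sum(column) / len(h) for column in zip(*h)]


def random_kernel(rng, width, n_in, n_out):
    return [[[rng.gauss(0, 1) for _ in range(n_out)] for _ in range(n_in)]
            for _ in range(width)]


rng = random.Random(0)
kernels = [random_kernel(rng, 3, 20, 8), random_kernel(rng, 3, 8, 8)]
short_vec = embed("MKV", kernels, [1, 2])
long_vec = embed("MKVLAAGIVGLLLAQWERTY" * 10, kernels, [1, 2])
```

      Both the 3-residue and the 200-residue inputs produce an 8-dimensional vector, and the amount of work grows with the length of the input rather than with a fixed maximum length.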

      We have made some changes to bring out these points more clearly in the text:

      "Since natural protein sequences can vary in length by at least three orders of magnitude, this pooling is advantageous because it allows our model to accommodate sequences of arbitrary length without imposing restrictive modeling assumptions or computational burdens that scale with sequence length. In contrast, many previous approaches operate on fixed sequence lengths: these techniques are unable to make predictions for proteins larger than this sequence length, and use unnecessary resources when employed on smaller proteins."

      We have added a table that sets out the vocabulary sizes used in our work (5,134 for EC and 32,109 for GO):

      "Gene Ontology (GO) terms describe important protein functional properties, with 32,109 such terms in Swiss-Prot (Table S6) that cover the molecular functions of proteins (e.g. DNA-binding, amylase activity), the biological processes they are involved in (e.g. DNA replication, meiosis), and the cellular components to which they localise (e.g. mitochondrion, cytosol)."

      A second weakness is that it was not clear what new insights the UMAP projections of the sequence embedding could offer. For example, the authors mention that "a generalized mapping between sequence space and the space of protein functions...is useful for tasks other than those for which the models were trained." However, such tasks were not explicitly explained. The hierarchical clustering of enzymatic proteins shown in Fig. 5 and the clustering of non-enzymatic proteins in Fig. 6 are consistent with the expectation of separability in the high-dimensional embedding space that would be necessary for good CNN performance (although the sub-groups are sometimes not well-separated. For example, only the second level and leaf level are well-separated in the enzyme classification UMAP hierarchy). Therefore, the value-added of the UMAP representation should be something like using these plots to gain insight into a family or sub-family of enzymes.

      We thank the reviewer for highlighting this point. There are two types of embedding which we discuss in the paper. The first is the high-dimensional representation of the protein that the neural network constructs as part of the prediction process. This is the embedding we feel is most useful for downstream applications, and we discuss a specific example of training the EC-number network to recognise membrane proteins (a property on which it was not trained): "To quantitatively measure whether these embeddings capture the function of non-enzyme proteins, we trained a simple random forest classification model that used these embeddings to predict whether a protein was annotated with the intrinsic component of membrane GO term. We trained on a small set of non-enzymes containing 518 membrane proteins, and evaluated on the rest of the examples. This simple model achieved a precision of 97% and recall of 60% for an F1 score of 0.74. Model training and data-labelling took around 15 seconds. This demonstrates the power of embeddings to simplify other studies with limited labeled data, as has been observed in recent work (43, 72)."
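      As a quick consistency check on the figures quoted above, the F1 score can be recomputed from the reported precision and recall. This is only a sketch of the standard harmonic-mean formula; the function name `f1_score` is illustrative and is not taken from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the response: precision 97%, recall 60%.
f1 = f1_score(0.97, 0.60)
print(round(f1, 2))
```

      The result rounds to 0.74, matching the value stated in the response.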

      As the reviewer points out, there is a second embedding created by compressing this high-dimensional representation down to two dimensions using UMAP. This embedding can also be useful for understanding the properties seen by the network, for example the GO terms highlighted in Fig. 7, but in general it will contain less information than the higher-dimensional embedding.

      The clear presentation, ease of use, and computationally accessible downstream analytics of this work make it of broad utility to the field.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Kschonsak et al. describes the rational structure-based design of novel hybrid inhibitors targeting human Nav1.7 channel. CryoEM structure of arylsulfonamide (GNE-3565) - VSD4 NaV1.7-NaVPas channel complex confirmed binding pose observed in x-ray structure GX-936 - VSD4 Nav1.7-NavAb channel. Remarkably, cryoEM structure of acylsulfonamide (GDC-0310) - VSD4 NaV1.7-NaVPas channel complex revealed a novel binding pocket between the S3 and S4 helices, with the S3 segment adopting a distinct conformation compared to the arylsulfonamide (GNE-3565) - VSD4 NaV1.7-NaVPas channel complex. Creatively, the authors designed a novel class of hybrid inhibitors that simultaneously occupy both the aryl- and acylsulfonamide binding pockets. This study underscores the power of structure-guided drug design to target transmembrane proteins and will be useful to develop safer and more effective therapeutics.

      We thank this Reviewer for the very positive feedback and for highlighting the importance of our work in utilizing structure-based drug design to target key membrane targets.

      Reviewer #2 (Public Review):

      In this manuscript, the authors identify a critical unmet need for the (structure-based) drug design of human Nav channels, which are of clinical interest. They cleverly rationalized a hybrid strategy for developing target-specific small molecule inhibitors, which integrate binding mechanisms of two drug candidates that act orthogonally on the VSD4 of Nav 1.7. Thus, the authors illustrate a promising outlook on pharmaceutical intervention on Nav channels.

      Overall, the cryo-EM structures of the ligand-bound Nav channels are convincing, with a clear indication of the site-specific, distinct density of the small molecules. At the moment, it is difficult to tell how innovative the pipeline is compared to conventional cryo-EM structure determination.

      We thank this Reviewer for these positive comments and for the very helpful suggestions. We are addressing the concerns regarding our cryoEM pipeline.

      Reviewer #3 (Public Review):

      This is an excellent manuscript, describing a few lines of discoveries:

      1. Establishment of a structural biological pipeline for iterative structural determination of an engineered Nav1.7;

      2. Illumination of the novel compound binding mode;

      3. Structure-based development of the hybrid compounds, which led to the novel Nav1.7 inhibitor;

      The cryo-EM study on the engineered Nav1.7 consistently reveals the map at the mid to low 2 Å range, which is unprecedented and impressive, thus, demonstrating the high value of this workflow. The further strength of this study is that the authors were able to develop a new compound by combining structural information gained from the two Nav1.7 structures complexed to two different compounds with different binding modes. Overall, the depth and quality of this study are excellent.

      We thank this Reviewer for highlighting the importance of this manuscript and specifically recognizing our accomplishments in enabling iterative high-resolution structure determination for this target, which allowed us to perform SBDD and design a new series of hybrid compounds. We are also grateful to the Reviewer for noting the quality of our studies.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, McQuate et al. use serial block face SEM to provide a high resolution, 3D analysis of mitochondrial structure in hair cells and surrounding supporting cells of the zebrafish lateral line. They first demonstrate that hair cells have a higher mitochondrial volume as compared to supporting cells, which likely reflects the high metabolic load of these sensory cells. Their deeper analysis of mitochondrial morphology in hair cells reveals that the base of the hair cell - near the presynapse is dominated by a large, networked mitochondrion, while the apex of the cell is dominated by many small mitochondria. By examining hair cells at different stages of development, the authors show that specialized features of hair cell mitochondria are gradually established over the course of development. Finally, by examining hair cells in mutants that lack mechanosensation or presynaptic calcium responses, McQuate et al. reveal that cellular activity contributes to the development of appropriate mitochondrial morphology and localization within hair cells. This dataset, which will be made publicly available, is an immense resource to the community and will facilitate the generation of novel hypotheses about hair cell mitochondrial function in health and disease.

      Strengths:

      1. The painstaking acquisition and analysis of hair cell EM data in a genetically tractable system that is easily accessible for in vivo functional experiments to address hypotheses that emerge from this work.

      2. The use of multiple datasets and analysis methods to cross-validate results.

      3. The thoughtful, careful analysis of the data highlights the richness of the dataset.

      4. The use of both wild-type and mutant animals substantially adds to the manuscript, providing significantly more insight than wild-type data alone.

      Weaknesses:

      1. The manuscript could more strongly highlight the utility of this dataset and facilitate its future use by providing a summary table that lists each sample together with salient details.

      2. The authors examine an opa-1 mutant with altered mitochondrial fusion (which consequently has changes in mitochondrial morphology and organization) to suggest that aberrant mitochondrial architecture negatively impacts mitochondrial function. However, mitochondrial fusion is thought to be critical for mitochondrial health beyond just altered architecture. Because fusion has other roles, it is difficult to use this manipulation to conclude that it is simply disruptions in mitochondrial architecture that alter function.

      3. Although the work of acquiring and reconstructing EM data is labor-intensive, ideally, multiple fish would be examined for each genotype. Readers should take into consideration that one of the mutant datasets is derived from just one animal.

      We thank Reviewer 1 for pointing out the “painstaking acquisition” that went into this study, the “thoughtful, careful analysis,” and the “richness of the dataset.” We believe we have addressed the aforementioned weaknesses.

      Reviewer #2 (Public Review):

      Sensory hair cells have high metabolic demands and rely on mitochondria to provide energy as well as regulate homeostatic levels of intracellular calcium. Using high-resolution serial block face SEM, the authors examined the influences of both developmental age and hair cell activity on hair cell mitochondrial morphology. They show that hair cell mitochondria develop a regionally specific architecture, with the highest volume mitochondria localized to the basolateral presynaptic region of hair cells. Data obtained from mutants lacking either mechanotransduction or presynaptic calcium influx provide evidence that hair cell activity shapes regional mitochondrial morphology. These observed specializations in mitochondrial morphology may play an important role in mitochondrial function, as mutants showing disrupted hair cell mitochondrial architecture showed depolarized mitochondrial potentials and impaired evoked mitochondrial calcium influx.

      This work provides novel and intriguing evidence that mechanotransduction and presynaptic calcium influx play important roles in shaping subcellular mitochondrial morphology in sensory hair cells. Yet there was a lack of consistency in the analysis and presentation of the data which made it difficult to contextualize and interpret the results. This study would be greatly strengthened by i) consistent definitions for hair cell maturation, ii) comparable data analysis of cav1.3a mutant and cdh23 mutant mitochondrial morphologies, and iii) more detailed descriptions and interpretations of the UMAP analysis.

      We thank Reviewer #2 for thinking the work is “novel and intriguing”. We have addressed the weaknesses raised.

      Reviewer #3 (Public Review):

      McQuate et al. have succeeded in reconstructing 3D images of mitochondria and discovered unique structural features of mitochondria in zebrafish hair cells. Compared to the other cell types, such as central and peripheral support cells, hair cells have many elongated and connected mitochondria, and these seem to be involved in hair cell and ribbon synapse development. These findings will contribute to understanding the mechanisms for mitochondrial network regulation.

      Using the SBFSEM technique, the authors provide clear 3D images of hair cells and the technique improves the resolution of the image to understand the structural parameters of not only mitochondria but also ribbon synapses compared to typical fluorescent imaging. These results are very attractive and have the high potential to broadly apply to 3D imaging of any type of organelles, cells, and tissues. On the other hand, however, the authors provide the data from a small sample size, and the functional experiments to make a conclusion are lacking. Some missing representative images and the nonunified methods of grouping for the analysis make the reviewer concerned.

      We thank the Reviewer for thinking the results are “very attractive and have the high potential to broadly apply to 3D imaging of any type of organelles, cells, and tissues.” We agree. We have addressed the weaknesses raised.

    1. Author Response

      Reviewer #1 (Public Review):

      The article from Dumoux et al. shows the use of plasma-based focused ion beams for volume imaging on cryo-preserved samples. This exciting application can potentially increase the throughput and quality of the data acquired through serial FIB-SEM tomography on cryo-preserved and unstained biological samples. The article is well-written, and it is easy to follow. I like the structure and the experimental description, but I miss some points in the analyses, without which the conclusions are not adequately supported.

      The authors state the following: "the application of serial FIB/SEM imaging of non-stained cryogenic biological samples is limited due to low contrast, curtaining, and charging artefacts. We address these challenges using a cryogenic plasma FIB/SEM (cryo-pFIB/SEM)".

      Reading the article, I do not find that the challenges are addressed; it appears that some of these are evaluated when the samples are prepared using plasma-based beams. To support the fact that charging, contrast, and curtaining are addressed, a comparison should be made with the current state of the art, or it is otherwise impossible to determine whether these systems bring any advantage.

      Charging is an issue that is not described in detail, nor has it been adequately analysed. The effect of using plasma beams is independent of the presented algorithm for charging suppression, which is purely image processing based, although very interesting. Given that the focus of the work is on introducing the benefit of using plasma ion beams (from the title) and given that a great deal of data is presented on the effect of the multiple ion sources, one would expect to have comparable images acquired after the surfaces have been prepared with the different beams. This should also be compared against the current state-of-the-art (gallium) to provide a baseline for different beams' benefits. I realise that this requires access to another microscope and that this also imposes controls on the detector responses on each instrument to have a normalised analysis. Still, it also provides the opportunity to quantify the benefits of each instrumentation.

      We have provided a response to the charging comments outlined here in the main rebuttal above. The SEM we used in this study was selected for its optimal performance at low electron voltages due to its immersion field. The low-kV capability is of particular interest in the case of charging (crossover energy). It is possible that the interaction of the sample surface with chemically inert or reactive ion species could change the surface potential (either positively or negatively). However, Vero cells imaged during serial pFIB/SEM using nitrogen plasma still exhibit charging, as do those imaged using the argon plasma we routinely used, suggesting that charging is independent of the ion beam.

      Regarding gallium, a like-for-like comparison would require prolonged access to another very bespoke microscope, and there are already studies (e.g. Schertel et al., 2013 and Scher et al., 2021) that show SEM data of cryogenic sample surfaces milled with gallium. Therefore, we consider such a study outside the scope of this manuscript.

      The curtaining scores. This is a good way to explain the problem, though a few aspects need to be validated. For example, curtains appear over time when milling, and it would be useful to understand how different sources behave over time in FIB/SEM tomography sessions. The score is currently done from individual windows milled, which gives a good indication of the performance. However, it would make sense to check that the behaviour remains identical in an imaging setting and with the moving milling windows (or lines). This will show the counteracting effect to the redeposition and etching effect reported when imaging with the E-beam the milled face.

      Please see our response in the main rebuttal points.

      No detail about the milling resolution has been reported. Since different currents and beams have different cross-sections, it is expected to affect the z-resolution achievable during an imaging session. It would be useful to have a description of the beam cross-sections at the various conditions used and how or whether these interfere with the preparation.

      Please see our response in the main rebuttal points.

      Contrast. No analysis of plasma FIBs' benefits on image contrast compared to the current state of the art has been provided. Measuring contrast is complex, especially when this value can change in response to the detector settings. Still, attempts can be made to quantify it through the FRC and through the analysis of the image MTF (amplitude and fall off), given that membranes are the only most prominent and visible features in cryoFIB/SEM images of biological samples.

      We agree that measuring contrast is complex, and therefore the following parameters, as stated on page 6, lines 6 to 7, were kept consistent throughout data collection: voltage, current, line integration, exposure, and detector voltage offset and gain. We also decided to keep constant or vary the working distance (focus) in Figure 4 and compared the FRC as well as the contrast. As discussed above, a like-for-like comparison with the state of the art (gallium) is not currently possible, making this experiment/analysis outside the scope of this manuscript.

      Figure S4 points out that electrons that hit the sample at normal incidence give better signal/contrast or imaging quality than when the sample is imaged at a tilt. This fact is expected to significantly affect large areas as the collection efficiency will vary across the sample, particularly as regions get further away from the optimal location. The dynamic focusing option available on all SEM will compensate for the focal change but not the collection efficiency. Even though this is a fact, the authors show a loss of resolution, which is not explained by the tilt itself. In particular, the generation of secondary electrons is known to increase with the increased tilt, and to consider that the curtains (that are the prominent feature on the surface) are running along the tilt direction, it would be expected to see no contrast difference between the background and the edge of each curtain as the generation of secondary electrons will increase with tilt for both the edges and the background. Therefore, the contrast should be invariant, at least on the curtains.

      Looking at the images presented in the figure, they appear astigmatic and not properly focused when imaged at a tilt. As evidence of this claim, the cellular features do not measure the same, and the sharpness of the edge of the curtains is gone when tilted. This experience comes from improper astigmatism correction, which in turn, in scanning systems, leads to the impossibility of focusing. The tilt correction provides not only dynamic focusing but also corrects for the anisotropy in the sampling due to the tilt. If all imaging is set up correctly, the two images should show the imaged features with the exact sizes regardless of the resolution (which, in the presented case, is sufficient), and the sharpness of the curtain edges should be invariant regardless of the tilt, at least while or where in focus. Only at that point, the comparison will be fair.

      Please see our response in the main rebuttal points.

      Finally, the resolution measurements presented in the last supplementary figures have no impact or relation to the use of plasma FIB/SEM. It is an effect related to the imaging conditions used in the SEM regardless of the ion beam nature. The distribution of the resolution within images appears predominantly linked to local charging and the local sample composition (from fig8). Given the focus is aimed at introducing or presenting the use of the plasma-based beams the results should be presented in that optic in mind with a comparison between beams.

      This figure is intended to show the absence of degradation in image quality over the dataset. As the stage moves during imaging at 90°, the focus could be lost over a longer data acquisition session. However, this figure demonstrates that the focus is well adjusted throughout the data acquisition. We also considered potential accumulation of beam damage, which does not seem to be detectable with our method.

      Reviewer #2 (Public Review):

      The authors present a manuscript highlighting recent advancements in cryo-focused ion beam/scanning electron microscopy (cryo-FIB) using plasma ion sources as an alternative to positively-charged gallium sources for cryo-FIB milling and volumetric SEM (cryo-FIB/SEM) imaging. The authors benchmark several sources of plasma and determine argon gas is the most suitable source for reducing undesirable curtaining effects during milling. The authors demonstrate that milling with an argon source enables volumetric imaging of vitrified cells and tissue with sufficient contrast to glean biological insight into the spatial localization of organelles and large macromolecular complexes in both vitrified human cells and in high-pressure frozen mouse brain tissue slices. The authors also show that altering the sample angle from 52 to 90 degrees relative to the SEM beam enhances the contrast and resolution of biological features imaged within the vitrified samples. Importantly, the authors also demonstrate that serial milling with argon and nitrogen plasma sources does not appear to significantly affect the resolution of SEM images, suggesting that resolution does not vary over an acquisition series. Finally, the authors test and apply a neural network-based approach for mitigating image artifacts caused by charging due to SEM imaging of biological features with high lipid content, such as lipid droplets in yeast, thereby increasing the clarity and interpretability of images of samples susceptible to charging.

      Strengths and Weaknesses:

      The authors do a fantastic job demonstrating the utility of plasma sources for increased contrast of biological features for cryo-FIB/SEM images. However, they do not specifically address the lingering question of whether or not it is possible to use this plasma source cryo-FIB/SEM volumetric imaging for the specific application of localizing features for downstream cryo-ET imaging and structural analyses. As a reader, I was left wondering whether this technique is ideally suited solely for volumetric imaging of cryogenic samples, or if it can be incorporated as a step in the cellular cryo-ET workflow for localization and perhaps structure determination. Another biorxiv paper (doi.org/10.1101/2022.08.01.502333) from the same group establishes a plasma cryo-FIB milling workflow to generate lamella of sufficient quality to elucidate sub-nanometer reconstructions of cellular ribosomes. However, I anticipate the real impact on the field will be from the synergistic benefits of combining both approaches of volumetric cryo-FIB/SEM imaging to localize regions of interest and cryo-ET imaging for high-resolution structural analyses.

      Additional experiments were undertaken to demonstrate that serial cryo pFIB/SEM can be used in a variety of correlative imaging workflows, including follow-on cryoET. However, we have yet to carefully determine the consequences of such imaging modalities for downstream high spatial frequencies, e.g., for subvolume averaging. The role of the SEM imaging, ion beam damage, etc. has yet to be analysed or optimised in detail. This work is outside the scope of this manuscript.

      Another weakness is the lack of demonstration that the contrast gained from plasma cryo-FIB/SEM is sufficient to apply neural network-based approaches for automated segmentation of biological features. The ability to image vitrified samples with enhanced contrast is huge, but our interpretation of these reconstructions is still fundamentally limited in our ability to efficiently analyze subcellular architecture.

      We have demonstrated that the segmentation of subcellular features such as mitochondria within a serial pFIB-SEM data set of heart tissue can be automated using SuRVoS2, a neural-network-based automated segmentation software. These comparisons are included in an additional figure (Figure 11).

    1. Author Response

      Reviewer #2 (Public Review):

      1) My main reservation is the presentation of the work. The writing style is conversational and expansive, which makes it challenging for the reader. Furthermore, long paragraphs shift from one topic to the next rather than using separate paragraphs with strong topic sentences to cover each topic. I suggested a few places to start new paragraphs, but many more paragraphs could be divided.

      We have also made significant efforts to reduce the text of the manuscript in each section, with more compact phrasing (including the headlines for the different results sections), and more short paragraphs to make the paper more readable. This has resulted in an overall reduction in the total number of words in the manuscript from ~11,000 to 9,000 (including Abstract, Introduction, Results, Discussion, Materials and Methods, and Figure legends sections), equivalent to approximately four pages of typed text.

      2) Most of the figures are also overly complicated. I did not attempt to edit one of them, but I am sure that findings will be much clearer with about half of the panels moved to supplemental materials, so the reader can concentrate on the most important data.

      As recommended by the reviewer, we have significantly reduced the number of panels within the figures in the revised manuscript. Accordingly, the total number of panels in the modified figures compared to the original version is as follows: Figure 1 (7 vs 8); Figure 2 (8 vs 10); Figure 3 (7 vs 10); Figure 4 (7 vs 12); Figure 5 (6 vs 11); Figure 6 (4 vs 8).

      The remaining panels, including quantitative data such as cable-to-patch ratios, or percentages of septated/multiseptated cells, among others, have been moved to existing and new supplementary figures. The total number of supplementary figures is now 9 versus 6 in the original version.

    1. Author Response

      Reviewer #1 (Public Review):

      This study combines the biologging method with captive experiments and DNA metabarcoding to detail the hunting behavior of a bat species in the wild. Specifically, it shows that bats use two foraging strategies (echolocating small prey in the air and capturing large ground prey with passive listening) with different success rates and energetic gains. This result highlights that a species believed to be a specialist forager can, in fact, have mixed strategies depending on the condition and environment.

      The detailed foraging behavior they show for such a small animal is impressive. A combination of several different methods, including captive experiments, is a major strength of the paper. I especially like the mastication sound analysis, although I don't know how new it is. However, I have a major concern about the presentation of this study. The manuscript is apparently written for a bat community, and it's hard to understand the significance of the results in the field of animal ecology.

      Thank you for your helpful feedback. We agree that the framing of the ms was too narrow for the audience of eLife, and we have framed the introduction for a broader audience of animal ecology.

      Reviewer #2 (Public Review):

      This paper has huge potential for influencing the way we think about bats as foragers. But, I think that it can be improved.

      Specifically, there is no clearly articulated hypothesis underlying the work. Second, there should be specific testable predictions arising from the hypothesis. This change, while relatively minor, will vastly improve the focus of the work, and hence its impact on the reader.

      Thank you for highlighting the need for clear hypotheses. We have added three specific hypotheses to guide the reader (line: 54-56) in the introduction. We have also reformatted the discussion section to address each hypothesis in succession using subheadings with clear take-home messages (line: 223-224, 271-272, 293, 318).

      Reviewer #3 (Public Review):

      The study addresses a tough question in the study of wild bats: what and where they eat, using both acoustic bio-logging and DNA metabarcoding. As a result, it was found that greater mouse-eared bats made more frequent attack attempts against passively gleaning prey with lower predation success but higher prey profitability than aerial hawking with higher predation success. This is a precious study that reveals essential new insights into the foraging strategies of wild bats, whose foraging behavior has been challenging to measure. On the other hand, the detection of capture attempts, success or failure of predation, and whether it was by passively gleaning prey or aerial hawking were determined from the audio and triaxial accelerometer analysis, and all results of this study depend entirely on the veracity of this analysis. Also, although two different weights and a tag nearly 15% of its weight were used, it is essential for the results of this data that there be no effect on foraging behavior due to tag attachment. Since this is an excellent study design using state-of-the-art methods and very valuable results, readers should carefully consider the supplemental data as well.

      Thank you for the kind words. We agree that it is critically important that the two foraging strategies are unaffected by tagging effects. In the revised ms, we have added tag weights, tag types and change in body weight during instrumentation as explanatory factors in our statistical models and found no effect of the tag weight on our results. We have also addressed this important issue in the method section (model 1: line 520-539, model 3: 568-590).

    1. Author Response

      Reviewer #1 (Public Review):

      Zeng and colleagues investigated the neural underpinnings of visual-vestibular recalibration. Specifically, they measured changes in three monkeys' perception of unisensory heading cues as well as associated changes in neuronal responses to these cues in three different cortical areas following prolonged exposure to systematic visual-vestibular discrepancies. Behavioral responses in a motion direction discrimination task indicate unisensory perceptual shifts in opposite directions that account for the cross-modal discrepancy the monkeys were exposed to. Neuronal firing patterns, related to motion discrimination judgments by means of neurometric functions indicated analogous shifts in neuronal tuning in areas MSTd and PIVC. In contrast, in area VIP tuning for visual heading stimuli shifted in the same direction as tuning for vestibular stimuli and thus in contradiction to the observed perceptual shifts.

      The shifts observed in MSTd and PIVC fit nicely with existing theories and results regarding cross-modal recalibration and substantiate claims that activity in these areas might underlie perceptual decisions. The shift of visual tuning in VIP is surprising and will certainly spark further investigation.

      Overall the results are really interesting, yet, the manuscript in its current form needs revisions along two dimensions, 1) data analysis and 2) writing.

      We thank the reviewer for the positive comments and thoughtful suggestions, which have greatly helped us improve the data analysis and writing. Also, thank you for the thorough list of specific suggestions for improved writing and phrasing. This considerably helped us clarify these aspects in our manuscript.

      Reviewer #2 (Public Review):

      The manuscript by Zeng and colleagues aims to investigate how neural representations of sensory cues in two modalities (visual and vestibular) change when conflicts are introduced between the cues. The manuscript convincingly demonstrates that this recalibration process differs between areas MSTd (a multisensory region), where sensory responses recalibrated differently for visual and vestibular cues, following each modality's conflict, and area VIP (a higher-level region), where responses follow the vestibular cue. More limited insights are present for area PIVC, where visual responses are limited.

      The analyses generally support the conclusions of the authors, but I have two major suggestions to strengthen the statistical robustness of the manuscript:

      1) The analysis about the lack of visual recalibration in area PIVC would have been more convincing if the authors had used Bayesian statistics instead of regular t tests. In this way it would have been possible to estimate if the lack of visual recalibration in this area, for those few neurons that show visual tuning, can be taken as evidence for the absence of an effect or not. In the absence of this additional analysis, it is in fact difficult to properly interpret the results about area PIVC. Is PIVC more in line with MSTd, in view of the lack of visual responses? Or is there actually no visual recalibration, in contrast to both MSTd and VIP?

      In response to this comment, we calculated the Bayesian Pearson correlation for visual recalibration in area PIVC, with the alternative hypothesis (H1) of a correlation between neuronal shifts and perceptual shifts and the null hypothesis (H0) of no correlation: Pearson's r = 0.26, and BF10 = 0.49. Thus, the evidence supports neither H1 nor H0. We agree that this lack of evidence for (or against) visual recalibration in PIVC primarily reflects the lack of robust tuning to visual heading stimuli in this area. Accordingly, in the manuscript, we do not argue for or against the recalibration of visual heading tuning in PIVC. Rather, we highlight that neurons in PIVC respond strongly to vestibular signals, but not to visual heading stimuli, and that the vestibular responses undergo recalibration. We interpret the observed shifts in vestibular tuning in PIVC as lower-level, sensory, recalibration (similar to MSTd) based on the broader understanding that PIVC encodes lower-level vestibular signals, with transient time-courses, and impoverished visual tuning (Chen et al., 2016; Chen et al., 2021). Our results are in line with this interpretation, and there is no reason to suspect that PIVC reflects more complex multisensory recalibration (like VIP). Nonetheless, the data could also be in line with alternative interpretations. Therefore, in the revised manuscript we now explain this argument more explicitly and have added its limitations, and alternative interpretations, to the Discussion (in subsection “Limitations and future directions”, paragraph 2).
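
      As a reading aid for the Bayes factor quoted above, the following minimal sketch inverts BF10 to obtain BF01 and attaches a coarse verbal label. The thresholds (values between 1/3 and 3 treated as inconclusive) are a commonly used convention assumed here for illustration; they are not criteria stated by the authors, and the function name is hypothetical.

```python
# Illustrative only: the 1/3 and 3 cut-offs are a common convention,
# assumed here; they are not taken from the manuscript or the response.

def interpret_bf10(bf10: float) -> str:
    """Return a coarse verbal label for a Bayes factor BF10 (H1 over H0)."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 >= 3:
        return "supports H1"
    if bf10 <= 1 / 3:
        return "supports H0"
    return "inconclusive"


bf10 = 0.49        # value reported in the response
bf01 = 1 / bf10    # the same evidence expressed as H0 over H1

print(f"BF01 = {bf01:.2f}")
print(interpret_bf10(bf10))
```

      Under this convention, a BF10 of 0.49 means the data are only about twice as likely under H0 as under H1, which matches the response's statement that neither hypothesis is supported.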

      2) For all statistical analyses, multi-level statistics would have been more appropriate than simple t-tests. In fact, since recordings come from few subjects, which in turn have relatively few recording sessions, there is a risk that the results are influenced by one subject and do not represent the full population. Admittedly, this is unlikely in view of the apparently large effect size and low p values. Nonetheless, a more appropriate statistical analysis would make the results more robust and convincing.

      Thank you. We agree with this suggestion and have now: 1) added summary statistics for the individual monkeys, and 2) performed linear mixed model (LMM) analyses (please see our response to Essential Revisions Comment #1, for further details).

      Once these issues are addressed, I believe that the manuscript would provide relevant evidence supporting the hypothesis that multisensory processing in the cortex is an area-specific phenomenon, and that effects observed in one area cannot be simply expected to operate elsewhere. This will therefore elucidate the mechanisms of multimodal plasticity.

      Reviewer #3 (Public Review):

      This study documents an empirical investigation of a fundamental brain process: adaptation to systematic cross-sensory discrepancies. The question is important, the experiment is carefully designed, and the results are striking. Following an unsupervised recalibration block, perceptual judgments of self-motion on the basis of visual and vestibular cues are systematically altered. These behavioral effects are mirrored by changes in the response properties of single neurons in areas MSTd and PIVC (provided that neurons in these areas exhibited selectivity for the sensory cue). Remarkably, neurons in downstream area VIP adjust their response properties in a very different manner, seemingly exclusively reflecting vestibular recalibration (which is opposite in direction to visual perceptual shifts). In the former two areas, the neural-behavior association follows the stimulus dynamics. In VIP, this association remains high beyond the life span of the stimulus. VIP typically exhibits strong choice signals. These decreased in strength after recalibration (an effect unique to area VIP). Together, these findings further dissociate VIP's functional role from that of MSTd and PIVC, without however, fully revealing what that role may be. These results offer a novel perspective on the neural basis of cross-sensory recalibration and will inspire future modeling studies of the neural basis of perception of self-motion.

      We thank the reviewer for the supportive comments.

    1. Author Responses

      Reviewer #1 (Public Review):

      The authors present a very detailed short report on a previously undocumented behaviour where flying squirrels are believed to have created grooves in various species of nuts to aid their secure storage in the crotch or forks of twigs. The behaviour is suggested to have evolved as an adaptive strategy in this population of flying squirrels because of the challenges for nut caching in a rainforest environment.

      Thanks

      Using detailed photographs, GPS locations, measurements and camera trap videos, the authors describe the behaviour in great depth providing a useful base for comparative and future studies. However, the weakest point of this study is that the authors did not detect any squirrels making the grooves and only monitored nuts once they were cached. Therefore more research needs to be done to ascertain who, how and where the grooves are produced in the first place.

      Three new videos are attached showing that both squirrel species rotate and carve the nuts to create the grooves. In the new videos, we can also observe that squirrels re-fixed the nuts between the twigs by carving the nuts. These direct observations better support the claim. See Supplementary Media files 6-8.

      This work will be of great interest to scholars of animal behaviour and cognition and draws attention to a novel behaviour that warrants further study in similar species.

      Yes, it is. Thanks

      Reviewer #2 (Public Review):

      The authors describe observations of an innovative food caching behavior attributed to two species of flying squirrels and likened the behavior to architectural joints used by humans. The discovery of nuts stored in the crook of shrub branches, facilitated by indented rings seemingly carved by squirrels, possibly represents an interesting food handling innovation that may function to prevent spoilage in a damp tropical ecosystem.

      Thanks!

      I applaud the efforts to survey the area multiple times after the initial discovery, and the use of trail cameras to try capture evidence of animal associations. For what is in essence a natural history note, the authors did a great job of trying to gather a variety of supporting evidence. The videos capturing squirrels visiting and retrieving the cached nuts were compelling, and the shaking of the shrubs demonstrating the difficulty in dislodging the nuts helps build the case that the nuts are cached effectively.

      Thanks!

      The most glaring gap in the evidence is that there is no direct observation of the squirrels actually performing this nut carving behavior, only associating with the nuts after they have been cached. There must be more documentation provided to explicitly link the causality between squirrels and this caching innovation.

      We have included three additional videos to demonstrate that squirrels of both species rotate and carve the nuts to create the grooves. These new videos also show that squirrels can fit the nuts between twigs by carving the nuts. We think that these direct observations clearly support our claim, but agree that it was an oversight not to include them in the first draft. See Supplementary Media files 6-8.

      The second major weakness is more to do with writing style and could be addressed with significant revisions to phrasing and development of ideas. This is namely to do with the claim that this is somehow an evolved behavior, without providing evidence that 1) it is indeed the squirrels performing this behavior, 2) that it confers some kind of fitness benefit, and 3) hard evidence that this caching method does indeed prevent decomposition/germination in comparison to the more traditional caching methods of these species. Given the limited geographic range of the observations, I wonder how much of this is actually attributable to learning and/or innovation by these individuals. These ideas are not developed fully, and sometimes the writing wanders between learning and evolution without exploring the deep links between the two concepts.

      1) As above, three new videos establish that the squirrels do, in fact, carve the nuts. See Supplementary Media files 6-8.

      2) We added more description in the discussion to suggest how this behavior likely confers a fitness benefit. At this point, however, it is correct to say that we have no hard evidence to demonstrate this, and thus, we’ve attempted to ‘tighten up’ the discussion accordingly so that our arguments (and their limitations) are more understandable.

      3) We revised the statistics about the proportion of nuts that were fresh during each of the surveys, and added some references about how long is required for the nuts to germinate in natural conditions. L163-172.

      Third, the connection to architecture is attention-grabbing, but I'd like to see this fleshed out a bit more with more text description (and a visual here would help immensely).

      We added more description about how the grooving, caching and checking processes were performed by squirrels and how the principles of this suspension are similar to the mortise-tenon joint as employed by humans. L186-202. As above, three new videos are attached.

      Ultimately this work stands to potentially contribute a fascinating piece of evidence into the growing literature on animal cognition, spatial awareness, caching behavior, innovation, and adaptation, but currently, the claims are unsupported by the evidence presented.

      Thank you for your comments about the potential importance of our work on this interesting system. In this version we try to focus more tightly on the aspects for which we have new information to interpret.

      Reviewer #3 (Public Review):

      The authors were trying to describe and document the nut grooving behaviour of two species of flying squirrels (Hylopetes phayrei electilis and H. alboniger), and to relate this behaviour to tool use or to the claim that the squirrels are smart. To achieve these objectives, the authors conducted three field surveys. They also later set out a camera to capture the animal species that interacted with these nuts. They found that these nuts with grooves are fixed between twigs and can be found in different small plant species. Both species of squirrels made grooves in nuts. Shallower grooves are found in nuts that are fixed on living trees than on dead trees. Ellipsoid nuts have deeper grooves than oblate nuts. They concluded that these nut grooving behaviours are evolved or learned in those flying squirrel populations, and related these behaviours to tool use as well as to the claim that the squirrels are smart.

      Thanks!

      One strength of this work is that the data were collected in the field, which may provide hard evidence with video footage showing the two flying squirrel populations made grooves on nuts as well as fixing them between twigs. This evidence will spark new interest in understanding the causes and consequences of such nut grooving behaviour. It may be bold to claim that such behaviour involves advanced cognition or cognitive processes without proper, systematic experiments. Accordingly, whether the squirrels are 'smart' remains unclear. The authors did well in describing and documenting the nut grooving behaviours of the two species of flying squirrels, which has achieved their first aim. However, as mentioned above, whether such behaviour is 'smart' will need more systematic investigations.

      We have removed the description about cognition or cognitive processes from the paper, and the paper is now focused on the grooving behaviour. “Smart” has also been removed, with other words used instead.

    1. Author Response

      Reviewer #3 (Public Review):

      1) (Schichl et al. 2011 JBC 286:38466). This publication is not cited in the current version of the manuscript. The results of Schichl et al. seem particularly relevant for the interpretation of some of the results presented here and should be considered in the final discussion and conclusions of the present work.

      This reference and related text were added to the discussion section of the revised manuscript (lines 508-517).

      2) The ubiquitination of endogenous TTP has not been demonstrated.

      New data assessing the ubiquitination of endogenous TTP was added as Figure 1 – figure supplement 1D.

      3) The type of ubiquitination detected on the overexpressed version of TTP is not characterized. This seems important in view of the results of Schichl et al. who showed non-degradative ubiquitination (K63) of TTP.

      New data with the detection of K48- or K63-linked poly-ubiquitin chain by specific antibodies was added as Figure 1 – figure supplement 1G. These data show that recombinant poly-ubiquitin chains can be readily detected with both antibodies, but that only K48-linked chains were detected on TTP IPed from cells.

      4) The half-life of the non-ubiquitinated mutant of TTP (K→R) was not precisely compared to the half-life of the wild-type TTP protein (similar to the experiment presented in 1B).

      New data from TTP-KtoR chase experiments was added as Figure 1 – figure supplement 1E. The half-life was increased substantially from 1.4 h for wtTTP to 5.7 h for the mutant.
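
      To make the half-life comparison concrete, the sketch below shows one common way a half-life can be estimated from chase data: a least-squares fit of ln(protein level) against time, assuming simple first-order decay. This is an assumption for illustration and not necessarily the fitting procedure the authors used; the function names and the synthetic time points are hypothetical, and no measured data are reproduced here.

```python
import math


def half_life_from_chase(times_h, levels):
    """Estimate a half-life in hours from a chase time course.

    Assumes first-order decay, so ln(level) falls linearly with time and
    the half-life equals ln(2) divided by the decay rate (negative slope).
    """
    if len(times_h) != len(levels) or len(times_h) < 2:
        raise ValueError("Need at least two matched time points.")
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("Levels do not decay; half-life is undefined.")
    return math.log(2) / -slope


def synthetic_chase(half_life_h, times_h):
    """Generate noise-free relative levels for a given half-life (hypothetical data)."""
    return [0.5 ** (t / half_life_h) for t in times_h]


times = [0.0, 1.0, 2.0, 4.0, 6.0]  # hypothetical sampling times in hours
wt = half_life_from_chase(times, synthetic_chase(1.4, times))
mutant = half_life_from_chase(times, synthetic_chase(5.7, times))
print(f"wtTTP: {wt:.2f} h, KtoR mutant: {mutant:.2f} h, ratio: {mutant / wt:.1f}")
```

      With the half-lives quoted in the response (1.4 h and 5.7 h), the ratio corresponds to roughly a fourfold stabilization of the mutant.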

      5) The effect of the E1 inhibitor TAK-243 on endogenous TTP levels was not tested.

      New data assessing the effect of TAK-243 on endogenous TTP was added as Figure 1 – figure supplement 1B. Consistent with our data with exogenously expressed TTP, treatment with the inhibitor increased the abundance of endogenous TTP.

      6) While they demonstrate that TTP-HA is efficiently degraded after 3 to 7 h of LPS stimulation (Fig 1B) and that the strongest decrease in mCherry-TTP fusion level occurs between 4 and 6 h of LPS stimulation, the screen for identification of TTP modulators is performed after 16 h of LPS stimulation (Fig 2A). The rationale behind this experimental setting is not explicitly described.

      We found that endogenous TTP and mCherry-TTP levels were substantially lower at 16 h post-LPS stimulation compared to 6 h (see Fig. 1D), and reasoned that this would yield the best genetic screen window in which to identify mutant cells with non-functional degradation mechanisms.

      7) The authors did not directly test the effect of HUWE1 inactivation on endogenous TTP accumulation after blocking protein synthesis. This control seems important as data presented in figure 2E could result both from an effect of Huwe1 level on LPS-induced TTP synthesis and TTP degradation.

      New data from chase experiments with endogenous TTP have been added as Fig. 2G. Consistent with the data presented in Fig. 2E, TTP levels declined during the chase period in sgROSA control cells, with an estimated half-life of 3.7 h. In contrast, TTP levels did not significantly decline during the CHX chase period in Huwe1 KO cells, resulting in an estimated TTP protein half-life of ~20 h in this genotype.

      8) In the data presented in figure 2, it is not entirely clear what exactly the authors are referring to as "endogenous TTP". In Figure 2C endogenous TTP is detected by western blot on cells transfected with an mCherry-TTP fusion. In this case, the size difference allows unambiguous identification of the endogenous form of TTP (although one could not exclude that overexpressing a TTP fusion protein might affect the level of the endogenous protein). However, TTP and mCherry-TTP cannot be distinguished by FACS (Fig2 D and E). If cells used in the experiments shown in 2C and 2D-E are distinct, this should be mentioned more explicitly in the legend of Fig. 2. Otherwise, the detection of endogenous TTP should be performed on cells that do not express mCherry-TTP.

      Results from Fig. 2D/E are indeed from cells that do not express mCherry-TTP. Endogenous TTP is detected in these cells by intracellular antibody staining. The figure legend text has been updated to reflect that panel 2C is with the RAW264.7-Dox-Cas9-mCherry-TTP cell line, and D-E is with the RAW264.7-Dox-Cas9 cell line.

      9) The third part of the manuscript aims to demonstrate that loss of Huwe1 decreases the half-life of pro-inflammatory mRNAs controlled by TTP. In my opinion, this conclusion is reliably supported by the data presented in Figure 3 and Supplementary Figure 3. As the conclusion of this paragraph refers to the effect of TTP on the stability of these mRNAs, the measurement of TNF mRNA stability (Fig. sup. 3C) should be presented in the main part of Fig. 3.

      The TNF mRNA stability figure panel was moved to the main figures as Fig. 3C.

      10) Fig 4E aims to identify kinases and phosphatases potentially involved in TTP stability (line 277, line 298). However, the approach used here (a measure of intracellular TTP level) cannot distinguish between increased production of TTP or a decrease in TTP degradation.

      One of the main points of this experiment was to assess whether the steady-state increase in TTP in HUWE1 KO cells, which stems in large part from increased stability (Fig. 2G), was influenced by TTP phospho-status. Thus, while we do not explicitly measure TTP protein half-life in this particular assay, it is very likely to reflect changes in TTP protein stability. This idea is consistent with the fact that treatment with p38i, MK2i, and CaclycA affected TTP steady-state levels consistent with their previously reported effects on TTP protein stability.

      11) Also, the result presented in fig. 4E, are not totally consistent with the results presented in 4A. Fig4D shows a similar level of endogenous TTP accumulating after 2h of LPS stimulation in Huwe1 KO and control cells while a clear difference in TTP level is observable in the same condition in fig. 4A. Could the difference in the TTP detection method (Western vs intracellular FACS) be responsible for this discrepancy?

      We do not know exactly, but agree that this could indeed be influenced by the measurement method per se, as well as by small variations in cell density or total sample numbers in a particular experiment (as this may increase the time outside of the incubator for handling/stimulations). The much larger sample size of the experiment from panel 6E, and having multiple different stimulations, may have contributed to a slightly delayed timing of the Huwe1-dependent phenotype. It is important to note that we have consistently demonstrated, with different measurement methods, that TTP is initially stabilized post-LPS treatment (2-3 h, insensitive to Huwe1 KO), followed by TTP degradation (6-16 h, sensitive to Huwe1 KO).

      12) These experiments and data presented in Fig.5D show that the level of the TTP paralog ZFP36L1 accumulates in huwe1 KO cells but do not demonstrate that HUWE1 affects ZFP36L1 protein stability.

      We agree, and changed all instances in the text that claimed ZFP36L1 ‘stabilization’ to ‘increase in abundance’.

      13) Based on data presented in fig. 6 B and sup. 6B the authors conclude that residues S52 and 178, previously identified as regulators of TTP stability, are unlikely to be involved in HUWE1-dependent TTP accumulation. The data are only based on 2 independent experiments, one of which (fig 6B) shows a difference in TTP S52/S178 mutant in Huwe1 deficient cells as compared to wt TTP. These results seem therefore too preliminary to reliably exclude the implication of S52 and 178 on the HUWE1 accumulation of TTP.

      Additional new data with the S52/178 TTP mutant of six biological replicates has been added to the manuscript as Figure 6 – figure supplement 1C. Data from these experiments are consistent with our other results, and show that protein levels similarly increase for both wtTTP and the S52/178A mutant in Huwe1 KO cells.

      14) From these data, the authors conclude (line 416) that N-terminal deletion does not affect the TTP protein level. However, TTP accumulation in Huwe1 KO cells seems mostly lost in mutant N4. As mentioned above the limited number of replicates (n=2) and the absence of a statistical test makes the interpretation of this result difficult.

      Additional new data with the Δ4 mutant of two biological replicates has been added to the manuscript as Figure 6 – figure supplement 1E. Data from these experiments are consistent with our other results, and show that protein levels similarly increase for the Δ4 mutant in Huwe1 KO cells.

      15) Several TTP C-terminal mutants show a HUWE1-independent accumulation when compared to the wt protein (Fig6. D). Is this region identical to the unstructured region identified by Ngoc (line 1255) as a potent regulator of TTP degradation? If relevant this point should be discussed.

      Ngoc showed that fusion to GFP of either the N-terminal TTP part, or the TTP C-terminal part (aa 214-436), destabilized GFP in cells. Thus, the GFP destabilization was seemingly indiscriminate, and possibly caused by the disordered nature of the fusion construct per se. Since the C-terminal TTP part fused to GFP by Ngoc included aa 214-436, we cannot rule out that part of this effect was HUWE1-dependent. However, the discrepancy with our finding that the TTP N-terminus does not contribute to HUWE1-dependent TTP regulation may suggest that the GFP fusions by Ngoc were destabilized by more general protein principles, rather than HUWE1-specific effects. Additional text conveying this notion was added to the Discussion section (line 490-497).

    1. Author Response

      Reviewer #1 (Public Review):

      Understanding the evolution of nitrogenases is a very important problem in the field of evolutionary biogeochemistry. Ancestral sequence reconstruction at least in theory could offer insights into how this planet-altering activity evolved from ancestors that did not reduce nitrogen. But the very many components of the nitrogenase enzyme system make this a very challenging question to answer.

      This paper now demonstrates the first empirical resurrection of functional ancestral nitrogenases both in vivo and in vitro. The nodes that are resurrected are very shallow in the nitrogenase tree and do not help answer how these proteins evolved. The authors' reasoning for choosing these nodes is that they are likely compatible with the metal cluster assembly machinery of their chosen host organism, A. vinelandii. The reader is left to wonder if deeper, more interesting nodes were tried but didn't yield any activity. As the paper stands, it proves that relatively shallow nitrogenase ancestors can be resurrected, but these nodes do not yet teach us anything very fundamental about how these enzymes evolved.

      Technically, this work was no doubt challenging. Genome engineering in A. vinelandii is very difficult and time-consuming. This organism was chosen because it is an obligate aerobe, which makes it easier to handle than the many anaerobic bacteria and archaea that harbor nitrogenases. It does make one wonder if this choice of organism is wise: the authors themselves note that it probably has a set of specialized proteins that allow the nitrogenase to be assembled and function in the presence of oxygen. This may limit A. vinelandii's potential future ancestral reconstructions deeper in the tree, which according to the authors' reasoning probably requires different assembly machinery.

      The ancestral sequence reconstruction is done in two different ways: two out of three reconstructions are carried out with what appears to be an incorrect algorithm implemented in older versions of RAxML. This algorithm is not a full marginal reconstruction, because it only considers the descendants of the node of interest for the reconstruction. The full algorithm (implemented e.g. in PAML and the newest versions of RAxML) considers all tips for a marginal reconstruction. The fact that this was called a marginal ancestral sequence reconstruction in RAxML's manual is unfortunate - as far as I understand it is in fact just the internal labelling of nodes produced by the pruning algorithm, which is not equivalent to a marginal reconstruction. In this specific case, it is unlikely that this has led to any fundamental issues with the reconstructions (as all are functional nitrogenases, which is to be expected in this part of the tree). For the shallower of the two nodes, the authors in fact verify that they get the same experimental results if they use PAML's full implementation of a marginal reconstruction (which yields a somewhat different sequence for this node). It would have been helpful to point this RAxML-related issue out in the methods, so as to prevent others from using this incorrect implementation of the ASR algorithm.

      One other slightly confusing aspect of the paper is that it contains two different maximum likelihood trees, which were apparently inferred using the same dataset, model, and version of RAxML. It is unclear why they have different topologies. This probably indicates a lack of convergence. Again, this does not cast any doubt on the uncontroversial findings of this paper that shallow nodes within the nitrogenases are also nitrogenases.

      We thank the reviewer for their careful appraisal of our article, and their helpful recommendations for improving its quality. We appreciate the reviewer’s comment regarding the experimental challenges associated with nitrogenase engineering and genetic studies of our bacterial model, Azotobacter vinelandii. The complexity of nitrogen fixation machinery does indeed present several experimental obstacles, though, as we note in our revised article, this feature also makes the systems-level approach we have implemented here ideal for evolutionary studies of nitrogenases and their associated network.

      The reviewer focuses on three central points: 1) the relevance of the targeted ancestral nodes for addressing fundamental questions concerning nitrogenase origins, 2) the applicability of our bacterial model for older reconstructions, and 3) issues associated with the different trees/methods for ancestral sequence reconstruction.

      Addressing the first point, we concede that targeting relatively shallow nodes cannot specifically test hypotheses concerning the earliest stages of nitrogenase evolution (e.g., “how this planet altering activity evolved from ancestors that did not reduce nitrogen”). Our central result is that a specific, enzymatic mechanism for dinitrogen binding and reduction (established for three modern nitrogenases to date) extends back through nitrogenase ancestry over the studied timeline. More broadly, a conserved mechanism in the only surviving family of nitrogenases suggests that life may have been constrained in its available strategies for achieving this challenging biochemical reaction. By comparison, multiple abiotic pathways for nitrogen fixation are feasible, and another ecologically vital metabolism, carbon fixation, can proceed by at least seven pathways. Deeper investigations into these possible evolutionary constraints and across deeper portions of the nitrogenase tree will require continued study, which we anticipate will be facilitated by the experimental approach presented in this article.

      Concerning the applicability of our bacterial model, we agree that it is possible that older reconstructions may require different host organisms so as to provide a compatible genetic background. Similar considerations we have outlined in our article, including a systematic evaluation of the genetic components that likely accompanied nitrogenase ancestors in their ancient hosts, will likely be necessary. Nevertheless, we foresee that the general, systems-level approach that we have built for Azotobacter can be adapted for additional microbial models, and that these efforts will be worthwhile given the significance of biological nitrogen fixation to evolutionary biogeochemistry and microbial engineering applications.

      Finally, we thank the reviewer for noting the differences in the ancestral sequence reconstruction algorithms of RAxML v.8 and PAML and welcome an explanation of these issues in our revised article. We confirm that RAxML v.8 does not perform full marginal reconstruction (in contradiction to its description in the RAxML manual). Due to this concern, we repeated our ancestral sequence reconstruction with PAML, which, like newer versions of RAxML, does implement the full algorithm. Here, ancestors reconstructed by RAxML v.8 and PAML from equivalent phylogenetic nodes yield comparable experimental results, indicating that the algorithm differences have not significantly impacted the major outcomes of our study. In the second analysis, we repeated the entire phylogenetic ancestral sequence reconstruction workflow, though did not trim the alignment as we did in the first case (this has now been clarified). This likely explains the differences in our trees, as the reviewer notes. We have included these details in the Materials and Methods section of our revised article.

      In addition to expanding upon the points outlined above throughout the revised article, we have included additional text in the Discussion that elaborates on the limitations of our study, and in particular, the need to explore deeper portions of the nitrogenase tree in future work.

      Reviewer #2 (Public Review):

      The authors convincingly show that their reconstructed ancestral nitrogenases are active both in vivo and in vitro, and show similar inhibitory effects as extant/wild-type enzymes.

      The conclusion that, evolutionarily, there is a "single available mechanism for dinitrogen reduction" is not well explored in the paper. This suggests a limitation of using ancestral sequence reconstruction in this instance.

      We thank the reviewer for their comments and appreciate their assessment that the core experimental results are conclusively demonstrated, including in vivo/in vitro activity of ancestral nitrogenase enzymes and that they all exhibit the specific mechanism for dinitrogen binding and reduction, evidenced by hydrogen inhibition.

      We note the reviewer’s concern regarding the evolution of the dinitrogen reduction mechanism described above. Our primary conclusion is that this mechanism is conserved in the studied nitrogenase ancestors, which, together with previous demonstrations of this mechanism in the different nitrogenase isozymes (Mo, V, Fe) of Azotobacter vinelandii, suggests that this is an early evolved feature of the nitrogenase family. These enzymes have thus not only been performing an ecologically vital, metabolic function, but have likely been achieving this challenging biochemical reaction in the same manner for billions of years. We discuss the resulting implications as they relate to evolutionary constraints on biological nitrogen fixation strategies. We clarify that our presented paleomolecular approach cannot directly evaluate alternate evolutionary scenarios that did not persist and were not preserved in extant genomic sequences, as ancestral sequence reconstruction is fundamentally informed by extant sequence diversity. Our approach is a powerful tool for defining the contours of ancestral nitrogenase sequence-function space, which can serve as a basis for engineering and evaluating alternate scenarios. We have clarified these points in our Discussion.

      Reviewer #3 (Public Review):

      In this work, the authors attempt to probe the constraints on the early evolution of nitrogen fixation, the development of which presented a key metabolic transition. Given that life on Earth evolved only once (to our knowledge) which aspects were necessary and which may have taken a different course are open questions. Are there alternative forms of life, metabolic networks, or even enzymatic mechanisms that could have replaced the ones we see today, or is the space of possible biologies limited? This manuscript tests the ability of ancestrally-reconstructed molybdenum-dependent nitrogenase complexes to support diazotrophic growth in Azotobacter vinelandii, as well as in vivo and in vitro activity, which all point towards a conserved mechanism for nitrogen reduction at least since proteobacteria divergence.

      This is an ambitious project, requiring multiple techniques, systems, and approaches, and the successful combination of these is one of the major strengths of this work. Using parallel techniques is an important way to be certain that the overall results are robust, and an appropriate mix of in vivo and in vitro experiments is chosen here. The manuscript should serve as a useful model for how to combine phylogenetics and biochemistry.

      The nature of ASR means that a solid phylogeny and/or understanding of how robust the results are to uncertainty in reconstructed states is essential since all results flow from there. The overall phylogenetic methods used are appropriate and the system is an apt one for the technique, but there is not quite enough detail in the methods to be certain of the results. Given that only the single maximum a posteriori sequence is assayed at every 3 nodes, this may have compounding results in that the sensitivity to uncertainty in the reconstruction is increased. The authors appropriately make qualitative rather than quantitative inferences, but some hesitation towards the overall results still exists.

      The assumption that the Anc1A/B and Anc2 nodes correspond to ancestral states might be undermined by horizontal gene transmission, which has been reported for nif clusters. In particular, there may be different patterns of transmission for each element of the cluster. By performing reconstruction with a concatenated alignment, the phylogenetic signal is potentially maximized, but with the assumption that each gene has an identical history. Discordant transmission may cause an incorrect topology to be recovered.

      Finally, I am unsure if ASR is the most appropriate approach to answer questions of contingency and alternative pathways for protein evolution. ASR may tell what nitrogenase millions or billions of years ago looked like, but it can only say what has already existed. If there are different mechanisms or metabolic pathways enabling nitrogen fixation that simply never came to pass, via contingency and entrenchment or simple chance, ASR would say nothing about them. It is true that a conserved mechanism would point towards a constrained space for evolving nitrogen fixation, but that does not directly address it.

      Overall, despite these issues, the manuscript is compellingly written and the figures are attractive and clear, and help get the major narrative across. This work will be of interest to protein biochemists of evolutionary bent and microbial physiologists with an interest in the origins of life.

      We thank the reviewer for their evaluation of our study and appreciate their comments regarding the experimental effort involved and scientific significance. We have carefully considered their recommendations to improve our article.

      The reviewer’s critical comments concern 1) the level of detail regarding the phylogenetic methodology, 2) the impact of horizontal gene transfer on phylogenetic reconstructions, and 3) the appropriateness of ancestral sequence reconstruction for accessing alternate evolutionary scenarios in the emergence of biological nitrogen fixation.

      We have addressed the first point by including additional methodological details regarding our phylogenetic analyses in our Materials and Methods section, including alignment and model testing tools, as well as our rationale for using two ancestral sequence reconstruction methods, RAxML and PAML.

      Regarding the second point, we acknowledge that horizontal gene transfer has played a significant role in the evolution and distribution of biological nitrogen fixation, which has been established and explored in previous work by others. We have included in our Discussion an additional paragraph that addresses the potential impact of horizontal gene transfer on nitrogenase evolution. Though we do not expect horizontal transfer to contribute a significant source of uncertainty in the timeline studied, for the reasons discussed in the revised manuscript, we agree that it is an important consideration for future work and that it may impact reconstructions in other lineages within the nitrogenase phylogeny.

      Finally, in new text within the Discussion, we also acknowledge that ancestral sequence reconstruction cannot yet directly test alternate historical scenarios. We have clarified our language concerning conservation and constraints in the evolution of biological nitrogen fixation. Because ancestral sequence reconstruction is informed by modern sequences, it is limited to exploring the historical sequence space within their shared ancestry. It is therefore possible that, early in the history of life, there were multiple enzymatic strategies for fixing nitrogen, and that they were outcompeted and thus have left no trace in modern genomes. Another possibility is that these alternate strategies simply never evolved.

      In the present study, we have identified a pattern of conservation with regard to a specific mechanism for dinitrogen binding and reduction, suggesting a level of evolutionary constraint that can be further interrogated. For example, ancestral sequence reconstruction, as implemented in our nitrogenase resurrection strategy, can be used to empirically investigate the underlying sources of these constraints. We note that despite decades of research in this domain, a full understanding of how nitrogenases perform this remarkable metabolic step, both today and in the past, remains elusive (as other reviewers of the present study have also noted). Evolutionarily informed studies of nitrogenase function enabled by ASR can reveal the design principles that have shaped its direct ancestry, which can potentially serve as a basis for engineering alternative molecular strategies for nitrogen fixation. The power of the molecular paleogenetic approach here is in extending functional investigations beyond the sequence space occupied by modern nitrogenase and identifying patterns in their functional variation through their evolutionary histories.

    1. Author Response

      Reviewer #1 (Public Review):

      The study's primary motivating goal is understanding how nutrigenomic signaling works in different contexts. The authors propose that OGT, a sugar-sensing enzyme, connects sugar levels to chromatin accessibility. Specifically, the authors hypothesize that the OGT/Pcl-PRC axis in sweet taste neurons interprets the sugar levels and alters chromatin accessibility in sugar-activated neurons. However, the detailed model presented by the authors on OGT/PRC/Pcl/Rolled in regulating nutrigenomic signaling relies on pharmacological treatments and overexpression of transgenes to derive genetic interactions and pathways; these approaches provide speculative rather than convincing evidence. Secondly, evidence is absent to show that PRC occupancy remains the same in other neurons (non-sweet taste neurons) under varied sugar levels or OGT manipulations. Hence, the claim that OGT-mediated access to chromatin via PRC-Pcl is a key regulatory arm of nutrigenomic signaling needs further substantiation.

      We thank the reviewer for their thoughtful reading of the manuscript and their suggestions. We disagree with the reviewer’s assessment that our work relies solely on overexpression and pharmacological treatments and that this provides only “speculative” evidence. Indeed, both of the other reviewers praised our approach:

      Reviewer 2: “This is an elegant group of experiments revealing mechanisms for how nutrigenomic signaling triggers cellular responses to nutrients”

      Reviewer 3: “Strengths: Good genetically targeted interventions; Thorough exploration of the epistatic relationships between different players in the system … The conclusions in this manuscript are mostly well or at least reasonably supported by data.”

      All of our experiments combine genetic manipulations with dietary and/or pharmacological treatments to show that molecular, neural, and behavioral taste phenotypes arise only in specific contexts, so no single phenotype occurs due to nonspecific manipulations. Without this approach, most of these epistatic relationships would be largely inaccessible in this system. We have also used a combination of both genetic and pharmacological tools to implicate not only genes but also their function (i.e., enzymatic activity) in nutrient-specific effects. Finally, we established causality and relationship by inducing and rescuing the molecular, behavioral, and electrophysiological phenotypes. Thus, our model is based on a combination of direct and indirect data (genetic manipulations are by nature inferential) obtained from a controlled and careful set of experiments. Limitations of our approach were laid out under the “Limitation” section of the discussion, as well as alternative interpretations or possibilities. In the revised version of the manuscript, we added additional genetic experiments to further support and validate our model and expanded data analyses as suggested by the reviewer.

      Reviewer #2 (Public Review):

      Nutrigenomics has advanced in recent years, with studies identifying how the food environment influences gene expression in multiple model organisms. The molecular mechanisms mediating these food-gene interactions are poorly understood. Previous work identified the enzyme O-GlcNAc transferase (OGT) in mediating the decreased sensitivity in sweet-taste cells when exposed to a high-sugar diet. The present study, using fly gustatory neurons as a model, provides mechanistic insight into how nutrigenomic signaling encodes nutritional information into cellular changes. The authors expand previous work by showing that OGT is associated with neural chromatin at introns and transcriptional start sites, and that diet-induced changes in chromatin accessibility were amplified at loci with the presence of both OGT and PRC2.1. The work also identifies Mitogen Activated Kinase as a critical mediator in this pathway. This is an elegant group of experiments revealing mechanisms for how nutrigenomic signaling triggers cellular responses to nutrients.

      We thank the reviewer for their thoughtful reading of the manuscript and their positive and actionable suggestions. We have addressed these in the revised manuscript.

      Reviewer #3 (Public Review):

      This paper dissects the molecular mechanisms of diet induced taste plasticity in Drosophila. The authors had previously identified two proteins essential for sugar-diet derived reduction of sweet taste sensitivity - OGT and PRC2.1. Here, they showed that OGT, an enzyme implicated in metabolic signaling with chromatin binding functions, also binds a range of genomic loci in the fly sweet gustatory receptor neurons where binding in a subset of those sites is diet composition dependent. Furthermore, a minority of OGT binding sites overlapped with PRC2.1 recruiter Pcl, where collectively binding of both proteins increased under sugar-diet while chromatin accessibility decreased. The authors demonstrate that the observed taste plasticity requires catalytic activity of OGT, which impacts chromatin accessibility at shared OGT x Pcl but not diet induced occupancy. In an effort to identify transcriptional mechanisms that instantiate the plastic changes in sensory neuron functions, the authors looked for transcription factors with enriched motifs around OGT binding sites and identified Stripe (Sr) as a transcription factor that yielded sugar taste phenotypes upon gain and loss of function experiments. In follow-up overexpression experiments, they show that this results in reduced taste sensitivity and reduced taste evoked spiking in gustatory receptor neurons. Notably, the effects of Sr on taste sensitivity also depend on OGT catalytic activity as well as PRC2.1 function. Finally, they explore the function of rolled (rl) - an extracellular-signal regulated kinase (ERK) ortholog in Drosophila, suggested to function upstream of Sr - in diet induced gustatory plasticity. The authors showed that the overexpression of the constitutively active form of rl kinase results in reduced neuronal and behavioral responses to sucrose, which was dependent on OGT catalytic activity.

      In sum, these findings reveal several new players that link dietary experience to sensory neuron plasticity and open up clear avenues to explore up- and downstream mechanisms mediating this phenomenon.

      Strengths:

      • Good genetically targeted interventions

      • Thorough exploration of the epistatic relationships between different players in the system

      • Identification of several new signaling systems and proteins regulating diet derived gustatory plasticity

      Weaknesses:

      • The GO term enrichment analyses with little functional follow up has limited explanatory power

      • ERK/rl data is a bit hard to interpret since any imbalance in this system appears to reduce gustatory sensitivity.

      The conclusions in this manuscript are mostly well or at least reasonably supported by data.

      We appreciate the reviewer’s thoughtful read of the manuscript and their feedback. We were pleased to read the reviewer’s positive comments on the experimental treatment of epistatic relationships and the identification of new pathways; we have addressed the reviewer’s comments and suggestions in the revised manuscript.

      We agree with the reviewer about the limited explanatory power of the GO term analysis. We have expanded our computational analysis of the OGT/PRC2 genes in Figure 5 and selected several of these genes for functional analysis. In the revised version of the manuscript, we show that several of the genes affected by diet via this nutrigenomic pathway impact sugar taste sensation as measured by PER. We also agree with the reviewer that the ERK data are harder to interpret than those from OGT or PRC2; this effect is somewhat expected, given the reported action of this kinase in neural activity and plasticity. Importantly, the epistatic interactions between ERK/Sr and OGT/PRC2 we discovered are intriguing and may be involved in other cellular processes beyond taste.

      Below are a few recommendations for improvement:

      • The paper claims to address cell-type-specific nutrigenomic regulatory mechanisms. However, this work only explores nutrigenomic mechanisms in a single cell type (Gr5a+ sweet sensing cells) and we don't really learn whether these nutrigenomic mechanisms exist in all other cell types or just Gr5a+ cells. It would be valuable to see how specific OGT and PRC2.1 binding locations and effects on chromatin accessibility are in a different cell type - e.g. bitter sensing Gr66a. This would reveal how global in nature these findings are and or which aspects of nutrigenomic signaling are specific for sweet sensory cells.

      This study is a cell-specific investigation of nutrigenomic mechanisms in the Gr5a+ sweet taste neurons, which is what we set out to do. It was not our intention for this study to examine mechanisms across different cell types. However, we can understand the reviewer’s comment after rereading the abstract and introduction. As such, we have rewritten part of the manuscript to better introduce the rationale behind the study as the integration of metabolic signaling and cellular contexts. We hope this is now an improved framing for the study rationale.

      (As in response to the author’s recommendations): Regarding analyzing the effects of diet on other cells, no doubt this is an interesting question. However, this would also mean embarking on a completely separate project that would take, optimistically speaking, at least one year to complete and require a budget of ~ $130,000 (see breakdown). Thus, this suggestion doesn’t seem in line with the peer review and editorial philosophy of eLife. Carrying out this new project would result in an additional 6-7 figures but would not fundamentally change the conclusion of the current work; in fact, it may even take away from the targeted integration of molecular biology and neuroscience we have tried to achieve. Beyond this, we do not have such an unallocated budget, and so this new project would require us first to generate preliminary data on the bitter neurons and then to write a grant proposal to fund it; as you can appreciate, this would take longer than a year, especially since we do not even know if the bitter gustatory neurons are affected by a high-sugar diet. Beyond this, looking at the bitter neurons would do little to prove specificity. If we found no effects of this pathway on the activity of the bitter neurons, it wouldn’t establish that the changes in the sweet taste neurons are specific. In fact, the same pathway could be acting in some of the other thousands of fly circuits that were not investigated (Black swan effect). If we did find that OGT/PRC2/Sr play a role in the bitter neurons, it would also do little to disprove specificity since their targets would likely be different because the sets of genes expressed in these two sensory neurons are different. By analogy, the protein sensor mTOR is expressed and active in every cell, where it modulates some of the same targets (i.e., S6K); however, the effects of the pathway may be different due to the distinct metabolic and genetic idiosyncrasies of cells, as well as cellular compartments. This lack of specificity doesn’t mean that mTOR is not important. Finally, we would like to note that we have tested the effects of manipulating OGT levels in other neurons (dopamine and Mushroom Body Output Neurons) without effects on behavior or neural responses (May et al. 2020; Pardo-Garcia et al. 2022); based on these, OGT doesn’t seem to affect neurons indiscriminately.

      Budget = $129,000

      Salary and benefits for PD for 10 calendar months (2 months behavior experiments, 2 months training for molecular biology experiments and troubleshooting in new neurons, 4 months growing flies and conducting experiments, 2 months data analysis and visualization) = $75,000

      DAM ID: Pcl:dam and OGT:dam in CD and SD, with and without OSMI x 4 biological replicates per condition = 32 samples @ $500 per sample (UM Genomics core) = $16,000

      TRAP: Pcl mutant and OSMI in CD and SD x 4 biological replicates per condition + sequencing input = 32 samples @ $500 per sample (UM Genomics core) = $16,000

      Animals: $500 per person/10 months = $5,000

      Reagents: including sequencing kit (32 reactions =$6,000) x 2 = $12,000, and other reagents such as drugs and plastic = $17,000
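      The line items above can be checked against the stated total with a short calculation. This is only a sketch of the arithmetic: the dictionary keys are illustrative labels, and it assumes that each sequencing item is 32 samples at $500 and that the $17,000 reagent figure already includes the two $6,000 sequencing kits, which is the reading that reproduces $129,000.

```python
# Budget line items in USD, as listed in the breakdown above.
# Assumption: the $17,000 reagent total already contains the 2 x $6,000 kits.
budget = {
    "postdoc salary and benefits (10 months)": 75_000,
    "DAM ID sequencing (32 samples x $500)": 32 * 500,
    "TRAP sequencing (32 samples x $500)": 32 * 500,
    "animals": 5_000,
    "reagents (kits, drugs, plastic)": 17_000,
}

total = sum(budget.values())

# DAM ID sample count: 2 constructs x 2 diets x with/without OSMI x 4 replicates
damid_samples = 2 * 2 * 2 * 4

for item, cost in budget.items():
    print(f"{item}: ${cost:,}")
print(f"total: ${total:,}")
```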

      Note that this PD would have to be hired and retrained. The first author of the manuscript, who carried out the molecular experiments, graduated in Dec 2021 but failed to pass on the technical knowledge due to COVID restrictions at the UM: we were completely shut down until July 2020, and at 20% capacity from March 2020 to July 2021 (people also couldn’t work together to demonstrate techniques), and no new people joined the lab in 2020-2022 (most of the 2021 grad student class deferred to 2022).

      ● Behavioral data from the screen identifying Sr is missing. Which other candidates were screened and what were the phenotypes?

      We have now added the screen data in Fig. 5-Supplemental Fig. 1C. We targeted RNAi and OE transgenes against the candidate transcription factors (or control RNAi) to the Gr5a+ neurons and measured PER to 30, 20, and 5% sucrose in fasted flies on a control diet.

      ● GO term analysis for Figure 4

      We selected a dozen DEGs dependent on OGT and PRC2.1 (purple circle in Fig. 4E) and tested the effects on PER when these were overexpressed or knocked down (depending on the direction of changes in the SD). In Fig. 4F we show the effects of a handful of them on proboscis responses to sucrose.

    1. Author Response

      Reviewer #2 (Public Review):

      The ability of the model to recreate one non-trivial aspect of the crossover distribution is not sufficient to rule out other possible models, which would be necessary to consider this work a significant advance. However, if the authors are able to provide additional, non-trivial predictions relating to this and to other experimental conditions, this would dramatically elevate their ability to claim that a coarsening-based mechanism is indeed the most plausible one to explain crossover distribution. Some of these conditions could involve experimental perturbation of key parameters in the model: HEI10 levels, the number of DSBs or recombination intermediates (the 'substrate' that ends up resulting in crossovers), the length of time coarsening is allowed to proceed, or the volume of the nucleus.

      As discussed above, we have now included additional experiments and modelling investigating the patterning of late-HEI10 foci in a pch2 mutant, which exhibits partial synapsis. We have also demonstrated that the nucleoplasmic coarsening model can explain the recently published massive elevation of COs in zyp1 + HEI10 overexpressor lines (Durand et al., 2022). We hope that these additional results, explaining other non-trivial aspects of CO patterning, sufficiently elevate this work to be considered a significant advance within the field.

      Reviewer #3 (Public Review):

      The new model assumes the possibility of loading HEI10 directly from the nucleoplasm, which of course is logical considering the phenotype of the zyp1 mutant in Arabidopsis. However, in a situation where the SC is fully functional, should not we expect some level of nucleoplasmic coarsening in addition to the dominant SC-mediated coarsening? Should the original model not be corrected, and if it is not necessary (e.g., because it included this effect from the very beginning, or the effect is too weak and therefore negligible), the authors should discuss it. With reference to this observation, it would be worthwhile to compare different characteristics of both types of coarsening (e.g., time course).

      We agree with this reviewer that it seems intuitive and likely that some small amount of nucleoplasmic coarsening will persist even in the wild-type situation. As mentioned above, we have now explicitly modelled a combined version of the coarsening model that incorporates aspects of SC and nucleoplasm-mediated coarsening and compared this to simulation outputs from our original coarsening model (which did not incorporate nucleoplasmic recycling). The effects and implications of combining the two models on coarsening dynamics are now discussed.

      Recently, a preprint from the Raphael Mercier group has been released, in which the authors show a massive increase in crossover frequency in zyp1 mutants overexpressing HEI10. I think this is a great opportunity to check to what extent the parameters adopted by the authors in the nucleoplasmic coarsening model are universal and can correctly simulate such an experimental set-up. Therefore, can the authors perform such a simulation and validate it against the experimental data in Durand et al. doi.org/10.1101/2022.05.11.491364? Can CO sites identified by Durand et al. be used instead of MLH1 foci for the modeling?

      As mentioned above, we have now incorporated additional modelling demonstrating that the nucleoplasmic coarsening model can reproduce the massive increase in COs observed in zyp1 + HEI10 overexpressor lines (Durand et al., 2022). We have compared our model simulations against cytological data from this study (MLH1 counts from male Col-0 plants), as we feel this is the most appropriate data to compare our model against. The remaining CO patterning data in the Durand et al. paper is from genetic experiments, which are not optimal for comparison with model simulations for two main reasons. Firstly, the metric of interference (and coarsening) is microns of axis/SC length and not, for example, Mbp, and we feel that (due to the non-uniform compaction of chromatin along pachytene chromosomes) the coarsening model cannot currently be reliably used to explain genetic mapping data. Secondly, genetic CO data includes both class I and class II COs, whereas the coarsening model only simulates class I CO patterning. Therefore, we strongly feel that, for now, it is better to rely exclusively on cytological data to fit our model against.

    1. Author Response

      Reviewer #2 (Public Review):

      By now, the public is aware of the peculiarities underlying the omicron variant's emergence and dissemination globally. This study investigates the mutational biography underlying how mutation effects and epistasis manifest in binding to therapeutic receptors.

      The study highlights how epistasis and other mutation effect measurements manifest in phenotypes associated with antibody binding with respect to spike protein in the omicron variant. It rigorously tests a large suite of mutations in the omicron receptor binding domain, highlighting differences in how mutation effects affect binding to certain therapeutic antibodies.

      Interestingly, mutations of large effect drive escape from binding to certain antibodies, but not others (S309). The difference in the mutational signature is the most interesting finding, and in particular, the signature of how higher-order epistasis manifests in the partial escape in S309, but less so in the full escape of other antibodies.

      The results are timely, the scope enormous, and the analyses responsible.

      My only main criticisms walk the stylistic/scientific line: many of the authors have pioneered discussions and methods relating to the measurement of epistasis in proteins and other biomolecules. While I recognize that the purpose of this study is focused on the public health implications, I would have appreciated more of a dive into the peculiarity of the finding with respect to epistasis. I think the authors could achieve this by doing the following:

      a) Reconciling discussions around the mutation effects in light of contemporary discussions of global epistasis "vs" idiosyncratic epistasis, etc. Several of the authors of the manuscript have written other leading manuscripts on the topic. I would appreciate it if the authors couched the findings within other studies in this arena.

      We added a discussion related to global epistasis at the end of the “Epistasis Analysis” methods section. We tried to highlight that the cause and relevance of global epistasis phenomena are quite different at the molecular and organismal levels.

      b) While the methods used to detect epistasis in the manuscript make sense, the authors surely realize that the choice of methods used to measure epistasis is a contentious dimension of the field. I'd appreciate an appeal/explanation as to why their methods were used relative to others. For example, the Lasso correction makes sense, but there are other such methods. Citations and some explanation would be great.

      We added more context and justification in the methods section (Epistasis Analysis). We used Lasso correction not particularly to obtain a sparser representation of the epistasis coefficients (an assumption that is not always valid, particularly within proteins) but rather to reduce instabilities created by the Tobit model inference. In this inference, the model coefficients are unbounded. Thus, if one mutation causes a complete binding loss, all epistatic terms associated with this mutation are not constrained and can become very large in magnitude. A Lasso term with a small coefficient constrains these coefficients but will have a limited influence on the other coefficients.
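      The instability described above can be illustrated with a toy calculation. The following is a minimal sketch and not the inference code used in the study: it assumes a single coefficient `b`, Gaussian noise with unit variance, and four measurements that are all left-censored at a detection limit of 0 (complete binding loss). The function names, the grid, and the penalty weight of 0.05 are illustrative assumptions.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_nll(b, n_censored=4, lower=0.0, sigma=1.0):
    """Tobit negative log-likelihood when every observation is censored at `lower`.

    Each censored point contributes -log P(latent value <= lower), which keeps
    shrinking as the predicted mean b moves further below the limit.
    """
    p = norm_cdf((lower - b) / sigma)
    return -n_censored * math.log(p)

def penalized_nll(b, lam):
    """Censored likelihood plus an L1 (Lasso) penalty of weight lam."""
    return censored_nll(b) + lam * abs(b)

# Candidate coefficient values from -20.0 to 0.0 in steps of 0.1
grid = [x / 10.0 for x in range(-200, 1)]

# Without a penalty the fit runs to the edge of the search range: the data
# place no lower bound on b once binding is completely lost.
best_unpenalized = min(grid, key=lambda b: penalized_nll(b, 0.0))

# A small Lasso weight is enough to give a finite optimum inside the range.
best_lasso = min(grid, key=lambda b: penalized_nll(b, 0.05))

print("unpenalized optimum:", best_unpenalized)
print("Lasso-penalized optimum:", best_lasso)
```

      In this toy setting the penalty only matters for the coefficient that the censored data leave unconstrained; a coefficient that is well determined by uncensored measurements would be shifted only slightly by a weight this small, which is the behavior the response describes.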

      Lastly (somewhat relatedly), I found myself wanting the discussion to be bolder and more ambitious. The summary, as I read it, is on the nose and very direct (which is appropriate), but I want more: What do the findings say for greater discussions surrounding evolution in sequence space? For discussions of epistasis in proteins of a certain kind? In my view, this data set offers fodder for fundamental discussion in evolutionary biology and evolutionary medicine. I recognize, however, the constraints: such topics may not be within the scope of a single paper, and such discussions may distract from the biomedical applications, which are more relevant for human health.

      But I might say something similar about the biomedical implications: the authors do a good job outlining exactly what happened, but what does this say about patterns (the role of mutations of large effect vs. higher-order epistasis) in some traits vs others? Why might we expect certain patterns of epistasis with respect to antibody binding relative to other pathogenic virus phenotypes?

      We agree that these are interesting questions, and have added a paragraph in the discussion to explore these points.

      In summary: rigorous and important work, and I congratulate the authors.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, the authors investigate a means of cell communication through physical connections they call membrane tubules (similar or identical to the previously reported nanotubes, which they reference extensively). They show that Cas9 transfer between cells is facilitated by these structures rather than exosomes. A novel contribution is that this transfer is dependent on the pair of particular cell types and that the protein syncytin is required to establish complete syncytial connections, which they show are open-ended using electron microscopy.

      The data is convincing because of the multiple readouts for transfer and the ultrastructural verification of the connection. The results support their conclusions. The implications are obvious, since it represents an avenue of cellular communication and modifications. It would be exciting if they could show this occurring in vivo, such as in tissue. The implication of this would be that neighboring cells in a tissue could be entrained over time through transfer of material.

      We thank the reviewer for their comments and suggestion. It’s possible that the thick tubular connections found in this study also exist in vivo. Previous studies reported that TNT-like structures were found in mouse or human primary tumor cells (PMID: 34494703; PMID: 34795441). Our transfer assays could be adopted to evaluate such transfer in primary cultures and in vivo. We anticipate this for future work.

      Reviewer #2 (Public Review):

      There is a lot of interest in how cells transfer materials (proteins, RNA, organelles) by extracellular vesicles (EV) and tunneling nanotubes (TNTs). Here, Zhang and Schekman developed quantitative assays, based on two different reporters, to measure EV and direct contact-dependent mediated transfer. The first assay is based on transfer of Cas9, which then edits a luciferase gene, whose enzymatic activity is then measured. The second assay is based on a split-GFP system. The experiments on EV trafficking convincingly show that purified exosomes, or any other diffusible agent, are unable to transfer functional Cas9 (either EV-tethered or untethered) and induce significant luciferase activity in acceptor cells. The authors suggest a plausible model by which Cas9 (with the gRNA?) gets "stuck" in such vesicles and is thus unable to enter the nucleus to edit the gene.

      To test alternative pathways of transfer, e.g. by direct cell-cell contact, the authors co-cultured donor and acceptor cells and detected significant luciferase activity. The split GFP assay also showed successful transfer. The authors further characterize this process by biochemical, genetic and imaging approaches. They conclude that a small percentage of cells in the population produce open-ended membrane tubules (which are wider and distinct from TNTs) that can transfer material between cells. This process depends on actin polymerization but not endocytosis or trogocytosis. The process also seems to depend on endogenously expressed Syncytin proteins - fusogens which could be responsible for the membrane fusion leading to the open ends of the tubules.

      The paper provides additional solid evidence to what is already known about the inefficiency of EV-mediated protein transport. Importantly, it provides an interesting new mechanism for contact-dependent transport of cellular material and assigns valuable new information about the possible function of Syncytins. However, the evidence that the proteins and vesicles transfer through the tubules is incomplete and a few more experiments are required. In addition, certain inconsistencies within the paper and with previous literature need to be resolved. Finally, some parts of the text, methods and the figures require re-writing or additional information for clarity.

      Major comments

      1) In Figure 1F, the authors compare the function of exosome-transported SBP-Cas9-GFP vs. transient transfection of SBP-Cas9-GFP. It is not clear if the cells in the transiently transfected culture also express the myc-str-CD63 and were treated with biotin. It is important to determine if CD63-tethering itself affects Cas9 function.

      We thank the reviewer for these comments and suggestions. We now show in Figure 1-figure supplement 1D that CD63-tethering itself does not affect Cas9 function.

      2) The authors do not rule out that TNTs are a mode of transfer in any of their experiments. Their actin polymerization inhibition experiments are also in-line with a TNT role in transfer. This possibility is not discussed in the discussion section.

      Yes, the results in this study do not rule out a role for TNTs in the transfer. At present, we are not aware of conditions that would functionally distinguish transfer mediated by TNTs and thick tubules. We have now included this in the Discussion section.

      3) Issues with the Split GFP assay:

      a) On page 4, line 176, the authors claim that "A mixture of cells before co-culture should not exhibit a GFP signal". However, this result is not presented.

      The results of the mixture experiment are included in Figure 2-figure supplement 1D, E.

      b) The authors show in Figure 2C and F that in MBA/HEK co-culture or only HEK293T co-culture, there are dual-labeled, CFP-mCherry, cells. First - what is the % of this sub-population? Second, the authors dismiss this population as cell adhesion (Page 5, line 192) - but in the methods section they claim they gated for single particles (page 17, line 642), supposedly excluding such events. There is a simple way to resolve this - sort these dual labeled cells and visualize under the microscope. Finally - why do the authors think that the GFP halves can transfer but not the mature CFP or mCherry?

      The plots in Figure 2C and F are displayed in all-cell mode, not in singlet mode. The percentage of dual-labeled CFP-mCherry cells among singlets was 0-0.2%. Thus, most of the signal came from doublets, that is, from cell adhesion. We did not claim that mature CFP or mCherry cannot be transferred. We suggested that the GFP signal from split-GFP recombination may be a more accurate reflection of cytoplasmic transfer between cells. In contrast, mature CFP or mCherry may simply attach to the cell surface without entering the other cell.

      c) In the Cas9 experiments - the authors detect an increase in Nluc activity similar in order of magnitude to that of transient transfection with the Cas9 plasmid - suggesting most acceptor cells now express Nluc. However, only 6% of the cells are GFP positive in the split-GFP assay. Can the authors explain why the rate is so low in the split-GFP assay? One possibility (related to item #2 above) is that the split-GFP is transferred by TNTs.

      The Cas9-based Nluc activity assay is more sensitive because it measures an enzyme with a very high turnover number. The split-GFP assay requires transfer of GFP fragments to produce intact GFP molecules, and this signal is not amplified. We think this explains the dramatic increase in signal once Cas9 is transferred. Our cell sorting results suggest that at least 6% of the receptor cells received transferred protein in the co-cultures. Of course, nothing in either analysis rules out a role for TNTs in this transfer.

      4) The membrane tubules, the membrane fusion and the transfer process are not well characterized:

      a) The suggested tubules are distinct from TNTs by diameter and (I presume, based on the images) that they are still attached to the surface - whereas TNTs are detached. However, how are these structures different from filopodia except that they (rarely) fuse?

      We used TIRF microscopy and found that the thick tubules are not attached to the surface (not shown). Filopodia are much closer in diameter to TNTs (0.1-0.4 micron). The thick tubules we observe are much thicker (2-4 micron in diameter).

      b) Figure 5E shows that the acceptor cells send out a tubule of its own to meet and fuse. Is this the case in all 8 open-ended tubules that were imaged? Is this structure absent in the closed-ended tubules (e.g. as seen in Figures 6 & 8)?

      Around half of the open-ended tubules appeared to emanate from acceptor cells. The same was true of closed-ended tubules; for example, in Figure 6E a recipient HEK293T cell projected a short tubule.

      c) The authors suggest a model for transport of the proteins tethered to vesicles (via CD63 tethering). However, the data is incomplete.

      i) They show only a single example of this type of transport, without quantification. How frequent is this event?

      Transport of proteins tethered to vesicles (via CD63 tethering) was observed in all 8 open-ended tubules that we detected in this study.

      ii) Furthermore, the labeling does not conclusively show that these are vesicles and not protein aggregates. Labeling of the vesicle - by dye or protein marker will be useful to determine if these are indeed vesicles, and which type.

      In Figure 4B, the moving punctum in a tubular connection appears to contain SBP-Cas9-GFP, Streptavidin-CD63-mCherry, and the cell surface WGA conjugate that may have been internalized into a donor cell endosome, which indicates that the moving punctum is vesicular. Nonetheless, in general we cannot distinguish the forms of Cas9 that are transferred and become localized to the nucleus of target cells, and we make no claim beyond acknowledging the possibility that Cas9 may be transferred as an aggregate.

      iii) The data from Figure 2 suggest (if I understand correctly) transfer of the CD63-tethered half-GFP, further strengthening the idea of vesicular transfer. However, the authors also show efficient transfer of untethered Cas9 protein (Figure 2A and other figures). Does this mean that free protein can diffuse through these tubules? The Cas9 has an NLS so the un-tethered versions should be concentrated in the nucleus of donor cells. How, then, do they transfer? The authors do not provide visual evidence for this and I think it is important they would.

      Based on the results of the Cas9-based luciferase assay (His- or SBP-tagged Cas9) (Figure 2A) and the split-GFP assay (free GFP1-10) (Figure 2G), we suggest that free protein could be transferred between cells. Our current imaging approach is not designed to quantify protein diffusion. However, we are able to detect from images that Cas9-GFP does not colocalize exclusively with CD63 or concentrate in the nucleus, but also appears in the cytoplasm. These data indicate that both vesicle association and free diffusion may mediate the transfer through tubules. We thank the referee for emphasizing this issue, which we will address in future work aimed at distinguishing the types of transfer through tubules.

      iv) In Figures 6 & 8, where transfer is diminished, there are still red granules in acceptors cells (representing CD63-mcherry). Does this mean that vesicles do transfer, just not those with Cas9-GFP? Is this background of the imaging? The latter case would suggest that the red granule moving from donor to acceptor cells in figure 4 could also be "background". This matter needs to be resolved.

      There are a few red puncta in the acceptor cell in Figure 6B. Since the acceptor cell is close to, and overlaps with, other donor cells containing CD63-mCherry, the red signal may, as the reviewer suggests, come from donor cells rather than from transfer through tubular connections. However, in donor-acceptor cultures of HEK293T cells, where transfer is not observed, little CD63-mCherry signal was seen in acceptor cells (for example, Figure 6A), even during several hours of observation (Figure 6-figure supplement video). A minor red signal could arise from exosomes secreted by donor cells that are internalized by acceptor cells. Images of single-culture receptor cells were added in Figure 4-figure supplement 1.

      For Figure 8, we used MDA-MB-231 syncytin-2 knock-down cells containing Fluc:Nluc:mCherry as the receptor cell, thus in these experiments the red signal most likely represents mCherry expressed in the acceptor cells.

      In Figure 4, we observed a moving punctum in a tubular connection which contained co-localized green, red, and purple signals, corresponding to SBP-Cas9-GFP, streptavidin-CD63-mCherry, and the WGA conjugate, respectively. The video of punctum transport (Figure 4-figure supplement video) suggests that the red signal is not “background”.

      5) Why do HEK293T cells not transfer to HEK293T cells?

      a) A major inexplicable result is that HEK293T express high levels of both Syncytin proteins (Figure 7 - supp figure 1A) yet ectopic expression of mouse Syncytin increases transfer (Figure 7E). Why would that be? In addition, Fig 3A shows high transfer rates to A549 cells - which express the least amount of Syncytin. The authors suggest in the discussion that Syncytin in HEK293T might not be functional without real evidence.

      We cannot yet explain why the basal level of syncytin expressed in HEK293 cells is insufficient to promote open-ended tubular connections between these cells. It could be that the proteins are not well represented in a processed form at the cell surface. Nonetheless, ectopic expression of mouse syncytin-A in HEK293T produced some increased transfer but less than when syncytin-A is ectopically expressed in MDA-MB-231 cells (up to 4-fold vs. 30-fold change of Nluc/Fluc signal) (Figure 7E). Furthermore, we have added new results which show that apparent furin-processed forms of syncytin-A, -1 and -2 can be detected by cell surface biotinylation in transfected MDA-MB-231 cells (Figure 8-figure supplement 1D). All we demonstrate is that syncytin in the acceptor cell is required for fusion and we make no claim that it is the only protein or lipid at the cell surface in the acceptor cell required for fusion. Clearly, more work is essential to establish the complexity of this fusion reaction.

      Syncytin-1 is highly expressed in A549 cells; thus, it is possible that syncytin-1 plays a crucial role in the process in these cells.

      b) In addition - previous publications (e.g. PMID: 35596004; 31735710) show that over expression of syncytin-1 or -2 in HEK293T cells causes massive cell-cell fusion. The authors do not provide images of the cells, to rule out cell-cell fusion in this particular case.

      Overexpression of syncytin-1 or -2 in cells indeed causes massive cell-cell fusion, while overexpression of syncytin-A induced much less cell fusion than syncytin-1 or -2. We have now added new images in Figure 8-figure supplement 1A-C to document these observations. It may be that overexpressed human syncytins are better represented in a furin-processed form in both cell types. In contrast, we did not observe donor-acceptor cell fusion at basal levels of syncytin expression in HEK293T and MDA-MB-231 cells. For example, the Figure 4-figure supplement video shows tubular structures forming and breaking during the course of visualization, with a tubule fusion event but no cell fusion to form heterokaryons.

      Reviewer #3 (Public Review):

      In this manuscript, Zhang and Schekman investigated the mechanisms underlying intercellular cargo transfer. It has been proposed that cargo transfer between cells could be mediated by exosomes, tunneling nanotubes or thicker tubules. To determine which process is efficient in delivering cargos, the authors developed two quantitative approaches to study cargo transfer between cells. Their reporter assays showed clearly that the transfer of Cas9/gRNA is mediated by cell-cell contact, but not by exosome internalization and fusion. They showed that actin polymerization is required for the intercellular transfer of Cas9/gRNA, the latter of which is observed in the projected membrane tubule connections. The authors visualized the fine structure of the tubular connections by electron microscopy and observed organelles and vesicles in the open-ended tubular structure. The formation of the open-ended tubule connections depends on a plasma membrane fusion process. Moreover, they found that the endogenous trophoblast fusogens, syncytins, are required for the formation of open-ended tubular connections, and that syncytin depletion significantly reduced cargo Cas9 protein transfer.

      Overall, this is a very nice study providing much clarity on the modes of intercellular cargo transfer. Using two quantitative approaches, the authors demonstrated convincingly that exosomes do not mediate efficient transfer via endocytosis, but that the open-ended membrane tubular connections are required for efficient cargo transfer. Furthermore, the authors pinpointed syncytins as the plasma membrane fusogenic proteins involved in this process. Experiments were well designed and conducted, and the conclusions are mostly supported by the data. My specific comments are as follows.

      1) The authors showed that knocking down actin (which isoform?) in both donor and acceptor cells blocked transfer, and more so in the acceptor cells perhaps due to the greater knockdown efficiency in these cells. However, Arp2/3 complex knockdown in donor cells, but not recipient cell, reduced Cas9 transfer. It would be good to clarify whether the latter result suggests that the recipient cells use other actin nucleators rather than Arp2/3 to promote actin polymerization in the cargo transfer process. Are formins involved in the formation of these tubular connections?

      We thank the reviewer for these comments and suggestions. Beta-actin was knocked down in this study. We tried a formin inhibitor, SMIFH2, which resulted in a decrease in Cas9 transfer between cells (Figure 3F).

      2) The authors provided convincing evidence to show that the tubular connections are involved in cargo transfer. Intriguingly, in Figure 4-figure supplement video (upper right), protein transfer appeared to occur along a broad cell-cell contact region instead of a single tubular connection. How often does the former scenario occur? Is it possible that transfer can happen as long as cells are contacting each other and making protrusions that can fuse with the target cell?

      In the Figure 4-figure supplement video (upper right), it may be that several membrane tubes from several different donor cells contact the recipient cell at sites close to one another, giving the appearance of a broad cell-cell contact. This was a rare observation. In our quantification, only 8 of 120 cell-cell contact junctions were open-ended. Once a connection is open-ended, that is, once the plasma membranes have fused, cargo transfer is observed.

      3) The requirement of MFSD2A in both donor (HEK293T) and recipient (MDA-MB-231) cells is consistent with a role for syncytin-1 or 2 in both types of cells. Since HEK293T cells contain both syncytins and MFSD2A but cargo transfer does not occur among these cells, does this suggest that syncytins and/or MFSD2A are only trafficked to the HEK293T cell membrane in the presence of MDA-MB-231 cells?

      A proper answer to this question requires the visualization of syncytins and MFSD2A. The commercial syncytin antibodies were inadequate for immunofluorescence. In advance of the more detailed effort required to tag the genes for endogenous syncytin-1 and -2, we performed live cell imaging and surface biotin labeling of cells transiently transfected to express fluorescently tagged forms of syncytin-1, -2 and -A. We now show that syncytin-A, -1, and -2 partially localize to the plasma membrane or cell surface of MDA-MB-231 cells and to points of cell-cell contact. In fact, overexpression of codon-optimized human syncytin-1 and -2 induced dramatic HEK293T cell-cell fusion. However, at basal levels of syncytin expression, HEK293T cells could not form open-ended tubular connections, which may be because the syncytins expressed at basal levels are not well represented in a processed form at the cell surface or because their activity is limited by unknown factors.

      As an independent test of cell surface localization, we used surface biotinylation to show that a fraction of the syncytins can be labeled externally (Figure 8-figure supplement 1D). This fraction shows evidence of proteolytic processing consistent with furin cleavage, whereas a blot of lysates suggests that the overwhelming majority of transfected syncytins remain in the unprocessed precursor form, consistent with the punctate and reticular fluorescence images (Figure 8-figure supplement 1A-C).

      We used IF and GFP-tagged MFSD2A and found that this protein partially localized to the plasma membrane of HEK293T cells (Figure 9E, F). Given that cargo could be transferred among MDA-MB-231 cells (Figure 2G), syncytin and its receptor appear to function in transfer among these cells.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The authors show that there are several classes of Snf1 targets (Fig. 3e), most notably some that are phosphorylated immediately after Snf1 activation by glucose (<5 min) and others that are only phosphorylated after 15 min. In a simple view, all direct Snf1 targets should be phosphorylated immediately after Snf1 activation. Is that the case? What is the overlap between the direct targets found using the OBIKA assay and the slow and fast responding in vivo targets? What about the phosphorylation motif, does it differ between the groups? These points are not discussed in the text except to point out that the direct Snf1 target Msn4 is among the slowly phosphorylated group.

      This is a very good point and we have performed the suggested analysis, which resulted in an interesting finding that we describe now in the text as follows:

      “Notably, of the 145 confirmed target sites, 81 (i.e. 72%) were significantly regulated after both 5 min and 15 min. Of the remaining 64 sites, 32 responded only after 5 min, while the other 32 responded only after 15 min. Some of the former residues are located within Snf1 itself, the β-subunit of the Snf1 complex (i.e. Sip1), the Snf1-targeting kinase Sak1, or Mig1, while some of the latter are located within the known Snf1-interacting proteins such as Gln3, Msn4, and Reg1. These observations indicate that Snf1-dependent phosphorylation initiates, as expected, within the Snf1 complex and then progresses to other effectors. Interestingly, based on the residues that responded exclusively after 5 min, we retrieved a perfect Snf1 consensus motif (i.e. an arginine residue in the -3 position and a leucine residue in the +4 position; Supplementary figure 2A). The one retrieved for the residues that respond exclusively at 15 min, in contrast, significantly deviated from this consensus motif (Supplementary figure 2B). The slight temporal deferral of Snf1 target phosphorylation may therefore perhaps in part be explained by reduced substrate affinity due to consensus motif divergence.”

      2) The data showing that Snf1-dependent phosphorylation of Pib2 plays a key role in triggering inhibition of TORC1 is convincing but is entirely dependent on a rescue of the TORC1 inhibition defect seen in cells where Snf1 is inhibited. That is, TORC1 is normally inactivated during glucose starvation; this does not occur when Snf1 is inhibited by 2NM-PP1 but does occur when Snf1 is inhibited in a strain carrying a phosphomimetic version of Pib2 (Pib2SESE). This indicates that Pib2 phosphorylation is sufficient to replace Snf1 signaling and inhibit TORC1 during glucose starvation. However, in a simple model, a phosphodead version of Pib2 (SASA) should have the opposite effect. That is, TORC1 should remain active during glucose starvation in the Pib2SASA strain, but that is not the case (Fig. 4g). This point is not discussed in the paper; why do the authors think that TORC1 is inhibited normally in the SASA mutant?

      We fully agree with this statement and have highlighted and discussed this issue now in the last paragraph of the results section (where we think this fits best) as follows:

      “In contrast, the separated and combined expression of Sch9S288A and Pib2S268A,S309A showed, as predicted, no significant effect in the same experiment. Unexpectedly, however, the latter combination did not result in transient reactivation of TORC1, like we observed in glucose-starved, Snf1-compromised cells. This may be explained if TORC1 reactivation relied on specific biophysical properties of the non-phosphorylated serines within Sch9 and Pib2 that may not be mimicked by the respective serine-to-alanine substitutions. Alternatively, Snf1 may employ additional parallel mechanisms (perhaps through phosphorylation of Tco89, Kog1, and/or other factors; see above) to prevent TORC1 reactivation even when Pib2 and Sch9 cannot be appropriately phosphorylated. While such models warrant future studies, our current data still suggest that Snf1-mediated phosphorylation of Pib2 and Sch9 may be both additive and together sufficient to appropriately maintain TORC1 inactive in glucose-starved cells.”

      Reviewer #2 (Public Review):

      1) Because PIB2 is a major focus of the manuscript, I was surprised that it was not discussed in the introduction. I think it would be appropriate to discuss prior evidence linking this protein to TORC1.

      We thank the reviewer for this suggestion. Pib2 and its role in TORC1 control is now described in the introduction.

      2) The authors introduce mutations into PIB2 at two sites determined to be phosphorylated by SNF1, at S268 and S309. Somewhat confusing results are obtained, in that the PIB2 null and phosphomimic mutants (S268E and S309E) confer a similar TORC1 phenotype, compared to the S268A S309A mutant. These results require further explanation than simply that "TORC1 inactivation defect in SNF1-compromised cells is due to a defect in PIB1 phosphorylation". This is particularly intriguing given that the opposite results are observed with the SCH9 mutants, where the null and alanine mutants confer a similar phenotype compared to the S to E mutants.

      The finding that both loss of Pib2 and expression of the phosphomimetic allele yield the same phenotype is indeed counterintuitive. Hence, we fully agree with the criticism put forward here. We believe that the underlying reason for our observation is the unique property of Pib2 in having both a C-terminal TORC1-activating domain (CAD) and an N-terminal TORC1-inhibitory domain (NID). We have addressed this point briefly in the discussion ("Our current data favor a model according to which Snf1-mediated phosphorylation of the Kog1-binding domain in Pib2 weakens its affinity to Kog1 and thereby reduces the TORC1-activating influence of Pib2 that is mediated by the C-terminal TORC1-activating (CAD) domain via a mechanism that is still largely elusive"), but now also address this issue in the results section as suggested.

      3) The authors conclude, based on the co-IP data in Figure 4H, that interactions between KOG1 and PIB2 are direct. However, it remains possible that interactions between these proteins are mediated by other components of TORC1 or within cells. This should be addressed.

      Please note that the Kog1-Pib2 interaction has previously been demonstrated by different methods. Pib2 has been shown to interact with Kog1 (or TORC1) not only in co-IP studies in vivo (PMID: 30485160, PMID: 29698392), but also in co-IP studies in vitro (PMID: 29698392, PMID: 28483912, PMID: 34535752). In addition, the Kog1-Pib2 interaction has been dissected (down to defined domains) by classical two-hybrid analyses (PMID: 28481201). All of these studies are now cited in the introduction where Pib2 is discussed.

      4) The authors demonstrate convincingly that the PIB2 and SCH9 SNF1-specific phospho-site mutants have a detectable effect on TORC1, primarily by examining TORC1-dependent phosphorylation of SCH9. What is unclear is whether phosphorylation at these sites has a significant physiological impact on cells. It appears that the rapamycin hyper-sensitivity displayed in Figure 6E is the only data presented to address this question. It would be appropriate for the authors to comment further on the significance of SNF1-dependent phosphorylation of these two substrates.

      To further address the physiological role of the combined Snf1-dependent phosphorylation of Sch9 and Pib2, we assessed the growth rate of the strain that expresses the Sch9SE and Pib2SESE alleles together. We found that the snf1as pib2SESE sch9SE strain exhibits a significantly longer doubling time than the snf1as strain on both low-nitrogen-containing media and standard synthetic complete media. This is now included in the text (results section).

      Reviewer #3 (Public Review):

      1) Conceptually, the manuscript shows that Snf1 activity is important for the acute inhibition of TORC1 during glucose starvation. However, this is mainly restricted to 10 and 15 minutes of glucose starvation. After 20 minutes, TORC1 is inhibited by some unknown mechanisms independent of Snf1 (Hughes Hallet et al). This raises concern regarding the physiological relevance of Snf1-mediated TORC1 inhibition during acute glucose stress. The authors show that this regulation is important for the survival of cells under TORC1 inhibition. How do the authors envision that the acute role of Snf1 plays an important long-term physiological relevance during rapamycin treatment? Providing more support for the physiological relevance of this regulation will make this study of interest to a broad readership.

      Please see our response to point 4 of reviewer #2.

      2) Another major concern of the manuscript is the inconsistencies between the various representative immunoblots and their quantifications. The effect of AMPK activity on TORC1 signaling under glucose starvation seems very subtle. A few specific concerns are mentioned below:

      a) In figure 1A, the increase in TORC1 activity upon inhibition of analogue sensitive Snf1as by 2NM-PP1 is very marginal. Although quantification shows a significant increase, a representative western blot figure should be shown.

      We have replaced the original immunoblots with more representative ones in Figure 1A.

      b) Does deleting Snf1 itself have any effect on TORC1 activity? Lane 4 of figure 1A shows reduced activity compared to lane 1.

      TORC1 activity is generally assessed as the ratio between phosphorylated Sch9 and total Sch9 (see also below under (e)). Accordingly, based on the quantification of 6 blots (we added two more experiments to address this point; Figure 1B), loss of Snf1 has no significant impact on TORC1 activity in exponentially growing cells, as we expected.

      c) To show the effect of Snf1 on the repression of TORC1, the time-course experiments are run on two separate gels in figure 1C. Hence, it is difficult to compare the effect of Snf1 on unscheduled reactivation of TORC1 under glucose starvation.

      Please note that the data of the two blots were cross-normalized to the sample from exponentially growing cells (labeled “Exp”; i.e. the same sample was loaded on the two blots) in order to compare and quantify the effects of Snf1.
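The cross-normalization described in this response can be illustrated with a minimal sketch. It assumes that TORC1 activity is taken as the ratio of phosphorylated Sch9 to total Sch9 and that each blot is rescaled to its own "Exp" lane; the function names and all band intensities below are hypothetical and are not taken from the manuscript.

```python
def torc1_activity(phospho, total):
    """Return the phospho-Sch9 / total Sch9 ratio for a single lane."""
    if total <= 0:
        raise ValueError("total Sch9 signal must be positive")
    return phospho / total


def cross_normalize(blot, reference="Exp"):
    """Express every lane of one blot relative to its shared reference lane."""
    ref = torc1_activity(*blot[reference])
    return {lane: torc1_activity(p, t) / ref for lane, (p, t) in blot.items()}


# Hypothetical band intensities (arbitrary units): lane -> (phospho, total).
# The same "Exp" sample is assumed to be loaded on both gels.
blot_dmso = {"Exp": (1000.0, 2000.0), "10 min": (100.0, 2000.0)}
blot_drug = {"Exp": (600.0, 1500.0), "10 min": (240.0, 1500.0)}

norm_dmso = cross_normalize(blot_dmso)
norm_drug = cross_normalize(blot_drug)

for lane in blot_dmso:
    print(lane, round(norm_dmso[lane], 3), round(norm_drug[lane], 3))
```

Because both blots are divided by the activity of the same reference sample, differences in exposure or transfer efficiency between gels cancel out, and lanes from the two gels can be compared on a common scale.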

      d) In figure 1E, the effect of Reg1 deletion on TORC1 activity seems minor as both phospho- and total levels of Sch9 are reduced.

      As correctly pointed out by this reviewer, we consistently found the total Sch9 levels to be lower in reg1Δ cells when compared to wild-type cells. To assess TORC1 activity, we therefore always determine the ratio between phosphorylated Sch9 and total Sch9, and the respective ratio is significantly different in reg1Δ cells when compared to wild-type cells. We speculate that the reduced Sch9 levels in this mutant are caused by the reduced growth rate (PMID: 22140226) and hence lower protein synthesis rate (to which translation of SCH9 mRNA may be specifically sensitive).

      Since further mechanistic insights are based on these initial findings of figure 1, solidifying these observations is very important.

      3) In figure S1, the analogue sensitive Snf1as shows significant reduction in its activity (reduced S79 phosphorylation of ACC1-GFP). This raises the concern of whether this genetic background is an ideal system to resolve the mechanism of TORC1 suppression.

      The Snf1as allele is indeed hypomorphic, which we acknowledge appropriately in the text. We would like to point out however, that we took great care in each experiment to include the DMSO control that allowed us to unequivocally assign any observed effects to the specific drug-mediated inhibition of Snf1as. Importantly, we think that the hypomorphic nature of the Snf1as allele (which allows normal growth on non-fermentable carbon sources) represents a minor trade-off when compared to the advantages that this allele provides over the use of a snf1∆ strain, which exhibits a fundamentally reprogrammed transcriptome/proteome (PMID: 17981722). Accordingly, this allele allows the assessment of Snf1 inhibition on very short time scales while minimizing confounding large-scale proteome rearrangements that may indirectly affect the studies. Moreover, use of the Snf1as allele also allowed us to compare our results more directly with other phosphoproteome studies that used the same allele (PMID: 25005228, PMID: 28265048). Finally, please also note that our main conclusions (on Snf1-mediated control of TORC1) are corroborated by additional genetic data such as the ones in Figure 1A/E where we use snf1∆ and reg1∆ cells.

      4) In figure 2, during glucose restimulation, there is increased retention of Snf1as-pThr210 in the presence of 2NM-PP1. This suggests that the upstream glucose sensing pathway as well as Snf1 might be more active than in DMSO-treated cells. This also raises concerns regarding the suitability of the genetic background for the study. Can authors comment on why this phosphorylation persists? Does the phosphoproteomic analysis give any hint for this phenotype?

      This is a very good point. In fact, we forgot to mention in the text that the observed effect of the 2NM-PP1 treatment on Snf1-Thr210 phosphorylation has already been studied and mechanistically explained earlier (PMID: 23184934). Accordingly, the entry of the drug into the broader catalytic cleft of the Snf1as mutant stabilizes the catalytic domain in a conformation that prevents dephosphorylation of pThr210 by the dedicated Glc7-Reg1 phosphatase heterodimer. We observed this each time we compared 2NM-PP1- and DMSO-treated cells and probed for Snf1-Thr210 phosphorylation. This is, in fact, an independent control for proper 2NM-PP1 functioning. We have now added a sentence (including the reference) that addresses this issue in the text.

      5) In figure 4H, where authors claim reduced binding of Kog1 to Pib2SESE, levels of Kog1 in input are also reduced. Can authors provide further support using colocalization studies? Also, does Pib2SESE has any defect in forming Kog1 bodies?

      We took great care to load equal amounts of IPed Pib2-myc variants and then normalized the co-IPed Kog1-HA on the IPed Pib2-myc variant levels. The Kog1-HA input levels vary a bit between the 4 experiments, but they are on average not significantly lower in Pib2SESE-myc-expressing cells when compared to WT cells. In addition, in our Co-IP experiments, the beads are saturated with Pib2-myc variants and Kog1-HA levels are generally not limiting. We therefore deem it fair to say that the Pib2SESE has a reduced affinity for Kog1. Based on our experience with other co-localization studies of membrane-bound proteins and protein complexes (e.g. TORC1 versus EGOC), we find it extremely difficult to quantify local interactions by fluorescence microscopy (unless they are close to all or nothing). In this case, where we have a partial defect in the interaction between Kog1 and Pib2SESE, we anticipate that such analyses will not allow us to draw additional conclusions.

      Regarding the issue of Kog1/TORC1-body formation: all of our mutations in PIB2 and SCH9 were introduced (by CRISPR-Cas9) in the genome of our snf1as strain, which was used throughout this study. To analyze Kog1/TORC1-bodies, we therefore first tried to C-terminally tag KOG1 with GFP in the genome of our strain background (as was done in the original description of Kog1 bodies; PMID: 26439012). However, because all our attempts to create KOG1-GFP in our strain failed, we assumed that this construct may be lethal in our strain background. This is not completely unexpected, as it is known that the Kog1-GFP allele is hypomorphic and temperature sensitive (PMID: 19144819). In an alternative approach, we therefore set out to study TORC1 body formation in our strains by using a GFP-TOR1 allele that can be integrated into the genome and that expresses functional TORC1 (PMID: 25046117). As we have described earlier, the respective GFP-Tor1 construct localized on vacuolar membranes and on foci that we previously have shown to correspond to signaling endosomes (PMID: 30732525, 30527664). Unexpectedly, however, when we starved the respective cells for glucose, the number of GFP-Tor1 foci increased only marginally (20%) in our strain background over a period of up to 1 hour. Given these various unexpected issues, we prefer not to include any of these preliminary data in the current version of our manuscript, but rather to follow up on these observations in a separate study. We deem this particularly justified as the current literature on TORC1-body and TOROID formation also appears controversial and may need further clarification. For instance, while TORC1-body formation has been suggested to represent a Snf1-dependent process that is dispensable for TORC1 inhibition (PMID: 30485160), TOROID formation has been suggested to represent a Snf1-independent process that is mechanistically linked to TORC1 inhibition (PMID: 28976958).

      6) In figure 5F, where the authors claim the Sch9SE mutant has lower TORC1 activity, the difference is very minor. Furthermore, corresponding lanes also show reduced levels of Snf1as expression. Hence, improved blots are required here. Also, an in vitro kinase assay with full-length Sch9 KD with and without the Ser288 mutation could solidify the observation that phosphorylation of Ser288 indeed affects TORC1-mediated phosphorylation.

      We have replaced the blots in Figure 5F with an alternative set that more clearly highlights the (statistically significant) differences, while also exhibiting more equal Snf1as levels. Regarding the in vitro kinase assays: we have repeatedly tried to perform TORC1 kinase assays on full-length Sch9KD without success. We currently believe that proper TORC1-mediated phosphorylation of Sch9 may have to occur on membranes to which both TORC1 and Sch9 are tethered through phospholipid interactions (PMID: 29237820). We are trying to set up such a system on liposomes, but we assume that this will be a major effort that cannot be resolved in due time.

      7) In figure 6E, the Sch9SE mutant shows no effect in the presence of rapamycin. Thus, in vivo, phosphorylation at Ser288 may not be perturbing the phosphorylation of Sch9 by TORC1.

      When cells are grown on glucose, where TORC1 is highly active (as in Fig. 6E or 6A/B in Exp), expression of Sch9SE indeed has no significant effect. However, in glucose-starved cells, where TORC1 activity is low, expression of the Sch9S288E allele clearly and significantly contributes to inhibition of Sch9-Thr737 phosphorylation by TORC1 (Figure 6A/B and Figure 5F/G).

      8) According to the author's proposed mechanism, TORC1 activity in Pib2SASA or Pib2SASA/Sch9SA backgrounds should be higher during glucose starvation compared to the control strains. However, glucose starvation shows a similar level of reduction in TORC1 activity in these backgrounds. This raises concern regarding the proposed mechanism. The authors mainly base their conclusions on Ser to Glutamate mutants. The authors should be cautious that Ser to Glutamate changes may also affect the protein structure which can confer similar phenotypes. How do the authors justify this discrepancy?

      Please see our response to point 2 of reviewer #1.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sequence some of the oldest maize macroremains found to date, from lowland Peru. They find evidence that these specimens were already domesticated forms. They also find a lack of introgression from wild maize populations. Finally, they find evidence the Par_N16 sample already carried alleles for lowland adaptation.

      Overall I think this is an interesting topic, the study is well-written and executed for the most part. I have a variety of comments, most important of which revolve around methodological clarity. I will give those comments first.

      1) The authors should say in the Results section how "alleles previously reported to be adaptive to highlands and lowlands, specifically in Mesoamerica or South America" were identified in Takuno et al. 2015. What method was used? I see this partly comes in the Discussion eventually, but it would help to have it in the Results with more detail. The answer to this question would help a skeptical reader decide the appropriateness of the resource, given that many selection scans have been performed on maize genomes, the choice would ideally not be arbitrary.

      This was explained in more detail in the Material and Methods section, to keep the Results and Discussion sections more concise. However, we agree that adding a brief explanation in the Results section would be useful and we have modified the revised version accordingly. Now the relevant part of the section Specific adaptation to lowlands in Mesoamerica and South America reads as follows: “To assess this, we identified in Par_N16 all covered SNPs with alleles previously reported to be adaptive to highlands and lowlands, specifically in Mesoamerica or South America by Takuno and coworkers (Takuno et al., 2015). These authors used genome-wide SNP data from 94 Mesoamerican and South American landraces and identified SNPs with significant FST values to infer which allele was likely adaptive. For example, those SNPs showing significant FST only in Mesoamerica, were characterized as adaptive for lowlands if they were at high frequency in the lowland population and at low frequency in the highland population, and vice versa. The same was applied for South America (Takuno et al., 2015). They identified 668 Mesoamerican and 390 South American previously reported adaptive SNPs, from which 32 and 20 were covered in Par_N16, respectively.”
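
      The classification rule described above can be illustrated with a minimal sketch. It assumes the FST significance test has already been run for each SNP, and it simplifies "high frequency" versus "low frequency" to a direct comparison of the two population frequencies; the function name and this simplification are hypothetical and are not taken from Takuno et al. (2015).

```python
from typing import Optional


def classify_adaptive_allele(
    fst_significant: bool,
    freq_lowland: float,
    freq_highland: float,
) -> Optional[str]:
    """Label an allele as putatively adaptive to lowlands or highlands.

    Assumption (not from the source): an allele counts as "high frequency"
    in one population simply when its frequency there exceeds its frequency
    in the other population. A real analysis would apply explicit thresholds.
    """
    if not fst_significant:
        # Without significant differentiation no habitat is assigned.
        return None
    if freq_lowland > freq_highland:
        return "lowland"
    if freq_highland > freq_lowland:
        return "highland"
    # Equal frequencies carry no directional signal.
    return None
```

      Applying the function separately to the Mesoamerican and the South American frequency pairs mirrors the per-continent treatment described in the quoted passage.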

      2) How were the covered putative adaptive SNPs distributed in the genome? Were any clustered and linked? The random sampled SNPs should be similarly distributed to give an appropriate null.

      The SNPs in Takuno et al. (2015) are in general at a median distance of 353 bp from each other. The 20 adaptive sites covered in Par_N16 for South America (SA) are at a median distance of 8,301,843 bp (approximately 8.3 Mbp), while the 32 for Mesoamerica (MA) are at a median distance of 24,295,968 bp (approximately 24.3 Mbp). SNPs in five pairs from Mesoamerica are less than 100 bp apart, but each pair is at a considerable distance (beyond 1 cM) from the other pairs and from other SNPs covered in Par_N16. The same holds for only one SNP pair from South America. Thus, in general, the covered adaptive SNPs are not clustered. For our random samples, the range of genomic distances between SNPs is similar to that of the adaptive SNPs. This shows that our null distributions are adequate for our statistical purposes. The genomic positions of covered adaptive sites in Par_N16 are now included in a new Table in the revised version (Supplementary File 2). We have included these observations in the main text (section Specific adaptation to lowlands in Mesoamerica and South America), as follows: “In general, adaptive SNPs represented in Par_N16 were not clustered. The 20 South American adaptive SNPs are at a median distance of 8,301,843 bp, while the 32 Mesoamerican SNPs are at a median distance of 24,295,968 bp (Supplementary File 2). SNPs in five pairs from MA are closer than 100 bp between them, but each pair is at a considerable distance (beyond 1 cM) from each other and from other SNPs. This same happens for only one SNP pair from SA. Then, although at low proportions, the adaptive SNPs in Par_N16 are a bona fide representation of different genomic responses to selection pressures...” and “We analyzed some of these random samples and observed a similar behavior as the adaptive SNPs regarding the range of distances between SNPs (Fig. S18).”
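
      One way to obtain such a median spacing is sketched below. It assumes that distances are measured between neighbouring SNPs on the same chromosome, which the response does not state explicitly; the function name and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_neighbor_distance(snps):
    """Median distance in bp between neighbouring SNPs on the same chromosome.

    `snps` is an iterable of (chromosome, position) tuples. Distances are
    never computed across chromosomes (an assumption of this sketch).
    Returns None when no chromosome carries at least two SNPs.
    """
    positions_by_chrom = defaultdict(list)
    for chrom, pos in snps:
        positions_by_chrom[chrom].append(pos)

    gaps = []
    for positions in positions_by_chrom.values():
        positions.sort()
        # Gap between each SNP and the next one along the chromosome.
        gaps.extend(b - a for a, b in zip(positions, positions[1:]))

    return median(gaps) if gaps else None
```

      Running the same function on each random SNP sample would give the spacing distribution against which the adaptive SNPs are compared.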

      3) How is genetic similarity calculated? It should be briefly described in the Results.

      This is formally explained in the Material and Methods section, but now we have included a brief description in the Results section (Specific adaptation to lowlands in Mesoamerica and South America) as follows: “The allelic similarity is the average of the frequencies of the Par_N16 alleles in the intersected sites with each test population (see Material and Methods).”
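
      The definition quoted above can be written out as a short sketch. It assumes one called allele per site in the ancient sample and a table of allele frequencies per site for the test population; the function name and the data layout are hypothetical and are not the authors' implementation.

```python
def allelic_similarity(sample_alleles, population_freqs):
    """Average frequency, in a test population, of the alleles carried by a sample.

    sample_alleles:   dict mapping site -> allele observed in the sample
    population_freqs: dict mapping site -> {allele: frequency} in the population

    Only sites present in both inputs (the intersected sites) are used.
    Returns None if the two inputs share no site.
    """
    shared_sites = [site for site in sample_alleles if site in population_freqs]
    if not shared_sites:
        return None
    total = sum(
        # An allele absent from the population table has frequency 0.
        population_freqs[site].get(sample_alleles[site], 0.0)
        for site in shared_sites
    )
    return total / len(shared_sites)
```

      A value close to 1 means the population mostly carries the same alleles as the sample at the intersected sites; a value close to 0 means it mostly carries the alternative alleles.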

      4) It would help for the authors to state why they focus on Par_N16, I did not see this in my reading. Presumably, the analyses done are because of the higher quality data, but it would also help to mention why Par_N16 was sequenced in an additional run.

      Indeed, Par_N16 has an endogenous DNA content of 1.1%, while the other two samples presented a very low DNA content (0.2%). Therefore, we decided to invest more in the best sample, as a cost/benefit decision for additional sequencing. We have included brief explanations of this in the revised text. In the Results section Paleogenomic characterization of ancient maize samples, it reads as follows: “Due to its higher endogenous DNA content (one order of magnitude larger), we further sequenced the Par_N16 library, obtaining 459M additional reads, to generate a total of 851M for this sample (Table 2).” and “To determine if the specific elimination of C to T and G to A modifications could bias the results in favor of maize rather than teosinte alleles, an additional database was generated in which all transitions were eliminated (i.e., only transversions were included) in Par_N16 only, because it was the only sample with enough sequencing data to conduct this experiment.” In the section Tests of gene flow from mexicana, it reads as follows: “Par_N16 was the only sample with enough DNA sequence data to perform this analysis. All the samples showed the same phylogenetic position; therefore, Par_N16 was considered to be representative of ancient Paredones maize.”

      5) In the sections on phylogenetic analysis, introgression, and D statistics, the authors could do a better job specifically indicating how the results support their conclusions.

      Precise indications of how our results support our conclusions are given in the Discussion section. Nevertheless, we added relevant sentences in the specified sections. In the section Relationship between ancient maize, extant landraces, and Balsas teosinte, we added the following: “Thus, based on genome-wide relatedness, Paredones maize clusters with extant domesticated Andean landraces, supporting both, a single origin for maize and that these Peruvian samples were already domesticated.” In the section on introgression and D-statistics (Tests of gene flow from mexicana), we improved the last sentence as follows: “These results consistently show the absence of significant gene flow between Par_N16 and mexicana, implying that the lineage that gave rise to Paredones maize left Mesoamerica without relevant introgressions from this teosinte.”

      Reviewer #2 (Public Review):

      In this foundational article, the authors conduct an ancient DNA characterization of maize unearthed in archaeological contexts from Paredones and Huaca Prieta in the Chicama river valley of Peru. These maize specimens were recovered by painstakingly controlled excavation. Their context would appear to be beyond reproach though the individual radiocarbon determinations should be subject to further scrutiny.

      1) Radiocarbon determination for at least one of the maize cobs analyzed for aDNA is not a direct date, but dates associated material. The authors should provide a table of the direct dates on the specimens that were analyzed for ancient DNA. They should also specify the type and quantity of material sent and whether the cob, glumes, pith, or husks were submitted for dates. Include δ13C determinations for each cob with laboratory analysis numbers because there is justifiable concern that at least one of these cob dates has a δ13C value suggesting the material dated is not maize. Generally, the δ13C for maize ranges from -14 to -7. One or more of the specimens subjected to ancient DNA analysis in this paper have δ13C values far outside of this confidence interval.

      The indirect radiocarbon date on a maize cob was derived from a single piece of wood charcoal in a hearth directly associated with the analyzed cob, both embedded in a thin intact floor in Unit 20 at the Paredones site. The assay on the charcoal and the floor are in an undisturbed stratigraphic context and are in agreement with assays on other maize and charcoal remains in floors both above and below the hearth. We have included this information in Table 1 in the revised version. The information sought by Reviewer 2 on the studied cobs was published previously in Grobman et al. 2012 and in Dillehay 2017. Since details of the cobs were published, we decided to submit only what we thought were pertinent data for this manuscript.

      As for the δ13C reading of one cob outside of the confidence interval for maize, the dated specimen with this value is a maize husk fragment. Both the macro- and micro-morphology and the ancient DNA analysis of the husk demonstrated it was maize. We do not understand what affected the δ13C value for this specimen. Similarly, three human skeletons from deeper site levels have δ13C values greater than the expected range for human remains.

      2) From the perspective of future scientists being able to repeat the analyses performed here, I would hope that all details of specimen treatment, extraction methods, read length and quality would need to be assiduously described. Routine analytical results should be reported so that comparisons with earlier and future results are facilitated, and not made difficult to decipher or search for.

      The general procedures for accurate ancient DNA extraction were described in Vallebueno-Estrada et al. 2016 and we do not see the need to repeat this information in this article. Specific aspects of sample treatment and DNA extraction of the samples analyzed here are described in the Material and Methods, section on Extraction and sequencing of ancient samples. Results on quality (percentage of endogenous DNA, quality-filtered reads, mapped reads to either repetitive or unique regions, amount of sequence mapped, mapping Phred scores, estimated error rates, percentage of deamination, fragment median lengths, percentage of sites with signatures of molecular damage, number of unique genomic sites covered and their corresponding average sequencing depth) are described in the Results, section Paleogenomic characterization of ancient maize samples. This section also includes the number of SNPs in relation to the reference and the number of intersected SNPs between our samples and the HapMap3 database. In addition, complementary information to this section is included in Tables 2-4 and supplementary Figures S2-S6, as properly referenced in the last mentioned section.

      3) The aDNA analysis may or may not be affected by the anomalous δ13C values but one would anticipate that standard aDNA extraction and analysis protocols would provide a means by which the specimen's preservation of the specimens could be ascertained, for example, perhaps deamination and fragmentation rates could be compared or average read length evaluated with modern-contemporary materials so that preservation of the Paredones samples relative to that of maize in the CIMMYT germplasm bank and the San Marcos specimens investigated by the same researchers can be evaluated.

      Average read length from contemporary material depends more on the sequencing platform than on sample preservation. For example, Illumina can only read fragments of hundreds of base pairs, while MinION or PacBio can read fragments on the order of kilobases. Also, deamination is not an issue in DNA extracted from modern material (unless bisulfite is used for methylation detection). Comparison with San Marcos samples indicates that Paredones samples are heavily degraded, although this is not a function of time only (humidity, temperature, and pH are among other relevant factors). Therefore, to avoid misleading interpretations, we are not including a comparison with San Marcos samples in the revised version.

      4) The size and shape of the cobs depicted are similar to specimens occurring much later in Mesoamerican assemblages. For example, the approximate rachis diameter of the San Marcos specimens depicted by Valle-Bueno et al. (2016: Fig.1) averages less than 0.5cm while the specimens depicted in Valle-Bueno et al. (this manuscript) average 1.0 cm. The former - San Marcos - specimens are dated at 5300-4970 BP cal while the larger - Paredones - specimens date roughly 6777 - 5324 BP cal. The considerable disparity among the smaller more recent specimens compared to the very much larger putatively older specimens suggests the Paredones specimen's radiocarbon determinations are equivocal. The authors point this out but repeatedly state these cobs are the most ancient; a conundrum that should be resolved.

      Radiocarbon determinations in Paredones are not equivocal; on the contrary, they are perfectly in agreement with and supported by the unimpeachable stratigraphy of the site and by more than 150 other radiocarbon and OSL dates from Paredones and nearby excavated contexts. The difference in morphology between the more recent samples from Tehuacan and the more ancient samples from Paredones is exactly the paradox we try to address. Our results indicate that the rapid migration and adaptation of maize to the coast of Peru in comparison with a slower migration and adaptation to Tehuacan lands explains this apparent conundrum. This rapid movement and migration allowed the presence of more “modern” maize in Peru than in Tehuacan on the respective dates. This more rapid maize development also coincides with more rapid and advanced socio-cultural transformations in Peru, including proto-urbanism (i.e., first cities), early religious symbolism, long-distance irrigation canals, and other major innovations that far exceed what was happening in Mesoamerica at the time.

      5) I would suggest the authors consider redating these three specimens and if they do, hope that they will prepare the laboratory personnel with depositional environment information. MacNeish was skeptical about late dates on maize at Tehuacan, at first. Adovasio was initially certain about maize's associated dates from Meadowcroft. One would prefer to be reasonably certain the foundation this article creates is solid; the author's repeated reference to these cobs as the most ancient in the Americas should be reaffirmed so retraction will not be necessary.

      As discussed in Grobman et al. 2012 and in Dillehay 2017, we do not trust C14 dating of unburned corn remains due to the possible intrusion of fungi in the soft cellular structure of cobs. The chrono-stratigraphically acceptable dates on cobs and other maize remains were taken on burned and hard tissue remains, such as husks. See detailed discussion in Supplementary Materials.

      MacNeish and Adovasio were excavating cave and rock shelter sites, which are known to often have areas of stratigraphically disturbed deposits. Paredones, Huaca Prieta, SR-18 and other Preceramic sites excavated in the study area here contain late to early varieties of maize and radiocarbon assays that are in chrono-stratigraphic agreement. As noted in the main text and in prior publications, these sites are open-air localities with clear stratigraphy defined by intact floor and fill sequences, with no tree root, animal burrowing, or other major taphonomic disturbances. There were occasional hearths and pits (i.e., human burials) that intruded into deeper floor-fill sequences, but none of the assayed and studied maize samples were derived from these contexts. Once again, we encourage readers to examine the stratigraphy shown in the main text and in Grobman et al. (2012) and Dillehay (2017). Moreover, as noted in the text, there is a growing number of Preceramic sites in South America that date between 6800 and 6000 years ago and later that contain micro-maize remains (see Kistler et al., 2018). Not all of these sites are well-dated and present reliable contexts, but several have good chrono-stratigraphic settings and micro-evidence (e.g., phytoliths, starch grains) indicative of a maize presence at or prior to 6000 years ago.