  1. Aug 2021
    1. Author Response:

      Reviewer #2:

      This manuscript explored the effect of N1-methylation of G37 of tRNAs in bacteria. The authors found that loss of methylation, through the depletion of trmD, results in defects in aminoacylation and peptidyl-transfer, leading to ribosome stalling and activation of the stringent response (as mediated by accumulation of deacylated tRNAs). Briefly, the authors conducted ribosome profiling on trmD conditional-knockout E. coli cells, compared them to "wild-type" cells, and documented increased ribosome stalling on codons decoded by tRNAs modified by trmD. Stalling occurs when the ribosome is decoding these codons, i.e. when they occupy the A site. Further biochemical characterization showed that stalling is likely to occur due to defects in aminoacylation and peptide-bond formation for the trmD-substrate tRNAs, primarily for tRNAPro. Finally, analysis of gene expression shows that loss of trmD results in the activation of the stringent response as well as rewiring of central-carbon metabolism.

      Overall, this is a comprehensive study of an essential and universally conserved tRNA methylation. The manuscript expands on the role of m1G37 in translation, beyond its established role in reading-frame maintenance. However, the novelty of the findings was not immediately clear to me, and in particular whether they significantly advance our understanding of tRNA modification. For instance, it is known that defects in tRNA methylation (albeit different from N1-methylation of G37, discussed here) activate Gcn2 in yeast, which arguably is equivalent to the stringent response in bacteria.

      We thank this reviewer for the overall positive comments on our work. To address the concern about novelty, we have revised the fourth paragraph in the Discussion (pp24-25) to emphasize the novelty of our finding.

      Specifically, most of the published genome-wide studies of tRNA modifications, leading to a stress response, are performed in eukaryotes (e.g., Saccharomyces, Neurospora, and Drosophila) (cited in the revised manuscript). Although we have shown that loss of the s2 group from the cmnm5s2U34-state in E. coli tRNAGln led to reduced aminoacylation and reduced tRNA binding and accommodation to the ribosome A site, we did not investigate whether it induces a stress response in E. coli (Rodriguez-Hernandez et al, 2013). While there are studies in bacteria that demonstrate changes of tRNA modifications in response to stress, these are not in the same theme as the focus of this work, which is to determine how changes of tRNA modifications induce a stress response. Thus, our work here provides an important example showing that m1G37 deficiency leads to the stringent response in E. coli, which is in parallel with the results of studies in eukaryotes showing that loss of tRNA modifications turns on the GCN4 response in yeast and the mTOR-like response in Drosophila. This parallel provides a framework for understanding the evolution of a common cellular priority that activates amino acid biosynthesis in response to deficiency of amino acids or to deficiency of tRNA modification, both of which would prevent active protein synthesis and compromise cell viability.

      Furthermore, the authors made the claim "In contrast, while m1G37 deficiency reduces peptide bond formation for some tRNAs at the A site, it consistently reduces the rate of aminoacylation for all tRNAs examined, which has not been shown for other metabolically deficient tRNAs." in the discussion section, which is inaccurate. Previous data, some from the same group, have shown that thiolation of the wobble base in tRNAGln is important for aminoacylation, tRNA selection by the ribosome and reading-frame maintenance. The argument that m1G37's pleiotropic effect on translation is unique is not convincing.

      Yes, we agree with the reviewer and have removed the claim from the Discussion. We apologize for our overstatement in the previous submission. We have also cited our own work on E. coli tRNAGln (Rodriguez-Hernandez et al, 2013), and explained that we did not investigate the possibility of a stress response in that work (pp24-25). These considerations demonstrate the novelty of the present work on m1G37 deficiency in E. coli, which provides an example of a stress response in bacteria that parallels the stress response in eukaryotes activated by changes in the post-transcriptional modification state of tRNA.

      Reviewer #3:

      The study expands upon the previous findings of the Hou lab that the lack of TrmD-catalyzed modification in the anticodons of several bacterial tRNAs leads to +1 frameshifting when the undermodified tRNA is positioned in the ribosomal P site. In the current study, the authors show that a number of other aspects of translation are affected when the m1G modification in the tRNA anticodon is lacking.

      Specifically, the study shows that undermodified tRNAs are less efficiently aminoacylated by the corresponding aminoacyl-tRNA synthetases, leading to excessive presence of deacylated tRNAs. One of the consequences is ribosome pausing when the respective codons need to be decoded. The shift in the balance of aminoacyl-tRNA relative to deacyl-tRNA resembles the one caused by amino acid starvation. Indeed, the authors show that changes in the transcriptome triggered by reduced tRNA modification resemble those observed during the stringent response.

      We thank this reviewer for the positive comments on our manuscript.

      While the paper is generally good and interesting, in its current version it is not perfectly focused: the discussion of the metabolic changes resulting from transcriptome remodeling is relatively fuzzy and does not contribute much to the main story.

      We agree with the reviewer that the discussion of metabolic changes resulting from transcriptome remodeling is preliminary. We have substantially shortened the Results section “Metabolic changes” (pp 21-22) and have removed a previous figure that illustrated metabolic changes.

      Another problem is that some of the claims (e.g. that the lack of anticodon modification affects peptide bond formation) are not properly termed and thus, misleading. In fact, the lack of tRNA modification affects dipeptide formation (possibly by interfering with decoding or tRNA accommodation) rather than influencing the rate of peptidyl transfer per se.

      We agree with the reviewer that our measurement of peptide-bond formation encompasses all of the reaction steps up to and including peptide-bond formation in the A site. The kobs of each of our measurements is a composite kinetic term that reports on the overall rate of peptide-bond formation. We have carefully revised the text to reflect this point in Results “Reduced aminoacylation and A-site peptide-bond formation of m1G37-deficient tRNAs” (pp13-14).
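      As a rough illustration of what a composite kobs means, consider a minimal two-step scheme. The scheme is an assumption made here for illustration only and is not taken from the manuscript: an irreversible pre-chemistry step (decoding and tRNA accommodation, with a hypothetical rate constant k_acc) followed by irreversible peptidyl transfer (hypothetical rate constant k_pep). The mean time to product is the sum of the mean times of the two steps, so the reciprocal of that mean time behaves approximately as:

```latex
% Illustrative two-step scheme; k_acc and k_pep are hypothetical symbols,
% not quantities reported in the manuscript.
\[
\frac{1}{k_{\mathrm{obs}}} \approx \frac{1}{k_{\mathrm{acc}}} + \frac{1}{k_{\mathrm{pep}}}
\quad\Longrightarrow\quad
k_{\mathrm{obs}} \approx \frac{k_{\mathrm{acc}}\,k_{\mathrm{pep}}}{k_{\mathrm{acc}} + k_{\mathrm{pep}}}
\]
```

      Under this simplified scheme, kobs approaches k_acc whenever k_acc is much smaller than k_pep, so a reduced kobs by itself cannot distinguish slower accommodation from slower peptidyl transfer. This is consistent with describing kobs as reporting on all steps up to and including peptide-bond formation.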

    1. Author Response:

      We appreciate the reviewers’ favorable opinions regarding the significance and quality of our study, their excellent comments, and constructive criticisms. Following the reviewers’ comments, we have performed additional experiments and re-analysis of our existing data, which confirm and strengthen the conclusions of our study. We believe that we have been able to address all the reviewers’ concerns, and we thank the reviewers for the valuable comments, which have led to a substantial improvement of our manuscript.

    1. Author Response:

      Reviewer #1:

      Nava Gonzales et al. have reconstructed in unprecedented detail the morphology of olfactory sensory neurons (OSNs) within their sensilla in D. melanogaster, characterising the majority of sensory hairs and OSN types. To that end they used 8 datasets - 7 of which had been previously published - of serial block-face electron microscopy (SBEM) images where different individual OSN classes were genetically labelled in each dataset. The morphometric dataset collected will be a reference point for the field of olfaction research in Drosophila, and furthermore might inspire similar analyses of other sensory systems, building our understanding of how peripheral morphological features contribute to sensory neuron processing. In addition, they made several observations that warrant follow-up studies in the future. These include: 1) finding what seem to be new sensillum types, and identification of variation in the number of neurons within a single sensillum class, including empty sensilla; 2) mitochondrial enrichment in the dendritic base of certain OSN classes; 3) the presence of extracellular vacuoles within the sensillum lymph, likely derived from the tormogen accessory cell. The paper is purely descriptive but is a valuable addition to the literature, and the claims made in the paper are well justified by the results. I have a few comments that I detail below.

      We thank the reviewer for sharing with us their appreciation for our study.

      • The authors should include more detail as to how the different sensillum classes were identified. The only information given is: "Within a morphological class, sensillum identity was determined by the number of enclosed neurons, the relative position of the sensillum on the antenna, as well as by genetic labelling when this information was available", and "we distinguished ab2 from ab3 by its characteristic antennal location". However, it is worth noting that while sensilla distribution across the antennae is heterogeneous and indeed specific sensillum types are restricted to particular domains, the distribution of many sensillum types follows a "salt and pepper" pattern, intermingling with each other. This is specifically the case for ab2 and ab3 sensilla, both found in partially overlapping regions of the antennae. Therefore, a more detailed description in the methods as to how each sensillum type was assigned will help the reader understand how the authors reached their conclusions. Furthermore, the authors should avoid circular arguments, such as the one presented for ab2 sensilla, where the identification was made based on position (with the caveat highlighted above) and on the difference in size, but this difference is then used as part of the results.

      We thank the reviewer for raising this point, in particular regarding the distinction between ab2 and ab3 sensilla. In the Results, we have now clarified that “Among the two large basiconic sensilla that house two neurons, we distinguished ab2 from ab3 by its lack of DAB staining in the Or22a dataset, in which ab3A was genetically labeled by APEX2. Apart from the Or22a dataset, an ab2 sensillum was identified in the Or7a dataset on the basis of its proximity to the labeled ab4 sensilla, because ab3 is not found in the same topographical region as ab4 (de Bruyne et al., 2001).”

      We have also described how each sensillum type was identified in the revised Source Data for Table 1.

      • Following on from this point, one of the novel basiconic sensilla identified, abx(3), is indistinguishable in terms of morphological features from ab3 sensilla. How was it then distinguished from ab3? Was it due to the lack of genetic marking? This is not explicitly stated in the manuscript and needs to be specified. Furthermore, the authors propose that this sensillum type could be an ab1 sensillum that is missing the ab1D neuron. How did they arrive at this conclusion? If it was based on location, this needs to be explained more explicitly.

      We apologize for the confusion. As indicated in the subheading, abx(3) designates a novel large basiconic sensillum type that houses three ORNs. In contrast, ab3 is a well-characterized large basiconic sensillum that is known to house only two ORNs. Therefore, we can distinguish abx(3) from ab3 according to the number of neurons found in each sensillum. To further clarify this matter, we have also indicated how each sensillum type was identified in the revised Source Data for Table 1.

      In addition, we wish to clarify that we proposed, rather than concluded, that abx(3) may represent an ab1 subset based on the similarity of their A and B neuron size differential (not based on antennal location). However, we agree with the reviewer: we cannot rule out the possibility that abx(3) is instead an ab3 subset, or houses three uncharacterized orphan ORNs. Therefore, we now consider these possibilities in the revised Results, which read “However, it is also possible that abx(3) represents an ab3 subset, or houses three uncharacterized orphan ORNs whose receptors have not yet been reported.”

      A suggestion is to show in Figure 1 a diagram of an antenna and indicate from where in the antenna each of the datasets was taken. Furthermore, in subsequent figures it would be good to show on a schematic antenna the approximate location of the described sensilla, and specify from which dataset they were reconstructed.

      We thank the reviewer for the excellent suggestion. We have included a schematic antenna in the revised Figure 1 to indicate the antennal regions covered by individual SBEM volumes.

      For each sensillum, we have also specified the source dataset from which it was identified (Source Data for Table 1). Further detailed information can be found in a new “Source Data for Figure 1” file.

      • I have some concerns regarding some of the claims made for ab2 sensilla, as these are based on a single sensillum reconstruction (Table 2, n=1 for ab2 sensilla).

      We appreciate the reviewer’s concern. Although we identified a total of four ab2 sensilla, only one of them contained neurons that could be segmented in their entirety. However, we note that in comparison to the ab3 neurons, the ab2 neurons are highly distinctive based on their striking A/B size differential (2.7 : 1 for ab2, Figure 6G, and 1.5 : 1 for ab3, Figure 7C).

      • The discovery of a large number of mitochondria in the inner dendritic segment of some OSN classes but not others is intriguing. However, there seems to be no correlation between this and the size of the soma, and therefore the spike amplitude generated by each OSN (see ab5A vs ab5B sensilla). It would be interesting if the authors could generate some graphs correlating the number of mitochondria with previously published physiological parameters, such as the spike amplitude and resting spike frequency of each OSN type.

      We thank the reviewer for the suggestion. In preliminary analysis, we did not observe any correlation between mitochondria number and other ORN features. Although we prefer not to show this negative result in a separate figure, we have incorporated the reviewer’s suggestion by expanding Table 2 to include the mitochondria number.

      In addition, we wish to clarify that the resting spike frequency is determined by the receptor expressed in the ORNs (Hallem et al., Cell, 2004), which is independent of whether the neuron is a large- or small-spike ORN. By extension, the resting spike frequency is independent of the neuron’s morphometric features.

      • Their findings on at4 sensilla imply that this sensillum type should be reclassified as at4_T2 and at4_T3, because at4_T2 contains only two neurons expressing Or82a and Or47b, while at4_T3 sensilla contain three neurons, expressing Or82a, Or47a and Or65a. This is extremely interesting and predicts that there would be more Or82a and Or47a neurons in the antennae than Or65a neurons, something unexpected given the previous assumption of a single at4 sensillum type with 3 neurons. Based on this finding the authors claim: "We show that not all ORNs expressing the same receptor are housed in a singular sensillum type". This statement should be rephrased, as it was known before that the same receptor can be housed in two sensillum types, as is the case for Or35a being hosted in both ac3i and ac3ii sensilla, being paired with either Ir75b or Ir75c.

      We agree with the reviewer that the sentence may have overstated the novelty of our finding. We have therefore removed the statement in the revised text.

      Besides these comments, the manuscript provides plenty of novel and intriguing findings that will set the basis for many future investigations.

      Once again, we thank the reviewer for expressing their appreciation for the significance of our study.

      Reviewer #2:

      Gonzales et al. took advantage of high-end automated, volume-based EM technology and genetic labelling, thus providing an extensive 3D morphometric dataset of 122 olfactory receptor neurons (ORNs; that is about 10 per cent of the reported number of ORNs on the antenna of Drosophila melanogaster) grouped in 33 ORN types and housed in 13 of the 19 known antennal sensilla types. For the ORNs, morphometric measures such as ORN soma size and dendritic branching pattern are analyzed. In addition, over 500 sensilla, derived from eight data sets, are identified, including new morphological types. Cellular features, such as empty sensilla, mitochondria number, extracellular vacuoles and extensive dendritic branching in distinct ORNs are described. In selected cases the structure and relationship to the supporting cells in sensilla (thecogen, tormogen and trichogen) are depicted. The study goes beyond previous structural work done in this field by covering a large number of sensilla and their olfactory receptors.

      The sheer number and completeness of the data strongly complement our knowledge of the sensilla assembly and ORN types in Drosophila. Of particular interest is the ORN cell variability but also their generic structural features (such as soma size for the A and B neuron) reported in a large number of identified ORNs. All olfactory sensilla types (basiconica, trichodea, coeloconica) are covered in this study. Therefore, the data presented here are valuable for the experimental neurobiologist for comparing functional properties in ORNs (from their own single-cell ORN recordings), and are also of potential use for comparative studies in other insects outside the Drosophila neuroscience community.

      In general, the manuscript is well organized. The figures, including figure legends, are nicely designed to give a comprehensive overview that mostly reads well with the accompanying text. See my suggestions for improvement below.

      The morphometric analysis is restricted to ORN macroscopic features, such as cell size and dendrite branching pattern of ORNs; cellular features, such as mitochondria distribution or the relationship to the sensilla supporting cells, are only analyzed in exemplary cases.

      I do recommend publication in eLife, provided the authors make an effort towards a more detailed discussion of their findings and a more comprehensive introduction, e.g. for essential sensilla components such as support cells.

      We thank the reviewer for the careful reading of our manuscript and for expressing their appreciation for the significance of our study.

      For a wider audience of the neuroscience community the manuscript would much benefit from:

      1) expanding your discussion with respect to the functional significance of your findings: How does your classification of ORN types compare to previous anatomical and functional studies?

      We wish to clarify that our study focuses on the nanoscale morphological and morphometric features of sensilla and ORNs, instead of the distribution of sensilla on the antenna. Each SBEM volume samples a specific portion of the antenna covering the APEX2-labeled sensilla, making it difficult to precisely determine its relative antennal location. Therefore, we do not feel comfortable drawing direct comparison to other studies regarding the distribution of sensilla on the antenna, as that is not the focus of our study.

      To address the reviewer’s concern, we wrote at the beginning of the Results section: “In agreement with the characteristic topographical distribution of sensilla on the antenna (de Bruyne, Foster, & Carlson, 2001; Grabe et al., 2016; Shanbhag et al., 1999), the four morphological sensillum classes were unevenly represented in our eight SBEM datasets (Figure 1B,C).”

      How does our classification of ORN types compare to previous functional studies? We wish to clarify that the odor response profile of an ORN is predominantly determined by the receptor expressed in the neuron (Hallem et al., Cell, 2004), which is independent of whether the neuron is a large- or small-spike ORN. By extension, the odor response profile is independent of the neuron’s morphometric features. It is therefore of limited usefulness to search for any correlation between ORNs’ morphological features and odor response properties.

      However, we have incorporated the reviewer’s suggestion by revising the text to include key ligands for ORNs that respond to ethologically salient odors. In addition, we have included the following sentence in the revised Table 2 legend “The odor response profiles for many of the characterized ORNs can be found in the DoOR database (<http://neuro.uni-konstanz.de/DoOR/default.html>)” such that readers who are curious about the functional data can easily find the information.

      Is an 'empty sensillum' a novel finding?

      Yes, it has never been described before, making the identification of empty sensilla an exciting and novel finding. To clarify the confusion, we have explicitly stated "such empty sensilla have never been reported before" in the revised text.

      How do physiological responses at the receptor level correlate with neurons' soma size and number of mitochondria?

      We thank the reviewer for raising this interesting question. Currently, there is no clear relationship demonstrated between ORN soma size and physiological response properties.

      The only known functional significance of ORN size differential is its impact on the asymmetrical ephaptic interaction between compartmentalized ORNs, which we have investigated in detail in our previous publication (Zhang et al., 2019). We have summarized our prior findings in the main text, which reads “Indeed, ORNs housed in the same sensillum can inhibit each other by means of direct electrical interaction, termed ephaptic coupling, which can also modulate fruitfly behavior in response to odor mixtures (Su, Menuz, Reisert, & Carlson, 2012; Zhang et al., 2019). Strikingly, in most sensillum types, lateral inhibition is asymmetric between compartmentalized ORNs: the large-spike neuron is not only capable of exerting greater ephaptic influence but is also less susceptible to ephaptic inhibition by its small-spike neighbors. Mechanistically, this functional disparity arises from the size difference between grouped neurons. The large-spike ORN has a larger soma than its small-spike neighbor(s); this feature is translated into a smaller input resistance for the “A” neuron, thus accounting for its dominance in ephaptic interaction (Zhang et al., 2019).”

      On a similar note, there is also no clear relationship between ORN mitochondria number and odor response properties. However, to address the reviewer’s comment, we have now provided background information on mitochondria function in olfactory signaling: “We note that in vertebrate ORNs, mitochondria play a direct role in regulating cytosolic Ca2+ response profile and thereby ensure a broad dynamic range for the neurons’ spike responses (Fluegge et al., 2012). Although it is unclear whether mitochondria play a similar role in insect olfactory signaling, a recent study shows that odor-induced Ca2+ signals in Drosophila ORNs are shaped by mitochondria (Lucke, Kaltofen, Hansson, & Wicher, 2020). Therefore, it will be interesting to investigate the functional significance of this striking mitochondrial disparity between grouped ORNs in future research.”

      Some ORNs express more than one receptor, as shown recently by the Potter lab: Task (2020) Widespread Polymodal Chemosensory Receptor Expression in Drosophila Olfactory Neurons 2020.11.07.355651.

      Although the multiplicity of receptors expressed in individual insect ORNs raises intriguing questions, this information is not directly related to our study. It has been shown that deleting or substituting the tuning OR in an ORN does not change its spike amplitude (Dobritsa et al, Neuron, 2003; Hallem et al., Cell, 2004), which by extension suggests that the receptor does not influence the morphometric features of an ORN. To make this point explicit, we have now provided the information in the revised Introduction: “Interestingly, deleting or substituting the tuning receptor for an ORN does not alter its characteristic spike amplitude (Dobritsa, van der Goes van Naters, Warr, Steinbrecht, & Carlson, 2003; Hallem et al., 2004), suggesting that this feature is independent of the receptor identity.”

      2) Table 2, which gives a summary of your results, should be more informative and presented in the broader context of what is known about the receptors you describe. Please give a reference to the DoOR database (http://neuro.uni-konstanz.de/DoOR/default.html), which provides an excellent overview of functional and anatomical properties of ORNs. Additional columns, e.g. the ORNs' corresponding glomeruli for their representation in the antennal lobe, DoOR response, OR co-receptors, or best ligand, would be very valuable.

      As suggested, we have included ORN glomerulus projection as an additional identifier in Table 2. The revised legend now includes: “ORN identity is indicated by the sensillum type, relative spike amplitude (A, B, C or D), odor-tuning receptor, and glomerular projection.”

      For clarity, and in consideration of the large information load, we focus on our own morphometric data in Table 2. For the same reasons, we also focus on the tuning receptors as a key ORN identifier without mentioning other co-expressed receptors of unknown function.

      Furthermore, we wish to clarify that the odor response profile of an ORN is predominantly determined by the tuning receptor expressed in the neuron (Hallem et al., Cell, 2004), which is independent of whether the neuron is a large- or small-spike ORN. By extension, the odor response profile is independent of the neuron’s morphometric features. It is therefore of limited usefulness to search for any correlation between ORNs’ morphological features and odor response properties.

      However, we have incorporated the reviewer’s suggestion by revising the text to include key ligands for ORNs that respond to ethologically salient odors. We have also included the following sentence in the revised Table 2 legend “The odor response profiles for many of the characterized ORNs can be found in the DoOR database (<http://neuro.uni-konstanz.de/DoOR/default.html>)” such that readers who are curious about the functional data can easily find the information.

      For Figure 1, a clearer description of the location and representation of the genetically/non-genetically labelled ORN and sensilla types is necessary. A nice overview is given by Grabe (2016); see Figure 1 there.

      We thank the reviewer for the excellent suggestion. We have now added a new panel (Fig 1C) to illustrate the antennal regions covered by individual SBEM volumes.

      3) Do you plan to make your datasets publicly available on an open-source platform? In particular, the non-genetically labelled, but identified, ORN types are candidates for other researchers to explore cellular features in more detail. Can you make statements about the preservation of the ultrastructure in these preparations?

      Such efforts were made for the Drosophila brain connectome with data repositories provided by HHMI Janelia Research Campus and further suggestions for appropriate software (https://www.janelia.org/project-team/flyem).

      We thank the reviewer for raising this critical point. All eight SBEM image volumes described in this study have been deposited in the Cell Image Library (http://www.cellimagelibrary.org/home). We have also provided the accession numbers in the revised “SBEM datasets” under the Materials and Methods section.

      As mentioned in the Introduction and Materials and Methods sections, the antennal tissues were high-pressure frozen and freeze-substituted (i.e. cryofixed), which optimally preserved the ultrastructure of cells. We note that the tissue preservation method has been described and discussed in detail in our prior publication (Tsang et al, eLife, 2018).

      To address the reviewer’s comment, we have now stated explicitly in the Introduction “Taking advantage of the CryoChem method, which we have previously developed to permit high-quality ultrastructural preservation of cryofixed and genetically labeled samples for volume EM (Tsang et al., 2018), we have acquired serial block-face scanning electron microscopy (SBEM) images of antennal tissues in which select ORNs expressed a membrane-tethered EM marker (APEX2-mCD8GFP or APEX2-ORCO) (Tsang et al., 2018; Zhang et al., 2019).”

    1. Author Response:

      Reviewer #1:

      In this manuscript Shi and Fay investigate how natural genetic variation in cis-regulatory sequences impacts gene expression dynamics, using budding yeast as a model. Much work in the field, including some landmark studies from this laboratory, has focused on allele-specific expression. By contrast, relatively few studies have investigated the impact of natural genetic variation on the kinetics of gene expression, as the authors do here during the diauxic shift using both inter- and intra-specific hybrids. Strikingly, they find that ASE dynamics are more strongly associated with insertions and deletions than ASE levels. Using reporter assays the authors test which promoter regions and individual variants are sufficient to produce the observed dynamics of gene expression. By investigating chimeric promoter regions between species, the authors gain insight into constraints on the evolution of gene expression dynamics. This manuscript addresses an important question, the findings are novel, and the methods are appropriate. I have a couple of suggestions that I hope the authors will agree can improve their work.

      1) Line 124: I understand the focus on regulatory regions, but post-transcriptional regulation of transcript stability can arise from many mechanisms. RNA binding proteins frequently interact with regions within an open reading frame. I understand the complications of considering coding mutations, but why exclude synonymous polymorphisms within ORFs, for example? At a bare minimum it should be noted in the text.

      We included all variants, synonymous or otherwise, within coding regions. We now state this in the methods. Coding region results were excluded from Figure 2, but are included in Table S4.

      2) In what is otherwise an exceptionally clear manuscript it took some time to understand on line 157 precisely how the 334 'regions' were defined from the 1,818 CREs. Some extra sentences would be very helpful to guide the reader here, perhaps with a figure panel to scaffold the logic.

      Some regions were excluded due to overlap with upstream genes. We have now stated this in the methods: "The intra-specific and inter-specific libraries respectively represented 334 and 452 regions upstream of 69 and 98 genes after removing regions that overlapped with upstream genes, and contained a total of 7,268 and 7,232 synthetic CRE sequences." We also modified the text in the results to indicate that the total number of CREs comes from the number of variants in the 334 regions. "The total library contained 1,818 CREs with four barcode replicates per CRE, and included all variants within 334 regions upstream of the 69 genes."

      3) In figure 4 the scale of the x-axis (time) is confusing. Most of the plots don't seem to start at t=0, but it is impossible to tell from the labeling. Because the timepoints highlighted also differ depending on the message being plotted, which is of course natural, interpreting differences in slope, etc. becomes confusing. The authors should either replot with the origins at t=0 or clearly indicate that there is a break in the axis.

      There are no breaks in the x-axes. We felt it would be misleading to put all the plots on the same time-scale, i.e. with t=0 being the first point. The reason is that glucose depletion occurs at a different time in the RNA-seq and CRE-seq experiments, both of which are shown in Figure 4. We have now added an arrow to indicate the approximate time of glucose depletion in both Figure 4 and Figure 5 in order to provide some indication of the time differences.

      4) Line 209 and 210 - I understand that the PhastCons scores did not improve the association between upstream polymorphisms and ASE dynamics, but it would be nice to hear a bit more from the authors about what this might mean. The observation is restated in the discussion but again mostly without any speculation about what it might mean before moving on to the discussion of technical limitations. If the result is true what might it mean?

      We have modified the discussion to clarify this issue: "Beyond technical differences, the absence of association with conserved sequences and binding sites could be related to differences in cis-regulatory variants underlying ASE levels versus dynamics, to the strains used in each study, or to our smaller sample size. Strain differences may be relevant since we used variants between two wild strains Oak and ChII, whereas Renganaath et al. (2020) used a wine and laboratory strain, the latter of which has evolved under relaxed selection and has more deleterious variants (Gu et al., 2005; Doniger et al., 2008). Consistent with a sample size explanation, we found that PhastCons conservation scores improved the odds ratios from genome-wide logistic regression for SNPs with ASE levels and dynamics (Table S6)." Note that the last sentence has been changed to report improvement of odds ratios rather than significance of those ratios.

      Reviewer #2:

      Weaknesses:

      First, the results in the first half of the paper are not overly surprising. They boil down to "genetic variation does influence expression dynamics". This is not unexpected, given genetic variation has been shown to influence just about any cellular process studied so far. As such, the paper essentially confirms the existence of a phenomenon whose existence was not really in doubt. Fortunately, the work into causal variants in the second half of the paper does provide additional insight.

      Second, the results are somewhat descriptive. This is not uncommon for genomics work, but does leave the reader wondering how exactly a given variant may alter gene expression dynamics, especially if it neither occurs at a conserved site nor drastically changes transcription factor binding. I do understand that a deep dive into individual causal variants is outside of the already impressive scope of this paper. I nevertheless hope that one impact of this work will be future mechanistic studies of some of these variants.

      We acknowledge both of these weaknesses. Our goal was not to demonstrate the existence of expression dynamics but to determine whether patterns of variation in expression levels and dynamics are similar. While these results are descriptive we felt they were necessary to complete before testing whether cis-regulatory variants or their associated features (conservation and binding sites) differed between genes with ASE levels versus dynamics. We have edited the discussion to better put our work into perspective.

      Third, the statistical model to infer ASE strikes me as suboptimal (line 420). From how I understand the Methods section, allelic read counts are transformed to an allele frequency. This frequency is assumed to be 0.5 in the absence of ASE. ASE is then modeled as deviation from 0.5, using a linear model. This last point seems problematic. First, frequencies can only range from 0 and 1, whereas a basic linear model would be allowed to infer frequencies outside of this range. It is not clear to me that this model can properly capture the bounded nature of these data. Second, RNA-Seq data is count based, and transforming to an allele frequency loses information about the accuracy of each measurement. Specifically, genes with few reads have less power due to more stochastic counting noise. Third, the choice of weighting observation simply by the raw read counts (line 422) seems ad hoc and should be justified. More broadly, the authors could have opted for more established, count-based analysis strategies for ASE data, such as binomial tests or more advanced frameworks (e.g. beta-binomial tests as in https://www.biorxiv.org/content/10.1101/699074v2).

      We examined and have now included estimates of our false positive rate based on permutation re-sampling of the data. "Ten permutations of the data were used to validate the statistical cutoffs. Permuting the counts for the two alleles independently at each time-point yielded an average of between 0.3 and 2.0 false positives across the five hybrids at an FDR cutoff of 0.01 for the test of ASE levels. Permuting the time-points yielded an average of between 2.3 and 7.7 false positives across the five hybrids at an FDR cutoff of 0.01 for the test of ASE dynamics." To provide a comparison with a count-based test we used DESeq2 to test for differences in ASE levels for the hybrid with S. paradoxus. We found 2,970 genes at an FDR cutoff of 1%, slightly more than the 2,930 genes found with the frequency test. The majority of these genes (2,530) were significant for both tests. Thus, we find that our existing statistics are valid, but we agree that count-based methods could be more powerful at detecting ASE levels than the frequency-based test that we use. Our rationale for using the frequency-based tests is as follows: we found no count-based method that could detect auto-correlations, whereas the Durbin-Watson auto-correlation test is applicable to allele frequencies. We wanted to use as similar a statistical framework as possible for the two tests of levels and dynamics, and so we used allele frequency for both tests. To enable at least some means of taking into account the number of counts underlying the frequencies, we used counts as weights in both the Durbin-Watson test for ASE dynamics and the linear model test for levels.
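
The frequency-based logic described in this response can be sketched in a few lines. This is a minimal illustration only, not the authors' code: the counts are hypothetical, residuals are taken around the no-ASE value of 0.5 as an assumption, and the Durbin-Watson statistic shown is the classic unweighted form (how read counts enter the authors' weighted version is not specified here).

```python
def allele_frequencies(ref_counts, alt_counts):
    """Reference-allele frequency at each time point."""
    return [r / (r + a) for r, a in zip(ref_counts, alt_counts)]


def weighted_mean(values, weights):
    """Mean of values, weighting each time point by its total read count."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def durbin_watson(residuals):
    """Classic Durbin-Watson statistic.

    Values near 2 indicate no lag-1 autocorrelation; values well below 2
    indicate that successive residuals move together.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical read counts for one gene over seven time points.
ref = [60, 58, 55, 50, 45, 42, 40]
alt = [40, 42, 45, 50, 55, 58, 60]

freqs = allele_frequencies(ref, alt)
depth = [r + a for r, a in zip(ref, alt)]

level = weighted_mean(freqs, depth)   # average allelic ratio across the time course
resid = [f - 0.5 for f in freqs]      # deviation from the no-ASE expectation (assumed null)
dw = durbin_watson(resid)             # small value: smooth drift between alleles
```

In this made-up example the average allele frequency is 0.5, so a test of ASE levels would see nothing, while the low Durbin-Watson value flags a smooth change in allelic ratio over time, which is the kind of signal a test of ASE dynamics targets.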

      Fourth, there is only one biological replicate per hybrid, creating the risk that this one observation of the given time course may not be biologically representative. This also raises questions about how the linear model (see above) was fit without replicate data.

      For the linear model each time-point was used as a replicate measure of allele frequency. We agree that certain aspects of the data may be specific to the single time course we used. However, there are a number of reasons we believe our results are biologically representative. First, we see similar patterns in each of the three intra-specific hybrids. This is the most direct evidence that the time-courses are biologically representative. Before addressing other evidence, we'd like to point out that all the technical error in the experiment (extraction, library preparation, sequencing, etc.) is independent and statistically accounted for. Second, most but not all biological variation between time courses will affect both alleles. Such biological variation would include slight differences in rates of glucose depletion, rates of metabolism or other variations in the culture that easily affect gene expression. A greater concern is whether there is biological variation that differs between the two alleles. Stochastic noise in expression is a good example, is known to be common, and could cause allele differences to extend over time since RNA decay is not immediate. However, noise alone should not cause allele-specific differences at the population level since we measured the average across many cells and stochastic noise in expression is independent across cells. In summary, significant allele differences are unlikely to be specific to a single time course, although we recognize that the magnitude and time over which they change may differ between independent time courses. We did consider replication of the time-course. However, we found that the time-point at which glucose was depleted varied in replicate cultures by more than 15 minutes, which would have made it difficult to accurately align replicates based on glucose depletion.

      My final comments (these are not weaknesses but more discussion points) are about the analyses relating the number of sequence differences at a given gene to its strength of ASE (starting at line 120). The authors report significant associations, in line with previous studies. However, it is worth pointing out that this analysis makes an implicit assumption that there are multiple causal variants with effects in the same direction such that adding each variant would increase the ASE difference. The analyses cannot account for the case of multiple causal variants with effects in opposite directions. In this case, even a large number of variants could result in no net ASE. The authors' observation that the association between the number of variants and ASE is strongest for the most closely related strain pair (line 139) may be explained by this scenario. If there are many causal variants that cancel each other, having fewer variants in closely related strains reduces the opportunity for such cancellation. Given these considerations, it is actually somewhat surprising that there is any association between the number of variants at a gene and its ASE.

      We agree that the results of the logistic regression depend on divergence, and we have now added that the effects of multiple variants could cancel each other out: "Association between ASE and divergence may be weak or absent if most substitutions between species do not affect gene expression or if there are many substitutions that affect expression but they have random effects that cancel each other out." However, this appears to only occur in the inter-specific hybrids where the number of variants becomes so large that it becomes uninformative. Empirically we find that the increase in the probability of ASE with the number of variants is linear for the intra-specific hybrids (Figure 2-figure supplement 2). Thus, while this effect may be present it is not strong enough to eliminate the logistic regression signal from the intra-specific hybrids.

      Along similar lines, the authors' point (line 226 and end of the Discussion) that inter-species chimeras should lie between the two parental species unless there are epistatic interactions misses the possibility that there could be multiple causal variants with effects in different directions. Additive combinations of these may well create phenotypes more extreme than the parents. For example, say the distal promoter of a given gene has accumulated five variants that all increase expression by the same amount x, and the proximal promoter has accumulated four variants that each decrease expression by the same amount x. The net difference between species would be an increase of one x. A chimera that only has the five distal variants would show a difference of 5x without needing to evoke epistasis.
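
The reviewer's arithmetic can be written out as a small additive model. This is only a sketch of the hypothetical example in the comment above: the effect size `x`, the variant counts, and the function name are illustrative, and no epistasis term is included.

```python
def net_effect(variant_effects):
    """Purely additive model: expression change is the sum of variant effects."""
    return sum(variant_effects)


x = 1.0                    # arbitrary unit effect size (illustrative)
distal = [+x] * 5          # five distal variants, each increasing expression
proximal = [-x] * 4        # four proximal variants, each decreasing expression

parent_a = 0.0                                  # reference species
parent_b = net_effect(distal + proximal)        # other species: all nine variants
chimera_distal = net_effect(distal)             # chimera carrying only the distal variants
chimera_proximal = net_effect(proximal)         # chimera carrying only the proximal variants

parental_range = (min(parent_a, parent_b), max(parent_a, parent_b))
```

Both chimeras fall outside the parental range under strict additivity, which is why expression beyond the parents does not by itself demonstrate epistasis.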

      We agree. We were assuming no relationship between the effects of the alleles and their position. Upon reflection this is not a good assumption and we have revised the text accordingly:

      "Expression driven by chimeric sequences may lie within the range of the two parental species and can be used to map parental differences to the proximal or distal portion of the cis-regulatory region (Figure S8). However, chimera expression may also lie outside of the parental range if recombination brings together variants with effects in the same direction or if there are epistatic interactions between variants. Such cis-regulatory interactions are thought to be common due to binding site turnover (Zheng et al., 2011), and do not require expression divergence between the parental species."

    1. Author Response:

      Reviewer #1:

      The evolutionarily conserved Notch receptor cell-cell communication pathway is required in cell fate decisions in many vertebrate and invertebrate cells. In Drosophila, Notch controls (among others) the cell fate decision of the sensory organ precursor cell, SOP. SOPs divide asymmetrically to give rise to an anterior and a posterior cell, pIIb and pIIa, respectively, which ultimately result in the formation of a bristle. In a recent paper from the Schweisguth lab (Trylinski et al., 2017) it was shown that Notch is found both apical and basal of the midbody at the pIIa/pIIb interface during cytokinesis, and that it is mainly the basal pool of Notch that contributes to signaling.

      Houssin et al. now asked how polarity and signaling proteins involved are distributed during cytokinesis and how this distribution could impact on Notch signaling and hence fate decision. The authors show that during cytokinesis of the SOP several polarity determinants are re-distributed. Bazooka/Par3 becomes enriched at the pIIa/pIIb interface, where it occurs in nano-clusters, both apical and basal to the midbody, while aPKC remains in the apical compartment. Bazooka co-localizes with Notch, Sanpodo, Delta and Neuralised (Neur) in these clusters. In the absence of baz, both the apical and the lateral Notch-positive clusters are decreased in intensity and the number of lateral clusters is reduced at the pIIa/pIIb interface. Strikingly, this only slightly reduces the signaling activity of Notch. Formation of the Baz-Notch clusters depends on the Notch-cofactor Sanpodo: in its absence, the lateral Baz-Notch clusters do not assemble, suggesting that Sanpodo supports Notch signaling by promoting lateral clusters. From the data the authors conclude that the Notch/Baz/Spdo/Neur clusters represent the signaling units at the pIIa/pIIb interface.

      Major strengths and weaknesses

      The authors performed a very detailed analysis to further dissect how Notch signaling at the pIIa/pIIb interface is controlled. They used state-of-the-art live-cell imaging of tagged proteins in wild-type and mutant animals and applied careful statistical analyses of their data. Thereby, they provide a novel link between the role of the polarity protein Bazooka in clustering Notch, and how the particular redistribution of Bazooka/Notch in clusters on the lateral membrane during cytokinesis of the SOP organizes putative signaling hubs.

      However, in the discussion the authors fall somewhat short of substantiating their main conclusion that these clusters "represent signaling units at the pIIa/pIIb interface." (line 560). First, although in the absence of Baz the number and size of Notch clusters are decreased, Notch signaling is only slightly affected.

      Second, no suggestion for any molecular mechanism is provided as to how Baz may organize these clusters, e.g. about the molecular interaction between Baz and Spdo, both of which are required to cluster Notch.

      We have not tested experimentally the putative molecular interaction between Baz and Spdo. As also explained in the discussion, we postulate several hypotheses regarding the mode of action of Baz (e.g. positioning of Notch/Spdo clusters, exocyst receptor, physical interaction with Notch/Spdo, regulation of Serrate activity). By way of comparison, although it is accepted that Baz, by assembling into nanoscopic clusters, regulates the repositioning of Cadherin-Catenin clusters at apico-lateral sites for AJ spot assembly (McGill et al., 2009), the underlying molecular mechanisms have not yet been characterised to our knowledge. Thus, understanding the mechanism of action of Baz is a study in itself, which we believe is beyond the scope of this work.

      And finally, the fact that the clusters are similar in composition apical and basal to the midbody does not help to support (or disprove) the conclusions put forward in Trylinski et al., 2017, showing that Notch signaling mainly occurs by the lateral clusters.

      From the work published in (Trylinski and Schweisguth, 2017) and (Bellec et al, 2021) there is no question that both apical and basal pools of Notch contribute to signalling following asymmetric division of the SOP.

      The novelty of this study is to describe the function of Baz in Notch signalling, on the one hand, and, on the other hand, the function of Baz in the assembly of the Delta, Neuralized/Notch, Spdo clusters, which we hypothesised would constitute Notch signalling units, at the apical as well as the lateral interface. Our findings on Baz/Notch/Spdo clusters further support the notion that signalling can occur from both sites, albeit likely not to the same extent, as the apical pool has a short life compared with the lateral ones.

      Reviewer #2:

      Sensory organ precursor cells of the fly are a strong model system to understand the spatio-temporal regulation of Notch signalling in the context of cell fate regulation. Different signalling competent pools of Notch have been identified previously at the newly formed membrane that separates the two SOP daughters. It is unclear how for instance the Notch signalling pools are restricted to localize exclusively to this membrane.

      This study now takes a closer look at one of the Notch pools and finds that SOPs known to remodel PAR-dependent polarity at the beginning of mitosis, seem to remodel polarity once more, this time later, around anaphase when the new membrane is formed. This remodelling is evident with the assembly of intriguing Par3/Baz containing clusters that strikingly co-localize with Notch as well as other members of the Notch signalling pathway. Baz cluster formation is independent of Notch, but negatively regulated by Numb and Neuralized. Notch in turn depends on Baz to some extent to localize to the clusters. The study proposes that the Baz dependent clusters form a "snap button" type of platform to cluster Notch and facilitate directed Notch signalling, which is an interesting idea.

      The concept is relevant, especially as the dependency on PAR regulation provides an angle for future research to address the question why Notch accumulates only at the interface of pIIa/b, but not at other interfaces with other neighbours in the future. The Baz clusters are well-described and the experiments to dissect their origin, dependency and impact on Notch well-designed.

      The signalling relevance of the different Notch pools is extremely challenging to address. This has been attempted in the past by the authors and redone in this study. Despite the fact that these assays are notoriously noisy, the observed tendencies of signalling measured by nuclear Notch levels in the relevant cells support their model. Relevance of the Baz-dependent Notch pool appears to be a likely possibility. The fact that these clusters are modulated by Numb, Delta, Neuralized and Sanpodo, in contrast, strongly favours the view that the Baz clusters described here are under the control of this system and are relevant.

      The study is a little imbalanced in the use of quantification, the phenotypes appear admittedly often evident and convincing, but would need to be backed up by more thorough quantification. Clarity of figures, legends and writing could be strengthened.

      We thank the reviewer for her/his constructive comments. In the revised manuscript, we have now quantified all the experiments, and added statistical tests where they were missing. We have also taken care to amend the legends and the body of the text to clarify the points raised by the reviewer.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper aims to address the question of whether the rotational dynamics in motor cortex may be due to sensory feedback signals rather than to recurrent connections and autonomous dynamics as is typically assumed. This is indeed a question of importance in neural control of movement.

      The authors employ both analyses of motor cortical data and simulation analyses where a neural network is trained to perform a motor task. For the simulations, the authors use a neural network model of a brain performing arm control tasks. Importantly, in addition to the task goals, the brain also receives delayed sensory feedback from the muscle activity and kinematics of the simulated arm. The brain is modeled either using a stack of two recurrent neural networks (RNN) or using two non-recurrent neural network layers to investigate the importance of autonomous recurrent dynamics. The authors use this framework to simulate the brain performing two tasks: 1) posture perturbation task, where the arm is perturbed by external loads and has to return to original posture, and 2) delayed center-out reach task. In both tasks, the authors apply jPCA to units of the trained network, simulated muscle activity, and simulated kinematics and investigate their rotational dynamics. They find that when using an RNN in the brain model, both the RNN layers and kinematics show rotational dynamics but the muscle activity does not. Interestingly, these conclusions for both tasks also hold when networks without recurrent connections are used instead of the RNNs. Also importantly, the rotational dynamics also exist in the sensory feedback signals about the limb state (e.g. joint position, velocity). These results suggest that recurrent dynamics are not necessary for the emergence of rotational dynamics in population activity, rather sensory feedback can also achieve the same.

      The authors perform similar jPCA analyses on monkey motor cortical (MC) or somatosensory cortical activity during the same two tasks and find largely consistent results. As with simulations, neural population activity and kinematics show rotational dynamics but muscle activity, which is explored only in the posture task, does not. Importantly, population activity in both motor and somatosensory cortices shows rotational dynamics. This observation is more consistent with the view that rotational dynamics emerge due to inter-region communications and processing of sensory feedback and planning, rather than autonomous dynamics within the motor cortex.

      The approach of the paper is interesting and valuable and the questions being addressed are very important to the field. To further improve the paper and the analyses, there are several major comments that should be addressed to fully support the conclusions and clarify the results:

      Major:

      1) In the Methods, the authors explain how they model a non-recurrent network as follows: "We also examined networks where we removed the recurrent connections from each layer by effectively setting W_{hh}, W_{oo} to zero for the entire simulation and optimization (NO-REC networks)". However, if this is the only modification, it still leaves recurrent elements in the network. For example, if we set W_{hh} to zero, equation 2 will be:

      h_{t+1} = (1-a) h_t + a tanh(W_{sh} * s_t + b_h)

      where a is a constant scalar (seems to be equal to 0.5). This is indeed still a recurrent neural network since h_{t+1} depends on h_t. If their explanation in the Methods is accurate, then the current approach restricts the recurrent dynamics to be a specific linear dynamic (i.e. "h_{t+1} = (1-a) h_t + …") but does not fully remove them. The second layer is also similar (equation 3) and will still have recurrent linear dynamics even if W_{oo} is set to 0. To be able to describe networks as non-recurrent, the first terms in equations 2 and 3 (that is (1-a) h_t and (1-a) o_t) should also be set to 0. This is critical as an important argument in the paper is that non-recurrent networks can also produce rotational dynamics, so the networks supporting that argument must be fully non-recurrent. Perhaps the authors have already done this but just didn't explain it in the Methods, in which case they should clarify the Methods. However, if the current Method description is accurate, they should rerun their NO-REC simulations by also setting the fixed linear recurrent components (that is (1-a) h_t and (1-a) o_t) to zero as explained above to have a truly non-recurrent model.

      We thank the reviewer for raising this important concern. We have re-simulated the NO-REC network while removing the dynamics related to the leaky-integration component. This removal did not impact the network’s ability to perform the tasks and yielded virtually identical neural dynamics (see Figure 8). Throughout the Results we have updated the figures for the NO-REC network to the network without the leaky-integration component.
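
The distinction at issue in this exchange can be shown with a scalar version of the update rule quoted by the reviewer. This is a sketch under assumptions: the full form of equation 2 is taken to place the recurrent term `w_hh * h` inside the tanh, and the `keep_leak` switch, parameter names, and numerical values are hypothetical.

```python
import math


def leaky_step(h, s, w_sh, b_h, w_hh=0.0, a=0.5, keep_leak=True):
    """One update of a scalar leaky unit.

    With w_hh = 0 the tanh drive no longer depends on h, but the leak term
    (1 - a) * h still carries the previous state forward unless it is removed.
    """
    drive = math.tanh(w_hh * h + w_sh * s + b_h)
    leak = (1.0 - a) * h if keep_leak else 0.0
    return leak + a * drive


# Same sensory input, two different previous states (illustrative values).
with_leak = [leaky_step(h0, 0.3, 1.0, 0.0) for h0 in (0.0, 1.0)]
no_leak = [leaky_step(h0, 0.3, 1.0, 0.0, keep_leak=False) for h0 in (0.0, 1.0)]
```

With the leak retained, the two starting states give different outputs, so the unit still has memory; with the leak removed, the output depends only on the current input, which is the fully non-recurrent case the reviewer asked for.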

      2) Assuming my comment in 1 is addressed and the results stay similar, the authors show in simulations that even without recurrent dynamics (referred to as the NO-REC case), rotational dynamics are observed in the simulated brain during both tasks (Figure 8). This result is used to suggest that the sensory feedback is what causes the rotational dynamics in the brain model in this case. However, I think to fully demonstrate the role of feedback, additional simulations are also needed where the sensory feedback is removed from the brain model. In other words, what would happen if recurrent and non-recurrent brain models are trained to perform the tasks but are not provided with the sensory feedback (only receive task goals)? One would expect the recurrent model to still be able to perform the task and autonomously produce similar rotational dynamics (as has been shown in prior work), but the non-recurrent model to fail in doing the task well and in showing rotational dynamics. I think adding such simulations without the feedback signals would really strengthen the paper and help its message.

      We apologize if the network architecture was not clear. In the case of the NO-REC network, the only way it can generate the time-varying signals needed for the tasks is through sensory feedback. A network with neither recurrent connections nor sensory feedback simply will not work. For the posture task there are no additional inputs since the network only receives sensory feedback. For the reaching task the task-goal input is static and the GO cue turns off on a timescale considerably shorter (~20ms) than the reach duration. Thus, the REC network would always perform better than the NO-REC network when sensory feedback was removed, as the NO-REC network cannot generate any dynamics. We have now included in the Results the following statement. "Note, by removing the recurrent connections these networks can only generate time-varying outputs by exploiting the time-varying sensory inputs from the limb." (lines 345-347).

      We have also now included simulations to highlight how REC networks that receive sensory feedback are able to generalize better to scenarios with increased motor noise than REC networks where sensory feedback is either completely removed (reaching task) or only provided at the beginning of the trial (posture task) (Figure S8). Thus, sensory feedback makes REC networks more robust in less predictable scenarios.

      We agree that this could be an interesting manipulation and have now included manipulations of the sensory feedback delays. We considered three separate delays, 0ms, 50ms and 100ms, and found that the rotational frequency of the top jPC plane depended on the delay, with greater delays resulting in a general reduction in frequency (see now Supplementary Figure 10). There was less effect of delay on fit qualities to the constrained and unconstrained dynamical system. This has been added to the Results section (lines 423-446).

      We simulated this scenario and found the answer to be rather complex and we have added these results to the supplementary material. The network's behavioural performance in the perturbation posture task is similar to the previous networks with joint-based feedback. However, the dynamics in the output layer are not the same with a clear reduction in how well the dynamics are described as rotational (Figure S11A-B).

      Oddly, rotational dynamics could still be observed in the input layer dynamics (data now shown) and the kinematic signals when they were converted to a cartesian reference frame (Figure S11D-E). Furthermore, rotational dynamics could emerge in the output layer if we used a different initialization method for the network weights. We initialized weights from a uniform distribution bounded by ±1/√N, where N is the number of units. In contrast, previous studies have initialized network weights using a Gaussian distribution with standard deviation equal to g/√N, where g is a constant larger than 1. This alternative initialization scheme encourages strong intrinsic dynamics often needed for autonomous RNN models (Sussillo et al., 2015). We found networks initialized with this method and trained on the perturbation posture task exhibited stronger rotational dynamics with fits to the constrained and unconstrained dynamical systems of 0.5 and 0.88, respectively (Figure S11C-D). When examining the reaching task, we found similar results (Figure S11F-K). When initialized with a uniform distribution, fit quality for the constrained and unconstrained dynamical systems were 0.4 and 0.77, respectively (Figure S11F-G), which were smaller than for the joint-based feedback (Figure 7B, constrained R²=0.7, unconstrained R²=0.83). Qualitatively, the dynamics were different when the network was initialized with a Gaussian distribution (Figure S11H); however, fit qualities were comparable between the two initialization methods (Figure S11I). There was also a noticeable reduction in the fit quality for the kinematic signals, particularly for the constrained dynamical system (Figure S11K, constrained R²=0.36, unconstrained R²=0.77). These findings have been added to the Results.
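
The two initialization schemes contrasted above differ mainly in the spread of the starting weights. The following sketch compares them using only the standard library; the network size, the value `g = 1.5`, the seed, and the function names are illustrative assumptions and are not taken from the manuscript.

```python
import math
import random


def uniform_init(n_units, rng):
    """Weights drawn uniformly from [-1/sqrt(N), +1/sqrt(N)]."""
    bound = 1.0 / math.sqrt(n_units)
    return [rng.uniform(-bound, bound) for _ in range(n_units * n_units)]


def gaussian_init(n_units, g, rng):
    """Weights drawn from a zero-mean Gaussian with standard deviation g/sqrt(N)."""
    sd = g / math.sqrt(n_units)
    return [rng.gauss(0.0, sd) for _ in range(n_units * n_units)]


def spread(weights):
    """Empirical standard deviation of a flat list of weights."""
    mean = sum(weights) / len(weights)
    return math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))


rng = random.Random(0)   # fixed seed so the comparison is repeatable
n_units = 100            # hypothetical layer size
g = 1.5                  # illustrative gain; the text only says g is larger than 1

w_uniform = uniform_init(n_units, rng)
w_gauss = gaussian_init(n_units, g, rng)
```

A uniform distribution on ±b has standard deviation b/√3, so the uniform scheme starts with a spread of about 0.58/√N, whereas the Gaussian scheme with g > 1 starts with more than 1/√N. Larger initial recurrent weights are consistent with the stronger intrinsic dynamics the response describes.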

      3) A measure of how well each trained network is able to perform the task should be provided. For example, is the non-recurrent network able to perform the tasks as accurately as the recurrent models? The authors could use an appropriate measure, for example average displacement in the posture task and time-to-target in the center-out task, to objectively quantify task performance for each network. Another performance measure could be the first term of the loss in equation 5. Also, plots of example trials that show the task performance should be provided for the non-recurrent networks (for example by adding to Figure 8), similar to how they are shown for the recurrent models in Figures 2 and 6.

      We have now presented and quantified the NO-REC network behavioural performance. Kinematics for the NO-REC network are shown in Figure S7A-C and E-G which are comparable to the REC network. Furthermore, quantifying the maximum displacement during the posture task yielded no obvious differences between the NO-REC and REC networks (Figure S7D). For the reaching task, the time-to-target was noticeably more variable and tended to be slower for the NO-REC network (Figure S7H). These observations have been added to the Results.

      4) An important observation is that rotational dynamics also exist in the sensory signals about the limb state. This may imply that the task structure that dictates the limb state and thus the associated sensory feedback may play an important role in the rotations without the recurrent connections. While the present study will be a valuable addition regardless of what the answer is, this is an important point to address: What is the role of the task structure in producing rotational dynamics? In both the posture task and the center-out task, the task instruction instructs subjects to return to the initial movement 'state' by the end of the trial: in the posture task the simulated arm needs to return to the original posture upon disturbance, and in the center out task the arm needs to start from zero velocity and settle at the target with zero velocity. Is this structure what's causing the rotational dynamics? This is an important question both for this paper and for the field and the authors have a great simulation setup to explore it. For example, what happens if the task instructions u* instruct the arm to follow a random trajectory continuously, instead of stopping at some targets? With a simulated tracking task like this, one could eliminate obvious cases of return-to-original-state from the task. Would the network still produce rotational dynamics? Of course, I don't expect the authors to collect experimental monkey data for such new tasks, rather to just change the task instructions in their numerical simulations to explore the dependence of observed rotational dynamics on the task structure. I think this will help the message of the paper and can be very useful for the field.

      We agree that a tracking task would be an interesting manipulation and have simulated this with the REC and NO-REC networks (Figure 9). Here, we trained the network to reach from the starting position and track a target moving radially at a constant velocity for the rest of the trial (1.2 seconds). Thus, the network has to move the limb at a constant velocity. We found a consistent reduction in how well the network’s dynamics were described as rotational (constrained R2=0.13, unconstrained R2=0.3) when compared to the previous reaching task (Figure 7, constrained R2=0.7, unconstrained R2=0.83). Also, note that this reduction in rotational dynamics remained even when we initialized the network weights using a Gaussian distribution (see Essential revision 2.3). These simulations have been added to the Results section.

      5) It would be beneficial if the authors could elaborate in the discussion on intuitive explanations of why sensory feedback can produce rotational dynamics even with no internal recurrent dynamics in the brain model. To me, it seems like sensory feedback is providing a path for recurrence to exist in the overall brain-arm system, so the non-recurrent neural networks can learn to exploit that path to effectively implement some recurrent dynamics. Some intuitive explanations like this will be helpful for readers.

      Rotational dynamics emerge in sensory feedback mainly because of the phase offset between joint position and velocity: changes occur first in the velocity and then in the position (see the pendulum example in Pandarinath et al., 2018a; also DeWolf et al., 2016; Susilaradeya et al., 2019). This phase offset is maintained across reach directions and gives rise to the orderly rotational dynamics observed in the kinematic signals (DeWolf et al., 2016; Pandarinath et al., 2018a; Susilaradeya et al., 2019; Vyas et al., 2020). Furthermore, the tracking task disrupted this phase relationship and thus the rotational dynamics were substantively reduced in the network models. This text has been added to the Discussion (lines 519-526).
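
      The phase-offset argument can be illustrated with a minimal sketch. This is not the analysis used in the manuscript: it assumes an idealized pendulum-like joint whose position is a cosine and whose velocity is the derivative of that cosine, with a hypothetical 1 Hz frequency and 1 ms sampling step. It fits an unconstrained linear model dx/dt = Mx to the normalized position and velocity signals and reports how much of M is skew-symmetric, that is, rotational. All function names are illustrative.

```python
import numpy as np

def fit_linear_dynamics(X, dt):
    """Least-squares fit of dX/dt = X @ M.T; returns (M, R^2)."""
    dX = np.gradient(X, dt, axis=0)               # finite-difference derivative
    M_T, *_ = np.linalg.lstsq(X, dX, rcond=None)  # unconstrained linear fit
    pred = X @ M_T
    ss_res = np.sum((dX - pred) ** 2)
    ss_tot = np.sum((dX - dX.mean(axis=0)) ** 2)
    return M_T.T, 1.0 - ss_res / ss_tot

def skew_fraction(M):
    """Fraction of the squared Frobenius norm of M in its skew-symmetric part."""
    S = 0.5 * (M - M.T)
    return np.sum(S ** 2) / np.sum(M ** 2)

# Hypothetical pendulum-like signal: two full cycles at 1 Hz, sampled at 1 kHz.
dt = 0.001
t = np.arange(0.0, 2.0, dt)
omega = 2.0 * np.pi
position = np.cos(omega * t)
velocity = -omega * np.sin(omega * t)   # a quarter cycle out of phase with position

# Normalize each signal to zero mean and unit variance so that neither dominates.
X = np.column_stack([position, velocity])
X = (X - X.mean(axis=0)) / X.std(axis=0)

M, r2 = fit_linear_dynamics(X, dt)
frac = skew_fraction(M)
print(f"R^2 = {r2:.4f}, skew-symmetric fraction = {frac:.4f}")
```

      Under these assumptions the fitted matrix is almost purely skew-symmetric, which is the signature of rotation in the position-velocity plane. A signal without this quarter-cycle offset would not be expected to show it.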

      6) One main result in data from non-human primates is that there exist rotations also in the somatosensory cortex not just in motor cortex. A more thorough discussion of prior work on rotational dynamics or lack thereof across brain regions and behavioral tasks is important to add here. For example, besides the works cited by the authors, there are other works such as (Kao et al., 2015; Gao et al., 2016; Remington et al., 2018; Stavisky et al., 2019; Aoi et al., 2020; Sani et al., 2021) that discuss or show rotational dynamics in various brain regions and behavioral tasks and should be cited and discussed.

      We have cited the above papers and included in the Discussion the following paragraph (lines 537-549) “Importantly, findings of rotational dynamics in cortical circuits are not trivial. Activity in the supplementary motor area does not exhibit rotational dynamics during reaching (Lara et al., 2018). The hand area of MC also does not exhibit rotational dynamics during grasping-only behaviour (Suresh et al., 2020), though it does exhibit rotational dynamics during reach-to-grasp (Abbaspourazad et al., 2021; Rouse and Schieber, 2018) which may reflect the reaching component of the behaviour. More broadly there is a growing body of work characterizing cortical neural dynamics across different behavioural tasks which have revealed rotational (Abbaspourazad et al., 2021; Aoi et al., 2020; Libby and Buschman, 2021; Remington et al., 2018; Sohn et al., 2019; Stavisky et al., 2019), helical (Russo et al., 2020), stationary (Machens et al., 2010), and ramping dynamics (Finkelstein et al., 2021; Kaufman et al., 2016; Machens et al., 2010) and these dynamics appear to support various classes of computations. Thus, finding rotational dynamics across the fronto-parietal circuit in our study is not trivial."

      7) The authors state that "In contrast, rotational dynamics appear to be absent in… MC activity during grasping driven by sensory inputs (Suresh et al., 2020)." There are other papers that study dynamics during reach-grasps and still find rotational dynamics and modes (Abbaspourazad et al., 2021; Vaidya et al., 2015), and these should be cited and discussed. The recent paper on naturalistic reach-grasps (Abbaspourazad et al., 2021) also argues for the involvement of a large-scale network in these movements, which further supports the authors' interpretation that "This interpretation of motor control emphasizes that the objective of the motor system is to attain the behavioural goal and this requires feedback processed by a distributed network." A discussion of this point made in this recent paper in the intro/discussion is important. Finally, there is a recent paper that argues for the input-driven nature of motor cortex (Sauerbrei et al., 2020) and is cited/discussed by the authors but briefly and mainly in the discussion. I think given the relevance of this recent paper to the core message here, it should also be briefly discussed in the introduction to better set up the work.

      We agree with the reviewer that there are discrepancies between the motor cortical dynamics reported by Suresh et al. 2020 and Abbaspourazad et al., 2021 during grasping tasks. This discrepancy may reflect differences in the tasks: in Suresh et al. 2020 the monkeys grasped objects, whereas in Abbaspourazad et al., 2021 the monkeys had to reach for and grasp objects. Thus, rotations may reflect the reaching component of the behaviour. This has been elaborated on in the Discussion, which now reads (lines 539-542): “The hand area of MC also does not exhibit rotational dynamics during grasping-only behaviour (Suresh et al., 2020), though it does exhibit rotational dynamics during reach-to-grasp (Abbaspourazad et al., 2021; Rouse and Schieber, 2018; Vaidya et al., 2015) which may reflect the reaching component of the behaviour.”

      We have also briefly mentioned the findings of Sauerbrei et al. 2020 in the Introduction, which now reads (lines 79-81): “Lastly a recent study demonstrates that motor cortical dynamics are driven by inputs coming from motor thalamus (Sauerbrei et al., 2020).”

      Minor:

      1) The Methods are clear and comprehensive, but just to make understanding of the simulation setup easier, it would help to have a diagram of the computation graph for the recurrent and non-recurrent networks that shows their number of units, activations/nonlinearities, RNN cell type, etc., added as supplementary figure.

      We agree that this is useful and have added it to Figure 1.

      2) Again, to help more clearly convey the simulations, it would help to show the task goals (x*) that are inputs to the simulated brain for example trials in each task (for example added to Figures 2 and 6).

      We agree that this is useful and have added it to Figure 1.

      3) Similar to how VAF is shown on top of all plots of jPC planes, it would be helpful to have the rotation frequency for each jPC plane noted next to it. Currently it is difficult to find the jPC frequency associated with each plot from the text.

      We agree and have added it to the appropriate figures.

      4) I am a bit surprised by how different the null distributions are for modeling muscle activity (Figure 3F) and kinematics (Figure 3H). The null distribution is simply the R2 for a constrained or unconstrained dynamic model fit to a subsampled version of the neural activity. The only difference between the null distributions in Figure 3F and Figure 3H seems to be the downsampled dimension, which for muscle activity is 6 and for kinematics is 4 (per equation 1). Any insight will be welcome as to why down sampling the population activity to 4 (Figure 3H) results in so much worse R2 compared with down sampling it to 6 (Figure 3F)?

      We thank the reviewer for raising this concern. Originally, we had applied PCA to reduce the dimensionality of the kinematic signals from 4 dimensions to 2, and the muscle signals from 6 to 4. We realize now that to be more conservative in our significance testing, we should use the full dimensionality of the kinematic and muscle signals. As such, we have changed the figures throughout to reflect this.

      References:

      Abbaspourazad, H., Choudhury, M., Wong, Y.T., Pesaran, B., Shanechi, M.M., 2021. Multiscale low-dimensional motor cortical state dynamics predict naturalistic reach-and-grasp behavior. Nature Communications 12, 607. https://doi.org/10.1038/s41467-020-20197-x

      Aoi, M.C., Mante, V., Pillow, J.W., 2020. Prefrontal cortex exhibits multidimensional dynamic encoding during decision-making. Nature Neuroscience 1-11. https://doi.org/10.1038/s41593-020-0696-5

      Gao, Y., Archer, E.W., Paninski, L., Cunningham, J.P., 2016. Linear dynamical neural population models through nonlinear embeddings, in: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 29. Curran Associates, Inc., pp. 163-171.

      Kao, J.C., Nuyujukian, P., Ryu, S.I., Churchland, M.M., Cunningham, J.P., Shenoy, K.V., 2015. Single-trial dynamics of motor cortex and their applications to brain-machine interfaces. Nature Communications 6, 7759. https://doi.org/10.1038/ncomms8759

      Remington, E.D., Narain, D., Hosseini, E.A., Jazayeri, M., 2018. Flexible Sensorimotor Computations through Rapid Reconfiguration of Cortical Dynamics. Neuron 98, 1005-1019.e5. https://doi.org/10.1016/j.neuron.2018.05.020

      Sani, O.G., Abbaspourazad, H., Wong, Y.T., Pesaran, B., Shanechi, M.M., 2021. Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification. Nature Neuroscience 24, 140-149. https://doi.org/10.1038/s41593-020-00733-0

      Stavisky, S.D., Willett, F.R., Wilson, G.H., Murphy, B.A., Rezaii, P., Avansino, D.T., Memberg, W.D., Miller, J.P., Kirsch, R.F., Hochberg, L.R., Ajiboye, A.B., Druckmann, S., Shenoy, K.V., Henderson, J.M., 2019. Neural ensemble dynamics in dorsal motor cortex during speech in people with paralysis. eLife 8, e46015. https://doi.org/10.7554/eLife.46015

      Vaidya, M., Kording, K., Saleh, M., Takahashi, K., Hatsopoulos, N.G., 2015. Neural coordination during reach-to-grasp. Journal of Neurophysiology 114, 1827-1836. https://doi.org/10.1152/jn.00349.2015

    1. Author Response:

      Reviewer #1 (Public Review):

      There is continued speculation on the extent of within-host adaptive evolution of acutely infecting pathogens, including SARS-CoV-2 and influenza. Previous studies have found little evidence of positive selection during influenza infections of healthy adults. Here the authors examine within-host influenza dynamics in two interesting populations: children experiencing likely their first infections with H3N2, and children and adults infected with the newly emerging H1N1pdm09. The authors extend previous observations of adults infected with H3N2 to children, showing that despite potentially higher viral population sizes and/or longer infections, H3N2 largely experiences purifying selection within hosts. H1N1pdm09 infections, in some contrast, show some evidence of positive selection. The authors analyze specific substitutions in different genes, finding some evidence of CTL escape/reversion and epistasis through stabilizing mutations. Using a simple model, the investigators contend that H3N2 reaches mutation-selection equilibrium late in infections.

      This is a generally accurate and interesting analysis that enriches our understanding of within-host influenza dynamics. It is valuable to see the dynamics of (mostly) primary infections, where little antibody pressure is expected, and also some impact of the cellular immune response.

      We thank the reviewer for their careful consideration of our manuscript.

      My primary reservations concern the analysis of H1N1pdm09:

      First, the authors describe a higher rate of nonsynonymous substitutions early in infection, but the statistics backing this claim are unclear. Figure 2B shows box plots suggesting this trend, but the caption describes typically only two samples per day. In that case, it's better to plot the data points directly. Is there really statistical power to claim a significant trend over time and meaningful difference from H3N2?

      We agree that the A/H1N1pdm09 virus dataset lacks the statistical power to claim meaningful differences in temporal trends relative to A/H3N2 within-host dynamics. The only reasonable conclusion that can be drawn here is that there was a greater accumulation of nonsynonymous iSNVs relative to synonymous ones in A/H1N1pdm09 within-host virus populations. As per the reviewer’s suggestion, we have now removed the boxplots from the A/H1N1pdm09 virus panel in Figure 2B, replacing them with a scatter plot. We have also updated the manuscript to reflect our inability to characterise the within-host temporal trends for A/H1N1pdm09 viruses using this dataset:

      Line 210: “We observed higher nonsynonymous evolutionary rates relative to synonymous ones initially after symptom onset but were unable to determine if they were significantly different due to the low number of samples (i.e. median = 2 samples per day post-symptom onset). In turn, we also could not meaningfully characterise the temporal trends of within-host evolution for the pandemic virus with this dataset. Nonetheless, consolidating over all samples across all time points, there were significantly higher rates of accumulation of nonsynonymous variants in the polymerase basic 2 (PB2), polymerase acidic (PA), HA and matrix (M) gene segments (Figure 2B, Figure 2 – figure supplement 2 and Figure 3 – figure supplement 2). All gene segments also yielded NS/S ratios > 1 (Table S1).”

      Line 565: “Owing to the low number of A/H1N1pdm09 virus samples and different next-generation sequencing platforms used to sequence samples of the two virus subtypes and consequently differences in base calling error rates and depth of coverage (Figure 1 – figure supplement 1), we were unable to directly compare the observed levels of within-host genetic diversity and evolutionary dynamics between the two influenza subtypes here.”

      Second, the authors interpret individuals infected with H1N1pdm09 infections as being as naive to the virus as ~2 year olds experiencing their first H3N2 infection (ll. 352-354). Setting cellular immunity aside--- which maybe we shouldn't---at least two studies found substantial targeting of an epitope on H1N1pdm09 HA that was homologous to H1N1 HA epitopes from the late 1970s and early 1980s (Linderman et al., 2014, PNAS, and Huang et al., 2015, JCI). In other words, there likely is some adaptive immune pressure with these H1N1pdm09 infections.

      Linderman et al. (PNAS, 2014) and Huang et al. (JCI, 2015) found that individuals born prior to the early 1980s possessed antibodies that recognized HA-166K (H3 numbering) residing in the Sa antigenic site of A/H1N1pdm09 viruses. They attributed this to previous exposures to seasonal A/H1N1 viruses with the HA-166K Sa epitope. This adaptive immune response likely led to the fixation of HA-K166Q in A/H1N1pdm09 viruses, which abrogated antibody recognition of this epitope. However, this epitope was shielded by glycans in seasonal A/H1N1 viruses in 1986 due to the acquisition of a glycosylation site in HA-129. As such, individuals born after the late 1980s did not possess the same antibodies and are therefore unlikely to exert the same adaptive immune pressure as their older counterparts.

      Out of the 32 A/H1N1pdm09-infected individuals analysed in our study, only six were born before 1986. The median birth year of all individuals was 1999 (IQR = 1989, 2005). Hence, the same adaptive immune pressure on HA-166K was not present in these younger individuals during the first wave of the A/H1N1pdm09 pandemic. We also did not detect the HA-166Q variant in any of the six older individuals born prior to 1986.

      Besides HA-166K, Li et al. (JEM, 2013) also found that individuals born between 1983 and 1996 have narrowly focused antibodies against the HA-133K epitope as a result of previous exposures to seasonal A/H1N1 viruses. HA-133K has, however, remained conserved in the global A/H1N1pdm09 virus population to date. We also did not find any variants above the calling threshold in any of the individuals investigated.

      The HA protein is the primary target of the human adaptive immune response, which in turn drives its antigenic evolution (Petrova and Russell, Nat Rev Microbiol, 2018). In terms of cellular immunity, HA encodes few CTL epitopes (Woolthuis et al., Sci Rep, 2016). Most CTL epitopes are found in the nucleoprotein (NP), which we consider in our discussion of the recurrent NP-G384R variants found independently in multiple individuals.

      Finally, it is curious that mutation-selection balance is posited for H3N2 but not H1N1pdm09. Obviously there's not much real "balance" in infections that are so short, and the H1N1pdm09 infections appear shorter than H3N2. As there is likely some preexisting immunity shortening infections with H1, does this imply the mutation-selection balance story is unlikely to hold for H3N2 in older children and adults? What evolutionary dynamics can convincingly be ruled out after more careful consideration of the H1N1pdm09 temporal trends?

      As mentioned earlier, the A/H1N1pdm09 virus dataset lacks statistical power. As such, we are unable to characterise temporal trends for the pandemic virus and no longer discuss this in the updated manuscript (see response to reviewer #3 as well).

      However, the reviewer was right to point out one of our key conclusions that mutation-selection balance is only observed in naïve young children with longer A/H3N2 virus infections and would be less likely to hold for the typically shorter-lived infections of older children and adults. We have now put more emphasis on this conclusion in the abstract and discussion:

      Line 42: “For A/H3N2 viruses in young children, early infection was dominated by purifying selection. As these infections progressed, nonsynonymous variants typically increased in frequency even when within-host virus titres decreased. Unlike the short-lived infections of adults where de novo within-host variants are rare, longer infections in young children allow for the maintenance of virus diversity via mutation-selection balance creating potentially important opportunities for within-host virus evolution.”

      Line 530: “Through simulations of a within-host evolution model, we investigated the hypothesis that in the absence of any positive selection, the accumulation of nonsynonymous iSNVs was a result of their neutral or only weakly deleterious effects and the expanding within-host virion population size during later timepoints in longer infections of naïve young children such that mutation-selection balance was reached. In contrast, this balance was not detected in otherwise healthy older children or adults with short-lived influenza virus infections lasting no more than a week where de novo nonsynonymous iSNVs are rarely found 4,8–11,44.”

      Reviewer #2 (Public Review):

      At the global level, influenza evolution is characterized by positive selection and antigenic drift. While similar dynamics have been seen in chronically infected individuals, multiple studies of acute infections have been characterized by limited diversity and a lack of antigenic selection. Here the authors leverage a unique dataset of deeply sampled, longitudinal isolates from individuals whose infection lasted up to two weeks. The intermediate length of these infections helps bridge observations from studies of acute and chronically infected hosts. Additionally, the data set is comprised of endemic H3N2 isolates as well as H1N1pdm09 isolates from infections early during the 2009 pandemic. The dataset provides insight into host-level differences between emerging and endemic viruses. Although there is little evidence of within-host antigenic selection the authors do uncover a few mutations found in multiple samples at later time points. Their detailed analysis shows these may be the result of positive selection and epistatic interactions. Additionally, the study reveals increasing rates of nonsynonymous substitution over time and simulations show these trends would be expected under mutation-selection balance with most NS mutations being mildly deleterious. Nonsynonymous rates are also higher in H1N1pdm09 isolates as could be expected of a virus that is less adapted to its host.

      Disentangling biological phenomena from methodological artifacts is a challenge in any deep-sequencing, within-host study. The increase in nonsynonymous and nonsense mutations seen in later samples with high Ct is consistent with the authors' conclusions, but it is also consistent with PCR errors which are common in low titer samples. Although the authors have applied quality and depth thresholds to help mitigate against these artifacts, Figure 1 – figure supplement 2 appears to show that some variants used in the analysis were only found in 1 of the multiple overlapping amplicons. These variants are potentially PCR artifacts and may indicate other variants at similar frequencies are also artifacts. The same phenomenon might also just be a consequence of imperfect variant detection at low frequencies. It would be interesting to see if the same general trends in the estimated rates are observed if the variant-calling stringency is increased to exclude such variants. Longitudinal sampling is a key strength of this study. Observing the same mutation at different time points suggests they are unlikely to be random PCR artifacts. And the abundance of nonsynonymous mutations seen in H1N1pdm09 isolates is maintained across minor allele frequencies. In general, the major conclusions appear robust to random PCR error.

      This is a thorough study of a unique dataset that combines a cross-sectional and longitudinal analysis to uncover general trends (NS/S rates over time) and specific events (parallel evolution at later time points) that shape within-host influenza evolution. The authors support their conclusions with a diverse array of quantitative analyses (e.g. transmission-bottlenecks, within-host evolutionary rates, haplotype reconstruction). This study helps unite previous observations from acute and chronic infections and is an important step in a fuller understanding of how evolutionary forces act across biological scales.

      We thank reviewer 2 for reviewing our manuscript.

      Reviewer #3 (Public Review):

      The authors analyze deep sequencing data from H3N2 and pandemic H1N1 infections, primarily from children and young adults. The pandemic H1N1 samples came from the first year of the pandemic, just after the virus's emergence into human hosts, and the authors often had access to longitudinal samples from the same infection. The authors used within-host variants detected to estimate evolutionary rates at different times throughout the infection. They identify several instances of seemingly recurrent mutations, and they perform simulations to determine how synonymous and nonsynonymous mutations would accumulate over time given different assumptions about the distribution of fitness effects. The manuscript's findings largely reinforce prior findings about influenza's evolutionary dynamics within hosts and at transmission, though the authors analyze longitudinal samples from longer infections than in previous studies.

      We thank reviewer 3 for their thoughtful consideration of our manuscript.

    1. Author Response:

      Evaluation Summary:

      The authors measure the three-dimensional organization within an epithelial cell monolayer and find that cell neighbors change frequently along the apicobasal axis. State-of-the-art image analysis convincingly justifies correlation, though not causation, between epithelial cell packing and nuclear position. With some stronger theoretical arguments to back up the claims made, this paper will be of interest to scientists studying tissue mechanics and packing of cells in epithelial tissues.

      As we emphasize also in the title of this paper, we can explain the observed 3D cell neighbour relationships with a minimisation of the lateral cell-cell surface contact energy. It is important to distinguish this from questions regarding the 3D cell shape, which we do not address. The confusion easily arises because our theory explains the polygon type and thus the shape of cross-sectional areas, but not their size. The size of the different cross-sectional areas, however, defines the overall 3D shape of the cell.

      We show that, where present, the nuclear cross-sectional area is only slightly smaller than that of the cell, and the two measures correlate strongly (r = 0.94, Figure 5g). Referees 1 & 3 comment that this does not imply that the nucleus affects the cross-sectional area of the cell. We beg to differ here. The measured nuclear volumes are too large to allow a spherical nucleus to fit into a cylindrical cell of the measured height (as we now show explicitly in the new Figure 5h). Accordingly, to fit into the cell, the nucleus has to deform. Nuclei respond to external forces with anisotropic shape changes (Haase et al., 2016; Neelam et al., 2016), which is consistent with the elliptical nuclear shapes that we observe (Figure 5d). However, there is a limit to how much the stiff nucleus can deform (Lammerding, 2011; Shah et al., 2021), necessarily resulting in a local widening of the cell where the nucleus is present. Cell sections without a nucleus typically have smaller cross-sectional areas, leading to a higher frequency of small cross-sections in cells compared to nuclei. We have added these additional explanations and references to the manuscript to strengthen the argument.
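
      The geometric argument can be sketched as a short calculation. The numbers below are hypothetical placeholders, not the measured values behind Figure 5h: the sketch compares the diameter of a sphere of a given nuclear volume with the diameter of a cylinder that has the cell's volume and height. The function names are illustrative.

```python
import math

def sphere_diameter(volume):
    """Diameter of a sphere with the given volume."""
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)

def cylinder_diameter(volume, height):
    """Diameter of a cylinder with the given volume and height."""
    return 2.0 * math.sqrt(volume / (math.pi * height))

def spherical_nucleus_fits(nucleus_volume, cell_volume, cell_height):
    """True if an undeformed spherical nucleus fits inside a cylindrical cell."""
    d_nucleus = sphere_diameter(nucleus_volume)
    d_cell = cylinder_diameter(cell_volume, cell_height)
    return d_nucleus <= d_cell and d_nucleus <= cell_height

# Hypothetical values in cubic micrometres and micrometres (assumptions only).
nucleus_volume = 200.0
cell_volume = 500.0
cell_height = 30.0

fits = spherical_nucleus_fits(nucleus_volume, cell_volume, cell_height)
print(f"nucleus diameter: {sphere_diameter(nucleus_volume):.2f}")
print(f"cylinder diameter: {cylinder_diameter(cell_volume, cell_height):.2f}")
print(f"spherical nucleus fits: {fits}")
```

      With these placeholder values the sphere is wider than the cylinder, so the nucleus would have to deform or the cell would have to widen locally. The conclusion for real tissue depends entirely on the measured volumes and heights.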

      Nuclei in pseudostratified epithelia are well known to move continuously during the cell cycle, a phenomenon referred to as interkinetic nuclear migration (IKNM). The moving nuclei will continuously change the cross-sectional areas. In the live microscopy, the nuclei are not visible as we lack a live reporter for the nuclei. But given the strong correlation between the cross-sectional areas of nucleus and cell along the entire apical-basal axis, independent of the nuclear position and thus cell cycle phase (Figure 5g), and given the measured cross-sectional size of the nuclei, we believe that it is a safe inference that the nuclei are located at the wide parts of the cells, and these wide parts move during the cell cycle (Figure 8), consistent with IKNM.

      In summary, we defined the physical principle behind the 3D cell neighbour relationships, and our data strongly suggest that the shape and movement of the nucleus is a key driver of the cell shape changes that translate into neighbour changes, both along the apical-basal axis and over time. Future work is required to unravel the physical determinants of the 3D shape of the epithelial cells and their nuclei.

      Reviewer #1 (Public Review):

      The authors aim at characterizing the cellular organization in epithelial sheets by reconstructing the shape of lung epithelial cells from light sheet microscopy images. They find that in each imaging plane, the organization follows the laws of Lewis and Aboav-Weaire, which describe the organization of the apical surface of tightly packed cell monolayers, but that the organization can differ substantially between different planes. Equivalently, the authors observe frequent cell neighbor exchanges as the imaging plane moves from the basal to the apical side. The authors achieve a very good reconstruction of static and dynamic monolayers. The finding of frequent neighbor exchanges as one moves along the apicobasal axis can potentially change our image of epithelial monolayers, which so far mostly considers these cells to have the shape of prisms and frusta to which so-called scutoids have recently been added.

      The quantification of the packing uses the same methods as for cell packing in two dimensions, and the underlying mechanism proposed by the authors neglects contributions from the dimension along the apicobasal axis. The authors' reasoning behind the observed Aboav-Weaire's and Lewis' laws utilizes the same arguments as for the cell packing in apical layers. The differences between the cellular organization in different layers are ascribed to the position of the nuclei along the apicobasal axis. Here, the authors take correlations for causes and this discussion is missing any three-dimensional elements (except for the nucleus position). Explicitly, the authors state that the origin of the observed laws is a minimization of the lateral cell-cell surface energy in each plane. However, the cells are oblivious to the planes and the analysis should include the cell-cell interaction energy of the whole cell surface. Furthermore, the nucleus with its stiffness against deformations would need to be included in this analysis. Finally, according to the authors, the changing nucleus position along the apicobasal axis is at the origin of the neighbor exchanges. Apart from a correlation, there is no data supporting this claim.

      This assessment appears to reflect a misunderstanding. We are not analysing the determinants of the 3D cell shape, but of the 3D cell neighbour relationships. The confusion easily arises because our theory explains the polygon type and thus the shape of cross-sectional areas, but not their size. The size of the different cross-sectional areas, however, defines the overall 3D shape of the cell.

      We show that the observed cell neighbour arrangements minimise the total cell perimeter in each plane for the observed area distribution in that plane. Integrating over the entire apical-basal axis, this then minimises the lateral cell-cell contact surface energy for the given distribution of cell volumes along the apical-basal axis.

      Here, it is important to emphasize that we are NOT saying that the 3D cell geometry itself minimises the overall cell-cell contact surface energy. If that were the case, we would expect spherical cells, or if the height was enforced, cylindrical cells, or if cell-cell adhesion was also enforced, hexagonal honeycomb structures of equally-sized cells. However, because of active processes, including cell growth and division, cell volumes differ, and the stiff, moving nucleus (in conjunction with other cellular forces) further enforces irregular cell shapes. For those irregular cell shapes, the particular neighbour arrangements minimise the lateral surface energy.

      We can consider each plane separately, because experiments show that epithelia return to a mechanical equilibrium on a timescale of minutes, if not faster. Accordingly, we can expect that the packing reflects a mechanical equilibrium. The balance of forces must then hold in any cutting plane. That’s why Aboav-Weaire's and Lewis’ laws hold in each plane, and we can consider planes separately.

      Reviewer #2 (Public Review):

      Through detailed analysis of growing mouse embryonic lung explants, these authors investigate the statistical and physical relationships underlying three-dimensional cell organization in pseudostratified epithelial tissues. The authors find that tissue curvature plays a minor role in their tissue of interest, but that cell cross-sectional area and neighbour statistics conform to previously proposed geometrical 'laws' and can be explained with a minimisation of lateral cell-cell contact surface energy, which in turn follows from nuclear packing and dynamics. Overall, this work constitutes a significant investigation into the drivers of complex three-dimensional cell shapes and tissue structures, the primary aims appear to be largely supported by the data provided, and the work should be of interest to many in developmental biology.

      Thank you.

      Reviewer #3 (Public Review):

      Gómez et al. study cellular packing in epithelial tissues. The authors dissect how 2D cell packing statistics change along the apico-basal axis by examining different cross-sections parallel to the epithelial surface. They obtain 3D ex-vivo data, both fixed and live, from the developing mouse pseudostratified lung epithelium, which they analyze using 3D cell segmentation. They compare these experimental data to known topological invariants (Euler characteristic), existing phenomenological relations (Aboav-Weaire law, Lewis' law), and phenomenological relations by the authors (quadratic cell area scaling, dependence of hexagon fraction on cell area variability), which they had proposed in an earlier paper (Kokic et al., 2019).

      The authors moreover discuss changes in the 2D cell neighbor relations that occur in the lung epithelium along the apico-basal axis, which they call T1L transitions ("lateral" T1 transitions). Recent work had already discussed such T1L transitions and proposed that they can be induced by epithelial curvature. The authors of the current manuscript first tested this existing hypothesis on their experimental data on both tubular parts and tips of the developing mouse bronchioles, and they conclude that curvature cannot explain the T1L transitions they observe. However, they demonstrate that the T1L transitions in their data are strongly correlated with variations in the cross-sectional cell area and the nucleus positions in the pseudostratified epithelium.

      1. This paper will be of interest for anybody working on cell packing in tissues and the mechanics of epithelia. In the past, 2D cellular packing arrangements in epithelia have most often been studied at the apical side only, because of technical limitations. Using state-of-the-art imaging and image analysis techniques, this manuscript goes a step further and studies how the 2D cellular packing changes along the apico-basal axis. The only other papers that I am aware of that have started to address this question are Gómez-Gálvez et al., Nat. Comm., 2018, which the authors cite, and Rupprecht et al., MBoC, 2017, which first discussed apico-basal changes in cell neighbor relations as far as I know. Hence, being among the first papers to address this question, the current manuscript would be of interest to the community.

      Thank you. We are now citing Rupprecht et al., MBoC, 2017 in the revised version. We apologise to the authors for the oversight of not including this paper in the original version.

      1. A major conclusion, that T1L transitions are correlated with changes in the cross-sectional cell area and the nucleus positions, is well supported by the data. While the authors seem to claim causation here, this is not backed by the experimental data presented.

      Reviewer 1 had similar concerns, and we have expanded our statement in the section “Changes in cross-sectional area as a result of interkinetic nuclear migration (IKNM)” to read: "Where present, the nuclear cross-sectional areas are only slightly smaller than those of the entire cell, and the cross-sectional areas of the cell and the nucleus are strongly correlated (r = 0.94, Figure 5g). The strong correlation can be accounted for by the opposing actions of cells and nuclei in the columnar epithelium. The nuclear volumes are too large to allow for a spherical nucleus to fit into a cylindrical cell of the measured height (Figure 5h). Accordingly, to fit into the cell, the nucleus necessarily has to deform. Nuclei respond to external forces with anisotropic shape changes (Haase et al., 2016; Neelam et al., 2016), which is consistent with the elliptical nuclear shapes that we observe (Figure 5d). However, there is a limit to how much the stiff nucleus can deform (Lammerding, 2011; Shah et al., 2021), resulting in a local widening of the cell where the nucleus is present. Cell sections without nucleus typically have smaller cross-sectional areas, thereby leading to a higher frequency of small cross-sections in cells compared to nuclei." While we, of course, agree that correlation does not imply causation, we believe that our various data taken together strongly indicate that anything but causation is unlikely.

      1. The topological and phenomenological relations discussed seem to be reflected by the data. However, estimations of uncertainties would be required to better judge this point (e.g. in Fig. 2 e,f).

      Aboav-Weaire’s and Lewis’ laws are phenomenological and hold only approximately. Deviations of the actual data from the lines are therefore expected. We have described these two relationships, and how much they deviate from the original simple straight lines, in considerable detail in earlier publications [Vetter et al., bioRxiv 2019; Kokic et al., bioRxiv 2019]. In Fig. 2e,f, we plot the phenomenological relationships only for reference; the data are not expected to approach them exactly with shrinking uncertainty bounds. Nevertheless, we agree that error estimates help judge the statistical significance with which we find the data to deviate from the simple phenomenological laws. We have added error bars (SEM) to all data points in Fig. 2f. In Fig. 2e, they are omitted because they are smaller than the symbols.
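
      As a minimal sketch of how SEM error bars of this kind can be computed for one data point, the snippet below takes the sample standard deviation divided by the square root of the sample size. The function name and the example values are hypothetical and are not taken from the manuscript's data or code.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for at least two values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the SEM")
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical normalised areas of cells that share one neighbour number.
areas = [0.9, 1.1, 1.0, 1.2, 0.8]
mean, sem = mean_and_sem(areas)
```

      With many cells per neighbour number the SEM shrinks with the square root of the sample size, which is why error bars can become smaller than the plot symbols.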

      1. In lines 165-197, the authors discuss to what extent the observed numbers of T1L transitions per cell can be consistent with curvature-induced transitions as discussed earlier (e.g. Gómez-Gálvez et al.). To this end, they also use theoretical predictions derived in the Supplemental Material (SM). Unfortunately, there appear to be several problems with the derivation in the SM.

      We thank the referee for their careful checking of our theory, which allowed us to improve the quality of the supplementary material. We are resolving all points below.

      a) Most importantly, in Eq. S5, epsilon is derived for the situation displayed in Fig. S1b. However, after a T1L transition on the blue line, the formula will qualitatively change (e.g. the "1+" should go from the numerator to the denominator, and the meaning of n changes as the cells abutting the blue line are now the other two cells). Hence, computing the derivative in Eq. S6 to see how epsilon changes during a T1 transition seems highly problematic.

      The particular aspect addressed here appears to be a misunderstanding. Our theory relates the fold-change of tissue curvature between two consecutive T1L transitions to the local neighbor number n that the cell has in between them. Thus, between two radii R1 and R2 where two consecutive T1L transitions occur, n is constant. The derivative dɛ/dn is to be interpreted as the infinitesimal continuation of the difference in aspect ratio between (regular) polygons that differ by one edge. This corresponds to the region between two T1L transitions. The misunderstanding might have originated from us not mentioning this perspective clearly enough in the SM text. In the revised version of the SM, we have expanded the discussion by adding a new paragraph (last paragraph on p.2), and by rephrasing the text relating to the corresponding equations.

      b) The angle integral, first formula on the second page of the SM, appears to evaluate to exactly zero, which is different from what the authors obtain.

      Thank you for spotting this. Indeed, we have been imprecise at this point in the derivation. As our theory quantifies the curvature change between T1L transitions irrespective of their sign, we meant to average the absolute value of dɛ/dr over all possible orientations. Doing this then yields the non-zero integral value as we had it in the original SM. The integrand in the angle integral was missing the magnitude bars. We have fixed this in the revised version, carrying through the absolute value in the entire derivation after the integration. The consequence of this correction is that the final equation that we arrive at (now Eq. S9 in the SM) is now symmetric with respect to R1 <-> R2, as it, of course, should be. The blue curve that we plotted in Fig. 3g of the main article, and with it our conclusion regarding the impact of curvature, are unaffected by this correction.
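
      The difference between averaging a signed quantity and averaging its magnitude over all orientations can be illustrated numerically. The sketch below uses cos(2θ) as a stand-in integrand; this is an assumption chosen only because it changes sign with orientation, and it is not the integrand from the SM.

```python
import math


def orientation_average(f, samples=100_000):
    """Average f(theta) over theta in [0, 2*pi) using the midpoint rule."""
    step = 2 * math.pi / samples
    return sum(f((k + 0.5) * step) for k in range(samples)) / samples


# Hypothetical integrand: positive and negative contributions cancel.
signed = orientation_average(lambda t: math.cos(2 * t))

# Taking the magnitude first removes the cancellation.
magnitude = orientation_average(lambda t: abs(math.cos(2 * t)))
```

      For this stand-in, the signed average vanishes while the average magnitude approaches 2/π, which is the kind of discrepancy the magnitude bars resolve.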

      Unfortunately, these two points cast strong doubt on the predicted formula in Eq. S8, Fig. S2, blue curve in Fig. 3g. As a consequence, the conclusions drawn in lines 165-197 of the main text are not sufficiently convincing.

      1. The authors compare to predictions from their earlier preprint (Kokic et al., 2019), where they say that Lewis' law is replaced by a quadratic dependency of cell area on cell neighbor number when the cell area fluctuations become large. However, it is not clear whether this transition between linear and quadratic prediction is smooth or discontinuous. Moreover, the magnitude of cell area fluctuations where the transition is expected to occur seems unclear. In these aspects, the theoretical prediction seems elusive, which makes it harder to critically compare it to the experimental data.

      As we showed in our previous preprint (Kokic et al., 2019) using simulations, the transition is continuous until the cell area variability is high enough to support the quadratic relationship. Accordingly, the individual apical samples only approximate the linear Lewis’ law (Figure 1h). Figure 1k shows by how little the apical area variability has to increase to support a quadratic relationship (comparison between black points and all other points).

      Minor comments / potential sources of confusion for readers:

      1. There seems to be a typo in Eq. (2). What is likely meant is m_n = 5 + 8/n.

      Thanks for spotting the typo - we have corrected this.

      1. It is unclear how the "T1L transitions per cell" are counted (e.g. when talking about "up to 14 cell neighbour changes per cell" on line 154, in lines 165-197, or in Figs. 3d, 8b). Do the authors refer to the number of T1L transitions divided by the number of cells, or the average number of times a cell is involved in a T1L transition? The latter number should be at least four times the former, because at least four cells are typically involved in a single T1L transition. The caption to Fig. 3d suggests that the former is meant, while lines 304-307 in the discussion suggest the latter is meant.

      T1L transitions per cell refers to the number of times a single cell changes a neighbour relationship along the apical-basal axis. We analysed this for each cell in the epithelium individually, and in Figure 3d, we report the fraction of cells with n neighbour changes along their apical-basal axis. A neighbour change for a cell will, of course, also imply a neighbour change for neighbouring cells, which is counted when analysing that particular neighbouring cell.
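
      The per-cell counting described above can be sketched as follows. The data layout (one set of neighbour labels per plane, ordered from apical to basal) and the rule that every gained or lost contact counts as one change are assumptions made for illustration; the toy data are hypothetical and not geometrically realistic.

```python
from collections import Counter


def neighbour_changes(neighbours_by_plane):
    """Count contacts gained or lost between consecutive planes for one cell."""
    changes = 0
    for upper, lower in zip(neighbours_by_plane, neighbours_by_plane[1:]):
        # Symmetric difference = contacts gained plus contacts lost.
        changes += len(upper ^ lower)
    return changes


def fraction_by_change_count(cells):
    """Map each number of neighbour changes to the fraction of cells showing it."""
    counts = Counter(neighbour_changes(planes) for planes in cells.values())
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())}


# Hypothetical toy tissue: cell label -> neighbour sets from apical to basal.
cells = {
    "a": [{"b", "c", "d"}, {"b", "c", "d"}, {"b", "c"}],
    "b": [{"a", "c"}, {"a", "c", "d"}, {"a", "d"}],
    "c": [{"a", "b", "d"}, {"a", "b", "d"}, {"a", "d"}],
    "d": [{"a", "c"}, {"a", "b", "c"}, {"b", "c"}],
}
fractions = fraction_by_change_count(cells)
```

      Each change is counted once for every cell it affects, so a contact lost between cells a and d adds one change to the tally of a and one to the tally of d.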

    1. Author Response:

      Reviewer #1:

      This is a very interesting study that examines the neural processes underlying age-related changes in the ability to prioritize memory for value information. The behavioral results show that older subjects are better able to learn which information is valuable (i.e., more frequently presented) and are better at using value to prioritize memory. Importantly, prioritizing memory for high-value items is accompanied by stronger neural responses in the lateral PFC, and these responses mediate the effects of age on memory.

      Strengths of this paper are the large sample size and the clever learning tasks. The results provide interesting insights into potential neurodevelopmental changes underlying the prioritization of memory.

      There are also a few weaknesses:

      First, the effects of age on repetition suppression in the parahippocampal cortex are relatively modest. It is not clear why repetition suppression effects should only be estimated using the first and last but not all presentations. The consideration of linear and quadratic effects of repetition number could provide a more reliable estimate and provide insights into age-related differences in the dynamics of frequency learning across multiple repetitions.

      Thank you for this helpful suggestion. As recommended, we have now computed neural activation within our parahippocampal region of interest not just for the first and last appearance of each item during frequency learning, but for all appearances. Specifically, we extended the repetition suppression analysis described in the manuscript to include all image repetitions (p. 36 - 37). Our new methods description reads:

      “For each stimulus in the high-frequency condition, we examined repetition suppression by measuring activation within a parahippocampal ROI during the presentation of each item during frequency-learning. We defined our ROI by taking the peak voxel (x = 30, y = -39, z = -15) from the group-level first > last item appearance contrast for high-frequency items during frequency-learning and drawing a 5 mm sphere around it. This voxel was located in the right parahippocampal cortex, though we observed widespread and largely symmetric activation in bilateral parahippocampal cortex. To encompass both left and right parahippocampal cortex within our ROI, we mirrored the peak voxel sphere. For each participant, we modeled the neural response to each appearance of each item using the Least Squares-Separate approach (Mumford et al., 2014). Each first-level model included a regressor for the trial of interest, as well as separate regressors for the onsets of all other items, grouped by repetition number (e.g., a regressor for item onsets on their first appearance, a regressor for item onsets on their second appearance, etc.). Values that fell outside five standard deviations from the mean level of neural activation across all subjects and repetitions were excluded from subsequent analyses (18 out of 10,320 values; 0.17% of observations). In addition to examining neural activation as a function of stimulus repetition, we also computed an index of repetition suppression for each high-frequency item by computing the difference in mean beta values within our ROI on its first and last appearance.”
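
      The exclusion rule and the repetition suppression index in this description can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function names and the beta values are hypothetical, the first-level modelling is not shown, and the sample standard deviation is assumed for the five-SD cutoff.

```python
import statistics


def exclude_outliers(betas, n_sd=5.0):
    """Keep values within n_sd standard deviations of the pooled mean."""
    mean = statistics.fmean(betas)
    sd = statistics.stdev(betas)
    return [b for b in betas if abs(b - mean) <= n_sd * sd]


def repetition_suppression_index(betas_by_repetition):
    """Beta on the first appearance minus beta on the last appearance."""
    return betas_by_repetition[0] - betas_by_repetition[-1]


# Hypothetical pooled betas across subjects and repetitions, plus one extreme value.
pooled = [0.4, 0.6] * 20 + [50.0]
kept = exclude_outliers(pooled)

# Hypothetical betas for one item across its five appearances.
index = repetition_suppression_index([0.8, 0.6, 0.5, 0.45, 0.4])
```

      A positive index indicates lower activation on the last appearance than on the first, i.e. repetition suppression.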

      As suggested, we ran a mixed effects model examining the influence of linear and quadratic age and linear and quadratic repetition number on neural activation. In line with our whole-brain analysis, we observed a robust effect of linear and quadratic repetition number, suggesting that neural activation decreased non-linearly across stimulus repetitions. In addition, we observed significant interactions between our age and repetition number terms, suggesting that repetition suppression increased into early adulthood. Thus, although the relation we observed between age and repetition suppression is modest, the results from our new analyses suggest it is robust. Because these results largely aligned with the pattern of age-related change we observed in our analysis of repetition suppression indices, we continued to use that compressed metric in subsequent analyses looking at relations with behavior. However, we have updated our results section to include the full analysis taking into account all item repetitions, as suggested. Our updated manuscript now reads (p. 9):

      “We next examined whether repetition suppression in the parahippocampal cortex changed with age. We defined a parahippocampal region of interest (ROI) by drawing a 5 mm sphere around the peak voxel from the group-level first > last appearance contrast (x = 30, y = -39, z = -15), and mirrored it to encompass both right and left parahippocampal cortex (Figure 2C). For each participant, we modeled the neural response to each appearance of each high-frequency item. We then examined how neural activation changed as a function of repetition number and age. To account for non-linear effects of repetition number, we included linear and quadratic repetition number terms. In line with our whole-brain analysis, we observed a main effect of repetition number, F(1, 5016.0) = 30.64, p < .001, indicating that neural activation within the parahippocampal ROI decreased across repetitions. Further, we observed a main effect of quadratic repetition number, F(1, 9881.0) = 7.47, p = .006, indicating that the reduction in neural activity was greatest across earlier repetitions (Fig 3A). Importantly, the influence of repetition number on neural activation varied with both linear age, F(1, 7267.5) = 7.2, p = .007, and quadratic age, F(1, 7260.8) = 6.9, p = .009. Finally, we also observed interactions between quadratic repetition number and both linear and quadratic age (ps < .026). These age-related differences suggest that repetition suppression was greatest in adulthood, with the steepest increases occurring from late adolescence to early adulthood (Figure 3).”

      “For each participant for each item, we also computed a “repetition suppression index” by taking the difference in mean beta values within our ROI on each item’s first and last appearance (Ward et al., 2013). These indices demonstrated a similar pattern of age-related variance: we found that the reduction of neural activity from the first to last appearance of the items varied positively with linear age, F(1, 78.32) = 3.97, p = .05, and negatively with quadratic age, F(1, 77.55) = 4.8, p = .031 (Figure 3B). Taken together, our behavioral and neural results suggest that sensitivity to the repetition of items in the environment was prevalent from childhood to adulthood but increased with age.”

      In addition, in the main text on p. 10, we have now included the suggested scatter plot (see new Fig. 3B, below) as well as a modified version of our previous figure S2 to show neural activation across all repetitions in the parahippocampal cortex (see new Fig 3A). We thank the reviewer for this helpful suggestion, as we believe these new figures much more clearly illustrate the repetition suppression effects we observed during frequency learning.

      Fig 3. (A) Neural activation within a bilateral parahippocampal cortex ROI decreased across stimulus repetitions both linearly, F(1, 5015.9) = 30.64, p < .001, and quadratically, F(1, 9881.0) = 7.47, p = .006. Repetition suppression increased with linear age, F(1, 7267.5) = 7.2, p = .007, and quadratic age, F(1, 7260.8) = 6.9, p = .009. The horizontal black lines indicate median neural activation values. The lower and upper edges of the boxes indicate the first and third quartiles of the grouped data, and the vertical lines extend to the smallest value no further than 1.5 times the interquartile range. Grey dots indicate data points outside those values. (B) The decrease in neural activation in the bilateral PHC ROI from the first to fifth repetition of each item also increased with both linear age, F(1, 78.32) = 3.97, p = .05, and quadratic age, F(1, 77.55) = 4.8, p = .031.

      Second, the behavioral data show effects of age on both initial frequency learning and the effects of item frequency on memory. It is not clear whether the behavioral findings reflect the effects of age on the ability to use value information to prioritize memory or simply better initial learning of value-related information in older subjects.

      Thank you for raising this important point. Indeed, one of our main findings is that older participants are better both at learning the structure of their environments and also at using structured knowledge to strategically prioritize memory. In our original manuscript, we described results of a model that included participants’ explicit frequency reports as a predictor of memory. Model comparison revealed that participants’ frequency reports — which we interpret as reflecting their beliefs about the structure of the environment — predicted memory more strongly than the item’s true frequency. In other words, participants’ beliefs about the structure of the environment (even if incorrect) more strongly influenced their memory encoding than the true structure of the environment. Critically, however, frequency reports interacted with age to predict memory (Fig 8). Even when we accounted for age-related differences in knowledge of the structure of the environment, older participants demonstrated a stronger influence of frequency on memory, suggesting they were better able to use their beliefs to control subsequent associative encoding. We have now clarified our interpretation of this model in our discussion on p. 23:

      “Importantly, though we observed age-related differences in participants’ learning of the structure of their environment, the strengthening of the relation between frequency reports and associative memory with increasing age suggests that age differences in learning cannot fully account for age differences in value-guided memory. Even when accounting for individual differences in participants’ explicit knowledge of the structure of the environment, older participants demonstrated a stronger relation between their beliefs about item frequency and associative memory, suggesting that they used their beliefs to guide memory to a greater degree than younger participants.”

      As noted by the reviewer, however, our initial memory analysis did not account for age-related differences in participants’ initial, online learning of item frequency, and our neural analyses further did not account for age differences in explicit frequency reports. We have now run additional control analyses to account for the potential influence of individual differences in frequency learning on associative memory. Specifically, for each participant, we computed three metrics: 1.) their overall accuracy during frequency-learning, 2.) their overall accuracy for the last presentation of each item during frequency-learning (as suggested by Reviewer 2), and 3.) the mean magnitude of the error in their frequency reports. We then included these metrics as covariates in our memory analyses.

      When we include these control variables in our model, we continue to observe a robust effect of frequency condition (p < .001) as well as robust interactions between frequency condition and linear and quadratic age (ps < .003) on associative memory accuracy. We also observed a main effect of frequency error magnitude on memory accuracy (p < .001). Here, however, we no longer observe main effects of age or quadratic age on overall memory accuracy. Given the relation we observed between frequency error magnitudes and age, the results from this model suggest that there may be age-related improvements in overall memory that influence both memory for associations as well as learning of and memory for item frequencies. The fact that age no longer relates to overall memory when controlling for frequency error magnitudes suggests that age-related variance in memory for item frequencies and memory for associations are strongly related within individuals. Importantly, however, age-related variance in memory for item frequencies did not explain age-related variance in the influence of frequency condition on associative memory, suggesting that there are developmental differences in the use of knowledge of environmental structure to prioritize valuable information in memory that persist even when controlling for age-related differences in initial learning of environmental regularities. Given the importance of this analysis in elucidating the relation between the learning of environmental structure and value-guided memory, we have now updated the results in the main text of our manuscript to include them. Specifically, on p. 13, we now write:

      “Because we observed age-related differences in participants’ online learning of item frequencies and in their explicit frequency reports, we further examined whether these age differences in initial learning could account for the age differences we observed in associative memory. To do so, we ran an additional model in which we included each participant’s mean frequency learning accuracy, mean frequency learning accuracy on the last repetition of each item, and explicit report error magnitude as covariates. Here, explicit report error magnitude predicted overall memory performance, χ2(1) = 13.05, p < .001, and we did not observe main effects of age or quadratic age on memory performance (ps > .20). However, we continued to observe a main effect of frequency condition, χ2(1) = 19.65, p < .001, as well as significant interactions between frequency condition and both linear age, χ2(1) = 10.59, p = .001, and quadratic age, χ2(1) = 9.15, p = .002. Thus, while age differences in initial learning related to overall memory performance, they did not account for age differences in the use of environmental regularities to strategically prioritize memory for valuable information.”

      In addition, as suggested by the reviewer, we also included the three covariates as control variables in our mediation analysis. When controlling for online frequency learning and explicit frequency report errors, PFC activity continued to mediate the relation between age and memory difference scores. We have now included these results on p. 16 - 17 of the main text:

      “Further, when we included quadratic age, WASI scores, online frequency learning accuracy, online frequency learning accuracy on the final repetition of each item, and mean explicit frequency report error magnitudes as control variables in the mediation analysis, PFC activation continued to mediate the relation between linear age and memory difference scores (standardized indirect effect: .56, 95% confidence interval: [.06, 1.35], p = .023; standardized direct effect: 1.75, 95% confidence interval: [.12, 3.38], p = .034).”

      We also refer to these analyses when we interpret our findings in our discussion. On p. 23, we write:

      “In addition, we continued to observe a robust interaction between age and frequency condition on associative memory, even when controlling for age-related change in the accuracy of both online frequency learning and explicit frequency reports. Thus, though we observed age differences in the learning of environmental regularities and in their influence on subsequent associative memory encoding, our developmental memory effects cannot be fully explained by differences in initial learning.”

      We thank the reviewer for this constructive suggestion, as we believe these control analyses strengthen our interpretation of age differences in both the learning and use of environmental regularities to prioritize memory.

      Reviewer #2:

      Nussenbaum and Hartley provide novel neurobehavioral evidence of how individuals differentially use incrementally acquired information to guide goal-relevant memory encoding, highlighting roles for the medial temporal lobe during frequency learning, and the lateral prefrontal cortex for value-guided encoding/retrieval. This provides a novel behavioral phenomenology that gives great insight into the processes guiding adaptive memory formation based on prior experience. However, there were a few weaknesses throughout the paper that undermined an overall mechanistic understanding of the processes.

      First, there was a lack of anatomical specificity in the discussion and interpretation of both prefrontal and striatal targets, as there is great heterogeneity across these regions that would imply very different behavioral processes.

      We agree with the reviewer that our introduction and discussion would benefit from more anatomical granularity, and we did indeed have a priori predictions about more specific neural regions that might be involved in our task.

      First, we expected that both the ventral and dorsal striatum might be responsive to stimulus value across our age range. Prior work has suggested that activity in the ventral striatum often correlates with the intrinsic value of a stimulus, whereas activity in the dorsal striatum may reflect goal-directed action values (Liljeholm & O’Doherty, 2012). In our task, we expected that high-frequency items may acquire intrinsic value during frequency-learning that is then reflected in the striatal response to these items during encoding. However, because participants were not rewarded when they encountered these images, but rather incentivized to encode associations involving them, we hypothesized that the dorsal striatum may represent the value of the ‘action’ of remembering each pair. In line with this prediction, the dorsal striatum, and the caudate in particular, have also been shown to be engaged during value-guided cognitive control (Hikosaka et al., 2014; Insel et al., 2017).

      We have now revised our introduction to include greater specificity in our anatomical predictions on p. 3:

      “When individuals need to remember information associated with previously encountered stimuli (e.g., the grocery store aisle where an ingredient is located), frequency knowledge may be instantiated as value signals, engaging regions along the mesolimbic dopamine pathway that have been implicated in reward anticipation and the encoding of stimulus and action values. These areas include the ventral tegmental area (VTA) and the ventral and dorsal striatum (Adcock et al., 2006; Liljeholm & O’Doherty, 2012; Shigemune et al., 2014).”

      Though we initially predicted that encoding of high-value information would be associated with increased activation in both the ventral and dorsal striatum, the activation we observed was largely within the dorsal striatum, and specifically, the caudate. We have now revised our discussion accordingly on p. 26:

      “Though we initially hypothesized that both the ventral and dorsal striatum may be involved in encoding of high-value information, the activation we observed was largely within the dorsal striatum, a region that may reflect the value of goal-directed actions (Liljeholm & O’Doherty, 2012). In our task, rather than each stimulus acquiring intrinsic value during frequency-learning, participants may have represented the value of the ‘action’ of remembering each pair during encoding.”

      Second, while the ventromedial PFC often reflects value, given the control demands of our task, we expected to see greater activity in the dorsolateral PFC, which is often engaged in tasks that require the implementation of cognitive control (Botvinick & Braver, 2015). Thus, we hypothesized that individuals would show increased activation in the dlPFC during encoding of high- vs. low-value information, and that this activation would vary as a function of age. We have now clarified this hypothesis on p. 3:

      “Value responses in the striatum may signal the need for increased engagement of the dorsolateral prefrontal cortex (dlPFC) (Botvinick & Braver, 2015), which supports the implementation of strategic control.”

      In our discussion, we review disparate findings in the developmental literature and discuss factors that may contribute to these differences across studies. For example, in our discussion of Davidow et al. (2016), we highlight differences between their task design and the present study, focusing on how their task involved immediate receipt of reward at the time of encoding, while our task incentivized memory accuracy. We further note that studies that involve reward delivery at the time of encoding may engage different neural pathways than those that promote goal-directed encoding. Beyond Davidow et al. (2016), there are no other neuroimaging studies that examine the influence of reward on memory across development. Thus, we cannot relate our present neural findings to prior work on the development of value-guided memory. As we note in our discussion (p. 28), “Further work is needed to characterize both the influence of different types of reward signals on memory across development, as well as the development of the neural pathways that underlie age-related change in behavior.”

      Second, age-related differences in neural activation emerged both during the initial frequency learning as well as during memory-guided adaptive encoding. While data from this initial phase was used to unpack the behavioral relationships on adaptive memory, a major weakness of the paper was not connecting these measures to neural activity during memory encoding/retrieval. This would be especially relevant given that both implicit and explicit measures of frequency predicted subsequent performance, but it is unclear which of these measures was guiding lateral PFC and caudate responses.

      Thank you for this valuable suggestion. We agree that it would be interesting to link frequency-learning behavior to neural activity at encoding. As such, we have now conducted additional analyses to explore these relations.

      In the original version of our manuscript, we examined behavior at the item level through mixed-effects models, and neural activation during encoding at the participant level. Thus, to examine the relation between frequency-learning metrics and neural activation at encoding, we created two additional participant-level metrics. For each participant we computed their average repetition suppression index, and a measure of frequency distance. The average repetition suppression index reflects the overall extent to which the participant demonstrated repetition suppression in response to the fifth presentation of the high-frequency items, and is computed by averaging each participant’s repetition suppression indices across items. We hypothesized that participants who demonstrated the greatest degree of repetition suppression might be the most sensitive to the difference between the 1- and 5-frequency items, and therefore show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information. The frequency distance metric reflects the average distance between participants’ explicit frequency reports for items that appeared once and items that appeared five times, and is computed by averaging their explicit frequency reports for items in each frequency condition, and then subtracting the average reports in the low-frequency condition from those in the high-frequency condition. We hypothesized that participants with the largest frequency distances might similarly be the most sensitive to the difference between the 1- and 5-frequency items, and therefore show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information.
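
As an illustration only, and not the analysis code used for the manuscript, the two participant-level metrics described above could be computed along these lines. The function name, the input layout (a list of per-item indices and a list of true/reported frequency pairs), and the example values are all hypothetical.

```python
from statistics import mean


def participant_metrics(rs_indices, reports):
    """Return (mean repetition suppression index, frequency distance).

    rs_indices: per-item repetition suppression indices for one participant.
    reports: (true_frequency, reported_frequency) pairs, where
             true_frequency is 1 (low) or 5 (high).
    """
    mean_rs = mean(rs_indices)
    low = [reported for true, reported in reports if true == 1]
    high = [reported for true, reported in reports if true == 5]
    # Frequency distance: mean high-frequency report minus mean low-frequency report.
    freq_distance = mean(high) - mean(low)
    return mean_rs, freq_distance


# Hypothetical participant with three items' indices and four frequency reports.
rs = [0.5, 0.25, 0.75]
reports = [(1, 1), (1, 2), (5, 4), (5, 6)]
mean_rs, dist = participant_metrics(rs, reports)
```

A larger `dist` means the participant reported the two frequency conditions as further apart.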

      We first wanted to confirm that the relations we observed between repetition suppression, frequency reports, and age could also be observed at the participant level. In line with our prior behavioral analyses, we found that age related to both mean repetition suppression indices (marginally; linear age: p = .067; quadratic age: p = .042) and frequency distances (linear and quadratic age: ps < .001).

      In addition, we further tested whether these two metrics related to memory performance. In contrast to our item-level findings, we did not observe a significant relation between repetition suppression indices and memory (p = .83). We did observe an effect of frequency distance on memory performance. Specifically, we observed significant interactions between frequency distance and age (p = .014) and frequency distance and quadratic age (p = .021) on memory difference scores, such that the influence of frequency distance on memory difference scores increased with increasing age from childhood to adolescence.

      We next examined how mean repetition suppression indices and frequency distances related to differential neural activation during encoding of high- and low-value pairs. In line with our memory findings, we did not observe any significant relations between mean repetition suppression indices and neural activation in the caudate or prefrontal cortex during encoding (ps > .15).

      Frequency distance did not relate to caudate activation during encoding, nor did we observe a frequency distance x age interaction effect (ps > .16). Frequency distance did, however, relate to differential PFC activation during encoding of high- vs. low-value pairs. Specifically, we observed a main effect of frequency distance on PFC activation (p = .0012), such that participants whose explicit reports of item frequency were, on average, more distinct across frequency conditions demonstrated increased PFC activation during encoding of pairs involving high- vs. low-frequency items. Interestingly, when we included frequency distance in our model, we no longer observed a significant effect of age on differential PFC activation, nor did we observe a significant frequency distance x age interaction (ps > .13). These findings suggest that PFC activation during encoding may have, in part, reflected participants’ beliefs about the structure of the environment, with participants demonstrating stronger differential engagement of control processes across conditions when their representations of the conditions themselves were more distinct.

      Finally, we examined how age, frequency distance, and PFC activation related to memory difference scores. Here, even when controlling for both frequency distance and PFC activation, we continued to observe main effects of age and quadratic age on memory difference scores (linear age: p = .006; quadratic age: p = .001). In line with our analysis of the relation between frequency reports and memory, these results suggest that age-related variance in value-guided memory may depend on both knowledge of the structure of the environment and use of that knowledge to effectively control encoding.

      We have now added these results to our manuscript on p. 13 - 14. We write:

      “Given the relations we observed between memory and both repetition suppression and frequency reports, we examined whether they related to neural activation in both our caudate and PFC ROI during encoding. To do so, we computed each participant’s average repetition suppression index, and their “frequency distance,” or the average difference in their explicit reports for items in the high- and low-frequency conditions. We expected that participants with greater average repetition suppression indices and greater frequency distances represented the high- and low-frequency items as more distinct from one another and therefore would show greater differences in neural activation at encoding across frequency conditions. In line with our prior analyses, both metrics varied with age (though repetition suppression only marginally (linear age: p = .067; quadratic age: p = .042); Appendix 3 Tables 22 and 25), suggesting that older participants demonstrated better learning of the structure of the environment. We ran linear regressions examining the relations between each metric, age, and their interaction on neural activation in both the caudate and PFC. We observed no significant effects or interactions of average repetition suppression indices on neural activation (ps > .15; Appendix 3 Tables 23 and 24). We did, however, observe a significant effect of frequency distance on PFC activation (β = .42, SE = .12, p = .0012), such that participants who believed that average frequencies of the high- and low-frequency items were further apart also demonstrated greater PFC activation during encoding of pairs with high- vs. low-frequency items. Here, we did not observe a significant effect of age on PFC activation (β = -.03, SE = .13, p = .82), suggesting that age-related variance in PFC activation may be related to age differences in explicit frequency beliefs. Importantly, however, even when we accounted for both PFC activation and frequency distances, we continued to observe an effect of age on memory difference scores (β = .56, SE = .20, p = .006), which, together with our prior analyses, suggests that developmental differences in value-guided memory are not driven solely by age differences in beliefs about the structure of the environment but also depend on the use of those beliefs to guide encoding.”

      We have added the full model results to Appendix 3: Full Model Specification and Results.

      Given these results, we have now revised our interpretation of our neural data. Our memory analyses demonstrate that across our age range, we observed age-related differences in both the acquisition of knowledge of the structure of the environment and in its use. Originally, we interpreted the PFC activation as reflecting the use of learned value to guide memory. However, the strong relation we found between frequency distance and PFC activation suggests that the age differences in PFC activation that we observed may also be related to age differences in knowledge of the structure of the environment that governs when control processes should be engaged most strongly. However, these results must be interpreted cautiously. Participants provided explicit frequency reports after they completed the encoding and retrieval tasks, and so explicit frequency reports may have been influenced not only by participants’ memories of online frequency learning, but also by the strength with which they encoded the item and its paired associate, and the experience of successfully retrieving it.

      We have now revised our discussion to consider these results. On p. 23, we now write,

      “Our neural results further suggest that developmental differences in memory were driven by both knowledge of the structure of the environment and use of that knowledge to guide encoding.”

      On p. 24, we write,

      “The development of adaptive memory requires not only the implementation of encoding and retrieval strategies, but also the flexibility to up- or down-regulate the engagement of control in response to momentary fluctuations in information value (Castel et al., 2007, 2013; Hennessee et al., 2017). Importantly, value-based modulation of lateral PFC engagement during encoding mediated the relation between age and memory selectivity, suggesting that developmental change in both the representation of learned value and value-guided cognitive control may underpin the emergence of adaptive memory prioritization. Prior work examining other neurocognitive processes, including response inhibition (Insel et al., 2017) and selective attention (Störmer et al., 2014), has similarly found that increases in the flexible upregulation of control in response to value cues enhance goal-directed behavior across development (Davidow et al., 2018), and may depend on the engagement of both striatal and prefrontal circuitry (Hallquist et al., 2018; Insel et al., 2017). Here, we extend these past findings to the domain of memory, demonstrating that value signals derived from the structure of the environment increasingly elicit prefrontal cortex engagement and strengthen goal-directed encoding across childhood and into adolescence.”

      And on p. 25, we have added an additional paragraph:

      “Further, we also demonstrate that in the absence of explicit value cues, the engagement of prefrontal control processes may reflect beliefs about information value that are learned through experience. Here, we found that differential PFC activation during encoding of high- vs. low-value information reflected individual and age-related differences in beliefs about the structure of the environment; participants who represented the average frequencies of the low- and high-frequency items as further apart also demonstrated greater value-based modulation of lateral PFC activation. It is important to note, however, that we collected explicit frequency reports after associative encoding and retrieval. Thus the relation between PFC activation and explicit frequency reports may be bidirectional — while participants may have increased the recruitment of cognitive control processes to better encode information they believed was more valuable, the engagement of more elaborative or deeper encoding strategies that led to stronger memory traces may have also increased participants’ subjective sense of an item’s frequency (Jonides & Naveh-Benjamin, 1987).”

      Third, more discussion is warranted on the nature of age-related changes given that some findings followed quadratic functions and others showed linear. Further interpretation of the quadratic versus linear fits would provide greater insight into the relative rates of maturation across discrete neurobehavioral processes.

      We agree with the reviewer that more discussion is warranted here. While many cognitive processes tend to improve with increasing age, the significant interaction between quadratic age and frequency condition on memory accuracy could reflect a number of different patterns of developmental variance. Because quadratic curves are U-shaped, the significant interaction between quadratic age and frequency condition could reflect a peak in value-guided memory in adolescence. However, the combination of linear and quadratic effects can also capture “plateauing” effects, where the influence of age on a particular cognitive process decreases at a particular developmental timepoint. To determine how to interpret the quadratic effect of age on value-guided memory — and specifically, to test for the presence of an adolescent peak — we ran an additional analysis.
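
The distinction drawn above between a peak and a plateau can be made concrete with a small sketch. For a fitted curve of the form b0 + b1·age + b2·age², the turning point lies at age = -b1 / (2·b2); whether it falls inside the sampled age range determines which pattern the curve shows. The function names and coefficient values below are hypothetical and are not estimates from the manuscript.

```python
def quadratic_vertex(b_linear, b_quadratic):
    """Age at which b0 + b_linear*age + b_quadratic*age**2 turns around."""
    if b_quadratic == 0:
        raise ValueError("no quadratic term, so the curve has no turning point")
    return -b_linear / (2 * b_quadratic)


def describe_shape(b_linear, b_quadratic, age_min, age_max):
    """Classify the fitted curve over the sampled age range."""
    turn = quadratic_vertex(b_linear, b_quadratic)
    if age_min < turn < age_max:
        kind = "peak" if b_quadratic < 0 else "trough"
        return f"{kind} inside the sampled range"
    return "monotonic within the sampled range"


# Hypothetical coefficients; the sampled range is ages 8 to 25.
turn_a = quadratic_vertex(4.0, -0.125)            # turning point at age 16
shape_a = describe_shape(4.0, -0.125, 8, 25)
shape_b = describe_shape(4.0, -0.0625, 8, 25)     # turning point at age 32
```

A turning point inside the range does not by itself show a reliable decline after it, which is why a separate test of the slope beyond that point is informative.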

      To test for an adolescent peak in value-guided memory, we first fit our memory accuracy model without any age terms, and then extracted the random slope across frequency conditions for each subject. We then conducted a ‘two lines test’ (Simonsohn, 2018) to examine the relation between age and these random slopes. In brief, the two-lines test fits the data with two linear models, one with a positive slope and one with a negative slope, algorithmically determining the breakpoint in the estimates where the signs of the slopes change. When we analyzed our memory data in this way, we found a robust, positive relation between age and value-guided memory (see newly added Appendix 2 Figure 3, also below) from childhood to mid-adolescence, that peaked around age 16 (age 15.86). From age ~16 to early adulthood, however, we observed only a marginal negative relation between age and value-guided memory (p = .0567). Thus, our findings do not offer strong evidence in support of an adolescent peak in value-guided memory; instead, they suggest that improvements in value-guided memory are strongest from childhood to adolescence.

      Appendix 2 - Figure 3. Results from the two-lines test (Simonsohn, 2018) revealed that the influence of frequency condition on memory accuracy increased throughout childhood and early adolescence, and did not significantly decrease from adolescence into early adulthood.
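
The general idea of fitting separate lines on either side of a breakpoint can be sketched as follows. This is a deliberately simplified illustration, not the two-lines procedure of Simonsohn (2018): it takes the breakpoint as given rather than choosing it algorithmically, fits two independent least-squares lines, and reports only slopes, with no significance tests. All names and data values are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def split_slopes(ages, values, breakpoint):
    """Fit one line at or below the breakpoint and one line above it."""
    left = [(a, v) for a, v in zip(ages, values) if a <= breakpoint]
    right = [(a, v) for a, v in zip(ages, values) if a > breakpoint]
    left_slope = ols_slope([a for a, _ in left], [v for _, v in left])
    right_slope = ols_slope([a for a, _ in right], [v for _, v in right])
    return left_slope, right_slope


# Hypothetical per-participant random slopes: rising to age 16, then drifting down.
ages = [8, 10, 12, 14, 16, 18, 20, 22, 24]
values = [0.00, 0.06, 0.12, 0.18, 0.24, 0.22, 0.20, 0.18, 0.16]
left_slope, right_slope = split_slopes(ages, values, breakpoint=16)
```

Evidence for a peak would require both a positive first slope and a reliably negative second slope; a second slope near zero is more consistent with a plateau.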

      To more clearly demonstrate the relation between age and value-guided memory, we have now included the results of the two-lines test in the results section of our main text. On p. 12 - 13, we write:

      “In line with our hypothesis, we observed a main effect of frequency condition on memory, χ2(1) = 21.51, p < .001, indicating that individuals used naturalistic value signals to prioritize memory for high-value information. Critically, this effect interacted with both linear age (χ2(1) = 11.03, p < .001) and quadratic age (χ2(1) = 9.51, p = .002), such that the influence of frequency condition on memory increased to the greatest extent throughout childhood and early adolescence. To determine whether the interaction between quadratic age and frequency condition on memory accuracy reflected an adolescent peak in value-guided memory prioritization, we re-ran our memory accuracy model without including any age terms, and extracted each participant’s random slope across frequency conditions. We then submitted these random slopes to the “two-lines” test (Simonsohn, 2018), which fits two regression lines with oppositely signed slopes to the data, algorithmically determining where the sign flip should occur. The results of this analysis revealed that the influence of frequency condition on memory significantly increased from age 8 to age 15.86 (b = .03, z = 2.71, p = .0068; Appendix 2 – Figure 3), but only marginally decreased from age 15.86 to age 25 (b = -.02, z = 1.91, p = .0576). Thus, the interaction between frequency condition and quadratic age on memory performance suggests that the biggest age differences in value-guided memory occurred through childhood and early adolescence, with older adolescents and adults performing similarly.”

      That said, this developmental trajectory is likely specific to the particular demands of our task. In our previous behavioral study that used a very similar paradigm (Nussenbaum, Prentis, & Hartley, 2018), we observed only a linear relation between age and value-guided memory.

      Although the task used in our behavioral study was largely similar to the task we employed here, there were subtle differences in the design that may have extended the age range through which we observed improvements in memory prioritization. In particular, in our previous behavioral study, the memory test required participants to select the correct associate from a grid of 20 options (i.e., 1 correct and 19 incorrect options), whereas here, participants had to select the correct associate from a grid of 4 options (1 correct and 3 incorrect options). In our prior work, the need to differentiate the ‘correct’ option from many more foils may have increased the demands on either (or both) memory encoding or memory retrieval, requiring participants to encode and retrieve more specific representations that would be less confusable with other memory representations. By decreasing the task demands in the present study, we may have shifted the developmental curve we observed toward earlier developmental timepoints.

      We originally did not emphasize our quadratic findings in the discussion of our manuscript because, given the marginal decrease in memory selectivity we observed from age 16 to age 25 and the different age-related findings across our two studies, we did not want to make strong claims about the specific shape of developmental change. However, we agree with the reviewer that these points are worthy of discussion within the manuscript. We have now amended our discussion on p. 25 accordingly:

      “We found that memory prioritization varied with quadratic age, and our follow-up tests probing the quadratic age effect did not reveal evidence for significant age-related change in memory prioritization between late adolescence and early adulthood. However, in our prior behavioral work using a very similar paradigm (Nussenbaum et al., 2020), we found that memory prioritization varied with linear age only. In line with theoretical proposals (Davidow et al., 2018), subtle differences in the control demands between the two tasks (e.g., reducing the number of ‘foils’ presented on each trial of the memory test here relative to our prior study), may have shifted the age range across which we observed differences in behavior, with the more demanding variant of our task showing more linear age-related improvements into early adulthood. In addition, the specific control demands of our task may have also influenced the age at which value-guided memory emerged. Future studies should test whether younger children can modulate encoding based on the value of information if the mnemonic demands of the task are simpler.”

      We thank the reviewer for this helpful suggestion, and believe our additions that expand on the quadratic age effects help clarify our developmental findings.

      Although hippocampal and PHC results did not show a main effect of value, it seems by the introduction that this region would be critical for the processes under study. I would suggest including these regions as ROIs of interest guiding age-related differences during the memory encoding and retrieval phases. Even reporting negative findings for these regions would be helpful to readers, especially given the speculation of the negative findings in the discussion.

      Thank you for this suggestion. We have now examined how differential neural activation within the hippocampus and parahippocampal cortex during encoding of high- vs. low-value information varies with age. To do so, we followed the same approach as with our PFC and caudate ROI analyses. Specifically, we first identified the voxel within both the hippocampus and parahippocampal cortex with the highest z-statistic from our group-level 5 > 1 encoding contrast. We then drew a 5-mm sphere around these voxels and examined how mean beta weights within these spheres varied with age.
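
A spherical ROI of this kind can be sketched in a few lines. The example below is illustrative only and is not the pipeline used for the manuscript: it assumes that "5-mm" refers to the sphere's radius, that voxels are 2 mm isotropic, and that the beta map is held in a plain dictionary keyed by voxel index. The function names and the centre coordinate are hypothetical.

```python
from itertools import product
from math import ceil


def sphere_voxels(center, radius_mm, voxel_size_mm):
    """Voxel indices whose centres lie within radius_mm of the centre voxel."""
    reach = ceil(radius_mm / voxel_size_mm)
    cx, cy, cz = center
    voxels = []
    for dx, dy, dz in product(range(-reach, reach + 1), repeat=3):
        # Compare squared distances in mm to avoid floating-point square roots.
        dist_sq_mm = (dx * dx + dy * dy + dz * dz) * voxel_size_mm ** 2
        if dist_sq_mm <= radius_mm ** 2:
            voxels.append((cx + dx, cy + dy, cz + dz))
    return voxels


def roi_mean(beta_map, voxels):
    """Average the beta weights stored for the voxels that exist in beta_map."""
    values = [beta_map[v] for v in voxels if v in beta_map]
    return sum(values) / len(values)


# Hypothetical peak voxel, with an assumed 5-mm radius and 2-mm voxels.
voxels = sphere_voxels((10, 10, 10), radius_mm=5, voxel_size_mm=2)
beta_map = {v: 1.0 for v in voxels}   # placeholder beta weights
mean_beta = roi_mean(beta_map, voxels)
```

The per-participant `mean_beta` values would then serve as the dependent variable in the age regressions.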

      We did not observe any relation between differential hippocampal or parahippocampal cortex activation during encoding of high- vs. low-value information and age (ps > .50). We agree with the reviewer that these results are informative, and have now added them to Appendix 2: Supplementary Analyses, which we refer to in the main text (p. 15). In Appendix 2, we write:

      “Hippocampal and parahippocampal cortex activation during encoding

      A priori, we expected that regions in the medial temporal lobe that have been linked to successful memory formation, including the hippocampus and parahippocampal cortex (Davachi, 2006), may be differentially engaged during encoding of high- vs. low-value information. Further, we hypothesized that the differential engagement of these regions across age may contribute to age differences in value-guided memory. Though we did not see any significant clusters of activation in the hippocampus or parahippocampal cortex in our group level high value vs. low value encoding contrast, we conducted additional ROI analyses to test these hypotheses. As with our other ROI analyses, we first identified the peak voxel (based on its z-statistic; hippocampus: x = 24, y = 34, z = 23; parahippocampal cortex: x = 22, y = 41, z = 16) in each region from our group-level contrast, and then drew 5-mm spheres around them. We then examined how average parameter estimates within these spheres related to both age and memory difference scores.

      First, we ran a linear regression modeling the effects of age, WASI scores, and their interaction on hippocampal activation. We did not observe a main effect of age on hippocampal activation (β = .00, SE = .10, p > .99). We did, however, observe a significant age x WASI score interaction effect (β = .30, SE = .10, p = .003). Next, we conducted another linear regression to examine the effects of hippocampal activation, age, WASI scores, and their interaction on memory difference scores. In contrast to our prefrontal cortex activation results, activation in the hippocampus did not relate to memory difference scores (β = -.02, SE = .03, p = .50).

      We repeated these analyses with our parahippocampal cortex sphere. Here, we did not observe any significant effects of age on parahippocampal activation (β = -.07, SE = .11, p = .50), nor did we observe any effects of parahippocampal activation on memory difference scores (β = .01, SE = .03, p = .25).”

      Reviewer #3:

      This paper investigated age differences in the neurocognitive mechanisms of value-based memory encoding and retrieval across children, adolescents and young adults. It used a novel experimental paradigm in combination with fMRI to disentangle age differences in determining the value of information based on its frequency from the usage of these learned value signals to guide memory encoding. During value learning, younger participants demonstrated a stronger effect of item repetition on response accuracy, whereas repetition suppression effects in a parahippocampal ROI were strongest in adults. Item frequency modulated memory accuracy such that associative memory was better for previously high-frequency value items. Notably, this effect increased with age. Differences in memory accuracy between low- and high-frequency items were associated with left lateral PFC activation which also increased with age. Accordingly, a mediation analysis revealed that PFC activation mediated the relation between age and memory benefit for high- vs. low-frequency items. Finally, both participants' representations of item frequency (which were more likely to deviate in younger children) and repetition suppression in the parahippocampal ROI were associated with higher memory accuracy. Together, these results add to the still scarce literature examining how information value influences memory processes across development.

      Overall, the conclusions of the paper are well supported by the data, but some aspects of the data analysis need to be clarified and extended.

      Empirical findings directly comparing cross-sectional and longitudinal effects have demonstrated that cross-sectional analyses of age differences do not readily generalize to longitudinal research (e.g., Raz et al., 2005; Raz & Lindenberger, 2012). Formal analyses have demonstrated that proportion of explained age-related variance in cross-sectional mediation models may stem from various factors, including similar mean age trends, within-time correlations between a mediator and an outcome, or both (Lindenberger et al., 2011; see also Hofer, Flaherty, & Hoffman, 2006; Maxwell & Cole, 2007). Thus, the results of the mediation analysis showing that PFC activation explains age-related variance in memory difference scores, cannot be taken to imply that changes in PFC activation are correlated with changes in value-guided memory. While the general limitations of a cross-sectional study are noted in the Discussion of the manuscript, it would be important to discuss the critical limitations of the mediation analysis. While the main conclusions of the paper do not critically depend on this analysis, it would be important to alert the reader to the limited information value in performing cross-sectional mediation analyses of age variance.

      Thank you for raising this critical point. We have expanded our discussion to specifically note the limitations of our mediation analysis and to more strongly emphasize the need for future longitudinal studies to reveal how changes in neural circuitry may support the emergence of motivated memory across development. Specifically, on p. 26, we now write:

      “One important caveat is that our study was cross-sectional — it will be important to replicate our findings in a longitudinal sample to more directly measure how developmental changes in cognitive control within an individual contribute to changes in their ability to selectively encode useful information. Our mediation results, in particular, must be interpreted with caution as simulations have demonstrated that in cross-sectional samples, variables can emerge as significant mediators of age-related change due largely to statistical artifact (Hofer, Flaherty, & Hoffman, 2006; Lindenberger et al., 2011). Indeed, our finding that PFC activation mediates the relation between age and value-guided memory does not necessarily imply that within an individual, PFC development leads to improvements in memory selectivity. Longitudinal work in which individuals’ neural activity and memory performance is sampled densely within developmental windows of interest is needed to elucidate the complex relations between age, brain development, and behavior (Hofer, Flaherty, & Hoffman, 2006; Lindenberger et al., 2011).”

      It would be helpful to provide more information on how chance memory performance was handled during data analysis, especially as it is more likely to occur in younger participants. Related to this, please connect the points that belong to the same individual in Figure 3 to facilitate evaluation of individual differences in the memory difference scores.

      Thank you for raising this important point. On each memory test trial, participants viewed the item (either a postcard or picture) above images of four possible paired associates (see Figure 1 on p. 6). On each memory test trial, participants had 6 seconds to select one of these items. If participants did not make a response within 6 seconds, that trial was considered ‘missed.’ Missed trials were excluded from behavioral analyses and regressed out in neural analyses. If participants selected the correct associate, memory accuracy was coded as ‘1;’ if they selected an incorrect associate, accuracy was coded as ‘0.’ On each trial, there was 1 correct option and 3 incorrect options. As such, chance-level memory performance was 25%. We have now clarified this on p. 34 and included a dashed line indicating chance-level performance within Fig. 4 (formerly Figure 3) on p. 12. In addition, we have also updated Figure 4 (see below) to connect the points belonging to the same participants, as suggested by the reviewer.

      Figure 4. Participants demonstrated prioritization of memory for high-value information, as indicated by higher memory accuracy for associations involving items in the five- relative to the one-frequency condition (χ2(1) = 19.73, p < .001). The effects of item frequency on associative memory increased throughout childhood and into adolescence (linear age x frequency condition: χ2(1) = 10.74, p = .001; quadratic age x frequency condition: χ2(1) = 9.27, p = .002).

      Out of 90 participants, 2 children performed at or below chance (≤ 25% memory accuracy). Interpreting the behavior of the participants who responded correctly on fewer than 12 out of 48 trials is challenging. On the one hand, they might not have remembered anything and responded correctly on these trials due to random guessing. On the other hand, they may have implemented an encoding strategy of focusing only on a small number of pairs. Thus, a priori, based on the analysis approach we implemented in our prior, behavioral study (Nussenbaum et al., 2019), we decided to include all participants in our memory analyses, regardless of their overall accuracy. However, when we exclude these two participants from our memory analyses, our main findings still hold. Specifically, we continue to observe main effects of frequency condition and age, and interactions between frequency condition and both linear and quadratic age on associative memory accuracy (ps < .012).
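
The scoring rule and chance threshold described above can be illustrated with a short sketch. It is not the authors' code: the function name, the string labels for trial outcomes, and the example trial counts are hypothetical. Missed trials are dropped before accuracy is computed, and chance is one over the number of response options.

```python
N_OPTIONS = 4              # one correct and three incorrect associates per trial
CHANCE = 1 / N_OPTIONS     # 0.25


def memory_accuracy(responses):
    """Proportion correct among answered trials; missed trials are excluded.

    responses: list of 'correct', 'incorrect', or 'missed'.
    Returns None if the participant answered no trials.
    """
    answered = [r for r in responses if r != "missed"]
    if not answered:
        return None
    return sum(r == "correct" for r in answered) / len(answered)


# Hypothetical participant: 10 correct, 36 incorrect, 2 missed out of 48 trials.
trials = ["correct"] * 10 + ["incorrect"] * 36 + ["missed"] * 2
acc = memory_accuracy(trials)
at_or_below_chance = acc <= CHANCE
```

Because missed trials shrink the denominator, the same number of correct responses can land on either side of the chance threshold depending on how many trials were answered.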

      We have now clarified these details about chance-level performance in the methods section of our manuscript on p. 34.

      “For our memory analyses, trials were scored as ‘correct’ if the participant selected the correct association from the set of four possible options presented during the memory test, ‘incorrect’ if the participant selected an incorrect association, and ‘missed’ if the participant failed to respond within the 6-second response window. Missed trials were excluded from all analyses. Because participants had to select the correct association from four possible options, chance-level performance was 25%. Two child participants performed at or below chance-level on the memory test. They were included in all analyses reported in the manuscript; however, we report full details of the results of our memory analyses when we exclude these two participants in Appendix 3 (Table 15). Importantly, our main findings remain unchanged.”

      In Appendix 3, we include a table with the full results from our memory model without these two participants:

      Appendix Table 15: Associative memory accuracy by frequency condition (below chance subjects excluded)

      I would like to see some consideration of how the different signatures of value learning, repetition suppression and reported item frequency, are related to the observed PFC and caudate effects during memory encoding. Such a discussion would help the reader connect the findings on learning and using information value across development.

      Thank you for this valuable suggestion. We agree that it would be interesting to link frequency-learning behavior to neural activity at encoding. As such, we have now conducted additional analyses to explore these relations.

      In the original version of our manuscript, we examined behavior at the item level through mixed-effects models, and neural activation during encoding at the participant level. Thus, to examine the relation between frequency-learning metrics and neural activation at encoding, we created two additional participant-level metrics. For each participant we computed their average repetition suppression index, and a measure of frequency distance. The average repetition suppression index reflects the overall extent to which the participant demonstrated repetition suppression in response to the fifth presentation of the high-frequency items, and is computed by averaging each participant’s repetition suppression indices across items. We hypothesized that participants who demonstrated the greatest degree of repetition suppression might be the most sensitive to the difference between the 1- and 5-frequency items, and therefore, show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information. The frequency distance metric reflects the average distance between participants’ explicit frequency reports for items that appeared once and items that appeared five times, and is computed by averaging their explicit frequency reports for items in each frequency condition, and then subtracting the average reports in the low-frequency condition from those in the high-frequency condition. We hypothesized that participants with the largest frequency distances might similarly be the most sensitive to the difference between the 1- and 5-frequency items, and therefore, show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information.
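      The two participant-level metrics described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names, the "high"/"low" condition labels, and the example numbers are all hypothetical, and the per-item repetition suppression indices are taken as given inputs because their computation is not described in this response.

```python
from collections import defaultdict
from statistics import mean


def frequency_distance(reports):
    """Mean explicit report for high-frequency items minus mean report for
    low-frequency items, for one participant.

    `reports` is an iterable of (condition, reported_frequency) pairs, where
    condition is "high" (item shown five times) or "low" (item shown once).
    These labels are assumptions made for this sketch.
    """
    by_condition = defaultdict(list)
    for condition, reported in reports:
        by_condition[condition].append(reported)
    return mean(by_condition["high"]) - mean(by_condition["low"])


def mean_repetition_suppression(item_indices):
    """Average of one participant's per-item repetition suppression indices."""
    return mean(item_indices)


# Made-up data for a single hypothetical participant.
example_reports = [("high", 5), ("high", 4), ("high", 6),
                   ("low", 1), ("low", 2), ("low", 3)]
example_rs_indices = [0.2, -0.1, 0.5]

print(frequency_distance(example_reports))              # high mean minus low mean
print(mean_repetition_suppression(example_rs_indices))  # average across items
```

      A larger frequency distance means the participant reported the two conditions as further apart, which is the sense in which the response treats it as a measure of sensitivity to the 1- vs. 5-frequency difference.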

      We first wanted to confirm that the relations we observed between repetition suppression, frequency reports, and age could also be observed at the participant level. In line with our prior, behavioral analyses, we found that age related to both mean repetition suppression indices (marginally; linear age: p = .067; quadratic age: p = .042) and frequency distances (linear and quadratic age: ps < .001).

      In addition, we further tested whether these two metrics related to memory performance. In contrast to our item-level findings, we did not observe a significant relation between repetition suppression indices and memory (p = .83). We did observe an effect of frequency distance on memory performance. Specifically, we observed significant interactions between frequency distance and age (p = .014) and frequency distance and quadratic age (p = .021) on memory difference scores, such that the influence of frequency distance on memory difference scores increased with increasing age from childhood to adolescence.

      We next examined how mean repetition suppression indices and frequency distances related to differential neural activation during encoding of high- and low-value pairs. In line with our memory findings, we did not observe any significant relations between mean repetition suppression indices and neural activation in the caudate or prefrontal cortex during encoding (ps > .15).

      Frequency distance did not relate to caudate activation during encoding, nor did we observe a frequency distance x age interaction effect (ps > .16). Frequency distance did, however, relate to differential PFC activation during encoding of high- vs. low-value pairs. Specifically, we observed a main effect of frequency distance on PFC activation (p = .0012), such that participants whose explicit reports of item frequency were, on average, more distinct across frequency conditions demonstrated increased PFC activation during encoding of pairs involving high- vs. low-frequency items. Interestingly, when we included frequency distance in our model, we no longer observed a significant effect of age on differential PFC activation, nor did we observe a significant frequency distance x age interaction (ps > .13). These findings suggest that PFC activation during encoding may have, in part, reflected participants’ beliefs about the structure of the environment, with participants demonstrating stronger differential engagement of control processes across conditions when their representations of the conditions themselves were more distinct.

      Finally, we examined how age, frequency distance, and PFC activation related to memory difference scores. Here, even when controlling for both frequency distance and PFC activation, we continued to observe main effects of age and quadratic age on memory difference scores (linear age: p = .006; quadratic age: p = .001). In line with our analysis of the relation between frequency reports and memory, these results suggest that age-related variance in value-guided memory may depend on both knowledge of the structure of the environment and use of that knowledge to effectively control encoding.

      We have now added these results to our manuscript on p. 13 - 14. We write:

      “Given the relations we observed between memory and both repetition suppression and frequency reports, we examined whether they related to neural activation in both our caudate and PFC ROI during encoding. To do so, we computed each participant’s average repetition suppression index, and their “frequency distance” — or the average difference in their explicit reports for items in the high- and low-frequency conditions. We expected that participants with greater average repetition suppression indices and greater frequency distances represented the high- and low-frequency items as more distinct from one another and therefore would show greater differences in neural activation at encoding across frequency conditions. In line with our prior analyses, both metrics varied with age (though repetition suppression only marginally (linear age: p = .067; quadratic age: p = .042); Appendix 3 Tables 22 and 25), suggesting that older participants demonstrated better learning of the structure of the environment. We ran linear regressions examining the relations between each metric, age, and their interaction on neural activation in both the caudate and PFC. We observed no significant effects or interactions of average repetition suppression indices on neural activation (ps > .15; Appendix 3 Tables 23 and 24). We did, however, observe a significant effect of frequency distance on PFC activation (β = .42, SE = .12, p = .0012), such that participants who believed that average frequencies of the high- and low-frequency items were further apart also demonstrated greater PFC activation during encoding of pairs with high- vs. low-frequency items. Here, we did not observe a significant effect of age on PFC activation (β = -.03, SE = .13, p = .82), suggesting that age-related variance in PFC activation may be related to age differences in explicit frequency beliefs. 
Importantly, however, even when we accounted for both PFC activation and frequency distances, we continued to observe an effect of age on memory difference scores (β = .56, SE = .20, p = .006), which, together with our prior analyses, suggest that developmental differences in value-guided memory are not driven solely by age differences in beliefs about the structure of the environment but also depend on the use of those beliefs to guide encoding.”

      We have added the full model results to Appendix 3.

      Given these results, we have now revised our interpretation of our neural data. Our memory analyses demonstrate that across our age range, we observed age-related differences in both the acquisition of knowledge of the structure of the environment and in its use. Originally, we interpreted the PFC activation as reflecting the use of learned value to guide memory. However, the strong relation we found between frequency distance and PFC activation suggests that the age differences in PFC activation that we observed may also be related to age differences in knowledge of the structure of the environment that governs when control processes should be engaged most strongly. However, these results must be interpreted cautiously. Participants provided explicit frequency reports after they completed the encoding and retrieval tasks, and so explicit frequency reports may have been influenced not only by participants’ memories of online frequency learning, but also by the strength with which they encoded the item and its paired associate, and the experience of successfully retrieving it.

      We have now revised our discussion to consider these results. On p. 23, we now write,

      “Our neural results further suggest that developmental differences in memory were driven by both knowledge of the structure of the environment and use of that knowledge to guide encoding.”

      On p. 24, we write,

      “The development of adaptive memory requires not only the implementation of encoding and retrieval strategies, but also the flexibility to up- or down-regulate the engagement of control in response to momentary fluctuations in information value (Castel et al., 2007, 2013; Hennessee et al., 2017). Importantly, value-based modulation of lateral PFC engagement during encoding mediated the relation between age and memory selectivity, suggesting that developmental change in both the representation of learned value and value-guided cognitive control may underpin the emergence of adaptive memory prioritization. Prior work examining other neurocognitive processes, including response inhibition (Insel et al., 2017) and selective attention (Störmer et al., 2014), has similarly found that increases in the flexible upregulation of control in response to value cues enhance goal-directed behavior across development (Davidow et al., 2018), and may depend on the engagement of both striatal and prefrontal circuitry (Hallquist et al., 2018; Insel et al., 2017). Here, we extend these past findings to the domain of memory, demonstrating that value signals derived from the structure of the environment increasingly elicit prefrontal cortex engagement and strengthen goal-directed encoding across childhood and into adolescence.”

      And on p. 25, we have added an additional paragraph:

      “Further, we also demonstrate that in the absence of explicit value cues, the engagement of prefrontal control processes may reflect beliefs about information value that are learned through experience. Here, we found that differential PFC activation during encoding of high- vs. low-value information reflected individual and age-related differences in beliefs about the structure of the environment; participants who represented the average frequencies of the low- and high-frequency items as further apart also demonstrated greater value-based modulation of lateral PFC activation. It is important to note, however, that we collected explicit frequency reports after associative encoding and retrieval. Thus the relation between PFC activation and explicit frequency reports may be bidirectional — while participants may have increased the recruitment of cognitive control processes to better encode information they believed was more valuable, the engagement of more elaborative or deeper encoding strategies that led to stronger memory traces may have also increased participants’ subjective sense of an item’s frequency (Jonides & Naveh-Benjamin, 1987).”

      A point worthy of discussion are the implications of the finding that younger participants demonstrated greater deviations in their frequency reports for the development of value learning, given that frequency reports were found to predict associative memory accuracy.

      Thank you for raising this important point. Indeed, one of our main findings is that older participants are better both at learning the structure of their environments and also at using structured knowledge to strategically prioritize memory. In our original manuscript, we described results of a model that included participants’ explicit frequency reports as a predictor of memory. Model comparison revealed that participants’ frequency reports — which we interpret as reflecting their beliefs about the structure of the environment — predicted memory more strongly than the item’s true frequency. In other words, participants’ beliefs about the structure of the environment (even if incorrect) more strongly influenced their memory encoding than the true structure of the environment. Critically, however, frequency reports interacted with age to predict memory (Fig 8). Even when we accounted for age-related differences in knowledge of the structure of the environment, older participants demonstrated a stronger influence of frequency on memory, suggesting they were better able to use their beliefs to control subsequent associative encoding. We have now clarified our interpretation of this model in our discussion on p. 23:

      “Importantly, though we observed age-related differences in participants’ learning of the structure of their environment, the strengthening of the relation between frequency reports and associative memory with increasing age suggests that age differences in learning cannot fully account for age differences in value-guided memory. Even when accounting for individual differences in participants’ explicit knowledge of the structure of the environment, older participants demonstrated a stronger relation between their beliefs about item frequency and associative memory, suggesting that they used their beliefs to guide memory to a greater degree than younger participants.”

      As noted by the reviewer, however, our initial memory analysis did not account for age-related differences in participants’ initial, online learning of item frequency, and our neural analyses further did not account for age differences in explicit frequency reports. We have now run additional control analyses to account for the potential influence of individual differences in frequency learning on associative memory. Specifically, for each participant, we computed three metrics: 1.) their overall accuracy during frequency-learning, 2.) their overall accuracy for the last presentation of each item during frequency-learning (as suggested by Reviewer 2), and 3.) the mean magnitude of the error in their frequency reports. We then included these metrics as covariates in our memory analyses.
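      The three per-participant control metrics listed above can be sketched as follows. This is an illustrative reading rather than the authors' code: the field names `correct` and `is_last_presentation` are hypothetical, and "mean magnitude of the error" is assumed here to mean the mean absolute difference between reported and true item frequency.

```python
from statistics import mean


def learning_covariates(learning_trials, frequency_reports):
    """Return (overall accuracy, last-presentation accuracy, mean report error).

    `learning_trials`: list of dicts with hypothetical keys "correct" (bool)
    and "is_last_presentation" (bool), one per frequency-learning trial.
    `frequency_reports`: list of (true_frequency, reported_frequency) pairs.
    """
    # 1) Overall accuracy across all frequency-learning trials.
    overall_accuracy = mean(1.0 if t["correct"] else 0.0 for t in learning_trials)

    # 2) Accuracy restricted to the last presentation of each item.
    last_trials = [t for t in learning_trials if t["is_last_presentation"]]
    last_accuracy = mean(1.0 if t["correct"] else 0.0 for t in last_trials)

    # 3) Mean absolute error of the explicit frequency reports (assumed definition).
    error_magnitude = mean(abs(reported - true) for true, reported in frequency_reports)

    return overall_accuracy, last_accuracy, error_magnitude


# Made-up data for a single hypothetical participant.
example_trials = [
    {"correct": True, "is_last_presentation": False},
    {"correct": False, "is_last_presentation": False},
    {"correct": True, "is_last_presentation": True},
    {"correct": True, "is_last_presentation": True},
]
example_reports = [(1, 2), (5, 4), (5, 5), (1, 1)]

print(learning_covariates(example_trials, example_reports))
```

      Each participant contributes one value per metric, which is what allows the metrics to enter the memory model as participant-level covariates.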

      When we include these control variables in our model, we continue to observe a robust effect of frequency condition (p < .001) as well as robust interactions between frequency condition and linear and quadratic age (ps < .003) on associative memory accuracy. We also observed a main effect of frequency error magnitude on memory accuracy (p < .001). Here, however, we no longer observe main effects of age or quadratic age on overall memory accuracy. Given the relation we observed between frequency error magnitudes and age, the results from this model suggest that there may be age-related improvements in overall memory that influence both memory for associations and learning of and memory for item frequencies. The fact that age no longer relates to overall memory when controlling for frequency error magnitudes suggests that age-related variance in memory for item frequencies and age-related variance in memory for associations are strongly related within individuals. Importantly, however, age-related variance in memory for item frequencies did not explain age-related variance in the influence of frequency condition on associative memory, suggesting that there are developmental differences in the use of knowledge of environmental structure to prioritize valuable information in memory that persist even when controlling for age-related differences in initial learning of environmental regularities. Given the importance of this analysis in elucidating the relation between the learning of environmental structure and value-guided memory, we have now updated the results in the main text of our manuscript to include them. Specifically, on p. 13, we now write:

      “Because we observed age-related differences in participants’ online learning of item frequencies and in their explicit frequency reports, we further examined whether these age differences in initial learning could account for the age differences we observed in associative memory. To do so, we ran an additional model in which we included each participant’s mean frequency learning accuracy, mean frequency learning accuracy on the last repetition of each item, and explicit report error magnitude as covariates. Here, explicit report error magnitude predicted overall memory performance, χ2(1) = 13.05, p < .001, and we did not observe main effects of age or quadratic age on memory performance (ps > .20). However, we continued to observe a main effect of frequency condition, χ2(1) = 19.65, p < .001, as well as significant interactions between frequency condition and both linear age, χ2(1) = 10.59, p = .001, and quadratic age, χ2(1) = 9.15, p = .002. Thus, while age differences in initial learning related to overall memory performance, they did not account for age differences in the use of environmental regularities to strategically prioritize memory for valuable information.”

      In addition, as suggested by the reviewer, we also included the three covariates as control variables in our mediation analysis. When controlling for online frequency learning and explicit frequency report errors, PFC activity continued to mediate the relation between age and memory difference scores. We have now included these results on p. 16 - 17 of the main text:

      “Further, when we included quadratic age, WASI scores, online frequency learning accuracy, online frequency learning accuracy on the final repetition of each item, and mean explicit frequency report error magnitudes as control variables in the mediation analysis, PFC activation continued to mediate the relation between linear age and memory difference scores (standardized indirect effect: .56, 95% confidence interval: [.06, 1.35], p = .023; standardized direct effect: 1.75, 95% confidence interval: [.12, 3.38], p = .034).”

      We also refer to these analyses when we interpret our findings in our discussion. On p. 23, we write:

      “In addition, we continued to observe a robust interaction between age and frequency condition on associative memory, even when controlling for age-related change in the accuracy of both online frequency learning and explicit frequency reports. Thus, though we observed age differences in the learning of environmental regularities and in their influence on subsequent associative memory encoding, our developmental memory effects cannot be fully explained by differences in initial learning.”

      We thank the reviewer for this constructive suggestion, as we believe these control analyses strengthen our interpretation of age differences in both the learning and use of environmental regularities to prioritize memory.

    1. Author Response:

      Reviewer #1:

      Tan et al. resequence the whale shark genome using long-read technology (PacBio), improving a previously available assembly, which was obtained from the same source DNA used in this study. The analyses of this improved genome led to a gapless assembly, which, together with the annotation, led the authors to analyze several features of the whale shark genome, including gene family gains/losses, the evolution of immune genes important for pattern recognition receptors, rates of substitution in whale shark compared to other vertebrates, the evolution of genes potentially related with the emergence of gigantism and cancer rate.

      Whale sharks constitute a charismatic group of vertebrates for several reasons. First, they belong to a poorly studied group, and their genomic properties inform us about gene families dating back to the split between osteichthyes and chondrichthyes, i.e. to the origin of gnathostomes. Additionally, they have a unique biology, associated with their large size, which can inform us on the evolution of gigantism in vertebrates.

      Tan et al. indeed assemble a gapless genome for whale shark, with large contiguity (contig N50 of 10^5), i.e. providing a novel resource for shark genomes and for the study of early vertebrate evolution. However, repetitive regions that are not included in the assembly, account for over 700 Mb. The authors could provide more information about the genome assembly, the coverage, and a chromosome-level map of the genome.

      We clarify what this means in the relevant sentence (line 133) and added some clarity on lines 138–139. Supplementary Table 1 also lists these stats. A chromosome-level map is outside of the scope of what is possible with our present data.
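      For readers less familiar with the contig N50 statistic quoted in the reviewer's comment, the standard definition can be sketched as follows. This is a generic illustration with made-up contig lengths; it is not taken from the manuscript, and the function name `n50` is hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the largest length L such that contigs
    of length >= L together cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # Only reached for an empty list of contigs.


# Toy assembly of five contigs (lengths in arbitrary units).
print(n50([2, 3, 4, 5, 6]))
```

      Because N50 is computed only from the contigs that made it into the assembly, sequence left unassembled (such as the repetitive regions the reviewer mentions) does not enter the statistic.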

      This improved assembly, together with the annotation, lead to exploring several aspects of the whale shark genome evolution. They reveal gene family gains and losses specific to the lineage leading to whale sharks. It is however not entirely clear to what extent the novel assembly enabled these findings compared to the previous assembly.

      The previous assembly was extremely fragmented with only ~15% BUSCO complete orthologs as previously noted by Hara et al. (2018). We thus regard the present genome assembly as a large improvement for genome-based studies, including the present analyses based primarily on studying gene family evolution and downstream analyses. We now provide more information in the main text about the increase in ortholog completeness (line 155), which should make it clearer why gene family evolution analyses would be improved, including origin and loss of gene families, evolution of innate immune genes, and rates of gene family expansion.

      This work provides an important resource for the study of the evolution of Pattern Recognition Receptors in early vertebrates, potentially identifying a novel class of TLR (i.e. TLR29) in whale sharks. This work further suggests that a diverse set of PRR is fully compatible with the evolution of lymphocyte-based adaptive immunity. In other words, the authors hypothesize that the evolution of adaptive immunity did not lead to the functional loss of innate immune effectors.

      Using comparative genomics, the authors explore the genomic basis of gigantism and cancer evolution. First, they adopt the two-cluster test to study substitution rates in single-copy orthologs, confirming a previous finding that cartilaginous fish have a slower rate of substitution than other vertebrates. However, whale shark substitution rate does not differ from other sharks (i.e. non-giant sharks), suggesting that gigantism may not be directly or uniquely correlated with a slow substitution rate. I would recommend the authors to expand on the use of the two-cluster test in the result section.

      The two-cluster test is a fairly simple test and we describe the results of all of the analyses performed. The descriptions seem comparable to the detail provided in other publications that use the two-cluster test (e.g. Venkatesh et al. 2014, Weber et al. 2020).

      Overall, the manuscript would greatly benefit from leveraging its main asset, which is a genome assembly with relatively high contiguity. Even if the authors study the genome of one individual, they could provide more information regarding changes in heterozygosity along the genome.

      We have moved results about heterozygosity to the main text and, by performing SNP calling using freebayes, added some more information regarding heterozygosity to lines 146-148. As we do not have a chromosome-scale assembly and that is outside the goals of the current study, we do not summarize changes in the heterozygosity along the genome.

    1. Author Response:

      Reviewer #1:

      Maimon-Mor et al. examined the control of reaching movement of one-handers, who were born with a partial arm, and amputees, who lost their arm in adulthood. The authors hypothesized that since one-handers started using their artificial arm earlier in life than amputees, they are expected to exhibit better motor control, as measured by point-to-point reaching accuracy. Surprisingly, they found the opposite, that the reaching accuracy of one-handers is worse than that of amputees (and controls with their non-dominant hand). This deficit in motor control was reflected in an increase in motor noise rather than consistent motor biases.

      Strengths:

      • I found the paper in general very well and clearly written.
      • The authors provide detailed analyses to examine various possible factors underlying deficits in reaching movements in one-handers and amputees, including age at which participants first used an artificial arm, current usage of the arm, performance in hand localization tasks, and statistical methods that control for potential confounding factors.
      • The results that one-handers, who start using the artificial arm at an early age, show worse motor control than amputees, who typically start using the arm during adulthood, are surprising and interesting. Also intriguing are the results that reaching accuracy is negatively correlated with the time of limbless experience in both groups. These results suggest that there is a plasticity window that is not anchored to a certain age, but rather to some interference (perhaps) from the time without the use of artificial arm. In one-handers these two time intervals are confounded by one another, but the amputees allow them to be separated. I think that the results have implications for understanding plasticity aspects of acquiring skills for using artificial limbs.

      Weaknesses:

      • While I found that one of the main conclusions from the paper is that the main factor that is related to increased motor noise is the time spent without the artificial arm, it felt that this was not emphasized as such. These results are not mentioned in the abstract and the correlation for amputees is not shown in a figure.

      We thank the reviewer for their comment. While it is true that motor noise correlated with time of limbless experience in both groups, we were hesitant to highlight the results found in amputees, considering the small number of participants, and lack of converging evidence (e.g., contrary to the congenital group, we did not find a strong main effect). For these reasons, we have chosen to include it in the manuscript but not highlight it or base our main conclusions on it. Following the reviewer’s comment, the correlation of the amputees’ data is now visualised in Figure 3. Moreover, while the behavioural correlation might be similar in both groups, from a neural standpoint, the limbless experience of a toddler with a developing brain is qualitatively different to that of an adult, with a fully developed brain, who has lost a limb. As such, we were hesitant to link these two findings into a single framework; however, in the revised manuscript we highlight this tentative link.

      Discussion (4th paragraph):

      “In both the congenital and acquired groups, artificial arm reaching motor noise correlated with the amount of time they spent using only their residual limb. It is therefore tempting to link these two results under a unifying interpretation; however, this requires further research, considering the neural differences between the two groups.”

      Figure 3. Years of limbless experience before first artificial arm use in the acquired group. Relationship between years of limbless experience and (A) artificial arm reaching errors or (B) artificial arm motor noise in the acquired group.

      • The suggested mechanism of a deficit in visuomotor integration is not clear, and whether the results indeed point to this hypothesis. The results of the reaching task show that the one-handers exhibit higher motor noise and initial error direction than amputees. The results of the 2D localization task (the same as the standard reaching task but without visual feedback) show no difference in errors between the groups. First, it is not clear how the findings of the 2D localization task are in line with the results that one-handers show larger initial directional errors.

      We fully take on the reviewer’s comment regarding the vague use of the term visuomotor integration. In the revised manuscript, we have opted instead for a much broader term, suggesting a deficit in visual-based corrective movements, considering we are limited in our ability to infer the specific underlying mechanism from our result. We have also made changes to the abstract based on the reviewer’s comment (see below).

      With regards to discussing how the various results fit together, in the revised manuscript, these are now discussed more at length. In short, in the 2D localisation task (reaching without visual feedback), participants were not instructed to perform fast ballistic movements. Instead, participants were instructed that they could perform movements to correct for their initial aiming error (using proprioception). Together with the similar performance observed for the proprioceptive task, this strengthens our suggestion that the deficit in the congenital group is triggered by visual-driven corrections. These various considerations are now detailed as follows:

      Abstract:

      “Since we found no group differences when reaching without visual feedback, we suggest that the ability to perform efficient visually-based corrective movements, is highly dependent on either biological or artificial arm experience at a very young age.”

      Result (section 7, 1st paragraph):

      “From these results, we infer that early-life experience relates to a suboptimal ability to reduce the system’s inherent noise, and that this is possibly not related to the noise generated by the execution of the initial motor plan. Early life experience might therefore relate to better use of visual feedback in performing corrective movements. The continuous integration of visual and sensory input is at the heart of visually-driven corrective movements. Therefore, one possibility is that limited early life experience, results in suboptimal integration of information within the sensorimotor system.”

      Discussion (2nd paragraph):

      “When performing reaching movements without visual feedback (2D localisation task), the congenital group did not differ from the acquired or control group. This begs the question, if the congenital group has a deficit in motor planning why was it not evident in this task as well? In the 2D localisation task, unlike the main task, participants were allowed to make corrective movements. While they did not receive visual feedback, the proprioceptive and somatosensory feedback from the residual limb appears to be enough to allow them to correct for initial reaching errors and perform at the same level as the acquired and control group. Moreover, we did not find strong evidence for an impaired sense of localisation of either the residual or the artificial arm in the congenital group. As such, by elimination, our evidence suggests that the process of using visual information to perform corrective movements isn’t as efficient in the congenital group.”

      Discussion (2nd paragraph):

      “Lack of concurrent visual and motor experience during development might therefore cause a deficit in the ability to form the computational substrates and thus to efficiently use visual information in performing corrective movements.”

      Discussion (last paragraph):

      “By the process of elimination, we have nominated suboptimal visual feedback-based corrections to be the most likely cause underlying this motor deficit.”

      Second, I think that these results suggest that the deficiency in one-handers is with feedback responses rather than feedforward. This may also be supported by the correlation with age: early age is correlated with less end-point motor noise, rather than initial directional error. Analyses of feedback correction might help shedding more light on the mechanism. The authors mention that the participants were asked to avoid doing corrective movement and imposed a limit of 1 sec per reach to encourage that. But it is not clear whether participants actually followed these instructions. 1 sec could be enough time to allow feedback responses, especially for small amplitude movements (e.g., <10 cm).

      Please see below our response to the feedback correction analysis suggestion. Regarding corrective movements, we had the same concern as the reviewer which led us to use hand velocity data to identify first movement termination. We apologise if the experimental design and pre-processing procedures were not clear.

      In short, a 1 sec trial duration was imposed on all trials to generate a sense of time pressure and encourage participants to perform fast ballistic movements. As we were worried that participants might still perform secondary corrective movements within this 1 sec window, for each trial, we used the hand velocity profile to identify the end of the first movement. Below, we have plotted the arm velocity from a single trial to illustrate this procedure. For this trial, the timepoint indicated by the circular marker has been identified as the time of the end of the first movement (See Methods for further information). For each trial, endpoint location was defined as the location of the arm at the movement termination timepoint defined by the kinematic data and not the endpoint at the 1 sec timepoint. It is worth noting that performing the same analysis using the endpoints recorded at the 1 sec timepoint did not generate different statistical results.

      This has now been further clarified in the text.

      Results (section 1, 1st paragraph):

      “Reaching performance was evaluated by measuring the mean absolute error participants made across all targets (see Figure 1C). The absolute error refers to the distance from the cursor’s position at the end of the first reach (endpoint) to the centre of the target in each trial. The endpoint of each trial was set as the arm location at the end of the first reaching movement, identified using the trial’s kinematic data (See Methods).”

      Methods (section: Data processing and analysis – main task):

      “Within the 1 sec movement time constraint, in some trials, participants still performed secondary corrective movements. We therefore used the tangential arm velocities to identify the end of the first reach in each trial (i.e., movement termination).”
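
      As a rough illustration of the procedure described above, the following Python sketch finds the end of the first reach from a tangential speed profile. The onset/offset rule (a fixed fraction of peak speed, `frac=0.1`) and the function names are assumptions made for illustration only; the manuscript's Methods, not this sketch, define the actual termination criterion.

```python
import math


def tangential_speed(xs, ys, dt):
    """Tangential (scalar) speed between consecutive position samples."""
    return [
        math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) / dt
        for i in range(len(xs) - 1)
    ]


def first_movement_end(speed, frac=0.1):
    """Index of the sample at which the first movement terminates.

    Hypothetical rule: movement onset is the first sample whose speed
    exceeds frac * peak speed; termination is the first later sample whose
    speed drops back below that threshold. Returns None if no movement is
    detected, and the last index if the speed never drops again.
    """
    if not speed:
        return None
    threshold = frac * max(speed)
    onset = next((i for i, v in enumerate(speed) if v > threshold), None)
    if onset is None:
        return None
    for i in range(onset + 1, len(speed)):
        if speed[i] < threshold:
            return i
    return len(speed) - 1


# Toy profile: a primary reach followed by a small corrective movement.
profile = [0.0, 1.0, 5.0, 10.0, 5.0, 0.5, 2.0, 3.0, 1.0, 0.0]
end_index = first_movement_end(profile)
```

      With this toy profile the secondary bump (samples 6 to 8) is ignored, so the endpoint would be read at the sample where the first reach ends rather than at the end of the 1 sec window.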

      Reviewer #2:

      This is a broad and ambitious study that is fairly unique in scope - the questions it seeks to answer are difficult to answer scientifically, and yet the depth of the questions it seeks to answer and the framework in which it is founded seem out of place in a clinical journal.

      And yet, as a scientist and clinician, I found myself objecting to the claims of the authors, only to have them address my objection in the very next section. The results are surprising, but compelling - the authors have done an excellent job of untangling a very complicated question, and they have tested (for our field) a large number of subjects.

      The main two results of the paper, from my perspective, are as follows:

      1) Persons with an amputation can form better models of new environments, such as manipulandums, than can those with congenital deficiencies. This result is interesting because a) the task did not depend on significant use of the device (they were able to use their intact musculature for the reaching-based task), and b) the results were not influenced by the devices used by the subjects (cosmetic, body-powered, or myoelectric).

      2) Persons with congenital deficiency fit earlier in life had less error than those fit later in life.

      Taken together, these results suggest that during early childhood the brain is better able to develop the foundation necessary to develop internal models and that if this is deprived early in childhood, it cannot be regained later in life - even if subjects have MORE experience. (E.g., those with congenital deficiencies had more experience using their prosthetic arm than those with amputation, and yet scored worse).

      The questions analyzed by the researchers are excellent and the statistical methods are generally appropriate. My only minor concern is that the authors occasionally infer that two groups are the same when a large p-value is reported, whereas large p-values do not convey that the groups are the same; only that they cannot be proven to be different. The authors would need to use a technique such as ICC or analysis of similarities to prove the groups are the same.

      We appreciate the reviewer’s concern about inferring the null from classical frequentist statistics. In this manuscript, we have opted to use Bayesian statistics to test for similarity across groups (See Methods: Statistical analysis), as opposed to the frequentist methods suggested by the reviewer. This approach is equivalent to the ones proposed by the reviewer and is widely used in our field. A Bayes Factor (BF) smaller than 0.33 is regarded as sufficient evidence for supporting the null hypothesis, that is, that there are no differences between the groups.

      This approach is described in detail in the methods and is introduced in the first section of the results as well.

      Results (1st section 2nd paragraph):

      “To further explore the non-significant performance difference between amputees and controls, we used a Bayesian approach (Rouder et al., 2009), that allows for testing of similarities between groups (the null hypothesis). In this analysis, the smaller effect size of the two reported here (1.39) was inputted as the Cauchy prior width. The resulting Bayesian Factor (BF10=0.28) provided moderate support to the null hypothesis (i.e., smaller than 0.33).”

      Methods (Statistical analysis section):

      “In parametric analyses (ANCOVA, ANOVA, Pearson correlations), where the frequentist approach yielded a non-significant p-value, a parallel Bayesian approach was used and Bayes Factors (BF) were reported (Morey & Rouder, 2015; Rouder et al., 2009, 2012, 2016). A BF < 0.33 is interpreted as support for the null hypothesis, and a BF > 3 is interpreted as support for the alternative hypothesis (Dienes, 2014). In Bayesian ANOVAs and ANCOVAs, the inclusion Bayes Factor of an effect (BFIncl) is reported, reflecting that the data is X (BF) times more likely under the models that include the effect than under the models without this predictor. When using a Bayesian t-test, a Cauchy prior width of 1.39 was used; this was based on the effect size of the main task, when comparing artificial arm reaches of amputees and one-handers. Therefore, the null hypothesis in these cases would be that there is no effect as large as the effect observed in the main task.”
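
      The decision rule quoted above can be written out as a small Python sketch. The function name and the returned labels are illustrative assumptions; only the 0.33 and 3 cut-offs come from the quoted Methods text.

```python
def interpret_bayes_factor(bf10, lower=0.33, upper=3.0):
    """Classify a Bayes factor using the cut-offs quoted in the Methods.

    bf10 < lower  -> evidence for the null hypothesis
    bf10 > upper  -> evidence for the alternative hypothesis
    otherwise     -> inconclusive
    """
    if bf10 < lower:
        return "supports null"
    if bf10 > upper:
        return "supports alternative"
    return "inconclusive"


# Values reported in this response: BF10 = 0.28 and BFIncl = 0.72.
amputees_vs_controls = interpret_bayes_factor(0.28)
bias_group_effect = interpret_bayes_factor(0.72)
```

      Under this rule, BF10 = 0.28 falls on the null side of the threshold, whereas BFIncl = 0.72 lies between the two cut-offs, which matches its description as inconclusive in the revised Results.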

      Following the reviewer’s comment, we have carefully scanned through the manuscript to make sure no equivalence claims are made without the support of a BF below 0.33. We found one instance where this was the case, and it has been rectified.

      Results (3rd section, 2nd paragraph):

      “We compared artificial arm and nondominant arm biases (distance from the centre of the endpoint to the target) across groups, using intact arm biases as a covariate. The ANCOVA resulted in no significant (inconclusive) group differences (F(2,47)=2.40, p=0.1, BFIncl=0.72; see Figure 2A).”

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The authors do a great job of listing and evaluating possible explanations, one of which is simply that the strains carry multiple mutations of small effect. All but one of the successfully mapped variants consists of missense and nonsense mutations. I think it's important to note that this represents a particular range of the effect-size distribution of mutations affecting the YFP phenotype. We know from the authors' earlier work that there are lots of mutations that can affect gene expression in cis, and so the absence of trans-acting cis-regulatory variants here is parsimoniously interpreted as due to their small effects. In general, work in other systems (particularly human genetics) has shown that even molecular traits are often hugely polygenic, affected by thousands of variants of tiny but non-zero magnitude. With a forward screen of the sort performed here, it's difficult to know how much of the phenotypic variance is due to unmapped small-effect variants, but two lines of evidence suggest it may be a lot: first, the absence of mappable causal mutations in 36/82 mutants, and second, the differences between EMS mutant strains and their matched single-site mutants. The authors commendably report and discuss these issues but to my mind they neglect them in drawing inferences and generalizations from their findings.

      We thank the reviewer for these encouraging comments and also appreciate the reviewer pointing out these concerns.

      With respect to the overlap of the trans-regulatory mutations we mapped and previously identified eQTL, we agree the possibility of similar mapping biases in the two BSA-seq studies contributing to the overlap of trans-regulatory mutations and eQTL warrants further exploration. We interpret the reviewer’s comment to suggest that if some regions of the genome systematically showed lower sequencing read coverage (because of poor read mappability, PCR biases or any other reason), the power to detect trans-regulatory mutations and eQTL in these regions would be decreased compared to regions of the genome with higher coverage because the G-tests used to identify significant associations with expression in both studies are based on read counts. Consequently, variation in sequencing read coverage across the genome shared in this study and the prior study identifying eQTL, both of which used BSA-seq, could lead to the enrichment of trans-regulatory mutations in eQTL regions. Indeed, consistent patterns of read coverage across the S. cerevisiae genome have been observed in prior work.

      To determine whether trans-regulatory mutations were enriched in regions of the genome with higher sequence read coverage, we compared read coverage between regions of the genome identified as having trans-regulatory mutations or non-regulatory mutations. The identification of variants classified as non-regulatory is expected to be less dependent on the depth of sequencing read coverage because this designation does not require a statistically significant G-test. We found that the mutations identified as trans-regulatory showed 120x coverage whereas mutations identified as non-regulatory showed only 100x coverage, consistent with greater power to detect associations with expression in regions of the genome with higher sequencing read coverage. However, eliminating this difference in read coverage by excluding non-regulatory mutations with lower sequence read coverage did not eliminate the observed enrichment of trans-regulatory mutations in regions previously shown to contain eQTL. Non-regulatory mutations with higher and lower sequencing read coverage were also equally likely to be found within eQTL regions, suggesting that similar variation in sequence read coverage across the genome between the two studies is unlikely to explain the observed overlap of trans-regulatory mutations and eQTL. These analyses are now included in a new Figure 7-figure supplement 1.
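
      To make the dependence of a G-test on read counts concrete, here is a minimal Python sketch of a G-test on a 2x2 table of allele read counts from two bulks. The table layout, the function name, and the example counts are hypothetical, and the p-value uses the standard chi-square approximation with one degree of freedom; the published BSA-seq analysis may differ in its details.

```python
import math


def g_test_2x2(table):
    """G-test of independence for a 2x2 table of read counts.

    table[i][j] = reads from bulk i carrying allele j (hypothetical layout).
    Returns (G, p), where p comes from the chi-square distribution with
    1 degree of freedom: P(X > G) = erfc(sqrt(G / 2)).
    Assumes the table contains at least one read.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # cells with zero reads contribute nothing
                expected = row_totals[i] * col_totals[j] / total
                g += 2.0 * observed * math.log(observed / expected)
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p


# Same allele-frequency difference between bulks, at high and low coverage.
g_high, p_high = g_test_2x2([[80, 20], [20, 80]])
g_low, p_low = g_test_2x2([[8, 2], [2, 8]])
```

      Because G scales with the number of reads, the same allele-frequency difference passes a 0.001 threshold at the higher coverage but not at the lower one in this toy example, which is the kind of coverage-dependent power discussed above.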

      With respect to better incorporating biases in what we were able to map and considerations for extending findings from this work to other systems, we have tried to better address these issues in the revised discussion.

      Reviewer #2 (Public Review):

      Fabien Duveau et al. tried to characterize mutations with trans-regulatory effects on expression of TDH3 by using EMS mutants with a TDH3 reporter in Saccharomyces cerevisiae. This work is an extension of the work of Gruber et al. (2012) and Metzger et al. (2016) with a specific focus on mutation effects on TDH3 expression. They found that the trans-regulatory mutations that have effects on expression of the TDH3 reporter were enriched in coding sequences of transcription factors. They found that the trans-regulatory mutations with effects are associated with natural trans-acting variants within S. cerevisiae. In summary, the data are well described and support their claims. The method could be used to study how regulatory networks work.

      [...] Although the paper does have strengths in principle, some weaknesses of the paper would cause the quality of data presented. [...]

      We thank the reviewer for taking the time to evaluate this work and have the following responses to the weaknesses noted:

      1) The reviewer is correct that we focused this paper on trans-regulatory mutations because cis-regulatory mutations affecting TDH3 expression were previously characterized. Furthermore, long distance enhancers with cis-acting effects on expression have not been described in S. cerevisiae and the term promoter is commonly used to encompass both the basal (core) promoter (including a TATA box for some genes) as well as other upstream activating sequences (UAS) and upstream repressing sequences (URS). In other words, the cis-acting sequences for S. cerevisiae genes are confined to a particular region much more than in multicellular eukaryotes. In fact, our prior work with TDH3 (Metzger et al. 2015) showed that 97% of cis-acting variation affecting TDH3 expression could be explained by sequence variation in the 678 bp region we define as its promoter. Consequently, all mutations outside of this region were considered to have trans-regulatory effects on TDH3 expression. In the revised version, we extended the discussion to specifically compare the structure of regulatory sequences in S. cerevisiae to other eukaryotic model systems.

      2) In this study, a mutation is defined as trans-regulatory if it affects TDH3 expression and is not located in the TDH3 promoter, regardless of whether or not it also affects growth rate. In fact, mutations in RAP1 and GCR1 affect growth rate (Figure 5), but are clearly trans-regulators of TDH3 with well-established binding sites in the TDH3 promoter. In other words, we do not think that mutations should be discounted as having trans-regulatory effects because they also impact growth rate.

      3) (A) Prior work examining the statistics of BSA-seq has shown that G-tests are most appropriate because they take into account the independent sampling from two bulk populations inherent to bulk-segregant analysis (Magwene et al. 2011 PLOS Computational Biology). (B) We are guessing that the reviewer is asking about multiple testing corrections rather than post-hoc tests, as we used a false discovery rate correction for multiple tests in Figure 2-supplement 5A. Although we did not use a multiple test correction for the BSA-seq data, we used a conservative significance threshold of 0.001 that was expected to result in a 3.5% false positive rate. Perhaps more importantly, we functionally validated the effects of 40 of the 41 associated mutations tested. (C) We may indeed have been overly optimistic about mapping power when choosing mutants to analyze with BSA-seq given that the 36 EMS mutants for which we failed to find a significant association between a mutation and fluorescence tended to have smaller effects on PTDH3-YFP expression than the EMS mutants for which we observed one or more associated sites (Figure 3-figure supplement 3). The reviewer’s comment also made us realize that our original sentence referring to mapping power had reported the effect size for estimated RNA levels rather than fluorescence. To avoid confusion, and because our anticipated mapping power does not affect the results of the study, we deleted this statement from the revised manuscript. Regardless of our anticipated mapping power, we were ultimately able to map mutations that affected fluorescence by as little as 1.6% relative to the wildtype strain.

      4) The GO enrichment analysis was performed with widely used tools on www.pantherdb.org. The statistical significance of enrichment for each GO term was computed using Fisher’s exact tests that compared 1) the proportion of genes with non-regulatory mutations and 2) the proportion of genes with trans-regulatory mutations that corresponded to the tested GO term. Because the total number of genes identified in our study with trans-regulatory mutations (42 genes) was much lower than the total number of genes with non-regulatory mutations (1043 genes), it was possible to obtain strong and statistically significant enrichment (P < 0.05 in Fisher’s exact test) even if only a small number of genes corresponded to the GO term in both categories. Although we found a large number of enriched GO terms, these GO terms were not always independent from each other. For instance, GO:0009168 (purine ribonucleoside monophosphate biosynthetic process) and GO:0009167 (purine ribonucleoside monophosphate metabolic process) refer to the same biological process and contain the same genes. For this reason, even though we reported all enriched GO terms in Supplementary File 8, we only showed GO terms that were at the tips of different branches in the GO hierarchy on Figure 6 and we grouped GO terms in four main categories that together encompassed most genes with trans-regulatory mutations.

      5) We agree with the reviewer that trans-regulatory mutations can affect either the function of a gene product (including the ability of a transcription factor to bind to DNA) or the abundance of that gene product, but we do not think this is a weakness of the study. In fact, we think one of the strengths of the study is that we have empirical data testing the relative frequency of these two types of possible changes, finding that mutations in coding regions (presumably more likely to affect the function of the gene product than its expression) are the primary source of changes in TDH3 expression greater than 1%.

      6) The goal of the study was to characterize the effects of individual trans-regulatory mutations, thus we did not look at the combined effects of mutations in proteins that might work in a complex. We do, however, mention transcription factors working in a complex: "Transcription factors encoded by the TYE7 and GCR2 genes found to harbor trans-regulatory mutations affecting expression of PTDH3-YFP are known to regulate the expression of glycolytic genes (including TDH3) by forming a complex with transcription factors encoded by the RAP1 and GCR1 genes” (line 461). We think that looking at the combined effects of mutations that all impact the same complex of regulatory proteins is an interesting direction for future work.
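
      As an illustration of the enrichment test described in point 4, the following Python sketch computes a one-sided Fisher's exact test from the hypergeometric distribution. The function name and the per-term gene counts (5 and 10) are hypothetical; only the group sizes of 42 and 1043 genes come from the response, and the PANTHER tool may implement the test differently (for example, as a two-sided test).

```python
from math import comb


def fisher_enrichment_p(hits_trans, n_trans, hits_nonreg, n_nonreg):
    """One-sided Fisher's exact test for enrichment of a GO term.

    hits_trans  : genes with trans-regulatory mutations annotated to the term
    n_trans     : all genes with trans-regulatory mutations
    hits_nonreg : genes with non-regulatory mutations annotated to the term
    n_nonreg    : all genes with non-regulatory mutations
    Returns P(X >= hits_trans) when n_trans genes are drawn at random from
    the pooled set (hypergeometric upper tail).
    """
    total = n_trans + n_nonreg
    term_genes = hits_trans + hits_nonreg
    upper = min(term_genes, n_trans)
    tail = sum(
        comb(term_genes, k) * comb(total - term_genes, n_trans - k)
        for k in range(hits_trans, upper + 1)
    )
    return tail / comb(total, n_trans)


# Hypothetical GO term: 5 of 42 trans-regulatory genes vs 10 of 1043
# non-regulatory genes carry the annotation.
p_example = fisher_enrichment_p(5, 42, 10, 1043)
```

      With group sizes this unbalanced, a handful of annotated genes in the smaller group is enough to fall below P < 0.05, consistent with the point made above about small gene counts producing significant enrichment.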

      Finally, we’d like to point out that the reviewer’s statement in their opening summary about mutations being enriched in the coding sequence of transcription factors is not quite correct: the mutants we mapped were enriched in coding sequences, and we found more mutations in transcription factors previously shown to regulate (directly or indirectly) expression of TDH3 than expected by chance, but trans-regulatory mutations were not significantly enriched in genes encoding transcription factors relative to non-regulatory mutations (as described in the manuscript).

      Reviewer #3 (Public Review):

      [...] The mutagenesis approach in yeast the authors used is very powerful, but it naturally has drawbacks. The regulatory landscape in yeast is arguably simpler compared to e.g. metazoa or plants, in that the cis-regulatory regions are predominantly closely linked to target genes, the majority of genes do not have introns, and post-transcriptional regulation of mRNA through e.g. splicing is rare. These features distinguish the systems, as in animals and plants introns are a very prominent source of regulatory elements (close to half of all enhancers are intronic in many animals), and alternative splicing of e.g. transcription factors is known to play a major role in transcriptional regulation. Further, chromatin is a very important layer in metazoan and plant gene regulation. To benefit the general readership, it would be informative to further elaborate on the significance of the findings for researchers studying other organisms. In addition, it would help to clarify what aspects of the differences in the regulatory landscape the authors think are important to distinguish.

      We thank the reviewer for their kind words and recognition of the novelty of this work. We have modified the introduction to try to clarify the relationship of this work to eQTL studies, which we hope addresses the reviewer’s first concern. Specifically, we’ve tried to clarify that the complex, polygenic nature of trans-regulatory variation segregating within a species is well established by prior eQTL studies. We also sought to clarify that our work (which maps single mutations from mutagenized strains rather than natural variation) provides complementary insight into the distribution of regulatory mutations within the genome and within a gene’s regulatory network. Revisions have also been made to try to clarify that the single mutations we mapped were from EMS-induced mutants containing only ~24 mutations per genome, which is more than 1000-fold less than the number of single nucleotide polymorphisms between two strains of S. cerevisiae. That is, this study was designed to identify single trans-regulatory mutations rather than to characterize the genetic architecture of naturally occurring trans-regulatory variation. Although we intentionally focused on characterizing properties of single mutations here, we agree with the reviewer that testing for epistatic interactions among trans-regulatory mutations will be an interesting avenue for future work, and have added this point to the revised discussion. We have also added text to the discussion describing some similarities and differences in gene regulation among eukaryotes that should be considered when trying to generalize from this work.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Authors' rigorous experimental design (based on bacterial genetics and structural biology), solid biochemical assays (including photo-crosslinking, cysteine crosslinking, and Western blotting), and carefully drawn interpretation and conclusions are impressive. Finally, authors delineate the mechanisms of BepA activation and LptD biogenesis, which are supported by the current and previous studies by the authors and other research groups.

      Thank you for the nice summarization and the positive evaluation of our study.

      While this is overall a wonderful piece of work, this manuscript would be further improved by clarifying the following points:

      1. Authors examined how mutations (Pro and Cys scanning) on the edge-strand of BepA affected degradation and maturation of LptD.

      It was assumed that these mutations impact the structure of BepA only locally. However, a mutational effect can be propagated in an unexpected way affecting the structural integrity of other regions. Although authors tested that A106P retains proteolytic activity as shown by self-cleavage, a similar test (for example, in vitro experiments using a structureless substrate) may need to be extended to other mutations to support the conclusions.

      Thank you for the suggestion. Unfortunately, we have not succeeded in reproducibly detecting the proteolytic activity of BepA with purified BepA even when an unstructured substrate (-casein) is used and the assay was conducted at an elevated temperature, possibly because the protease activity of isolated BepA is tightly repressed by the mechanism that includes His-246-mediated regulation as described in our paper (please see Introduction). Although BepA mutants with a mutation of His-246 or a deletion of H9 loop (these mutations release the His-246-mediated repression) significantly degrade -casein, a combination of these mutations with the edge-strand mutations would complicate the interpretation of the results. We thus think that the suggested experiments cannot be conducted soon.

      Instead, we described the following points in the revised manuscript. Although we mentioned the self-cleavage activity of only the A106P mutant in the original manuscript, our results showed that the other edge-strand Pro mutants (other than F107P) exhibited significant self-cleavage activities as well (Figure 1-figure supplement 2B). In addition, the Pro mutants other than the A106P mutant degraded mis- or un-folded BamA at a detectable level (Figure 1-figure supplement 2A). Furthermore, all the Pro mutants accumulated at a level comparable to that of wild-type BepA. These observations together indicate that most of the Pro mutations specifically affected the edge-strand structure, but did not drastically alter the active site or the protein's overall structure. We described the above points in the revised text (line 174 in p7 to line 181 in p8).

      2. In the Results (Line 159), authors report chaperone-like activity of BepA. Here, the term "chaperone-like" is rather obscure regarding whether this activity facilitates LptD maturation without proteolysis (i.e., via holdase activity), or involves proteolysis as a part of quality control mechanisms. In another experiment, authors show that the chaperone-like activity may not necessarily involve proteolysis. It would be good to describe a possible molecular principle of how the edge-strand binding to the substrate can lead to chaperone activity.

      We suppose that the interaction of BepA (via the edge-strand) with an assembly intermediate of LptD on the BAM complex stabilizes the partially unfolded assembly intermediate of LptD on the BAM complex to help the association of LptE with LptD. This was explained in Discussion (lines 388–392 in p16) and the legend to Figure 5.

      Reviewer #2 (Public Review):

      The authors found that a conserved β-strand (edge-strand β2 of BepA) directly contacts with the N-terminal half of the β-barrel-forming domain of an immature LptD; the C-terminal region of the β-barrel-forming domain of the BepA-bound LptD intermediate interacts with a "seam" strand of BamA in the BAM complex. By combining crosslinking and mutational studies, they showed that the edge-strand of BepA may have both the proteolytic and the chaperone-like functions. Based on the authors' previous studies of BepA, they proposed a model that the edge-strand and His switch of BepA regulate BepA in LptD assembly and degradation.

      Thank you for the nice summarization of our study.

      Reviewer #3 (Public Review):

      [...] By performing an impressive systematic cross linking analysis, combined with previous known findings, the authors are able to dissect the general architecture of how BepA interacts with beta-barrel substrates as they are being assembled by the Bam complex. The experiments presented are nicely executed and the data are of high quality. I am convinced that the edge strand of BepA interacts with LptD, likely as it is assembling on the Bam complex. It is also clear that this interaction is functionally important because mutations in this region that disrupt the BepA-LptD interaction interfere with LptD maturation and degradation. This suggests that the substrate binding to the protease domain of BepA is important for both its chaperone and proteolytic activity. The work is well executed and will be of value to others interested in the regulation of membrane protein folding and, more broadly, in the biogenesis of the bacterial cell envelope.

      Thank you for the nice summarization and the positive evaluation of our study.

      While the authors conclusively establish a link between this region of BepA and its function, the data do not explain the underlying mechanism of how BepA discriminates between substrates targeted for integration into the membrane and those targeted for destruction. The model proposed at the end incorporates the presence of the edge strand of BepA, but its role in the process remains vague. As mentioned in the discussion, the mechanism that controls the switch from chaperone to protease function in BepA is likely governed by the loops that gate access to the catalytic residues proximal to the edge strand. It is possible that the edge strand may just be reporting on substrate binding to the protease domain active site. While this may be important for substrate recognition, it does not mean that the edge strand-substrate interaction plays a deterministic role in subsequent protein triage during LptD assembly.

      Our data demonstrated that the edge-strand of BepA directly binds a substrate. As pointed out by the reviewer, the involvement of the edge-strand in substrate binding has been known for other proteases. However, it was not known whether the substrate binding at the edge-strand contributes to the chaperone-like function; it was possible that the binding sites of a substrate on BepA during its proteolysis and its maturation are totally different, as the chaperone-like activity of BepA is independent of its protease activity (it was conceivable, for example, that substrate binding during its maturation occurs on the surface of the C-terminal TPR domain that has been shown to interact with LptD). Our results showed that the defective binding of a substrate (LptD) at the edge-strand impairs not only its proteolysis but also its normal maturation (assembly). Because the edge-strand-bound substrate would be directly presented to the proteolytic active site for its degradation, this binding step should be important for determining the fate of the bound substrate. Our results strongly suggest that the substrate binding by the edge-strand is a crucial common step required for the subsequent protein triage during the LptD assembly.

  2. Jul 2021
    1. Author Response:

      Reviewer #2:

      Weaknesses of the Methods and Results:

      1) In my view, the experiment does not allow one to unambiguously disentangle self vs. other distinction (as mentioned in the abstract "..we investigated how affect sharing and self-other distinction interact.."). For example, genuine vs. pretended pain could be distinguished from the participants' own experience in a comparable way. The higher rating of unpleasantness for genuine pain in others does not necessarily mean that the participants cannot separate their own from others' experiences.

      We thank the reviewer for raising this issue and for prompting us to further clarify and better state our research purpose. In terms of its original theoretical foundation and motivation, the current study aimed to investigate whether and how neural signatures underlying two essential components of empathy, namely affect sharing and self-other distinction, track individual responses to genuine vs. pretended pain. We agree though that our experimental design does not allow us to unequivocally disentangle the precise aspects of self- and other-related processing in the two main conditions of interest (genuine pain or pretended pain). We thus modified any wording suggesting otherwise, so as to avoid further misunderstanding by readers.

      Accordingly, we have provided a more elaborate theoretical clarification in the Abstract and Introduction about our particular interest in studying self-other distinction and its neural correlates in the right supramarginal gyrus (rSMG) during empathy. We also mention as a potential limitation that our design did not aim to explicitly quantify self-other distinction.

      Action taken: In the manuscript, we have made the following changes:

      1) We modified the sentence "[...] we investigated how affect sharing and self-other distinction interact [...]" to

      “[...] we investigated how the brain network involved in affect sharing and self-other distinction underpinned [...] ” in the Abstract (P. 1).

      Besides, we modified another sentence “[...] to investigate the hypothesized distinct interactions between affective response and self-other distinction [...]” to

      “[...] to investigate the hypothesized brain patterns of affective responses and self-other distinction [...]” in the Introduction (P. 4).

      2) We added sentences in the Discussion (P. 13): “An additional limitation was that our study design did not aim to explicitly quantify self-other distinction. Rather, in line with previous research and based on our theoretical framework and rationale, we inferred the engagement of this process from the experimental conditions and the associated behavioral and neural responses. We expect our findings to prompt and inform future research designed to quantify and experimentally disentangle self- and other-related processes more explicitly.”

      2) The experimental design does not unambiguously allow one to disentangle genuine vs. pretended pain from other factors, such as the differences in pain expression, painful feeling in others and higher unpleasantness in these two conditions. I understand that the intensity of pain expression, painful feeling in others and unpleasantness for others is inherently tied to genuine vs. pretended pain. But the authors already saw that the instruction of "genuine vs. pretended" influenced the ratings of pain expression. Hence, this allows two interpretations of the results: either the influence from the anterior Insula on the rSMG is driven by higher perceived pain expression, painful feeling in others and unpleasantness, or by the conditions of genuine vs. pretended pain. Or (more likely) by an interaction between these factors. It would, for example, help to explore the association between the aIns-rSMG interaction and pain expression ratings (or painful feeling in others or higher unpleasantness) in videos with genuine pain and pretended pain separately. The authors should further discuss the point that different factors (pain expression, etc.) contribute to the differences between genuine vs. pretended pain.

      We thank the reviewer for the thoughtful consideration of the different factors that might contribute to the difference between genuine and pretended pain. One point we would like to make beforehand: disentangling the specific contributors underlying the manipulation is not the main focus of the current study, as 1) our primary aim was to study the effects of the experimental manipulation as a whole; we thus used the three behavioral ratings mainly to collect additional information on and to interpret the expected effects of the manipulation, and 2) these factors (and their behavioral measures) are inherently (cor)related and hard to disentangle precisely anyway, as mentioned by the reviewer and as shown by extensive previous research both by our and other groups.

      Action taken: Nonetheless, in the revised manuscript, we have now:

      1) discussed how different factors possibly interact and in this way contribute to the differences (in the modulatory effect) between genuine vs. pretended pain, in the Discussion (P. 11):

      “We speculate that a dynamic interaction between sensory-driven and control processes is underlying the modulatory effect: when individuals realized after an initial sensory-driven response to the facial expression that it was not genuinely expressing pain, control and appraisal processes led to a reappraisal of the triggered emotional response, and thus a dampening of the unpleasantness.”

      2) fitted additional linear regression models and performed a model comparison (see details in the response to comment #3) to investigate whether an interaction between behavioral measures could be a potential contributor to the modulatory effect of genuine pain and pretended pain; in short, the model without interactions is the winning model both for genuine pain and pretended pain.

      We have now discussed this result (P. 11):

      “Model comparison showed that the best model to explain the inhibitory effect with the behavioral ratings for both the genuine and pretended pain is the model without interactions between ratings. That is, if any behavioral rating contributed to the modulation of aIns to rSMG, the effect would be more likely coming from single ratings rather than their interactions. Specifically, we found [...]”

      We thank the reviewer for this suggestion for further analysis.

      We fitted additional linear regression models (with and without interaction) and performed a model comparison to explore whether any interaction between behavioral ratings heavily contributed to the modulatory effect. Results showed that the model without interaction was the best-fitting model for both conditions.

      We report the additional analyses as follows:

      In the Methods section (P. 24-25): “Considering that interactions between behavioral ratings might contribute to the regression model, we tested five regression models (with and without interaction; see Supplementary Table 1) for both genuine pain and pretended pain. Results showed that for both genuine pain and pretended pain, the model without any interaction outperformed other models.”

      Supplementary Table 1. Model comparison of linear regression models with three behavioral ratings (independent variables) and the inhibitory effect (dependent variable) for genuine pain and pretended pain. Smaller AIC/BIC indicates better model fit. Results showed that M1 (without interaction; highlighted with underlining) was the best fitting model for both genuine pain and pretended pain.
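      As a rough illustration of this kind of comparison, the sketch below fits two ordinary least-squares models (main effects only vs. all interactions) to synthetic data and scores them by AIC and BIC. The data, the sample size, the model names, and the choice of only two candidate models are assumptions made for illustration; they are not the five models or the values reported in Supplementary Table 1.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns (residual sum of squares, n_params)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid), X1.shape[1]

def aic_bic(rss, n, k):
    """Gaussian AIC/BIC up to an additive constant shared by all models."""
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return float(aic), float(bic)

# Synthetic stand-ins for the three ratings and the modulatory effect.
rng = np.random.default_rng(0)
n = 43  # illustrative sample size
ratings = rng.normal(size=(n, 3))
y = 0.3 * ratings[:, 1] + rng.normal(size=n)  # generated without interactions

e, f, u = ratings.T  # expression, feeling, unpleasantness (hypothetical names)
models = {
    "main_effects_only": ratings,
    "all_interactions": np.column_stack([ratings, e * f, e * u, f * u, e * f * u]),
}

scores = {}
for name, X in models.items():
    rss, k = fit_ols(X, y)
    scores[name] = aic_bic(rss, n, k)

# Smaller is better for both criteria.
best_by_bic = min(scores, key=lambda m: scores[m][1])
```

      Because the larger model nests the smaller one, its residual sum of squares can never be higher; AIC and BIC offset that advantage with a penalty on the number of parameters.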

      Accordingly, we now report the results of the winning model of the multiple regression analyses, instead of the original stepwise regression. These analyses found that only the rating of painful feelings in others was significant for genuine pain, while no significant effects whatsoever were found for pretended pain.

      Action taken:

      In the manuscript, we have made the following changes:

      1) We modified “stepwise linear regression” to “multiple linear regression” in the Methods, Results, and Figure 3 legend (P. 24, P. 7, and P. 37)

      2) We added the sentence “The results of the winning multiple regression model are reported in the Results section.” in the Methods (P. 25).

      3) We added the results of the multiple regression analyses for genuine pain and pretended pain, in the Results section (P. 7-8): “For the genuine pain condition, we find that the modulatory effect was significantly related to the rating of painful feelings in others (t = 2.317, p = 0.026) but not related to the rating of either painful expressions in others (t = -1.492, p = 0.144) or unpleasantness in self (t = 0.058, p = 0.954). For the pretended pain condition, none of the ratings was significantly related to the modulatory effect (Figure 3D).”

      4) We moved the results of the original stepwise regression analyses with behavioral ratings into the supplementary data (see Supplementary File 2):

      “Results of the stepwise regression analyses on modulatory effects and behavioral ratings are shown below. Note that this analysis reflects our original analysis approach; prompted by a reviewer comment, we however changed the analysis plan and performed and reported the findings of multiple regression analyses in the main text. Importantly, the conclusions of the two analysis approaches are consistent.

      To examine how the modulatory effects from the DCM were related to the behavioral ratings, we computed two stepwise linear regression models for each condition. The regression model was significant for the genuine pain condition (F model (1, 41) = 4.639, p = 0.037, R2 = 0.104), when painful feelings in others were added to the model and the other two ratings were excluded (B = 0.079, beta = 0.322, p = 0.037). However, the model was not significant for the pretended pain condition. The variance inflation factors (VIFs) for three ratings in both models were calculated to diagnose collinearity, showing no severe collinearity problem (all VIFs < 5; the smallest VIF = 1.132 and the largest VIF = 4.387).”
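      For readers unfamiliar with the collinearity diagnostic mentioned here, a minimal sketch of a VIF computation follows. Each predictor is regressed on the remaining predictors, and VIF = 1 / (1 - R^2). The data are synthetic and the variable names are hypothetical; the sketch does not reproduce the reported VIF values.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic ratings: two correlated predictors and one independent predictor.
rng = np.random.default_rng(1)
expression = rng.normal(size=43)
feeling = 0.8 * expression + 0.6 * rng.normal(size=43)
unpleasantness = rng.normal(size=43)

X = np.column_stack([expression, feeling, unpleasantness])
vifs = vif(X)
```

      A VIF of 1 means a predictor is uncorrelated with the others; thresholds such as 5 or 10 are conventions, not hard limits.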

      3) The multiple regression analyses revealed an association between the unpleasantness for the participants and the aIns, when accounting for the painful expression and the pain experienced by the other. This, however, does not reveal the specificity of the aIns for encoding the unpleasantness for the participants. It might well be that variance is shared in the association between the aIns and pain expression and pain by the other and unpleasantness for the participants, but simply strongest for unpleasantness. Such ambiguity could be resolved by additional multiple regressions of 1) pain expression (controlling for pain by the other and unpleasantness for the participants) and 2) pain for the other (controlling for pain expression and unpleasantness for the participants).

      We thank the reviewer for this comment. As an overall premise, please note that we would not want to claim that the aIns is specifically engaged in encoding affective activities without any engagement of other processes; instead, we are entirely aware that the aIns activation participates in a variety of affective and cognitive processes. Nonetheless, our original multiple regression models were performed as a second-level group analysis with all three ratings as independent variables. Results showed that only the rating of “unpleasantness in self” was significant, rather than all ratings that were universally influenced by domain-general factors.

      As the reviewer suggested, we additionally performed five multiple regression analyses with all possible orders of three behavioral measures to test whether the order matters. In the end, we found consistent results across all six regression analyses, suggesting that the selective correlation of aIns and the rating of unpleasantness in self was robust.

      Action taken: In the manuscript, we have:

      1) Modified “specifically” to “selectively” in the Results (P. 6).

      2) Added the content in the Methods (P. 22) “To test whether the order of entering ratings into the regression model influences the results, we performed five additional regression analyses with all possible orders of three ratings. The results were consistent across all six regression models, and we only showed the result for one regression (i.e., expression + feeling + unpleasantness) in the Results section.”

      3) Modified the sentence in the Results (P. 6) “We found significant clusters in bilateral aIns, visual cortex, and cerebellum (Figure 2B); notably, when statistically accounting for ratings of painful expressions in others and painful feelings in others, all three clusters were exclusively explained by the ratings of self-unpleasantness.” to

      “We found significant clusters in bilateral aIns, visual cortex, and cerebellum that could be selectively explained by the ratings of self-unpleasantness and could not be explained by either the ratings of painful expressions in others or painful feelings in others (Figure 2B).”

      4) Modified the sentence in the Discussion (P. 9) “[...] but the increased activation in aIns was also selectively correlated with ratings of self-oriented unpleasantness (i.e., after statistically accounting for painful expressions and painful feelings in others) [...]” to

      “[...] but the increased activation in aIns was also selectively correlated with ratings of self-oriented unpleasantness and was correlated with neither other-related painful expressions nor painful feelings in terms of the regression analysis [...]”

      and added the sentence “[...] (otherwise the increased aIns activation should also be explained by other behavioral ratings in the sense of shared influence by domain-general effects).”

      5) Modified the legend for Figure 3 (P. 37) “[...] revealed a positive correlation between the inhibitory effect and painful feelings in others (after accounting for the other two ratings) for genuine pain [...]” to

      “[...] revealed a positive correlation between the inhibitory effect and painful feelings in others and not with other two ratings for genuine pain [...]”

      4) Is the regression biased by the differences between conditions in the aIns in both fMRI signals and the ratings?

      We thank the reviewer for this comment. We compared the differences between conditions mainly to control for potential effects of perceptual salience. This aim was consistent for both fMRI signals and behavioral ratings. Note that, as the aIns activation and all behavioral ratings were higher for genuine pain as opposed to pretended pain, the current result could not be explained by an inverse effect (i.e., higher aIns activation and higher ratings of unpleasantness in self for pretended pain). Therefore, we do not consider it problematic to use differences between conditions when performing the multiple regression analysis.

      Action taken:

      1) We have more explicitly specified “differences between conditions for” three behavioral ratings as independent variables for the multiple regression model in the Methods (P. 22).

      2) We added the sentence “The reason that we used the comparison between conditions for both brain signals and behavioral ratings was to control for potential effects of perceptual salience.” in the Methods (P. 22).

      5) The inclusion of the rSMG into the DCM model is not straightforward to me. It could have been based on previous literature, but then the aMCC should have been added as well. Furthermore, while the implication of the rSMG in distinction of self vs. others is established, the actual process in this experiment cannot be revealed. The authors state that the rSMG is involved in action observation or imitating emotions (page 9, line 200).

      We appreciate the reviewer’s comment, which shows that we did not clearly convey why we postulated a role for the rSMG. We have now made our rationale more explicit.

      Action taken:

      We have now:

      1) Modified the clarification of rSMG in the Discussion (P. 10): “The inferior parietal lobule was shown to be generally engaged in selective attention, action observation and imitating emotions (Bach et al., 2010; Pokorny et al., 2015; Gola et al., 2017; Hawco et al., 2017). Importantly, a specific role in affective rather than cognitive self-other distinction has been consistently identified for rSMG (Silani et al., 2013; Steinbeis et al., 2015; Bukowski et al., 2020). [...]”

      2) Added further clarification in the Discussion (P. 12) after the sentence “ [...] the correlation findings provide further evidence that the modulation of aIns to rSMG is implicated in encoding others’ emotional states,”

      with “which serves as a functional foundation for self-other processing [...] This regulation cannot be totally attributed to domain-general processes, otherwise other ratings should have also explained this variation.”

      Additionally, we agree re: aMCC, which we also predicted to play a role; but it was not the case at least in our data. In fact, we have already addressed this in the original version of the ms. (maintained on P. 7 of revised ms.): “Our original analysis plan was to include aMCC in the DCM analyses, but based on the fact that aMCC did not show as strong evidence (in terms of the multiple regression analysis) as the aIns of being involved in our task, we decided to use a more parsimonious DCM model without the aMCC.”

      Whether Results support their conclusions:

      The results support the distinction between the experimental conditions of genuine vs. pretended pain in the aIns and as a modulatory influence on the connectivity between the aIns and the rSMG. However, the authors aimed to test if genuine vs. pretended pain modulate regulatory influences from the aIns on the rSMG that are connected to self-other distinction (as proposed in the discussion page 8, line 170). Yet, any insights about self-other distinction are only inferred reversely, since there is no outcome that indicates how well participants distinguished between themselves and the other person. For example in the discussion the authors state that: " we thus propose that the higher rSMG engagement in genuine pain conditions reflects an increasing demand for self-other distinction imposed by the stronger shared negative affect experiences in this condition". This is not supported by the results. Furthermore, the title mentioned automated responses to pretended pain, which I could not understand, given the current results.

      We thank the reviewer for this comment, which somewhat follows up on similar arguments made and replied to in comment #1 above. Indeed, we fully agree that our design did not allow us to quantify self-other distinction, but that we inferred its engagement based on a strong theoretical motivation and the replication of previous findings on rSMG involvement during self-other distinction. As outlined above (cf. #1), this limitation was added to the revised manuscript.

      We also reordered the reasoning in the Discussion (P. 10), placing the theoretical explanation ahead of the inference so that readers can better see that this statement rests on a strong theoretical motivation:

      First “Theoretical models of empathy [...]” and then “Concerning the current finding, we thus propose that [...]”

      We thank the reviewer for pointing out the potential ambiguity in the title. We agree it may be somewhat “imprecise”, and have revised the title accordingly (P. 1):

      “Neural dynamics between anterior insular cortex and right supramarginal gyrus dissociate genuine affect sharing from perceptual saliency of pretended pain”.

      Likely impact of the work on the field:

      These results are expected to advance the field, since they allow visual expressions of pain to be disentangled from genuine pain in others. Thereby, this work could resolve the question about neural processes that are specific to pain in others beyond other salient cues.

      We thank the reviewer for this positive acclaim of our study.

    1. Author Response:

      Reviewer #1:

      Insulin-secreting beta-cells are electrically excitable, and action potential firing in these cells leads to an increase in the cytoplasmic calcium concentration that in turn stimulates insulin release. Beta-cells are electrically coupled to their neighbours and electrical activity and calcium waves are synchronised across the pancreatic islets. How these oscillations are initiated is not known. In this study, the authors identify a subset of 'first responders' beta-cells that are the first to respond to glucose and that initiate a propagating Ca2+ wave across the islet. These cells may be particularly responsive because of their intrinsic electrophysiological properties. Somewhat unexpectedly, the electrical coupling of first responder cells appears weaker than that in the other islet cells but this paradox is well explained by the authors. Finally, the authors provide evidence of a hierarchy of beta-cells within the islets and that if the first responder cells are destroyed, other islet cells are ready to take over.

      The strengths of the paper are the advanced calcium imaging, the photoablation experiments and the longitudinal measurements (up to 48h).

      Whilst I find the evidence for the existence of first responders and hierarchy convincing, the link between the first responders in isolated individual islets and first phase insulin secretion seen in vivo (which becomes impaired in type-2 diabetes) seems somewhat overstated. It is difficult to see how first responders in an islet can synchronise secretion from 1000s (rodents) to millions of islets (man) and it might be wise to down-tone this particular aspect.

      We thank the reviewer for highlighting this point. We acknowledge that we did not measure insulin from individual islets post first responder cell ablation, where we observed diminished first phase Ca2+. We do note that studies have linked the first phase Ca2+ response to first phase insulin release [Henquin et al, Diabetes (2006) and Head et al, Diabetes (2012)], albeit with additional amplification signals for higher glucose elevations. Thus a diminished first phase Ca2+ would imply a diminished first phase insulin (although given the amplifying signals the converse would not necessarily be the case).

      Nevertheless, there are also important caveats to our experiment. Within islets we ablated a single first responder cell. In small islets this ablation diminished Ca2+ in the plane that we imaged. In larger islets this ablation did not, pointing to the presence of multiple first responder cells. Furthermore, we only observed the plane of the islet containing the ablated first responder. It is possible that elsewhere in the islet [Ca2+] was not significantly disrupted. Thus, even within a small islet, redundancy is possible, where multiple first responder cells are present and together drive first phase [Ca2+] across the islet. Loss of a single first responder cell only disrupts Ca2+ locally. That we see a relationship between the timing of the [Ca2+] response and distance from the first responder would support this notion. Results from the islet model also support this notion, where >10% of cells were required to be ablated to significantly disrupt first-phase Ca2+.

      While we already discuss the issue of redundancy in large islets and in 3D, we now briefly mention the importance of measuring insulin release.

      Reviewer #2:

      Kravets et al. further explored the functional heterogeneity in insulin-secreting beta cells in isolated mouse islets. They used slow cytosolic calcium [Ca2+] oscillations with a cycle period of 2 to several minutes in both phases of glucose-dependent beta cell activity that got triggered by a switch from unphysiologically low (2 mM) to unphysiologically high (11 mM) glucose concentration. Based on the presented evidence, they described a distinct population of beta cells responsible for driving the first phase [Ca2+] elevation and characterised it to be different from some other previously described functional subpopulations.

      Strengths:

      The study uses advanced experimental approaches to address a specific role a subpopulation of beta cells plays during the first phase of an islet response to 11 mM glucose or strong secretagogues like glibenclamide. It finds elements of a broadscale complex network on the events of the slow time scale [Ca2+] oscillations. For this, they appropriately discuss the presence of most connected cells (network hubs) also in slower [Ca2+] oscillations.

      Weakness:

      The critical weakness of the paper is the evaluation of linear regressions that should support the impact of relative proximity (Fig. 1E), of the response consistency (Fig. 2C), and of increased excitability of the first responder cells (Fig. 3B). None of the datasets provided in the submission satisfies the criterion of normality of the distribution of regression residuals. In addition, the interpretation that the majority of first responder cells retain their early response time could as well be interpreted that the majority does not.

      We thank the reviewers for their input, as it really opened multiple opportunities for us to improve our analysis and strengthen our arguments of the existence and consistency of the first responder cells. We present more detailed analysis for these respective figures below and describe how these are included in the manuscript.

      As it is described below, we performed additional in-depth analysis and statistical evaluation of the data presented in figures 1E, 2C, and 3B. We now report that two of the datasets (Fig.1 E, Fig.2 C) satisfy the criterion of normality of the distribution of regression residuals. The third dataset (Fig.3 B) does not satisfy this criterion, and we update our interpretation of this data in the text.

      Figure 1E Statistics, Scatter: We now show the slope and p-value indicating deviation of the slope from 0, and r^2 values in Fig.1 E. While the scatter is large (r^2=0.1549 in Fig.1E) for cells located at all distances from the first responder cell, we found that scatter substantially diminishes when we consider cells located closer to the first responder (r^2=0.3219 in Fig.S1 F): the response time for cells at distances up to 60 μm from the first responder cells now is shown in Fig.S1 F. The choice of 60 μm comes from it being the maximum first-to-last responder distance in our data set (see red box in Fig.1D).
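      The comparison described above (slope and r^2 for all cells vs. cells within 60 μm) can be sketched as follows. The data are synthetic and the variable names are hypothetical, so the resulting numbers are not meant to reproduce the reported r^2 values or their trend.

```python
import numpy as np

def slope_and_r2(x, y):
    """Simple linear regression; returns (slope, coefficient of determination)."""
    slope, intercept = np.polyfit(x, y, 1)  # highest-degree coefficient first
    pred = slope * x + intercept
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(slope), float(1.0 - ss_res / ss_tot)

# Synthetic distances (um) and response times (arbitrary units).
rng = np.random.default_rng(2)
distance_um = rng.uniform(0, 150, size=200)
response_time = 0.5 * distance_um + rng.normal(scale=20, size=200)

# Fit on all cells, then only on cells within 60 um of the first responder.
slope_all, r2_all = slope_and_r2(distance_um, response_time)
near = distance_um <= 60.0
slope_near, r2_near = slope_and_r2(distance_um[near], response_time[near])
```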

      Additionally, we noticed that within larger islets there may be multiple domains with their own first responder in the center (now in Fig.S1 E) and below. Linear distance/time dependence is preserved within each domain.

      Figure 1E Normality of residuals: We appreciate the reviewer’s suggestion and now see that the original “distance vs time” dependence in Fig.1 E did not pass the normality-of-residuals test. When plotted as distance (μm)/response time (percentile), the cumulative distribution still did not pass the Shapiro-Wilk test for normality of residuals (see QQ plot “All distances” below). However, for cells located within 60 μm of the first responder, the residuals pass the Shapiro-Wilk normality test. The QQ-plots for “up to 60 μm distances” are included in Fig.S1 G.
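      A residual-normality check of this kind can be sketched with SciPy as shown below. The data are synthetic, and the workflow (least-squares line, Shapiro-Wilk test on the residuals, QQ-plot coordinates) is an assumed generic procedure rather than the authors' actual script.

```python
import numpy as np
from scipy import stats

# Synthetic distance/response-time data restricted to 0-60 um.
rng = np.random.default_rng(3)
x = rng.uniform(0, 60, size=80)
y = 0.5 * x + rng.normal(scale=5, size=80)

# Residuals of a simple least-squares line.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Shapiro-Wilk: a small p-value is evidence against normally distributed residuals.
w_stat, p_value = stats.shapiro(residuals)

# QQ-plot coordinates: theoretical normal quantiles vs. ordered residuals.
(theoretical_q, ordered_resid), (qq_slope, qq_intercept, qq_r) = stats.probplot(
    residuals, dist="norm"
)
```

      Passing the test means only that normality was not rejected at the chosen threshold; a QQ plot shows where any deviation lies.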

      Figure 2C Statistic and Scatter: After consulting a biostatistician (Dr. Laura Pyle), we realized that since the response time during initial vs repeated glucose elevation was measured in the same islet, these were repeated measurements on the same statistical units (i.e. a longitudinal study). Therefore, a mixed model analysis was required, as opposed to the simple linear regression we used initially. We have now applied a linear mixed-effects model (LMEM) to the LN-transformed data (original data + 0.0001). The 0.0001 value was added to avoid issues of LN(0).

      We now show the LMEM-derived slope and the p-value indicating deviation of the slope from 0 in Fig.2 C. Further, we sorted the data presented in Fig.2 C by distance to each of the first responders (now added to Fig.2D). An example of the sorted vs non-sorted time of response in the large islet with multiple first responders is added to the Source Data – Figure 1. We found a substantial reduction of the scatter in the distance-sorted data, compared to the non-sorted data, which indicates that the consistency of a cell's glucose response correlates with its proximity to the first responder. We also discuss this in the first sub-section of the Discussion.

      Figure 2C Normality of residuals: The residuals pass the Shapiro-Wilk normality test for the LMEM of the LN-transformed data. We added a very small number (0.0001) to all 0 values in our data set, presented in Fig.2C, D, and Fig.S4 A, to perform the natural-log transformation. Details on the LMEM and its output are added to the Source data – Statistical analysis file.
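      One common way to write such a model is given below. It assumes a random intercept per islet and that the offset is applied to both the initial and the repeated response times; both are assumptions made for illustration, and the exact specification is the one in the authors' Source data file. Here t^init and t^rep denote the response times of cell i in islet j, and ε is the added offset.

```latex
\[
\ln\!\left(t^{\mathrm{rep}}_{ij} + \varepsilon\right)
  = \beta_0 + \beta_1 \,\ln\!\left(t^{\mathrm{init}}_{ij} + \varepsilon\right) + u_j + e_{ij},
\qquad
\varepsilon = 10^{-4},\quad
u_j \sim \mathcal{N}\!\left(0, \sigma_u^2\right),\quad
e_{ij} \sim \mathcal{N}\!\left(0, \sigma_e^2\right)
\]
```

      In this form, the reported slope corresponds to β1, and the normality test applies to the residuals e_ij.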

      Figure 3B Statistic and Scatter: We now show the LMEM-derived slope and the p-value indicating deviation of the slope from 0 in Fig.3 B (below). The LMEM-derived slope has a p-value of 0.1925, indicating that the slope is not significantly different from 0. This result changes our original interpretation, and we have edited the associated results and discussion accordingly.

      Figure 3B Normality of residuals: This data set does not pass Shapiro-Wilk test.

      A major issue of the work is also that it is unnecessarily complicated. In the Results section, the authors introduce a number of beta cell subpopulations: first responder cell, last responder cell, wave origin cell, wave end cell, hub-like phase 1, hub-like phase 2, and random cells, which are all defined in exclusively relative terms, regarding the time within which the cells responded, phase lags of their oscillations, or mutual distances within the islet. These cell types also partially overlap.

      To address this comment, we added Table 1 to describe the properties of these different populations.

      Their choice to use the diameter percentile as a metric for distances between the cells is not well substantiated since they do not demonstrate in what way the islet size variability would influence the conclusion. All presented islets are of rather a comparable size within the diffusion limits.

      We replaced normalized distances in Fig.1 D with absolute distance from first responder in μm.

      The functional hierarchy of cells defining the first response should be reflected in the consistency of their relative response time. The authors claim that the spatial organisation is consistent over a time of up to 24 hours. In the first place, it is not clear why this prolonged consistency would be of an advantage in comparison to the absence of such consistency. The linear regression analysis between the initial and repeated relative activation times does suggest a significant correlation, but the distribution of regression residuals of the provided data is again not normal and non-conclusive, despite the low p-value. 50% of the cells defined as first responders in the initial stimulation were part of that subpopulation also during the second stimulation, which is rather random.

      We began to describe our analysis of the response time to initial and repeated glucose stimulation earlier in this reply. Further evidence of the distance-dependence of the consistency of the response time is now presented in Fig.S4 A: the response time consistency for cells within 60 μm, 50 μm, and 40 μm of the first responder. The closer a cell is located to the first responder, the higher the consistency of its response time (the lower the scatter), below.

      If we analyze these data with a linear regression model, where the r^2 allows us to quantify the decrease of the scatter, we observe r^2 of 0.3013, 0.3228, and 0.3674, respectively, for cells within 60 μm, 50 μm, and 40 μm of the first responder (below). These data are not included in the manuscript because the residuals do not pass the Shapiro-Wilk normality test for this model (while they do for the LMEM).

      One of the most surprising features of this study is the total lack of fast [Ca2+] oscillations, which are in mouse islets, stimulated with 11 mM glucose typically several seconds long and should be easily detected with the measurement speed used.

      Our data used in this manuscript contains Ca2+ dynamics from islets with a) slow oscillations only, b) fast oscillations superimposed on the slow oscillations, c) no obvious oscillations (likely continual spiking). Representative curves are below. Because we focused our study on the slow oscillations, we used dynamics of type (a) in our figures, which formed an impression that no fast oscillations were present. In our analysis of dynamics of type (b) we used Fourier transformation to separate slow oscillations from the fast (described in Methods). Dynamics of type (c) were excluded from the analysis of the oscillatory phase, and instead only used for the first-phase analysis. We indicate this exclusion in the methods.
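      A Fourier-based separation of slow and fast components can be sketched as follows. The sampling interval, cutoff frequency, oscillation periods, and the function name `split_slow_fast` are all hypothetical choices for a synthetic trace; the authors' actual procedure is the one described in their Methods.

```python
import numpy as np

def split_slow_fast(signal, dt, cutoff_hz):
    """Split a trace into slow (< cutoff) and fast (>= cutoff) parts via the FFT."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=dt)
    # Zero every frequency bin at or above the cutoff, then invert.
    slow_spectrum = np.where(freqs < cutoff_hz, spectrum, 0)
    slow = np.fft.irfft(slow_spectrum, n=len(signal))
    fast = signal - slow
    return slow, fast

dt = 1.0                         # assumed sampling interval in seconds
t = np.arange(0, 1200, dt)       # 20 minutes of synthetic recording
slow_true = np.sin(2 * np.pi * t / 300)        # 5 min period
fast_true = 0.3 * np.sin(2 * np.pi * t / 10)   # 10 s period
trace = slow_true + fast_true

slow, fast = split_slow_fast(trace, dt, cutoff_hz=1 / 60)
```

      A hard spectral cutoff is the simplest choice; on real traces it can introduce ringing near abrupt changes, so a smoother filter may be preferable.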

      And lastly, we should also not perpetuate imprecise information about the disease if we know better. The first sentence of the Introduction section, stating that "Diabetes is a disease characterised by high blood glucose, …" is not precise. Diabetes only describes polyuria. Regarding the role of high glucose, a quote from a textbook by K. Frayn, R. Evans: Human metabolism - a regulatory perspective, 4th ed., 2019: “The changes in glucose metabolism are usually regarded as the "hallmark" of diabetes mellitus, and treatment is always monitored by the level of glucose in the blood. However, it has been said that if it were as easy to measure fatty acids in the blood as it is to measure glucose, we would think of diabetes mellitus mainly as a disorder of fat metabolism.”

      We acknowledge that Diabetes alone refers to polyuria, and we now state Diabetes Mellitus to be more precise about the disease we refer to. We stated “Diabetes is a disease characterized by high blood glucose, ...” as this is in line with internationally accepted diagnostic and classification criteria, such as position statements from the American Diabetes Association [“Diagnosis and Classification of Diabetes Mellitus”, American Diabetes Association, Diabetes Care 36 (2013)]. We certainly acknowledge that the glucose-centric approach to characterizing and diagnosing Diabetes Mellitus is largely born of the ease with which glucose can be measured. Thus, if blood lipids could be easily measured, we might be characterizing diabetes as a disease of hyperlipidemia (depending on how lipidemia links with complications of diabetes).

    1. Author Response:

      Reviewer #1 (Public Review):

      In this report, the authors describe chromatin accessibility and RNA-seq data in B cells from three mouse models of neurodevelopmental disorders (Kabuki syndromes 1 and 2 and Rubinstein-Taybi syndrome type 1) caused by mutations in related epigenetic regulatory genes. They used ATAC-seq to profile chromatin accessibility and a novel bioinformatics approach to overlay the peaks across different mouse models.

      The novelty of our approach is not the way we overlay the peaks. It is the way in which, after we have performed differential analyses following standard practices, we detect differential features (genes/loci) shared across the disorders using conditional p-value distributions.

    1. Author Response:

      Reviewer #1 (Public Review):

      Sokoloski et al. propose a new statistical model class for descriptive modeling of stimulus encoding in the spiking activity of neural populations. The main goals are to provide a model family that (G1) captures key activity statistics, such as spike count (noise) correlations, and their stimulus dependence, in potentially large neural populations, (G2) is relatively easy to fit, and (G3) when used as a forward encoder model for Bayesian decoders leads to efficient and accurate decoding. There are also three additional goals or claims: (C1) that this descriptive model family can serve to quantitatively test computational theories of probabilistic population coding against data, (C2) that the model can offer interpretable representations of information-limiting noise correlations, (C3) that the model can be extended to the case of temporal coding with dynamic stimuli and history dependence.

      The starting point of their model is a finite mixture of independent Poisson distributions, which is then generalized and extended in two ways. Due to the "mixture", the model can account for correlations between neurons (see G1). As with any mixture model, the model can be viewed in the language of latent variables, which (in this case) are discrete categorical variables corresponding to different mixture components. The two extensions of the model are based on realizing that the joint distribution (of the observed spike counts and the latent variables) is in the exponential family (EF), which opens the door to powerful classical results to be applied (e.g. towards G2-G3), and allows for the two extensions by: (E1) generalizing Poisson distributions in mixture components to Conway-Maxwell-Poisson distributions, and (E2) introducing stimulus dependence by allowing the natural parameters of the EF to depend on stimulus conditions. They call the resulting model a Conditional Poisson Mixture or CPM (although the "Poisson" in CPM really means Conway-Maxwell-Poisson). E1 is key for capturing under-dispersion, i.e. Fano Factors below 1. For the case of a discrete set of stimulus conditions, they propose minimal and maximal versions of E2, depending on which natural parameters are stimulus dependent. In the case of a continuum of stimuli (they only consider a 1D continuum of stimulus orientations, e.g. in V1 encoding) they also consider a model-based parametric version of the minimal E2 which gives rise to Von Mises orientation tuning curves.
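
      A small worked example may help show why the "mixture" produces correlations, and why E1 is needed for Fano Factors below 1. The sketch below is not the authors' code; the function name and the two-component, two-neuron rates are made up for illustration. It uses the law of total covariance: within a component the counts are independent Poisson, so all covariance between neurons comes from the rates changing together across components, and each variance is the mean plus the variance of the rate.

```python
def poisson_mixture_moments(weights, rates):
    """Mean vector and covariance matrix of a finite mixture of independent Poissons.

    weights[k] is the probability of component k; rates[k][i] is the Poisson
    rate of neuron i in component k.
    """
    n = len(rates[0])
    mean = [sum(w * r[i] for w, r in zip(weights, rates)) for i in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Covariance of the rates across mixture components.
            second = sum(w * r[i] * r[j] for w, r in zip(weights, rates))
            cov[i][j] = second - mean[i] * mean[j]
        # Within a component each count is Poisson, so its variance equals its rate.
        cov[i][i] += mean[i]
    return mean, cov

# Hypothetical example: two neurons that are both quiet or both active.
weights = [0.5, 0.5]
rates = [[2.0, 2.0], [8.0, 8.0]]
mean, cov = poisson_mixture_moments(weights, rates)
fano = [cov[i][i] / mean[i] for i in range(len(mean))]
correlation = cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5
```

      With these rates the two neurons are positively correlated and each has a Fano Factor of 2.8. Because the variance of the rate is never negative, a mixture of plain Poissons always has a Fano Factor of at least 1, which is why under-dispersion requires a different component distribution.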

      Strengths:

      -Proposing a new descriptive encoding model of spike responses that can account for sub-poissonian and correlated noise structure, and yet can be tractably fit and accurately decoded.

      -Their experiments with simulated and real (macaque V1) data presented in Figs. 2-5 and Tables 1-2 provide good evidence that the model supports G1-3.

      -Working out a concrete Expectation Maximization algorithm that allows efficient fits of the model to data.

      -Exploiting the EF framework to provide a closed form expression for the model's Fisher Information for the minimal model class, a measure that plays a key role in theoretical studies of probabilistic population coding.

      As such, the paper makes a valuable contribution to the arsenal of descriptive models used to describe stimulus encoding in neural populations, including the structure and stimulus dependence of their higher-order statistics.

      Thank you very much for your thorough, exact, and positive evaluation of our manuscript!

      Weaknesses:

      1) I found the title and abstract too vague, and not informative enough as to the concrete contributions of this paper. These parts should more concretely and clearly describe the proposed/developed model family and the particular contributions listed above.

      We found your summary of the paper and model to be highly accurate, and we rewrote the abstract to summarize the key strengths as you’ve listed them. We found it difficult to develop a more exact title which wasn’t overlong, so we left it as is.

      2) I was not convinced about claims C1 and C2 (which also contribute to the vagueness of abstract), but I think even without establishing these claims the more solid contributions of the paper are valuable. And while I can see how the model can be extended towards C3, there are no results pertaining to this in the current paper, nor even a concrete discussion of how the model may be extended in this direction.

      2.1) Regarding C1, the claim is supposed to follow from the fact that the model's joint distribution is in the exponential family (EF), and that they have reasonably shown G1-G3 (in particular, that it captures noise correlations and its Bayesian inversion provides an accurate decoder). While I agree with the latter part, what puzzles me is that in the probabilistic population coding (PPC) theoretical models that they claim can be quantitatively tested using their descriptive model, the encoder itself is, as far as I remember/understand, in EF. By contrast here the encoder is a mixture of EF's and as such is not itself in EF. Perhaps this distinction is not key to the claim - but if so, this has to be clearly explained, and more generally the exact connection between the descriptive encoder model here and the models used in the PPC literature should be better elaborated.

      This claim was indeed poorly explained in our manuscript, and not self-evident. There is a deeper connection between our conditional models and PPCs, which we now make explicit in a new section of the manuscript (Constrained conditional mixtures support linear probabilistic population coding, line 364), which includes an equation (Equation 4) that shows their exact relationship.

      2.2) Regarding C2, I do not see how their results in Fig 5 (and corresponding section) provide any evidence for this claim. As a theoretical neuroscientist, I take "interpretable" to mean with a mechanistic or computational (theoretical) interpretation. But, if anything, I think the example studied in Fig 5 provides a great example of the general point: that even when successful descriptive models accurately capture the statistics of data, they may nevertheless not reveal (or even hide or mis-identify) the mechanisms underlying the data. In this example's ground-truth model, the stimulus (orientation) is first corrupted by input noise and then an independent population of neurons with homogeneous tuning curves (and orientation-independent average population rate) responds to this corrupted version of the stimulus. That is a very simple AND mechanistic interpretation (which of course is not manifest to someone only observing the raw stimulus and spiking data). The fit CPM, on the other hand, does not reveal the continuous input noise mechanism (and homogeneous population response) directly, but instead captures the resulting noise correlation structure by inferring a large (~20) number of mixture components, in each of which the population response prefers a certain orientation. For a given stimulus orientation, the fluctuations between (3-4 relevant) mixture components then approximate the effect of input noise. This captures the generated data well, but misses the true mechanism and its simpler interpretation. Let me be clear that I don't take this as a fault of their descriptive model. This is a general phenomenon, despite which their descriptive model, like any expressive and tractable descriptive model, still can be a powerful tool for neural data analysis. I'm just not convinced about the claim.

      This is a very fair point, and we’ve reformulated a few passages to emphasize that the model is primarily descriptive, at least in our applications in the paper (see the new section title at line 393 and the first corresponding paragraph).

      2.3) Regarding C3, I think the authors can at least add a discussion of how the model can be extended in this direction (and as I'm sure they are aware, this can be done by generalizing the Von Mises version of the model, whereby the model I believe can be more generally thought of as a finite mixture of GLMs).

      In Appendix 4 we detail the relationship between CPMs and GLMs. We also note here that, at least as far as we understand, CPMs are formally distinct from finite mixtures of GLMs — the easiest way to see this distinction is to note that the index probabilities of a CPM depend on the stimulus, whereas the equivalent index probabilities in a finite mixture of GLMs would not. We have also explained this in Appendix 4.

      Reviewer #2 (Public Review):

      Sokoloski, Aschner, and Coen-Cagli present a modeling approach for the joint activity of groups of neurons using a family of exponential models. The Conway-Maxwell (CoM) Poisson models extend the "standard" Poisson models, by incorporating dependencies between neurons.

      They show the CoM models and their ability to capture mixtures of Poisson distributions. Applied to V1 data from awake and anesthetized monkeys, they show it captures the Fano Factor values better than simple Poisson models, and compare spike count variability and co-variability. Log-likelihood ratios in Table 1 show on-par or better performance of different variants of the CoM models, and the optimal number of parameters to use for maximizing the likelihood [balancing accuracy and overfitting] and are useful for decoding. Finally, they show how the latent variables of the model can help interpret the structure of population codes using simple simulated Poisson models over 200 neurons.

      In summary, this new family of models offers a more accurate approach to the modeling and study of large populations, and so reflects the limited value of simple Poisson based models. Under some conditions it gives higher likelihood than Poisson models and uses fewer parameters than an ANN model.

      However, the approach, presentation, and conclusions fall short on several issues that prevent a clear evaluation of the accuracy or benefits of this family of models. Key among them is the missing comparison to other statistical models.

      1) Critically, the model is not evaluated against other commonly used models of the joint spiking patterns of large populations of neurons. For example: GLMs (e.g. Pillow et al Nature 2008), latent Gaussian models (e.g. Macke et al Neural Comp 2009), Restricted Boltzmann Machines (e.g. Gardella et al PNAS 2018), Ising models for large groups of neurons (e.g. Tkacik et al PNAS 2015, Meshulam et al Neuron 2017), and extensions to higher order terms (Tkacik et al J Stat Mech 2013), coarse grained versions (Meshulam et al Phys Rev Lett 2019), or Random Projections models (Maoz et al biorxiv 2018).

      Most of these models have been used to model comparable or even larger populations than the ones studied here, often with very high accuracy, measured by different statistics of the populations and detailed spiking patterns (see more below). Much of the benefit or usefulness of the new family of models hinges on its performance compared to these other models.

      We agree very much with this point, and have done our best to address it by thoroughly comparing our model with a factor analysis encoding model in Appendices 1 and 2, and summarizing these results at appropriate points in the manuscript (lines 196–199 and 325–328). In particular, we visualized and compared the performance of factor analysis with our mixture models, and found that (i) factor analysis is better at capturing the first and second order statistics of the data, but (ii) when evaluated on held-out data, the performance gap more-or-less vanishes. Moreover, we found that an encoding model based on FA performs poorly as a Bayesian decoder, and we provided preliminary evidence that this is because our mixture models can capture higher-order statistics that FA cannot. We believe that these results have been very valuable to conveying the strengths and weaknesses of the mixture model approach.

      We have also extended the introduction to explain the differences between other model families suggested by the reviewer and our approach, to explain how the different assumptions about the form of data make it difficult to compare them quantitatively (see lines 42–63). To wit, GLMs and latent Gaussian models are both models that critically depend on modelling spike trains, and not spike counts. On the other hand, Restricted Boltzmann machines, Ising models, and random projection models all assume binary, rather than counting spiking data. As such, any comparison would depend on coming up with methods for either (i) reshaping our datasets and comparing spike-train/binary spike-count likelihoods to trial-to-trial likelihoods, or (ii) extending our conditional mixture approach to temporal/binary data, both of which are beyond the scope of our paper. We instead used factor analysis because it has been applied widely to modelling trial-to-trial spike counts, and thus avoids further transformations that might reduce the validity of our comparisons.

      2) As some of these models are exponential models, their relations to the family of the models suggested by the authors is relevant also in terms of the learned latent variables. Moreover, the number of parameters that are needed for these different models should be compared to the CoM and its variants.

      In our comparisons with factor analysis we also compared number of latent states/dimensions required to achieve maximum performance. Overall FA was consistently the most efficient, at least when evaluated on the ability to capture second-order statistics, although our mixture models also performed quite well with modest numbers of parameters.

      3) The analysis focuses on simple statistics of neural activity, like Fano Factors (Fig. 2) and visual comparisons rather than clear quantitative ones. More direct assessments of performance in terms of other spiking statistics for single neurons and small groups (e.g., correlations of different orders) and direct comparison to individual spiking patterns (which would be practical for groups of up to 20 neurons) would be valuable.

      In Appendix 2 we evaluated the ability of our mixtures to capture the empirical skewness and kurtosis of recorded neurons, and found that the CoM-based mixture performs quite well (r^2 for the CoM-based mixture was between 0.6 and 0.9). Because FA cannot capture these higher-order moments, we speculate that modelling them is critical for maximizing decoding performance. This adds another perspective on the strengths of our approach, and we appreciate the suggestion.

      Reviewer #3 (Public Review):

      The authors use multivariate mixtures of Poisson or Conway-Maxwell-Poisson distributions to model neural population activity. They derive an EM algorithm, a formula for Fisher information, and a Bayesian decoder for such models, and show it is competitive with other methods such as ANNs. The paper is clear and didactically written, and I learned a lot from reading it. Other than a few typos the math and analyses appear to be correct.

      Thank you for the positive evaluation!

      Nevertheless there are some ways the study could be further improved.

      Most important, code for performing these analyses needs to be publicly released. The EM algorithm is complicated, involving a gradient optimization on each iteration - it is very unlikely people will rewrite this themselves, so unless the authors release well-packaged and well-documented code, their impact will be limited.

      We very much agree, and we have done this. We provide a link to our gitlab page, where all relevant code can be downloaded, and installation instructions are provided (we indicate this in the manuscript at lines 799–803).

      Second, it would be nice to extend the model to continuous latent factors. It seems likely that one or two latent factors could do the work of many mixture components, as well as increasing the interpretability of the models.

      We certainly agree that in some cases continuous latent variables could be much more parsimonious. However, to the best of our knowledge most of the expressions that we rely on would no longer be closed-form, and so the machinery of the model would require suitable approximations. Nevertheless, it’s an interesting possibility that we now address in the Discussion (lines 482–491).

      Third, it would be interesting to see the models applied to more diverse types of population data (for example hippocampal place field recordings).

      We certainly agree with the importance of applying our model to other datasets, and indeed the purpose of our manuscript is to offer a method that can be applied broadly, and our goal in making the code available publicly is to facilitate that. However, we have decided to maintain the focus of this manuscript on the method itself, and limit the application to one kind of data (V1), for which we also now provide more extensive analysis and quantification of the response statistics (Figure 2 C-D, Figure 3 G-H, Appendix 2), a study of the sample sizes required to fit the model (Appendix 3), and model-comparison (Appendix 1–2). Overall we feel that the paper is already quite long and dense even when limited to a single kind of data. We believe applications to multiple kinds of data would perhaps be better suited for a different study, focusing on the comparisons between them. In that regard, we are certainly open to future collaborations on large-scale recordings from various stimulus-driven brain areas.

      Fourth, how does a user choose how many mixture components to add?

      To clarify this, we’ve added a section in the methods (Strategies for choosing the CM form and latent structure) that covers, in particular, how to choose the number of mixture components.

    1. Author Response:

      Reviewer #1 (Public Review):

      This work from Park et al represents a large, ambitious study utilizing a variety of mouse models (several novel) to establish mechanisms underlying cardiac pathologies observed upon loss of imprinting at the H19/IGF2 locus. The studies indicate 1) that mice recapitulate key cardiovascular features observed in humans with BWS, 2) that developmental cardiomegaly and progressive cardiomyopathy are distinct, non-correlated phenotypes driven by disparate mechanisms (upregulation of IGF2 and reduction of H19, respectively), and 3) that H19-associated pathologies are driven by reduced interaction with let7 microRNAs. There is considerable novelty and potential impact of the work, as it presents substantial mechanistic insight into the consequences of LOI and the development of BWS. The authors use a variety of appropriate approaches, including mouse echocardiography, tissue IHC, pressure myography, and in vitro studies of cell size regulation to support these primary conclusions. In its current form, however, some primary conclusions are insufficiently supported, and additional quantification and controls would be needed.

      Major

      1) The conclusion of transient neonatal cardiomegaly that resolves by 2 months is insufficiently supported. Increased cell surface area is useful, but can be driven by cell spreading, not necessarily hypertrophy, and no data is shown at the 2-month time point to suggest reversion of cardiomegaly.

      Data for heart weight/tibia length at 2 months is described on lines 116-117.

      2) It seems an important validation is needed for the Let7 binding site deletion model - I do not see any data confirming that the gene editing was indeed successful nor that Let7 binding to H19 was effectively disrupted.

      As described above, we performed new experiments to validate the Let-7 binding site deletion model. See Lines 329-33 and Figure 6-figure supplement 1. Briefly, we purify H19 lncRNA from wild type and from H19let7/H19let7 extracts using biotinylated oligonucleotides. We show that this method is equally efficient in purifying wild type H19 and H19let7 lncRNAs. However, let-7 miRNAs copurify only with wild type H19.

      Further, it is unclear at what age the Let7 binding site deletion mice were assayed for cardiomegaly/hypertrophy.

      Mice were analyzed at one year of age. This information is now provided in the main text as well as in Figure 6 legend.

      The HW/TL values (WT = 7.5) are different from most others reported throughout the manuscript.

      The reviewer is correct that there are some differences in heart weight/tibia length values across the various mouse models.

      Based on the differences between genetic backgrounds, it would not be appropriate to compare between different data sets. Our study was designed so that each model is compared always to wild type littermates. Essentially, our paper describes 4 independent studies: WT vs LOI, LOI+BAC vs LOI, WT vs H19DEx1/+, and WT vs H19Dlet7/H19Dlet7. In each case we see a 15-30% increase in heart size associated with loss of H19 function. We consider it a strength of our study that we consistently account for potential strain effects and that we demonstrate H19 function in 4 completely independent comparisons.

      However, we also understand the reviewer’s confusion and modified the main text to make clear that each model is being compared to its wild type littermates only.

      Finally, HW/TL ratios for WT vs LOI and WT vs H19DEx1/H19+ models were measured on independent sets of mice by two different labs. We consistently measured a 20-30% increase in mice lacking H19 lncRNA. Also, echocardiography results showed a 25% increase in average heart mass.

      3) Many observations or conclusions are not sufficiently supported by quantification. For example, there is a lack of quantification of any western blots.

      As described above, all data are now quantitated. Especially, see Figure 2-figure supplement 1, Figure 3-figure supplement 1, Figure 4-figure supplement 1, and Figure 6-figure supplement 2 for quantification of Western blots.

    1. Author Response

      Reviewer #4 (Public Review):

      Francisco et al. investigate the role of CTP and hydrolysis in the binding of ParB to parS sequence and non-specific DNA at the single-molecule level. Using optical tweezers, they show the specific binding of ParB to parS sites, and demonstrate that this process is enhanced by the presence of CTP or CTPS. They find that lower density ParB proteins are also detected in distal non-specific DNA in the presence of parS, and that ParB spreading is restricted by protein roadblocks. Furthermore, using magnetic tweezers, they show that parS-containing DNA molecules are condensed by ParB at nanomolar protein concentration, which requires CTP binding but not hydrolysis. These findings show the significance of CTP-dependent ParB spreading and impact the understanding of the mechanism of DNA bridging and condensation by ParB networks.

      Based on these results, the authors propose a model for ParB-mediated DNA condensation, which requires one-dimensional ParB sliding along DNA from parS sites. Overall, the experiments were carefully done and thoroughly controlled. The manuscript provides critical insights that can be strengthened by addressing the following minor concerns:

      1) Did the authors observe the diffusion of isolated ParB foci along DNA? This will provide strong evidence for the proposed diffusion/sliding model.

      2) Based on the sliding clamp model, ParB spreading and diffusion result in DNA condensation by forming large DNA loops. Is it possible to show the dynamic spreading of ParB while keeping the same number of ParB molecules on DNA? For example, can the authors incubate ParB-containing DNA in channel 4 (ParB channel) for a certain time to load ParB on parS sites, and then move it to the buffer channel without free ParB but with CTP or CTPS, where the images are acquired at long time intervals to minimize photobleaching? The fluorescence intensity of ParB during the spreading process can then be analyzed. If the intensity remains constant through spreading in the presence of CTPS but significantly decreases in the presence of CTP, these data will strongly support the proposed spreading and CTP hydrolysis-dependent dissociation mechanism.

      We thank the reviewer for these suggestions to prove spreading. However, we decided to follow an alternative strategy based on the direct imaging of QD-labelled ParB. As described above, this strategy worked well and we have directly visualized ParB diffusion from parS sites.

      3) In Figure 2, the authors show the spreading of ParB can be blocked by EcoRI. Can the authors show that EcoRI is bound at the specific positions? The spreading blockage by protein roadblocks shown in the optical tweezers experiments hints that the roadblocks may affect DNA condensation. Can the authors apply magnetic tweezers to show the effect of protein roadblocks on DNA condensation in vitro?

      It is well established that EcoRI has extremely high affinity and specificity for its site (Terry et al., 1983) and so, since we do not have a labelled EcoRI mutant, our experiments assume the sites are occupied. This is one reason we have used multiple sites in our experiments. Nevertheless, we have tested the effect of protein roadblocks on condensation in MT experiments. We found partial condensation, consistent with the blocking of ParB spreading from parS (Fig. R5).

      Figure R5. ParB diffusion is required for DNA condensation by ParB. (A) Schematic representation of DNA substrate employed in these MT experiments. It contains a set of 5x EcoRI sites located at 3835 bp from the DIG labelled end, and 7x parS. The positions of the EcoRI and parS sites in the DNA cartoon are represented to scale. (B) Condensation assay using the EcoRI 7x parS DNA substrate under different experimental conditions. ParB partially condenses the DNA molecule when EcoRIE111G is present. (C) Quantification of the extension in base pairs of the non-condensed region under different experimental conditions. In the presence of EcoRIE111G, the length of the non-condensed region agrees well with the length of the region flanked by the DIG end and the EcoRI sites.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] I do have a couple of concerns.

      Major issues:

      The BOLD hemodynamic response function is slower than the pupil impulse response function. It seems that the authors did not correct for the "lag" between the two (as in Yellin et al., 2015, for example). How much does this matter for the results?

      We thank the reviewer for highlighting the insufficient treatment of the potential “lag” between the two signals. In the initial submission we only compared linear regression prediction scores obtained after introducing shifts in the <-5; 5> s range between the signals and verified that prediction scores were the highest at 0 s lag (Figure 3–figure supplement 1B).

      In line with the reviewer’s suggestion, we followed the approach of Yellin et al. (2015) and convolved the pupil diameter size signals with a hemodynamic response function (HRF) and repeated both the prediction and clustering analyses. As the pupil response times differ across conditions showing e.g. a 1 s response to luminance changes in Yellin et al. (2015) and a ~3 s task-evoked response in de Gee et al. (2017), we (similarly as in our previous publication - Pais et al. (2020)) convolved pupil signals with a range of HRFs with different peak times (HRFs shown in Figure 2–figure supplement 3A).
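
      The convolution step described above can be sketched as follows. This is a minimal illustration, not the analysis code: the gamma-shaped kernel, its `shape` and `duration_s` parameters, the 0.5 s sampling interval, and the function names are all assumptions chosen so that the kernel maximum sits exactly at the requested peak time. A peak time of 0 s is treated as a unit impulse, which leaves the signal unchanged.

```python
import math

def gamma_kernel(peak_s, dt, duration_s=20.0, shape=6.0):
    """Unit-sum gamma-shaped kernel whose maximum lies at peak_s seconds.

    A peak time of 0 returns a unit impulse, i.e. no added lag.
    """
    if peak_s <= 0:
        return [1.0]
    n = int(round(duration_s / dt)) + 1
    kernel = [
        (i * dt / peak_s) ** shape * math.exp(shape * (1.0 - i * dt / peak_s))
        for i in range(n)
    ]
    total = sum(kernel)
    return [k / total for k in kernel]

def causal_convolve(signal, kernel):
    """Causal convolution, truncated to the length of the input signal."""
    out = [0.0] * len(signal)
    for t in range(len(signal)):
        for lag, k in enumerate(kernel):
            if lag > t:
                break
            out[t] += k * signal[t - lag]
    return out

# Hypothetical test input: a single impulse at sample 10, sampled every 0.5 s.
dt = 0.5
impulse = [0.0] * 60
impulse[10] = 1.0
responses = {
    peak: causal_convolve(impulse, gamma_kernel(peak, dt)) for peak in (0.0, 1.0, 3.0)
}
# Delay (in seconds) between the impulse and the maximum of each response.
peak_lags_s = {p: (r.index(max(r)) - 10) * dt for p, r in responses.items()}
```

      Running the same prediction or clustering analysis on each convolved version of the pupil signal then shows how sensitive the results are to the assumed lag.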

      We show that when using the convolved signals the results are reproducible based on the overlap of trials’ cluster membership (Figure 2–figure supplement 3B) as well as high similarities of the cluster-based spatial correlation maps (Figure 2–figure supplement 3C).

      In parallel to the signal shift-based prediction results included in the initial submission, we predicted the convolved pupil signals based on PCA fMRI time courses and verified that the highest prediction scores were obtained when predicting signals convolved with a kernel with a peak time at 0 s (and Figure 2–figure supplement 3C). We hypothesize that the slight increase in prediction scores (compared to raw data prediction) is a result of the convolution-based temporal smoothing.

      Baseline pupil size was different between the identified clusters. How was pupil size normalized across rats and scanning runs, so that we can meaningfully interpret such a difference?

      We thank the reviewer for pointing out the missing information. As part of the pupil diameter extraction pipeline, the diameter was normalized based on the eye size. In each video, the eye size was calculated based on manual landmark identification. Both the eye size and diameter were measured in the number of pixels. The eye size was then used to normalize the pupil size, such that pupil diameter values were limited to the <0, 1> range with 1 being the eye size. We have now included this information in the Methods - Pupillometry acquisition & pupil diameter extraction section (Page 12). The PSDs were based on these signals, meaning that cluster baseline differences reflect differences in the mean pupil diameter. As already mentioned in Methods, the signals were variance normalized for the prediction procedure.
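
      The two normalization steps can be illustrated with a short sketch. The function names and pixel values are hypothetical, and reading "variance normalized" as a z-score (mean removed, unit variance) is an assumption rather than something stated in the Methods.

```python
import statistics

def normalize_pupil(diameter_px, eye_size_px):
    """Express pupil diameter as a fraction of eye size (both in pixels)."""
    return [d / eye_size_px for d in diameter_px]

def variance_normalize(signal):
    """Z-score a signal: zero mean and unit variance (assumed interpretation)."""
    mean = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    return [(x - mean) / sd for x in signal]

diameter_px = [30.0, 45.0, 60.0]  # made-up per-frame pupil diameters
eye_size_px = 120.0               # made-up eye size from manual landmarks
relative = normalize_pupil(diameter_px, eye_size_px)
zscored = variance_normalize(relative)
```

      The relative values keep the baseline differences between trials, whereas the z-scored values discard them, which is why cluster baselines are compared on the former.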

      A substantial part of the literature focuses on the relationship between task-evoked pupil and neuromodulatory responses. I understand that this paper describes results from a resting state experiment, but even in these conditions one typically observes rapid dilations. Right now, it seems that the analysis is somewhat blind to these. See for example Fig. 2C in which frequencies are plotted only until 0.05Hz. Can we see this on log-log axes, to inspect the higher frequencies? Note that there is some work that indicates that the slower pupil fluctuations more reliably track ACh signaling, and faster fluctuations more reliably track NE signaling (Reimer & McGinley et al., 2016).

      We thank the reviewer for pointing out the potentially meaningful analysis direction. In our work, we do not observe different correlation features when comparing e.g. the 1-10Hz rapid pupil dynamics versus the slow oscillation, which could be potentially caused by the anesthetic effect (Discussion, Page 10). Also, it should be noted that the acquired pupillometry data were limited by the quality of recorded videos and by the pupil size extraction method. The low SNR of videos recorded simultaneously with fMRI measurements using an MR-compatible camera made the faster pupil size changes hard to track. Additionally, the fast pupil size changes observed in the extracted signals are to a large degree caused by the fact that DeepLabCut, the employed toolbox, independently marks the landmarks in each video frame. Due to the lack of temporal dependence in landmark identification combined with the low SNR, “by default” landmark locations slightly differed in neighboring frames. This difference was amplified with increasing pupil size. Consequently, the magnitude of pupil size changes faster than 1 Hz was highly correlated with pupil size (i.e. baseline fluctuation). The effect can be seen in raw pupil data plots in Figure 2–figure supplement 2. Consequently, when we extracted e.g. the fluctuation of the 1-10 Hz pupil size power changes and used it to generate the PCA linear regression maps, the maps closely resembled the baseline-based maps (see the all trial-based maps in Response Figure 2B; spatial correlation r=0.83). Simultaneously, the prediction scores of the band-based signals were lower than those based on raw downsampled signals.

      The authors write "Cluster 2 had the strongest positive weights in […], but also in brainstem arousal-regulating locus coeruleus, laterodorsal tegmental and parabrachial nuclei." However, the voxel size is very large with respect to the size subcortical nuclei. Because of this, here and in other places, I think the authors should use locus coeruleus region or area, to indicate that their voxel captures more tissue than just LC proper. A discussion paragraph on the spatial specificity of their effects would also help.

      Now we explicitly write about the “area/region containing the locus coeruleus” at every mention of LC being highlighted in the maps.

      The approach is very data driven and the Results section mostly descriptive. I'm personally not at all unsympathetic to this approach, but I do think the authors could aid the reader better by briefly interpreting their results already in the Results section. Related, the authors end each paragraph with "These results verified […]" or "These results highlight […]"; however they don't explicitly inform us how.

      We modified the mentioned sentences to be more descriptive and self-explanatory.

      Rainbow and jet colormaps are confusing because they are not perceptually uniform (https://colorcet.holoviz.org/). Please consider using something like "coolwarm"?

      We changed all jet colormaps to coolwarm.

      Minor issues:

      "Trial" is not well defined. I take this is a 15 minute run?

      We thank the reviewer for pointing out the missing information. Now, at the first mention of “trial” in the Introduction, we specify the 15-minute trial duration.

      How many trials in each cluster (Fig. 2)?

      The clusters had the following trial counts: n1=8; n2=30; n3=24; n4=12. We now included this information in the Results section.

      It would be nice to see a more zoomed in version of Fig. 5 so that we can actually see the subcortical regions in more detail.

      We now provide linear regression PCA maps for each cluster in a bigger size and with overlaid atlas region borders as individual figure supplements (same format as the Figure 4 map created using all trials; Figure 5–figure supplements 1-4).

      Reviewer #2 (Public Review):

      [...] The mechanisms behind the time-varying fMRI-pupil coupling exhibited under anesthesia could also be further clarified. Specifically:

      The clusters appear to involve interpretable brain regions. However, a more formal analysis of reproducibility of these clusters, and statistical testing against an appropriate null model, are not present. Such tests would be useful for establishing the validity of the derived clusters, ensuring that the conclusions are strongly supported. Similarly, the differentiation between power spectral density of each cluster is not yet supported by statistical testing.

      We have now addressed the essential issue of cluster reproducibility in a series of analysis steps. In the initial submission, we selected the n=4 cluster result based on silhouette scores computed after single initializations of UMAP dimensionality reduction and GMM clustering. We have now repeated the random initializations 100 times. The selection of n=4 clusters based on silhouette scores has been reproduced (Figure 2B). Instead of selecting cluster memberships from a single initialization, we identified the most common cluster membership for each trial across repetitions. This resulted in changing cluster memberships for 2 out of 74 trials compared to the initial submission. The ratio of label matching and correlation map similarity across the 100 repetitions are shown in Figure 2–figure supplement 1AB.
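
      The consensus step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the response does not say how cluster labels were matched across repetitions, so the brute-force permutation alignment against the first run, and the names `align_labels` and `consensus_labels`, are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, n_clusters):
    """Relabel `labels` with the permutation that best matches `reference`.

    Cluster indices are arbitrary across random initializations, so each run
    is mapped onto the reference run before votes are counted (assumed step).
    """
    best_perm, best_hits = None, -1
    for perm in permutations(range(n_clusters)):
        hits = sum(perm[lab] == ref for lab, ref in zip(labels, reference))
        if hits > best_hits:
            best_perm, best_hits = perm, hits
    return [best_perm[lab] for lab in labels]


def consensus_labels(runs, n_clusters):
    """Return the most common (aligned) cluster label for each trial."""
    reference = runs[0]
    aligned = [align_labels(reference, run, n_clusters) for run in runs]
    # zip(*aligned) iterates over trials; each column holds one vote per run.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


# Three hypothetical runs over five trials; run 2 uses swapped label names.
runs = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 2],
    [0, 0, 1, 2, 2],
]
consensus = consensus_labels(runs, n_clusters=3)
```

      With four clusters the exhaustive search visits only 24 permutations per run, so it stays cheap for 100 repetitions; a larger number of clusters would call for an assignment algorithm instead.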

      Next, following the reviewer’s recommendations, we performed a split-half analysis and compared our results against a null model similarly to Allen et al. (2014). We divided the trials into two random halves 100 times and repeated the clustering analysis. We showed that to a large degree the cluster memberships are preserved when using trial halves (Figure 2–figure supplement 4). Next, using spatial surrogate maps with spatial autocorrelations, and value distributions matching those of real correlation maps (Figure 2–figure supplement 5A; created using the Brainsmash toolbox – Burt (2020)), we verified that the spatial location of correlation values and not the mean values or spatial autocorrelation properties was driving the clustering (Figure 2–figure supplement 5BC).

      We also assessed cluster reproducibility using pupil signals convolved with HRF kernels with different peak times (Figure 2–figure supplement 3A) to account for the possible lag between the pupil and fMRI signals. In Figure 2–figure supplement 3BC we show that the clusters were reproducible when using the convolved signals.
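
      As a rough illustration of the convolution step, the sketch below builds kernels that peak at a chosen time and applies them causally to a signal. The response does not specify the kernel form, so the gamma shape, the `shape` parameter, the sampling step and the function names are all assumptions made for this example.

```python
import math


def gamma_kernel(peak_s, dt, duration_s, shape=3.0):
    """Gamma-shaped kernel sampled every `dt` seconds, peaking at `peak_s`.

    For h(t) = t**(shape-1) * exp(-t/scale) the maximum lies at
    t = (shape - 1) * scale, so the scale is chosen from the desired peak.
    """
    scale = peak_s / (shape - 1.0)
    n = int(round(duration_s / dt)) + 1
    kernel = [((i * dt) ** (shape - 1.0)) * math.exp(-(i * dt) / scale)
              for i in range(n)]
    total = sum(kernel)
    return [k / total for k in kernel]  # normalize to unit sum


def causal_convolve(signal, kernel):
    """Convolve so that each output sample depends only on past input."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j in range(min(t + 1, len(kernel))):
            acc += kernel[j] * signal[t - j]
        out.append(acc)
    return out


# Hypothetical example: a kernel peaking 2 s after the input, sampled at 10 Hz.
kernel = gamma_kernel(peak_s=2.0, dt=0.1, duration_s=10.0)
impulse_response = causal_convolve([1.0, 0.0, 0.0], kernel)
```

      Repeating the clustering with kernels built for several `peak_s` values would mirror the lag-robustness check described in the response.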

      With regard to the decoding models, it appears there could be interdependence between the training and testing data (the PCA step seems to include all scans, and it was not clear if the training/testing sets contained data drawn from the same animal).

      We thank the reviewer for pointing out the missing information, which we now included in the manuscript (Page 4).

      The PCA model was fit only on the 64 training trials. The fit model was then used to project the time courses of the 10 remaining trials onto the existing components. The 64 training trials were randomly chosen and could belong to any rat. We now specify this in the manuscript. Additionally, we repeated the prediction procedure (including the PCA step) on 100 more random train-test data splits. The scatter plot of mean train and validation scores shows that our initial selection is not an extreme value and is representative of the distribution (Figure 3 – figure supplement 1A).
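
      The separation between fitting and projection can be sketched as below. This is only a schematic under stated assumptions: a single principal axis found by power iteration stands in for the full PCA, rows are treated as samples, and `split_trials`, `fit_pca_1d` and `project` are hypothetical helper names rather than the authors' implementation.

```python
import random


def split_trials(n_trials=74, n_train=64, seed=0):
    """Randomly split trial indices into disjoint training and test sets."""
    rng = random.Random(seed)
    idx = list(range(n_trials))
    rng.shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def fit_pca_1d(rows, n_iter=200):
    """Estimate the mean and first principal axis from training rows only."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    axis = [1.0 / d ** 0.5] * d
    for _ in range(n_iter):
        # Power iteration on X^T X without forming the covariance matrix.
        scores = [sum(c[j] * axis[j] for j in range(d)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        axis = [x / norm for x in w]
    return mean, axis


def project(rows, mean, axis):
    """Project rows onto an already-fitted axis; nothing is re-estimated."""
    return [sum((r[j] - mean[j]) * axis[j] for j in range(len(axis)))
            for r in rows]


train_idx, test_idx = split_trials()
# Toy data lying on the line y = 2x; the held-out row is never seen in the fit.
mean, axis = fit_pca_1d([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
held_out_score = project([[4.0, 8.0]], mean, axis)[0]
```

      Calling `split_trials` with 100 different seeds and repeating the fit each time corresponds to the repeated train-test splits mentioned above.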

      While the paper is motivated by discussion that pupil diameter changes are complex and related to rich behaviors (mental effort, decision making, etc.), this paper examines data from anesthetized rats. The mechanisms behind the time-varying changes in fMRI-pupil coupling in the current data, and the potential impact of anesthesia, were not clear and could be elaborated upon.

      We elaborated upon the use and potential impact of anesthetics in a separate paragraph on page 9.

    1. Author Response:

      Reviewer #1:

      The manuscript by Lalanne and Li aims to provide an intuitive and quantitative understanding of the expression of translation factors (TFs) from first principles. The authors first find that the steady-state solutions for translation sub-processes are largely independent at optimality. With a coarse-grained model, the authors derive the optimal expression of translation factors for all important sub-processes. The authors show that intuitive scaling factors can explain the differential expression of translation factors.

      The results are impressive. However, as detailed in the major comments, the choice of some important parameters is not sufficiently justified in the current version. In particular, it is not clear to what extent parameter choice and rescaling was biased toward achieving a good agreement with the experimental data.

      Major comments:

      1) The work assumes that reaction times per TF are constant. That may be true at the highest growth rates, but it might not hold for conditions with lower growth rates. The data of Schmidt et al. (Nat. Biotechnol. 34, 104 (2016)) would allow to compare the predictions to proteome partitioning in E. coli across growth rates. It is ok to restrict the present work to maximal growth rates, but then this caveat should be made explicit. This last point also concerns ignoring the offset in the bacterial growth laws, which is only permissible at fast growth; that also should be stated more prominently in the manuscript; see also the legend of Fig. 1, "Our framework of flux optimization under proteome allocation constraint addresses what ribosome and translation factor abundances maximize growth rate".

      We see two distinct but related points made by the reviewer, which we address in turn.

      First, we thank the reviewer for highlighting the important and interesting point of the growth rate dependence of expression in components of the translational machinery, which encouraged us to investigate this aspect further. Leveraging other existing ribosome profiling datasets (which provide better quantitation than mass spectrometry data, see response to minor point #6 below) across multiple growth conditions and species, we compared the predicted optimal translation factor abundance in these conditions (using the same formula for the optima). The new conditions and species now include E. coli at much slower growth rates, C. crescentus in two different media, and others. We found similar degrees of agreement between predicted and observed levels (shown in Figure 4-Figure supplement 1). One exception is aaRS in C. crescentus, and the discrepancy likely arises from a lack of quantification of tRNA abundance, which is a parameter we use to predict the optimal aaRS levels.

      These additional data also provided another way to examine the model predictions. Specifically, we assessed the predicted square-root scaling of translation factor abundance with growth rate. While the expression stoichiometry remains constant across growth rates (see response to minor point #6 below), the overall abundance decreases following our predicted scaling (Figure 4-Figure supplement 2B). We now describe these new analyses and results in the main text (p. 7, line 216):

      "Analysis of tlF expression across slower growth conditions supports the derived square root dependence (Figure 4-Figure supplement 2)."

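      The square-root dependence quoted above can be made concrete with a short sketch. It assumes only the proportionality stated in the response (abundance scaling with the square root of growth rate); the reference abundance and the doubling times in the example are hypothetical values, and the function names are introduced here for illustration.

```python
import math


def growth_rate_from_doubling_time(doubling_time_min):
    """Exponential growth rate (per minute) for a given doubling time."""
    return math.log(2.0) / doubling_time_min


def scaled_abundance(phi_ref, growth_ref, growth_new):
    """Predicted abundance at `growth_new`, assuming phi ~ sqrt(growth rate)."""
    return phi_ref * math.sqrt(growth_new / growth_ref)


# Hypothetical example: a factor at 2% of the proteome at a 20 min doubling
# time, rescaled to a condition where cells double every 80 min.
fast = growth_rate_from_doubling_time(20.0)
slow = growth_rate_from_doubling_time(80.0)
phi_slow = scaled_abundance(0.02, fast, slow)
```

      Under this scaling a fourfold drop in growth rate halves the predicted abundance, which is the kind of trend that can be compared against the slower-growth datasets.
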
      The second point made by the reviewer pertains to the “offset in bacterial growth law” that corresponds to inactive ribosomes, which make up a substantial fraction of ribosomes at very slow growth rates. We note that the derivation of the optimality condition, equation 5, does not rely on all ribosomes being active. What is necessary is that there is a direct proteomic trade-off between ribosomes and translation factors (see response to minor point 1 below). To rigorously place our work in the context of previous literature, we have replaced mention of ribosome with “active ribosome” (as well as in equation 1 and Figure 1), which we define as those functionally engaged in the translation cycle. We also formally include the proteome fraction of inactive ribosomes in equations 2 and 3 leading to the optimality condition.

      2) The diffusion-limited regime considers only the free and idle reactants. For some translation factors, the free state only accounts for a small fraction of its total concentration. In this case, the diffusion-limited regime only explains a small fraction of the TFs. For example, most of EF-Ts may not be in its free state: in simulations with in vitro kinetics, free EF-Ts accounts for 6%-48% of its total concentration (Supplementary Data 3 in [21]). Can the authors use in vitro parameters (or other ways) to provide a rough estimate of the fraction of free TFs? Including this might allow to make quantitative statements about some of the deviations seen in Fig. 4, as most of the TFs are underestimated.

      We thank the reviewer for the suggestion that deviations between the diffusion-limited prediction and the observed abundance might be quantitatively explained by the finite catalytic activity of the respective factors. However, to do so requires accurate values of kcat, which are often not available. In the Supplement of the initial submission, we provided an example of the in vitro kcat being not compatible with the protein synthesis rates in vivo, which we have now moved to the main text (reproduced below).

      Another experimental approach that can feasibly be used to infer the bound fraction of translation factors in live cell is fluorescence microscopy of tagged proteins. Indeed, by quantifying the diffusive states of a tagged EF-Tu protein, Volkov et al (1) could estimate that <10% of EF-Tu was in its bound state, which is consistent with the agreement between our diffusion-limited prediction and observed abundance for that factor.

      We now discuss these possibilities and the facts about EF-Ts in a paragraph in the Discussion (p. 13, line 471):

      "Our optimization model can also be solved analytically in the non-diffusion-limited regime (Table 2), with the finite catalytic rate leading to an additional contribution of the form ∝ l 𝜆*/kcat. Recent detailed modeling of the EF-Ts cycle (Hu et al., 2020) estimated that a minor fraction (6 to 48%) of its abundance was in the free form in the cell, consistent with the large deviation we observe for this factor from our diffusion only prediction. However, the numerical values for these solutions are in general difficult to obtain because measurements of catalytic rates are sparse and often inconsistent with estimates of kinetics in live cells. As an example, the catalytic rates for aaRSs (Jeske et al., 2019) measured in vitro is ≈3 s-1 (median across different aaRSs), which is well below the minimal value of 15 s-1 required to sustain translation flux at the measured translation elongation rate (Appendix 5), suggesting substantial deviation between in vitro and in vivo kinetics. Although technically demanding, the fraction of free vs. bound factors can in principle be determined through live cell microscopy of tagged factors based on the partitioning the diffusive states of enzymes. Using that approach, (Volkov et al., 2018) estimated that EF-Tu was in its bound state <10% of the time (consistent with the agreement between our diffusion-limited prediction and the observed value for this factor)."

      3) "A factor-independent time τ_ind (e.g., peptidyl transfer), which does not come into play in our optimization framework, was added to account for additional steps making up the full elongation cycle." - what happened to this time? I couldn't find it anywhere else in the paper. What value was chosen, and by what rationale?

      We thank the reviewer for pointing out a lack of clarity in our presentation. The factor-independent time τ_ind in fact did not appear in our optimization procedure at all (by virtue of obeying dτ_ind/d𝜙_TFi = 0 by definition), and was only included for generality to account for steps such as peptidyl transfer (extremely fast (2)). In line with the parsimony of our model, and to avoid any confusion, we have now removed this factor from our model and description altogether.

      4) Fig. 4: The agreement is very impressive, especially given the simplifying assumptions. However, there are some questions relating the choice of parameters.

      a) Were any parameters fitted? Which, how? What about τ_ind, for example (see above)?

      Our approach does not include any fitted parameter. We instead rely on biophysically measured quantities such as diffusion constants, protein sizes, tRNA abundances, cell doubling times (growth rates), and in vivo kinetic estimates. (In the line of Major Comment #3 above, we have removed τ_ind for clarity.) We now include all quantities needed to predict the optimal translation factor abundances (using the formula listed in section “Summary of optimal solutions”, Table 2) in Appendix 5-Tables 1-3, including new Appendix 5-Tables 2-3, reproduced below.

      b) The "predicted" value for ribosomes is calculated from observed data (in a way described on p. S34 that I found incomprehensible, and would likely look very similar regardless of the predicted values for the TFs). According to the section "Equipartition between TF and corresponding ribosomes", the corresponding ribosomes can be quantified in the authors' scheme, too, by the method used for deriving optimal TF concentrations in equation 5. Why didn't the authors directly use the sum of these estimations as the optimal ribosome concentration in Fig. 4? In the current state, it does not seem fair to include the ribosome with the other predictions.

      We agree that the nature of the prediction for ribosomes was different than for other translation factors in our original manuscript in a way that might have lacked clarity. We now exclude ribosomes from Fig. 4 to avoid any possible confusion.

      It is interesting to directly estimate ribosome abundance using the equipartition principle. This estimation is however limited by the fact that the equipartition principle only accounts for ribosomes that are waiting for factor-dependent binding steps. Substantial fractions of ribosomes may be engaged at factor-free steps (e.g., peptidyl transfer catalyzed by the ribosome itself) and factor-dependent catalytic steps after binding. Although the latter could be estimated using the observed tlF concentrations (by considering that the tlF in excess to the binding-limited predictions is sequestered in catalytic steps), the former is not estimated in our model. Furthermore, some other ribosomes may not be fully assembled yet or are inactive (3). Indeed, the predicted factor-dependent ribosome abundance using the equipartition principle with observed tlF abundances constitutes a fraction (40%) of the measured total ribosome abundance.

      c) Predictions are for a specific growth rate (doubling time 21min). Was this growth rate also averaged over the three organisms? What were the individual values? These points would need to be discussed in the main text.

      The reviewer is correct. In the initial submission, we used the average growth rate of E. coli (doubling time 21.5±0.4 min), B. subtilis (doubling time 21±1 min), and V. natriegens (doubling time 19±1 min). A note has been added in the main text (p. 11, line 448):

      "We take the growth rate 𝜆* to be the average of the fast-growing species considered, corresponding to a doubling time of 21±1 min (E. coli: 21.5±1 min, B. subtilis: 21±1 min, V. natriegens: 19±1 min)."

      In addition, we now include predictions for different growth rates and compared them with several bacterial species grown in a wide range of conditions (Figure 4-Figure supplement 1) (see response to Major Comment #1 and to reviewer 2’s third request). These predictions and data are now included in Supplementary Files 1-4.

      5) In the same vein, in a footnote (!) to Table S4: "#For the ternary complex, the total mass of tRNA+EF-Tu was converted to an equivalent amino acid length." - I can see that this is important to get reasonable results, but it constitutes a major deviation from the strategy proclaimed throughout the main text: that the predicted effects result from a competition for fractions of the limited proteome. That rationale has to be changed (and explained in the main text), or the predictions in Fig. 4 should be based on calculations using only the protein part of TCs (i.e., EF-Tu).

      We are sorry for the confusion. The procedure of converting tRNA size to protein size was only used to estimate diffusion coefficients for the ternary complex (described in Appendix 5 Table 2), and not for the competition within the proteome. For factors for which no direct experimental estimates exist for in vivo diffusion coefficient, we used the relationship D_A = (l_TC/l_A)^(1/3) D_TC. The resulting estimated diffusion coefficients were then used to rescale the association rate inferred from in vivo measurements for the ternary complex (see response to point 6 below as well) to obtain association rates for other factors.
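
      The cube-root size scaling of the diffusion coefficient given in this response can be written out as a small helper. Only the relationship itself comes from the response; the function name and the numerical lengths and diffusion coefficient used in the example are hypothetical placeholders, not values from the paper.

```python
def scaled_diffusion(d_tc, length_tc, length_factor):
    """Diffusion coefficient of a factor from the ternary-complex value.

    Implements D_A = (l_TC / l_A) ** (1/3) * D_TC, where lengths are
    amino-acid (or amino-acid-equivalent) lengths.
    """
    return (length_tc / length_factor) ** (1.0 / 3.0) * d_tc


# Hypothetical numbers: a factor eight times shorter than the ternary-complex
# equivalent length is predicted to diffuse twice as fast.
d_factor = scaled_diffusion(d_tc=2.0, length_tc=800.0, length_factor=100.0)
```

      The weak cube-root dependence means that even large differences in protein length change the estimated diffusion coefficient only modestly.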

      6) S9: "we anchored our association rates to the estimated in vivo association rate for the ternary complex, 𝑘^𝑇𝐶 = 6.4 μM−1s−1 [13], and rescale the association rate by diffusion of related components" - in comparison, the diffusion limited k^TC is >100. If I understand this correctly, you simply rescale ALL on-rates by 100/6.4 = 15.6. If that is (qualitatively) correct, you would need to discuss this point (and the derivation of the scaling factor) explicitly in the main text.

      The reviewer is correct in his interpretation of our approach, and we are grateful for his remark as this led us to spot a mistake in our choice of parameter (capture radius R). Indeed, while the ternary complex has a largest physical dimension of about 10 nm (from structural data (4)), the appropriate capture radius is closer to 2 nm (size of the portion binding to the ribosome) (5). Correcting for the appropriate capture radius alone brings the estimate to 45 μM-1s-1, which is however still several-fold higher than the measured value of 6.4 μM-1s-1. Whereas a part of this could be due to systematic overestimation of the diffusion coefficient, a large portion of the discrepancy is assuredly due to the many simplifying assumptions underlying the Smoluchowski estimate which serve to place an absolute upper bound on the reaction rate (perfectly/instantaneously absorbing spheres, and hence no notion of specific reaction position or molecular orientation).
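
      For orientation, the order of magnitude of such an estimate can be reproduced with the textbook Smoluchowski expression for perfectly absorbing spheres, k = 4πDR·N_A. The sketch below is an assumption-laden illustration: the diffusion coefficient of 3 μm²/s is a hypothetical value chosen only so that the result lands near the figure quoted above, and the exact expression and inputs used in the manuscript may differ.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def smoluchowski_rate_uM(d_um2_per_s, radius_nm):
    """Diffusion-limited association rate in uM^-1 s^-1.

    Uses k = 4 * pi * D * R * N_A with D the relative diffusion coefficient
    and R the capture radius (perfectly absorbing spheres).
    """
    d_m2 = d_um2_per_s * 1e-12           # um^2/s -> m^2/s
    r_m = radius_nm * 1e-9               # nm -> m
    k_m3 = 4.0 * math.pi * d_m2 * r_m    # m^3/s per molecule pair
    k_per_molar = k_m3 * AVOGADRO * 1e3  # m^3/(mol s) -> L/(mol s)
    return k_per_molar / 1e6             # M^-1 s^-1 -> uM^-1 s^-1


# Hypothetical D = 3 um^2/s with the two capture radii discussed above.
k_small_radius = smoluchowski_rate_uM(3.0, 2.0)
k_large_radius = smoluchowski_rate_uM(3.0, 10.0)
# Ratio of the upper-bound estimate to the measured in vivo value of 6.4.
rescaling = k_small_radius / 6.4
```

      Because the expression is linear in R, moving from a 10 nm to a 2 nm capture radius lowers the estimate fivefold, and the remaining gap to the measured rate is what the common rescaling factor absorbs.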

      The estimate for capture radius R has been corrected (p. 47, line 1605) and a new sentence has now been included in the main text (p. 11, line 441):

      "Importantly, the absolute values of the optimal concentrations can be anchored by the association rate constant between TC and the ribosome obtained from translation elongation kinetic measurements in vivo (Dai et al., 2016). The latter was found to be several-fold smaller than the simplest and absolute upper bound of a Smoluchowski estimate of perfectly absorbing spheres (section Estimation of optimal abundances), and we assume that the rescaling factor is the same for all reactions."

    1. Author Response:

      Reviewer #1:

      Understanding the underlying mechanisms of stromal cell decidualization and cellular interactions in the uterus is vital to improving women's reproductive health and pregnancy outcomes. This manuscript builds on a series of innovative studies interrogating the impact of cell senescence on decidualization and embryo implantation. A novel decidualization co-culture system containing endometrial epithelial organoids and stromal cells (assembloids) was established. The authors utilize this model in combination with single cell RNA-sequencing and receptor-ligand analysis to interrogate the mechanisms underlying decidual cell senescence and their subsequent roles in embryo implantation. Notably, the authors move beyond predictive bioinformatics and utilize pharmacological inhibition to alter the developmental trajectory of decidualizing cells, resulting in an altered assembloid environment and ultimately impeding human blastocyst development. Overall, this manuscript provides foundational information that will help design definitive and mechanistic studies in the future. The data from this paper will be of general interest to those studying cell-type-specific interactions including both reproductive scientists and clinicians.

      We are grateful for these supportive comments.

      Reviewer #2:

      In this interesting and well written paper, the authors employ organoid culture, single cell transcriptomics and cell-cell interaction mapping and embryo-co-culture to investigate the role of senescent endometrial cells in implantation biology. The organoids consist of primary uterine epithelial cells and stromal fibroblasts. Transition to luteal phase endometrium is induced by MPA (an artificial progestin) and a membrane permeable cAMP derivative as in in vitro stromal cell decidualization. The so treated organoids are subjected to single cell transcriptomic analysis to reveal the cellular diversity induced in these constructs. Most importantly the authors report an unexpected degree of cellular diversity, both in the epithelial as well as in the stromal compartment, both include cells interpreted as senescent cells, and in the stromal compartment also a clearly distinct pre-decidual cell population. A ligand - receptor analysis suggests that the latter two populations are characterized by a strong engagement of the receptor tyrosine kinase signaling pathways, which gave them a chance to specifically address these cells with a tyrosine kinase inhibitor. They were able to produce decidualized organoids without senescent cells which allowed them to demonstrate that embryo implantation into the endometrial organoids is impossible without senescent cells, while it is readily happening in the presence of senescent stromal cells. The lack of uNK cells, necessary to limit excessive senescence, probably limits the stability of these cultures. This is the most direct evidence to date for a physiological role of senescent cells in embryo implantation.

      Thank you; this is an accurate summary of our findings.

      The main strength of the paper consists of the creative combination of organoid culture and single cell technology, revealing both cell state/type heterogeneity and cell-cell communication networks and the experimental test of hypotheses derived from the latter. Naturally this study is a waypoint towards more complete in vitro models of the in vivo situation, by the lack of leukocytes and blood vessels. There are also some questions about the exact details of the experimental protocol, but the robust, biologically interesting and meaningful results speak for themselves.

      We agree with the Reviewer that our model is an intermediate step towards increased cellular complexity. There are, however, some important hurdles to overcome. For example, while it is possible to co-culture immune cells in our model, cell motility is greatly restricted by the gel properties. Technical issues like these will need to be overcome before additional cellular complexity can be achieved.

      One aspect that should be justified in the paper is the use of the MPA/cAMP protocol to decidualize the organoids. This is the standard protocol for decidualizing stromal fibroblasts, and circumvents the lack of epithelial cells in standard stromal culture, essentially replacing the effects of epithelial signals with a downstream second messenger, cAMP. In this context it is not clear what this treatment is supposed to be simulating. In humans receptivity is reached with systemic progesterone. A treatment with proteases and/or IL1 could simulate the presence of the embryo. To properly interpret the results using the MPA/cAMP protocol a discussion of this point would be helpful to the reader.

      For initial characterisation of decidualizing assembloids, we decided to use our standard differentiation protocol (cAMP/MPA), first developed over a quarter of a century ago, partly because of its demonstrable robustness and partly because it has been used in other endometrial organoid studies (e.g. Fitzgerald et al., PMID: 31666317). However, we acknowledge the Reviewer’s point that there is a need to revisit the physiological drivers of decidualization, especially those activating the cAMP/PKA pathway in stromal cells. A great candidate is PGE2 and, in a preliminary experiment, PGE2 production by gland-like organoids was found markedly induced by relaxin, suggesting a potential stromal-epithelial feedback loop. However, we respectfully wish to argue that such complex studies are outside the scope of the present investigations.

      The authors interpret the epithelial compartment of their organoids as representing uterine gland epithelium. It is not clear why the authors do not also expect luminal epithelium (LE) identity to be present, and in particular since some key changes constituting the window of implantation are affecting the luminal epithelium. Is it possible that some of the epithelial diversity revealed in their single cell classification are actually LE cells? In particular the cells called "transitional" could be seen as LE cells, as a loss of polarity towards a mesenchymal phenotype is part of their biology during the window of receptivity.

      We refer the Reviewer to our previous response (Essential Revisions, point 2). However, the Reviewer raises an important point regarding the origins of luminal epithelial cells. In our opinion, luminal epithelium in cycling human endometrium is likely of mixed origins, although we do not have direct experimental evidence in support of this statement. However, rapid epithelization is the principal mechanism to limit menstrual blood loss and while this could involve rapid proliferation from remnant epithelium, compelling evidence has implicated ‘transitional’ cells (MET). We previously reported that mid-luteum endometrial surface epithelium is characterised by ‘stretches’ of P16INK4+ cells interspersed by P16INK4- cells. Hence, it is conceivable that cell turnover in luminal epithelium is greater and more dynamic than currently appreciated, even during the implantation window. Further, we recently reported the presence of similarly ‘ambiguous’ cells expressing both epithelial and stromal/mesenchymal genes in single-cell RNA-seq analysis of fresh luteal phase endometrial biopsies (PMID: 31965050), as stated in the Discussion of the current paper.

      The experimental protocol also needs a little interpretation: the authors grow their organoids for four days in "expansion media" [simulating the proliferative phase of the menstrual cycle] and some of those samples are then subjected to SC analysis. Another set of cultures is also subjected to differentiation media for an additional 4 days simulating transition to luteal phase and receptivity. Comparing the expansion media organoids only with the treated ones allows us to see what happens in this simulation of the transition to luteal phase, and as such this is OK. However, the result is nevertheless confounded by the fact that the treated organoids are older than the "expansion media only" samples. A control comparison with organoids treated 4 days with expansion media and then four days with differentiation media but without the MPA and cAMP would be helpful to disentangle the hormone/cAMP effects and the age-related changes in culture.

      We appreciate the comments about the experimental design. However, in our opinion, maintaining undifferentiated assembloids for a further 4 days in the absence of differentiation stimuli would make comparison with decidualized assembloids potentially less informative than in the approach we chose to take. Our aim was to mimic the temporal progression from proliferative phase to secretory phase endometrium, at which point cell division ceases and transformation of the cells ensues. Arguably, maintaining undifferentiated assembloids in minimal differentiation medium without differentiation signals would lead to continued proliferation and growth, heightened stress responses, and potentially erroneous observations. However, we will keep this suggestion in mind for future experiments.

      Reviewer #3:

      In their study, authors designed a novel human endometrium research model, which they refer to as assembloids, containing not only the endometrium epithelial cells (as standard organoid models do), but also the tissue stromal cells. Once developed and well characterized using, among others, state-of-the-art single-cell RNA-sequencing, the authors showed its application potential by dipping into a candidate cause of endometrial declined function and receptivity, i.e. dysbalanced senescence. Culminating in the study is the addition of human (spare IVF) embryos to the developed endometrial assembloid with and without senescence perturbation, which is a first step toward in vitro deciphering human embryo-endometrium interaction in health and disease.

      We thank the Reviewer for this accurate interpretation of our results.

      The study has multiple strengths. Central are the design and detailed characterization of this new endometrium assembloid model, and the demonstration of its applicability for endometrial deficiency studies regarding biology and embryo interaction. The finding that cellular complexity is recapitulated in the assembloid culture and that cell types and states (in particular, senescence) mimic in vivo decidualization are major achievements. The benchmarking with in vivo data is highly interesting for the field. Limitations may be situated in the use of a rather crude method to inhibit cellular stress and senescence (i.e. using a generally acting tyrosine kinase inhibitor) and the premature immediate-generalization of findings to multiple fertility problems without the needed grounds so far. Although definitely a strong advancement in the field, adding still other endometrial cells, most importantly of luminal epithelial cells but also innate immune cells, will in the future further perfect the model.

      We agree that dasatinib has broad actions and more specific inhibitors are currently being tested in the lab. However, as detailed below, the expression profiles of senescent cells pointed towards dasatinib as a relevant inhibitor. Further, dasatinib has been shown to be effective in preventing uterine ageing in mice (PMID: 3195515).

      Adding complexity to the assembloid model is also an ongoing focus of our work but, as outlined in our response to Reviewer 1, major technical hurdles related to gel properties will need to be overcome first.

      The authors achieved their aims of establishing and characterizing a new, straightforward endometrium tissue model. Moreover, they succeeded in applying this new tool to start unraveling causes of endometrium non-receptivity or ill-performance, in particular regarding (dys-)balanced senescence. Together, the study presents a promising path along which human (in-)fertility research can develop, to provide basic and translational insights in reproductive biology and into (deficient) fertility which may eventually be taken to the clinic to improve pregnancy chances.

      We thank the Reviewer for these supportive comments.

      Reviewer #4:

      Rawlings et al. investigated endometrial mechanisms that may underlie reproductive disorders such as recurrent implantation failure and pregnancy loss. The proper decidualization of the endometrium is essential for correct embryo implantation and this process is tightly controlled by hormone action during the menstrual cycle. The authors hypothesized that acute senescence in the decidualizing endometrium is necessary for successful embryo implantation. They also sought to characterize gene expression changes that occur in stromal and epithelial compartments as a result of decidualization.

      The authors used a novel assembloid model of endometrial culture to investigate how endometrial epithelial and stromal cells respond to decidualization and found that both epithelial and stromal cells displayed distinct gene expression signature groups before and after decidualization. They successfully showed that the cellular stress that occurs during decidualization can directly affect the degree to which decidual endometrial cells undergo senescence. By culturing their endometrial assembloids with human embryos, they were able to convincingly demonstrate that endometrial decidual senescence is necessary to allow the embryo to grow and invade after implantation.

      This work will be important to the endometrial biology field as well as to clinicians. In addition to offering a mechanism for some types of implantation failure and recurrent pregnancy loss, the authors showed that treatment with a tyrosine kinase inhibitor can modulate pre-decidual stress and decidual senescence, suggesting that endometrial receptivity could one day be manipulated pharmacologically.

      We greatly appreciate this endorsement of our study and indeed consider assembloids as an informative model to evaluate new therapeutics to modulate endometrial receptivity.

      Strengths:

      The major strength of this paper is their assembloid-embryo coculture model, which provides strong evidence that a lack of endometrial decidual senescence results in a lack of embryo growth and failure of the embryo to invade into the endometrial assembloid. Although endometrial organoid models containing both epithelial and stromal cells already exist, the assembloid model allows the visualization of embryo growth and invasion. The single-cell RNA-sequencing of the assembloids convincingly demonstrates the existence of different populations of endometrial epithelial and stromal cells displaying different gene expression profiles before and after decidualization. The separation into these groups is very clean and the gene expression correlates with endometrial gene expression across the cycle remarkably well.

      We thank the Reviewer.

      Weaknesses:

      Given that the women who contributed endometrial tissue to this study were attending an Implantation Research Clinic and were mostly nulliparous with past first trimester pregnancy loss, it is unclear whether the endometrial samples used in this study are representative of a healthy condition.

      When it comes to implantation and pregnancy, a binary classification of ‘healthy’ and ‘unhealthy’ subjects is not particularly useful or easily definable. Women who have had a miscarriage, even multiple consecutive losses, often achieve successful pregnancies; and a successful pregnancy, even multiple successful pregnancies, does not preclude a future miscarriage(s). This clinical reality also applies to implantation failure after IVF treatment. This should not be surprising as the endometrium is a cycling tissue that leads to a midluteal implantation environment under external homeostatic control of NK cells and bone marrow-derived progenitors (both recruited from the circulation). Thus, the likelihood of reproductive failure caused by an endometrial defect is much more likely to reflect the stringency of homeostatic control and the frequency of cycles resulting in an abnormal peri-implantation environment. Put differently, the notion that the endometrium ‘carbon-copies’ itself in each cycle, leading to a permanent ‘normal’ or ‘abnormal’ state, is mostly for the birds and not grounded in either clinical reality or biology.

      The rationale for using a minimal differentiation medium rather than the differentiation medium that has been established in the literature is unclear. In particular, the induction of glandular differentiation in endometrial assembloids by NAC, an antioxidant, deserves some discussion.

      The established medium for endometrial organoids (based on Turco et al., PMID: 28394884 but also highly similar to Boretto et al., PMID: 28442471) contains various mitogens and pathway inhibitors and modulators, designed to allow the establishment of epithelial organoids and facilitate the proliferation in the absence of supporting cells (i.e. stroma). We reasoned that (i) some of these factors may interfere with differentiation, and (ii) that the addition of the stromal cells to the culture should negate the need to provide additional stroma-derived growth factors. As described above, in response to Reviewer 2, the concentration of NAC added to the medium is low and does not appear to interfere with differentiation responses in either the gland-like epithelial structures or stroma.

      Additionally, some explanation as to why the authors chose to treat their assembloid cultures with decidualization cocktail for only four days, when decidualization in vivo occurs over a much longer period, would be helpful.

      As stated in the manuscript, this timepoint was chosen based on previous reconstruction of the decidual pathway in 2D cultures at single-cell level, demonstrating the emergence on day 4 of both decidual and senescent decidual populations (PMID: 31965050). Because assembloids do not as yet harbour uterine NK cells, extending the cultures by several days beyond this time-point will lead to progressive disintegration because of unopposed SASP.

      The authors state that EpS5 is a population of senescent epithelial cells producing SASP based on gene expression data. It would be more convincing if the authors could provide other evidence to characterize these cells as senescent.

      As shown in Figure 3-figure supplement 1 of the manuscript, the EpS5 population highly expresses CDKN1A (p21) and CDKN2A (p16) as well as numerous genes encoding canonical SASP components (i.e. meeting the widely accepted criteria of cellular senescence). We also demonstrated that SASP produced by senescent epithelial cells is distinct from that of senescent decidual cells. We previously reported that P16-positive glandular and luminal epithelial cells emerge in mid-luteal endometrium, along with a modest but discernible rise in P16-positive stromal cells, before rising sharply in late-luteal phase samples (PMID: 29227245).

      The authors draw many conclusions based on data from the CellPhoneDB computational tool, but it is unclear how the authors chose the input for this tool and whether the output of this program was validated in any way.

      CellPhoneDB is an online, publicly available repository of ligands, receptors and their interactions (https://github.com/Teichlab/cellphonedb), which integrates various existing datasets and new manually reviewed information. In order to use this computational tool, expression counts and cell metadata were extracted from our single-cell data for decidualizing cells (i.e. those populations present in D4 cultures) according to pipelines provided by the CellPhoneDB vignette. The CellPhoneDB package then derives enriched receptor–ligand interactions between two cell types based on expression of a receptor by one cell type and a ligand by another cell type (as described here: https://www.cellphonedb.org/explore-sc-rna-seq). We chose to exclude integrin interactions from our analysis and focus on cell-cell interactions rather than cell-ECM interactions, but future investigation of these is certainly of interest to the progression of the model.
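
      To make the idea of an "enriched" receptor–ligand interaction concrete, the following is a minimal sketch of a label-permutation test of the kind described above. It is a simplified illustration and not the CellPhoneDB API: the function names, the toy gene names ("LIG", "REC") and the cell-type labels are all hypothetical, and the real tool applies additional filters and handles multi-subunit complexes.

```python
import random
from statistics import mean


def interaction_score(expr, labels, ligand, receptor, sender, receiver):
    """Average of the ligand's mean expression in `sender` cells and the
    receptor's mean expression in `receiver` cells (hypothetical helper)."""
    lig = [cell[ligand] for cell, lab in zip(expr, labels) if lab == sender]
    rec = [cell[receptor] for cell, lab in zip(expr, labels) if lab == receiver]
    if not lig or not rec:
        return 0.0
    return (mean(lig) + mean(rec)) / 2.0


def permutation_p_value(expr, labels, ligand, receptor, sender, receiver,
                        n_perm=1000, seed=0):
    """Compare the observed score with scores obtained after shuffling
    the cell-type labels; returns (observed score, empirical p-value)."""
    rng = random.Random(seed)
    observed = interaction_score(expr, labels, ligand, receptor, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the link between cells and cell types
        score = interaction_score(expr, shuffled, ligand, receptor, sender, receiver)
        if score >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: "stroma" cells express only the ligand, "epithelium" only the receptor.
    expr = [{"LIG": 5.0, "REC": 0.0}] * 20 + [{"LIG": 0.0, "REC": 5.0}] * 20
    labels = ["stroma"] * 20 + ["epithelium"] * 20
    print(permutation_p_value(expr, labels, "LIG", "REC", "stroma", "epithelium"))
```

      In this sketch an interaction is called enriched when few label shufflings reach the observed score, which is why the choice of input populations (here, only those present in D4 cultures) directly shapes the null distribution.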

    1. Author Response:

      Reviewer #1:

      How the tolerance of gene overexpression varies across closely related organisms remains poorly understood and this manuscript offers the first systematic functional genomic screen to address this gap. Thus, the approach itself is clearly original and yeast is a great model system for such a study. The data itself would be an important resource for the functional genomics community. The broad picture emerging from this screen is also interesting: a subset of genes is commonly toxic when overexpressed, while many genes are toxic only in specific strains. Importantly, the commonly toxic genes are highly enriched in certain functional classes and often encode for protein complex members. All these make a lot of sense based on what is known about gene dosage sensitivity in baker's yeast.

      The more interesting and also riskier part is to identify and understand strain-specific overexpression phenotypes. The authors made great efforts to offer possible explanations for these, however, I had the impression that some further analyses could strengthen the conclusions and yield more insights.

      We thank the reviewer for their positive assessment of our work and its importance.

      I see four broad issues:

      1) Statistical analysis: I failed to find data on the reproducibility of the screen and how it varies across strains. Variation in reproducibility may hugely influence some of the conclusions as the number of genes with a significant fitness effect depends on measurement noise (and number of replicates).

      We went to great lengths to control all aspects of the experiment and the biological replication. All three biological replicates (two in the case of strain YPS606) are shown in Figure 1B, which demonstrates the high reproducibility. We calculated the average and standard deviation of replicate correlations within each strain; however, analyzing this data is in fact misleading: strains with the fewest deleterious effects are clearly highly reproducible in Figure 1B but often have lower correlation across replicates – this is because the log2 fitness effects are close to zero and thus the correlation is driven by noise. For example, strain Y2209 displayed the lowest correlation across replicates (r = 0.55), but Figure 1B shows very high reproducibility and low fitness effects; incidentally, this strain had above-average number of genes measured at Generation 0 (3,945 genes) and clearly maintains the 2-micron plasmid. Reduced statistical power simply cannot explain the few genes with strong fitness costs in this and other strains.
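
      The point that replicate correlation falls when true effects are near zero can be illustrated with a simple additive model. This is a sketch under stated assumptions, not an analysis of the screen data: we assume each replicate measurement is a true log2 fitness effect plus independent noise of equal variance, and the numbers in the example are purely illustrative.

```latex
% Assumed model: replicate k measures gene g as true effect plus independent noise
x_{kg} = s_g + e_{kg}, \qquad
\operatorname{Var}(s_g) = \sigma_s^2, \quad
\operatorname{Var}(e_{kg}) = \sigma_e^2 .

% Expected correlation between two replicates
r = \frac{\operatorname{Cov}(x_{1g}, x_{2g})}
         {\sqrt{\operatorname{Var}(x_{1g})\,\operatorname{Var}(x_{2g})}}
  = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_e^2} .

% Illustrative values with identical noise (\sigma_e = 0.1) in both cases:
% \sigma_s = 0.5  gives  r = 0.25 / 0.26 \approx 0.96
% \sigma_s = 0.11 gives  r = 0.0121 / 0.0221 \approx 0.55
```

      Under this model, two strains measured with the same precision can show very different replicate correlations simply because one has a narrower spread of true fitness effects.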

      If we consider only the 7 strains whose average replicate correlation is greater than 0.8 (ranging from 0.80 to 0.89), there is no relationship between mean correlation in their replicates and number of significant genes called (R² = 0.01). For example, the average correlation in replicates for strain Y12 and YJM1273 is nearly the same (r = 0.795 versus 0.797), yet in Y12 there were 3,060 OE genes of significant effect compared to 1,726 in YJM1273. Thus, while there are always subtle differences in statistical power, our results cannot be explained by this. Instead, our results show that different yeast strains display different sensitivities to Moby 2.0 gene OE. We added a short section on this to the Methods on Page 21.

      On a related note, I'm somewhat puzzled by the claim that strains with large median fitness effects do not generally show more OE sensitive genes. Visually, it appears that this relationship is borne out for commonly toxic genes (Fig 3B), although not mentioned or interpreted.

      As stated in the manuscript, “While the median fitness cost of deleterious OE genes was not correlated overall with the number of deleterious genes per strain, strains with the most deleterious genes (NCYC3290, YJM1389, and Y12) did show an expanded range of fitness costs, with more genes showing very strong deleterious effects compared to other strains (Figure 2B).” The correlation between number of deleterious genes and median fitness cost per strain is low (R² = 0.08, excluding YPS606 done in duplicate).

      2) The authors show that the number of deleterious OE genes is strongly correlated with the amount of growth defect caused by expressing the empty Moby 2.0 vector (Figure 4D). This is a pretty strong correlation (r=0.7) and might influence the conclusions drawn from the data. In particular, the strong effect of empty Moby 2.0 should be taken into account when defining strain-specific fitness effects. For example, fitness effects that are present in 2-3 strains might be shared between strains that exhibit a similar cost of empty Moby and therefore need to be interpreted with caution. Previous genetic interaction studies suggest that slow growing mutants tend to show many epistatic interactions with any other mutations (Costanzo et al. 2010). I'm left with the feeling that the strain-specific differences in the number of OE sensitive genes might be a manifestation of this more general phenomenon.

      The reviewer raises an important point, one that we have made clearer in the revised manuscript: some strains are simply more sensitive to the library (perhaps due to the DNA burden, the protein burden, and/or the 2-micron replication). These strains are likely stressed during the experiment and may simply be more sensitive to gene OE, in a way that is not specific to the genes being expressed. We added some clarifying statements to the text, on pages 10, 11, 12, and 14-15. Specifically, we now cite that 60% of the deleterious genes meeting our “strain specific” criteria in Y12 were shared with another of the top four strains most sensitive to the empty vector (DBVPG1373, YJM1592, YPS163, YJM1389; Figure 4D). Thus, some of the identified genes may be deleterious if OE in other strains growing in suboptimal or stressful conditions. This is consistent with our aim in the original manuscript and hopefully now clearer with the textual changes in the revision.

      3) Lack of phylogenetic context: The investigated strains come from several distinct populations with different lifestyles and varying phylogenetic distances. I would have expected some further investigations on how strain-specific OE effects depend on lifestyle or phylogenetic relationship.

      We were very interested to see if strains from the same lineage or niche share trends in gene OE sensitivities. However, several analyses did not identify obvious effects. First, we did not find striking similarities for the strain-specific genes identified in strains from the same lineage (this can be seen to some extent from the heat map in Figure 1B). Second, we did not find that closely related strains shared a higher number of genes of similar effect. We would like to return to these questions in future work; thus, for now we have not added the analysis to the current manuscript to maintain focus on the more interesting results.

      4) The tryptophan depletion story is a nice example of strain-specific difference in physiology. Overall, the presented analyses on tryptophan-enriched genes are highly suggestive, however, it lacks a negative control, that is, other genes that have similar functions but are not enriched in tryptophan.

      In the manuscript we state, “Together, these data raised the possibility that DBVPG1373 is sensitive to conditions that deplete tryptophan from the cell.” Our results validated this hypothesis by showing that this strain, but not two others tested, is more sensitive to the OE genes in the absence of tryptophan. It is possible that this strain is more sensitive to any OE gene in this environment. As per the guidance of the editor, we have added a clarification that it is possible that this strain is sensitive to all OE genes in the absence of tryptophan.

    1. Author Response:

      Reviewer #1:

      In this study, the authors developed an elegant toolset called HiLITR for identifying the genes that are involved in protein localization to a specific subcellular organelle. The basic strategy is exquisitely designed: Two distinct types of organelle-specific membrane anchored proteins are respectively fused to the TEV protease domain and a transcription factor (TF). Colocalization of the two proteins induces release of the TF by proteolytic cleavage. The TF switches on the expression of a fluorescent protein enabling the amplification of localization signal. The expression levels of fluorescent proteins can be quantified and sorted by FACS.

      In combination with the CRISPRi screening employing a pooled sgRNA library, this strategy turns into a powerful high-throughput platform to discover genes that influence protein localization in various cellular compartments. Applying this method to protein localization in mitochondrial and ER membranes led to an unexpected discovery of the genes, SAE1 (SUMO activating enzyme) as a regulator of the tail-anchored (TA) protein insertion to mitochondrial membranes and EMC10 as an antagonist in the insertion of TA proteins to ER membranes.

      The basic workflow is thoroughly designed and optimized (e.g., the construct design, the choice of targeting sequences, the strategies to filter out false positive hits, FACS analysis, nontargeted and targeted identification of the genes affecting localization, validation of the identified genes, etc.). The triple filtering strategy (i.e., TA screen, SA screen and ER screen) is impressive since this not only enables filtering out false positives but also provides a way to investigate mislocalization or rerouting of TA proteins to ER membranes.

      Overall, this is an excellent study contributing to our understanding of protein localization and mislocalization. The manuscript reasonably well supports the conclusion. Nonetheless, there are several concerns that authors could further address:

      i) It would have been helpful to discuss how this method could be evolved to address more complex problems in protein localization and mislocalization. For example, the current version focuses on single membrane-spanning peptides as a localization signal, but the scientific community would be also interested in the localization problems of membrane proteins with multiple TM segments or larger water-soluble domains. In such case, how could the accessibility issue between TF and protease be overcome?

      The question of using HiLITR to probe targeting mechanisms for multipass transmembrane proteins is a very interesting one. If the protein of interest fails to localize to the same membrane as a single-pass transcription factor, or if topology is inverted in a way that places the protease in the lumen, then either outcome will reduce HiLITR activation. Similarly, a peripheral but non-transmembrane protein would reduce activation of an associated transcription factor if localization were impaired. We have observed that HiLITR can be sensitive to the geometries of the constructs, so it is likely that the targeting domain and linker lengths of the TF construct would need to be refined to enable and optimize activation.

      ii) Although this manuscript majorly focuses on the tool development, more in-depth explanations on the role of the identified genes (SAE1 and EMC10) would have helped readers to appreciate the significance of this work.

      We have added additional discussion on the possible cellular roles of SAE1. We speculate that the effect of SAE1 is likely indirect (i.e., SAE1 is unlikely to be directly interacting with TA proteins). With respect to EMC10, we believe we are the first to impute a functional role for EMC10 in the EMC complex. Recent structural studies have indicated that the EMC samples different states which vary in client protein accommodation. The finding that the EMC is not static hints that it may also be regulated. For several reasons that are complementary to our own findings, EMC10 is a logical lead for this antagonistic regulation. First, EMC10 is dispensable for complex stability and does not genetically cluster with other EMC members, so it has often been regarded as outside of the core complex. Second, the structural studies have indicated that EMC10 is more flexibly associated with the rest of the complex than other EMC subunits. The ability of EMC10 to dissociate from the rest of the complex could be a mechanism for this antagonistic regulation. We have added new text describing these ideas to the discussion section of the manuscript.

      iii) Signal amplification can be a double-edged dagger since it can magnify small differences more than what is actual. A statement would be needed how authors translate the HiLITR results into the actual effect of an identified component (e.g., HILITR vs Western blotting).

      In general, HiLITR seems to be more sensitive than direct measures of endogenous protein levels, which can be observed in the Western blotting and proteomics data related to SAE1 knockdown and in the Western blotting related to EMC10 knockdown. This is likely a function of the signal amplification of HiLITR, as the reviewer notes, and the use of clonal selection for highly sensitive cell lines. In theory, if there are perturbations that will be known to give specific effect sizes, they could be used to calibrate the HiLITR readout. Otherwise, we would recommend against imputing a specific effect size from HiLITR results. We have added these comments to the revised discussion.

      We do note that some of our data support the idea that, for a given cell line, a larger change in HiLITR activation corresponds to a larger perturbation of protein localization. For example, in Figures S7, S8, and S11, genes with larger effects on the ER HiLITR screen produced larger changes in the mutant protease localization assay (by fluorescence imaging).

      Reviewer #2:

      In this work, Coukos et al. describe the development of a genetic reporter system that involves the use of chimeric, photoactivatable substrate proteins that can be used to monitor the targeting of a tail-anchored (TA) protease to various organelle membranes. The authors present a strategy to couple these sensors with fluorescence activated cell sorting (FACS), deep sequencing, and CRISPRi libraries in order to identify genes that mediate membrane targeting. This study documents extensive optimization efforts and numerous controls to ensure the output of these screens are valid. Furthermore, the results include numerous examples of previously characterized insertases (i.e. core subunits of the ER membrane complex, or EMC) as well as the discovery of two novel genes that play a central role in the targeting of TA proteins to the outer mitochondrial membrane (OMM) or to the endoplasmic reticulum membrane (ERM). In follow up investigations, the authors show that the loss of the SUMO E1 ligase component SAE1 is critical for the targeting of TA proteins to the OMM. Using an array of quantitative cellular assays, the authors then confirm the specificity of the knockdown, and show that the disruption of SUMOylation results in the mis-targeting of endogenous substrates. Using another variation of this assay, the authors also discover that, while the knockdown of core EMC subunits decreases the targeting of TA proteins to the ERM, knocking down EMC10 results in an increase in the targeting of these substrates to the ERM. The authors also verify that this subunit specifically appears to antagonize this insertase activity of the EMC. Overall, this study provides both new tools for pooled genetic screening and identifies novel components of the topogenic machinery in human cells. These results have a clear impact on our understanding of membrane insertion pathways and are likely to influence efforts to develop new screening platforms.

      Strengths:

      This study includes both an impressive number of controls and several counter screens that make this approach both comprehensive and robust. The described approaches are also likely to be somewhat adaptable given the modular architecture of the HiLITR sensor proteins.

      Validation processes both confirm the roles of these genes in each respective process and provide evidence for the accuracy of the results. These efforts along with the detailed methods sections set a high standard for future screens that employ similar approaches.

      The novel roles of the SAE1 and EMC10 subunits suggest new factors that may control the efficiency of OMM and ERM targeting pathways. These findings are sure to inspire a slew of follow up studies centered around the mechanistic roles of these proteins in the context of each pathway.

      Weaknesses:

      Chimeric TA proteases represent an artificial substrate. While these screens clearly pick up the central machinery involved in these pathways, the characterization of such substrates has limited impact on our understanding of how the spectrum of native substrates navigate the partially redundant topogenic pathways within the cell.

      We have added this point to the discussion when describing limitations of our method. We agree that HiLITR screens will be most impactful when there is follow-up on hits by assessing the effect of their KD on endogenous proteins. We have assessed our two key hits, SAE1 and EMC10, in this way.

      The authors characterize two of the more robust and biochemically interesting hits in their follow up studies. Nevertheless, it is unclear how many of the other hits are likely to be relevant due to a lack of biological replicates and to the lack of metrics to describe the precision of the observed effect sizes.

      We performed all three screens in 2 biological replicates, and the whole genome screen was performed in 2 technical replicates. This information was previously only in the methods sections, so we have now also included it in Results. Precision of effect size can be estimated using the CasTLE software and is included for each screen in supplementary table 2. Empirically, we gain confidence from the replication of hits between the whole-genome and TA screens, robust recovery of the TRC pathway from the TA screen (Figure S7) and validation and follow-up on a limited subset of hits (Figure S8).

      The authors' efforts to characterize the effects of SAE1 and EMC10 knockdowns confirm the screening results and show that the activities of these proteins are important for targeting. However, these studies do not establish the mechanistic roles of these proteins within each insertion pathway. This will undoubtedly require additional investigations.

      Yes, we agree and have elaborated on this in the Discussion section.

    1. Author Response:

      Evaluation Summary:

      This paper will be of interest to scientists within the field of chromosome biology. The authors take advantage of the Xenopus egg cell free system and combine classical morphological analyses by immunofluorescence with chromosome conformation (Hi-C) analyses to elucidate the contribution of linker histone H1 to mitotic chromosome organization. The authors find that linker histone H1 limits the association of condensin and topoisomerase II to control chromosome length.

      We would like to note that our study also demonstrates the importance of H1.8 in preventing chromosome hyper-individualization prior to anaphase chromosome segregation by limiting the chromosome association of condensin and topo II. In addition, while it has been widely accepted that the linker histone controls local chromatin compaction by facilitating nucleosome-nucleosome interaction, we demonstrate that H1.8 can control a larger scale chromosome organization through regulating condensins. We believe that these points are conceptually novel and important.

      Reviewer #1:

      In this manuscript, Choppakatla et al. reconstitute chromatin assembly on sperm DNA in Xenopus meiosis II extracts to test the role of linker Histone H1. They find that depletion of embryonic isoform histone H1.8 increases the chromosomal levels of Topoisomerase 2A, as well as condensin I and II. Using in vitro-assembled nucleosome arrays, together with purified condensin and linker histone H1, they provide evidence that linker histone H1.8 competes with condensin and Top2A binding. They show that histone H1 depletion extends chromosome length, in a manner dependent on condensin I. Hi-C analysis suggested shorter chromatin loops in histone H1.8-depleted extracts, dependent on condensin I. This led the authors to conclude that histone H1.8 limits the association of condensin I with chromosomes to reduce the number, and thereby increase the length, of loops, shortening the chromosomes. The last part of the paper (Figures 5 and 6) explores the interplay between histone H1.8, condensin and Top2A in chromosome individualisation, arguing for a role of H1.8 in preventing chromosome dispersion, hypothesised to facilitate chromosome capture at metaphase.

      Strengths:

      • Draws together two important questions (role of linker histone H1, how chromosome size is controlled) into a potentially important mechanism.

      • Experiments are carefully conducted and controlled.

      • The paper is well written and presented.

      Weaknesses:

      • The in vivo significance is unclear. Although the in vitro studies are extremely informative, a more thorough discussion of the biological importance of the mechanisms proposed would be useful, particularly with relevance to the cell cycle.

      • The latter part of the paper exploring chromosome individualization is partly contradictory and difficult to contextualise.

      We would like to thank the reviewer for recognition of the importance of our work. Regarding the criticism against the idea that H1 limits chromosome individualization in metaphase, please see our response to Essential Revision point #7. Regarding the in vivo significance, it has been shown previously that elongated chromosomes in ∆H1 extracts result in chromosome segregation defects in anaphase (Maresca, Freedman, and Heald 2005).

      Reviewer #2:

      This is an interesting study performed using frog cycling extracts. The authors show that depletion of an embryonic histone, the H1.8 linker histone, leads to an increase in the binding of two important effectors of chromosome shape, topo II and Condensins. The increased loading of these two effectors leads to longer chromosomes that are less individualized in extracts. The major strength of this study is the elegant use of the frog extract/biochemistry to carefully dissect the contribution of chromatin-binding proteins to the overall shape and dimensions of chromosomes. The major caveat of this study is that it is based exclusively on in vitro observations, and validation experiments with purified Condensins were performed with nonstoichiometric complexes. The biological rationale justifying a role for an early embryonic histone in the reduction of chromosome length is also unclear. Cells in early embryos are typically very large and should be less dependent on size-reduction mechanisms provided by histone H1.8. One would assume that chromosome size should be maximally reduced in small somatic cells and promoted by somatic linker histone H1 subtypes. The authors did not provide an explanation for this apparent contradiction.

      We would like to thank the reviewer for their constructive criticisms. The Xenopus egg extract system has a long history of enabling fundamental discoveries that shaped the cell biology field. While this is technically an in vitro system, it recapitulates most, if not all, chromosome-dependent physiological events inside the egg, including DNA replication, sister chromatid cohesion, mitotic chromosome compaction, and spindle assembly. This manuscript presents a molecular mechanism behind the control of chromosome length and individualization by the linker histone H1, and the system serves this purpose well. The Xenopus egg extract system is also ideal for studying the mitotic roles of linker histones because it circumvents a common problem in cellular systems, where linker histone depletion affects transcriptional profiles and thereby makes it difficult to link biochemical properties to mitotic chromosome morphology. Importantly, unlike somatic tissue culture cell systems, where multiple linker histone subtypes are expressed, H1.8 is by far the dominant linker histone in egg extracts, which further simplifies our analysis. As we noted in our response to Essential Point 1, by quantitatively monitoring the chromatin proteins that are affected by H1.8 depletion, and by recapitulating the phenomenon in the reconstituted system, we were able to support our conclusion at a level that is difficult to achieve in studies based on tissue culture systems. It is plausible that a similar mechanism applies to somatic cells, where linker histones also bind mitotic chromatin, and that changes in linker histone stoichiometry play a role in controlling chromosome length among cells of different sizes (Woodcock, Skoultchi, and Fan 2006).

      The reviewer also raised the important possibility that the condensin complex used in our in vitro experiments was nonstoichiometric. In the revised version, we conducted mass photometry analysis to confirm that our complex does maintain its subunit stoichiometry under our in vitro assay conditions (Figure 2-figure supplement 1). Also, please note that we have shown that our complex was able to rescue condensin I depletion phenotypes in Xenopus egg extracts (Figure 2-figure supplement 2A).

      Reviewer #3:

      In the current manuscript, Choppakatla et al. address the contribution of the linker histone H1 to mitotic chromosome assembly in the Xenopus egg cell-free system. They show that the presence of this histone limits the binding of condensins I and II as well as topoisomerase II. Depletion of H1 from the egg extracts results in assembly of longer and thinner chromosomes, and increases dispersion of individualized chromosomes. Hi-C analyses indicate that average loop size is shortened and the DNA amount in each layer of mitotic loops is reduced in the absence of H1, a phenotype attributed to the increased presence of condensin I.

      Strengths:

      • Experiments are carefully designed and performed, the figures are clear and properly labeled and the manuscript is written with clarity.
      • Combination of classical assays (immunodepletion+chromosome assembly followed by image analysis) with Hi-C analyses (to my knowledge, this is the first time that Hi-C is used for chromosomes assembled in Xenopus extracts) as well as in vitro reconstitution of topoII and condensin binding to nucleosomal arrays to test the effect of H1.
      • The results support most of the conclusions of the manuscript, explain the previously reported effect of H1 depletion on chromosome assembly and are consistent with previous reports regarding the contribution of condensin I and condensin II to chromosome organization.

      Weakness:

      • The last part of the manuscript regarding "chromosome individualization" is a bit confusing, probably because it is unclear what this process entails in molecular terms. On the one hand, the authors mention the existence of entanglements between metaphase chromosomes that must be removed to allow complete individualization of chromosomes before segregation. On the other hand, chromosomes assembled in the egg extract cluster together (even in the absence of a spindle). The correlation between these two phenotypes and the actual contribution of the different factors (H1, condensins, topoII) is unclear.

      We thank the reviewer for the overall positive evaluation and constructive criticisms. Regarding the regulation of chromosome individualization, please see our response to Essential Point #7.

    1. Author Response:

      Reviewer #2 (Public Review):

      [...] 1) A weakness of the paper is the disruption of the complex during cryoEM grid preparation resulting in about half of the observed particles missing the membrane arm and likely also contributing to the disorder and biased orientation seen in the intact complexes. This leads to poor density in the membrane arm for all of the intact complex I structures presented and large variations in the local resolution of the membrane arm focused refinement.

      Purified E. coli complex I has long been known to be labile, particularly at the junction of the peripheral and membrane arms (https://pubmed.ncbi.nlm.nih.gov/12637579/).

      The air-water interface likely plays a role in disrupting the complex, in addition to other possible causes. Indeed, the dissociated arms, the preferred particle orientation, and the low protein concentration (~0.1 mg/ml) used to produce grids with high particle density all indicate that reconstituted complex I does interact with the air-water interface. While disruption and denaturation of protein complexes at the air-water interface have been well documented (https://pubmed.ncbi.nlm.nih.gov/3043536/, https://pubmed.ncbi.nlm.nih.gov/30932812/), we are not aware of examples where the air-water interface caused higher mobility of a complex or induced a stable conformation different from the one in bulk solution. Therefore, we think that the air-water interface is neither the cause of the observed high mobility of the arms nor of their relative rotation.

      Preferential orientation was observed in the cryo-EM studies of most complex I homologs (Gutiérrez-Fernández et al., 2020; Parey et al., 2019; Zhu et al., 2016) as well as of other proteins, suggesting that adsorption of complex I on the air-water interface is a common phenomenon. It is therefore not clear why the relative movement of the arms would be attributed to the air-water interface for the E. coli complex but not for the other structurally characterized complex I homologs, in all of which such movement is also observed.

      To provide additional support to our interpretation of the structural data we purified complex I in detergent LMNG, showed that it catalyzes redox reactions and solved its structure to resolution of 6.7 Å (Figure 6 and corresponding figure supplements). Because cryo-EM grids had to be prepared at a protein concentration of 2-3 mg/ml and the particles displayed nearly homogeneous distribution of orientations, we conclude that the interaction with the air-water interface was reduced. Still, the complex assumes a very similar, or even somewhat more uncoupled conformation and the relative mobility of the arms remained comparable to that in the nanodisc-reconstituted complex reconstructions. These data allow us to rule out the air-water interface and reconstitution of the protein into lipid nanodiscs as the possible causes of the high mobility and the unusual relative position of the arms.

      The corresponding modifications were added to the manuscript on lines 372-382:

      “To better understand the reasons for the observed uncoupled conformation and the missing density for HTMH1, we purified E. coli complex I in detergent LMNG, showed that it can catalyze redox reactions (Figure 6 - figure supplement 1) and solved its structure to resolution of 6.7 Å (Figure 6 - figure supplement 2). The detergent-solubilized complex also displays high relative mobility of the arms (Figure 6 - figure supplement 3) and has uncoupled conformation (Figure 6). Its peripheral arm is rotated even further away from the expected coupled state position than in the nanodisc-reconstituted structures. Both the cryo-EM sample preparation conditions and more homogeneous distribution of particle orientations indicate that interaction of the complex with air-water interface was significantly reduced when compared with the complex in nanodiscs. This allows us to conclude that neither air-water interface nor reconstitution into nanodiscs cause the uncoupled conformations.”

      It is not very clear what the referee means by “poor” when referring to the focused density of the membrane arm. The density corresponds well to the reported resolution of 3.7 Å. Indeed, it is in stark contrast with the quality of the density obtained for the peripheral arm at 2.1 Å resolution. Given the high mobility of the membrane arm, it had to be refined essentially independently of the peripheral arm, which remains challenging for a ~200 kDa membrane protein without water-soluble domains in lipid nanodiscs. The density is heterogeneous, as clearly stated at the beginning of the section “Structure of membrane arm” from line 264:

      “The model of complete membrane arm, including the previously missing subunit NuoH (Efremov and Sazanov, 2011), was built into the density map with local resolution better than 3.5 Å at the arm center and approximately 4.0 Å at its periphery (Figure 1A, Figure 1 - figure supplement 4).”

      Finally, for most complex I homologs the resolution was gradually improved over several years, as reflected in multiple publications of essentially the same structures. In contrast, no high-resolution structure information was available for the intact E. coli complex I until now. Therefore, it would be unreasonable to expect the complete structure to be solved at resolution of 2 Å at once.

      The resolution of the membrane domain in reconstructions of the complete complex I is indeed lower, owing to the high flexibility of the complex and to the fact that refinement naturally focuses on the more stable peripheral arm, which is not surrounded by a heterogeneous nanodisc and which contains Fe-S clusters that enhance particle alignment. Still, these conformations clearly resolve the interface between subunits, albeit at lower resolution.

      This fact was also clearly stated at the beginning of results section lines 102-106:

      “Three conformations of the entire complex were reconstructed to average resolutions between 3.3 and 3.7 Å (Figure 1 - figure supplement 4) resolving the interface between the arms; however, due to high-residual mobility of the arms, the antiporter-like subunits were resolved at below 8 Å (Figure 1 - figure supplement 4).”

      2) A weakness of the paper is the disorder of important functional regions of the complex, namely the NuoH TMH1, whose disorder is unique to these nanodisc E. coli structures, and the NuoA TMH1-TMH2 loop. As the NuoH TMH1 forms part of the entry to the quinone tunnel of the complex, its absence in the structure leads to concerns regarding the function of the nanodisc preparation. Its absence is curious as this suggests flexibility of the helix, as pointed out by the authors, but the authors also state that there is not enough room in the nanodisc to accommodate this helix (given the visible density for the lipid and membrane scaffold protein). These observations suggest denaturation or unfolding in this region of the complex as opposed to simple flexibility.

      According to the usual definition of complex I activity our preparation in nanodiscs is active. We complemented our data with additional measurements and included NADH:DQ assays (see next point) that also indicate that our preparation is active. Additional 3D reconstruction of E. coli complex I that we obtained for protein solubilized in LMNG does resolve HTMH1 and its environment appears to be more similar to other detergent-solubilized structures of complex I homologues. At the same time, the helices around HTMH1 appear to be more tightly packed and more curved than in the nanodiscs which may reflect suppressed dynamics and distorted protein conformation. Most importantly, the overall conformation of the complex remains nearly the same and still corresponds to what we call the uncoupled conformation. That of course does not allow us to say where HTMH1 is positioned within the nanodisc, but it does enable us to conclude that the local changes in the vicinity of HTMH1 do not influence the global conformation of the complex.

      The additional structure is now described on lines 383-388:

      “The HTMH1 helix is resolved in the detergent-solubilized complex (Figure 6A). Its density is weaker than that of the surrounding helices and it is strongly bent (Figure 6B). Simultaneously, HAH1 takes the conformation resembling other complex I homologs while ATMH1 bends towards the arm core. The arrangement of helices in detergent-solubilized reconstruction appears to be more compact and more bent than in the lipid environment which may restrain the otherwise more flexible HTMH1.”

      In the revised discussion the environment of HTMH1 is described more clearly on lines 426-433:

      “The absence of HTMH1 density in nanodiscs, but not in detergent, is another unique feature of E. coli complex I. HTMH1 is exposed to the lipid environment and the width of the nanodisc next to HTMH1 is similar to other regions around the membrane arm (Movie 1). Moreover, homology modelled HTMH1 fits the empty space without steric clashes suggesting that HTMH1 is dynamic rather than displaced or unfolded. By comparing the detergent-solubilized and reconstituted complexes we can conclude that the position and dynamics of this helix are neither the cause of the uncoupled conformation nor of the high relative mobility of the arms.”

      Disorder of the ATMH1-TMH2 loop is not unique to E. coli complex I; it is also observed in some conformations of ovine complex I (PDB 6zkd, 6zke, 6zkf).

      3) Unfortunately, the NADH:Q1 functional data do not fully address these concerns, as Q1 is far more soluble than the native Q8 substrate of the complex. Although the Q1 activity is sensitive to the inhibitor Piericidin A, which clearly demonstrates that the Q1 reduction is occurring in the native quinone binding site as Piericidin A binds specifically at that site, this does not preclude the possibility of Q1 accessing this binding site via a different path. In fact, the structures indicate that given the flexibility in the connection between peripheral and membrane arms of the complex, the quinone binding site is likely open to the cytoplasm. This leads the authors themselves to conclude that the structures presented are likely disrupted/uncoupled states in which the energy converting mechanism of the complex is not likely possible.

      To address this concern, we measured the activity of complex I in nanodiscs with the less soluble decylubiquinone (DQ), as well as its inhibition. Small amounts of LMNG were used to increase the DQ solubility. Our results confirmed that E. coli complex I in nanodiscs is active and that the NADH:DQ activity is sensitive to piericidin A (see the modified Figure 1-figure supplement 2). We also remeasured the Q1 activity and its inhibition, which gave lower values than previously because of a flaw in the activity measurements reported in the original submission (qualitatively, the results remained unchanged). Moreover, we observed similar activity results, with somewhat higher values, for E. coli complex I in LMNG (Figure 6-figure supplement 1). These data demonstrate that in the reconstituted complex I quinone analogues can enter the Q-site through the membrane. It is worth noting that, owing to their extremely low solubility, longer quinones, including native ones, are not used for activity measurements in purified preparations.

      Regarding the complex I conformation, we do think our reconstruction represents an uncoupled state that is not able to pump protons (as stated in the title). We have improved the clarity of this point throughout the manuscript, including in the Discussion starting from line 412.

      “The high mobility of the interfacial regions and the relative rotation of the arms disrupts conserved interfacial interactions and exposes Q-cavity to the solvent (Figure 5A). This differentiates E. coli complex I from its structurally characterized homologs in which the Q-cavity is sealed from the solvent. Thus, we interpret the observed conformation as an uncoupled state.”

      And from line 469: “We also observed the relative rotation of the membrane and peripheral arms disrupting the conserved interface and trapping the complex in an uncoupled conformation. Whether this conformation is biologically relevant or is a result of protein purification is to be clarified by further research.”

      4) A weakness of the paper is the building of atomic models into regions of the map which do not contain sufficient detail to warrant atomic models. This is particularly the case for the intact models of complex I as well as the membrane arm focused maps and results in low map-model correlations (0.58-0.71). The models were clearly highly restrained during refinement, resulting in good geometry, as is necessary for low resolution regions. But being able to restrain the geometry is not sufficient for placing atoms into regions where the density is weak or absent. If additional information was used in building/constraining the model, such as the X-ray structure, the regions of the model that are biased towards the X-ray structure model needs to be made clearer. Also, in several places in the membrane arm map residues bulge out of the density (side chain and main chain) leading to possible frame shifts with respect to the match between subsequent residues in the model and the map (see NuoM Ile168 for example).

      A large part of the membrane domain has been solved using X-ray crystallography to a resolution of 3.0 Å, and this structure was used as the starting model for model building; therefore, we do not think there are register shifts in our model. We used standard settings for model refinement in phenix_refine. Our building and refinement procedure was described in fine detail in the original submission, see from line 674:

      “For the membrane domain, the previously obtained E. coli model (PDB ID: 3RKO) was real-space-refined in PHENIX. The missing NuoH subunit was homology-modelled using the T. thermophilus structure (PDB ID: 4HEA) in Coot 0.9. The final model was obtained after several rounds of manual rebuilding and real-space refinement using standard parameters with Ramachandran restraints, secondary-structure restraints applied to the NuoL TMH9-13, without ADP restraints, and with the optimized nonbonded_weight parameter. To generate the model of the complete complex I, the separate peripheral and membrane arm structures were combined and the missing parts at the interface (Table 2) were built manually. As the density of NuoL and NuoM was very poor in all the resolved full conformations, these subunits were subjected to rigid-body refinement in PHENIX, whereas the others were subjected to real-space refinement with minimization_global, local_grid_search, morphing, and ADP refinement. Ramachandran, ADP, and secondary-structure restraints were used. After manual rebuilding in Coot, real-space refinement of the full complex was performed with standard parameters and restraints.”

      To improve clarity, we added the following sentences to the Results section from line 116:

      “Using the resulting maps, atomic models of the peripheral and membrane arms have been built. The entire E. coli complex I was modelled by fitting models of the arms and extending additionally resolved loops and termini. Due to limited resolution, the antiporter-like subunits were refined as rigid bodies.”

      The model has been improved and side chains with absent density were truncated to C position.

      The density from the focused refinement of the membrane fragment is relatively weak, but of sufficient quality to allow building side chains for most of the map. It even visualizes lipid densities (not described in the manuscript). Such weaker densities are common for small membrane proteins. While fully usable for model building, they naturally result in a lower model-map FSC and, consequently, in a lower real-space correlation. In addition, real-space correlation is lower when the map is heterogeneous, and it strongly depends on the way the heterogeneous map has been filtered. Therefore, lower cross correlations do not necessarily mean that the model fit is poor. In our case they reflect the weaker signal-to-noise ratio of the density. Model-map FSCs (Figure 1 figure supplement 4) are more informative than a single number and show that model-map cross correlations remain above 0.5 over the complete resolution range for all models.

      5) A weakness of the paper is that several specific claims are made about the positions of side chains but, when investigated, the density for those side chains is poorly resolved. An example of this is NuoH Lys274, which is in a low-resolution region of the map and although is fit as well as possible must be considered low confidence given the local resolution (nearby residues Phe277 and Phe282 have almost no side chain density for example).

      At lower resolution, the presence of density for a residue strongly depends on its mobility. Well-ordered residues may have well-defined densities while others, even in close proximity, may have poor density. In the case of Lys274, there is clear density for the side chain, its position makes chemical sense, and it is hydrogen-bonded to the backbone oxygen of Gly258. In fact, if examined closely, this is also the only meaningful position for the Lys274 side chain. At the same time, the conformations of Phe277 and Phe282 are not restrained by interactions with other residues in their vicinity, which is likely why their densities are weaker.

      6) A weakness of the paper is that the conformational changes seen between the membrane and peripheral arm of the complex in the different 3D classes are difficult to interpret. It is unclear if they are mechanistically significant or, perhaps more likely given the amount of broken complex observed, due to partial disruption of the complex before it completely breaks apart.

      As we discussed above, the multiple observed conformations are not due to disruption of the complex. It is not very clear what the reviewer means by ‘difficult to interpret’. Many conformations of the peripheral and membrane arms observed for complex I homologues are likely not mechanistically meaningful per se, but rather reflect the overall flexibility of such a large complex. Here, our goal was to describe our structural data as accurately as possible, which resulted in several resolved conformations.

      We do think they all represent the uncoupled complex I, in this respect they do not have different mechanistic meanings. However, they do permit us to understand how the arms move relative to each other and what degree of freedom exists between them.

      7) A strength of the paper is the interesting and original mechanistic proposal put forward by the authors. But a weakness is that it is unclear how this proposal stems from the structural data presented. Also, the arguments presented are difficult to follow in their current form and warrant a more detailed discussion with the requisite thermodynamic treatment. This may warrant a more complete discussion in an appendix or unless the authors can more convincingly show how the data presented in the paper suggests their proposed mechanism perhaps a separate review article. Furthermore, the proposed mechanism, as presented would make a simple prediction that in the absence of NuoM and NuoL (or equivalent subunits in other species) complex I would not pump any net protons. Experiments that are relevant to this prediction have been done in E. coli (NuoL deletion) and Y. lipolytica (nb8m deletion that results in loss of both NuoM and NuoL subunits). See https://pubmed.ncbi.nlm.nih.gov/21417432/ and https://pubmed.ncbi.nlm.nih.gov/21886480/. In both cases the complex is still able to pump protons. The behavior of the NuoL deletion in E. coli is reconcilable with their proposed mechanism as NuoM is still present, however, the case of the nb8m deletion in Y. lipolytica is more difficult to reconcile with their proposed mechanism. The authors would need to address these experiments in order to include their proposed mechanism.

      The description of the mechanism has been modified. It is now outlined very briefly in the main text along with Figure 7, and the more detailed description, including thermodynamic considerations, has been moved to the supplementary text. We have also explained more clearly how the model stems from the experimental data on line 435:

      “The absence of a continuous proton-translocation pathway between the Q-site and subunit NuoN, as well as high flexibility of the peripheral arm interface are not consistent with the recently proposed coupling mechanisms relying on specific movements of the interfacial loops (Cabrera-Orefice et al., 2018; Kampjut and Sazanov, 2020). This led us to ask whether a coupling mechanism consistent with known complex I properties, but without the movements of interfacial loops is conceivable.”

      Furthermore, we state that at this point this is a hypothetical mechanism.

      The supplementary text describing the mechanism in more detail now also includes a discussion of both papers mentioned by the reviewer, from line 1368.

      “Experiments with engineered E. coli complex I lacking subunit NuoL and Y. lipolytica complex I lacking homologs of subunits NuoM and NuoL (Dröse et al., 2011; Steimle et al., 2011) (corresponding to n=2 and 1, respectively) both suggested that the engineered complexes were active and for both constructs stoichiometry was estimated as 2H+/2e-. While NuoL deletion experiments support our model, the NuoL/M deletion clearly contradicts it. Both experiments should be interpreted cautiously, however. Results of NuoL deletion for E. coli complex I were not reproducible (Verkhovskaya and Bloch, 2012). In the case of Y. lipolytica, the homologs of NuoL/M dissociated from the complex along with another 11 subunits upon deletion of supernumerary subunit NB8M located at the tip of NuoL (Zickermann et al., 2015). Since the proton-translocating modules were not deleted per se, the presence of contaminating amounts of assembled complex I in the preparations that generated observed proton pumping cannot be completely excluded. It is important to note that mutation of the conserved ionizable residues on the interface between NuoN and NuoM, i.e. ME144 (Torres-Bacete et al., 2007) or its counter ion NK395 (Amarneh and Vik, 2003), result in a completely inactive complex I suggesting that dissociation of subunits NuoL/M also should render complex I inactive (Verkhovskaya and Bloch, 2012).”

      The main problem with these experiments is that they have never been reproduced by other laboratories and are not completely consistent with the mutagenesis data. Deletion of subunits may also result in distinct pumping behavior of the remaining subcomplex. For example, it was shown for bovine complex I that it can translocate Na+ ions in the deactive state (https://pubmed.ncbi.nlm.nih.gov/22854968/).

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      8) Overall, despite the many strengths of this paper detailed above it is unclear whether the authors achieved their goal of a structure of functional E. coli respiratory complex I reconstituted in lipid nano-discs. It appears that under the current grid preparation conditions that the complex is under excessive stress resulting in partial denaturation and partial-to-complete dissociation. Given the clear biophysical data presented on the intactness of the complex in solution, this disruption likely occurs during grid preparation and further optimization of grid conditions may resolve this issue. With the current maps more work needs to be done to improve the map-to-model correlation and to clearly indicate the regions in the models where this correlation is low.

      The additional reconstruction of complex I solubilized in LMNG helps us to exclude the interaction of the complex with the air-water interface and its reconstitution into lipid nanodiscs as the causes of the relative subunit rotation and the high flexibility between the arms. At this moment, whether the structure represents an artifact of purification or a biologically relevant state remains an open question. However, answering it goes beyond the current study and will require additional research. This is now explained in the Discussion section.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The major limitation of the manuscript lies in the framing and interpretation of the results, and therefore the evaluation of novelty. Authors claim for an important and unique role of beliefs-of-other-pain in altruistic behavior and empathy for pain. The problem is that these experiments mainly show that behaviors sometimes associated with empathy-for-pain can be cognitively modulated by changing prior beliefs. To support the notion that effects are indeed relating to pain processing generally or empathy for pain specifically, a similar manipulation, done for instance on beliefs about the happiness of others, before recording behavioural estimation of other people's happiness, should have been performed. If such a belief-about-something-else-than-pain would have led to similar results, in terms of behavioural outcome and in terms of TPJ and MFG recapitulating the pattern of behavioral responses, we would know that the results reflect changes of beliefs more generally. Only if the results are specific to a pain-empathy task, would there be evidence to associate the results to pain specifically. But even then, it would remain unclear whether the effects truly relate to empathy for pain, or whether they may reflect other routes of processing pain.

      We thank Reviewer #1 for these comments and suggestions regarding the specificity of belief effects on brain activity involved in empathy for pain. Our paper reported 6 behavioral/EEG/fMRI experiments that tested effects of beliefs of others’ pain on empathy and monetary donation (an empathy-related altruistic behavior). We showed not only behavioral but also neuroimaging results that consistently support the hypothesis of the functional role of beliefs of others' pain in modulations of empathy (based on both subjective and objective measures, as clarified in the revision) and altruistic behavior. We agree with Reviewer #1 that it is important to address whether the belief effect is specific to the neural underpinnings of empathy for pain or generalizes to neural responses to other facial expressions, such as happy expressions. To address this issue, we conducted an additional EEG experiment (which could be done within the limited time available in the current situation), as suggested by Reviewer #1. This new EEG experiment tested (1) whether beliefs of authenticity of others’ happiness influence brain responses to perceived happy expressions; (2) whether beliefs of happiness modulate neural responses to happy expressions in the same P2 time window that characterized the effects of beliefs of pain on ERPs.

      Our behavioral results in this experiment (as Supplementary Experiment 1 reported in the revision) showed that the participants reported less feelings of happiness when viewing actors who simulate others' smiling compared to when viewing awardees who smile due to winning awards (see the figure below). Our ERP results in Supplementary Experiment 1 further showed that lack of beliefs of authenticity of others’ happiness (e.g., actors simulate others' happy expressions vs. awardees smile and show happy expressions due to winning an award) reduced the amplitudes of a long-latency positive component (i.e., P570) over the frontal region in response to happy expressions. These findings suggest that (1) there are possibly general belief effects on subjective feelings and brain activities in response to facial expressions; (2) beliefs of others' pain or happiness affect neural responses to facial expressions in different time windows after face onset; (3) modulations of the P2 amplitude by beliefs of pain may not be generalized to belief effects on neural responses to any emotional states of others. We reported the results of this new ERP experiment in the revision as Supplementary Experiment 1 and also discussed the issue of specificity of modulations of empathic neural responses by beliefs of others' pain in the revised Discussion (page 49-50).

      Figure *Supplementary Experiment Figure 1. EEG results of Supplementary Experiment 1. (a) Mean rating scores of happy intensity related to happy and neutral expressions of faces with awardee or actor/actress identities. (b) ERPs to faces with awardee or actor/actress identities at the frontal electrodes. The voltage topography shows the scalp distribution of the P570 amplitude with the maximum over the central/parietal region. (c) Mean differential P570 amplitudes to happy versus neutral expressions of faces with awardee or actor/actress identities. The voltage topographies illustrate the scalp distribution of the P570 difference waves to happy (vs. neutral) expressions of faces with awardee or actor/actress identities, respectively. Shown are group means (large dots), standard deviation (bars), measures of each individual participant (small dots), and distribution (violin shape) in (a) and (c).*

      In the revised Introduction we cited additional literature to explain the concept of empathy, behavioral and neuroimaging measures of empathy, and how, similar to previous research, we studied empathy for others' pain using subjective (self-report) and objective (brain response) estimation of empathy (page 6-7). In particular, we mentioned that subjective estimation of empathy for pain depends on collection of self-reports of others' pain and one's own painful feelings when viewing others' suffering. Objective estimation of empathy for pain relies on recording of brain activities (using fMRI, EEG, etc.) that differentially respond to painful or non-painful stimuli applied to others. fMRI studies revealed greater activations in the ACC, AI, and sensorimotor cortices in response to painful than to non-painful stimuli applied to others. EEG studies showed that event-related potentials (ERPs) in response to perceived stimulation applied to others' body parts differentiated between painful and neutral stimuli over the frontal region as early as 140 ms after stimulus onset (Fan and Han, 2008; see Coll, 2018 for review). Moreover, the mean ERP amplitudes at 140–180 ms predicted subjective reports of others' pain and one's own unpleasantness. Particularly related to the current study, previous research showed that pain compared to neutral expressions increased the amplitude of the frontal P2 component at 128–188 ms after stimulus onset (Sheng and Han, 2012; Sheng et al., 2013; 2016; Han et al., 2016; Li and Han, 2019) and the P2 amplitudes in response to others' pain expressions positively predicted subjective feelings of own unpleasantness induced by others' pain and self-report of one's own empathy traits (e.g., Sheng and Han, 2012).
These brain imaging findings indicate that brain responses to others' pain can (1) differentiate others' painful or non-painful emotional states to support understanding of others' pain and (2) predict subjective feelings of others' pain and one's own unpleasantness induced by others' pain to support sharing of others' painful feelings. These findings provide effective subjective and objective measures of empathy that were used in the current study to investigate neural mechanisms underlying modulation of empathy and altruism by beliefs of others’ pain.

      In addition, we followed Reviewer #1’s suggestion to conduct VPS analyses, which examined specifically how neural activities in the empathy-related regions identified in previous research (Krishnan et al., 2016, eLife) were modulated by beliefs of others’ pain. The results (page 40) provide further evidence for our hypothesis. We also reported new results of RSA analyses (page 39) showing that activities in the brain regions supporting affective sharing (e.g., insula), sensorimotor resonance (e.g., post-central gyrus), and emotion regulation (e.g., lateral frontal cortex) provide intermediate mechanisms underlying modulations of subjective feelings of others' pain intensity due to lack of BOP. We believe that, putting all these results together, our paper provides consistent evidence that empathy and altruistic behavior are modulated by BOP.

      Reviewer #2 (Public Review):

      [...] 1. In laying out their hypotheses, the authors write, "The current work tested the hypothesis that BOP provides a fundamental cognitive basis of empathy and altruistic behavior by modulating brain activity in response to others' pain. Specifically, we tested predictions that weakening BOP inhibits altruistic behavior by decreasing empathy and its underlying brain activity whereas enhancing BOP may produce opposite effects on empathy and altruistic behavior." While I'm a little dubious regarding the enhancement effects (see below), a supporting assumption here seems to be that at baseline, we expect that painful expressions reflect real pain experience. To that end, it might be helpful to ground some of the introduction in what we know about the perception of painful expressions (e.g., how rapidly/automatically is pain detected, do we preferentially attend to pain vs. other emotions, etc.).

      Thanks for this suggestion! We included additional details about previous findings related to the processing of painful expressions in the revised Introduction (page 7-8). Specifically, we introduced fMRI and ERP studies of pain expressions that revealed the brain structures and time course of neural responses to others' pain (vs. neutral) expressions. Moreover, neural responses to others' pain (vs. neutral) expressions were associated with self-report of others' feelings, indicating functional roles of pain-expression-induced brain activities in empathy for pain.

      1. For me, the key takeaway from this manuscript was that our assessment of and response to painful expressions is contextually-sensitive - specifically, to information reflecting whether or not targets are actually in pain. As the authors state it, "Our behavioral and neuroimaging results revealed critical functional roles of BOP in modulations of the perception-emotion-behavior reactivity by showing how BOP predicted and affected empathy/empathic brain activity and monetary donations. Our findings provide evidence that BOP constitutes a fundamental cognitive basis for empathy and altruistic behavior in humans." In other words, pain might be an incredibly socially salient signal, but it's still easily overridden from the top down provided relevant contextual information - you won't empathize with something that isn't there. While I think this hypothesis is well-supported by the data, it's also backed by a pretty healthy literature on contextual influences on pain judgments (including in clinical contexts) that I think the authors might want to consider referencing (here are just a few that come to mind: Craig et al., 2010; Twigg et al., 2015; Nicolardi et al., 2020; Martel et al., 2008; Riva et al., 2015; Hampton et al., 2018; Prkachin & Rocha, 2010; Cui et al., 2016).

      Thanks for this great suggestion! Accordingly, we included an additional paragraph in the revised Discussion regarding how social contexts influence empathy and cited the studies mentioned here (page 46-47).

      1. I had a few questions regarding the stimuli the authors used across these experiments. First, just to confirm, these targets were posing (e.g., not experiencing) pain, correct? Second, the authors refer to counterbalancing assignment of these stimuli to condition within the various experiments. Was target gender balanced across groups in this counterbalancing scheme? (e.g., in Experiment 1, if 8 targets were revealed to be actors/actresses in Round 2, were 4 female and 4 male?) Third, were these stimuli selected at random from a larger set, or based on specific criteria (e.g., normed ratings of intensity, believability, specificity of expression, etc.?) If so, it would be helpful to provide these details for each experiment.

      We are happy to clarify these points. First, photos of faces with pain or neutral expressions were adopted from previous work (Sheng and Han, 2012). The photos were taken from models who were posing but not experiencing pain. These photos were taken and selected based on explicit criteria of painful expressions (i.e., brow lowering, orbit tightening, and raising of the upper lip; Prkachin, 1992). In addition, the models' facial expressions were validated in independent samples of participants (see Sheng and Han, 2012). Second, target gender was also balanced across groups in this counterbalancing scheme. We also analyzed empathy rating scores and monetary donations related to male and female target faces and did not find any significant gender effect (see our response to Point 5 below). Third, because the face stimuli were adopted from previous work and the models' facial expressions were validated in independent samples of participants regarding specificity of expression, pain intensity, etc. (Sheng and Han, 2012), we did not repeat this validation in our participants. Most importantly, we counterbalanced the stimuli in different conditions so that the stimuli in different conditions (e.g., patient vs. actor/actress conditions) were the same across the participants in each experiment. This design excluded any potential confound arising from the stimuli themselves.

      1. The nature of the charitable donation (particularly in Experiment 1) could be clarified. I couldn't tell if the same charity was being referenced in Rounds 1 and 2, and if there were multiple charities in Round 2 (one for the patients and one for the actors).

      Thanks for this comment! Yes, in both Rounds 1 and 2, the participants were informed that the amount of one of their decisions would be selected randomly and donated to one of the patients through the same charity organization (we clarified this in the revised Method section, page 55-56). We made clear in the revision that, after we finished all the experiments of this study, the total amount of the participants' donations was given to a charity organization to help patients who suffer from the same disease.

      1. I'm also having a hard time understanding the authors' prediction that targets revealed to truly be patients in the 2nd round will be associated with enhanced BOP/altruism/etc. (as they state it: "By contrast, reconfirming patient identities enhanced the coupling between perceived pain expressions of faces and the painful emotional states of face owners and thus increased BOP.") They aren't in any additional pain than they were before, and at the outset of the task, there was no reason to believe that they weren't suffering from this painful condition - therefore I don't see why a second mention of their pain status should increase empathy/giving/etc. It seems likely that this is a contrast effect driven by the actor/actress targets. See the Recommendations for the Authors for specific suggestions regarding potential control experiments. (I'll note that the enhancement effect in Experiment 2 seems more sensible - here, the participant learns that treatment was ineffective, which may be painful in and of itself.)

      Thanks for the comments on this important point! Indeed, our results showed that reconfirming patient identities in Experiment 1, or noting the failure of medical treatment related to target faces in Experiment 2, increased rating scores of others' pain and own unpleasantness and prompted more monetary donations to target faces. The increased empathy rating scores and monetary donations might have arisen because repeatedly confirming patient identity or knowing the failure of medical treatment increased the belief of authenticity of targets' pain and thus enhanced empathy. However, repeatedly confirming patient identity or knowing the failure of medical treatment might also activate other emotional responses to target faces, such as pity or helplessness, which might in turn influence altruistic decisions. We agree with Reviewer #2 that, although our subjective estimation of empathy in Exp. 1 and 2 suggested enhanced empathy in the 2nd_round test, there are alternative interpretations of the results and these should be clarified in future work. We clarified these points in the revised Discussion (page 41-42).

      1. I noted that in the Methods for Experiment 3, the authors stated "We recruited only male participants to exclude potential effects of gender difference in empathic neural responses." This approach continues through the rest of the studies. This raises a few questions. Are there gender differences in the first two studies (which recruited both male and female participants)? Moreover, are the authors not concerned about target gender effects? (Since, as far as I can tell, all studies use both male and female targets, which would mean that in Experiments 3 and on, half the targets are same-gender as the participants and the other half are other-gender.) Other work suggests that there are indeed effects of target gender on the recognition of painful expressions (Riva et al., 2011).

      Thanks for raising this interesting question! We reanalyzed the data of Exp. 1 by including participant gender or face gender as an independent variable. The three-way ANOVAs of pain intensity scores and amounts of monetary donations with Face Gender (female vs. male targets) × Test Phase (1st vs. 2nd_round) × Belief Change (patient-identity change vs. patient-identity repetition) did not show any significant three-way interaction (F(1,59) = 0.432 and 0.436, p = 0.514 and 0.512, ηp2 = 0.007 and 0.007, 90% CI = (0, 0.079) and (0, 0.079)), indicating that face gender did not influence the results (see the figure below). Similarly, the three-way ANOVAs with Participant Gender (female vs. male participants) × Test Phase × Belief Change did not show any significant three-way interaction (F(1,58) = 0.121 and 1.586, p = 0.729 and 0.213, ηp2 = 0.002 and 0.027, 90% CI = (0, 0.055) and (0, 0.124)), indicating no reliable difference in empathy and donation between men and women. It seems that the measures of empathy and altruistic behavior in our study were not sensitive to the gender of either the empathy targets or the participants.

      image Figure legend: (a) Scores of pain intensity and amount of monetary donations are reported separately for male and female target faces. (b) Scores of pain intensity and amount of monetary donations are reported separately for male and female participants.
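
      As a quick consistency check on the effect sizes quoted in the response above, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below is illustrative only; the function name `partial_eta_squared` and the row labels are hypothetical, and the F values and degrees of freedom are copied from the text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as quoted in the response (labels are illustrative).
reported = [
    ("Face Gender interaction, 1st value", 0.432, 1, 59),
    ("Face Gender interaction, 2nd value", 0.436, 1, 59),
    ("Participant Gender interaction, 1st value", 0.121, 1, 58),
    ("Participant Gender interaction, 2nd value", 1.586, 1, 58),
]

for label, f_value, df_effect, df_error in reported:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {eta:.3f}")
```

      Rounded to three decimals, these values agree with the ηp2 figures quoted in the response.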

      1. I was a little unclear on the motivation for Experiment 4. The authors state "If BOP rather than other processes was necessary for the modulation of empathic neural responses in Experiment 3, the same manipulation procedure to assign different face identities that do not change BOP should change the P2 amplitudes in response to pain expressions." What "other processes" are they referring to? As far as I could tell, the upshot of this study was just to demonstrate that differences in empathy for pain were not a mere consequence of assignment to social groups (e.g., the groups must have some relevance for pain experience). While the data are clear and as predicted, I'm not sure this was an alternate hypothesis that I would have suggested or that needs disconfirming.

      Thanks for this comment! We apologize for not making the research question of Exp. 4 clear. In the revised Results section (page 27-28) we clarified that the learning and EEG recording procedures in Experiment 3 consisted of multiple processes, including learning, memory, identity recognition, assignment to social groups, etc. The results of Experiment 3 left open the question of whether these processes, even without the BOP changes induced through them, would be sufficient to modulate the P2 amplitude in response to pain (vs. neutral) expressions of faces with different identities. In Experiment 4 we addressed this issue using the same learning and identity recognition procedures as those in Experiment 3, except that the participants in Experiment 4 had to learn and recognize identities of faces of two baseball teams, and there was no prior difference in BOP associated with faces of the two baseball teams. If the processes involved in the learning and recognition procedures, rather than the difference in BOP, were sufficient for modulation of the P2 amplitude in response to pain (vs. neutral) expressions of faces, we would expect similar P2 modulations in Experiments 4 and 3. Otherwise, if the difference in BOP produced during the learning procedure was necessary for the modulation of empathic neural responses, we would not expect modulations of the P2 amplitude in response to pain (vs. neutral) expressions in Experiment 4. We believe that the goal and rationale of Exp. 4 are clear now.

    1. Author Response:

      Reviewer #1 (Public Review):

      My main concern with this work is the absence of formal statistical analyses to support the authors' interpretation. These assertions seem to be based on a visual analysis of the data. In my opinion, formal statistical analyses should be performed. Also, I am not certain the evidence for the predictable ordering of mutation is sufficient.

      In particular, the statements regarding the rate of drug resistance evolution, the proportion of patients with 0-3 drug mutations, and the ordering of mutations do not receive formal statistical analysis, but are important to the interpretation. Indeed, formal statistical analysis does not appear in this manuscript.

      Without these analysis, it does not seem possible at present to assess whether the authors have achieved their aims.

      In response to Reviewer #1’s useful suggestion that we quantify formally the findings reported in Figure 2, we have added several new analyses.

      Reviewer #2 (Public Review):

      I found myself a little disappointed that the authors had stopped short of doing any modelling, especially given their remarks in the introduction about the need to match models to data, and the lack of a framework for understanding how best to recapitulate clinical data.

      We share the desire to add quantitative modeling matched to observations from clinical data.

      We focused on a combination of drugs with well-characterized mutation rates, mutant-selection windows, drug penetrances across multiple compartments, half-life and detailed clinical data (i.e., what is plotted in Fig 1A and Fig 2B and C) - 3TC+D4T+NFV. We extended two existing models of spatial (Moreno-Gamez et al 2015) or temporal (Rosenbloom et al 2012) heterogeneity (via incomplete drug penetrance or adherence, respectively) to account for three drugs and simulated 1500 patients, in whom we examined clinical features (resistance timing, number of mutations and order of mutations) as in our analysis of real viral data. In doing so, we discovered that, for example, while the model of temporal heterogeneity can create sequential and predictable evolution of resistance, under such a model very little resistance evolution emerges after initial virologic suppression, even in patients with moderate or low adherence. This model outcome is inconsistent with the ongoing resistance evolution observed so frequently in individuals with HIV. This finding validates our argument in the initial submission that quantitative models paired with clinical data are necessary to understand the evolution of multi-drug resistance, and motivates new questions about which types of adherence behaviors can allow ongoing resistance to emerge.

      While we still very much believe that a future study should compare patterns across many different types of triple drug therapies, starting with one well-characterized therapy already helps us understand which clinical patterns emerge straightforwardly from simple models and which ones do not, and motivates future thinking about how these models must be extended.

      Finally, I found it hard to pick out the new points being made by the authors from the previous literature, as well as the implications of these new ideas. For example, spatial and temporal differences in drug concentration have been used to explain viral rebounds etc. (which the authors discuss), however, is the central point in this paper that these two models of viral dynamics could also explain the three-fold pattern (as described in the Overview)? Perhaps the motivation could be clarified. I'm sure this will be a case of shortening and clarifying the introduction. (This confusion was compounded somewhat by the lack of quantitative analysis as the point above.)

      While previous studies have certainly explored the role of spatial and temporal heterogeneity in drug levels (brought on by imperfect penetrance or adherence) in permitting the evolution of drug resistance, there are no current studies that we know of that examine multiple carefully parameterized triple-drug therapies that vary in time and space and are compared to multiple facets of clinical data (resistance timing and rates, mutation presence/absence, and mutational ordering). To help make this point clearer, we’ve added a table (Supplemental Table 1) that summarizes the pre-existing modeling literature: what types of therapies are examined (one, two or three drugs), whether or not the models are compared to patient data, and, if so, what type of data. In addition, we have attempted to clarify this point in the introduction.

    1. Author Response:

      We thank the editors and the reviewers for their careful reading and rigorous evaluation of our manuscript. We thank them for their positive comments and constructive feedback, which led us to add further lines of evidence in support of our central hypothesis that intrinsic neuronal resonance could stabilize heterogeneous grid-cell networks through targeted suppression of low-frequency perturbations. In the revised manuscript, we have added a physiologically rooted mechanistic model for intrinsic neuronal resonance, introduced through a slow negative feedback loop. We show that stabilization of patterned neural activity in a heterogeneous continuous attractor network (CAN) model could be achieved with this resonating neuronal model. These new results establish the generality of the stabilizing role of neuronal resonance in a manner independent of how resonance was introduced. More importantly, by specifically manipulating the feedback time constant in the neural dynamics, we establish the critical role of the slow kinetics of the negative feedback loop in stabilizing network function. These results provide additional direct lines of evidence for our hypothesis on the stabilizing role of resonance in the CAN model employed here. Intuitively, we envisage intrinsic neuronal resonance as a specific cellular-scale instance of a negative feedback loop. The negative feedback loop is a well-established network motif that acts as a stabilizing agent and suppresses the impact of internal and external perturbations in engineering applications and biological networks.

      Reviewer #1 (Public Review):

      The authors succeed in conveying a clear and concise description of how intrinsic heterogeneity affects continuous attractor models. The main claim, namely that resonant neurons could stabilize grid-cell patterns in medial entorhinal cortex, is striking.

      We thank the reviewer for their time and effort in evaluating our manuscript, and for their rigorous evaluation and positive comments on our study.

      I am intrigued by the use of a nonlinear filter composed of the product of s with its temporal derivative raised to an exponent. Why this particular choice? Or, to be more specific, would a linear bandpass filter not have served the same purpose?

      Please note that the exponent was merely a mechanism to effectively tune the resonance frequency of the resonating neuron. In the revised manuscript, we have introduced a new physiologically rooted means to introduce intrinsic neuronal resonance, thereby confirming that network stabilization achieved was independent of the formulation employed to achieve resonance.

      The magnitude spectra are subtracted and then normalized by a sum. I have slight misgivings about the normalization, but I am more worried that, as no specific formula is given, some MATLAB function has been used. What bothers me a bit is that, depending on how the spectrogram/periodogram is computed (in particular, averaged over windows), one would naturally expect lower frequency components to be more variable. But this excess variability at low frequencies is a major point in the paper.

      We have now provided the specific formula employed for normalization as equation (16) of the revised manuscript. We have also noted that this was performed to account for potential differences in the maximum value of the homogeneous vs. heterogeneous spectra. The details are provided in the Methods subsection “Quantitative analysis of grid cell temporal activity in the spectral domain” of the revised manuscript. Please note that what is computed is the spectrum of the entire activity pattern, and not a periodogram or a scalogram. There was no tiling of the time-frequency plane involved, so variables associated with such tiling (e.g., averaging over windows) could not have influenced the computation.

      In addition to using variances of normalized differences to quantify spectral distributions, we have also independently employed octave-based analyses (which doesn’t involve normalized differences) to strengthen our claims about the impact of heterogeneities and resonance on different bands of frequency. These octave-based analyses also confirm our conclusions on the impact of heterogeneities and neuronal resonance on low-frequency components.
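
      To make the idea of an octave-based summary concrete, the following is a minimal sketch of how spectral power could be pooled into octave bands from a single whole-signal spectrum, with no windowing or time-frequency tiling. It is not the authors' analysis code: the function names, the sampling rate, the band edges, and the 3 Hz test signal are all assumptions chosen for illustration, and a plain discrete Fourier transform from the standard library is used for self-containment.

```python
import cmath
import math


def magnitude_spectrum(signal, fs):
    """One-sided DFT magnitude of the whole signal (no windowing, no averaging)."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        mags.append(abs(coeff))
    return freqs, mags


def octave_band_power(signal, fs, f_min, n_bands):
    """Sum spectral power in bands [f_min * 2**b, f_min * 2**(b + 1))."""
    freqs, mags = magnitude_spectrum(signal, fs)
    powers = []
    for b in range(n_bands):
        lo, hi = f_min * 2 ** b, f_min * 2 ** (b + 1)
        powers.append(sum(m * m for f, m in zip(freqs, mags) if lo <= f < hi))
    return powers


# Hypothetical test signal: a 3 Hz sinusoid sampled at 64 Hz for 2 s.
fs = 64.0
signal = [math.sin(2 * math.pi * 3.0 * t / fs) for t in range(128)]
bands = octave_band_power(signal, fs, f_min=1.0, n_bands=4)
print(bands)  # power per band: 1-2 Hz, 2-4 Hz, 4-8 Hz, 8-16 Hz
```

      In this toy case the power concentrates in the 2-4 Hz band, which contains the 3 Hz component. Comparing such band powers between two otherwise identical networks would be one way to localize a difference to low-frequency octaves without any normalized difference.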

      Finally, we would like to emphasize that spectral computations are the same for different networks, with networks designed in such a way that there was only one component that was different. For instance, in introducing heterogeneities, all other parameters of the network (the specific trajectory, the seed values, the neural and network parameters, the connectivity, etc.) remained exactly the same with the only difference introduced being confined to the heterogeneities. Computation of the spectral properties followed identical procedures with activity from individual neurons in the two networks, and comparison was with reference to identically placed neurons in the two networks. Together, based on the several routes to quantifying spectral signatures, based on the experimental design involved, and based on the absence of any signal-specific tiling of the time-frequency plane, we argue that the impact of heterogeneities or the resonators on low-frequency components is not an artifact of the analysis procedures.

      We thank the reviewer for raising this issue, as it helped us to elaborate on the analysis procedures employed in our study.

      Which brings me to the main thesis of the manuscript: given the observation of how heterogeneities increase the variability in the low temporal frequency components, the way resonant neurons stabilize grid patterns is by suppressing these same low frequency components.

      I am not entirely convinced that the observed correlation implies causality. The low temporal frequency spectra are an indirect reflection of the regularity or irregularity of the pattern formation on the network, induced by the fact that there is velocity coupling to the input and hence dynamics on the network. Heterogeneities will distort the pattern on the network, that is true, but it isn't clear how introducing a bandpass property in temporal frequency space affects spatial stability causally.

      Put it this way: imagine all neurons were true oscillators, only capable of oscillating at 8 Hz. If they were to synchronize within a bump, one will have the field blinking on and off. Nothing wrong with that, and it might be that such oscillatory pattern formation on the network might be more stable than non-oscillatory pattern formation (perhaps one could even demonstrate this mathematically, for equivalent parameter settings), but this kind of causality is not what is shown in the manuscript.

      The central hypothesis of our study was that intrinsic neuronal resonance could stabilize heterogeneous grid-cell networks through targeted suppression of low-frequency perturbations.

      In the revised manuscript, we present the following lines of evidence in support of this hypothesis (mentioned now in the first paragraph of the discussion section of the revised manuscript):

      1. Neural-circuit heterogeneities destabilized grid-patterned activity generation in a 2D CAN model (Figures 2–3).

      2. Neural-circuit heterogeneities predominantly introduced perturbations in the low-frequency components of neural activity (Figure 4).

      3. Targeted suppression of low-frequency components through phenomenological (Figure 5C) or through mechanistic (new Figure 9D) resonators resulted in stabilization of the heterogeneous CAN models (Figure 8 and new Figure 11). We note that the stabilization was achieved irrespective of the means employed to suppress low-frequency components: an activity-independent suppression of low-frequencies (Figure 5) or an activity-dependent slow negative feedback loop (new Figure 9).

      4. Changing the feedback time constant τm in mechanistic resonators, without changes to neural gain or the feedback strength, allowed us to control the specific range of frequencies that would be suppressed. Our analyses showed that a slow negative feedback loop, which results in targeted suppression of low-frequency components, was essential in stabilizing grid-patterned activity (new Figure 12). As the slow negative feedback loop and the resultant suppression of low frequencies mediate intrinsic resonance, these analyses provide important lines of evidence for the role of targeted suppression of low frequencies in stabilizing grid-patterned activity.

      5. We demonstrate that the incorporation of phenomenological (Figure 13A–C) or mechanistic (new Figure panels 13D–F) resonators specifically suppressed lower frequencies of activity in the 2D CAN model.

      6. Finally, the incorporation of resonance through a negative feedback loop allowed us to link our analyses to the well-established role of network motifs involving negative feedback loops in inducing stability and suppressing external/internal noise in engineering and biological systems. We envisage intrinsic neuronal resonance as a cellular-scale activity-dependent negative feedback mechanism, a specific instance of a well-established network motif that effectuates stability and suppresses perturbations across different networks (Savageau, 1974; Becskei and Serrano, 2000; Thattai and van Oudenaarden, 2001; Austin et al., 2006; Dublanche et al., 2006; Raj and van Oudenaarden, 2008; Lestas et al., 2010; Cheong et al., 2011; Voliotis et al., 2014). A detailed discussion on this important link to the stabilizing role of this network motif, with appropriate references to the literature, is included in the new discussion subsection “Slow negative feedback: Stability, noise suppression, and robustness”.
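
      The general principle behind points 3 to 6, that a slow negative feedback loop selectively attenuates low frequencies and so turns a low-pass unit into a band-pass one, can be illustrated with a generic linear toy model. The sketch below is not the manuscript's resonator and does not use its equations: it assumes a leaky unit V with one feedback variable m (dV/dt = -V/tau - g*m + I, tau_m dm/dt = V - m), and the parameter values and the function name `gain` are hypothetical.

```python
import math


def gain(freq_hz, tau_m_ms, g=0.5, tau_ms=10.0):
    """Steady-state |V/I| at freq_hz for a leaky unit with linear negative feedback.

    Toy model (an assumption, not the manuscript's equations):
        dV/dt         = -V / tau_ms - g * m + I
        tau_m * dm/dt = V - m
    Setting g = 0 removes the feedback loop and leaves a plain leaky integrator.
    """
    omega = 2.0 * math.pi * freq_hz / 1000.0  # angular frequency in rad per ms
    feedback = g / (1.0 + 1j * omega * tau_m_ms)  # feedback as seen by V
    return abs(1.0 / (1j * omega + 1.0 / tau_ms + feedback))


for f in (0.5, 8.0, 100.0):
    with_fb = gain(f, tau_m_ms=75.0)
    without_fb = gain(f, tau_m_ms=75.0, g=0.0)
    print(f"{f:6.1f} Hz  slow feedback: {with_fb:.2f}   no feedback: {without_fb:.2f}")
```

      With the slow feedback (tau_m_ms = 75) the gain at 0.5 Hz is lower than at 8 Hz, giving a band-pass profile, whereas with g = 0 the same unit is a plain low-pass integrator. Shortening tau_m_ms in this toy model extends the suppression to higher frequencies, which is the kind of dependence on the feedback time constant described in point 4.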

      We thank the reviewer for their detailed comments. These comments helped us to introduce a more physiologically rooted mechanistic form of resonance, where we were able to assess the impact of slow kinetics of negative feedback on network stability, thereby providing more direct lines of evidence for our hypothesis. This also allowed us to link resonance to the well-established stability motif: the negative feedback loop. We also note that our analyses don’t employ resonance as a route to introducing oscillations in the network, but as a means for targeted suppression of low-frequency perturbations through a negative feedback loop. Given the strong quantitative links of negative feedback loops to introducing stability and suppressing the impact of perturbations in engineering applications and biological networks, we envisage intrinsic neuronal resonance as a stability-inducing cellular-scale activity-dependent negative feedback mechanism.

      Reviewer #2 (Public Review):

      [...] The pars construens demonstrates that similar networks, but comprised of units with different dynamical behavior, essentially amputated of their slowest components, do not suffer from the heterogeneities - they still produce grids. This part proceeds through 3 main steps: a) defining "resonator" units as model neurons with amputated low frequencies (Fig. 5); b) showing that inserted into the same homogeneous CAN network, "resonator" units produce the same grids as "integrator" units (Figs. 6,7); c) demonstrating that however the network with "resonator" units is resistant to heterogeneities (Fig. 8). Figs. 9 and 10 help understand what has produced the desired grid stabilization effect. This second part is on the whole also well structured, and its step c) is particularly convincing.

      We thank the reviewer for their time and effort in evaluating our manuscript, and for their rigorous evaluation and positive comments on our study.

      Step b) intends to show that nothing important changes, in grid pattern terms, if one replaces the standard firing rate units with the ad hoc defined units without low frequency behavior. The exact outcome of the manipulation is somewhat complex, as shown in Figs. 6 and 7, but it could be conceivably summed up by stating that grids remain stable, when low frequencies are removed. What is missing, however, is an exploration of whether the newly defined units, the "resonators", could produce grid patterns on their own, without the CAN arising from the interactions between units, just as a single-unit effect. I bet they could, because that is what happens in the adaptation model for the emergence of the grid pattern, which we have studied extensively over the years. Maybe with some changes here and there, but I believe the CAN can be disposed of entirely, except to produce a common alignment between units, as we have shown.

      Step a), finally, is the part of the study that I find certainly not wrong, but somewhat misleading. Not wrong, because what units to use in a model, and what to call them, is a legitimate arbitrary choice of the modelers. Somewhat misleading, because the term "resonator" evokes a more specific dynamical behavior than that obtained by inserting Eqs. (8)-(9) into Eq. (6), which amounts to a brute force amputation of the low frequencies, without any real resonance to speak of. Unsurprisingly, Fig. 5, which is very clear and useful, does not show any resonance, but just a smooth, broad band-pass behavior, which is, I stress legitimately, put there by hand. A very similar broad band-pass would result from incorporating into individual units a model of firing rate adaptation, which is why I believe the "resonator" units in this study would generate grid patterns, in principle, without any CAN.

      We thank the reviewer for these constructive comments and questions, as they were extremely helpful in (i) formulating a new model for rate-based resonating neurons that is more physiologically rooted; (ii) demonstrating the stabilizing role of resonance irrespective of model choices that implemented resonance; and (iii) mechanistically exploring the impact of targeted suppression of low frequency components in neural activity. We answer these comments of the reviewer in two parts, the first addressing other models for grid-patterned activity generation and the second addressing the reviewer’s comment on “brute force amputation of the low frequencies” in the resonator neuron presented in the previous version of our manuscript.

      I. Other models for grid-patterned activity generation.

      In the adaptation model (Kropff and Treves, 2008; Urdapilleta et al., 2017; Stella et al., 2020), adaptation in conjunction with place-cell inputs, Hebbian synaptic plasticity, and intrinsic plasticity (in gain and threshold) to implement competition are together sufficient for the emergence of the grid-patterned neural activity. However, the CAN model that we chose as the substrate for assessing the impact of neural circuit heterogeneities on functional stability is not equipped with the additional components (place-cell inputs, synaptic/intrinsic plasticity). Therefore, we note that decoupling the single unit (resonator or integrator) from the network does not yield grid-patterned activity.

      However, we do agree that a resonator neuron endowed with additional components from the adaptation model would be sufficient to elicit grid-patterned neural activity. This is especially clear with the newly introduced mechanistic model for resonance through a slow feedback loop (Figure 9). Specifically, resonating conductances such as HCN and M-type potassium channels can effectuate spike-frequency adaptation. One of the prominent channels implicated in introducing adaptation, the calcium-activated potassium channel, implements a slow activity-dependent negative feedback loop through the slow calcium kinetics. Neural activity drives calcium influx, and the slow kinetics of the calcium along with the channel-activation kinetics drive a potassium current that completes a negative feedback loop that inhibits neural activity. Consistently, one of the earliest-reported forms of electrical resonance in cochlear hair cells was shown to be mediated by calcium-activated potassium channels (Crawford and Fettiplace, 1978, 1981; Fettiplace and Fuchs, 1999). Thus, adaptation realized as a slow negative-feedback loop, in conjunction with place-cell inputs and intrinsic/synaptic plasticity, would elicit grid-patterned neural activity as demonstrated earlier (Kropff and Treves, 2008; Urdapilleta et al., 2017; Stella et al., 2020).

      There are several models for the emergence of grid-patterned activity, and resonance plays distinct roles (compared to the role proposed through our analyses) in some of these models (Giocomo et al., 2007; Kropff and Treves, 2008; Burak and Fiete, 2009; Burgess and O'Keefe, 2011; Giocomo et al., 2011b; Giocomo et al., 2011a; Navratilova et al., 2012; Pastoll et al., 2012; Couey et al., 2013; Domnisoru et al., 2013; Schmidt-Hieber and Hausser, 2013; Yoon et al., 2013; Schmidt-Hieber et al., 2017; Urdapilleta et al., 2017; Stella et al., 2020; Tukker et al., 2021). However, a common caveat that spans many of these models is that they assume homogeneous networks that do not account for the ubiquitous heterogeneities that span neural circuits. Our goal in this study was to take a step towards rectifying this caveat, towards understanding the impact of neural circuit heterogeneities on network stability. We chose the 2D CAN model for grid-patterned activity generation as the substrate for addressing this important yet under-explored question on the role of biological heterogeneities in network function. As we have mentioned in the discussion section, this choice implies that our conclusions are limited to the 2D CAN model for grid-patterned activity generation; these conclusions cannot be extrapolated to other networks or other models for grid-patterned activity generation without detailed analyses of the impact of neural circuit heterogeneities in those models. As our focus here was on the stabilizing role of resonance in heterogeneous neural networks, with the 2D CAN model as the substrate, we have not implemented the other models for grid-patterned activity generation. The impact of biological heterogeneities and resonance on each of these models should be independently addressed with systematic analyses similar to our analyses for the 2D CAN model. As different models for grid-patterned activity generation are endowed with disparate dynamics, and have different roles for resonance, it is conceivable that biological heterogeneities and intrinsic neuronal resonance have differential impacts on these different models. We have mentioned this as a clear limitation of our analyses in the discussion section, also presenting future directions for associated analyses (subsection: “Future directions and considerations in model interpretation”).

      II. Brute force amputation of the low frequencies in the resonator model.

      We completely agree with the reviewer on the observation that the resonator model employed in the previous version of our manuscript was rather artificial, with the realization involving brute force amputation of the lower frequencies. To address this concern, in the revised manuscript, we constructed a new mechanistic model for single-neuron resonance that matches the dynamical behavior of physiological resonators. Specifically, we noted that physiological resonance is elicited by a slow activity-dependent negative feedback (Hutcheon and Yarom, 2000). To incorporate resonance into our rate-based model neurons, we mimicked this by introducing a slow negative feedback loop into our single-neuron dynamics (the motivations are elaborated in the new results subsection “Mechanistic model of neuronal intrinsic resonance: Incorporating a slow activity-dependent negative feedback loop”). The single-neuron dynamics of mechanistic resonators were defined as follows:

      [Equation image not reproduced here.]

      Here, S governed neuronal activity, m defined the feedback state variable, τ represented the integration time constant, Ie was the external current, and g represented feedback strength. The slow kinetics of the negative feedback was controlled by the feedback time constant (τm). In order to manifest resonance, τm > τ (Hutcheon and Yarom, 2000). The steady-state feedback kernel (m∞) of the negative feedback is sigmoidally dependent on the output of the neuron (S), defined by two parameters: half-maximal activity (S1/2) and slope (k). The single-neuron dynamics are elaborated in detail in the methods section (new subsection: Mechanistic model for introducing intrinsic resonance in rate-based neurons).
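
      To illustrate why a slow negative feedback loop yields resonance, the following is a minimal sketch and not the model used in the manuscript. It assumes a generic linearized two-variable form consistent with the definitions above (τ dS/dt = −S − g·m + Ie and τm dm/dt = a·S − m, where a is the local slope of m∞ at the operating point) and evaluates the small-signal gain from Ie to S across frequencies. The function names, the value τ = 10 ms, and the loop gain g·a = 2 are illustrative assumptions.

```python
import cmath
import math


def gain(freq_hz, tau, tau_m, loop_gain):
    """Small-signal gain |S/Ie| of the assumed linearized feedback model.

    Assumed dynamics (illustrative, not taken from the manuscript):
        tau   * dS/dt = -S - g*m + Ie
        tau_m * dm/dt = a*S - m
    with loop_gain = g * a. Time constants are in seconds.
    """
    w = 2.0 * math.pi * freq_hz
    numerator = 1.0 + 1j * w * tau_m
    denominator = (1.0 + 1j * w * tau) * (1.0 + 1j * w * tau_m) + loop_gain
    return abs(numerator / denominator)


def resonance_frequency(tau, tau_m, loop_gain, freqs):
    """Return the frequency (Hz) in freqs at which the gain is largest."""
    return max(freqs, key=lambda f: gain(f, tau, tau_m, loop_gain))


# Frequency grid from 0 to 50 Hz in 0.1 Hz steps.
freqs = [0.1 * i for i in range(501)]

# Slow feedback (tau_m = 75 ms > tau = 10 ms): band-pass, peak above 0 Hz.
slow_peak = resonance_frequency(0.010, 0.075, 2.0, freqs)

# Fast feedback (tau_m = 1 ms < tau): low-pass, largest gain at 0 Hz.
fast_peak = resonance_frequency(0.010, 0.001, 2.0, freqs)

print("peak frequency with slow feedback (Hz):", slow_peak)
print("peak frequency with fast feedback (Hz):", fast_peak)
```

      Under these assumptions, the feedback lowers the gain at 0 Hz to 1/(1 + g·a) whether it is fast or slow, but only slow feedback leaves a band of intermediate frequencies less attenuated, which is what produces the resonance peak.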

      We first demonstrate that the introduction of a slow negative feedback loop introduces resonance into single-neuron dynamics (new Figure 9D–E). We performed systematic sensitivity analyses associated with the parameters of the feedback loop and characterized the dependencies of intrinsic neuronal resonance on model parameters (new Figure 9F–I). We demonstrate that the incorporation of resonance through a negative feedback loop was able to generate grid-patterned activity in the 2D CAN model employed here, with clear dependencies on model parameters (new Figure 10; new Figure 10-Supplements 1–2). Next, we incorporated heterogeneities into the network and demonstrated that the introduction of resonance through a negative feedback loop stabilized grid-patterned activity generation in the heterogeneous 2D CAN model (new Figure 11).

      The mechanistic route to introducing resonance allowed us to probe the basis for the stabilization of grid-patterned activity more thoroughly. Specifically, with physiological resonators, resonance manifests only when the feedback loop is slow (new Figure 9I; Hutcheon and Yarom, 2000). This gave us an additional mechanistic handle to directly probe the role of resonance in stabilizing the grid-patterned activity. We assessed the emergence of grid-patterned activity in heterogeneous CAN models constructed with neurons with different τm values (new Figure 12). Strikingly, we found that when the τm value was small (resulting in fast feedback loops), there was no stabilization of grid-patterned activity in the CAN model, especially with the highest degree of heterogeneities (new Figure 12). With a progressive increase in τm, the patterns stabilized, with grid score increasing at τm=25 ms (new Figure 12) and beyond (new Figure 11B; τm=75 ms). Finally, our spectral analyses comparing frequency components of homogeneous vs. heterogeneous resonator networks (new Figure panels 13D–F) showed the suppression of low-frequency perturbations in heterogeneous CAN networks.

      We gratefully thank the reviewer for raising the issue with the phenomenological resonator model. This allowed us to design the new resonator model and provide several new lines of evidence in support of our central hypothesis. The incorporation of resonance through a negative feedback loop also allowed us to link our analyses to the well-established role of network motifs involving negative feedback loops in inducing stability and suppressing external/internal noise in engineering and biological systems. We envisage intrinsic neuronal resonance as a cellular-scale activity-dependent negative feedback mechanism, a specific instance of a well-established network motif that effectuates stability and suppresses perturbations across different networks (Savageau, 1974; Becskei and Serrano, 2000; Thattai and van Oudenaarden, 2001; Austin et al., 2006; Dublanche et al., 2006; Raj and van Oudenaarden, 2008; Lestas et al., 2010; Cheong et al., 2011; Voliotis et al., 2014). A detailed discussion on this important link to the stabilizing role of this network motif, with appropriate references to the literature is included in the new discussion subsection “Slow negative feedback: Stability, noise suppression, and robustness”.

    1. Author Response:

      Reviewer #1 (Public Review):

      I think that it is important for the authors to consider that for most (if not all) SARS-CoV-2 variants, increased transmissibility of the virus has not been directly demonstrated. While it is clear that numerous variants have emerged and will continue to emerge, the rapid upsurge of cases with a variant may be related to many factors (e.g. host susceptibility due to immunity or genetic factors, virus seeding events, predominant replication in particular age cohorts, ...) that cannot simply be captured as "transmissibility of the virus". Even for B.1.1.7 and D614G mutants, the direct evidence of increased transmissibility in humans is extremely limited if available at all. Most studies erroneously simply take the increasing occurrence of particular lineages or mutations in sequence databases as a measure of increased "transmissibility", which should be avoided, also in the present manuscript. Increased transmissibility can only be derived from field studies where transmission is measured directly.

      We thank the reviewer for pointing out that this is a controversial area. We have adjusted the text throughout to accommodate the fact that the published evidence of increased transmissibility/infectivity is not definitive.

      On several occasions in the manuscript (e.g. page 3, page 4 L58-59, page 9 in submitted version), the authors seem to suggest that changes that lead to increased "transmission" or binding affinity and changes that lead to immune escape are mutually exclusive. But the opposite might be true. Viruses may escape from antibody-mediated immunity by amino acid substitutions in linear or structural antibody-binding epitopes. However, viruses may also escape from antibody-mediated immunity through altered protein density on virion surfaces (e.g. less Spike) and/or altered affinity, making it harder for antibody to inhibit virus attachment. As an example, increased affinity may facilitate virus replication with less dense Spike protein, allowing more effective antibody escape. Lower affinity but more dense coverage of Spike may reduce accessibility of critical virus parts by antibodies. Several viruses are known to escape from antibody-mediated neutralization through changes in affinity/avidity.

      We agree with this point and have modified the text to avoid implying that increased transmissibility and antibody escape are mutually exclusive.

      In relation to the previous point, it is important that authors mention some limitations of the present work in the discussion. SARS-CoV-2 virion attachment to cells is not just a matter of spike protein binding and certainly not of a monomeric RBD. Escape from antibodies and effects on affinity are heavily influenced by the entire (trimeric) spike protein, including its N-terminal domains. Such components are not taken into account in the present experimental designs, and this should be discussed, as e.g. the NTD can be important in attachment and antibody-mediated neutralization.

      We thank the reviewer for this suggestion. We have added an appropriate caveat to the Discussion.

      The authors suggest that the pandemic virus as it spread across the globe initially did not have "optimized" affinity. However, in the first months of the pandemic, there was relatively limited variation in spike protein sequences. The major variants emerged only later and mostly in areas where population immunity was building up. Again, this begs the question whether natural selection is occurring as a consequence of receptor affinity or immune escape?

      We thank the reviewer for making this point. However, we do not think it is that surprising that it took a few months for the first Spike variants to be detected, for the following reasons. Firstly, the number of infections would have been relatively low early in the pandemic and SARS-CoV-2 replicates with a comparatively low error rate for an RNA virus. Secondly, the introduction of strict non-pharmacological measures (social distancing, etc.), which would have increased the selective pressure on the virus, was somewhat delayed. Thirdly, it would take some time for any variant that emerged by chance to expand sufficiently to be detected by sequencing. While there is evidence suggestive of broader immunity in populations where the Beta and Gamma variants emerged, which we cite, we are not aware of evidence of widespread immunity in populations where the D614G, S477N and Alpha variants first emerged.

      Reviewer #2 (Public Review):

      Barton and colleagues investigated the effect of common SARS-CoV-2 RBD mutations and two ACE2 mutations on the RBD/ACE2 interaction. They concluded that the N501Y, E484K and S477N increased receptor binding while the K417N/T had the opposite effect. Double and triple mutants were also included. The ACE2 mutations (that are rare in the human population) also increased binding to most RBD mutants. The study is well-performed and written clearly.

      The primary conclusions of the manuscript were supported by the results. However, the interpretation was too speculative. In the abstract (lines 14-17), the authors suggest that the 501 and 477 mutations enhance transmission solely based on data on the RBD-ACE2 interaction. It is unknown whether increased affinity to ACE2 is beneficial for transmission. In addition, increased RBD affinity to ACE2 does not mean that the whole spike or virus particle also binds stronger to ACE2. Lastly, increasing ACE2 affinity does not necessarily increase binding to cells (for example S1A binding to sugars or spike abundance can also influence this).

      We agree that it would be inappropriate to assume, based on our affinity/kinetic studies alone, that 501 and 477 enhance transmission. That is why the relevant sentence in the abstract starts with the phrase, “Taken together with other studies”. We summarise the evidence from these other studies in the Discussion. We acknowledge that we have not examined the effects of the mutations on binding of the whole Spike protein to ACE2 or of viruses to cells, and have added suitable caveats to the Discussion.

      The overall impact on the field will be limited as there is substantial overlap with already published studies. The observation that the N501Y and E484K increase receptor binding while the K417N/T mutations decrease binding was already made prior by Laffeber et al (2021; J Mol Biol). Laffeber et al also investigated double and triple mutants and came to similar conclusions. Liu et al (2021) confirmed that the N501Y increases binding whereas the K417N/T have opposing effects (Liu et al., 2021 mAbs). The observation that the Y501N increases ACE2 affinity has been made by several groups (e.g. Liu et al 2021 Cell research; Starr et al 2020 Cell).

      We thank the reviewer for highlighting these additional studies, two of which are very recent. We have now cited these studies.

      Starr et al. (2020) was a high-throughput study in which the affinity measurements were semi-quantitative, and no kinetic analysis was performed. The studies by Liu et al. (2021) and Laffeber et al. (2021) were performed at 25 °C and without rigorous controls for mass transport and protein aggregation. Liu et al. (2021) did not report kinetic measurements. Their results are broadly consistent with ours, but their affinity and kinetic measurements differ by ~10-fold. While we accept that some of the measurements of the effects of mutations have been made before, our measurements of affinity and especially kinetics are performed more rigorously than in previous studies and, for the first time, at a physiological temperature (37 °C). Thus, the affinity and kinetic data that we have obtained for single and combination variants are more definitive. As noted in our Discussion, there is a wide variation in reported binding affinities and kinetics in previously published studies. We think the comprehensive data that we report here, obtained using the same robust method to measure binding properties of all these variants, add significant value.

      Reviewer #3 (Public Review):

      [...] 1) The ACE2 receptor exists naturally as a dimeric form and the RBD is a component of the SARS-CoV-2 spike trimer. The assay format here was monomeric RBD binding against monomeric ACE2 throughout this study. While the measurements are indeed carefully executed and under more physiological conditions than many other reported studies, the authors should discuss potential avidity effects, the consequences of mutations on the accessibility of the RBD in VOC versus wildtype, and impact of other domains such as the NTD, in the context of their monomeric ACE2 measurements with isolated RBD here.

      We thank the reviewer for raising this issue. We have added a section to the Discussion addressing these important points.

      2) As shown in Figure S2, RBD WT, K417N, K417T, KN/EK, KT/EK, and S477N, the ~30kDa monomeric proteins were flanked by additional ~60kDa bands (which correspond to the smaller peaks to the left of the main peaks) some of which bleed through to the main fraction to different extents, whereas RBDs SA, UK1, UK2, BR, and E484K, do not seem to have as much or any of these extra species. Can the authors comment on whether these contaminants are RBD-dimers as observed before (Dai et al. 2020)? If yes, would such dimers affect the affinity and kinetics?

      We thank the reviewer for pointing out these larger ~60 kDa bands in some RBD preps. We think that it is unlikely that these are RBD dimers, as these are reducing gels. The strictly monophasic kinetics of all RBD preps also argues against this being an RBD dimer. We have confirmed by densitometry that the larger band comprises less than 5% of the protein in all the preparations. This will have only a minor effect on estimates of RBD concentration. We have added this information to the Figure S2 legend.

    1. Author Response (July 26, 2021):

      Reviewer #1 (Public Review):

      The authors have done a great job in carefully labeling the β-catenin with fluorescent protein SGFP2 and quantitatively measuring the β-catenin behavior during Wnt pathway activation with advanced biophysical methods. This is an excellent effort on quantitative biological studies. The knock-in constructs and the cell lines the authors made are great resources for the Wnt field. And the quantifications, like the β-catenin concentration and the β-catenin diffusion coefficient, are great knowledge for future studies. The finding that the S45F mutation leads to a higher fraction of the slow-moving complexes is interesting. Other areas could borrow the research ideas and methods used in this manuscript. My primary concern is the difficulty of interpreting some of the quantitative results in the biological context. The authors have concluded that β-catenin has two major populations: free population and slow-diffusing complexed population. The authors have concluded with FCS that the diffusion coefficient of free β-catenin is 14.9 μm²/s (line 259) and that of the complexed β-catenin is 0.17 μm²/s (line 327). Similar to the authors' argument in the manuscript, this difference means about a 100-fold change of the complex length scale. If the complex is linear, this means a 100-fold change in molecule size, but if the complex is spherical, this means a one-million-fold increase of the molecule size.

      We thank the reviewer for their positive, yet consistently critical assessment of our work. We share their view that the interpretation of these quantitative results in the biological context remains challenging.

      To clarify the specific point raised by the reviewer: The diffusion coefficient of the cytoplasmic CTNNB1 complex is indeed 14.9/0.17 = 87-fold slower than that of the free monomeric CTNNB1. And this would indeed be indicative of an 87³-fold change in molecular size if we assume the Einstein-Stokes relation. However, the Einstein-Stokes equation is only valid when specific conditions are met (including the assumption that we are dealing with perfectly spherical particles in a homogeneous environment). Therefore, we already noted the following in the material and methods section, lines 1104-1112: "It must be noted that, especially for larger protein complexes, the linearity between the radius of the protein and the speed is not ensured, if the shape is not globular, and due to other factors such as molecular crowding in the cell and hindrance from the cytoskeletal network. We therefore did not estimate the exact size of the measured CTNNB1 complexes, but rather compared them to measurements from other FCS studies.”

      In our initial submission we included the following statement: “Because a 3.5-fold change in speed would result in 3.5³-change in size for a spherical particle (assuming Einstein-Stokes, see equation 7 in the material and methods section for details), this indicates that the size of the cytoplasmic CTNNB1 complex drastically changes when the WNT pathway is activated.” We chose to do so because, in our prior submission of this work (28-05-2020-RA-BX-eLife-59433), Reviewer 1 remarked that “CTNNB1 resides in slow moving complexes that persist upon Wnt but become slightly more mobile”, which we felt was an underappreciation of the significance of this change. We suspect and can understand that Reviewer 1 then applied this logic directly to the speed of the slower SGFP2-CTNNB1 fraction: “Similar to the authors' argument in the manuscript, this difference means about a 100-fold change of the complex length scale. If the complex is linear, this means a 100-fold change in molecule size, but if the complex is spherical, this means a one-million-fold increase of the molecule size.” However, it was never our intention to suggest that we can calculate absolute complex sizes from these diffusion speeds, due to the constraints of the Einstein-Stokes formula explained above. For clarification, the 100-fold change is not necessarily in length scale – just in diffusion time, which correlates to the radius of a spherical particle. Indeed, a destruction complex with 100³-fold larger mass or volume than a monomeric CTNNB1 is unlikely, if not impossible. On the other hand, the diffusion coefficients we observe for the cytoplasmic CTNNB1 complex are equally unlikely to represent a ‘free and monomeric’ version of the destruction complex, as the combined weight of the partners (APC, AXIN, CSNK1A1, GSK3 and SGFP2-CTNNB1, ~600 kDa) would be expected to be only ~1.75x slower (assuming Einstein-Stokes).

      Overall, the most important take-home message is not an absolute size estimate of the CTNNB1 complex, but the fact that a larger cytoplasmic CTNNB1 complex is still present after WNT stimulation, although it does undergo a substantial reduction in size (reflected by a 3.5-fold increase in speed upon WNT stimulation), and thus changes its identity. We have modified the statement regarding the relation between diffusion speed and size in the current manuscript, to avoid further confusion on this point.

      Line 348-352: Because changes in diffusion coefficient are typically indicative of larger changes in protein size (i.e. molecular weight, see materials and methods section for details), this indicates that the size of the cytoplasmic CTNNB1 complex drastically changes when the WNT pathway is activated.

      To the materials and methods, we have added the following. Line 1110-1112: However, it is likely that the 3.5-fold change in the second diffusion coefficient of SGFP2-CTNNB1 in response to WNT3A treatment is indicative of a larger than 3.5-fold change to complex size.
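
      The cube scaling discussed above can be sketched numerically as follows. This is an illustration only, assuming ideal spherical particles in a homogeneous medium (the Einstein-Stokes conditions that, as noted, are unlikely to hold in cells). Under those assumptions the diffusion coefficient is inversely proportional to the radius, and volume or mass scales with the cube of the radius. The function names are hypothetical.

```python
def size_ratio_from_diffusion(d_fast, d_slow):
    """Radius and volume ratios implied by two diffusion coefficients.

    Assumes ideal spheres (Einstein-Stokes): D is proportional to 1/radius,
    and volume (or mass at constant density) to radius cubed.
    """
    radius_ratio = d_fast / d_slow
    volume_ratio = radius_ratio ** 3
    return radius_ratio, volume_ratio


def slowdown_from_mass_ratio(mass_ratio):
    """Fold reduction in D expected when mass grows by mass_ratio (ideal spheres)."""
    return mass_ratio ** (1.0 / 3.0)


# Free (14.9 um^2/s) versus complexed (0.17 um^2/s) coefficients quoted above.
radius_ratio, volume_ratio = size_ratio_from_diffusion(14.9, 0.17)
print("radius ratio:", round(radius_ratio, 1))
print("implied volume ratio:", round(volume_ratio))

# A 3.5-fold change in diffusion speed implies a 3.5**3-fold change in volume.
print("volume change for a 3.5-fold speed change:", 3.5 ** 3)
```

      Read in the other direction, the cube root makes large mass differences appear as small changes in speed, which is why a modest fold change in the diffusion coefficient can correspond to a much larger change in complex size.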

      Furthermore, in the next section, with the N&B method, the authors have suggested that “few, if any, of these complexes contain multiple SGFP2-CTNNB1 molecules” (line 366). When combining the two parts of information, it is hard to imagine a complex that contains one thousand to one million molecules only have one or a few β-catenin subunits. From the biology point of view, APC is the backbone of the destruction complex, which has several β-catenin binding sites by itself. Additionally, APC also contains several Axin1 binding sites where each Axin1 can also recruit one β-catenin. It is unlikely that one APC complex contains only one β-catenin, not mentioning the potential oligomerization of APC. The conclusion that most of the β-catenin containing complexes has only one β-catenin could either be real or due to the misinterpretation of experimental data.

      As elaborated above, it is unlikely that the complex really is 100³-fold larger than a single free SGFP2-CTNNB1 molecule. At the same time, the diffusion coefficient of the slow fraction of SGFP2-CTNNB1 is still indicative of a very large complex, similar to the 26S proteasome, as discussed in the main text (currently line 344-346: This is indicative of very large complexes containing SGFP2-CTNNB1 that move with diffusion kinetics comparable to those previously observed for the 26S proteasome (Pack et al., 2014).). We were, therefore, equally surprised by the findings from our N&B analysis, which is why we extensively discuss possible explanations in our manuscript. Future follow-up by ourselves and others will reveal how far our interpretation of these measurements stands the test of time.

      Reviewer #3 (Public Review):

      Wnt signaling plays critical roles in cell fate determination in essentially every tissue in all animals, regulates tissue homeostasis in many adult tissues, and is inappropriately activated in many human cancers. It has been the focus of research for decades, and we have an outline of signal transduction. However, remarkably, key questions remain controversial. Central among these are questions about the nature of the negative regulatory destruction complex, its mechanism of action and how it is turned down by Wnt signaling. Here Saskia and colleagues take a novel and very exciting approach to these questions, combining innovative quantitative live-cell imaging and computational modelling.

      What I can say unequivocally is that there is data in this manuscript that will force a re-evaluation of our current models of Wnt signaling, and also serve as the foundation for future research. Particularly notable are: 1) precise measurements of the concentrations of beta-catenin in the cytoplasm and nucleus before and after Wnt signaling and after inhibition of GSK3; 2) definition of a high MW complex, likely the destruction complex, whose assembly state appears to be regulated by Wnt signaling; and 3) intriguing evidence that at steady state this complex appears not to contain multiple copies of beta-catenin. These data are exceptionally interesting and timely, as controversy continues about the size/assembly state of the destruction complex.

      We are happy that reviewer 3 evaluates our work so favorably. We are looking forward to contributing to further re-evaluation of the current models of WNT signaling in the future, as well as witnessing further probing and validation of our data by others.

    1. Author Response:

      Reviewer #2 (Public Review):

      [...] The key analyses focus on a distinction between decision and confidence encoding in the EEG data. The main approach here was to identify trials where computational model and behavioural data diverged: cases where behaviour (either choice or confidence, or both) either matches or mismatches the model on individual trials. By applying decoding techniques the authors were able to identify neural correlates of these suboptimalities. One concern here is that if the behavioural data deviates from a noise-free ideal-observer model, it's not clear what neural correlates of these deviations mean. One interpretation could be that they indicate subjects are using a different model, in which case identifying neural correlates of deviations is less informative. Another interpretation is that they are deviating from the assumed model on a fraction of trials, but if this is the case, analysing these deviations will not be able to identify neural correlates of the latent variables of the (otherwise well-functioning) model. In other words, it is not clear whether these analyses are identifying latent states tracking noise in a confidence representation (confidence in confidence?), latent states underpinning (psychological) confidence, or something else.

      We thank the reviewer for this comment, which shows us where we missed some important details in the previous manuscript. To clarify, we do not compare computational model predictions to behaviour; we compare behaviour to the optimal observer, who perfectly encodes the presented orientations and perfectly estimates the decision evidence so as to maximise the probability of making a correct perceptual decision. We define this better now in the first part of the Results (P4, L135).

      The computational model estimates the internal evidence the observer is using to make their decisions, which differs from the optimal evidence. We assume that when the observer makes a response that is different from that predicted by the optimal presented evidence, their internal evidence is more different from optimal than when they make a response that is in agreement with the optimal response. Given that observers’ responses are well predicted by the optimal evidence (further details in response to E3), we can say that the internal evidence the observer is using is some form of approximation of the optimal evidence. It may be that the observer is using an approximation that is different from the one specified by our computational model (although extensive analysis in previous research has suggested this model is a good approximation; Drugowitsch et al., 2016). In this case, responses that differ from the optimal response would still on average be due to evidence that is more different from optimal than the evidence that leads to responses that match the optimal response.

      Our neural analysis aims at identifying whether the neural representation differs from the optimal evidence on trials where the response also differs from the optimal response. The clusters of neural signals we isolate are those where deviations from optimal in the neural representation predict whether the observer will make an optimal response. In other words, the isolated clusters of neural signals follow the observers’ internal evidence L* more closely than the optimal presented evidence L. We make this clearer now on P10, L302 and on P12, L352.

      This analysis was based on the trial-by-trial level assumption that the internal representation used to accumulate evidence (also known as the ‘decision variable’ in previous work) is further from optimal on trials where the observer does not give the optimal response. Because this evidence is accumulated over several samples, it will deviate more or less from optimal across samples (and trials). We therefore estimate the sample-wise ‘error’ (i.e., difference from the optimal evidence) associated with: 1. the neural representation, and 2. the computational model of behaviour, and we test whether there exists a significant correlation between these two neural and behavioural errors. We explain this section in further detail on P12, L382.
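
      The sample-wise comparison described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names, the three input sequences (`optimal`, `neural`, `model`), and the choice of a Pearson correlation are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def error_correlation(optimal, neural, model):
    """Correlate neural and behavioural-model deviations from optimal evidence.

    Each argument holds one value per evidence sample (hypothetical inputs):
    optimal -- evidence an optimal observer would extract from the stimulus
    neural  -- evidence decoded from the neural signal
    model   -- internal evidence estimated by the behavioural model
    """
    neural_error = [n - o for n, o in zip(neural, optimal)]
    model_error = [m - o for m, o in zip(model, optimal)]
    return pearson(neural_error, model_error)
```

      A positive correlation under this sketch would mean that samples on which the decoded neural evidence departs from optimal tend to be the same samples on which the modelled internal evidence departs from optimal.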

      We also see how the reader may want to understand this in terms of the actual confidence response, i.e. confidence magnitude. We therefore performed a further analysis, using the source localised signals, inspired by the reviewer’s comment. We show that the accumulated evidence reflected in the signals localised to the orbitofrontal cortex predicts confidence magnitude in the lead-up to and following the perceptual decision response (P13, L638).

      We note here that although the implementation of this analysis is novel, the reasoning behind it is not. Van Bergen et al. (2015), for example, use the variability in the decoded representation of stimulus orientation to index internal uncertainty, and relate this to behavioural biases in orientation estimation (we now reference this in the manuscript, P12, L357). The rationale is the same: the neural representation deviates from the optimal presented evidence in a way that predicts behavioural deviations; therefore, these processes index information important for behaviour.

      In summary, we are tracking the internal evidence on which observers base their confidence reports. We assume that this identifies neural signatures associated with the computation of confidence, as opposed to signals merely correlated with its eventual magnitude (either due to upstream processes such as the presented evidence, or downstream processes such as emotional responses to decision accuracy), and we now show that these signatures of the computation of confidence do indeed reflect the eventual confidence report.

    1. Author Response:

      Reviewer #1 (Public Review):

      ...The limitations of this study, although minor for the conclusion drawn by this study, are (1) CTD deletion generally confers modest cellular phenotypes compared to DBD deletion and is fully resistant to MMC and cisplatin. It remains unknown why CTD deletion elicits less impact despite its strong impairments in ligand-induced conformational changes...

      This is an intriguing aspect of our results. We have added discussion of this aspect (Page 22, line 397). We agree that the relationship between our various assays is not simple. This points to complex functions of BRCA2 that are only beginning to be revealed and understood. Because the assays are very different, involving cells with all interacting components available vs. individual isolated proteins, we cannot at this point directly relate the protein structural changes to precise biological functions. However, we do note that the ΔDBDΔCTD and ΔDBD cell lines behave similarly and are both sensitive to MMC and cisplatin. Purified ΔCTD is deficient in structural response, while the similar variant in cells is not sensitive to the DNA crosslinking agents. All deletion variant proteins are defective in response to ssDNA, while again the ΔCTD in cells is not overly sensitive to DNA crosslinking agents. Thus, we observe structural transition defects in all C-terminal deletion mutants, while only those variants missing the DBD are sensitive in cell assays probing the function of BRCA2 in DNA crosslink repair. We and others (Le et al., 2020) observe complex interactions of different parts of BRCA2 with each other, both intermolecular (multimerization) and intramolecular (conformation), that can be modulated by binding partners, including DSS1. Although important and interesting, the DSS1 interaction is outside the scope of our current study. We continue to investigate the structural response of BRCA2, including these and other variants, to additional binding partners and hope that these studies will eventually contribute to a clearer connection between protein conformational changes and biological functions.

      ...and (2) the molecular behaviours of BRCA2 in mouse ES cells might not be directly translated to these in human somatic cells.

      It is of course possible that some aspects of BRCA2 behavior in human somatic cells and mouse ES cells differ. At least for diffusive behavior, we have shown in our previous work (Reuter et al., J. Cell Biol. 2014, in manuscript reference list) that BRCA2 behaves the same in HeLa cells as in mouse ES cells.

      Specific comments:

      (1) Thank you, does not need response.

      (2) …Surprisingly, they found that the deletion of DBD or CTD did not drastically affect foci formation, albeit slightly less efficient compared to full-length BRCA2. While the results and trends look promising, the number of samples analysed is somewhat limited (i.e., two or three technical replicates, rather than biological replicates) and statistical tests have not been conducted.

      Statistical tests are in the source data files as indicated in the figure legend.

      For all cellular assays, independent experiments have been performed on different days with cells at different passage numbers. Within all independent experiments we have included technical replicates (cell survivals: 2 or 3 wells; HR assays: 2 wells; microscopy experiments: at least 3 fields of view per condition). To further support our observations, we have generated the single ΔDBD and ΔCTD cell lines and the cell line lacking both domains (ΔDBDΔCTD). Although the original version of the manuscript already included the results of statistical tests in the Source Data files, we have now added this information to the text and figure legends where appropriate.

      (3) Thank you, does not need response.

      (4) A silver stained gel of the proteins used has been added to supplementary figures (Figure 4 – supplement 4) and this issue is addressed in essential revision number 3.

      Reviewer #2 (Public Review):

      We apologize for not having included sufficient data replicates in our original submission. We have performed additional replicates of the cell survival experiments and foci counting experiments in figures 1D-F and 2B-C. The figures and legends have been revised to include the additional data. Statistical test results are included in the figure legends, main text and Source Data Files accompanied with Figures 1-3. The overall results are the same and the conclusions from them do not change. We agree that this strengthens our work and was a necessary improvement.

      Concerning identifying specific functions for the BRCA2 C-terminal domains, we agree this is a fascinating and important area to investigate. Our work addressed the role of these domains in protein behavior that we have previously described, which places the suggested (though unspecified) assays outside the current scope. Indeed, the effect of additional BRCA2 interactors, including DSS1, is also of interest and part of our ongoing/future work. We trust the reviewer and others will be interested to follow this as it develops. As a general approach to effective scientific communication, we find the most value to the scientific community is achieved by timely reporting of clearly understandable experiments and results so that others can evaluate and build on them. Concerning multimerization vs phase separation, this is also an interesting topic. Our current experimental work does not provide any data to distinguish these phenomena. We also believe that “phase separation” is a bit of a hype and is often misused or ill defined (see, for example, “Evaluating phase separation in live cells: diagnosis, caveats, and functional consequences.” McSwiggen DT, Mir M, Darzacq X, Tjian R. Genes Dev. 2019 Dec 1). Because we do not provide any quantitative biophysical data on this topic, we prefer not to contribute to the qualitative discussion. We trust this important distinction will be addressed by ongoing appropriate biophysical and theoretical work.

      Reviewer #3 (Public Review):

      1) Concerning comparison of cell sensitivity of our BRCA2 deletion variants and a “completely non-functional BRCA2 allele”: this is indeed a good idea and would be interesting to pursue. However, we note that this would require making specific mutations from the human protein in mouse ES cell lines and thus possibly substantial work to determine whether the mutations behave the same or differently. Although cell lines expressing (patient derived and other) BRCA2 truncations and deletion variants are described as “completely non-functional”, this description does not entirely make sense to us. Cells lacking an essential protein (BRCA2) are, we assume by definition, dead or dying. That some tumor derived cell lines survive with apparently severe BRCA2 defects may attest to their other genetic alterations. A “clean” comparison in mouse ES cells does not exist. For our survivals in mouse ES cells we used a RAD54 deletion cell line as a well characterized comparison that is HR defective in response to ionizing radiation. Though not perfect, this at least provides a means of comparing sensitivity (Figure 1C), where the two BRCA2 deletion variants are even more sensitive.

      2) Concerning mechanistic importance (insight) from SPT analysis. The function of BRCA2 and other DNA repair proteins logically requires them to become localized/temporarily immobile at sites of damage where they need to exercise biochemical activities. This is seen as a high local concentration in “foci”. In order to accumulate in this way, or simply become localized to do its work, a protein has to change its diffusive behavior: either more of the protein moves to / through a place or more of it stays immobile for a longer time. This is what we can quantify by SPT. Here we show that, perhaps contrary to expectations, the in vitro defined DNA binding domain is not required for this immobilization or change in diffusive behavior. This lack of effect could be described as a negative result; however, it is just as important to communicate and just as valid as if we had detected an effect. We discussed the mechanistic implication and motivation for SPT study of BRCA2 in a previous publication (Reuter et al., JCB, 2014, in the reference list of our current manuscript). There we also explain how the number of proteins that change mobility and the magnitude of their change in mobility are consistent with the expected amount of damage inflicted.

      General comments:

      Our statement that BRCA2 C-terminal domains have a role beyond (meaning after) delivering RAD51 is based on the observation that RAD51 still forms high local concentration at sites of damage (foci). We agree that the cell biology observations do not directly test DNA binding or RAD51 filament formation. Addressing these specific biochemical activities in vivo is challenging. On the other hand, we have to admit that in vitro biochemistry (which we find essential to understanding) can show what is possible and not necessarily what is actually happening in cells. Our cell experiments are at one level aimed at defining what we can quantify in the authentic molecular environment where all binding partners, specific and non-specific, are present. We hope that continued advances in observing molecular dynamics in cells and more complex yet defined in vitro conditions will converge in the future. We hope that our work here contributes to this progress. Concerning the observation that mouse ES cells survive (“tolerate loss of CTD, DBD and both”), we agree this is an intriguing result, especially given the highly conserved nature of this part of BRCA2. We do cover this topic specifically in the second and third paragraphs of the discussion section.

      Questions that should be addressed include the following: Are proliferation rates compromised compared to WT cells?

      We did not observe compromised growth rates compared to WT cells. We have included this observation in the results (page 7, line 110).

      Are they experiencing replication stress in the absence of any exogenous damage?

      The difference in the number of spontaneous RAD51 foci we observe in untreated cells lacking the DBD could be an indication of increased replication-associated DNA damage. This interesting topic is ongoing work of a departmental collaborator and hence is not pursued here. We have, however, highlighted this observation in the discussion (page 20, line 368).

      Are ES cells special in relevant aspects?

      Mouse ES cells are highly proficient in homologous recombination and gene targeting, which makes them useful subjects for HR studies. Mouse ES cells have a relatively high number of spontaneous BRCA2 and RAD51 foci, most likely caused by their rapid cell division and DNA replication. As mouse ES cells are non-transformed cells, we use them in our experiments to avoid cancer cells, which often include mutations influencing processes such as DNA repair.

    1. Author Response:

      Reviewer #3 (Public Review):

      [...] I have only minor concerns regarding sources of error, particularly with respect to interpretation of the small effects the authors observe in many of their FRET experiments.

      • Figure 2D shows rather small changes in ΔF/F-15 mV between fluorescent protein labels inserted at different positions in the ASIC sequence, particularly for the YFP constructs. As this metric is determined from the top and bottom asymptotes for the Boltzmann fits shown in Figure 2C, it would be useful to have some estimate as to the error associated with the fits at extreme values. Perhaps the authors could provide fits to their data (as in Figure 2C), including confidence intervals, or some similar estimate as to the size of the expected error compared to the effect size in Figure 2D.

      Thank you for this point. We did use Boltzmann fits to get the asymptotes for each cell and calculate a ΔF/F. However, we could also use a ‘fit free’ approach of simply taking the difference between the fluorescence values measured at -180 mV and at +120 mV, divided by that at -15 mV to normalize for each cell. This approach completely avoids any error associated with fitting the data or imposing any model at all. Using this approach results in slightly different ΔF/F values, but the pattern of statistical significance is identical. This new analysis is included in Figure 2 figure supplement 4. It has also been corrected for multiple comparisons.
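
      As a rough illustration of the ‘fit free’ metric just described, the calculation for one cell could look like the sketch below. The function name, argument names, and the sign convention (value at -180 mV minus value at +120 mV) are assumptions for the example; the response specifies only which three voltages are used.

```python
def fit_free_dff(voltages_mv, fluorescence, v_low=-180, v_high=120, v_norm=-15):
    """Fit-free dF/F for a single cell.

    voltages_mv  -- holding potentials (mV) at which fluorescence was measured
    fluorescence -- fluorescence value recorded at each potential
    Returns (F(v_low) - F(v_high)) / F(v_norm); the sign convention is assumed.
    """
    by_voltage = dict(zip(voltages_mv, fluorescence))
    return (by_voltage[v_low] - by_voltage[v_high]) / by_voltage[v_norm]
```

      Because only three measured values enter the calculation, no asymptote has to be extrapolated from a fitted curve, which is the point the response makes about avoiding fit-related error.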

      • Along those same lines, the authors use an interesting (and potentially generalizable) approach to reducing background from intracellular proteins in their experiments: co-transfecting their channels with empty plasmid DNA. What percentage of the remaining fluorescence signal is the result of intracellular background? How would that affect the data in Figure 2 and 3? Is the ΔF/Fnorm curve for YFP labeled positions in Figure 2-figure supplement 4 so flat because of contaminating background fluorescence?

      This is a great question. We originally hoped that the CFP and YFP quenching data from different positions could be used to triangulate both a distance from the membrane and a value for background fluorescence, assuming that CFP and YFP would yield similar background fluorescence. An analogous approach was used in Zachariassen et al., Proc Natl Acad Sci, 2016, where an equal background was assumed between conformational states within a recording. In the end, the YFP quenching appeared to have a greater background than CFP. We speculate that this may be because the YFP variant we used matures faster than the CFP (mVenus, 17.6 min versus mTurquoise2, 33.5 min; FPbase.org), and hence the YFP matures faster than the ‘new’ channels get to the plasma membrane. However, at present we are uncertain how much of the background fluorescence signal to confidently attribute to this intracellular FP issue.

      • In Figure 3D, the FRET efficiency between CFP-cA1-cA1 and N YFP at a 1:15 ratio of the two plasmids is higher than the FRET efficiency between CFP and YFP in the same subunit, even though the authors conclude that fluorescent proteins on the same subunit show considerably more FRET than fluorescent proteins on neighboring subunits. Could this indicate that the N-termini of adjacent subunits are closer together than the N- and C-termini of a single subunit? If, on the other hand, this effect were entirely the result of crowding in the membrane why is FRET efficiency substantially lower when CFP-cA1-cA1 is co-expressed with C4 YFP? Wouldn't this construct produce a similar crowding effect?

      We strongly suspect the N termini of adjacent subunits are closer to each other than the N and C termini of a single subunit, simply because the N FPs would all be at the same ‘height’ or same depth with respect to the plasma membrane. Thus the measured FRET in this case primarily reflects distances in the x-y plane. This contrasts with the N and C FPs on the same or different subunits, where both x-y distances and axial distances come into play.
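
      The geometric argument can be made concrete with the standard Förster relation, E = 1 / (1 + (r/R0)^6), where r combines the in-plane and axial separations. The sketch below is illustrative only: the function name is hypothetical, and the default Förster radius of 50 Å is a placeholder, not a value taken from the manuscript.

```python
from math import sqrt


def fret_efficiency(lateral_angstrom, axial_angstrom, r0_angstrom=50.0):
    """FRET efficiency from in-plane (x-y) and axial (z) separation.

    r0_angstrom is the Forster radius; 50 A is an assumed placeholder.
    """
    # Total donor-acceptor distance combines both components.
    r = sqrt(lateral_angstrom ** 2 + axial_angstrom ** 2)
    return 1.0 / (1.0 + (r / r0_angstrom) ** 6)
```

      For two fluorophores at the same depth the axial term is zero, so any axial offset between an N-terminal and a C-terminal tag can only lower the efficiency relative to two N-terminal tags with the same in-plane spacing.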

      • On page 23, the authors state that they detected no pH-dependent changes in FRET between their GFP tag on the N-terminus of ASIC1 and an RFP tag on the channel's C-terminus. However, Figure 4 shows a small, but significant change in fluorescence between pH 8 and pH 7.

      We have corrected for multiple comparisons within a figure. As a result, this effect is no longer statistically significant (adjusted p value is 0.063).
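
      The response does not state which correction procedure was applied. Purely as an illustration of how an adjustment can move a nominally significant p value above 0.05, the sketch below implements the Holm-Bonferroni step-down method; the choice of method and the function name are assumptions for the example.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p values, returned in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

      With three hypothetical comparisons at p = 0.01, 0.04 and 0.03, the adjusted values are 0.03, 0.06 and 0.06, so two comparisons that were below 0.05 before correction no longer are.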

      • The interpretation of distances between various tagged positions on ASIC and the plasma membrane in Figure 2 is based on using two different colored tags with two different distance dependences. However, the interpretation of the data from Figure 5 provided on page 25 is less clear. For example, the reduction in fluorescence from the N-terminal tag is interpreted as the tag moving closer to the plasma membrane. Without similar data from a YFP tag to verify, it seems equally likely that the reduction in fluorescence (at steady state) could result from a movement away from the plasma membrane.

      This is a very good point. We tried to perform DPA quenching of YFP-containing constructs at pH 6.0, but the acidification resulted in proton-quenching of the YFP fluorescence (Figure 4). We didn’t feel confident in measuring DPA quenching with the concomitant loss of YFP fluorescence due to acidification. Therefore, we relied on the pH 8.0 CFP and YFP data as a starting point (Figure 2). Given that the C1 insertion gives the greatest extent of CFP quenching, it is reasonable to place it around the top of the curve. The N position could then be on the left or right side of the hump or peak in the CFP distance curve. The N quenching is comparable to the C2 insertion quenching (Figure 2D, left), yet the N FP is ~ 16 amino acids from the pore-forming membrane helices while the C2 insertion is ~ 40 amino acids away. For reference, the C1 is ~ 24 amino acids away. Thus we are reasonably confident the N insertion is on the left side of the hump or peak. A reduction in ΔF/F would indicate movement closer to the plasma membrane. While it is technically possible that the N position could move further away from the membrane, this would have to be a >25 Å movement. Given there are only 16 amino acids between the CFP and the beginning of TM1 of the channel, we do not think such a dramatic movement outward could occur.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript is somewhat readable but the many acronyms for the cell types in model and biology make it difficult to follow. Is there a reason why the biological neuron names cannot be used in the model?

      We agree that the field would benefit from not having yet another set of acronyms for the different spinal neurons. We have changed the names of the neurons to their putative molecular identity.

      The presentation of data in figures can be more powerful. In many cases, the data in figures and the supplemental videos show apparently different results. This can be an artifact of how the videos were made and if yes, these can be improved. Tail tip coordinates can be plotted to show the behaviors in much better detail.

      Especially for beat and glide swimming, the points regarding burst firing, inhibition, etc. have not been robustly made.

      We have had to revise the beat-and-glide model. In the revised version, burst firing is no longer required for beat-and-glide swimming to occur. For inhibition, we hope that the presentation of the data has been improved. We now point out that despite reduced left-right coordination, the continued presence of some left-right alternation, especially in the rostral segments, will still cause the body to exhibit left-right tail beats. The kinematics will be altered (see Figure 6 video 2 and Figure 6 - figure supplement 2), but left-right tail beats will still be present. To the best of our knowledge, no studies have shown that loss of left-right coordination blocks the generation of left-right tail beats in swimming fish.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] One intriguing aspect is that ectosomes were not detected in supporting glia by transmission electron microscopy and by light microscopy analysis of PKD-2::GFP and CIL-7::GFP conducted by Blacque and Barr (see for instance 10.7554/eLife.50580). The authors discuss this point and rationalize the discrepancy by stating that the fluorescent protein that they are using is brighter at the low pH of endosomes and lysosomes than the FPs previously used. Considering that the FP is fused to the intracellular domain of the membrane protein, the FP will not be exposed to the pH of the endosomes in the target cell. The authors' explanation is not valid and the basis for the discrepant results remains unresolved.

      Reviewer#1 is correct: the C-terminal FP would not be exposed to acidic pH in endosomes. We tested whether increasing the pH of the endo-lysosomal compartment would increase the fluorescence or the number of EVs within the AMsh cytoplasm by exposing the animals to NH4Cl (Fazeli et al., 2016). We did not observe significant differences induced by NH4Cl. Therefore, quenching plays no role. We understand the concerns raised and have removed all sentences suggesting that quenching of the FP may play a role. Still, EVs produced from the PCMC are captured by the supporting glia and, at least for GCY-22-wrmScarlet, this occurs through phagocytosis of basal ectosomes budding from the ASER PCMC rather than by fusion of EVs with the AMsh plasma membrane. Therefore, the EVs within AMsh are likely traveling through the endolysosomal pathway.

      Like Reviewer#1, we were intrigued that other labs did not report our observations of EV cargo transfer to glia. How can we explain that PKD-2::GFP and CIL-7::GFP overexpressed in CEM were not observed in the associated glia? We observed that export of TSP-6 to AMsh by amphid neurons is severely reduced in the TSP-6-wrmScarlet knock-in strain compared to TSP-6-wrmScarlet overexpression strains. Therefore, the absence of PKD-2::GFP and CIL-7::GFP export to the associated glia could be explained by differences in expression levels of the cargos, although, as noticed by Reviewer#1, PKD-2::GFP and CIL-7::GFP are also overexpressed in Wang et al. (2014). Alternatively, each neuron/cargo can have specific properties preventing or promoting export of cargo from the PCMC to glia. For example, CEM neurons releasing PKD-2::GFP and CIL-7::GFP have a specific trafficking machinery (including expression of the kinesin KLP-6 and the tubulin TBA-6 (Akella and Barr, 2021)) that can contribute to potentiating PKD-2/CIL-7 ectocytosis from the cilia tip. Strong ectocytosis from the CEM cilia tip might secondarily prevent PKD-2/CIL-7 export from the PCMC to glia. If true, we would predict that trafficking mutants preventing PKD-2::GFP and CIL-7::GFP entry into CEM cilia should promote their export to CEPsh. We could not test this. Finally, the morphology of the CEM-CEPsh glia ensemble might matter. In particular, the extent to which the CEM PCMC is embedded by glia might contribute to the export properties. We lack the EM information needed to elaborate further.

      As correctly alluded to in the discussion, primary cilia regulate their protein composition by shedding ectosomes and overexpression of ciliary proteins may lead to increased ciliary ectocytosis. Therefore, it is also conceivable that the extracellularly shed material the authors observe is a non-physiological consequence of their experimental design rather than a manifestation of physiological ectocytosis. In all fairness to the authors, all published studies on PKD1 and PKD2 ectocytosis by the Barr lab have used overexpression systems. And the discussion clearly spells out the possibility that the observed transfer of ciliary material from ciliated neurons to glial cells may be caused by overexpression of fusion proteins. Nonetheless, the abstract and result sections do not mention the possibility that the observed results are caused by overexpression. It would be of great help to the community to clearly indicate from the introduction onward that the shedding of material by ciliated neurons may be a result of overexpression, in this study and in past publications.

      Thank you for mentioning this. As stated previously, we performed new experiments with the same cilia proteins expressed at endogenous levels to avoid any overexpression artefacts. Our new results show that ectocytosis from the cilia tip does take place in physiological conditions for GCY-22-GFP and TSP-6-wrmScarlet, and that export to glia still occurs for TSP-6-wrmScarlet in physiological conditions. As suggested by Reviewer #1, we now state the overexpression concerns in the discussion.

      To firmly determine the physiological extent of ciliary signaling receptor transfer from ciliated neuron to glial cells, the authors are encouraged to consider using an endogenously tagged protein instead of an overexpression system. For the GCY-22 receptor, the knock-in animals have already been developed and published by Gert Jansen's group (doi: 10.1016/j.cub.2020.08.032). A comparison of the localization between the overexpression strains and the endogenous expression strains of GCY-22::FP will be valuable to the paper and to the general discussion of ectocytosis. The Jansen lab has generated mutants of GCY-22 that no longer localize to cilia; studying whether such mutants still end up in glial cells would help clarify the route taken by ciliary material that ends up in glial cells.

      As described above, osm-3 kinesin-II anterograde IFT motor and che-3 dynein retrograde motor mutants show increased PCMC accumulation and increased transfer to AMsh glia, suggesting that accumulation of cargoes in the PCMC likely drives their export to AMsh. Similarly, overexpression of GCY-22-wrmScarlet in ASER induced its accumulation in the PCMC and its export to AMsh. We used mutants for the AP-1 μ1 clathrin adaptor, unc-101(m1), to prevent GCY-22-wrmScarlet sorting and trafficking to cilia. In unc-101(m1), GCY-22-wrmScarlet was not enriched in cilia and was not exported to AMsh. Therefore, the sorting and trafficking machinery that mediates accumulation of ciliary cargoes in cilia is required for GCY-22-wrmScarlet export to AMsh (Figure 4).

      The authors point out in the discussion that the DiI dye transfer experiment rules out issues related to overexpression. It is however unclear whether the route taken by DiI from the environment to the support cell is the same as the route taken by receptors overexpressed in ciliated neurons. Can the authors conduct co-localization studies with DiI and one of the overexpressed FP-tagged ciliary membrane protein?

      We tried the suggested experiment and observed ~ 40% of the DiO vesicles in AMsh also carrying TSP-6-wrmScarlet in all amphid neurons. However, we are sceptical about the approach taken in this experiment. The donor cells are not the same: TSP-6-wrmScarlet is exported from all amphid neurons (under the arrestin-4 promoter, driving the expression in most ciliated neurons), whereas DiO is exported from a subset of these amphid neurons: ASK, ADL, ASI, AWB, ASH and ASJ. Also, we expect endosomes to fuse along the endolysosomal pathway, leading to increased colocalization towards the cell body, independently of the original EV content.

      Reviewer #2 (Public Review):

      [...] 1. The overexpression of fluorescently tagged transmembrane proteins may be a concern, because it often leads to aberrant neurite morphology. For example, the ciliary base in Fig. 4A seems abnormally swollen. This could confound the authors' ability to faithfully measure EV dynamics in vivo.

      As stated by Reviewer#1, all previously published studies on ectocytosis in C. elegans used overexpression and we were aware of this limitation. We provide new results using endogenously tagged GCY-22 and TSP-6 EV cargos. We obtained much cleaner results regarding the effect of cilia trafficking mutants in these knock-in strains. We highlight overexpression concerns in the discussion. We agree with Reviewer#2: measuring PCMC deformation in presence of overexpressed GCY-22-wrmScarlet is prone to artefacts. Instead, we explored all PCMC shape using mKate expression in Figure 7.

      1. Other activities of glia that are important for shaping cilia may also be impaired by the use of a dominant negative dynamin to block endocytosis. By comparison, the use of a glial-specific dominant negative RAB-28 to block exocytosis also causes severe defects in cilia morphology (Singhvi et al. 2016). Thus, this experiment does not directly demonstrate a requirement for glial EV pruning in maintaining cilia shape.

      Ablation of AMsh, an AMsh exocytosis defect in AMsh::RAB1(DN), or a secretome defect in pros-1 were previously shown to cause severe truncation of AWC and AFD NREs as well as defects in the associated sensory functions, probably because of changes in the microenvironment of these embedded cilia. In animals expressing DYN-1(K46A), we observed severe truncation of ~10% of AFD NREs but none for AWC cilia (Figure 7). We observed neither a thermotaxis defect nor a chemotaxis defect to IAA, suggesting that AFD and AWC sensory responses are maintained. Therefore, our results contrast with the effects of an AMsh exocytosis block.

      Nevertheless, we agree with Reviewer#2 that we cannot exclude that DYN-1(K46A) indirectly affects AMsh function, leading to cilia shape changes independently of the EV capture defects caused by DYN-1(K46A). We highlight this possibility in the discussion.

      1. The distinction between puncta brightness, size, and number is unclear. For example, in Fig. 7A, glial puncta in ttx-1 mutants seem to be approximately as numerous as in wild-type animals but much less bright. The authors interpret this as export being "strongly reduced" - but why does this affect brightness rather than number? In most figures, the results are either not quantified or are summarized as a ratio of overall glia/neuron fluorescence intensity. More precise quantification of puncta brightness, size, and number would improve the manuscript.

      We agree that glia/neuron fluorescence intensity was not appropriate. We improved ttx-1 analysis and we now provide puncta number and intensity (Figure 6B and Figure 6- Supplement 1A, 1B). Their number in AMsh is heavily reduced in ttx-1 as well as their fluorescence intensity. However, as we stated, export from AFD to AMsh is maintained in ttx-1 in absence of microvilli. More quantifications have been done for puncta number, intensity and size in other experiments. These are now presented in Figure 5-Supplement 1C, 1D and Figure 3 - Supplement 1B.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] My main technical concern lies in the choice of decomposition filter for SEP and alpha oscillations, and the conclusions the authors draw from that. Specifically, a CCA spatial filter is optimized here for the N20 component, which is then identically applied to isolate alpha sources, with the logic being that this procedure extracts the alpha oscillation from the same sources (e.g., L359). I have no issues (or expertise) with using the CCA filter for the SEP, but if my understanding of the authors' intent is correct, then I don't agree with the logic that using the same filter isolates alpha as well. The prestimulus alpha oscillation can have arbitrary source configurations that are different from the SEP sources, which may hypothetically have a different association with the behavioral responses when it's optimally isolated. In other words, just because one uses the same spatial filter, it does not imply that one is isolating alpha from the same source as the SEP, but rather simply projecting down to the same subspace - looking at a shadow on the same wall, if you will. To show that they are from the same sources, alpha should be isolated independently of the SEP (using CCA, ICA, or other methods), and compared against the SEP topology. If the topology is similar, then it would strengthen the authors' current claims, but ideally the same analyses (e.g., using the 1st and 5th quintile of alpha amplitude to partition the responses) are repeated using alpha derived from this procedure. Also, have the authors considered using individualized alpha filters given that alpha frequency varies across individuals? Why or why not?

      Indeed, applying the same spatial filter to EEG signals with different spatial arrangements of the sources can lead to the extraction of neuronal activity which does not originate from the very same sources. We had chosen our approach, as it is well known that the generators of the early SEP components and the generators of the prominent somatosensory alpha rhythm co-reside at similar sites in the primary somatosensory cortex (e.g., Haegens et al., 2015). Therefore, we considered our approach appropriate to specifically focus on neural activity from the somatosensory region both in the frequency band of the SEP as well as of the alpha rhythm. Yet, we agree with the reviewer that it should be acknowledged that we may have missed or mixed-up effects of alpha activity from other sources by using this procedure (which might have led to different conclusions otherwise). In order to account for this, we repeated our analyses with an SEP-independent reconstruction of the oscillatory effects in source space (“whole brain analysis”). For this, we first reconstructed the sources of alpha activity using eLORETA and head models based on participant-specific MRI scans, and estimated the respective effects independently for all sources across the cortex using both linear-mixed effects models (LME) as well as a binning approach for the Signal Detection Theory (SDT) parameters sensitivity d’ and criterion c (consistent with the previous analyses in our manuscript). In the LME analyses, both the effects of pre-stimulus alpha activity on N20 amplitudes as well as on perceived stimulus intensity were strongest in the right primary somatosensory cortex – in accordance with the sources of the originally extracted tangential CCA component of the SEP (see Supplementary Figure 1 for Peer Review). 
Also, using the binning approach to examine the relation of pre-stimulus alpha activity with SDT parameter criterion c, the effects were most pronounced around the right somatosensory regions (Supplementary Figure 2 for Peer Review), yet these effects did not survive statistical correction for multiple comparisons (FDR-correction with p<.01). However, when performing the same binning analysis for our region of interest (ROI), the hand area in BA 3b of the right somatosensory cortex, a significant effect of pre-stimulus alpha on criterion c was indeed confirmed, t(31)=-2.951, p=.006, CI95%=[-.173, -.032]. Furthermore, in line with our previous CCA results, for sensitivity d’, neither the whole brain analysis nor the ROI analysis showed effects of pre-stimulus alpha amplitude, t(31)=0.633, p=.531, CI95%=[-.083, .157]. Taken together, the findings we report in our original manuscript for pre-stimulus alpha activity obtained with the spatial CCA filter can thus be replicated with a SEP-uninformed source reconstruction, both using LMEs for a “whole-brain analysis” as well as SDT analyses in a ROI-based approach. We therefore conclude that the relationships between pre-stimulus alpha activity, N20 potential of the SEP, and perceived stimulus intensity can indeed be attributed to neural activity from the same (or at least very similar) sources in the primary somatosensory cortex.
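
      As a minimal illustration of the binning approach described above (a sketch, not the analysis code used for the manuscript), the snippet below assigns trials to quintiles of an EEG amplitude and computes sensitivity d’ and criterion c from the response counts within one bin. The function names, the mapping of responses onto hits and false alarms, and the log-linear correction are assumptions made for this example.

```python
from statistics import NormalDist


def quintile_bins(values):
    """Return the quintile index (0..4) of every trial, ranked by `values`."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, trial in enumerate(order):
        bins[trial] = rank * 5 // n
    return bins


def sdt_parameters(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) for one bin of trials.

    Assumption: one stimulus class is treated as "signal", so that responses
    can be counted as hits, misses, false alarms and correct rejections.
    """
    # Log-linear correction (an assumption here) keeps both rates off 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion_c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion_c
```

      In such a scheme, the parameters of the first and fifth quintile would be computed per participant and then compared across participants, for example with a paired t-test.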

      Addressing the question on filtering alpha activity in individualized frequency bands, we considered this option, too. However, the rather short length of our pre-stimulus window (-200 to -10 ms) constitutes a natural limit for the frequency resolution in the alpha range and slightly different filter ranges (adjusted with regards to the individual alpha peak frequency) are thus unlikely to lead to large differences in the estimation of pre-stimulus alpha amplitudes. Therefore, we refrained from using individualized frequency bands here and focused on the more generic approach using one common alpha band (8-13 Hz) for all participants, which should also facilitate direct comparisons with previous studies on pre-stimulus oscillatory effects.

      In the same vein, both alpha and N20 amplitude relate to perceptual judgement, and to each other. I believe this is nicely accounted for in the multivariate analysis using the SEM, but the analysis that partitions the behavioral responses using the 20% and 80% are done separately, which means that different behavioral trials are used to compute the effect of N20 and alpha on sensitivity and criterion. While this is not necessarily an issue given that there IS a multivariate analysis, I would like to know how many of those trials overlap between the two analyses.

      This is an interesting point indeed. We included both the binning analyses and the multivariate analyses in our manuscript as we believe they offer complementary views on the data, and also allow a direct comparison to previous studies in the field (e.g., Iemi et al., 2017). In fact, the trial overlap between the extreme bins of the alpha and N20 data was rather small.

      Since the expected trial overlap is 20% when partitioning the data into quintiles randomly, the effect-driven increments and reductions in trial overlap in our data appear to be rather small. However, they showed the expected directions: Larger alpha amplitudes were associated with more negative N20 amplitudes (and vice versa). Presumably, these small differences in trial overlap reflect the rather small effect sizes we also observed in the multivariate analyses. We have added this information to our revised manuscript in the following way to give the reader a better picture of the underlying data for the binning analyses (page 9, lines 137 ff.): “(Please note that this procedure resulted in a different trial selection as compared to the SDT analysis of pre-stimulus alpha activity. Please refer to Fig. 2—figure supplement 2 for further details on the trial overlap.)”
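
      The 20% chance level mentioned above can be checked with a short simulation. This is an illustrative sketch under the assumption of two statistically independent single-trial measures; the trial count, the number of repetitions and all names are chosen for the example only.

```python
import random


def extreme_bin_overlap(x, y, fraction=0.2):
    """Share of trials in the top `fraction` of x that are also in the top `fraction` of y."""
    n = len(x)
    k = int(n * fraction)
    top_x = set(sorted(range(n), key=lambda i: x[i])[-k:])
    top_y = set(sorted(range(n), key=lambda i: y[i])[-k:])
    return len(top_x & top_y) / k


rng = random.Random(0)   # fixed seed so that the sketch is reproducible
n_trials = 1000          # illustrative number of trials per simulated dataset
overlaps = []
for _ in range(200):
    # Two independent measures, standing in for alpha and N20 amplitudes.
    alpha = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
    n20 = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
    overlaps.append(extreme_bin_overlap(alpha, n20))

mean_overlap = sum(overlaps) / len(overlaps)  # close to 0.2 for independent measures
```

      A genuine association between the two measures shifts the overlap away from this chance level, upwards or downwards depending on the sign of the relation and on which extreme bins are compared.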

      At multiple points, the authors comment that the covariation of N20 and alpha amplitude in the same direction is counterintuitive (e.g., L123-125), and it wasn't clear to me why that should be the case until much later on in the paper. My naive expectation (perhaps again being unfamiliar with the field) is that alpha amplitude SHOULD be positively correlated with SEP amplitude, due to the brain being in a general state of higher variability. It was explained later in the manuscript that lower alpha amplitude and higher SEP amplitude are associated with excitability, and hence should have the opposite directions. This could be explicitly stated earlier in the introduction, as well as the expected relationship between alpha amplitude and behavior.

      Thank you for pointing out this lack of clarity. We have now made this rationale more explicit already at an early point in the introduction (page 3, lines 26 ff.): “According to the baseline sensory excitability model (BSEM; Samaha et al., 2020), higher alpha activity preceding a stimulus indicates a generally lower excitability level of the neural system, resulting in smaller stimulus-evoked responses, which are in turn associated with a lower detection rate of near-threshold stimuli but no changes in the discriminability of sensory stimuli (since neural noise and signal are assumed to be affected likewise).”

      Furthermore, I have a concern with the interpretation here that's rooted in the same issue as the assumption that they are from the same sources: the authors' physiological interpretation makes sense if alpha and N20 originated from the same sources, but that is not necessarily the case. In fact, the population driving the alpha oscillation could hypothetically have a modulatory effect on the (separate) population that eventually encodes the sensory representation of the stimulus, in which case the explanation the authors provide would not be wrong per se, just not applicable. A comment on this would be appreciated in the revision.

      Our extensive additional analyses suggest that the sources of behaviorally relevant alpha and N20 activity were located at very similar cortical sites. Nevertheless, this is not a proof that exactly the same neuronal populations were involved (for example, alpha and N20 effects could originate from different cortical layers). Therefore, we have added this potential limitation to our revised manuscript in the following way (page 19, lines 379 ff.): “Furthermore, with the present data, we cannot unambiguously conclude that the observed relation between pre-stimulus alpha activity and initial SEP indeed involved the very same neuronal populations – which may represent a limitation of the hypothesized mechanism. However, all approaches to localize these effects pointed to very similar cortical regions as discussed in the following section.”

      In addition, given how closely related the investigation of these two quantities are in this specific study, I think it would be relevant to discuss the perspective that SEPs are potentially oscillation phase resets. Even though the SEP is extracted using an entirely different filter range, it could nevertheless be possible that when averaged over many trials, small alpha residues (or other low freq components) do have a contribution in the SEP. If the authors are motivated enough, a simulation study could be done to check this, but is not necessary from my point of view if there is an adequate discussion on this point.

      Indeed, the phase reset mechanism may be a possible alternative explanation for relations between oscillations and later parts of the ERP. However, the N20 potential reflects the very first excitation of the cortex in response to a somatosensory stimulus and should therefore represent a textbook example of an additive response (EPSPs are added to ongoing background activity). Moreover, the N20 response should be over long before a possible phase reset in lower frequencies (such as alpha frequencies) would start to play a role (Hanslmayr et al., 2007; Sauseng et al., 2007). Nevertheless, we ran additional control analyses (including a simulation study) in order to exclude that some odd combination of phase-locking and filter residues led to the present findings: Please see Essential Revision #4 for details and how we included these considerations in our revised manuscript.

      Reviewer #2 (Public Review):

      [...] The main weakness of the manuscript becomes most apparent with respect to the stated impact that "The widespread belief that a larger brain response corresponds to a stronger percept of a stimulus may need to be revisited.". I am not really sure whether many cognitive neuroscientists would actually subscribe to such a simplistic relationship between evoked responses and perception; temporal differentiation (early vs late responses) and the biasing influence of prestimulus activity patterns are becoming increasingly recognized. So rather than actually changing a dominant paradigm, this work is an (excellent) contribution to a paradigm shift that is already taking place.

      Thank you for this feedback. We agree that the paradigm shift away from simplistic assumptions about the relationship between variability of neural responses and perception is already taking place and that this is already being appreciated by many scientists in the field. Also, we agree that the present study contributes more evidence to this emerging notion rather than changing the whole field. However, we do think that particularly the observation of opposite amplitude modulations of initial somatosensory evoked responses associated with presented stimulus intensity on the one hand and pre-stimulus excitability state on the other, provides a novel perspective for our understanding of how fundamental features of sensory stimuli are processed at initial cortical levels. Following your suggestions to tone down claims about the controversiality as well as to avoid over-generalization, we have therefore adjusted the impact statement of this manuscript to: “Larger evoked responses during initial cortical processing may reflect states of lower excitability.”

      Furthermore, we have adjusted similar statements throughout the manuscript accordingly.

      Also it should be considered that with regards to the analysis approach using CCA, the claims are mainly restricted to BA3b: i.e. while I also think that this is a strength of the current study, one should refrain from overinterpreting the results in a very generalized manner. The authors do include some "thalamus" and "late" evoked response patterns as well, however that presentation of the results is somewhat changed now as compared to the N20 (e.g. using LMEs rather than comparison of extremes; not using SEMs). The readability of results and especially the comparison of effects would profit from a more coherent approach.

      We agree that our findings indeed have the specific focus on the N20 component and thus on its generators in BA3b. We did not intend to suggest that the effects we observed for this initial cortical response can be readily generalized to other (later) ERP components, too. However, we do believe (and hypothesize) that similar mechanisms may be in place for corresponding initial cortical responses in other sensory modalities, too – yet it is clear that we cannot test this generalization with the current study. To avoid misunderstandings of these interpretations and their limitations, we have further specified these aspects in the Discussion.

      Regarding our analyses of the later SEP (i.e., N140 component) and thalamus-related activity (i.e., P15 component), we initially decided to use linear-mixed effects models as they are mathematically equivalent to the way the sub-equations of the structural equation model were constructed (Table 2 in the manuscript). Nevertheless, we have now additionally run binning analyses to make a direct comparison also with Signal Detection Theory (SDT) parameters possible: For the N140 component, there was a significant effect on criterion c, t(31)=-3.010, p=.005, but no effect on sensitivity d’, t(31)=0.246, p=.807. For the P15 component, no effects emerged either for criterion c or sensitivity d’, t(12)=1.201, p=.253, and t(12)=-0.201, p=.844, respectively. These findings correspond well to the previous LME analyses and may indeed further facilitate the comparison with the findings for the N20 potential and pre-stimulus alpha activity. Therefore, we have added these complementary analyses to our manuscript in the following way:

      Results: “In addition, the SDT analysis based on binning of the P15 amplitudes into quintiles neither suggested a relation with criterion c nor with sensitivity d’, t(12)=1.201, p=.253, and t(12)=-0.201, p=.844, respectively.” (page 14, lines 241 ff.)

      “These findings were in line with a separate SDT analysis: N140 amplitudes were associated with an effect on criterion c, t(31)=-3.010, p=.005, but no effect on sensitivity d’ emerged, t(31)=0.246, p=.807.” (page 15, lines 263 ff.)

      Discussion: “Crucially, our data are at the same time consistent with previous studies on somatosensory processing at later stages, where larger EEG potentials are typically associated with a stronger percept of a given stimulus (e.g., Al et al., 2020; Schröder et al., 2021; Schubert et al., 2006), as both our SDT and LME analyses of the N140 component showed.” (page 19, lines 367 ff.)

      “Yet, neither our SDT analyses nor the LME models of the thalamus-related P15 component supported this notion.” (page 21, lines 414 ff.)

      Methods (page 32, lines 681 ff.): “The effects of the EEG measures pre-stimulus alpha amplitude, N20 peak amplitude, P15 mean amplitude, and N140 mean amplitude on the SDT measures sensitivity d’ and criterion c were examined using a binning approach: […]”

      I have some concerns whether the relationship between large alpha power and more negative N20s could be driven by more trivial factors rather than the model explanations the authors develop in the discussion. Concretely the question whether phase locking of large alpha power along with >30 Hz high pass filtering could produce a similar finding as shown e.g. in Figure 2c. This is an important issue, as prestimulus alpha influences the N20 amplitudes as well as the perceptual reports.

      Indeed, potential phase-locking of alpha oscillations to stimulus onset and filter-related effects are important issues that could potentially offer an alternative explanation for the observed relationship between amplitudes of pre-stimulus alpha activity and the N20 potential of the SEP. Although such pre-stimulus alpha locking is rather unlikely in a paradigm with jittered stimulus onsets (in our case uniformly distributed between -50 ms and +50 ms; corresponding to a whole alpha cycle), we have run the following control analyses to fully exclude this possibility:

      First, we analyzed whether pre-stimulus alpha phase values were distributed uniformly and whether these phase distributions differed between high and low alpha amplitudes as well as between high and low N20 amplitudes. The phase of pre-stimulus alpha activity was obtained from a Fast-Fourier transform in the pre-stimulus time window from -200 to -10 ms, applied to unfiltered, but otherwise identically pre-processed data as in the original manuscript (i.e., applying the spatial filter of the tangential CCA component). For the FFT, we used zero padding (extending the pre-stimulus data segments to 2048 data points each) in order to obtain an interpolated frequency resolution of around 3 Hz. The phase was extracted at the frequency 9.766 Hz (i.e., the closest available frequency to 10 Hz). As visible from Supplementary Figure 3 for Peer Review, pre-stimulus alpha phases were distributed uniformly across all five quintiles of both alpha and N20 amplitudes. This observation was confirmed by the Rayleigh test (testing for deviations from a uniform distribution; Berens, 2009): Neither in the concatenated phase data of all participants, z=1.130, p=.323, nor in single-participant analyses within every alpha amplitude or N20 amplitude bin, did we find evidence for a non-uniform distribution of alpha phase, all p>.367 (after Bonferroni correction for multiple testing). Thus, there was no phase-locking of pre-stimulus alpha activity that could serve as a trivial alternative explanation of the relationship between pre-stimulus alpha amplitude and N20 amplitude.
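
      The logic of this first control analysis can be sketched in a few lines. This is an illustration rather than the code used for the analysis: the sampling rate of 5000 Hz is an assumption (it is not stated above and is chosen because it places an FFT bin at 9.766 Hz for 2048 points), the function names are hypothetical, and the p-value uses a common closed-form approximation for the Rayleigh test.

```python
import cmath
import math

FS = 5000.0    # Hz; assumed sampling rate, not stated in the text above
N_FFT = 2048   # zero-padded segment length given in the text

bin_spacing = FS / N_FFT               # spacing of the interpolated frequency grid
k_alpha = round(10.0 / bin_spacing)    # index of the bin closest to 10 Hz
f_alpha = k_alpha * bin_spacing        # 9.765625 Hz under the assumed FS


def phase_at_bin(segment, k, n_fft=N_FFT):
    """Phase (radians) of DFT bin k for `segment` zero-padded to n_fft points."""
    # The padded zeros contribute nothing to the sum, so only the samples are summed.
    coeff = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, x in enumerate(segment))
    return cmath.phase(coeff)


def rayleigh_test(phases):
    """Rayleigh test for non-uniformity of circular data; returns (z, p)."""
    n = len(phases)
    resultant = abs(sum(cmath.exp(1j * p) for p in phases))  # length of the summed unit vectors
    z = resultant ** 2 / n
    # Closed-form approximation of the p-value (an assumption of this sketch).
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n ** 2 - resultant ** 2)) - (1 + 2 * n))
    return z, p
```

      In use, `phase_at_bin` would be applied to every pre-stimulus segment at index `k_alpha`, and `rayleigh_test` to the resulting phases, either pooled or within each amplitude bin. A large p-value then gives no evidence against a uniform phase distribution.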

      Second, in order to examine whether the combination of our temporal filters (30 to 200 Hz band-pass for the SEP, and 8 to 13 Hz band-pass for alpha activity) could have led to the present findings, we additionally re-ran our analysis pipeline with simulated data: We mixed exemplary SEP responses with constant amplitudes (unfiltered; derived from within-participant averages), with simulated alpha band activity with randomized amplitude fluctuations, and pink noise, reflecting neural background activity as is typical for the human EEG. The SEP onsets were chosen according to our original experimental paradigm with inter-stimulus intervals of 1513 ms and a jitter of ±50 ms. Next, we filtered these mixed signals between 30 and 200 Hz in order to extract the single-trial SEPs, and estimated the pre-stimulus alpha amplitudes between -200 and -10 ms in the same way as was done in the original manuscript (i.e., by filtering the mixed signal between 8 and 13 Hz). This procedure was repeated for 32 generated data streams, containing 1000 SEPs each (corresponding to our empirical dataset of 32 participants). The resulting average SEPs showed no visually detectable difference between the five alpha amplitude quintiles, nor did a random-slope linear-mixed-effects model indicate any relation between pre-stimulus alpha amplitude and N20 amplitude on a single-trial level, βfixed=-.0005, t(255.16)=-.094, p=.925. Therefore, our findings cannot be explained by filter artifacts or residual activity leaking from the alpha frequency band to the frequency band of the N20 potential.

      Third, we re-analyzed our empirical EEG data in time-frequency space to obtain a more detailed view of the effects of pre-stimulus alpha activity on N20 amplitudes. For this, we decomposed our pre-processed but unfiltered data with wavelet transformation (complex Morlet wavelets) and calculated linear-mixed effects models on the relation between signal amplitudes in the time-frequency domain and single-trial N20 amplitudes as obtained from our original analyses. As shown in Supplementary Figure 5 for Peer Review, the time-frequency representations of the effects on N20 amplitudes indeed indicated a specific role of the alpha band, with its effects (i.e., already 200 ms before stimulus and in the upper alpha frequency range) separated from the time- and frequency range of the N20 potential of the SEP (i.e., from ~20 ms after stimulus onwards and above ~20 Hz). In addition, we ran the same analysis for the behavioral effect (i.e., perceived stimulus intensity). Also here, pre-stimulus effects were predominantly visible in the alpha band. Of note, there were also strong effects in the beta band. These may be interesting to study further in future studies – in particular, whether they reflect independent physiological processes or rather harmonics of the alpha band. Furthermore, these time-frequency representations suggest that the studied pre-stimulus effects might have been even more pronounced if we had analyzed the data in pre-stimulus time windows from -300 to -10 ms. However, in order to avoid inflating effect sizes by post-hoc data digging (“p-hacking”), we prefer to keep the original, a priori chosen time window for the main analyses of the manuscript. Yet, these onsets of pre-stimulus effects at around -300 ms may be of interest for future work. 
Taken together, these time-frequency analyses further support the notion that the observed relation between pre-stimulus alpha activity and N20 amplitudes is not due to technical issues (such as filter leakage and phase-locking) but rather reflects genuine neurophysiological effects of alpha oscillations on SEPs.

      We have added the time-frequency analysis, as well as the SEP simulation analysis as figure supplements to Figure 2 in our revised manuscript (page 8) since we believe that these control analyses comprehensively show that the observed effects were (a) specific to the alpha band and (b) not due to any data processing-related artifacts.

      It is important to emphasize that the model developed is a post-hoc one, i.e. the authors do not develop already in the discussion various alternative scenario results based on different model predictions. Therefore there is no strong evidence in support of the specific one advanced in the discussion.

      Thank you for raising this issue. Indeed, we cannot prove with the current findings that our proposed physiological model of the relation between alpha oscillations and the SEP is the correct model (or that it is at least the best one out of a selection of possible alternative models). To do so, future studies would be needed that can actually directly measure and/or manipulate differences in membrane potentials and trans-membrane currents. Rather, we aimed with the present study to associate a physiological meaning with the concept of excitability changes in the human EEG – offering a hypothesis that may be worthwhile to be studied (and either confirmed or rejected) in future studies. We have tried to make this motivation more explicit in the Discussion section (page 20, lines 384 ff.): “Also, we would like to emphasize that the presented mechanism reflects a hypothesized model, which shall be further supported or falsified with more targeted studies, for example, directly quantifying membrane potentials and trans-membrane currents in relation to different excitability states in somatosensation.”

    1. Author Response

      However, the link between comparative genomic analysis and identification of specific drugs is not yet sufficiently established and doesn't convincingly demonstrate the usability of the evolutionary pipeline in identifying novel therapeutics.

      We thank the editor for this important comment. As our research is intensively focused on comparative genomics and phylogenetic profiling, we failed to thoroughly detail the concept and rationale of phylogenetic profiling. Though it is certainly not the only approach for identifying gene-gene interactions, phylogenetic profiling is the most critical part of our analysis and was used to establish the MECP2 co-evolved network. This network was the basis for filtering thousands of drugs to identify possible beneficial effects for RTT phenotypes. In this revision, we have edited the text and performed bioinformatic control experiments to demonstrate how comparative analysis was essential to identifying the specific MECP2-linked genes and compounds tested here. We edited the Introduction section to demonstrate the extensive track record of phylogenetic profiling (and clade-based analysis) in accurately predicting gene function and expanding known networks.

      We reanalyze our resulting gene sets with respect to known interactions, showing that while some of our predictions are described in the literature, many of the genes we identify are novel.

      We also describe previously published benchmarking of our approach to determine sensitivity and specificity based on established interaction databases, and reference our newly published paper which includes a complete description of our pipeline.

      Reviewer #1 (Public Review):

      Major Comments/Concerns

      On line 101 - The use of only the longest transcript for each gene could miss important functional sections of the genome. This could create bias against genes with many isoforms and miss exons that do not happen to lie in the longest transcript. How different would the resulting profiles of conservations be if all coding regions or exons of every gene were used?

      We thank the reviewer for this comment and realize that this method was not well described in the original manuscript. We do not use the longest isoform among all isoforms of the gene, but only the longest among the “canonical” isoforms determined by Uniprot (Bateman, 2019), in the rare cases where Uniprot specifies more than one. For all but 29 genes (of over 20,000), Uniprot has a single canonical isoform. For those 29 with multiple canonical isoforms, we choose the longest one. Thus, the “longest isoform” selection affects a tiny number of genes and has very little impact on our results. We have corrected our Methods section (lines 499-505) to describe our method more accurately, and reference our new paper which contains additional details (Tsaban et al., 2021). We apologize for our poor description in the original manuscript and thank the reviewer for bringing this to our attention.

      The reviewer also raises the valid suggestion of evaluating multiple isoforms. This has been investigated, and indeed phylogenetic profiling can benefit from a thorough isoform selection or harmonization scheme. However, such approaches cannot be easily applied to all human genes or cannot be easily scaled to accommodate new genomes (e.g. PALO, IsoSel) (Philippon, Souvane, Brochier-Armanet, & Perrière, 2017; Villanueva-Cañas, Laurie, & Albà, 2013), so we have used our standard approach of the Uniprot canonical isoform here, which we have shown to perform well.

      On line 106 - Does this approach create good specificity to our gene of interest rather than just broad functional similarity? For example, with this approach, are there any major neuronal function genes that have NPP very different from MeCP2? Could authors provide a more objective evaluation to baseline/null?

      We and others have successfully applied phylogenetic profiling to identify functionally related genes in multiple systems and pathways. It does provide good specificity, but “specificity” here has a very particular meaning. When a pathway becomes non-functional in a specific lineage (often due to loss of a key gene in the pathway), then other genes involved in the pathway may lose the fitness advantage they provide to the organism if they are not involved in other important pathways/functions. PP captures these pathway-level loss events, providing a quantitative measurement of functional relatedness and prioritizing those genes minimally involved in the same pathway over pleiotropic genes required in multiple pathways. This property is especially attractive for drug development, where the goal is to target the most specific proteins possible. So it does indeed provide a high degree of specificity, under this particular model. We have significantly expanded our description of this method and its complementarity to other methods, on lines 81-98.

      It is clear that many genes with neuronal function have evolved differently from MECP2. To give one example, we looked at a key pan-neuronal gene, synaptobrevin (VAMP2). VAMP2 is conserved much more widely in evolution than MECP2 and is much more strongly linked to other synaptic proteins, such as RAB3, than to MECP2. The phylogenetic profiles of these two genes are very different: when we look at the top 200 PP genes linked to each of them, the two lists have no genes in common.
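
      The comparison described above can be illustrated with a minimal sketch. It is not the NPP pipeline used in the manuscript: it assumes that each gene is represented by a toy conservation profile across a handful of species, scores relatedness by plain Pearson correlation, and then counts how many genes two top-k lists share. The gene names, the profiles, and the helper functions `pearson`, `top_linked`, and `shared` are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def top_linked(query, profiles, k):
    """Return the k genes whose profiles correlate best with the query gene."""
    scores = {g: pearson(profiles[query], p)
              for g, p in profiles.items() if g != query}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def shared(list_a, list_b):
    """Number of genes present in both lists."""
    return len(set(list_a) & set(list_b))

# Hypothetical presence (1) / absence (0) profiles across six species.
profiles = {
    "GENE_A": [1, 1, 1, 0, 0, 0],
    "GENE_B": [1, 1, 1, 0, 0, 0],  # lost in the same lineages as GENE_A
    "GENE_C": [1, 1, 1, 1, 1, 1],  # conserved everywhere, so uninformative
    "GENE_D": [0, 0, 0, 1, 1, 1],
    "GENE_E": [1, 1, 0, 0, 0, 0],
    "GENE_F": [0, 0, 1, 1, 1, 1],
}

top_a = top_linked("GENE_A", profiles, k=2)
top_d = top_linked("GENE_D", profiles, k=2)
print(top_a, top_d, shared(top_a, top_d))
```

      In this toy setting, two genes with opposite loss patterns end up with top-linked lists that do not overlap, which is the kind of contrast reported for the MECP2 and VAMP2 lists.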

      We performed comparisons to several functional databases to provide comparisons of the 390 MECP2-linked PP genes to baseline. When we analyze these genes with GeneAnalytics (Fuchs et al., 2016), they are most enriched with genes expressed in the brain (p-value < 0.00024), compared to all other tissues. 79 genes are linked to the cerebral cortex and 54 to the cerebellum. So a substantial fraction are expressed in expected tissue types for MECP2-linked genes. (As an aside, this analysis revealed an unexpected enrichment for testis, and there is a known evolutionary link for eutherian-specific genes to be expressed in the testis and brain (Dunwell, Paps, & Holland, 2017).) This new analysis is now Figure 1–figure supplement 1C.

      To investigate another aspect of how the 390 MECP2-linked PP genes compare to baseline, we analyzed them in comparison to the STRING database. In STRING, 1,398 genes are linked to MECP2 in one of three evidence categories: coexpression, experimental, and textmining. 366 of the 390 PP-linked genes are not linked to MECP2 in STRING via any of the three evidence categories, indicating the highly unique nature of PP interactions. So while results of PP do show expected functional properties (as evidenced by the GeneAnalytics enrichments), they are quite orthogonal to other methods for predicting functional interactions.

      Minor Comments/Concerns:

      On line 132 - It seems fair to examine this set of genes first, but I am not sure this approach to filtering in particular moves us further towards finding a therapeutic for Rett. These genes could be all good potential targets, and your subset of focus are just the best ones for current validation.

      We agree with the reviewer's comment; however, in this paper we focus on finding possible drugs through repositioning. The advantage of this approach is that it allows dramatic reduction of drug development time and costs. Other genes could ultimately be even better targets than the genes/proteins that are targeted by known compounds. We have now made available the full gene list as a supplemental table, which could be mined for other potential targets, especially as more genes become druggable over time and present additional opportunities for repurposing. We have also addressed this in the Discussion in lines 420-423.

      Figure 2C could be made with all 390 co-evolved genes to strengthen the argument that chr19p13.2 is an important region for MeCP2s role.

      We thank the reviewer for their suggestion. We have updated the figure to include this representation.

      Figure 3, 4, 5, 6 - Dynamite plots. While the stats tests are great for understanding the impact of different treatments, box plots or jittered dots would be even more clear.

      We agree, and have now produced jitter versions of all barplots in Figures 3-6. We have added an additional replicate for several of the experiments, which very slightly affects p-values. These changes did not affect the significance of any result except for the weak effect of Pacritinib on NF-κB-dependent luciferase activation (Fig. 4E), which is now no longer supported by our data. We have amended the text accordingly on line 361.

      Reviewer #2 (Public Review):

      Strengths:

      Overall, the manuscript is very well written and easy to follow even for people outside the field, and provides insights into an important biological process and identifies much needed therapeutic targets. The authors reproduced various RTT phenotypes in human neural cells with reduced MECP2 expression and demonstrated the ability of the three drugs to rescue the phenotypic profiles. In doing so, the authors were able to shed light on some of the potential mechanisms of action through which these drugs operate. Given that all three drugs have approved safety profiles, with further pre-clinical investigation, these drugs could serve as potential therapeutic agents for Rett Syndrome.

      We appreciate your recognition of the merits of this work.

      Weakness:

      The biggest weakness of the paper is the lack of a strong link between comparative phylogenetic profiling and the identification of potential therapeutic agents. The paper is currently framed as a 'comparative genomic pipeline' to identify novel drug targets, yet the authors didn't demonstrate the robustness of the pipeline using appropriate positive and negative controls. Basic network analyses weren't performed to demonstrate a wide usability of the methodology beyond RTT.

      We thank the reviewer for this comment, which points to the need for better justification for and validation of our method. We have tried to address this concern substantially in the revision, both by improving our discussion of previous work on the phylogenetic profiling method and by providing new bioinformatic validation experiments comparing our MECP2 protein list with other interaction databases, notably STRING. These are detailed in our responses to specific queries below. Importantly, we have now cited two new technical publications from our group describing the NPP method and benchmarking it against other comparative methods using control datasets as you suggest (Bloch et al., 2020; Tsaban et al., 2021). Overall we have extensive experience in developing and applying phylogenetic profiling, including these more methodological papers.

      While the authors do a good job of demonstrating the RTT phenotype-rescuing abilities of the three drugs, they don't exhaustively demonstrate how their comparative evolutionary pipeline was essential for identifying the three drugs. MECP2 forms a complex with HDACs and all three of the drugs selected here have known direct/indirect effects on HDAC activity. It is therefore plausible that the drugs are mediating their effects through HDACs, in which case the comparative genomic pipeline was not required to select these drugs.

      While phylogenetic profiling is highly complementary to other interaction databases (as demonstrated by our previously published benchmarking results including Bloch et al., 2020, as well as the STRING comparison to our PP links for MECP2 above), we could not claim that it is the only path to a given drug or even a given protein. We use phylogenetic profiling because it can be automated and systematically applied for prioritization of candidates, using a prioritization logic that is biologically motivated and orthogonal to other techniques.

      This question, along with others by another referee, indicates that we did a poor job of relaying this in the initial version. In the revision, we better describe the benefits of our approach, both in terms of identifying minimally related genes (lines 81-89), as well as providing an unbiased approach which does not depend on experimental datasets (lines 89-98). We also discuss new methodological publications from our group, which describe the NPP method and benchmark its sensitivity and specificity against other comparative methods using control datasets from the CORUM, REACTOME, and KEGG databases (Bloch et al., 2020; Tsaban et al., 2021), in lines 104-109.

      We thank the reviewer for pointing out HDACs, which play a major role in the MECP2 pathway and provide a good case study. Indeed, MECP2 is linked to a number of HDAC and HDAC-associated proteins in the STRING functional network. While associations have been reported in the literature, we searched for direct HDAC interactions for all class 1 and 2 HDACs in DGIDB, GeneCards and OpenTargets, and found no links to the 3 drugs we identified here. We also used STRING to provide a ranked prioritization of the top 390 genes linked to MECP2 using the STRING score (using 390 genes because that is the same number we used for phylogenetic profiling). While this gene list (S390) did indeed contain all the HDACs and associated proteins, we could establish no links to DMF or EPO using these 390 genes and the same databases and methods we used for the phylogenetic gene list. Pacritinib was identified through the same well-studied gene that we identified through phylogenetic profiling (IRAK1).

      Thus, while there have been studies linking these drugs to effects on HDACs (which we now discuss in our Introduction on lines 72-80 and our Discussion on lines 462-479), these links cannot be easily established based on automated searching of the drug target databases. It is possible that digging into more or different databases would provide these links, but it would also produce more false positives. To our knowledge, while several HDACs have been shown to be impacted by EPO and DMF in neural cells, this has not been described as the primary mode of action for either compound, nor have they been shown to impact HDAC6, which is the best established HDAC in the context of Rett Syndrome. The fact that neither DMF nor EPO has been tested up to this point in a Rett model gives some indirect evidence that they have not been highly prioritized by the Rett research community. However, we completely agree with the referee and now clearly state in the text (lines 477-484) the need for future studies to elucidate the direct mode of action we observe for these drugs in MECP2-depleted neural cells, including mediation by HDACs.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Taken together, the authors provide new mechanistic insights into the interplay of Pak1, Orb6 and Sts5 when fission yeast is grown in medium containing different amounts of glucose.

      1) The claim that absence of Orb6 and Pak1-dependent phosphorylation of Sts5 concentrates Sts5 predominantly to P bodies and causes downregulation of translation of target mRNA would need to be reconsidered. While Figure 3B shows colocalization among some Sts5(2A)+ and Sum2+ puncta, some Sts5(2A)+ Sum2- and Sts5(2A)- Sum2+ puncta are also present. The effect on translation levels should be validated by investigating at least a few other Sts5 mRNA targets (both the levels of mRNA and their protein products should be tested). The model presented in Figure 3F seems to be premature at this stage.

      We extended our results beyond Ssp1 in response to this suggestion. In Figure 4 of the revised manuscript, we report that Cmk2 protein levels are also reduced in the sts5-2A mutant, similarly to Ssp1. The transcript levels of both ssp1 and cmk2 were previously shown to be upregulated in sts5∆ cells (Nuñez et al., 2016), consistent with lower protein levels in the hyperactive sts5-2A mutant. In addition, we performed Nanostring experiments to measure mRNA levels of Sts5-regulated transcripts and observed reduced levels for ssp1, cmk2, psu1, and efc1 in the sts5-2A mutant. These new results are shown in Figures 4C-F as well as Figure 4 Supplement 1. We have also updated the model (now Fig 4G) to reflect reduced Cmk2 levels in sts5-2A mutant cells.

      2) The difference in the rates of Sts5 puncta dissolution in presence or absence of punctate Pak1 (Figure 7C) is modest. The authors may consider investigating the physiological consequence of the observed difference in dissolution rates, which remains unclear. Further, not all Sts5+ puncta contain Pak1 (Figure 4D). It would be good to know whether the authors have considered scoring the dissolution rates of Sts5+ Pak1+ vs. Sts5+ Pak1- puncta separately.

      We agree that comparing the composition and dissolution kinetics of individual granules could reveal interesting differences. However, our analysis relied on fixed cells due to technical difficulties including photobleaching during time-lapse experiments. We attempted to address this question using several different approaches including microfluidics and chambered coverslips, but we were unable to reliably track and measure single granules. We will continue to trouble-shoot these issues going forward, but for the revised manuscript we focused on extending our fixed cell approach as in the new Figure 8D.

      Reviewer #2 (Public Review):

      Specific comments:

      1. It should be noted that observations regarding Orb6 kinase phosphorylation of Serine 86, and the role of this residue in regulating Sts5 granule assembly were previously published (Chen et al., 2019). The role of Pak1 in Sts5 phosphorylation, and the specific residues phosphorylated by Pak1 kinase were also previously identified (Magliozzi et al., 2020). Therefore, the fact that Orb6 and Pak1 phosphorylate distinct residues of Sts5 or that Orb6 regulates Sts5 IDR is not a novel discovery in this paper (as mentioned repeatedly in lines 83-84, 99-100, 158-161, 177-78, 186-87, etc.). However, the fact that Pak1 function had additive properties to Orb6 kinase in controlling the state of Sts5 granule assembly, in particular under conditions of glucose deprivation, is a novel, interesting expansion of knowledge.

      We have revised the text to focus on the additive nature of this regulatory mechanism as suggested by the reviewer. It is worth noting that Figure 1 of our current manuscript is dedicated to establishing Sts5-S261 as a direct target of Pak1, which was not established in our 2020 publication and therefore represents a novel finding. Based on this novel finding which required in vitro reconstitution and phosphorylation-site mutants (Figure 1), we can now conclude that Pak1 and Orb6 phosphorylate distinct residues. We have attempted to be explicit and clear that Orb6 phosphorylation of Sts5-S86 was discovered by Chen et al., 2019, while stating that our contribution to understanding this additive mechanism relates to Sts5-S261. We hope that our wording appropriately balances credit for these discoveries, e.g. “Recent studies have shown that NDR/LATS kinase Orb6 similarly phosphorylates the Sts5 IDR to regulate its localization to RNP granules (Figure 2A) (Chen et al., 2019; Nuñez et al., 2016). Importantly, Orb6 phosphorylates S86 in the Sts5 IDR (Chen et al., 2019), while we have shown that Pak1 directly phosphorylates S261.”

      1. Importantly, the authors refer to "stress granules" in several titles (line 185, line 225, line 244). These statements do not correspond to the data presented and should be corrected. Stress granules (SG granules) are separate membraneless organelles that contain specific factors, differentiating them from P-bodies. The marker Sum2, which is used in the paper, does not identify stress granules, but rather it colocalizes with Dcp1, a component of P-bodies. Therefore, data presented here supports the role of Pak1 in the control of P-bodies, not stress granules.

      Thanks for this clarification. Based on this comment, we performed a series of additional colocalization experiments using a stress granule marker Pabp-mRFP. During glucose starvation, Sts5, Pak1, and Sum2 all strongly colocalized with Pabp (new Fig 5 Supp 2). Based on this result, we state in the revised text: “We refer to Pak1 and Sts5 localization to stress granules during glucose deprivation due to colocalization with Pabp, but we note that this result formally supports association with the overlapping stress granule and P body structures.”

      1. The 2% glucose control cells used for experiments shown in Figure 4 (Figure 4, Supplement 1) appear very stressed. P-bodies (as visualized by Sum2) do not condense in healthy, exponentially growing cells. While this effect does not invalidate the results of the experiments shown in Figure 4, it would need to be corrected.

      These cells were cultured in rich media and displayed no defects in growth rate or other proxies for cell health. Previous studies have shown that P bodies are constitutive structures that can be observed in exponentially growing fission yeast cells (Nilsson and Sunnerhagen, RNA, 2011; Wang et al., Mol Cell Biol, 2013; Wang et al., RNA, 2017) as well as budding yeast cells (Nissan and Parker, Methods in Enzymology, 2008). Consistent with these previous studies, we routinely observe P bodies in unstressed cells. These P bodies then become significantly enhanced upon stress such as glucose starvation.

      1. Extending the quantification in Figure 7B, to include numbers of Sts5 granules (not only overall fluorescence intensity) would give a more precise assessment of the effects of Pak1 removal on Sts5 granule disassembly.

      Thanks for this helpful suggestion. We performed this experiment and added the quantification in the new Figure 8D.

    1. Author Response

      Reviewer # 1 stated that “…some of the conclusions are premature, especially concerning the mechanism of action of the Ton complex in the catalyzed transport of vitB12. While the data show clear differences between the apo and vitB12 bound states of BtuB, the conclusions on the actual transport mechanism of vitB12 into the periplasm are more speculative.”

      We agree that these conclusions regarding the mechanism of action of TonB are more speculative. We have eliminated this part of the discussion and rewritten two paragraphs towards the end of the discussion.

      The main concern of this reviewer is the conclusions reached from the data obtained with the R14A and/or D316A mutants. There is a clear dramatic change of conformation for the SB3 loop for these mutants upon substrate binding, and as shown the natural environment of BtuB is important to detect these changes. However, the authors state that "breaking the ionic lock and eliminate the electrostatic interactions of R14, as we have done here, should mimic the TonB bound state" (lines 487-488). The data presented in the manuscript do not support this statement as they do not allow to monitor the structural state of the N-terminal TonB box.

      The reviewer is correct that we do not know the precise state of the Ton box in this in-vivo system. All we can say is that one outcome of TonB binding would be to eliminate the electrostatic interactions that we have eliminated or reduced by mutation. In the present experiment, BtuB is in excess (by maybe a factor of 20 to 50) over TonB; as a result, populating a TonB bound state of BtuB to levels sufficient for spectroscopic measurements may not be possible (without modifying the system). We have removed the statement and re-written the paragraph in the discussion.

      Later it is proposed that "the movement of SB3 may also drive the movement of substrate" (lines 466-467) and that "this structural change may be sufficient to move the substrate into the periplasm" (lines 501-502). This is highly speculative, as in the structural states observed with the broken ionic lock, it cannot be determined if vitB12 is still bound to BtuB, or released in the periplasm. As noted, the "conversion of the transporter to the apo state does not occur under the conditions of the experiment" (lines 398-399). It is possible that this structural state is locked with a vitB12 bound and unable to complete the transport cycle.

      The reviewer is correct to state that B12 may be in a locked state. However, the fact that the transporter does not appear to cycle back to its apo state may have more to do with the high ratio of BtuB to TonB and the slowness of the transport system. But the reviewer’s point is valid, and until we know more about how these mutants alter transport and can perform experiments to track the position of the substrate, the statement is speculative. We have removed it from the discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] The manuscript is excellently written and discusses the simulation results clearly and succinctly. The resolution of the simulations is very impressive and yields unprecedented insight into the effect of merozoite shape on alignment dynamics, which has important implications for how effectively the parasite can survive and multiply. The conclusions reached by the authors are certainly justified by the simulation data. In particular, the authors are careful not to draw conclusions beyond the limits of their study, and acknowledge other factors which may influence the merozoite shape, such as internal structural constraints and the energy of invasion following successful alignment.

      We thank the reviewer for a thorough reading of our manuscript and the very positive judgement.

      Regarding weaknesses of the manuscript, some of the explanations of the trends observed in the simulation data could be expanded slightly, to help gain a deeper understanding of the competition between adhesion and RBC deformability underlying the alignment dynamics. These are described in more detail below.

      1. Line 114 and lines 120-129: The discussion here of the trends observed in Figure 1 (including why the LE shape has a larger energy compared to the OB shape despite having a smaller adhesion area) is somewhat vague and should be developed further. For example, currently there is only a video showing the egg-like shape and a second video comparing the LE shape to a spherical shape - it would be helpful to have a further video comparing the LE and OB shapes and the different RBC deformations they cause. Moreover, the explanation of the energy/mobility of each shape in terms of curvatures (e.g. the OB shape having "lower curvature at its flat side") could be made more precise. I would expect that the adhesion area depends on how close the principal curvatures of the merozoite surface are to being equal and opposite to the natural curvatures of the RBC, since this determines the bending energy associated with wrapping the merozoite and forming short bonds. This would explain why the spherical shape is most mobile (its principal curvatures are constant so there is no region where at least one is relatively small), and why alignment is most likely to occur in the dimple of the RBC where the membrane is naturally concave-outward. For a given adhesion area, the deformation energy should depend on the difference in principal curvatures in the contact region, with a larger difference causing more bending of the RBC membrane. This difference is larger for the LE shape, since one principal curvature remains large at each point on the surface, compared to the OB shape whose principal curvatures are both small on the 'flat side' where contact is most likely to occur.

      We have expanded the discussion of these results to make it clearer. Furthermore, we generated a new video to visualize the differences between the shapes.

      1. Lines 175-176: Given that the ratio A_m/A_s (adhesion area to total surface area) plays a key role in the probability of alignment, the authors should be more quantitative at this point. How does the ratio A_m/A_s (as measured directly, or indirectly e.g. by the area under the probability distributions inside the alignment region in figures 3a,b) scale with the system parameters, such as the adhesion strength and the off-rate k_off? Can it be estimated from an energy balance between RBC bending/stretching and the average adhesion energy?

      A change in A_m as a function of adhesion strength can be estimated analytically for a sphere, as was done in Hillringhaus et al. Biophys. J. 117:1202, 2019. For small deformations, there is essentially a competition between bending and adhesion energies, while for strong adhesion, the stretching-elasticity contribution becomes important. We have included this theoretical result in the manuscript and discuss its implications.

      1. Line 197-198 and Figure 4c: Why is the deformation energy associated with the OB shape much lower than all other shapes for values of k_off/k_on^{long} smaller than 2?

      For k_off/k_on^{long} < 2, the magnitude of local curvature has a pronounced effect. For the OB shape, a large adhesion area is formed over the area with very low curvature, and close to the rim where the curvature is large, the adhesion strength may not be strong enough to induce membrane wrapping and deformation. For other shapes, the adhesion strength is large enough to lead to partial wrapping of the parasite by the membrane over moderate curvatures. As a result, the integrated deformation energy is significantly lower for the OB shape than for the other shapes in this regime of adhesion strengths. We have added this clarification to the manuscript.

      1. Alignment requires that the distance between the merozoite apex and RBC membrane is very small, and the alignment criteria necessitate examining small changes in the apex angle \theta from \pi. Can the authors comment on how sensitive are the results to the numerical discretisation used?

      The discretization length does affect the tightness of the alignment criteria. In our simulations, the average discretization length of the RBC membrane is about l_0 = 0.2 µm. The half-circumference length of a parasite (corresponding to angle \pi) is \pi R, which is equal to about 12 l_0 for R = 0.75 µm, such that our angle resolution with respect to the parasite size is about 0.1\pi. Therefore, we use 0.2\pi for the alignment criteria, which is large enough to avoid strong discretization effects. Simulations with a finer discretization are possible, but they become very expensive computationally.
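
      The arithmetic behind this estimate can be checked with a short sketch. It assumes that both lengths are in micrometres and that the half circumference spanning an angle of \pi equals \pi R; these readings are inferred from the quoted factor of about 12 and are not stated explicitly in the response. The variable names are illustrative only.

```python
import math

l0 = 0.2    # assumed mean RBC-membrane discretization length, in micrometres
R = 0.75    # assumed parasite radius, in micrometres

half_circumference = math.pi * R        # arc length spanning an angle of pi
n_edges = half_circumference / l0       # membrane edges that fit along that arc
angle_resolution = math.pi / n_edges    # radians per edge, identical to l0 / R

print(f"edges along the half circumference: {n_edges:.1f}")
print(f"angle resolution in units of pi: {angle_resolution / math.pi:.3f}")
```

      Under these assumptions the half circumference spans roughly 12 discretization lengths, so one edge corresponds to slightly less than 0.1\pi, and an alignment tolerance of 0.2\pi covers about two edges.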

      Reviewer #2 (Public Review):

      [...] A major strength of the results is that it investigates an unstudied problem in malarial pathogenesis. The results pertaining to adhesion strength may be informative for preventing the organism from invading red blood cells. A primary weakness is that there is too little detail provided in the methods for this reviewer to adequately assess the computational method. Secondly, the results are somewhat inconclusive. While the egg-shape performs better than certain other shapes, there is no clear final understanding why this shape is preferred over the spherical or short ellipsoidal shapes. However, this possibly provides some clues as to why a certain malarial species does actively adopt a spherical shape during red blood cell binding and invasion.

      We thank the reviewer for a positive judgment of our manuscript. We have significantly expanded the methods section, so it should contain now all necessary simulation details. We agree with the reviewer that the conclusions about shape advantages/disadvantages are equivocal to some extent, but this is exactly what our simulation data show. However, from our data it is clear that the two shapes (i.e. egg-like and sphere) stand out, and they also correspond to real examples of merozoite shapes. As the reviewer points out, we do discuss some clues for the importance of parasite shape in the alignment process.

      Overall, the authors achieved their aims by quantitatively assessing the effect of parasite shape and adhesion strength on cell alignment, which is a proxy for invasion. The discussion at the end of the manuscript provides an accurate evaluation of the results that puts them into the context of invasion. While to some extent the results presented here are inconclusive, I do think that this paper achieves an important goal for its field. This is an understudied area pertinent to a major disease. This manuscript has the potential to bring questions of the biophysics of malarial invasion out to the broader community, specifically introducing these questions to biophysicists as well as microbiologists. Furthermore, the results naturally lead to new questions. If the spherical and egg shapes do not confer a strong advantage, then these specific shapes must also play a role in other processes. The authors do suggest some possibilities in the Discussion. That there remain interesting questions is a great spur for future work.

      Thank you for emphasizing the importance of multidisciplinarity. We also hope that our work will ignite interest in different communities, as only a multidisciplinary effort can bring us much closer to an understanding of parasite alignment and invasion, which clearly involve a combination of different mechanical and biochemical processes.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] Their studies were complemented by transcriptomics and metabolomics and these results support the general conclusions that pollen contains diverse carbon sources which could be used in complementary ways by the different species, which have diverse metabolic capabilities encoded in their genomes.

      Reply: We thank the reviewer for the positive assessment of our manuscript.

      One of the points that was not completely explored in the paper is what happens in the simplified diet both in vitro and in the Bee gut. They propose in the discussion that in the presence of few and simple carbon sources (sugars) there is competition for nutrients and competitive exclusion is driving loss of some species. But this is not fully addressed in the paper.

      Reply: All four species can colonize the gut individually and grow on their own in axenic cultures when provided with the simple sugars or the pollen as the only carbon source. When cultured together, all four strains are stably maintained in the presence of pollen. However, three of the four strains steadily decrease in abundance in the simple sugars. These findings are, in our opinion, consistent with the consumer-resource model (more resources = more species that can coexist) and the competitive exclusion principle, which predicts that if two or more strains compete for the same nutrients they will not be able to coexist. We have added a corresponding section on lines 423-425.
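
      The reasoning in this reply can be illustrated with a toy consumer-resource simulation. This is a minimal sketch and not an analysis from the manuscript: it assumes two hypothetical species (rather than four), two abstract resources, a simple forward-Euler integration, and arbitrary uptake, dilution, and mortality parameters chosen only for illustration.

```python
def simulate(supply, t_end=200.0, dt=0.01):
    """Forward-Euler integration of a two-species, two-resource toy model.

    supply[j] is the constant inflow of resource j. Returns final species abundances.
    """
    uptake = [[1.0, 0.2],   # species 0 prefers resource 0
              [0.2, 1.0]]   # species 1 prefers resource 1
    dilution, mortality = 0.1, 0.5
    resources = [1.0, 1.0]
    species = [0.1, 0.1]
    for _ in range(int(t_end / dt)):
        # Per-capita growth: resource uptake minus a constant loss term.
        growth = [sum(uptake[i][j] * resources[j] for j in range(2)) - mortality
                  for i in range(2)]
        # Resource change: inflow, dilution, and consumption by both species.
        d_res = [supply[j] - dilution * resources[j]
                 - resources[j] * sum(uptake[i][j] * species[i] for i in range(2))
                 for j in range(2)]
        species = [species[i] + dt * species[i] * growth[i] for i in range(2)]
        resources = [resources[j] + dt * d_res[j] for j in range(2)]
    return species

complex_diet = simulate(supply=[1.0, 1.0])  # both resources supplied
simple_diet = simulate(supply=[1.0, 0.0])   # only one resource supplied
print("complex diet:", complex_diet)
print("simple diet:", simple_diet)
```

      With both resources supplied, each species persists on the resource it uses best. When only one resource is supplied, the species that uses it less efficiently declines towards zero, which is the competitive-exclusion outcome the reply refers to.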

      The system they use (with 4 closely related bacterial species) is a simplified system. Therefore, it is not clear if the same general findings will hold in more complex systems. But the results supporting that nutrient complexity (in diet) and metabolic diversity (from the microbial side) are key factors to enable co-existence and persistence of complex microbiota communities are strong and likely generalizable. Although, it is possible that with other communities and other hosts other factors will also come into play. Nonetheless, the current study is important because it sets a good example for how these questions can be addressed to study more complex systems.

      Reply: It is true that bacterial coexistence does not necessarily need to be dependent on the nutrient complexity and that in other communities the host, the structure of the environment, or cross-feeding activities may play a more important role. We have discussed this point in the revised manuscript starting on line 423 and on line 427.

      Overall, the study described here is complete, and rigorous, except for a few points that still need to be addressed and clarified. Namely, it would be interesting to understand what drives exclusion of some members of the community in the simplified diet.

      Reply: See our reply above.

      Importantly, the current study opens the door for new studies (including in vitro studies) on the identification of network interactions that are important for Microbe-Microbe interactions that enable co-existence in other systems. Additionally, this study also highlights the importance of identifying the relevant nutritional (and metabolic) conditions for addressing those questions given the importance of the metabolic context in shaping microbe-microbe interactions.

      Reply: Thank you. We agree.

      Reviewer #2 (Public Review):

      [...] Strengths: The use of community profiling, transcriptomics, and metabolomics adds depth, as does the comparison of defined culture conditions to the host environment. The main conclusion drawn by the authors is that the presence of pollen is necessary for gut species to coexist, and that the different species, although closely related, respond in distinct ways to nutrients in pollen and consume different profiles of nutrients from pollen.

      Reply: We thank the reviewer for the positive feedback and the many valuable comments which helped us to further strengthen our manuscript.

      Weaknesses: The main weakness I see with this work is the choice of in vitro comparison conditions. The strains are cultured either on pollen or sugar water, whereas in vivo bees are fed a diet of pollen and sugar water, or only sugar water. A direct comparison is possible between the strains grown on sugar water in vitro or in vivo, but I think that in several places, the authors may have to reconsider or modify their interpretations comparing in vitro culture on pollen/pollen extract with the in vivo growth of the community on pollen and sugar water. Because there is sugar in the bee diet, differences in assembly dynamics, transcription, or metabolite consumption between pollen-containing culture conditions and the bee gut might stem from the dietary intake of sugar, or from an aspect of the host environment.

      Reply: We agree with the reviewer that the nutrient conditions that were used in vitro and in vivo are not identical and may have impacted the relative abundance of some of the community members, the transcriptional profiles, or the metabolite changes. Nevertheless, we believe that our experimental design is valid to test the main hypothesis of our study, i.e. that a complex, pollen-based diet facilitates coexistence, while simple sugars lead to the dominance of a single strain independent of the environment (culture tube versus host). An important point to consider here is that bees pre-digest the consumed pollen and partially absorb dietary nutrients such as amino acids, glucose, and fructose before they reach the bacteria in the hindgut. Consequently, the in vivo and in vitro conditions would never be the same even if we had used identical nutrients in our treatments. Also, pollen by itself contains glucose, fructose, and sucrose. So, although we did not add glucose to the in vitro pollen condition, this simple sugar was present in the corresponding condition. We have added a corresponding section in the discussion on lines 402-422. That said, while we cannot recapitulate the exact same nutritional conditions in vitro, we still think that our main conclusion holds, namely that we can recapitulate the pollen-dependent coexistence found in vivo.

      Reviewer #3 (Public Review):

      [...] Overall, the paper is strong and the arguments and conclusions put forth are well supported by the data. I only have a few suggestions:

      Reply: We thank the reviewer for the positive evaluation of our manuscript.

      1) The study focuses on one strain each of the 4 Firm-5 species; however, there is diversity within each species. This is only briefly mentioned in the paper at the very end, and I think the authors should address this a bit more directly. In particular, they have previously generated a large amount of genomic data from some of these other strains, so it is likely possible to infer or speculate, based on this data, whether they expect different strains within each species to utilize similar nutrients. Also, I'm wondering if the authors can comment on how their findings could extend to the related bumble bee gut microbiome. Such a discussion would help enhance the applicability and importance of this study.

      Reply: We agree that the large amount of strain-level diversity within a given species is an important point to consider. However, we would prefer not to expand on this point much further, as it would require a relatively complex genomic analysis. Also, considering that many of the strain-specific transcriptional changes are in genes shared with the other species, we are not sure how much such an analysis would reveal. In any case, we plan to compare the coexistence between strains from the same versus different lineages in a follow-up study.

      As for the bumble bees, we currently do not know how many strains or species of Lactobacillus Firm5 can coexist in bumble bees. Therefore, we feel that a discussion extending to bumble bees would be too speculative. However, we included a sentence in the discussion which states that since pollen facilitates coexistence, it follows that dietary differences are likely to influence the diversity of Lactobacillus Firm5, and which gives the example of the Asian honey bee, which seems to harbor only one species of this phylotype. See lines 479-488.

      2) It is interesting that different species ended up dominating in the in vivo vs. in vitro simple sugar-based communities. What do the authors think may be behind this difference?

      Reply: This is indeed an interesting point. We did not use the same sugars in vivo (sucrose) and in vitro (glucose). Moreover, the nutritional and physicochemical conditions in the hindgut are likely different from those found in a culture tube. We have mentioned that these are potential reasons for the observed differences in the relative abundance of different community members between in vivo and in vitro conditions on lines 402-422 of the manuscript.

      3) Since the observed coexistence of these gut microbes is largely due to nutritional niche partitioning, it would be helpful if the authors can comment on the natural variation of key pollen derived metabolites, and if/how we could expect ecological variation in the bee microbiome due to plant pollen availability based on biogeography and seasonality.

      Reply: We agree and have included a corresponding sentence in the discussion on line 479. See also our reply to point 1.

      4) The supplementary information is nicely documented and accessible, but I think it would be even more useful if genome-wide data for the RNA-seq results, not just for select genes, are made available. Furthermore, I suggest including descriptive titles and labels within the supplementary Excel files, as there are many separate sheets and it is not always clear what each one shows.

      Reply: This has been included in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Key issues:

      • The main claim of the versatility of Image3C comes from the idea that it can extract image features even without reagents such as antibodies. The authors seem to have omitted a large body of work in the field of label-free imaging. There are many optical or computational methods to obtain useful cellular features without any chemical labels.

      We agree with the reviewer that there are many label-free imaging tools already published. We listed several of them in the new table (Table 1) where we compare label-free phenotyping and cell clustering approaches. We took into consideration on which samples the tool was tested, the need of prior knowledge of the sample and/or species-specific reagents at any point of the process, and the hardware and software required. As far as we know, our tool is the only one that does not require a priori knowledge of the sample and/or species-specific reagents at any point of the pipeline or for training the neural network. With this work, we provide the community with a tool that is able to perform de novo clustering and that does not require custom-made pieces of equipment (lines 137-142 and 618-627).

      • One core issue is that the entire pipeline is strongly dependent on the use of the ImageStream Mark II system. In particular, the cell image feature extraction step is performed by the IDEAS software that is associated with the imaging system. This limits the general applicability of the Image3C tool.

      We agree that the ImageStream is necessary for acquiring the images. This piece of equipment is already present in many Institutes/Departments and we hope that in coming years researchers will have increasing access to this type of equipment. We selected the ImageStream for our study because of the high reproducibility of image acquisition, which allows comparison between multiple experiments performed on different days. Image3C could also be applied to microscopy images, but, in this case, users will have to control for batch effects if images are acquired from different slides or on different days. This is now discussed in the text (lines 137-142).

      • Image3C actually contains many separate pieces of computer code. The R code is available as many separate R scripts, and dependent R packages had to be installed manually prior to running the scripts. The clustering step requires a different Java tool called Vortex, which was developed by another group. The clustering results are then analysed by an R script. For data exploration, we also need a different tool called FCS Expression Plus. The setup of the CNN classifier requires another set of Python scripts. To ensure the Image3C pipeline is indeed as robust as it claims, the authors could consider converting their code into R and Python packages, with streamlined installation instructions, and example code for users to test.

      We thank the reviewer for this comment. To make our tool more accessible, we have: 1) thoroughly revised the readme document in the GitHub page, making it clearer and more detailed; 2) added tutorial videos to the GitHub page to show how to use some of the key software mentioned in the manuscript; 3) saved the R code as a markdown page to make it more user friendly; 4) renamed the example files that can be used to test our pipeline to make them more self-explanatory; 5) saved the Python code in the more user friendly Jupyter Notebooks; and 6) transformed Figure 1-supplement figure 1 into an interactive map where it is possible to click on the different steps of the Image3C pipeline and be automatically directed to the corresponding section of our GitHub repository page (lines 662-667). We also included in the Material and Methods a more detailed explanation of the rationale behind our choices of software and algorithms (lines 618-627 and 650-654).

      • In the software pipeline, cell phenotype feature extraction and clustering are performed by other existing tools. The other main new contribution appears to be training of a CNN model to perform cell-type classification, in which the cell type labels were defined by clusters identified in an unsupervised analysis. From a technical point of view, there is little novelty as all the methods are quite standard.

      We agree that some of the individual components were already available. However, when applying existing tools to our datasets, we did not obtain satisfactory and reproducible results. We then developed original R code for clustering events based on the features extracted from IDEAS, and we created a pipeline that allows users to cluster cells in a given sample de novo without the need for cell-type markers or a priori knowledge of the tissue composition. Our pipeline allows users to take full advantage of such tools to characterize new cell populations coming from less established research organisms.

      • The authors claim that the CNN classifier was trained in an 'unsupervised' way. I have a strong reservation about this claim. In machine learning, classification by definition is supervised. The fact that cells are labelled by the cluster label does not make it an unsupervised task with respect to the classification task. I think what the authors really mean is that the cell types (class labels) do not have to be known prior to analysing the sample.

      We thank the reviewer for pointing this out and we apologize for the mistake. The reviewer is correct, and we did not use the word “unsupervised” in the appropriate way. In the text, we rephrased all the sentences in which the CNN was mistakenly defined as “unsupervised”.
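
To make the distinction concrete, here is a minimal sketch, not the Image3C code: an unsupervised step (k-means) assigns cluster labels without any prior knowledge, and a supervised step then learns to predict those labels. A 1-nearest-neighbour classifier stands in for the CNN, and all function names and toy feature values are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Unsupervised step: plain k-means, seeded with the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def train_classifier(points, labels):
    """Supervised step: a 1-nearest-neighbour classifier (stand-in for the CNN)."""
    def predict(x):
        closest = min(range(len(points)), key=lambda j: math.dist(x, points[j]))
        return labels[closest]
    return predict


# Toy "cell features" (hypothetical values): two well-separated groups.
features = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.2, 4.9), (0.1, 0.3), (4.8, 5.1)]
cluster_labels = kmeans(features, k=2)                  # no labels needed here
classify = train_classifier(features, cluster_labels)   # labels are required here
```

The classifier cannot be trained without `cluster_labels`, which is why the classification step counts as supervised even though the labels themselves were discovered de novo.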

    1. Author Response

      We are glad to know others in the field find our approach and results valuable and appreciate the positive feedback.

      Regarding protein stability, it was not our intention to determine the mechanism by which amino acid substitutions alter LasI/R activity. Indeed, sensitivity and selectivity could be altered for a number of reasons, including changes to protein stability. This is especially true for LasR. For AHL receptors, signal binding and protein stability are intricately linked. AHL receptors are usually unstable/insoluble in the absence of bound signal, and changes in signal affinity frequently result in changes to receptor solubility. We have added a discussion of this to the text (lines 186-219, 252-254). We have also conducted experiments to measure LasR solubility in a subset of our variants (new Figure 4–figure supplement 2; main text lines 219-224).

      Reviewer #2 raises an interesting question about the consequences of LasI/R variants on P. aeruginosa sociality. We believe reporter assays are the most appropriate method by which to directly study LasR selectivity and sensitivity. Whether these changes to LasI/R sensitivity and selectivity result in altered sociality is a very intriguing, but separate, question and the subject of planned future research.

    1. Author Response

      Reviewer #1 (Public Review):

      We would like to thank the reviewer for commenting on the strength of the approach. We addressed the question regarding the effect of Ca2+ and Mg2+ on the association of MICUs with MCUcx in Essential Revisions points 1 and 2 (below).

      It remains an open question why MICU1-/- mitochondria have increased mitochondrial Ca2+ uptake in the low [Ca2+]i range. The answer to this question may not be straightforward since MICUs might have effects outside the MCUcx. Extensive future studies will be required to answer this question definitively.

      Reviewer #2 (Public Review):

      We would like to thank the reviewer for the encouraging comments. Potential biphasic effect of Ca2+ on MICUs was addressed in Essential Revisions point 1 (below).

      Reviewer #3 (Public Review):

      Thank you very much for commenting on the strengths of the manuscript. We agree that the data presented will foster further experiments and discussions regarding MICUs and will eventually reveal their true mechanism of action.

      Essential Revisions 1)

      The reviewers raise an important question: why is there an apparent discrepancy between the findings of the two methodological approaches? Specifically, why do measurements of mitochondrial Ca2+ uptake detect no Ca2+ influx at low levels of cytosolic [Ca2+], while electrophysiological measurements find no evidence for occlusion of MCUcx conduction by MICUs? Our short answer is that measurements of mitochondrial Ca2+ uptake do not exclusively measure conductance by the MCUcx, whereas the electrophysiological recordings presented here do. We examined MCU currents, a direct measurement, and provide new evidence that could partially explain why the net uptake (uptake minus efflux) is low. In newly added experiments (Figure 7) we now show that the conduction pathway of MCUcx is not plugged by MICU1, even at physiological levels of [Mg2+]. Instead, Mg2+ strongly blocks the selectivity filter of the pore. In addition, the unitary conductance of an open MCUcx channel at 100 nM [Ca2+] is extremely low. The MCUcx flux is thus limited not only because of the low concentration of the conducting ion but also because of Mg2+ block. This allows the Ca2+ efflux machinery to effectively compete with MCUcx-mediated Ca2+ uptake at low cytosolic [Ca2+] and to reduce net Ca2+ accumulation to nearly zero.

      “Why does knocking out MICU1 tend to increase the net mitochondrial Ca2+ uptake at low levels of [Ca2+]?” This is a question that our study does not directly examine. However, we can speculate -- based on other recent studies (see Gottschalk et al., 2019; Tomar et al., 2019; Tufi et al., 2019) -- that knocking out MICU1 affects multiple mitochondrial systems and not only the MCUcx. Here, we investigated the specific role that MICU1 plays as part of the channel complex, and we did this by directly examining the MCUcx current. We see no evidence that supports the hypothesis that MICU1 acts as a plug of the pore. Instead, our work shows that at elevated levels of [Ca2+], Ca2+ binding to the EF hands of MICU1 works to double the open probability of MCUcx. The binding of Ca2+ to the EF hands of MICUs, or any consequential effects of this binding, occur at levels of [Ca2+] that are higher than 100 nM. Indeed, the available titration data demonstrate that the affinity of the MICUs for Ca2+ is not high enough for EF hand occupancy to occur at 100 nM. Even for MICU1, which has the highest affinity for Ca2+ of all MICUs, no significant binding occurs at 100 nM (resting cytosolic Ca2+), and complete saturation of binding would only occur around 3-6 μM (Kamer et al., 2017) (Figs. 1E, 1G, 2F, 2H, 2J and 3H). Thus, the quantitative evidence also suggests that Ca2+ occupancy of the EF hands at resting cytosolic Ca2+ is too low to explain any profound occlusion.

      We agree with the reviewers' comment that Wang et al 2020b showed the possibility of the occlusion in the native dimeric form of the MCUcx. However, the occluded dimeric complex represented only ~10% of the total number of analyzed particles in the absence of Ca2+. The low prevalence of the dimeric complexes may be purely due to experimental limitations, but, regardless, a more thorough analysis of occlusion in this native form of MCUcx is needed. Also, such structural analysis should be performed in the presence of physiological concentrations of Mg2+, whereas so far Mg2+ has been absent in all structural studies of the MCUcx holocomplex. We have rewritten the paragraph discussing MCUcx structures to reflect these changes.

      Although the occlusion model appears to be consistent with some of the mutations that disrupt electrostatic interactions in the plug structure, all such studies were performed using indirect assessment of MCU function. Such mutations can cause MICU loss-of-function effects (that are not necessarily loss of occlusion) and could lead to activation of compensatory mechanisms that create an appearance of “the loss of the threshold”, similar to that observed in MICU1 knockout. To interpret the structures correctly, reliable direct functional data are much preferred, as has always been the case in the ion channel field.

      Essential Revisions 2)

      We have added a set of new electrophysiological experiments to address the effect of Mg2+ on Ca2+ currents. These experiments examine how Mg2+ affects the Ca2+ conduction through MCUcx. There are two main conclusions. First, Mg2+ interacts with the selectivity filter within the pore of MCUcx to occlude Ca2+ permeation. This effect is completely MICU-independent. Second, the Ca2+-dependent potentiating effect of MICUs on ICa does not depend on Mg2+.

      We would also like to reemphasize that in this study we did not center our investigation on the net mitochondrial Ca2+ uptake, but focused specifically on MCUcx activity. The increase in net mitochondrial Ca2+ uptake in MICU1-/- vs WT was observed both in the presence (Csordas et al., 2013) and in the absence of Mg2+ (Mallilankaraman et al., 2012). Thus, the putative occlusion of the MCU pore by MICUs, if it exists, would be a Mg2+-independent phenomenon.

      With these new results, we will revise the results and the discussion sections. We will highlight the impact that physiological Mg2+ block has in limiting MCUcx flux at low Ca2+. We will also emphasize the importance of reevaluating the structure of the MCU holocomplex in the Mg2+-bound conformation. We thank the reviewers for prompting this addition.

      Per reviewers’ request, we will clearly indicate in the text that the use of EDTA removes not only Ca2+ but also Mg2+.

    1. Author Response:

      Reviewer #1:

      By sequencing a large number of SARS-CoV-2 samples in duplicate and to high depth, the authors provide a detailed picture of the mutational processes that shape within-host diversity and go on to generate diversity at the global level.

      1) Please add a description of the sequencing methods and how exactly the samples were replicated (two swabs? two RNA extractions? two RT-PCRs?). Have any limiting dilutions been done to quantify the relationship between RNA template input and CT values? Also, the read mapping/assembly pipeline needs to be described.

      Limiting dilutions were not performed; however, the association between Ct and discordance between replicates was explored. Samples with Ct>=24 were found to have considerable discordance between replicates, likely resulting from a low number of input RNA molecules. This is described in the first section of the results and illustrated in Figure 1 - figure supplement 3.

      We have now added additional sections to the methods to better describe the sequencing and mapping pipelines.

      Sequencing: A single swab was taken for each sample. Two libraries were then generated from two aliquots of each sample with separate reverse transcription (RT), PCR amplification and library preparation steps in order to evaluate the quality and reproducibility of within-host variant calls. The ARTIC protocol v3 was used for library preparation (a full description of the protocol used is available at dx.doi.org/10.17504/protocols.io.be3wjgpe).

      Alignment and variant calling: Alignment was performed using the ARTIC Illumina nextflow pipeline available from https://github.com/connor-lab/ncov2019-artic-nf...

      2) I find the way variants are reported rather unintuitive. Within-host variation is best characterized as minor variants relative to consensus (or first sample consensus when there are multiple samples). Reporting "Major Variants" along with minor variants conflates mutations accumulated prior to infection with diversity that arose within the host. The relative contributions of these two categories to the graphs in Fig 1 would for example be very different if this study was repeated now. Furthermore, it is unclear whether variants at 90% are reversions at 10% or within-host mutations at 90%. I'd suggest calling variants relative to the sample or patient consensus rather than relative to the reference sequence (as is the norm in most within-host sequencing studies of RNA viruses).

      We are grateful for this comment and have tried to improve and clarify the reporting of variants to align with previous literature.

      Our original classification intended to classify non-reference sites as fixed changes (VAF>95%) or within-host variants (which we called “minor variants”). While we chose 95% as a cutoff (which may have been confusing), the results are analogous with a 99% cutoff, as variants in this set essentially have VAF~100%, and nearly all are expected to have occurred in a previous host. Thus, the previous classification intended to cleanly separate inter-host (fixed) mutations from within-host mutations, to compare their patterns of selection and their mutation spectra.

      Following the reviewer’s request, we have modified this classification to better align with other studies of RNA viruses by defining the majority allele at a site as the “consensus”. We note that the results remain largely similar, since the vast majority of within-host variants identified had low VAFs (<<50%), with the majority/consensus allele most often corresponding to the reference (Wuhan) base.
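
The consensus convention described above can be sketched as follows. This is only an illustration under stated assumptions: the per-site counts dictionary and the function name are hypothetical, the most frequent allele is taken as the consensus, ties are broken by dictionary order, and no statistical error model (such as ShearwaterML) is applied.

```python
def classify_site(counts):
    """Split the alleles observed at one site into consensus and within-host variants.

    counts maps each allele (e.g. "A", "C", "G", "U", "-") to its read count.
    Returns (consensus_allele, {variant_allele: frequency}).
    """
    depth = sum(counts.values())
    if depth == 0:
        return None, {}
    # The most frequent allele is treated as the consensus at this site.
    consensus = max(counts, key=counts.get)
    # Every other observed allele is reported relative to that consensus.
    variants = {
        allele: n / depth
        for allele, n in counts.items()
        if allele != consensus and n > 0
    }
    return consensus, variants


consensus, variants = classify_site({"A": 90, "C": 10, "G": 0, "U": 0, "-": 0})
```

Polarising variants against the sample's own consensus rather than the reference means a site where the reference base is at 10% is reported as a 10% variant, not a 90% one.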

      When considering recurrent mutations we now discuss the number of times variants are observed at each location within a sample. This avoids the issue of how variants are polarised.

      3) It is often unclear how numbers reported in the manuscript depend on various thresholds and parameters of the analysis pipeline. On page 2, for example, the median allele frequency will depend critically on the threshold used to call a variant, while the mean will depend on how variation is polarized. Why not report the mean of p(1-p) and show a cumulative histogram of iSNV frequencies on a log-log scale including. I think most of these analyses should be done without strict lower cut-offs or at least be done as a function of a cut-off. In contrast to analyses of cancer and bacteria, the mutation rates of the virus are on the same order of magnitude as errors introduced by RT-PCR and sequencing. Whether biological or technical variation dominates can be assessed straightforwardly, for example by plotting diversity at 1st, 2nd, and 3rd codon position as a function of the frequency threshold. See for example here:

      https://academic.oup.com/view-large/figure/134188362/vez007f3.tif

      There are more sophisticated ways of doing this, but simpler is better in my mind.

      It would be good to explore how estimates of the mean number of mutations per genome (0.72) depend on the cut-offs used. A more robust estimate might be 2\sum_i p_i(1-p_i) (where p_i is the iSNV frequency at site i) as a measure of the expected number of differences between two randomly chosen genomes. Ideally, the results for viral RNA produced from a plasmid would be subtracted from this.

      The reviewer raises a number of important points that we have tried to address and clarify.

      We think that the quality of our variant calls is supported by several lines of evidence, including: (1) the use of the ShearwaterML calling algorithm, which uses a base-specific overdispersed error model and calls mutations only when read support is statistically above background noise in other genomes, (2) we use two independent replicates from the RT step, (3) we provide several biological signals that cannot be expected to arise from errors, including the fact that the mutation spectra of low VAF iSNVs called in our study recapitulate that of consensus mutations and the clear signal of negative selection acting on iSNVs. We note that this dN/dS analysis is closely related to the suggestion by the reviewer of comparing the frequency of mutations at positions 1/2/3 of a codon.

      To address this comment in the manuscript, we have amended the text to include these arguments and we provide two new supplementary figures: (1) a figure of the frequency of mutations at the three codon positions, as requested by the reviewer, and (2) the mutation spectra of low VAF iSNVs, demonstrating the quality of the mutation calls. Similar to the finding in Dyrdak et al. (2019), and as expected from the dN/dS ratios, the distribution of variant sites is dominated by variants at the third position and is not equally distributed, as one might expect if errors were dominating the signal.
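
The codon-position check discussed above amounts to a simple tally. The sketch below assumes a single coding sequence with hypothetical 1-based coordinates and hypothetical function names; it is not the authors' analysis code and the variant list is made up for illustration.

```python
from collections import Counter


def codon_position(pos, cds_start):
    """Codon position (1, 2 or 3) of a 1-based coordinate inside a CDS."""
    return (pos - cds_start) % 3 + 1


def tally_by_codon_position(variants, cds_start, cds_end, min_freq):
    """Count variants at each codon position above a frequency threshold.

    variants is a list of (position, frequency) pairs; positions outside
    the CDS are ignored.
    """
    counts = Counter({1: 0, 2: 0, 3: 0})
    for pos, freq in variants:
        if cds_start <= pos <= cds_end and freq >= min_freq:
            counts[codon_position(pos, cds_start)] += 1
    return dict(counts)


# Hypothetical variant calls in a CDS spanning positions 100-300.
calls = [(100, 0.02), (101, 0.2), (102, 0.005), (105, 0.3), (400, 0.5)]
permissive = tally_by_codon_position(calls, 100, 300, min_freq=0.01)
strict = tally_by_codon_position(calls, 100, 300, min_freq=0.1)
```

Repeating the tally over a range of `min_freq` values shows whether an excess at third codon positions persists as the threshold is lowered, which is the signature expected of biological rather than technical variation.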

      We have amended the relevant section of the text to read:

      “To reliably detect within-host variants with the ARTIC protocol, we used ShearwaterML, an algorithm designed to detect variants at low allele frequencies. ShearwaterML uses a base-specific overdispersed error model and calls mutations only when read support is statistically above background noise in other genomes \cite{Gerstung2014-av,Martincorena2015-ef} (Methods). Two samples were excluded, as they had an unusually high number of low frequency variants unlikely to be of biological origin, leaving 1,179 samples for analysis, comprising 1,121 infected individuals of whom 49 had multiple samples. For all analyses we used only within-host variants that were statistically supported by both replicates (q-value<0.05 in at least one replicate and p-value<0.01 in the other, Methods). Within each sample, we classified variant calls as `consensus' if they were present in the majority of reads aligned to a position in the reference or as within-host variants otherwise. The allele frequency for each variant was taken as the frequency of the variant in the combined set of reads for both replicates.”

      ...

      “The use of replicates and a base-specific statistical error model for calling within-host diversity reduces the risk of erroneous calls at low allele frequencies. We noticed a slight increase in the number of within-host diversity calls for samples with high Ct values, which may be caused by a small number of errors or by the amplification of rare alleles and that could inflate within-host diversity estimates (Figure 1 - figure supplement 3) \cite{McCrone2016-se}. However, the overall quality of the within-host mutation calls is supported by a number of biological signals. As described in the following sections, this includes the fact that the mutational spectrum of within-host mutations closely resembles that of consensus mutations and inter-host differences and the observation of a clear signal of negative selection from within-host mutations, as demonstrated by dN/dS and by an enrichment of within-host mutations at third codon positions \cite{Dyrdak2019-xk} (Figure 1 - figure supplement 4).”
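
The replicate-support rule and the pooled allele frequency quoted above can be written down compactly. The sketch below is an illustration under stated assumptions, not the authors' code: the dictionary keys and function names are hypothetical, and the p- and q-values are taken as given rather than computed.

```python
def supported_by_both(rep_a, rep_b, q_cut=0.05, p_cut=0.01):
    """True if a variant passes the q-value cut in one replicate and the
    p-value cut in the other (in either order).

    rep_a and rep_b are dicts with keys "p" and "q".
    """
    a_then_b = rep_a["q"] < q_cut and rep_b["p"] < p_cut
    b_then_a = rep_b["q"] < q_cut and rep_a["p"] < p_cut
    return a_then_b or b_then_a


def combined_frequency(alt_a, depth_a, alt_b, depth_b):
    """Allele frequency in the pooled reads of both replicates."""
    total_depth = depth_a + depth_b
    if total_depth == 0:
        return 0.0
    return (alt_a + alt_b) / total_depth
```

For example, a variant with q = 0.01 in one replicate and p = 0.005 in the other would be retained, while one that is significant in only a single replicate would be dropped.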

      Whilst we believe the remaining variant calls are reliable, we acknowledge that how variants are polarised could impact some of the summary statistics reported. To help address this, we have amended Figure 1 to include a cumulative histogram of within-host variant frequencies on a log-log scale, as suggested by the reviewer. We have also included estimates of the mean value of sqrt(p(1-p)) (indicating an estimate of the standard deviation of within-host variants assuming a Bernoulli distribution). We have also replaced the estimates of the mean number of mutations per genome with the expected number of differences between two randomly chosen genomes. The amended Figure 1C now displays a histogram of the expected number of differences between two genomes for each sample rather than the mean number of mutations.
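
For a biallelic site with variant frequency p, two genomes drawn at random differ at that site with probability 2p(1-p), so summing over sites gives the expected number of differences. A minimal sketch of both summary statistics mentioned above, with hypothetical function names and treating every site as biallelic:

```python
import math


def expected_pairwise_differences(freqs):
    """Expected number of differences between two randomly chosen genomes,
    computed as 2 * sum_i p_i * (1 - p_i) over within-host variant sites."""
    return 2 * sum(p * (1 - p) for p in freqs)


def mean_bernoulli_sd(freqs):
    """Mean of sqrt(p * (1 - p)) across sites (0.0 if there are no sites)."""
    if not freqs:
        return 0.0
    return sum(math.sqrt(p * (1 - p)) for p in freqs) / len(freqs)
```

Because p(1-p) is symmetric around 0.5, both quantities are unchanged if a variant at frequency p is instead reported at 1-p, so they do not depend on how variants are polarised.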

      4) This paper provides an important baseline characterization of within-host diversity, while the patterns themselves are not extremely surprising. It is thus important that the data are provided in a form that facilitates reuse. It would be helpful to provide intermediate analysis results in addition to the raw reads in the SRA and the shearwater calls. I would like to see simple csv tables with the number of times A,C,G,U,- was observed at every position in the genomes for every sample. This would greatly facilitate the reuse of the data.

      We have now added raw count tables for each sample and each replicate to the GitHub repository. We have also archived this data using Zenodo to ensure it remains easily accessible.

      Reviewer #2:

      The paper by Tonkin-Hill and colleagues describes the analysis of intra-host variation across a large number of SARS-CoV-2 samples. The authors invested a lot of effort in replicate sequencing, allowing them to focus on more reliable data. They obtained several important insights regarding patterns of mutation and selection in this virus. Overall, this is an excellent paper that adds much novelty to our understanding of intra-host variation that develops during the time course of infection, its impact on transmission, and what we can or cannot learn on relationships between samples.

      We are grateful to the reviewer for their positive comments.

    1. Author Response:

      Reviewer #1:

      The authors report here the results of two experiments. The first concerns the effects of continuous theta burst transcranial magnetic stimulation on single-cell responses in the lateral parietal cortex of the monkey. This experiment is very challenging: it requires obtaining a stable, long-lasting signal from single cortical neurons and repeatedly stimulating the same cell for an hour (the authors also succeeded for longer periods). The paper represents a technical advance in the field, deserves attention, and offers a useful, though difficult, protocol for other scientists to replicate.

      The second experiment tests the behavioral grasp-related effects of two TMS theta burst protocols. Authors demonstrate a long-lasting increase in the grasping time after TMS.

      A negative aspect is that the two experiments are not carried out on the same animals, and the results of the second experiment seem somehow not completely logically connected to the results of the first. Particularly important for the scientific community is the first experiment, that shows that the neural excitability is significantly reduced within the first hour after rTMS. This experiment demonstrated also a variability of effects (hyperexcitation, hypoexcitation, variable delays of recovery), that can be seen as a potential disadvantage in such TMS protocols, and an index of the different effects that may be obtained in the same experiment across subjects.

      Overall, the paper represents a step forward in the neuromodulation experiments in nonhuman primates.

      Reviewer #2:

      Romero and colleagues designed an experiment to describe the neural and behavioral effects of continuous theta burst stimulation with the explicit aim of solving the problem of inter-subject variability of cTBS effects on human behavior. They describe two independent experiments in which cTBS was applied to the inferior parietal lobule of two monkeys per experiment. In the first experiment the authors measure the activity of single units in response to light-on and to single-pulse TMS (spTMS). In the second experiment the authors describe the effect of cTBS on reaching time in a reach-grasp task. The results indicate great variability across single neurons, which follow different patterns of response to cTBS. In the second experiment, the results show a systematic increase in reach-grasp time following cTBS. The authors provide a reasonable description of neuronal activity following their cTBS protocol but do not respond to the main issue of explaining inter-subject variability of human cTBS. The data have the merit of providing neural bases of the "delayed" effects of human cTBS.

      We thank Reviewer #2 for his/her comments and for asking us to provide a possible explanation for the inter-subject variability of human cTBS. This has now been added in the Discussion (page 20). ‘The reproducibility of our results was most likely related to the very controlled conditions in which we applied cTBS in monkeys. Most importantly, the TMS coil was rigidly anchored to the head implant of the animal, so that we kept both the position and the orientation of the coil similar across sessions. However, another possibility is that monkeys become highly overtrained in the grasping task, which may partially explain the similar behavioral effects of cTBS we reported in Merken et al. (2021). It is therefore plausible to assume that the larger variability inherent to human behavior is one reason underlying the variability of cTBS effects in humans, since stimulation is applied over a brain area in subjects at different levels of learning stages and behavioral performance, ultimately impacting on the susceptibility of that brain area to cTBS and increasing inter-individual variability of the technique.’

    1. Author Response:

      Reviewer #1:

      In this paper, Wammes et al. used fMRI to investigate changes in representational similarity of temporally paired images in hippocampal subfields. The stimuli were designed to parametrically vary in their visual similarity so that individual pairs covered the entire range of visual overlap, which was behaviourally validated by a separate sample of participants. The authors compared the neural patterns evoked by these pairs of stimuli before and after participants completed a statistical learning task. The findings showed that pre- to post-learning, representations in the dentate gyrus reconfigured to fit a cubic model, consistent with the non-monotonic plasticity hypothesis (NMPH).

      This is an interesting, novel approach with a clever stimulus manipulation which addresses a gap in the current literature. The study is well-motivated by theory, the analyses are appropriate and clearly described, the implemented controls are carefully designed, and the manuscript is well-written. However, it is unclear whether the same principles necessarily generalize beyond visual similarity, and whether these neural patterns meaningfully relate to behaviour.

      1) The analytic approach is well-designed and the results clearly address the hypotheses. However, it seems like the conclusions might be dependent on this learning paradigm, which should be discussed in a bit more detail and made clearer. The present statistical learning approach is somewhat implicit in its nature and relies on the participants gradually recognizing the temporal links between stimuli. In contrast, in most prior studies cited in the present manuscript, participants were explicitly instructed to make associations between stimuli that either occurred on the screen simultaneously, or relatively far apart in time (i.e., not successively). This top-down influence likely plays an important role. Even beyond experimental paradigms - we often make connections between similar experiences that occurred far apart in time, and cannot always rely on temporal contingencies. The step between previous work and statistical learning needs to be made clearer and more explicit.

      Although our current approach involves a more implicit statistical learning task, the hypothesized non-monotonic plasticity is a general mechanism that has been and can be applied across tasks. We used temporal contingency to create a situation where representations were concurrently active. However, prior work has used other manipulations, such as linking to a shared associate. We have modified and expanded both the Introduction and Conclusion to emphasize this broader context and highlight directions for future work.

      See Introduction (p. 4, lines 60-74): “The NMPH has been put forward as a learning mechanism that applies broadly across tasks in which memories compete, whether they have been linked based on incidental co-occurrence in time or through more intentional associative learning (Ritvo et al., 2019). The NMPH can explain findings of differentiation in diverse paradigms (e.g., linking to a shared associate: Chanales et al., 2017; Favila et al., 2016; Schlichting et al., 2015; Molitor et al., 2020; retrieval practice: Hulbert & Norman, 2015; statistical learning: Kim, Norman, & Turk-Browne, 2017) by positing that these paradigms induced moderate coactivation of competing memories. Likewise, relying on the same parameter of coactivation, the NMPH can explain seemingly contradictory findings showing that shared associates (Collin et al., 2015; Milivojevic et al., 2015; Schlichting et al., 2015; Molitor et al., 2020) and co-occurring items (Schapiro et al., 2012; Schapiro, Turk-Browne, Norman, & Botvinick, 2016) can lead to integration by positing that — in these cases — the paradigms induced strong coactivation. Importantly, although the NMPH is compatible with findings of both differentiation and integration across several paradigms with diverse task demands, the explanations above are post hoc and do not provide a principled test of the NMPH’s core claim that there is a continuous, U-shaped function relating the level of coactivation to representational change.”

      See Introduction (p. 5, lines 83-86): “No existing study has demonstrated the full U-shaped pattern for representational change; that is what we set out to do here, using a visual statistical learning paradigm — specifically, we brought about coactivation using temporal co-occurrence between paired items, and we manipulated the degree of coactivation by varying the visual similarity of the items in a pair.”

      See Conclusion (p. 18, lines 370-374): “From a theoretical perspective, these results provide the strongest evidence to date for the NMPH account of hippocampal plasticity. We expect that a similar U-shaped function relating coactivation and representational change will manifest in paradigms with different task demands and stimuli, but additional work is needed to provide empirical support for this claim about generality.”

      2) Related to the point above - the timecourse over which such statistical learning occurs should be discussed. If I understood correctly, all of the learning occurred in the 6 scanned blocks between the two templating runs. Does the NMPH predict that the hippocampal patterns should immediately reconfigure depending on visual input, or only reconfigure once the participants encode the links between paired stimuli? If the pattern consistent with the NMPH is immediately evident, this would suggest that the present findings, while very convincing, might not be governed by the same mechanisms as integration/differentiation in memory. It seems unlikely that participants would immediately attempt to link these complex visual stimuli, especially as the cover task was orthogonal. To this end, it would be helpful to see any kind of analysis evaluating representations across the 6 statistical learning runs.

      The reviewer correctly describes that learning took place over the six blocks between templating runs. We agree that observing the emergence of representational change across those runs would be ideal. Unfortunately, however, our design is not compatible with this analysis. Because the pairs were learned from deterministic transition probabilities, the onsets of the paired stimuli were correlated in time. When these correlated events are convolved with the slow hemodynamic response, the responses to the paired stimuli cannot be reliably distinguished. Also, the response to the second stimulus in a pair would be affected by visual similarity to its preceding stimulus as a result of adaptation/repetition suppression, confounding comparisons across conditions. These problems are precisely why we employed a pre/post design in which to-be/previously paired stimuli are presented independently in a random order. This allows for the assessment of representational similarity unconfounded with correlated onsets or adaptation.

      Although we cannot provide a sense of the learning trajectory, we now explain this design decision, acknowledge the limitation, and note it as an opportunity for future work with other, more time-resolved modalities or with (random) representational assessments interdigitated with the learning blocks.

      See Discussion (p. 17, lines 358-366): “Finally, although analyzing representational overlap in templating runs before and after statistical learning afforded us the ability to quantify pre-to-post changes, our design precluded analysis of the emergence of representational change over time. That is, we could not establish whether integration or differentiation occurred early or late in statistical learning. This is because, during statistical learning runs, the onsets of paired images were almost perfectly correlated, meaning that it was not possible to distinguish the representation of one image from its pairmate. Future work could monitor the time course of representational change, either by interleaving additional templating runs throughout statistical learning (although this could interfere with the statistical learning process), or by exploiting methods with higher temporal resolution where the responses to stimuli presented close in time can more readily be disentangled.”

      3) In the Introduction and Discussion, the authors focus on learning and discuss the integration/differentiation of memories. To establish a link between the reported hippocampal representations and behaviour, it would be helpful to show evidence of a link between neural differentiation and measures of statistical learning such as priming.

      As the reviewer alluded to earlier, our behavioral task is orthogonal to the manipulation of temporal co-occurrence. Accordingly, we do not have any behavioral data on which we could conduct such an analysis. We fully acknowledge the value of this suggestion and now describe this as a limitation and area for future research.

      See Discussion (p. 17, lines 350-357): “Prior work in this area has demonstrated brain-behavior relationships (Favila et al., 2016; Molitor et al., 2020), so it is clear that changes in representational overlap (i.e., either integration or differentiation) can bear on later behavioral performance. However, in the current work, our behavioral task was intentionally orthogonal to the dimensions of interest (i.e., unrelated to temporal co-occurrence and visual similarity), limiting our ability to draw conclusions about potential downstream effects on behavior. We believe that this presents a compelling target for follow-up research. Establishing a behavioral signature of both integration and differentiation in the context of nonmonotonic plasticity will not only clarify the brain-behavior relationship, but also allow for investigations in this domain without requiring brain data.”

      4) From the authors' predictions (and Fig 1), it might follow that participants who show steeper slopes in early visual regions (i.e., higher correspondence to stimulus similarity) pre-learning might also show a stronger cubic trend in the hippocampus. It would be useful to show within-participant analyses to link visual processing regions to hippocampal representations.

      What a fantastic suggestion! To test this prediction, we extracted the linear coefficients in the visual similarity analysis from cortical ROIs (V1, V2, LO, IT, FG, PHC, PRC, and EC) and the cubic model fit in the representational change analysis from the key hippocampal ROI (DG). Linearity during the initial templating run in PRC was associated with stronger non-monotonicity in DG. The full reporting of these analyses is now included in the figure supplements and referenced in the main text.

      See Results, subsection Representational Change (p. 12, lines 228-229): “Interestingly, in an exploratory analysis, we found that the degree of model fit in DG was predicted by the extent to which visual representations in PRC tracked model similarity (see Figure 4—figure supplement 2).”

      Reviewer #2:

      The authors apply neural network modeling and representational analysis of fMRI data to testing the ability of the theoretical framework under the "non-monotonic-plasticity hypothesis" to explain how hippocampal subdivisions represent similarity and distinctiveness between events. They suggest that the dentate gyrus subfield, in particular, was sensitive to the degree of overlap between experiences, and changes how it favored distinctiveness or similarity in its representation of associated stimuli in a non-monotonic manner.

      Overall, the work builds logically on prior evidence from this group focused on how cortical representations influence memory, and leverages a compelling theoretical framework to reconcile some conflict in the literature on how hippocampal representations respond to overlap.

      The primary confusion and concern with the current manuscript was on the theoretical side. It was not wholly clear from the literature review why DG was the predicted locus of the non-monotonic representational relationship observed, and how the findings fit with extant data from rodent work.

      Thank you for providing an opportunity to better motivate our work. We have updated the paragraph justifying our focus on the hippocampus and on DG in particular.

      See Introduction (p. 8, lines 122-147): “We and others have previously hypothesized that nonmonotonic plasticity applies widely throughout the brain (Ritvo et al., 2019), including sensory regions (e.g., Bear, 2003). In this study, we focused on the hippocampus because of its well-established role in supporting learning effects over relatively short timescales (e.g., Favila et al., 2016; Kim et al., 2017; Schapiro et al., 2012). Importantly, we hypothesized that, even if nonmonotonic plasticity occurs throughout the entire hippocampus, it might be easier to trace out the full predicted U-shape in some hippocampal subfields than in others. As discussed above, our hypothesis is that representational change is determined by the level of coactivation — detecting the U-shape requires sweeping across the full range of coactivation values, and it is particularly important to sample from the low-to-moderate range of coactivation values associated with the differentiation ‘dip’ in the U-shaped curve (i.e., the leftmost side of the inset in Fig. 1). Prior work has shown that there is extensive variation in overall activity (sparsity) levels across hippocampal subfields, with CA2/3 and DG showing much sparser codes than CA1 (Barnes, McNaughton, Mizumori, Leonard, & Lin, 1990; Duncan & Schlichting, 2018). We hypothesized that regions with sparser levels of overall activity (DG, CA2/3) would show lower overall levels of coactivation and thus do a better job of sampling this differentiation dip, leading to a more robust estimate of the U-shape, compared to regions like CA1 that are less sparse and thus should show higher levels of coactivation (Ritvo et al., 2019). Consistent with this idea, human fMRI studies have found that CA1 is relatively biased toward integration and CA2/3/DG are relatively biased toward differentiation (Dimsdale-Zucker et al., 2018; Kim et al., 2017; Molitor et al., 2020). 
      Zooming in on the regions that have shown differentiation in human fMRI (CA2/3/DG), we hypothesized that the U-shape would be most visible in DG, for two reasons: First, DG shows sparser activity than CA3 (Barnes et al., 1990; GoodSmith et al., 2017; West, Slomianka, & Gundersen, 1991) and thus will do a better job of sampling the left side of the coactivation curve. Second, CA3 is known to show strong attractor dynamics (‘pattern completion’; McNaughton & Morris, 1987; Rolls & Treves, 1998; Guzowski, Knierim, & Moser, 2004) that might make it difficult to observe moderate levels of coactivation. For example, rodent studies have demonstrated that, rather than coactivating representations of different locations, CA3 patterns tend to sharply flip between one pattern and the other (e.g., Leutgeb, Leutgeb, Moser, & Moser, 2007; Vazdarjanova & Guzowski, 2004).”

      Additionally, the theoretical model (nicely illustrated in the manuscript) is considered at a somewhat biological-network-agnostic level. Some assumptions about how context changes over time, how prior representations are maintained over time, etc., are important for non-monotonic relationships between representations and memory to manifest in the model, but the manuscript does not provide much discussion of their plausibility. This was particularly notable given the emphasis the fMRI data place on different hippocampal subfields, with little discussion of whether or why the model framework is static across subfields (in terms of how context and item information are represented and connected).

      We appreciate this nudge to discuss these additional subfield-specific factors; we have added a paragraph to the Discussion that addresses these issues.

      See Discussion (p. 16, lines 318-336): “Although we focused above on differences in sparsity when motivating our predictions about subfield-specific learning effects, there are numerous other factors besides sparsity that could affect coactivation and (through this) modulate learning. For example, the degree of coactivation during statistical learning will be affected by the amount of residual activity of the A item during the B item’s presentation in the statistical learning phase. In Figure 1, this residual activity is driven by sustained firing in cortex, but this could also be driven by sustained firing in hippocampus; subfields might differ in the degree to which activation of stimulus information is sustained over time (see, e.g., the literature on hippocampal time cells: Eichenbaum, 2014; Howard & Eichenbaum, 2013), and activation could be influenced by differences in the strength of attractor dynamics within subfields (e.g., Neunuebel & Knierim, 2014; Leutgeb et al., 2007). Also, in Figure 1, the learning responsible for differentiation was shown as happening between ‘perceptual conjunction’ neurons and ‘context’ neurons in the hippocampus. Subfields may vary in how strongly these item and context features are represented, in the stability/drift of the context representations (DuBrow, Rouhani, Niv, & Norman, 2017), and in the interconnectivity between item and context features (Witter, Wouterlood, Naber, & Van Haeften, 2000). It is also likely that some of the relevant plasticity between item and context features happens across, in addition to within, subfields (Hasselmo & Eichenbaum, 2005). 
      For these reasons, exploring the predictions of the NMPH in the context of biologically detailed computational models of the hippocampus (e.g., Schapiro, Turk-Browne, Botvinick, & Norman, 2017; Frank, Montemurro, & Montaldi, 2020; Hasselmo & Wyble, 1997) will help to sharpen predictions about what kinds of learning should occur in different parts of the hippocampus.”

      As such, this review was very positive, and found the methods to be sound and the conclusions to be solid. There was some room for improvement in how the theoretical foundation was presented for the hippocampal subregion fMRI predictions and for the conceptualization of the neural network memory model.

      We agree with the reviewer that more justification of our specific hippocampal predictions was required and we are grateful for their suggestions.

    1. Author Response:

      Reviewer #1:

      This study focuses on how the vmPFC supports delay discounting. The authors tested patients with vmPFC lesions (N=12) and healthy controls (N=41) on a delay discounting (DD) task with two additional conditions: (1) reward magnitude and (2) cues that should evoke episodic future thinking (EFT).

      The authors replicate their previous finding that patients with vmPFC lesions show steeper DD, and report two novel findings: (1) DD in patients is insensitive to reward magnitude, suggesting that vmPFC is critical for reward magnitude to modulate DD; (2) vmPFC patients show normal effects of EFT cues on DD, such that all subjects discounted less in the presence of cues that promote episodic future thinking. These findings have important implications for how vmPFC contributes to delay discounting, as they suggest that vmPFC is not necessary for prospective thinking to affect the evaluation of future rewards.

      1) A potential issue with the EFT finding is that it rests on accepting the null hypothesis of no group differences. However, there are reasons to assume this is not a trivial null result due to a lack of statistical power. Specifically, there is a significant effect of EFT within the vmPFC patient group and there is a significant group difference for the effect of reward magnitude. Assuming comparable power to detect effects of EFT and reward magnitude, it seems unlikely that the non-significant EFT effect is simply a lack of power. In any case, this caveat has to be considered when interpreting the effect.

      We have added a discussion of this caveat on p. 10, which reads: “Before discussing this finding further, we note that it rests on accepting the null hypothesis of no group differences in the EFT effect on DD between vmPFC patients and controls. It is unlikely, however, that this null finding simply reflects a lack of statistical power, for example due to a small sample size. First, the null effect on group differences indeed reflects a significant within-participant effect, with greater regard for future amounts in the EFT compared to the Standard condition in vmPFC patients. Second, together with the preservation of the EFT effect, we found a significant reduction of the magnitude effect in the same vmPFC patient sample. Bayesian analyses confirmed greater evidence in favour of the null compared to the alternative hypothesis regarding group differences in the EFT effect on DD.”

      2) It is somewhat surprising that the authors had such a strong prediction about the absence of group differences for the EFT effect. Based on previous work (Bertossi et al., 2016a, b), one could expect a smaller EFT effect in the VMPFC group. The authors appear to put much weight on the results by Ghosh et al. 2014, which suggest that vmPFC is critical for schema reinstatement. The rationale for this strong prediction is not very clear from the introduction.

      We have now reframed our hypotheses as suggested by the reviewers and the editors. In the Introduction, we now make only the hypothesis of a reduced EFT effect on DD in vmPFC patients, which is based on previous evidence of an EFT impairment in vmPFC patients. We present the hypothesis that vmPFC is critical for schema instantiation only in the Discussion, as an explanation of the null finding on group differences on the EFT effect.

      Thus, p. 5 now reads: “Concerning prospection, previous studies have observed an EFT effect on DD, such that people discount future rewards less steeply if cued to imagine personal future events during intertemporal choice (Peters and Büchel, 2010; Benoit et al., 2011). Considering that vmPFC is implicated in prospection (Schacter et al., 2012) and that vmPFC patients are impaired in EFT (Bertossi et al., 2016a,b; Bertossi et al., 2017), vmPFC patients' DD should remain steep even when EFT cues are provided, because patients may nevertheless fail to construct the vivid future events that might be needed to counteract DD. Thus, we predict a reduced EFT effect on DD in vmPFC patients compared to healthy controls.”

      Reviewer #2:

      Ciaramelli et al. address a timely and theoretically important issue with respect to the functional role of the vmPFC in decision-making more generally, and temporal discounting in particular. Strong points of the paper include 1) a theoretically important research question and 2) much-needed lesion data on two important behavioral effects in temporal discounting: the magnitude effect, and a modulation of discounting via episodic future thinking. Weaker points of the paper include 1) lack of clarity for a number of methodological issues (group comparisons & control group for the AI data, inconsistency analysis) and 2) many remaining open questions with respect to how vmPFC patients might have utilized the EFT cues, and whether different processes were at work compared to controls.

      We thank the reviewer for this positive evaluation of the paper and address the reviewer’s comments below.

      Major points:

      1) The authors note that their interpretation of the preserved EFT effects in the vmPFC patients in terms of e.g. semantic processing remains speculative, but is supported by the finding of intact external details production following vmPFC damage in earlier studies. But was this also the case in the present data set? This remains unclear, because for the AI data, only z-scores relative to some earlier control group (Kwan et al. 2015) are reported (Table 1 and Supplement p. 30). Was this control group matched to the patients? And since the referenced Kwan et al. (2015) paper reports only on six patients (presumably the patients from the Canada site?) - what about the patients from the Italian site, which control group were their AI data compared to?

      The Crovitz data of the Canadian patients are unpublished (the Kwan et al., 2015 paper is not about vmPFC patients, but about 6 MTL patients). We compared them to a sample of 18 age-matched healthy controls, a subset of those included in Kwan et al. (2015). The 4 Italian patients were part of the vmPFC sample tested on EFT (and episodic memory) in Bertossi et al. (2016). We compared their performance with that of the 11 healthy controls from the same study who were age-matched to the patients.

      This is clarified on p. 17, which reads: “The results of the Italian patients (a subset of those included in Bertossi et al. 2016b) were contrasted with those of the 11 healthy controls from the same study (all males; Bertossi et al., 2016b) who were age-matched to the patients (vmPFC patients: M = 47.75, SD = 5.25; healthy controls: M = 41.63, SD = 11.89, t13 = -0.97, p = 0.34). The results of the Canadian patients (unpublished) were contrasted with those of 18 healthy controls (10 males; a subset of those included in Kwan et al., 2015) age-matched to the patients (vmPFC patients: M = 61.00, SD = 9.83; healthy controls: M = 67.94, SD = 13.57, t22 = 1.15, p = 0.26).”
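
      For readers who want to check the age-matching statistics quoted above, an equal-variance (pooled) two-sample t statistic can be recomputed from the reported means, SDs, and group sizes. This is a hedged sketch rather than the authors' analysis: the function name is hypothetical, the pooled-variance test is assumed because the reported degrees of freedom equal n1 + n2 - 2, and the Canadian patient count of 6 is inferred from t22 with 18 controls.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics.

    Returns (t, df) with df = n1 + n2 - 2. Group 1 minus group 2.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


# Controls minus patients, using the values quoted in the response.
# Italian sample: 11 controls, 4 patients.
t_italy, df_italy = pooled_t_from_summary(41.63, 11.89, 11, 47.75, 5.25, 4)
# Canadian sample: 18 controls; 6 patients is an inference from df = 22.
t_canada, df_canada = pooled_t_from_summary(67.94, 13.57, 18, 61.00, 9.83, 6)
print(round(t_italy, 2), df_italy)
print(round(t_canada, 2), df_canada)
```

      Under these assumptions the recomputed statistics land close to the reported values; small discrepancies in the second decimal are expected because the published means and SDs are themselves rounded.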

      2) Directly related to my previous point: The methods section states that external details were in the normal range in the vmPFC group (mean z-score for EFT = -.73) but from Table 1 we can see that 8/10 patients in fact exhibit a negative z-score. This suggests that a direct group comparison of the external details scores would very likely reveal a significant group difference. Generally, it would help to report to actual control data here, not just the z-scores, and report the respective group comparisons.

      We now report the Crovitz data in Table 2 and have run two ANOVAs on internal and external details separately in vmPFC patients and controls tested in Italy and in Canada. As the two ANOVAs show, we confirm that both patient groups produced fewer internal (episodic) details but a similar number of external details during EFT (as well as episodic remembering) than healthy controls. Therefore, the previously reported EFT problems for internal (but not external) details in vmPFC patients also apply to the patients tested here.

      P. 17 now reads: “As for the Italian sample, an ANOVA on the details produced with Group (vmPFC patients, healthy controls), Time (Past, Future), and Detail (internal, external) as factors showed a significant effect of Time (F1,13 = 14.66, p = 0.002, partial η2 = 0.53), such that all participants produced more details for past than future events (18.19 vs. 15.37). There were also significant effects of Group (F1,13 = 6.16, p = 0.02, partial η2 = 0.32) and Detail (F1,13 = 9.14, p = 0.009, partial η2 = 0.41), qualified by a Group x Detail interaction (F1,13 = 8.99, p = 0.01, partial η2 = 0.40). Post hoc Fisher tests showed that vmPFC patients produced fewer internal details (11.45 vs. 25.51; p = 0.004) but a similar number of external details than controls (11.39 vs. 11.96; p = 0.89). No other effect was significant (p > 0.31 in all cases). The same ANOVA on the Canadian sample revealed an effect of Group (F1,22 = 17.76, p = 0.0003, partial η2 = 0.44), qualified by a significant Group x Detail interaction (F1,22 = 4.72, p = 0.04, partial η2 = 0.18), again indicating that vmPFC patients produced fewer internal details (10.63 vs. 31.78; p = 0.0003) but a similar number of external details than controls (16.79 vs. 25.65; p = 0.09). No other effect was significant (p > 0.32 in all cases).”
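
      The partial η2 values quoted above can be sanity-checked against the F statistics using the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error). The sketch below is illustrative only; the function name is hypothetical and the inputs are the rounded values quoted in the response.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Effect of Time in the Italian sample: F(1, 13) = 14.66.
print(round(partial_eta_squared(14.66, 1, 13), 2))
# Group x Detail interaction in the Canadian sample: F(1, 22) = 4.72.
print(round(partial_eta_squared(4.72, 1, 22), 2))
```

      Because the published F values are rounded, a recomputed effect size may differ from the reported one in the second decimal place.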

      3) The description of the inconsistency analysis was somewhat unclear. The authors use the procedure suggested by Johnson & Bickel (2008), which makes sense, given the overall analytical approach that focuses on the analysis of indifference points. However, this procedure is based on a comparison of adjacent indifference points. In contrast, the authors are referring to the number of inconsistent choices - this is either a typo, or a different procedure. I think the former, because the reported absolute numbers (e.g. means around 1) and the single subject plots in the supplement appear to reflect the number of inconsistent ID points rather than choices. If this is the case, I disagree with the statement that the "mean number of inconsistent choices was very low" (p. 10) - as this probably reflects the mean number of inconsistent indifference points and not choices, about 1 out of 6 ID points was inconsistent in the vmPFC group, which is a lot.

      We apologize for lack of clarity. Yes, we are referring to indifference points (as in our previous study; Sellitto et al., 2010), not single choices. Inconsistent preferences are defined as “data points in which the subjective value of a future outcome (amount = R) at a given delay (R2) was greater than that at the preceding delay (R1) by more than 10% of the amount of the future outcome (i.e., R2 > R1 + R/10, as in Sellitto et al., 2010).” To avoid confusion, we have now corrected the expression ‘inconsistent choice’ to ‘inconsistent preference’ throughout the paper, and have eliminated the claim about the low number of inconsistent choices in vmPFC patients.
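
      The criterion quoted above (R2 > R1 + R/10) can be sketched as a count over adjacent indifference points. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the ordering of the list by increasing delay, and the example values are all hypothetical.

```python
def count_inconsistent_preferences(subjective_values, future_amount):
    """Count indifference points that exceed the preceding one by more than
    10% of the future amount (R2 > R1 + R/10).

    subjective_values is assumed to be ordered by increasing delay.
    """
    tolerance = future_amount / 10
    return sum(
        1
        for previous, current in zip(subjective_values, subjective_values[1:])
        if current > previous + tolerance
    )


# Hypothetical indifference points for a delayed reward of 40, six delays.
example_values = [38, 30, 36, 20, 21, 10]
print(count_inconsistent_preferences(example_values, future_amount=40))
```

      In this made-up series only the third point (36 following 30) exceeds its predecessor by more than the tolerance of 4; the small rise from 20 to 21 does not count.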

      4) The EFT cues are suggested to help vmPFC patients to "circumvent their initiation problems" (p. 12) but I am not sure I follow this logic. First, the AI procedure typically entails external cues as well, and here vmPFC patients showed impairments (Table 1, but see my point 1 above). Second, some of the cited papers (e.g. Verfaellie et al., 2019) also used specific event cues, and still observed reduced internal details production in vmPFC patients.

      The AI (Crovitz) procedure uses external cues but typically these are words that are not particularly meaningful to the participants (indeed, they are the same for all participants). Other studies have used specific event cues (e.g., “Imagine attending a Fourth of July cookout a few years from now”; Verfaellie et al., 2019) but, again, these cues are the same for all participants. We used personalized cues, which were events that participants (1) had selected themselves, and (2) had already planned or found plausible in their future, and therefore presumably were the most self-relevant and familiar to the participants, including patients. We think that these events may have been effective in activating self- and event-relevant schemata. We clarify this point on p. 11, which reads: “We propose, therefore, that subject-specific event cues, which were self-relevant and familiar to the participants because they had been selected by participants themselves, and were already planned or were plausible in their future, acted as external triggers of self- and situation-relevant schemata, helping to circumvent vmPFC patients’ EFT initiation problems. Their intact MTLs allowed them to construct episodic future events, which were then integrated into intertemporal choice, reducing DD.” As we note on p. 14, indeed, vmPFC patients are capable of imagining detailed experiences if they are guided to choose for themselves a specific moment from an extended future event to narrate in detail (Kurczek et al., 2015). Of course, we agree with the Reviewer’s point below that this interpretation is speculative at this point.

      5) One shortcoming with the paper is that no data are available that could inform how vmPFC patients might have utilized the EFT cues, and whether the processes at work might have differed from those in controls. Many points mentioned in the discussion (self-referential processing, semantic processing, activation of schemata, self-initiation vs. external cueing etc.) thus necessarily remain conjecture.

      We agree with the Reviewer, and we admit in several parts of the Discussion that this interpretation is speculative at this point. However, the interpretation that we offer seems the most plausible to us at this time, considering what we know about the role of the vmPFC (vs. the MTL) in event construction and the absence of the EFT effect on DD in MTL patients. We also propose an alternative interpretation, but the pattern of findings on the EFT effect on DD makes it less likely to us. On p. 12, we state, “An alternative interpretation of the DD modulation is that EFT cues simply shifted attention towards the future, or conferred a positive valence to it, as we encouraged positively valenced EFT. If so, however, one should consistently observe an EFT-induced benefit on DD also in patients with MTL lesions, but this is not the case (Kwan et al., 2015; Palombo et al., 2015).”

      Reviewer #3:

      In this manuscript, Ciaramelli et al. examined the decision-making behavior of 12 patients with vmPFC damage in a delay discounting task. The authors carried out two manipulations in this task: 1. They presented participants with small and large offers for both the immediate and delayed reward (magnitude manipulation), 2. They prefaced decisions with a cue prompting participants to vividly imagine an event in their future that was expected to occur at the same delay as the proposed larger offer (episodic future thinking (EFT) manipulation). Compared to age and education matched healthy controls, patients with vmPFC damage showed steeper discounting of delayed rewards, particularly when the amounts offered were large (reduced effect of magnitude). However, like controls, vmPFC damaged patients displayed shallower discounting of delayed rewards following the EFT manipulation.

      The manuscript is clear and concise in its presentation of the results, while still providing a detailed description of the behavior of these patients. This paper is also a good example of how pooling participants from multiple institutions can increase statistical power in a study of patients with focal brain damage targeting a fairly specific cognitive question. The positive results of the study mostly replicate previous findings. While the null result for the EFT manipulation is novel, the finding is hard to interpret. The authors state that they predicted that the EFT manipulation would not change discounting behavior in vmPFC damaged patients a priori despite the deficits of these patients in EFT in previous papers, which are also replicated here. However, I do not know why the authors would design their task in such a way to test for a null result. It is also not clear if this null result is observed for the reason proposed by the authors (that the EFT cues externally activate this process), or if this result is null for some other reason that is not accounted for here. As the authors do not provide a direct test for their hypothesized rationale for predicting this null result, the findings are hard to interpret.

      We agree with the reviewer’s and editor’s point that this paradigm does not allow testing whether subject-specific, personally relevant cues, such as those we used, are indeed effective in externally initiating EFT in vmPFC patients. Therefore, we concur that, for the sake of clarity, this is best presented only as speculative discussion of the preserved EFT effect on DD in vmPFC patients. In the Introduction, therefore, we now formulate only the hypothesis based on previous evidence of impaired EFT in vmPFC patients (e.g., Bertossi et al., 2016a,b, Verfaellie et al., 2019), which would lead to the prediction of a reduced EFT effect in vmPFC patients. We present the hypothesis that vmPFC is critical for schema instantiation only in the Discussion, as an explanation of the null finding on group differences on the EFT effect.

      P. 5 now reads: “Concerning prospection, previous studies have observed an EFT effect on DD, such that people discount future rewards less steeply if cued to imagine personal future events during intertemporal choice (Peters and Büchel, 2010; Benoit et al., 2011). Considering that vmPFC is implicated in prospection (Schacter et al., 2012) and that vmPFC patients are impaired in EFT (Bertossi et al., 2016a,b; Bertossi et al., 2017), vmPFC patients' DD should remain steep even when EFT cues are provided, because patients may nevertheless fail to construct the vivid future events that might be needed to counteract DD. Thus, we predict a reduced EFT effect on DD in vmPFC patients compared to healthy controls.”

      Overall, this manuscript makes a relatively modest contribution to our knowledge about the function of vmPFC during inter-temporal choice. It bolsters previous claims about how vmPFC damage impacts delay discounting and EFT, while not revealing new information about how vmPFC specifically contributes to the processes involved in these behaviors and why damage to this region impacts intertemporal choice in this way.

      We concur with the reviewer that our findings confirm previous evidence that vmPFC is necessary for balanced DD and for EFT. However, we think that our finding of a complete abolishment of the magnitude effect together with a complete preservation of the EFT effect on DD in vmPFC patients constitutes a remarkable theoretical advancement on the role of vmPFC in intertemporal choice. Indeed, it shows that during intertemporal choice vmPFC is more prominently implicated in reward valuation than in prospection. This finding is important for current theories of intertemporal choice, and is surprising considering previous demonstrations of impaired EFT in vmPFC patients (a finding that was replicated in the current study), and therefore has important implications also for theories relating to the role of vmPFC in EFT. Finally, we note that the paper focuses on one important facet of impulsivity following damage to the vmPFC in humans: steep DD. Our findings, therefore, may inform the clinical management of impulsivity in patients with vmPFC damage or dysfunction, delineating the contextual manipulations that are or are not expected to push the reach of patients' choice into the future.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] My main suggestion would be to expand discussions on the results of twins. To me that is the most interesting part of the present study, which contributed further from previous findings such as Silva et al., 2018.

      We have expanded the discussion of the twins results in the text in both the Results section and the Discussion section [P13❡2]. (We note that while we, too, find the twins results compelling, there are other aspects of the results that are novel, including the tight link to behavioral results.)

      Also, as the authors noted too, "behavioral pattern may vary with task". It would be helpful if the relationship between the present cortical magnification finding and behavioral results could be discussed with further details.

      We have elaborated on this point in the Results section [P7❡2].

      Reviewer #2 (Public Review):

      1) Representation of 45 degrees. The results demonstrate that 45° angles are relatively under-represented on the cortical surface, but without a concomitant decline in perceptual performance (Figure 3). While a prior study demonstrated radial asymmetries in cortical magnification (Silva 2018), that prior study did not have the power and resolution to show the clear reduction in surface area around 45° that is shown here. This result is one of the more novel findings of the current study, but is not discussed. I looked at the Barbot (2021) paper, and I gather that acuity was tested with a grating that was oriented at 45°. Could this property of the stimulus interact with the radial orientation bias that has been shown in perception and cortical response (e.g., Sasaki 2006)?

      The cortical surface areas corresponding to the 45° angles reported in the original submission were indeed puzzling and suggestive of some bias in the data or method. We have performed a substantial re-analysis of the results using a now-published dataset of manually-drawn V1, V2, and V3 boundaries (Benson et al., 2021; DOI:10.1101/2020.12.30.424856); our method section was also substantially updated to accommodate this dataset and some new calculations [P16❡3–P18]. Specifically, there are two main changes in the method. First, we now use hand-drawn boundaries of the lower, upper, and horizontal meridians, and of several iso-eccentricity contours, to identify sectors of the V1 map, rather than relying on the boundaries found by an automated atlas fit. Second, we now employ a new and more robust method to carve up these sectors into fine-grained regions. The manually labeled boundaries have high inter-rater reliability and their inclusion eliminates any biases that could have derived from the automated boundary-finding method we had previously employed. This re-analysis leaves our previous findings intact and largely unchanged, and it eliminates the apparent mystery of the surface area of the 45° angle, which is no longer under-represented on cortex relative to behavior (Fig. 3).

      Regarding the Barbot et al. study, the 45 deg orientation was chosen so that it would not contaminate the psychophysical measures at the vertical and the horizontal meridians, as discriminability would be better along the vertical meridian for orientations off vertical and along the horizontal meridian for orientations off horizontal; it is possible that the performance at the intercardinal locations is better than if they had used 0° or 90° orientation.

      2) While the correlation in the MZ twins is impressive, I am not sure that it is an independent source of information. One would not want to conclude, for example, that there is a genetic influence specifically for radial asymmetry of the visual cortex. Instead, there may be genetic influences upon the general shape, folding, and functional organization of the cortex as a whole, of which the visual cortex is just one part. It would be informative, for example, if the correlation in MZ twins for visual cortex radial asymmetry is GREATER than the correlation that is observed for any other cortical property (Chen 2013). It would also be informative to examine perceptual data from these twin pairs, but I understand why this is not available.

      Our intention in analyzing the asymmetry correlations between twins was not to suggest that there is a genetic influence limited only to asymmetry, and we have reworded the discussion to further clarify this point [P13❡2–P14]. For context, we have added reports of correlations between other cortical properties [Fig. 4; P8❡1].

      3) I know that these authors have thought carefully about how cortical curvature might influence their measurements. There is the obvious confound that the horizontal meridian is represented in the depth of a sulcus, while the vertical meridian is represented close to the gyral crowns. I would appreciate some consideration in the methods or discussion of why cortical folding can't account for the current results.

      In the original submission we reported only the surface areas of the midgray surface (i.e., the halfway point between pial and white surfaces) as a way to minimize bias of the cortical curvature that might arise on the pial surface (where the gyral crown is expected to have a larger surface area than the sulcal valley) or the white surface (where the opposite is expected). We have now included, as Figure S1 (a supplement to Figure 2), a re-analysis of the data using both the white and pial surface areas as well as the midgray surface area. Whereas surface areas and their ratios vary numerically depending on which surface is used for analysis, the main trends hold for all 3 analyses (white, midgray, pial): there is more surface area for the horizontal than vertical and for the lower vertical than upper vertical. Unsurprisingly, this re-analysis substantially affects the HVA, which depends on the gyral and sulcal surface areas, but only slightly affects the VMA, which depends only on gyral surface areas. We have also added text in the Results to address this topic [P7❡1].

      4) The Silva 2018 paper included a more "fine scale" analysis of cortical magnification as a function of polar angle (Figure 4B). The error bars in this prior report are an order of magnitude larger than in the current measurements, but I would appreciate an evaluation of the degree to which the current measures agree with this prior work.

      We have expanded our discussion of this paper in the text [P12❡1] and have included a supplemental comparison of their data (as digitized from their Fig. 4) with our data (Fig. S4, P34).

      5) The cortical surface representation is described as an "amplification" of asymmetries that are present in the retina. Looking at Figure 5, however, it doesn't appear to me that a multiplicative scaling of the cone or midget RF functions would fit the cortical data. The cortical asymmetries are certainly larger, but they are of a different form with eccentricity. This might be worth acknowledging, and perhaps considering that perceptual measures as a function of eccentricity and polar angle could deepen the correspondence with the cortical data.

      In this instance we used the word “amplification” loosely to mean that the asymmetries in cortex were consistently higher than the asymmetries in the retina, not in the mathematical sense of a multiplicative scale factor. We have now clarified this in the text, and we have expanded the discussion of this point [P10❡1].

      Reviewer #3 (Public Review):

      [...] The conclusions of this paper are mostly well supported by data, but some aspects of data analysis and statistics need to be clarified and extended.

      1) The statistical model on repeated measurements: in the present work, there are lots of repeated measurements recorded (e.g., Figure 1, across angular distance and meridian). A clear and comprehensive description of the statistical methods needs to be reported in the Methods.

      The data referenced in Figure 1, and, in fact, all psychophysical data we analyzed, are from previous publications in which these details were reported, including analysis using linear mixed models. We have now duplicated the relevant details from these publications in the Methods along with relevant reports of the inter-rater reliability of the V1-V2 boundaries on which the surface area calculations were based. This subsection of the Methods is now titled Statistical Analysis and Measurement Reliability [P21–22].

      2) Measurement reliability: this is a fundamental concept of individual differences, which the present work is based on to assess the link between brain, behavior and genetics. The reliability levels of these measurements should be reported due to the importance of understanding the correlational outcomes. For example, in Figure 3, a surprisingly high correlation was reported (r = 0.96). How should we interpret this correlation in terms of the psychometric theory of individual differences? Again, how was this correlation derived given the repeated measurements?

      We have added a section on Statistical Analysis and Measurement Reliability in the Methods section to address the topic of reliability [P21–22]. Additionally, we note that the correlation from Figure 3 is a correlation of mean values across subjects using different subject groups for the x and y axes and thus should not be interpreted as a finding about individual differences. We have clarified this fact in the text [P7❡1].

      3) ICC: should be non-negative. In Figure 4, the negative ICCs appeared for DZ twins for some polar angle widths. Please clarify the reason.

      We have clarified our use of an unbiased estimate of the ICC in the Methods and have provided the formulae for our calculations [Eqs. 1–2; P20].
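
      To illustrate why a sample ICC can be negative, the sketch below uses one common ANOVA-based estimator, the one-way random-effects ICC for pairs. It is an assumption chosen for illustration and is not necessarily the formula given in the authors' Eqs. 1–2; the function name and the twin-pair values are hypothetical.

```python
def icc_oneway_pairs(pairs):
    """One-way random-effects ICC for paired measurements (k = 2 per pair).

    ICC = (MS_between - MS_within) / (MS_between + (k - 1) * MS_within).
    The estimate is negative whenever members of a pair differ more from each
    other than pair means differ from the grand mean.
    """
    n = len(pairs)
    k = 2
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    pair_means = [(a + b) / 2 for a, b in pairs]
    ms_between = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical measurements: co-twins that resemble each other closely ...
similar_pairs = [(1.0, 1.1), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2)]
# ... and co-twins that differ more within pairs than between pairs.
dissimilar_pairs = [(1.0, 4.0), (4.0, 1.0), (2.0, 3.0), (3.0, 2.0)]

print(icc_oneway_pairs(similar_pairs))
print(icc_oneway_pairs(dissimilar_pairs))
```

      The population ICC under a variance-components model is non-negative, but an estimator of this form is not clipped at zero, so small samples with weak within-pair resemblance can yield negative values.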

      4) Credit HCP data use: Please visit https://www.humanconnectome.org/study/hcp-young-adult/document/hcp-citations

      We thank the reviewer for catching this oversight and have included the relevant text in the Acknowledgements [P23].

      5) A systems-neuroscience perspective: There is an interesting way of discussing the present findings on the human visual system, namely by looking at them at the level of the global brain system (e.g., connectomics). For example, how are these vision-related heritable features related to, or implicated in, connectome-level findings (https://pubmed.ncbi.nlm.nih.gov/26891986)?

      We have expanded the Discussion and have included text regarding the previous findings of connectome-level heritability in the visual cortex [P13❡2–P14].

    1. Author Response:

      Reviewer #1:

      *A summary of what the authors were trying to achieve.

      The study takes advantage of the interesting plant genus Leucadendron to compare gene expression between male vs. female in species with more or less sexual dimorphism. This question was addressed in a somewhat comparable manner in only one previous paper by Harrison et al. 2015 across six bird species. The overarching question is the role of natural selection in sexual dimorphism.

      *An account of the major strengths and weaknesses of the methods and results.

      -Besides the genus-wide comparison of whole transcriptomes across related species, which makes in itself a strong dataset, the major strength of the analysis is the phylogenetic framework that allows the authors to track the evolution of sex bias through several tens of million years of evolutionary history. Despite ancestral dioecy in the genus, very few genes show consistent sex bias across several species, with sex-bias being mostly species-specific. Two striking negative results will be of special interest to the community: 1) species with more pronounced sexual dimorphism at the morphological level do not tend to exhibit more pronounced sex-biased gene expression; 2) the few genes that do show sex-biased expression were apparently recruited among those with the highest expression variance to begin with, strongly suggesting that sexual selection has not been the main force driving their expression divergence.

      -In my view, the main limitation of the work is the use of leaf rather than reproductive tissues, making the comparison to other studies less straightforward to interpret. It is especially important that the expectations for somatic vs gonadic tissues be made a lot clearer in the text.

      We have added a full paragraph to the Introduction that lays out the expected differences between reproductive and non-reproductive tissues (traits) in the intensity of sex-specific (or sexual) selection, sexual dimorphism and sex-biased gene expression. We have also taken care to state the reproductive or non-reproductive tissues in cited references.

      Also, the fact that a single leaf phenotype is measured (specific leaf area) seems arbitrary: one could imagine sexual dimorphism on many other characteristics, yet they are not considered here. The text on p.324 mentions "striking convergence in aspects of morphological dimorphism across the genus", but there is no way for the reader to appreciate the extent of this convergence. Finally, it would be useful to at least make some mention of the sex-determination system in these species, since the expectations would differ if some of the sex-biased genes were linked to sex chromosomes.

      Indeed, Leucadendron can be sexually dimorphic for many phenotypes ranging from plant architecture to phenology. We cite more specialised studies on this topic throughout the manuscript, including works on physiology, ecology, and trait evolution (convergence). The focus on only two traits (specific leaf area and leaf area) is justified by two arguments: 1) we precluded any ambiguity in correspondence by measuring both morphology and gene expression in the same organ / material, and 2) the focus of this study was on the question whether sex-biased gene expression evolved adaptively, rather than discovering new macroscopic sexually dimorphic traits.

      *An appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      The analysis is mostly sound, but I am a bit concerned by the arbitrary threshold used to define SBGE. The text on p.305 says that "This result is extremely robust to the choice of threshold", but 1) the results are not reported so it is impossible for the readers to evaluate the basis of this assertion and 2) it is not clear whether robustness of the other results has been evaluated at all. This aspect clearly deserves more attention.

      We added two new Appendices, and a directory with data uploaded to the Dryad repository to support these assertions. Appendix 1 presents key results under very permissive (no minimum fold-change, uncorrected p-value <= 0.05) and very stringent thresholds (minimum 3-fold change, FDR <= 0.001) for the definition of sex-bias. Appendix 2 shows how the choice of thresholds for the delta-x analysis (the assertion in original submission line 305) affects the conclusion that expression shifts to sex-bias are depleted in signatures of adaptation of expression levels. The patterns and conclusions of our study are generally robust to the choice of threshold to define sex-biased expression.
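
      The permissive and stringent definitions described above can be sketched as a simple classification rule. This is an illustration under stated assumptions, not the authors' pipeline: the gene records, field names, and function names are hypothetical, and only the threshold values come from the response.

```python
import math

# Threshold sets as described in the response; the dictionary layout is illustrative.
THRESHOLDS = {
    "permissive": {"min_fold_change": 1.0, "p_key": "p_value", "max_p": 0.05},
    "stringent": {"min_fold_change": 3.0, "p_key": "fdr", "max_p": 0.001},
}


def is_sex_biased(gene, threshold_name):
    """Return True if a gene passes the fold-change and significance cut-offs."""
    rule = THRESHOLDS[threshold_name]
    min_abs_log2fc = math.log2(rule["min_fold_change"])  # log2(1) = 0: no minimum
    passes_fold_change = abs(gene["log2_fold_change"]) >= min_abs_log2fc
    passes_significance = gene[rule["p_key"]] <= rule["max_p"]
    return passes_fold_change and passes_significance


def count_sex_biased(genes, threshold_name):
    """Count genes classified as sex-biased under the named threshold set."""
    return sum(is_sex_biased(gene, threshold_name) for gene in genes)


# Hypothetical differential-expression results (male vs. female).
genes = [
    {"name": "gene_a", "log2_fold_change": 2.0, "p_value": 1e-6, "fdr": 1e-4},
    {"name": "gene_b", "log2_fold_change": 0.3, "p_value": 0.03, "fdr": 0.2},
    {"name": "gene_c", "log2_fold_change": 2.5, "p_value": 0.2, "fdr": 0.6},
]

for name in THRESHOLDS:
    print(name, count_sex_biased(genes, name))
```

      Re-running downstream analyses on each resulting gene set is one way to check whether a conclusion depends on the chosen cut-off.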

      *A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      This work will be of interest to the community, as rapid rates of expression evolution would generally be interpreted as the consequence of sex bias, whereas the phylogenetic analysis presented here instead supports the idea that the expression of genes that end up being sex biased were instead intrinsically less constrained to begin with.

      Reviewer #2:

      Scharmann et al. present a study of sex-biased gene expression as a function of sexual dimorphism in leaf tissue in the genus Leucadendron. Comparative studies of sex-biased expression across clades are still relatively rare, and this analysis tests some core findings of a recent paper (Harrison et al. 2015). Overall, I like the analysis and think it could be a valuable addition to the literature on sex-biased genes. This is particularly true given the difficulty of cross-species expression comparisons and the paucity of them in plants.

      However, there are some critical differences between the Harrison paper and the one here, and I think it would be helpful if the authors present them early in the text. Specifically, Harrison et al. (2015) was primarily focused on gonad tissue, which in animals is the site of the vast majority of sex-biased genes. In contrast, the authors here focus on vegetative (leaf) tissue, which is analogous to animal somatic tissue. None of the patterns that Harrison et al. (2015) observed and reported from the gonad were evident in the somatic tissue they assessed. Also, by looking at gonadal tissue, Harrison et al. (2015) focused on the tissue that produces gametes, which are thought to be subject to some of the strongest sexual selection pressures. The fairest comparison would be flower tissue in plants, so I am unsure how much of the Harrison results would be expected to hold up in leaf samples. This doesn't mean the authors shouldn't do the analyses they present, just that they should be a little more upfront about what they might reasonably expect to find.

      We have added a full paragraph to the Introduction that lays out the expected differences between reproductive and non-reproductive tissues (traits) in the intensity of sex-specific (or sexual) selection, sexual dimorphism and sex-biased gene expression. We have also taken care to state the reproductive or non-reproductive tissues in cited references.

      There is also a conflation at times in the paper between sexual dimorphism, which the authors can quantify in their leaf samples, and sexual selection. I explain this in more detail below, but to summarize here, I think the expectations for the relationship between sex-biased gene expression and sexual selection versus sexual dimorphism are somewhat distinct.

      We added a new paragraph in the Introduction to clarify the key differences between Harrison et al.'s (2015) reasoning and ours, and the expectations for how leaf sexual dimorphism could be related to sexual selection and sex-biased gene expression. We argue that sexual selection is but one of several components of sex-specific selection promoting sexual dimorphism in vegetative organs of plants. Please see also our reply to the more detailed comment below.

      Finally, I am a little concerned that the low numbers of sex-biased genes, expected from leaf tissue, offer limited power for some of the tests the authors want to do. Harrison et al. (2015) had hundreds of sex-biased genes from the gonad, and this power made it possible to detect subtle patterns. The authors have a few dozen sex-biased genes, and this makes it difficult to know whether their negative results are the result of low statistical power. That they find clear associations between pre-sex-biased genes and rates of evolution is quite impressive given this low power.

      Indeed, we found fewer sex-biased genes per species than Harrison et al. (2015), but over all species together we discovered 650 sex-biased genes. For comparisons of the properties and evolution of sex-biased (or pre-sex-biased) versus unbiased genes, this sample size of about 4% of the total 16,194 genes is acceptable. The test for a correlation of sex-biased expression and morphological dimorphism should not be affected by low numbers (or proportions) of sex-biased genes; rather, these numbers or proportions themselves constitute the test. Our RNA-seq and differential expression testing (6 males versus 6 (or 5) females) was certainly powerful enough to discover thousands of sex-biased genes in each species, but these were not found. Furthermore, we have added a new Appendix 1, in which we explore results for a three times larger sample of SBGs (1,973). Although the larger sample of SBGs is obtained by unconventionally lax thresholds to define SBGs, the patterns and conclusions drawn are fully consistent with those from the smaller set of 650 SBGs. No changes were made to the main text.

    1. Author Response:

      Reviewer #3:

      A. Summary of what the authors were trying to achieve

      The authors seek to understand how whole-animal behavior is represented in the nervous system. They approach this problem utilizing high-speed volumetric calcium imaging in freely moving nematodes (C. elegans). In recording from a majority of neurons in the head, this approach is state-of-the-art in C. elegans and, arguably, far beyond what is likely to be achieved in most other organisms in the foreseeable future. Imaging data are analyzed by training a linear decoder to predict the instantaneous locomotion velocity and body curvature from instantaneous neuronal activity at single neuron resolution.

      B. Major strengths and weaknesses of the methods and results

      The paper has numerous strengths:

      1) State-of-the-art simultaneous imaging of brain-wide neuronal activity and unrestrained behavior.

      2) The overall approach has been published in two papers by this group and one from another group, but this is the first paper that actually takes the next logical step: connecting the recordings back to behavior. This is a major strength.

      3) Comparison of neuronal dynamics during locomotion and immobilization in the same worm.

      4) Rigorous data collection and modeling.

      The paper in its current form has a number of weaknesses:

      1) Several of the main findings of the paper seem rather obvious. (i) "We report that a neural population more accurately decodes locomotion than any single neuron (Abstract)". Similarly, "We conclude that neural population codes are important for understanding neural dynamics of behavior in moving animals." (ii) "Our measurements suggest that neural dynamics from immobilized animals may not entirely reflect the neural dynamics of locomotion." Consider rephrasing, as this sentence is almost a tautology: "…neural dynamics in the absence of locomotion may not entirely reflect the dynamics in the presence of locomotion (line 379)." Can these conclusions be rephrased, or put in a more significant context?

      Thank you for this feedback. We have completely rewritten the relevant portion of the discussion to better place our findings in context and better convey the implications.

      "That C. elegans neural dynamics exhibit different correlation structure during movement than during immobilization has implications for neural representations of locomotion. For example, it is now common to use dimensionality reduction techniques like PCA to search for low-dimensional trajectories or manifolds that relate to behavior or decision making in animals undergoing movement (Churchland et al., 2012; Harvey et al., 2012; Shenoy et al., 2013) or in immobilized animals undergoing fictive locomotion (Briggman et al., 2005; Kato et al., 2015). PCA critically depends on the correlation structure to define its principal components. In C. elegans, the low-dimensional neural trajectories observed in immobilized animals undergoing fictive locomotion, and the underlying correlation structure that defines those trajectories, are being used to draw conclusions about neural dynamics of actual locomotion. Our measurements suggest that to obtain a more complete picture of C. elegans neural dynamics related to locomotion, it will be helpful to probe neural state space trajectories recorded during actual locomotion: both because the neural dynamics themselves may differ during immobilization, but also because the correlation structure observed in the network, and consequently the relevant principal components, change upon immobilization. These changes may be due to proprioception (Wen et al., 2012), or due to different internal states associated with fictive versus actual locomotion."

      And we have rewritten portions of the introduction, for example:

      "There has not yet been a systematic exploration of the types and distribution of locomotor related signals present in the neural population during movement and their tunings. So for example, it is not known whether all forward related neurons exhibit duplicate neural signals or whether a variety of distinct signals are combined. Interestingly, results from recordings in immobile animals suggest that population neural state space trajectories in a low dimensional space may encode global motor commands (Kato et al., 2015), but this has yet to be explored in moving animals. Despite growing interest in the role of population dynamics in the worm, their dimensionality, and their relation to behavior (Costa et al., 2019; Linderman et al., 2019; Brennan and Proekt, 2019; Fieseler et al., 2020) it is not known how locomotory related information contained at the population level compares to that contained at the level of single neurons. And importantly, current findings of population dynamics related to locomotion in C. elegans are from immobilized animals. While there are clear benefits in studying fictive locomotion (Ahrens et al., 2012; Briggman et al., 2005; Kato et al., 2015), it is not known for C. elegans how neural population dynamics during immobile fictive locomotion compare to population dynamics during actual movement."

      2) The rationale for the decoding exercises seems underdeveloped. Figs. 3-6 are motivated by the question of whether "activity of the neural population might be more informative of the worm's locomotion than an individual neuron." It just seems obvious this will be the case. There might be a missed opportunity, here. Perhaps a stronger motivation would be to ask whether locomotion related signals can be found in the subset of neurons found in the head. The alternative hypothesis would be that head neurons alone are not sufficient, the implication being that the ventral cord and/or tail ganglia must be included.

      We have added rationale for decoding in the results section:

      “...because an effective strategy adopted by the decoder may also be available to the brain, understanding how the decoder works also illustrates plausible strategies that the brain could employ to represent locomotion.”

      And added motivation in the introduction:

      “...Despite growing interest in the role of population dynamics in the worm, their dimensionality, and their relation to behavior (Costa et al., 2019; Linderman et al., 2019; Brennan and Proekt, 2019; Fieseler et al., 2020) it is not known how locomotory related information contained at the population level compares to that contained at the level of single neurons.”

      The ideas about head vs ventral cord and tail are interesting, but since we are limited in what we can say about signals beyond the head we hesitated to pursue that path.

      3) The logic of how decoding exercises are interpreted also seems underdeveloped: (i) Why isn't the finding of locomotion-related signals in the head a foregone conclusion? After all, the worm's head is literally "carving the furrow" that the rest of the body follows, leading to body curvatures that ought to be correlated with neuronal activity in the head. Furthermore, a substantial fraction of head neurons are nose and neck muscle motor neurons. These contribute to overall thrust, which in the worm's fluidic regime is proportional to velocity. Thus, as stronger head motor neuron activation would generate more thrust, a correlation with velocity is expected. (ii) What does it mean to say, "The distribution of weights assigned by the decoder provides information about how behavior is represented in the brain (p. 8)"? Who or what is reading this representation? Is the representation detected by the decoder necessarily in the same or similar language used by the worm's brain? If not, how are the decoder findings significant for understanding locomotion in the worm? (iii) It seems likely that the decoder picks up signals of neurons that causally regulate locomotion, but also signals that follow from it (e.g., efference copy, proprioception, re-entrant signals, etc.). Assuming this is true, again: how are the decoder findings significant for understanding locomotion in the worm? (iv) In what ways, if at all, is the decoder a model for worm locomotion? If it's not a model, how does it improve our understanding of locomotion, or our future ability to construct an informative model?

      Responses to items 3 and 4 are combined below.

      4) The Discussion seems to miss key points: (i) What are the main limitations of the approach (paucity of identified neurons, inability of Ca imaging to report inhibition, etc)? (ii) Why are the limitations non-fatal? (iii) What are the broader impacts of the main conclusions? For example, what is the significance of the finding of locomotion representations in the C. elegans nervous system or, indeed, in any nervous system? How do the results illuminate neural mechanisms of behavior?

      We thank the reviewer for posing these thoughtful questions. We have rewritten the discussion to better explore some of the implications of our finding that a linear model works to decode locomotion and we explicitly highlight limitations including those related to:

      • Neural identities: “Future studies using newly developed methods for identifying neurons (Yemini et al., 2020) are needed to reveal the identities of those neurons weighted by the decoder for decoding velocity, curvature, or both.”

      • Linear vs nonlinear models: “...This does not preclude the brain from using other methods for representing behavior.”

      • Distinguishing motor commands from signals that monitor: “... the measurements here do not distinguish between neural signals that drive locomotion, such as motor commands; and neural signals that monitor locomotion generated elsewhere, such as proprioceptive feedback”
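
      The comparison at the center of this exchange, a linear decoder applied to the whole population versus to a single neuron, can be sketched on synthetic data. This is an illustration only and not the authors' analysis: the ridge penalty, the Gaussian "activity", the five informative neurons, and all variable names are assumptions introduced here. The source states only that a linear decoder was trained on instantaneous neural activity.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for recordings (assumption): 50 "neurons", 2000 time points.
rng = np.random.default_rng(0)
n_time, n_neurons = 2000, 50
activity = rng.standard_normal((n_time, n_neurons))

# Hypothetical ground truth: velocity depends on five neurons plus noise.
true_weights = np.zeros(n_neurons)
true_weights[:5] = [1.0, -0.8, 0.6, 0.5, -0.4]
velocity = activity @ true_weights + 0.3 * rng.standard_normal(n_time)

# Train on the first half, evaluate on the second half.
split = n_time // 2
X_train, X_test = activity[:split], activity[split:]
y_train, y_test = velocity[:split], velocity[split:]

# Population decoder: one weight per neuron.
w, b = fit_ridge(X_train, y_train, alpha=1.0)
population_r2 = r_squared(y_test, X_test @ w + b)

# Single-neuron decoders: fit each neuron alone and keep the best one.
# Picking the best neuron on held-out data favors the single-neuron case,
# so the comparison is conservative for the population decoder.
single_r2 = []
for j in range(n_neurons):
    wj, bj = fit_ridge(X_train[:, [j]], y_train, alpha=1.0)
    single_r2.append(r_squared(y_test, X_test[:, [j]] @ wj + bj))
best_single_r2 = max(single_r2)

print(f"population R^2:         {population_r2:.2f}")
print(f"best single-neuron R^2: {best_single_r2:.2f}")
```

      In this toy setting the population decoder wins by construction, because the signal is spread over several neurons, which is the reviewer's point that the result can look obvious. The fitted weight vector `w` is the kind of object the limitations above refer to: it does not indicate whether a weighted neuron drives locomotion or merely monitors it.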

    1. Author Response:

      Reviewer #1:

      This manuscript shows cell to cell variability in the relative levels of Sox2 and Brachyury (Bra) expression by individual cells within the region of the epiblast containing axial progenitors (the progenitor zone, PZ). Accordingly, some cells express high Bra and low Sox2 levels, others high Sox2 and low Bra and a third group expressing equivalent levels of both transcription factors. They then show that by experimentally promoting high Sox2 expression cells enter neural tube (NT) fates, whereas high Bra brings cells in the progenitor zone to enter the presomitic mesoderm (PSM). The authors then complement these experiments with evaluation of cell movements within the PZ, NT and PSM to show that cells in the NT are much less motile than those in the PZ and PSM. These data led the authors to propose a fundamental role for Sox2/Bra heterogeneity to maintain a pool of resident progenitors and that it is the high cell motility promoted by high Bra levels that pushes cells to join the PSM, whereas high Sox2 levels inhibit cell movement forcing cells to take NT fates. To validate their hypothesis, the authors generated a mathematical model to show that those expression and motility characteristics can indeed lead to axial extension generating NT and PSM derivatives in the proper positions, while keeping a PZ at the posterior end.

      Some specific comments on the manuscript are specified below.

      1) Although the description of cells within the PZ containing different Sox2 and Bra expression ratios is more explicit and quantitative in the present manuscript, this has already been previously reported by different methods including immunofluorescence (e.g., Wymeersch et al, 2016). Similarly, that breaking the Sox2/Bra balance towards high Sox2 or Bra is an essential step to bring the progenitors towards NT or PSM fates has also been previously shown in different ways. These observations are, therefore, not totally new. The novel contribution of this paper is the authors' interpretation that "heterogeneity among a population of progenitor cells is fundamental to maintain a pool of resident progenitors". In this work, however, this conclusion is only supported by their mathematical simulation, as the experiments described in this manuscript are not aimed at homogenizing Sox2/Bra expression levels in the progenitor cells (meaning keeping the double positive feature) but, instead, forcing the progenitors to express Sox2 or Bra alone, which permits evaluation of differentiation routes rather than how to maintain the resident progenitor pool. Interestingly, their alternative mathematical model in which the relative Sox2/Bra levels follow an anterior-posterior gradient (which is actually a feature observed in the embryo) was also successful in producing an extending embryo. This model was not favored by the authors (but see my comment below). According to this model, the progenitor zone could be maintained by a cell pool containing equivalent Sox2/Bra levels; when this balance is broken cells eventually enter NT or PSM routes. Therefore, while expression heterogeneity can be observed in the PZ, I am not sure that the work shown in this manuscript is conclusive enough to claim an essential role of such heterogeneity to maintain the progenitor pool.

      We acknowledge that regional heterogeneity of Sox2 and Bra has been described in the PZ, and we made sure to cite the relevant literature, including Wymeersch et al, 2016 and Kawachi, 2020. Although these papers described different levels of Sox2 and Bra in the PZ, they did not clearly report and quantify the fact that directly neighboring cells have very different levels of Sox2 and Bra; we therefore believe that our description of a “random-like” pattern of heterogeneity constitutes a real novelty. Along the same lines, we are aware of the several studies independently showing that gain- or loss-of-function of Sox2 or Bra can act on the progenitor decision to join either the NT or the PSM (these references are cited l.70, l.72). However, we believe that our study is the first to test systematically both overexpression and downregulation of Sox2 and Bra on progenitor distribution in the same biological system and to link Sox2/Bra functions to cellular motility.

      Testing the requirement for spatial cell-to-cell heterogeneity to maintain a pool of progenitors is experimentally challenging: even if we were able to homogenize Sox2 and Bra expression, we would have to do it in all progenitors, which is not, so far, technically possible using the bird embryo as a model system. We are well aware of these limitations and have toned down claims on the essential role of heterogeneity to maintain the progenitor pool. In particular, we have changed the abstract (we removed the last sentence stating that heterogeneity is fundamental to maintain a pool of resident progenitors), as well as the end of the introduction (we removed “while progenitors expressing intermediate/equivalent levels of the two proteins tend to remain resident”). We have also qualified our model in the discussion by saying that cells with comparable levels of Sox2 and Bra “could” remain resident (L.370).

      To better understand the role of cell-to-cell spatial heterogeneity, we have developed a new mathematical model (Figure 5) which integrates both gradient and random heterogeneity in Sox2/Bra values within the PZ and thus better fits our biological results. In the new version of the manuscript, we compared this model with a model in which the PZ is fully gradient-like and a second one in which it is completely random. These comparisons allow us to describe better what properties random and patterned heterogeneities could bring to the system (Figure 6).

      2) The other main novelty of this manuscript is the idea that differences in cell motility derived from their Sox2 or Bra contents are a major force driving the generation of NT and PSM from the progenitors in the PZ. While there are clear differences between cell motility in the NT and the other two regions, the differences between what is observed in the PSM and PZ are not that large (actually, from the data presented it is not clear that such differences actually exist). However, independently of motility differences, there is no experimental evidence demonstrating that the essential driver of the cell fate choices is motility itself. Differences in cell motility could be just one of the results of more fundamental (and causal) changes in cell characteristics triggered by Sox2 or Bra activity. Indeed, NT and PSM cells are different in many different ways, including adhesion properties, which are normally a major determinant of tissue morphogenesis. Cell motility could, therefore, be one of the factors but it is not clear that it plays the essential role proposed by the authors (see also next comment).

      Cell motility distributions in the PZ are slightly different from those of the PSM, since slower cells were found in the PZ. We agree with the reviewer that this difference might be difficult to see because the average motilities of the two tissues are very similar (Figure 3 and Figure 3-figure Supplement 1). To reveal this difference more clearly, we have used a reporter gene for Sox2 and analyzed progenitor motility by time-lapse imaging. We have specifically tracked GFP-positive cells (reporter gene for Sox2) in the PZ and compared them to cells which are not expressing GFP. The result is that Sox2-high progenitors are globally slower than other progenitors, clearly revealing heterogeneity in cell movements within the PZ and its relation to Sox2 expression (L.225-232, Figure 3-figure Supplement 1B, video 2).

      We agree that there is no experimental evidence that motility itself is the driver of the cell fate choices. To test if the effect on cell motility is taking place downstream of differentiation events, we have analyzed the expression of markers for mesodermal and neural fate (Msgn1 and Pax6) 7hrs after overexpression of Sox2 and Bra. While Sox2 or Bra overexpression triggers changes in cell motility in this short time window, we did not observe any changes in Msgn1 and Pax6 expression (L.267-274, Figure 4-figure Supplement 2), arguing that the effect on motility is an early consequence of the Bra and Sox2 misexpression. Nevertheless, we are aware that this is not a strict demonstration that the effects on fate come from the differential motility only. We have therefore toned down our arguments and changed the title of the manuscript (“....guides destiny by controlling their motility” has been replaced by “...guides motility and destiny”).

      The effects on cell motility we observe could be a consequence of an effect of Sox2 and Bra on adhesion, as suggested by the reviewer; this is an interesting possibility that we cannot and do not want to rule out. The effect on cell adhesion is taken into account in our model and we discuss this hypothesis in the new version of the manuscript (L. 456-459). Identifying the mechanisms underlying the effects of Sox2 and Bra on cell motility is an extremely interesting project we want to pursue, but we consider that this aspect goes beyond the scope of the current manuscript.

      3) The authors developed a mathematical model to confirm their hypothesis that Sox2/Bra expression diversity combined with different motility of cells with high, low or intermediate relative levels of Sox2 and Bra expression are the key to guarantee proper axial elongation from the PZ. I am, however, not sure that the model, the way it was designed, actually proves their point. In particular, because it introduces an additional variable that might actually be the essential parameter for the success of the mathematical model: physical boundaries between NT and PSM cells, meaning that cells with high Sox2 or high Bra are unable to mix. As I commented above, this variable reflects a key biological property of the two tissues involved, one epithelial and the other mesenchymal in nature, which might be more relevant than the motility of the cells themselves (e.g. by different cell adhesion properties). How would a model that does not include such physical barriers work? Conversely, how would a model work in which only physical barriers are applied, using similar starting conditions: a prefigured central neural tube (Sox2 high), flanked at both sides by PSM (Brachyury high) and with the PZ (variable Sox2/Bra levels) just posterior to the neural tube?

      We agree that adhesion and non-mixing properties are essential to our models. Because it was not clear in the previous version, we have explained them in more detail in the new version of the manuscript (l.295-300 and Appendix 1). To assess their roles, we have made two new simulations, one without the regulation of non-mixing/adhesion properties and one without motility control by Sox2/Bra. Both simulations show strong defects in morphogenesis, arguing that motility on its own is a key component of the system and that the non-mixing and adhesion properties are also important but not sufficient to drive morphogenesis (Figure 5F). Having the same non-mixing/adhesion and motility properties downstream of Sox2 and Bra in all our models allows us to isolate the phenomena we wish to study: the role of the distribution of cell-to-cell heterogeneity in the PZ (Figure 6).

      4) The authors generate two mathematical models, differing in whether they start with a random distribution of Sox2 and Bra expression throughout the PZ or with prefigured opposing Sox2 and Bra expression gradients, somehow resembling the image observed in the embryo. The two models generated structures resembling the elongating embryo, although with small differences in the extension process and the extension rate. After analyzing the behavior of those models, they concluded that the random model fits better with the expectations from the in vivo characteristics in the embryo. I am however not sure that I agree with the authors' interpretation. First, because the gradient model includes a natural characteristic observed in the embryo, which the random model does not. Second, because one of the deciding characteristics, namely the slower extension rate observed in the gradient model, does not necessarily make it worse than the random model, as it is not possible to properly determine which extension rate actually resembles more accurately axial extension in the embryo. Third, because the observation that in the gradient model the PZ undergoes fewer transient deformations and self-corrective behaviour is in my view an argument to favor, instead of to disfavor the gradient model, both because the final result is at least as good as the one obtained with the random model and it is actually not clear that in the embryo the PZ undergoes such clearly visible deformations and self-corrections during axial extension. In addition, the gradient model generates a "pure" PZ (just yellow cells) in the posterior end of the structure, while in the random model the PZ contains some islands of NT cells, which is not what is observed in the embryo. According to the last features, the gradient model seems better than the random model.

      To answer the reviewer’s concern about similarity to the embryo, we have developed a new model that is clearly closer to the biological system because it integrates both the gradient and the random ratio distributions (new Figure 5). Interestingly, by comparing it to the two extreme models (random and gradient), we found that this more “natural” model combines the stability and fluidity brought by the gradient model and the random model, respectively. As pointed out by the reviewer, we found that a graded distribution brings more stability to the system, with a “purer” PZ. Conversely, a random distribution allows more tissue fluidity and cell rearrangements as well as tissue shape conservation (Figure 6). We want to thank the reviewer for his or her input; we think that the new model and the comparison with the two extreme cases allowed us to reveal more clearly properties that are specific to the two types of spatial distributions and therefore to point out what general morphogenetic properties could emerge from random-like heterogeneity in the embryo.

      Reviewer #2:

      In this manuscript, Romanos et al show firstly that there is extensive cell-to-cell heterogeneity in the relative levels of Sox2 and Bra in the region containing progenitors for neural and paraxial mesoderm, gradually resolving towards high Bra/low Sox2 in the mesoderm or high Sox2/low Bra in emerging neurectoderm. They then show that overexpression of Sox2/morpholino-based inhibition of Bra or vice versa lead cells to favour neurectoderm or mesoderm respectively. Next they show that cells expressing high Bra are more motile than those expressing Sox2, and show using mathematical modelling that these behaviours can explain many aspects of the eventual segregation of Sox2-high neurectoderm and Bra-high mesoderm.

      This interesting and well-presented work leads to the elegant and novel hypothesis that random cell motility induced by Bra and inhibited by Sox2 is sufficient to explain the segregation of NMps towards mesoderm and neurectoderm respectively. The work will be of broad interest to developmental and mathematical biologists interested in the cell biological basis of self-organising cell behaviours. Nevertheless there are some concerns to address in order to solidify the claims in the manuscript.

      1) The section where Sox2 and Bra levels are manipulated (line 152 onwards) is somewhat under-analysed. Results are presented as supporting a model where the two proteins mutually repress each other and lead to segregation of neural (high Sox2) and mesodermal (high Bra) cells. However the data presented does not unequivocally support the claims in the manuscript and would require further clarification.

      In the new version of our manuscript, we give more details on the analysis of the manipulations of Sox2 and Bra levels. In particular, we provide data showing the tissue localization of manipulated cells on transverse sections (L. 192, Figure 2-figure supplement 3). We have also studied the effects of Sox2 and Bra overexpression on cell fate maturation in the PZ and provide some evidence that progenitors do not yet express differentiation markers as they acquire specific motile properties in response to Sox2 or Bra overexpression (L. 267-273, Figure 4-figure supplement 1). In line with our results and with the literature, we revised the text by removing mentions of Sox2 and Bra mutual repression (L 171, L 386, L389).

      2) The mathematical model may be an oversimplification of the role of these two genes in organising a balanced production of neurectoderm and mesoderm.

      In the new version of our manuscript, we have made significant efforts to better explain how non-mixing properties are taken into consideration in our models and thus, hopefully, to avoid an impression of oversimplification. We would like to point out that simulations performed to evaluate the impact of non-mixing properties on the elongation process indicate that adhesion and non-mixing properties alone cannot account for the morphogenetic events we modelled (new Figure 5F), thus reinforcing the view that regulation of cell motility is a key element in the system. Furthermore, we have designed a new mathematical model, which is closer to the biological system because it integrates both graded and random distribution of Sox2/Bra values (as observed in vivo) (new Figure 5). As explained above in response to reviewer 1, comparison of this model with our previous models, based on either graded or random distribution of the Sox2/Bra values, points out the importance of random-like cell-to-cell heterogeneity in this morphogenetic process.

      Reviewer #3:

      The manuscript by Romanos and colleagues examines how Sox2 and Brachyury control the behavior and cell fate of neuro-mesodermal progenitors (NMPs) in avian embryos. Using immunohistochemistry, the authors showed that the cells residing in the progenitor zone (PZ) display high variability in Sox2/Bra expression. Manipulation of the levels of the two transcription factors affected NMPs' choice to stay or exit the PZ and their future tissue contributions. This motivated the authors to employ an agent-based computational model and additional functional experiments to explore the importance of Sox2/Bra for cellular motility. The results led the authors to propose that (i) heterogeneity in Sox2/Bra ratio is important for the spatial organization of the PZ and its derivatives and that (ii) Sox2/Bra determine the fate of progenitor cells by controlling cellular movements.

      This is a technically sound report that combines single-cell analysis, in vivo functional experiments, and mathematical modeling to explore the link between cell motility and cell identity. While the model proposed by the authors is intriguing, I found that the study should provide evidence placing Sox2/Bra as primary regulators of cell motility in the context of the PZ. Given the extensively-studied role of these transcription factors in NMPs, it is challenging to decouple cellular behavior from cellular identity during tissue formation. The study would benefit from further demonstration that cell fate commitment is regulated by - and not a regulator of - cell migration of NMPs.

      We have now tested the effect of Sox2 and Bra overexpression on cell identity. We show that, 7 hrs after electroporation (a time at which we observe an effect on cell movement), there is no modification of the expression of the neural (Pax6) and mesodermal (Msgn1) maturation markers. These data thus indicate that the effect on cell motility happens without a major acceleration of the maturation program (Figure 4 figure supplement 2). However, as mentioned in response to Reviewer 1, these experiments are correlative and do not demonstrate that the effects of Sox2 and Bra on neural and mesodermal differentiation programs act only through cell motility; we have therefore toned down our arguments in the new version of our manuscript.

      Strengths and Weaknesses:

      • The idea that heterogeneity in cellular behaviors within a progenitor field may act as a driver of morphogenesis is interesting and nicely supported by the agent-based model.

      We want to thank the reviewer for this comment. We believe that in the new version of the manuscript we go even further by developing a new model (Figure 5) which is closer to reality and by testing the influence of random versus gradient Sox2/Bra distribution on morphogenesis (Figure 6).

      • One of the premises of the model (Fig 4) is that Sox2/Bra ratio determines how much cells move, but this is not clear from the in vivo experiments and seems speculative. A clear demonstration of correlation between Sox2/Bra ratio and cellular motility is necessary for proper support of the model.

      The role of the Sox2 to Bra ratio in PZ cell motility is demonstrated in Figure 4. In the new version of the manuscript, these results are presented before the modelling section; we hope that this helps dispel any doubt the reader may have about the fact that we clearly demonstrate a role of Sox2 and Bra in controlling PZ cell motility in vivo.

      • The authors found that manipulation in the levels of the TFs results in changes in NMP motility, but it is not clear if this is the cause or a consequence of commitment to a neural or mesodermal fate. Could Bra-High cells be moving more because they have been specified to a mesodermal fate? Conversely, Sox2-High cells might migrate less since they get incorporated into the neural tube. Establishing the timing of cell fate commitment is necessary to resolve this issue.

      We agree with the reviewer that this is an interesting issue; we have checked for expression of specification markers 7hrs after electroporation of Sox2 and Bra expression vectors, a time point at which electroporated cells had not yet left the PZ but had already changed their motility. In these conditions, overexpression of Sox2 and Bra had no discernible effect on expression of the neural marker Pax6 and on the PSM marker Msgn1, respectively (Figure 4 figure supplement 2).

      • The study's impact and novelty depend on the demonstration that the primary function of Sox2/Bra in NMPs is to drive cell movement. This is not sufficiently explored in the study, and there are no proposed mechanisms for how Sox2/Bra modulate cellular behavior.

      We have shown that Sox2 and Bra act on progenitor motility in vivo (Figure 4). As a mechanism, we propose that Sox2 and Bra could act directly on motility or indirectly by regulating differential adhesion. Cell adhesion control by Sox2/Bra is part of our modeling assumptions and is therefore a hypothesis that will be the subject of future investigations in the lab. This hypothesis is part of the discussion in the new version of the manuscript (L.457).

    1. Author Response:

      Reviewer #1:

      Straub et al. present the first structures of membrane proteins from the XKR family of lipid scramblases. While structures of lipid scramblases from the TMEM16 family have been solved previously, it is the XKR family of proteins that have been identified as the scramblases involved in the dissipation of phosphatidyl-serine asymmetry in the plasma membrane to signal apoptosis. As such, the molecular details of these proteins have been highly sought after. Through the development of a synthetic nanobody that binds to XKR9 from Rattus norvegicus, the authors solved the full-length structure of this small 43 kDa protein by cryo-EM, with a resolution of 3.66 Å. This structure reveals a novel topology, adding to the growing repertoire of membrane protein folds. In addition, they were able to determine the structure of the caspase-3 treated protein at a resolution of 4.3 Å, which cleaves a C-terminal peptide that has been proposed to be involved with scramblase activation. In addition, both structures possess densities that are suggestive of lipids, with densities embedded within the protein core, thus mapping out a putative lipid site or pathway. There has been very little structural information about XKR proteins so far, thus, this work is impactful to the field and pushes forward our ability to investigate a new class of lipid scramblases.

      A limitation of this study is that the structures do not clearly inform on the mechanism quite yet. Unfortunately, transport function was not observable in the reconstituted liposomes, and so the connection between structure and function is limited. Certainly, it can be challenging to reconstitute function from purified proteins, but given that the previous studies of this protein are based on cellular activity, such as the rescue of PS scrambling of XKR8 knockouts by XKR9 and mutant constructs (Suzuki et al., JBC 2014), it is still not clear whether this protein provides the basic unit for transport or whether other components are required. With this, it is unclear whether the caspase cleaved protein informs on a mechanistically active structure. Thus, the paper needs to be clarified by focusing on the novel structure, potential lipid pathways and the difference in the caspase treated vs. full-length structures without speculating on the molecular mechanism.

      We appreciate the comments of the reviewer and agree that, at this point, we cannot demonstrate the function of XKR9 as a scrambling unit beyond doubt. We have thus toned down all hypotheses concerning molecular mechanisms.

      Reviewer #2:

      In this report by Straub and colleagues, they describe cryo-EM structures of a rat ortholog of XKR9 in full-length and caspase-9 activated states. The structure is a technical achievement due to the small size of XKR9 and provides a first view into this family of proteins, of which three members, XKR4, XKR8, and XKR9, participate in lipid scrambling. The structures are determined in complex with a synthetic monobody, resulting in an interpretable density map. To begin to understand the role of caspase cleavage in activation, a structure is determined following caspase activation. Notably, no changes could be detected in the cleaved form and it thus remains unclear how caspase activates XKR9 or how activated XKR9 mediates lipid scrambling. Overall, these results will be of broad interest and will likely serve as a foundation for future studies into this interesting family of proteins.

      We appreciate the comments of the reviewer but would like to clarify that the binder is a synthetic nanobody, not a monobody.

      Reviewer #3:

      This is a characteristically high quality report from the Dutzler lab of the atomic structure (via cryo EM, using synthetic single chain antibody for size enhancement) of a class of membrane proteins that was identified as being important for the exposure of the signaling lipid phosphatidylserine at the surface of apoptotic cells. These Xk-related proteins were proposed as caspase-activated lipid scramblases. The Dutzler paper reveals the structure of the Xkr9 homolog but the data do not allow conclusions about scramblase activity. No activity was detected and the protein in its two very similar conformations before and after caspase treatment offers no obvious clue as to its function. Nevertheless this is an important first step in this nascent field.

      We thank the reviewer for these supportive comments.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Strengths:

      1. The loss of ciliary GPR161 has a more robust phenotype in specific tissues (i.e., the limbs and face). As a result, the limb data (in Figure 6) and craniofacial data (in Figure 7) are well presented and clear. In these figures, the authors directly compare and highlight differences between primarily two genotypes (wt and Gpr161mut1/mut1 embryos) and quantify the changes (digit number and distance between nasal pits). Overall, these two figures support the existing GPR161 model, showcasing that a loss of ciliary GPR161 results in a tissue-specific loss of GLI3R (Figure 6D) and consequently the development of additional digits (Figure 6E) and craniofacial defects (Figure 7D and 7E).

      Thank you.

      Weaknesses:

      1. There is no data in the paper showing that Gli3 repressor function is affected preferentially compared to Gli Activator function. In Figure 4C, Gli3 FL/R ratios are not different between wt/wt and mut/mut embryos. The data can be explained by the fact that the mutant Gpr161 is a partial loss of function allele and the resultant weaker phenotypes (compared to the full KO) show some tissue specificity. Linking this allele to a specific biochemical mechanism is not justified by the data.

      We have now revised the title of the paper and the discussion to emphasize these limitations. We have also added a new section in the discussion on the limitations of our methods and other optogenetic/chemogenetic methods for generating cAMP in cilia. These limitations arise from the cilioplasm not being strictly restricted from the cytoplasm. Therefore, the second messengers cAMP and Ca2+ are freely diffusible between ciliary and extraciliary compartments (Delling et al., 2016; Truong et al., 2021). A paper published in Cell during revision of this study used optogenetic tools to show that ciliary, but not cytoplasmic, production of cAMP functions through PKA localized in cilia (Truong et al., 2021) to repress sonic hedgehog-mediated somite patterning in zebrafish (Wolff et al., 2003). We have also compared these results with our study and discussed them. Our study highlights that the effects of ciliary loss of Gpr161 pools are tissue specific and dependent on the requirements of the tissues on GliR vs GliA in the morpho-phenotypic spectrum. Overall, our results using the Gpr161mut1 allele are complementary to the optogenetic study by showing that lack of ciliary Gpr161 pools results in Hh hyperactivation phenotypes arising mainly from lack of GliR in the limb buds, mid-face and intermediate neural tube.

      1. The authors use an endpoint assay based on overexpression in 293T cells to claim that cAMP production is unaffected by the Gpr161mut allele. However, weak effects (very likely given the weak phenotypes) may not be evident in this assay. We also do not know if the mutant allele is defective in some other biochemical function or in localization to other places in the cell. One way to address this is to measure ciliary and extraciliary cAMP in their knock-in cells. In Gpr161mut1/mut1 cells, is ciliary cAMP reduced to levels comparable to Gpr161ko/ko cells? Is extraciliary cAMP unchanged compared to WT cells? Or, is cAMP able to diffuse into the cilia from GPR161mut1 localized to vesicles at the ciliary base (Figure 1B)? Many of the conclusions made in the paper equate a loss of ciliary GPR161 to a loss of ciliary cAMP, but this loss of ciliary cAMP is not definitively shown in the paper.

      As physiological ligands for Gpr161 are currently not known, we are unable to test extraciliary vs ciliary contribution of Gpr161 in cAMP production in a physiological context. Therefore, we resort to overexpression assays for constitutive cAMP production by Gpr161 and Gpr161mut1. Using these assays, we do not find a difference in constitutive activity among these variants.

      As the cilioplasm is not strictly compartmentalized from the cytoplasm, the second messengers cAMP and Ca2+ are freely diffusible between ciliary and extraciliary compartments (Delling et al., 2016; Truong et al., 2021). Thus, in any approach for generating subcellular pools of cAMP, be it genetic, optogenetic or chemogenetic (Guo et al., 2019; Hansen et al., 2020; Truong et al., 2021), extraciliary cAMP could diffuse into ciliary compartments. A recent paper using optogenetic and chemogenetic tools for cAMP production inside cilia or in cytoplasm shows that cytoplasmic cAMP has free access to intraciliary compartments but is unable to reach critical thresholds for activating PKA (Truong et al., 2021). Thus, we would assume that the extraciliary cAMP produced by extra copies of Gpr161mut1 could diffuse to cilia but is likely to be less effective in activating downstream effectors. In addition, the PKA regulatory subunit-AKAP complexes are fundamentally important in organizing and sustaining PKA catalytic subunit activation to organize localized substrate phosphorylation in restrictive nanodomains (Bock et al., 2020; Zhang et al., 2020). The dual functions of Gpr161 in Gs coupling and as an atypical AKAP (Bachmann et al., 2016) are likely to further restrict cAMP signaling in ciliary or extraciliary microdomains.

      1. Compared to Figures 6 and 7, the data presented in Figures 3 and 5 are very confusing and difficult to interpret. On the one hand, this is understandable, the Gpr161mut/mut phenotypes are complex, and some tissues (like the developing spinal cord) are more resistant to change due to a loss of GliR. On the other hand, the data collected from the numerous genotypes analyzed could be easier to interpret by (i) providing a penetrance of the phenotypes and (ii) quantifying the phenotypes.

      Thank you for all the suggestions. We have now carried out these quantifications or tabulations, which have considerably improved the presentation of the datasets (Table 2 and Figure 5-figure supplement 1). Some of these experiments required additional experimental animals (Table 1), and we have updated the text accordingly.

      Below are a few examples of data that could be improved with quantifications:

      — In Figure 3, the authors are trying to convey that the Gpr161mut allele is partially functional and produces a milder phenotype than the Gpr161ko allele. However, the Gpr161ko/ko, Gpr161mut/ko, and Gpr161mut/mut phenotypes showcased in the figure all look quite severe, and it is difficult to appreciate the differences in the defects fully. An accompanying table summarizing the phenotypes and their penetrance in the affected genotypes would help to convey this point.

      We have added an accompanying Table 2 summarizing the phenotypes and penetrance for the respective genotypes, when present. Please note that rostral malformations such as exencephaly are similar between Gpr161 ko/ko and Gpr161 ko/mut1, whereas Gpr161 mut1/mut1 embryos have mid-face widening. Along the same lines, Gpr161 ko/ko has no forelimbs and Gpr161 ko/mut1 has smaller forelimb buds, whereas Gpr161 mut1/mut1 embryos have polydactyly.

      — In Table 1, the authors note that the Gpr161mut1/mut1 mouse is embryonic lethal by e14.5, but the analysis in Table 1 appears to be incomplete. In the table titled "breeding between Gpr161 mut1/+ parents," the authors indicate that they only assessed one litter of e14.5 and e15.5 embryos. Oddly, the authors note that additional litters were collected, but the embryos were not genotyped because the embryos exhibited no phenotypes. The absence of phenotypes could be due to an absence of viable Gpr161mut1/mut1 embryos; however, the embryos need to be genotyped and a chi-square analysis conducted to verify this. Death can be a measure of phenotype severity, but I think it is important to surmise why the embryos are dying. It is unclear whether the embryos are dying due to the heart defects mentioned in the discussion. If the embryos are dying due to the heart defect, then it would be important to know whether the heart defects are more severe in the Gpr161ko/ko embryos.

      Our apologies for the oversight. We have now analyzed additional timed pregnancies at E14.5, E14.75 and E15.5. We find that the embryonic lethality is seen fully by E14.75. Heart defects in Gpr161 ko/ko embryos are not apparent as they are E10.5 lethal. We do see apparent heart defect phenotypes in Gpr161 ko/mut1 vs Gpr161 mut1/mut1. These defects include pericardial effusion, outflow tract defects, A-V cushion abnormalities and smaller ventricles. These phenotypic descriptions are beyond the scope of the current paper. However, we have mentioned pericardial effusion in the text and Table 2.

      — In Figure 5, quantifying the progenitor domains would greatly assist in discerning differences between the various genotypes. For example, a quantification would help readers assess differences in NKX6.1 across the various genotypes.

      We have now quantified the differences in Nkx6.1 across genotypes. The data is presented in Figure 5-figure supplement 1.

      On an unrelated note, the PAX7 staining of the Gpr161mut1/ko spinal cord looks very strange because the line adjacent to the image does not accurately represent the dorsal-ventral patterning of PAX7 seen in the image. This image would need to be replaced.

      Our apologies for the oversight. We have now revised this image.

      Reviewer #2 (Public Review):

      The premise of the entire study is predicated on GPR161mut1 failing to target to cilia and being WT in every other aspect. The Gs coupling of GPR161mut1 is examined. The ciliary localization of GPR161mut1 is carefully assessed by conducting staining not just in WT cells but also in INPP5E cells where GPR161 ciliary levels are known to be elevated. Another prediction is that GPR161mut1 is found in an intermediate biosynthetic compartment. Some insights into the compartment where GPR161mut1 is found would help interpret the phenotype of the GPR161mut1 animals. It would be important to know whether the GPR161mut1 mimics a pre-cilia targeted GPR161 (say at the plasma membrane) or whether it mimics a post-ciliary exit state (say recycling endosomes). In the past few years, work from the von Zastrow lab and others has shown that GPCRs keep activating their downstream partners after endocytosis from the plasma membrane. If GPR161mut1 were to mimic the post-ciliary exit state of GPR161, it may assume some of the signaling functions of ciliary GPR161.

      Thank you for all the suggestions. We have now examined and extensively discussed the plausible source of extraciliary Gpr161 in mediating Hh repression. We already showed that Gpr161 localizes to the periciliary recycling endosomal compartment in addition to cilia (Mukhopadhyay et al., 2013) and could activate ACs and PKA in proximity to the centrosome. We now show that Gpr161mut1 also localizes to similar compartments (Figure 1-figure supplement 3). We propose that this compartment could promote Gpr161 activity outside cilia in the in vivo settings in GliR formation (please see model in Figure 8D).

      We also compare our results with a recently published paper showing that ciliary, but not cytoplasmic, production of cAMP functions through PKA localized in cilia to repress sonic hedgehog-mediated somite patterning in zebrafish (Truong et al., 2021). While this paper is an elegant demonstration of ciliary pools of cAMP in repressing Hh activity despite having no strict compartmentalization exclusively in cilia, it does not capture the roles of ciliary and extraciliary pools of Gpr161-mediated cAMP signaling in different tissues that we show are dependent on the requirements of the tissues on GliR vs GliA in the morpho-phenotypic spectrum.

      A second point that the authors may wish to address is whether GPR161mut1 may fail to enrich in cilia because it is hyperactive and undergoes constitutive exit from cilia. The hypothesis here is that GPR161mut1 couples to beta arrestin better than WT GPR161. Blocking GPR161mut1 exit via depletion of beta arrestin or BBSome is a simple way to test this hypothesis.

      As advised by the reviewer, we have tested for Gpr161/Gpr161mut1 levels in cilia upon arrestin1/2 or BBSome loss. These experiments show that Gpr161mut1 is not present in cilia in arrestin1/2 (Arrb1/2) double ko MEFs (Figure 1-figure supplement 1) or upon RNAi of BBS4 (Figure 5-figure supplement 2). We also showed previously that knockdown of the 5'-phosphatase INPP5E, which causes accumulation of Gpr161 in cilia, does not cause any accumulation of Gpr161mut1 in cilia. Based on all these experiments, we surmise that Gpr161mut1 does not transit through cilia.

      Finally, it would be good to learn about the levels of expression of GPR161mut1 compared to WT GPR161 using immunoblotting. If GPR161mut1 were to be expressed at much higher levels than WT GPR161, it may compensate for its lack of ciliary localization by elevated total cellular activity.

      We were unable to determine protein stability of the mutant receptor in the Gpr161mut1 embryos due to technical constraints in immunoblotting for endogenous levels. However, we note Gpr161mut1 in vesicles surrounding the base of cilia (Figure 1B) and constitutive cAMP signaling activity (Figure 1G, Figure supplements 1-3) in stable cell lines, suggesting that protein levels and activity of the mutant were comparable with wild type Gpr161. As suggested by the reviewer, we also tested LAP-tagged Gpr161mut1 protein levels by tandem affinity purification and immunoblotting, with respect to LAP-tagged Gpr161wt in MEFs stably overexpressing these variants. We noted a similar immunoblotting pattern from receptor glycosylation in both variants (Figure 2-figure supplement 2).

    1. Author Response:

      Reviewer #3 (Public Review):

      This manuscript was built on their recent observation that Schwann cell (SC)-specific loss of the mitochondrial protein Prohibitin-1 results in a rapid, progressive demyelinating peripheral neuropathy in mice associated with mitochondrial dysfunction. Although several mechanisms have been well-studied in SCs, the potential novelty here is establishing those pathways as downstream effectors of mitochondrial dysfunction in SCs. The authors provide a comprehensive evaluation of these pathways following the loss of SC Prohibitin-1 and identify JUN and mTORC1 as potential mediators of myelin disruption. This manuscript includes a substantial amount of data. However, some data are not directly related to the primary mechanistic conclusions. In addition, the manuscript relies heavily on descriptive, rather than mechanistic, data regarding the roles for JUN and mTORC1. Specific issues to be addressed are listed below:

      Thank you for the detailed comments and careful analysis of our manuscript!

      1) Figure 1: The authors suggest that increased JUN expression and mTORC1 activation are associated with the demyelination in Phb1-SCKO mice with "peaking around P40 - P60" (Line 82). However, it appears the most profound effects on number of myelinated and demyelinated axons were observed at P90. Interestingly, immunoblots for JUN and mTORC1 targets suggest that increases in these signaling pathways are much greater at P20 and P40 when compared to P90. This may suggest that JUN and mTORC1 are important for early demyelination, but other mediators play a more prominent role in chronic changes. It would be nice to have data from a time point between P40 and P90 to further understand the time course of JUN and mTORC1 changes. If not, the authors should discuss these possibilities in further detail.

      Thank you for your suggestion. Following your comment, we included the P60 time point, which is now reported in Figure 1–figure supplement 2. We found that, at P60, the magnitude of the relative change (Phb1-SCKO vs Controls) is roughly the same as at P40. In our previous publication (Della-Flora Nunes et al., 2021), we conducted a careful analysis of the time course of demyelination and we found that the number of myelinated and demyelinated axons in nerves of Phb1-SCKO is roughly the same at P60 and P90 (Fig. 1e of the referred manuscript). Therefore, we believe that demyelination peaks between P40 and P60, and that mTORC1 and c-Jun activation is in line with the demyelination phenotype of Phb1-SCKO.

      2) Figure 4: Since these are teased nerve fibers, not adjacent sections, please describe the detailed methods for immunofluorescence detection of DAPI and protein targets (JUN, P0, MBP, p-S6).

      Thank you for your comment! We have now included an expanded description of the immunofluorescence method used in our analysis.

      3) For Figure 2 and Figure 3, the authors state "mTORC1 and JUN may be activated in different stages of the SC response to mitochondrial damage, with mTORC1 preceding JUN temporally" (Lines 293 - 294). However, the data presented here are somewhat confusing for this conclusion. The immunoblots provided in Figure 1 suggest a similar time course for both JUN and mTORC1 activation after Phb1 loss in SCs at both P20 and P40. However, in Figure 3, teased nerve from Phb1-SCKO at P40 shows reduced JUN but not p-S6 expression. The authors may consider repeating the PhAM-DAPI-JUN/p-S6 studies at the P20 and P90 time points to clarify this issue.

      Thank you for your suggestion. We repeated the co-staining experiments at P90 and the new data are reported in Figure 3 – figure supplement 1. These data are in line with our previous results at P40, suggesting an association of mitochondrial damage with c-Jun expression, but not with p-S6. Unfortunately, mitochondrial loss is not visible at P20, so a similar analysis cannot be carried out at this time point. However, due to this comment and suggestions from Reviewer #2, we have removed inferences as to which of these pathways is activated first, since our experiments do not allow us to reach a firm conclusion. Our hypothesis would be that mTORC1 is activated earlier in the presence of subtler mitochondrial dysfunction, while high c-Jun expression happens later, following mitochondrial loss and preceding demyelination.

      4) Figure 7: Significant recovery of the demyelinating phenotype and nerve conduction velocity were noted after blockade of the mTORC1 pathway using rapamycin in Phb1-SCKO mice. However, this did not result in recovery of CMAP or overall functional improvement using the Rotarod assay. Given that demyelination was reversed, does this suggest that an important trophic function of SC mitochondria for associated axons is disrupted in Phb1-SCKO mice? Alternatively, could rapamycin delivery at P20 be too late to rescue degenerating axons, leading to incomplete functional recovery? The authors should discuss these possibilities in further detail.

      Thank you very much for this comment! In Phb1-SCKO mice, clear axonal degeneration seems to happen fast, so it is challenging to detect these events. According to our data in Figure 7, Phb1-SCKO mice treated with rapamycin from P20 to P40 showed a trend towards amelioration of axonal degeneration in tibial nerves at P40 as assessed in semithin sections. To explore these data in more detail, but also to substantiate the data on rescue of demyelination by rapamycin, we have now performed an analysis of the same tissue by electron microscopy. Our new results reported in Figure 7 – figure supplement 2 suggest that, although rapamycin is efficient at reducing the demyelination in Phb1-SCKO, it did not significantly alter the axonal degeneration as quantified from the electron micrographs. Therefore, we believe that the main beneficial effect of rapamycin on nerve conduction velocity is mediated by its capacity to prevent the demyelination in Phb1-SCKO mice. However, we cannot entirely rule out the possibility that maintenance of myelin sheaths by rapamycin can also have a small indirect effect on axon survival (which could be what we picked up in our previous quantification from semithin images). We adjusted the results and discussion sections to convey that the effect of rapamycin on axonal integrity is small or even absent in our paradigm. We also believe that earlier inhibition of mTORC1 (before P20) could be beneficial to Phb1-SCKO mice. However, mTORC1 is known to be essential for SC proliferation during development, and, therefore, an earlier treatment could also affect myelin formation. This is one of the main reasons guiding our choice for the starting point of rapamycin application.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The study provides convincing evidence that gap junctions are important for glutamatergic synaptogenesis and dendritic arbor development, but how this happens is less clear. As I was reading the paper, I thought the data support the synaptotrophic hypothesis, which states that synapse formation facilitates dendritic arbor development, and that they added an essential early step that gap junctions are required for development of glutamatergic synapses. This would suggest that the sequence of local events is: gap junction formation -> synaptogenesis->branch extension, repeat. The authors suggest an alternate sequence: gap junction formation -> branch extension-> synaptogenesis, repeat. I don't think the current data allow us to choose one or the other of these options and the implied causality, but the study clearly demonstrates a role for gap junctions in the process, which was the goal of the study. Given this ambiguity, the discussion should be modified to accommodate both interpretations, or to explain why one interpretation is favored over the other.

      We agree that the sequence can be either gap junction formation → synaptogenesis → branch extension, repeat or gap junction formation → branch extension → synaptogenesis, repeat. We do not have direct evidence to favour one hypothesis over the other. However, if synaptogenesis precedes branch extension, and the stunted arbor in gjd2b mutants is a result of reduced AMPAR synapses, one might expect to see increased branch retractions as well. As demonstrated by Haas et al. (2006), loss of AMPAR synapses leads to more dynamic dendritic branches, which are added and retracted at faster rates compared to wild type. This is not what we see. In fact, in gjd2b mutants, branch elongations were reduced and branch retractions were not affected at all. In addition, when we expressed Gjd2b in wild type PNs and imaged in 5-minute windows, branches containing Gjd2b punctum elongated more; the retractions were not affected (Figure 7C). These observations lead us to favor the Gjd2b → branch elongation → synaptogenesis view, but we acknowledge that without a direct assay, the two scenarios cannot be disambiguated. We have added the above to the discussion section (line 440).

      The implied mechanistic link between camk2 transcript expression and pharmacological inhibition of CaMKII enzymatic activity on dendritic arbor growth is not convincing to me. It is clear that the transcript observation is unexpected and suggests that somehow interfering with gjd2b affects camk2 transcript expression. Perhaps other synaptic proteins are affected as well. This point would be worth commenting on. But transcript level does not necessarily correlate with protein level or function, particularly for a calcium activated kinase, which is itself tightly regulated in terms of protein expression and function by multiple mechanisms. The main issue concerns causality. The authors state that the gjd2b regulates glutamatergic synaptogenesis by reducing CaMKII levels. The authors do not provide evidence for this statement of cause and effect.

      In the revised manuscript, we have removed claims of causality between observed camk2 expression levels and glutamatergic synaptogenesis (lines 42, 327, 349, 475). We have inserted the following sentence into the discussion (Line 462):

      “Further experiments are required to verify whether this increase in expression level of CaMKII isoforms translates to increased enzymatic activity.”

      Reviewer #2 (Public Review):

      [...] Strengths:

      1) The sheer amount of work that has gone into this paper is impressive. Each technique is tedious, time consuming and labor intensive so it is quite impressive that the authors have a substantial number of Ns for their experiments. All the experiments and analysis have been performed carefully and the data is of high quality. 2) The authors have done a thorough job of generating and characterizing Gjd2b mutant fish which will be useful for the entire zebrafish neuroscience community.

      We thank the reviewer for these very nice comments about our work.

      Weaknesses:

      1) Overall, while the experiments and data are clearly presented, several experimental results have significant technical limitations, which in turn open them up to alternative explanations which cannot be easily ruled out based on current data.

      We agree that there are technical limitations and have discussed these in the “Supplementary File 4” document.

      2) Knocking out gap-junctions will affect spontaneous activity in early development which is propagated via gap junctions. Given that spontaneous activity is likely dampened in Gjd2b knockout fish, a substantial concern is that effects that the authors attribute to the absence of gap junction mediated activity could equally likely be a consequence of homeostatic changes in synaptic input. One possible way to alleviate this issue is to perform transplant experiments from mutant fish to wild type fish which ensures that the rest of the circuit is unaffected.

      The rescue experiment we performed is akin to transplant experiments, in the sense that we expressed Gjd2b in single Purkinje neurons in the mutant background. This way, the rest of the circuit is unaffected and remains Gjd2b null while only the neuron expressing Gjd2b is rescued.

      3) Rescuing Gjd2b is an interesting experiment, but it's unclear how functional electrical synapses form by expressing the functional protein in only one neuron.

      We speculate that this can be due to heterotypic gap junctions formed between Gjd2b and other connexins expressed on the coupled neuron. Such heterotypic gap junctions have been documented in the Mauthner and CoLo neurons in larval zebrafish (Miller et al., 2017) (Line 300).

      4) The authors suggest that dendritic growth is reduced because of the absence of gap junctions, but it's unclear whether the reduced dendritic growth is simply a consequence of fewer excitatory synapses, and thus a downstream consequence of the absence of gap junctions rather than specific information being transmitted through gap junctions.

      This point was raised by Reviewer #1 as well. Please see response above.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] I have two overarching concerns regarding the Results as reported:

      1) In their preregistration the authors make specific hypotheses regarding TMS effects on the scene-only condition and stated that their plan was to include a 3-level factor of stimulus type (object-related, context-related, scene-only) in their ANOVAs. However the scene-only condition has not been included in the statistics. A justification for this alteration should be provided or the statistics should be run as originally planned.

      The pre-registered statistics for the scene-alone condition in the OPA experiment are now included in the manuscript (p.9-10). The relevant figure (Figure 3) has also been updated such that the scene-alone condition results are now in the main text rather than the Supplement. Results confirmed our predictions, showing a reduction of scene-alone performance when OPA was stimulated 160-200 ms after stimulus onset. Note that the scene-alone condition was only included in the pre-registration of the OPA experiment, which is why we had not reported the corresponding statistics previously. (This condition was not relevant for the LOC and EVC experiments.)

      2) All participants were screened and only included in the study if TMS stimulation of the relevant area produced a reduction in object recognition. More detail on the specific procedures used should be provided. The authors should clarify which SOAs were used as part of the screening and how many participants were excluded based on this screening. The use of this screening procedure should be flagged in the main text so that the reader can interpret the results accordingly.

      We now introduce the screening procedure in the main text and point the reader to a recent publication that documents the methods and results of this experiment (Wischnewski & Peelen, J Neurosci 2021). The screening experiment followed the design of Dilks et al. (J Neurosci 2013), stimulating OPA and LOC using 5 TMS pulses at a rate of 10Hz (i.e., no SOAs were used). No participants were excluded – all participants were assigned to one of the three conditions (OPA, LOC, EVC). This is now more clearly explained in the manuscript.

      3) Based on the fact that TMS to LOC and EVC disrupts performance >150 ms after stimulus onset, the authors conclude that this reflects the role of feedback from scene-selective areas. Can the authors really exclude alternative possibilities? Would the same results not be expected if areas like LOC and EVC exhibit recurrent activity, perhaps reflecting continued processing of a representation of the stimulus held in iconic memory? Similarly, the authors conclude that the longer latency of the TMS effects on LOC in the context-based vs object-based condition reflects the role of feedback. But the object stimulus is degraded in the context-based condition, so could it not be that LOC remains active over longer periods of time to support a more difficult discrimination?

      We have added a paragraph to the Discussion section in which we discuss the alternative interpretation of local recurrence (p.13-14).

    1. Author Response:

      Reviewer #1:

      The manuscript “A computationally designed fluorescent biosensor for D-serine” by Vongsouthi et al. reports the engineering of a fluorescent biosensor for D-serine using the D-alanine-specific solute-binding protein from Salmonella enterica (DalS) as a template. The authors engineer a DalS construct that has the enhanced cyan fluorescent protein (ECFP) and the Venus fluorescent protein (Venus) as terminal fusions, which serve as donor and acceptor fluorophores in resonance energy transfer (FRET) experiments. The reporters should monitor a conformational change induced by solute binding through a change of the FRET signal. The authors combine homology-guided rational protein engineering, in-silico ligand docking and computationally guided, stabilizing mutagenesis to transform DalS into a D-serine-specific biosensor applying iterative mutagenesis experiments. Functionality and solute affinity of modified DalS is probed using FRET assays. Vongsouthi et al. assess the applicability of the finally generated D-serine selective biosensor (D-SerFS) in-situ and in-vivo using fluorescence microscopy.

      Ionotropic glutamate receptors are ligand-gated ion channels that play important roles in brain development, learning, memory and disease. D-serine is a co-agonist of ionotropic glutamate receptors of the NMDA subtype. The modulation of NMDA signalling in the central nervous system through D-serine is poorly understood. Optical biosensors that can detect D-serine are lacking and the development of such sensors, as proposed in the present study, is an important target in biomedical research.

      The manuscript is well written and the data are clearly presented and discussed. The authors appear to have succeeded in the development of a D-serine-selective fluorescent biosensor. But some questions arose concerning experimental design. Moreover, not all conclusions are fully supported by the data presented. I have the following comments.

      1) In the homology-guided design two residues in the binding site were mutated to the ones of the D-serine specific homologue NR1 (i.e. F117L and A147S), which led to a significant increase of affinity to D-serine, as desired. The third residue, however, was mutated to glutamine (Y148Q) instead of the homologous valine (V), which resulted in a substantial loss of affinity to D-serine (Table 1). This "bad" mutation was carried through in consecutive optimization steps. Did the authors also try the homologous Y148V mutation? On page 5 the authors argue that Q instead of V would increase the size of the side chain pocket. But the opposite is true: the side chain of Q is bulkier than that of V, which may explain the dramatic loss of affinity to D-serine. Mutation Y148V may be beneficial.

      Yes, we have previously tested the mutation of position 148 to valine (V). We have now included this data in the paper as Supplementary Information Figure 1 (below). The fluorescence titration showed that the 148V variant displayed poor D-serine specificity compared to Q148 at the same position (the sequence background of the variant was F117L/A147S/D216E/A76D). Thus, Q was superior to V at this position and V was not taken forward for further engineering. In the text, we meant that Q would increase the size of the side chain pocket relative to the wild-type amino acid, Y. We can see that this is unclear and have updated this sentence.

      Supplementary Figure 1. Dose-response curves for F117L/A147S/Y148V/D216E/A76D (LSVED) with glycine, D-alanine and D-serine. Values are the (475 nm/530 nm) fluorescence ratio as a percentage of the same ratio for the apo sensor. No significant change is detected in response to glycine. The KD for D-alanine and D-serine are estimated to be > 4000 mM based on fitting curves with the following equation:

      2) Stabilities of constructs were estimated from melting temperatures (Tm) measured using thermal denaturation probed using the FRET signal of ECFP/Venus fusions. I am not sure if this methodology is appropriate to determine thermal stabilities of DalS and mutants thereof. Thermal unfolding of the fluorescence labels ECFP and Venus and their intrinsic, supposedly strongly temperature-dependent fluorescence emission intensities will interfere. A deconvolution of signals will be difficult. It would be helpful to see raw data from these measurements. All stabilities are reported in terms of deltaTm. What is the absolute Tm of the reference protein DalS? How does the thermal stability of DalS compare to thermal stabilities of ECFP and Venus? A more reliable probe for thermal stability would be the far-UV circular dichroism (CD) spectroscopic signal of DalS without fusions. DalS is a largely helical domain and will show a strong CD signal.

      We agree that raw data for the thermal denaturation experiments should be shown and have included this in the supporting information of an updated manuscript (Supplementary Data Figure 7). The data plots ECFP/Venus fluorescence ratio against temperature. When the temperature is increased from 20 to 90 °C, we observe two transitions in the ECFP/Venus fluorescence ratio. The fluorescent proteins are more thermostable than the DalS binding protein, and that temperature transition does not vary (~90 °C); thus, the first transition corresponds to the unfolding of the binding protein and the second transition to the unfolding or loss of fluorescence from the fluorescent proteins. This is an appropriate method for characterising the thermostability of the binding protein in the sensor for two main reasons. Firstly, the calculated melting temperature from the first sigmoidal transition changes upon mutation to the binding protein in a predictable way (e.g. mutations to the binding site/protein core are destabilising), while the second transition occurs consistently at ~ 90 °C. This supports the assignment of the first transition to the unfolding of the binding protein. Secondly, characterising the stability of the binding protein in the context of the full sensor is more relevant to the end-application. Excising the binding domain and testing that in isolation would result in data that are not directly relevant to the sensor. The absolute thermostabilities for all variants can be found in Table 1 of the manuscript.

      Supplementary Figure 7. The (475 nm/530 nm) fluorescence ratio as a function of increasing temperature (20 – 90 °C) for key variants in the engineering trajectory of D-serFS. Values are normalised as a percentage of the same ratio for the sensor at 20 °C and are represented as mean ± s.e.m. (n = 3). The first sigmoidal transition in the data changes upon mutation to the binding protein while the second transition begins at ~ 90 °C for all variants. The second transition is not observed in full as the upper temperature limit for the experiment is 90 °C.

      3) The final construct D-SerFS has a dynamic range of only 7%, which is a low value. It seems that the FRET signal change caused by ligand binding to the construct is weak. Is it sufficient to reliably measure D-serine levels in-situ and in-vivo?

      First, we have modified the sensor, which now has a dynamic range of 14.7% (Figure 5, below). The magnitude of the change is reasonable for this sensor class: these sensors function with a relatively low dynamic range because they are ratiometric, i.e. they remain accurate even when the dynamic range is low. For example, the Gly-sensor GlyFS published in 2018 (Nature Chem. Biol.) has one of the highest dynamic ranges in this sensor class, at only ~28%. The Glu sensor described by Okumoto et al. (2005) (PNAS, 102, 8740) has a dynamic range of ~9%. So, the FRET change is not a low value for ratiometric sensors of this class (which have been used very effectively for over a decade). Most importantly, the data from experiments with biological tissue and in vivo (Fig. 6) demonstrate a detectable (and statistically significant) response to changes in D-serine concentration in tissue.

      Figure 5. Characterization of full-length D-serFS. (A) Schematic showing the ECFP (blue), D-serFS binding protein (D-serFS BP; grey) and Venus (yellow) domains in D-serFS. The C-terminal residues of the Venus fluorescent protein sequence are labelled, showing the truncated (top) and full-length (bottom) C-terminal sequences. The underlined amino acids in truncated D-serFS represent residues introduced from the backbone vector sequence during cloning. * represents the STOP codon. (B) Sigmoidal dose response curves for truncated and full-length D-serFS with D-serine (n = 3). Values are the (475 nm/530 nm) fluorescence ratio as a percentage of the same ratio for the apo sensor. (C) Binding affinities (M) determined by fluorescence titration of truncated and full-length D-serFS, for glycine, D-alanine and D-serine (n = 3).
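
      As a minimal sketch of the quantity discussed here, the snippet below expresses a 475 nm/530 nm ratio as a percentage of the apo ratio and reads the dynamic range off as the change at ligand saturation. It assumes that "dynamic range" means the percentage change in ratio between the apo and saturated sensor; the function names and the example ratios are hypothetical and are not measured values from the manuscript.

```python
def ratio_percent_of_apo(ratio: float, apo_ratio: float) -> float:
    """Express a 475 nm/530 nm ratio as a percentage of the apo-sensor ratio."""
    if apo_ratio <= 0:
        raise ValueError("apo_ratio must be positive")
    return 100.0 * ratio / apo_ratio


def dynamic_range_percent(saturated_ratio: float, apo_ratio: float) -> float:
    """Percentage change in ratio between the apo and ligand-saturated sensor.

    Assumption: this is the definition of dynamic range used for the sensor.
    """
    return ratio_percent_of_apo(saturated_ratio, apo_ratio) - 100.0


if __name__ == "__main__":
    # Illustrative ratios only, chosen so the change matches the reported 14.7%.
    apo, saturated = 1.000, 1.147
    print(f"dynamic range: {dynamic_range_percent(saturated, apo):.1f}%")
```

      Because both values are ratios of two emission bands from the same molecule, a change in sensor concentration scales numerator and denominator equally, which is why a modest dynamic range can still be read out reliably.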

      In Figure 5H in-vivo signal changes show large errors and the signal of the positive sample is hardly above error compared to the signal of the control.

      We have removed the in vivo data. Regardless, the comment is incorrect. Statistical analysis confirms that there is no significant change in the control (P = 0.08411), whereas the change for the sample with D-serine was significant to P = 0.00998.

      “H) ECFP/Venus ratio recorded in vivo in control recordings (left panel, baseline recording first, control recording after 10 minutes; paired two-sided Student’s t-test vs. baseline, t(6) = -2.07, P = 0.08411; n = 6 independent experiments) and during D-serine application (right panel, baseline recording first, second recording after D-serine injection, 1 mM; paired two-sided Student’s t-test vs. baseline, t(3) = -5.85, P = 0.00998; n = 4 independent experiments). Values are mean ± s.e.m. throughout. **P < 0.01.”
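
      The quoted P values can be sanity-checked from the reported t statistics and degrees of freedom. The sketch below integrates the Student's t density numerically with Simpson's rule, using only the standard library; the function names are hypothetical, and this is a check on the arithmetic, not a reproduction of the authors' analysis.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value: 1 minus the probability mass in [-|t|, |t|].

    The mass in [0, |t|] is computed with Simpson's rule (steps must be even)
    and doubled, because the density is symmetric about zero.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_mass = total * h / 3
    return 1.0 - 2.0 * half_mass


if __name__ == "__main__":
    # t statistics and degrees of freedom as reported in the quoted legend.
    print(f"control:  p = {two_sided_p(-2.07, 6):.4f}")
    print(f"D-serine: p = {two_sided_p(-5.85, 3):.4f}")
```

      The results should land close to the reported P = 0.08411 and P = 0.00998; small differences are expected because the t statistics in the legend are rounded to two decimals.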

      Figure 5G is unclear. What does the fluorescence image show?

      We have removed the in-vivo data from the manuscript. However, Figure 6 in the original manuscript shows a schematic of how the sensor is applied to the brain for in-vivo experiments (biotin injection, followed by sensor injection and then imaging). The fluorescence image shows the detected Venus fluorescence following pressure loading of the sensor into the brain.

      Work presented in this manuscript that assesses functionality and applicability of the developed sensor in-situ and in-vivo is limited compared to the work showing its design. For example, control experiments showing FRET signal changes of the wild-type ECFP-DalS-Venus construct in comparison to the designed D-SerFS would be helpful to assess the outcome.

      Indeed, the in situ and in vivo work was never the focus of the study, which is already a large paper. To avoid confusion, the in vivo work is now omitted and the in situ work is present to show proof, in principle, that the sensor can be used to image D-serine. We reiterate – this is a protein engineering paper, not a neuroscience paper.

      4) The FRET spectra shown in Supplementary Figure 2, which exemplify the measurement of fluorescence ratios of ECFP/Venus, are confusing. I cannot see a significant change of FRET upon application of ligand. The ratios of the peak fluorescence intensities of ECFP and Venus (scanned from the data shown in Supplementary Figure 2) are the same for apo states and the ligand-saturated states. Instead what happens is that fluorescence emission intensities of both the donor and the acceptor bands are reduced upon application of ligand.

      We thank the reviewer for bringing this to our attention. The spectra were not normalised to account for the effect of dilution when saturating with ligand, giving rise to an observed decrease in emission intensity from both ECFP and Venus. We can also see how the figure is hard to interpret when both variants are displayed on the same axes, so we have separated them in an updated figure shown below and normalised the data as a percentage of the maximum emission intensity from ECFP at 475 nm. This has been changed in the supporting information of an updated manuscript. Hopefully it is now clear that there is a ratiometric change upon addition of ligand.

      Figure 3. Emission spectra (450 – 550 nm) of (A) LSQED and (B) LSQED-T197Y (LSQEDY) upon excitation of ECFP (λexc = 433 nm), normalised to the maximum emission intensity from ECFP (475 nm). For all sensor variants, the FRET efficiency decreases in response to saturation with D-serine (A, B; orange), leading to decreased emission from Venus (530 nm) relative to ECFP (475 nm). When comparing the apo states of LSQED and LSQEDY (A, B; dark green), it can be seen that the T197Y mutation results in a decreased Venus emission (lower FRET efficiency). This suggests a shift in the apo population of the sensor towards the spectral properties of the saturated, closed state and explains the decreased dynamic range of LSQEDY compared to LSQED. Values are mean ± s.e.m. (n = 3).
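
      The normalisation described above can be sketched as follows. The example assumes a spectrum stored as a mapping from wavelength (nm) to emission intensity, and that dilution scales every wavelength by the same factor; the function names and intensities are hypothetical and only illustrate why normalising to the ECFP peak removes the apparent drop in both bands.

```python
from typing import Dict


def normalise_to_donor_peak(spectrum: Dict[int, float], donor_nm: int = 475) -> Dict[int, float]:
    """Rescale a spectrum so that emission at the donor peak equals 100%."""
    peak = spectrum[donor_nm]
    if peak <= 0:
        raise ValueError("donor peak intensity must be positive")
    return {nm: 100.0 * value / peak for nm, value in spectrum.items()}


def donor_acceptor_ratio(spectrum: Dict[int, float], donor_nm: int = 475,
                         acceptor_nm: int = 530) -> float:
    """ECFP/Venus emission ratio (475 nm / 530 nm by default)."""
    return spectrum[donor_nm] / spectrum[acceptor_nm]


if __name__ == "__main__":
    # Hypothetical raw intensities; the second spectrum is the first diluted by 10%.
    raw = {475: 800.0, 530: 1000.0}
    diluted = {nm: 0.9 * value for nm, value in raw.items()}

    # Raw intensities differ, but the normalised spectra and the ratio agree.
    print(normalise_to_donor_peak(raw)[530], normalise_to_donor_peak(diluted)[530])
    print(donor_acceptor_ratio(raw), donor_acceptor_ratio(diluted))
```

      Any difference that remains between apo and ligand-saturated spectra after this normalisation reflects a change in the relative Venus emission, i.e. a ratiometric change, and not a change in concentration.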

      Reviewer #2:

      The authors describe the development and use of a D-Serine sensor based on a periplasmic ligand binding protein (DalS) from Salmonella enterica in conjunction with a FRET readout between enhanced cyan fluorescent protein and Venus fluorescent protein. They rationally identify point mutations in the binding pocket that make the binding protein somewhat more selective for D-serine over glycine and D-alanine. Ligand docking into the binding site, as well as algorithms for increasing the stability, identified further mutants with higher thermostability and higher affinity for D-serine. The combined computational efforts lead to a sensor for D-serine with higher affinity for D-serine (Kd = ~ 7 µM), but also showed affinity for the native D-alanine (Kd = ~ 13 µM) and glycine (Kd = ~40 µM). Molecular simulations were then used to explain how remote mutations identified in the thermostability screen could lead to the observed alteration of ligand affinity. Finally, the D-SerFS was tested in 2P-imaging in hippocampal slices and in anesthetized mice using biotin-streptavidin to anchor exogenously applied purified protein sensor to the brain tissue and pipetting on saturating concentrations of D-serine ligand.

      Although presented as the development of a sensor for biology, this work primarily focuses on the application of existing protein engineering techniques to alter the ligand affinity and specificity of a ligand-binding protein domain. The authors are somewhat successful in improving specificity for the desired ligand, but much context is lacking. For any such engineering effort, the end goals should be laid out as explicitly as possible. What sorts of biological signals do they desire to measure? On what length scale? On what time scale? What is known about the concentrations of the analyte and potential competing factors in the tissue? Since the authors do not demonstrate the imaging of any physiological signals with their sensor and do not discuss in detail the nature of the signals they aim to see, the reader is unable to evaluate what effect (if any) all of their protein engineering work had on their progress toward the goal of imaging D-serine signals in tissue.

      As a paper describing a combination of protein engineering approaches to alter the ligand affinity and specificity of one protein, it is a relatively complete work. In its current form trying to present a new fluorescent biosensor for imaging biology it is strongly lacking. I would suggest the authors rework the story to exclusively focus on the protein engineering or continue to work on the sensor/imaging/etc until they are able to use it to image some biology.

      Additional Major Points:

      1) There is no discussion of why the authors chose to use non-specific chemical labeling of the tissue with NHS-biotin to anchor their sensor vs. genetic techniques to get cell-type specific expression and localization. There is no high-resolution imaging demonstrating that the sensor is localized where they intended.

      We use non-specific chemical labelling for proof-of-concept experiments that show the sensor can respond to changes in D-serine concentration in the extracellular environment of brain tissue. Cell-type specific expression of the sensor is possible based on our previous development of a similar sensor for glycine (Zhang et al., 2018; doi: https://doi.org/10.1038/s41589-018-0108-2) where the sensor was expressed by HEK293 cells and neurons, and targeted to the membrane. However, this is beyond the scope of this manuscript. Figure 5G of the original manuscript shows that the sensor (identified by Venus fluorescence) is localized to the area where D-serFS is pressure-loaded into the brain.

      2) Why does the fluorescence of both the CFP and the YFP decrease upon addition of ligand (see e.g. Supplementary Figure 2)? Were these samples at the same concentration? Is this really a FRET sensor or more of an intensiometric sensor? Is this also true with 2P excitation? How does the Venus fluorescence change when Venus is excited directly? Perhaps fluorescence lifetime measurements could help inform what is happening.

      Please see response to major comments from reviewer #1 and Figure 3. We hope this clarifies that the sensor is ratiometric. The sensor behaves similarly under two-photon excitation (2PE) as shown in Figure 5A.

      3) How reproducible are the spectral differences between LSQED and LSQED-T197Y? Only one trace for each is shown in Supplementary Figure 2 and the differences are very small, but the authors use these data to draw conclusions about the protein open-closed equilibrium.

      We have updated this to show data points representing the mean ± s.e.m (n = 3).

      4) The first three mutations described are arrived upon by aligning DalS (which is more specific for D-Ala) with the NMDA receptor (which binds D-Ser). The authors then mutate two of the ligand pocket positions of DalS to the same amino acid found in NMDAR, but mutate the third position to glutamine instead of valine. I really can't understand why they don't even test Y148V if their goal is a sensor that hopefully detects D-Ser similar to the native NMDAR. I'm sure most readers will have the same confusion.

      Please see response to major comments from reviewer #1. Additionally, while the NR1 binding domain of the NMDAR was used as a structural guide for rational design of the DalS binding site, the high affinity of the NMDAR for both D-serine and glycine was not desirable in a D-serine-specific sensor.

    1. Author Response:

      Reviewer #1:

      In multicellular eukaryotes, reproduction usually proceeds through a single-cell stage via propagule cells (germ cells) of some kind, like the zygotes resulting from gamete fusion in animals and flowering plants. In such organisms, inheritance of nuclear genomes from one generation to the next is a relatively straightforward problem when compared to that of inheriting non-nuclear genomes (e.g. mitochondrial or chloroplast genomes), which often exist at very high copy numbers that are not always the same throughout the life cycle of the reproductive cell lineage that gives rise to the gametes. This complex problem is nevertheless important in evolution because allelic changes in these non-nuclear genomes can impact the phenotypes, and therefore potentially the fitness, of the cells, tissues, and organisms that house them.

      In animals, the observations that (a) the gamete precursor cells (primordial germ cells = PGCs) in embryos, or postembryonic gamete precursors (oogonia that have not yet become mature oocytes) typically have far fewer copies of mitochondria than the oocyte that will give rise to the zygote and (b) mitochondrial genome allelic variance is typically higher in embryonic PGCs than in post-embryonic germ cells, have led to the acceptance that some kind of regulated mitochondrial culling occurs at some point between initial PGC specification and the end of gametogenesis. What is less clear is exactly when along this germ cell life cycle trajectory this culling takes place, what the specific evolutionary, cellular or molecular mechanisms are that regulate it, and which mechanism(s) best explain the observed pattern of inherited mitochondrial genomes in populations.

      This manuscript addresses these problems with the approach of developing a computational evolutionary model to see how well different assumptions about when and how mitochondrial culling takes place, are able to predict the observed distribution of mitochondrial mutations in some human populations for which data are available. The authors test the fit of three hypotheses to these data: (1) imposing a bottleneck at the PGC stage by limiting the number (and variance) of mitochondria at PGC stages; (2) selection against oogonia that have "bad" mitochondria; (3) preferential accumulation of "good" mitochondria, pooled from multiple oogonia, into those oogonia that will go on to complete oogenesis. They find that the third model fits the data better than the first two. They then compare these hypotheses in a multi-generational model. They report that the third model fit the data better over a wider range of selective pressures, than the first two, although all three models have some explanatory power within the range of mutation rates explored.

      This problem is an important one and the modeling approach could add important complementary perspective to existing empirical data, or suggest new avenues of experimentation for the future. The authors have tried to extract much biological data from the empirical data to inform their parameter and boundary choices for the model, and explained quite clearly their choices, which is an excellent approach. However, a weakness of the study is that the parameters that inform the model, and many of the assumptions that underlie the logic they use to interpret their results, are drawn from a wide range of different biological systems, but the model aims to test the fit of specific hypotheses to human data only. There are many differences in every aspect of germ line segregation, PGC development, oogenesis, and mitochondrial behaviour across animals, and which aspects of these things have strong evidence for universal conservation remains unclear. Nevertheless, in this MS the authors make broad claims about universality of conclusions in some cases, and in others appear to be restricting their conclusions to explaining human data only. A second area for improvement is that some well-documented observations on mitochondrial and germ line biology that are relevant to interpreting their observations, are not considered or claimed to be absent or irrelevant (e.g. paternal mitochondrial inheritance, germ lineage separation in flowering plants), and the existing empirical literature providing evidence for these things in at least some systems is not discussed at all, not even to explain why the authors deem this evidence unimportant for their model or for the conclusions they draw from it.

      We thank the referee for their fair summary and comment on our submission. Whilst we believe that our modelling approach should apply to many systems, it is perhaps better to limit our main claims to the systems where there is a high density of information – in particular mammalian systems, humans and mice. We have rewritten the MS in this light, and restrict our comment on non-mammalian systems to the Discussion. The details of particular systems are well worth further investigation to test the generality of our conclusions.

      Reviewer #2:

      Colnaghi, Pomiankowski and Lane develop models to investigate the effects of population genetic forces on mtDNA variation within germline cells to address unanswered questions about the selective pressures on mitochondrial genomes. The models are based on updated information about germline development in mammals, including humans. Realistic parameters of mutation, selection and sampling drift are applied to the demography of cells from stem cell through mature oocytes. Three selective processes are considered: at the level of the individual (zygote), the cell, and the mitochondria. The results indicate that selection among mitochondria is the most likely process to match empirical, clinical data for mitochondrial mutation loads. This is based on modeling the mixing of mitochondria following cytoplasmic transfer of cellular contents among individual oogonia in germline cysts into the emergent primary oocyte. The proportion of mutant mtDNAs, or the strength of selection on mutant vs. wild type mtDNAs, proved to have the most impact on model outcomes and correspondence to clinical data.

      The paper is clearly written and addresses controversies that have emerged in earlier studies. Notably, the results suggest that the bottleneck effect on the mtDNA population during germline development has less of an effect than previously thought on the selective landscape that may permit mtDNA to persist despite the consequence of Muller's ratchet decay. A pleasant aspect of the paper is its clear presentation of quantitative approaches used in both the computational and evolutionary models presented. The paper presents an advance of interest to a general readership.

      We thank the referee for this summary.

    1. Author Response:

      Reviewer #1:

      The manuscript "Two different cell-cycle processes determine the timing of cell division in Escherichia coli" by Colin et al. presents an experimental approach to investigate the role of two governing cell-cycle processes, namely, DNA replication-segregation and cell division cycle, in size regulation. The authors tackle the problem by first decoupling these two cell-cycle processes via sub-lethal dosages of A22, and then analyze the role of each process in the timing of cell division. Modern imaging and analysis techniques are used in this work to monitor cell division with single-cell resolution and chromosome replication with sub-cellular resolution. The large pool of data allows the authors to perform correlation analysis of cell-size and the cell cycle parameters, which led to the conclusion that the two processes have "balanced contributions in non-perturbed cells."

      The question studied in this manuscript is important and timely. The investigation of the two concurrent processes chosen by the authors is perhaps the right direction which may eventually lead to a complete understanding of the E. coli cell-cycle and size regulation. The high-resolution imaging and analysis accomplished in this work is also commendable. There is, however, a major concern about this manuscript, which is that the entire conclusion is based on the cell-cycle and size perturbations by A22. The caveat of the A22 perturbations is that an aberrant cell shape could affect both of the cellular processes simultaneously. Even though the C-period and initiation size are largely unchanged, a possible, but unknown, cross-talk between the two processes may be affected by A22. Therefore, additional evidence is necessary to show whether the two processes independently determine cell division.

      We agree that A22 treatment could possibly affect DNA replication or organization, e.g., indirectly through an effect of cell width on DNA organization. It would thus indeed be desirable to confirm our findings based on alternative perturbations. At the same time, our experiments clearly demonstrate that cell sizes at replication initiation and division are decreasingly correlated with increasing A22 concentration, which suggests that a process different from DNA replication is responsible for the timing of division.

      Additionally, DNA replication could depend on cell division, which could possibly complicate the relationship between replication and division. We have now addressed the possibility of an influence of division on replication initiation in the Discussion, where we write ‘The concurrent-cycles framework assumes that replication initiation is independent of cell division or cell size at birth, [...]. However, we note that this is not the only possibility, and DNA replication may not be entirely independent of cell division. A complementary hypothesis \citep{Kleckner2018} posits a possible (additional or complementary) connection of initiation to the preceding division event. To test this hypothesis one could perturb specific division processes by titrating components involved in Z-ring assembly (e.g., titrating FtsZ \citep{Zheng2016}).’

      Reviewer #2:

      This is an interesting paper which makes important contributions to an interesting and highly controversial topic: how does an E. coli cell decide when to divide?

      As the authors describe in clear and careful detail, two main camps have argued (often dogmatically) for "single process" models in which division is either a direct, downstream consequence of replication initiation (which is the regulated step) or of effects that act directly on division (irrespective of replication and, more generally, the chromosome cycle). The authors of this paper have, instead, proposed that both types of effects are important, in different proportions according to the circumstances. They refer to this idea as a "concurrent cycles" hypothesis. In previous work they have presented arguments and data which they interpret as being incompatible with any single process model and consistent with their alternative hypothesis.

      This work now investigates the consequences of treatment with A22, a drug which inhibits MreB, with the result that it increases cell width and, concomitantly, increases the length of time between completion of a given round of DNA replication and the immediately ensuing cell division (an interval known as the "D period"). The idea to analyze this situation was motivated by the authors' previous hypothesis: by the concurrent cycles idea, increasing the length of the D-period should prolong the replication-independent inter-division process such that it becomes rate limiting in determining the timing of division (relative to the replication-dependent process).

      The data presented confirm the authors' expectation. They first show that progressively increasing the amount of A22 does not (dramatically) alter either: (i) the basic "adder" behavior in which a fixed amount of cell length is added irrespective of the length of the cell at birth or (ii) the finding that a fixed amount of cell length is added per replication origin during the period from one round of replication initiation to the next, which is consistent with (and generally considered to be supportive of) a role for a replication-dependent process.

      However, they also discover an interesting additional effect by examining the amount of cell length added (per origin) during the entire period comprising replication plus the immediately ensuing division ("C+D"). In the unperturbed case, cells that are longer at the time of initiation of replication also add more length during the ensuing (C+D) period. In contrast, in the presence of increasing amounts of A22, this effect is progressively reversed such that, finally, at high drug levels, cells which are longer (per origin) at the time of initiation of replication add much less length during the ensuing (C+D) period. Since the length of the C period is essentially constant in all conditions, the relevant effect is the variation in the length of the D period. And since the observed effect becomes more and more prominent with increasing A22 concentration, variation in the D period dominates more and more as the length of that period gets longer and longer. The authors interpret this effect to mean that, with increasing D-period length, division timing is decreasingly dependent on replication initiation. They go on to infer that "with increasing average D period, a process different from DNA replication is likely increasingly responsible for division control". This is a sensible, relatively formal restatement of the finding. This statement allows for diverse specific interpretations. The authors focus on one possible interpretation: they show that their previously proposed concurrent cycles hypothesis can quantitatively explain these data. In essence, given a replication-independent and a replication-dependent process, the observed findings are explained by an increased contribution of the replication-independent process. This scenario also does a better job of explaining the presented data, as well as other findings, than other recent "single process" models, for reasons that are discussed in straightforward detail in the Discussion. 
      The authors also do an excellent job of laying out the assumptions upon which their model (and other existing models) are based, thus laying open the possibility for future studies to consider other possible scenarios.

      This work is important for four reasons. First, it provides interesting new data which must be accommodated by any synthetic explanation for cell division control. Second, it makes it abundantly clear that the validity of any proposed single process model remains to be further substantiated. Third, it suggests an interesting alternative model which can accommodate a diversity of data, including that presented in the current work, and which has the potentially attractive feature of combining the two existing single-process models. Fourth, and perhaps most importantly, the authors' discussion of the available data in this field is clear, thoughtful and thought-provoking and leaves open the possibility of some as-yet unimagined mechanism. Overall, this work provides an important counterpoint to other published work and is a very valuable contribution to thinking and discussion in this field.

      [It can also be noted specifically that this work provides an important counterpoint to the model proposed in a previous eLife paper on this topic by Witz et al., 2019 (eLife 2019;8:e48063 doi: 10.7554/eLife.48063).]

      We thank the reviewer for her careful assessment and appreciation of our work.

      Reviewer #3:

      Colin, Micali et al. investigated division and replication in slow-growing E. coli cells over cell cycles at the single-cell level while perturbing cell dimensions. They found that the time between replication termination and division increased upon perturbing cell width, as recently reported, and that chromosome replication became decreasingly limiting for cell division. These results support the 'concurrent-processes model' previously proposed by some of the authors.

      1) Cell length can be used to represent cell size (adder) only if the cell width remains constant. In the current form of the manuscript, it is unknown whether the cell width varies significantly at the single-cell level under A22 treatment (e.g., 1µg/ml A22). If it does, cell volume might not correlate well with cell length, and the interpretation of Figure 3 would be weakened.

      We now demonstrate in the new Figure 2–S2 that the coefficient of variation of cell width does not increase with A22 concentration (neither in snapshots from cells grown in liquid culture nor in the mother machine):

      *Figure: Variation of width at the single-cell level. Coefficient of variation of cell width as a function of mean cell width. Squares and triangles represent measurements done on cells grown in mother machine or in liquid culture respectively. Blue color represents wild-type cells. Grey color represents cells treated with different amounts of A22.*
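
      For readers unfamiliar with the statistic, the sketch below shows how a coefficient of variation (standard deviation divided by mean) can be computed from single-cell widths. It is only an illustration: the width values are hypothetical placeholders, not measurements from this study, and the function name is our own.

```python
import statistics


def coefficient_of_variation(widths):
    """Return the sample standard deviation divided by the mean."""
    mean_width = statistics.mean(widths)
    sd_width = statistics.stdev(widths)  # sample (n - 1) standard deviation
    return sd_width / mean_width


# Hypothetical widths in micrometres (placeholders, not data from the study).
untreated_widths = [0.95, 1.00, 1.05]
# The same distribution scaled by 1.5, mimicking a wider population with the
# same relative cell-to-cell spread.
treated_widths = [w * 1.5 for w in untreated_widths]

cv_untreated = coefficient_of_variation(untreated_widths)
cv_treated = coefficient_of_variation(treated_widths)
```

      Because the coefficient of variation is unchanged when all widths are rescaled by the same factor, it separates a shift in mean width from a change in relative cell-to-cell variability.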

      We also reference this figure in the main text, writing: ‘Increasing A22 concentration leads to increasing steady-state cell width both in batch culture and in the mother machine (Figure \ref{fig2}B), without affecting cell-to-cell width fluctuations (Figure \ref{CV_width}), and without affecting doubling time (Figure \ref{fig2}C) or single-cell growth rate (Figure \ref{SI_Fig1}).’

      2) The negative value of 𝜁C+D in Figure 3F (treated group) indicates that the division length is negatively correlated with the cell length at replication initiation. It is not obvious that this can rule out the possible contribution of DNA replication/segregation in offsetting the length difference at initiation and thus contribute to cell division. Since Figure 3F is the key observation to validate the model, more explanations are required to help readers understand how a negative 𝜁C+D can lead to a conclusion that a process different from DNA replication is likely responsible for division control with A22-treatment.

      The negative value of $\zeta_{\rm CD}$ actually corresponds to a lack of correlation between division size and size at initiation, typically predicted by the models where replication is never limiting for cell division (Micali et al. 2018, Si et al. 2019). We have commented more explicitly on this point in the text, writing: ‘Note that the negative value of $\zeta_{\rm CD}$ corresponds to a lack of correlation between division size and size at initiation (Figure \ref{fig3}G), typically predicted by the models where replication is never limiting for cell division~\cite{Micali2018,Micali2018b,Si2019}.’

      3) As an important input for the model, the QC+D' is assumed to be equal to QC+D in unperturbed conditions and to remain constant regardless of the A22 concentration (Lines 548-554). This assumption is reasonable if the minimum time interval for segregation (D') is independent of changes in cell width. But how D' and QC+D' change with cell width is unknown. Earlier molecular studies revealed that the polymerization of MreB affects the activity of topoisomerase IV, an enzyme that mediates the dimerization of sister chromosomes, which implies that changing cell width may affect D'. Given the importance of QC+D' to the model, it is vital for the authors to make this assumption clear in the main text and explain why such an assumption is reasonable.

      QCD’ (related to average growth in the CD’ period) is a parameter that we cannot measure, or bypass in the model. We have made this assumption more explicit in the text. While this question deserves further investigation in future studies, we know that D’ cannot increase too strongly with width, because otherwise it would leave replication/segregation limiting for division under A22 perturbations, contrary to our observation. This is the main reason to assume D’ constant in the model. A posteriori we can say that the loss of correlation between size at division and size at initiation observed under A22 treatment is in line with the hypothesis that D’ does not increase enough for the segregation process to interfere with cell division. We now write: ‘Note that neither the minimum completion time C+D' nor the coupling parameter $\zeta_{CD'}$ can be measured experimentally, or bypassed in the model. In principle these parameters could change under A22 perturbations, since MreB affects the activity of topoisomerase IV \citep{madabhushi2009actin,kruse2003dysfunctional}, an enzyme that mediates the dimerization of sister chromosomes. However, constancy of $\zeta_{CD'}$ is supported by the constancy of the C period, and the minimum D' period cannot increase too strongly with width in the model, because otherwise it would render replication/segregation limiting for division under A22 perturbations, contrary to our experimental observation. Hence, for simplicity, we assumed $\zeta_{CD'}$ and the D' period to stay constant.’

    1. Author Response

      We are grateful for the thorough and thoughtful comments provided by the reviewers, and we appreciate their support for the design and implications of this study. We have addressed the major points raised by the reviewers as follows.

      Major Concerns:

      1) Limitations of extrapolation to human health and disease.

      From Reviewer 2: Though I found the work largely beyond critique technically, I would have appreciated additional discussion of the limitations of the use of a captive non-human primate to model human dietary response.

      From Reviewer 3: However, my major concern is the suitability of these results to explain human relevance and how far they can address the actual evolutionary significance. I think they should tone down a little. For example, is there really any strong reason to assume that macaques will mimic dietary responses in humans? I appreciate the fundamental importance of macaque-specific responses, but I am unclear how captive primates can model human effects: how do the authors factor in their (obvious?) fundamental differences between different immune response profiles activated against similar cues and standing microbiome, warranting divergent interactions with the said dietary manipulations? I think these are caveats that need to be carefully discussed to avoid building over-expectations among readers.

      From Reviewer 3: Could there be more discussion on the relevance of differentially expressed macaque genes in humans?

      We appreciate the concern regarding possible overinterpretation of results. There is an extensive body of literature demonstrating the utility of the cynomolgus macaque model to explore influences of diet on numerous phenotypes including atherosclerosis and cardiovascular disease, bone metabolism, breast and uterine biology, and other phenotypes (Adams et al., 1997; Clarkson et al., 2004, 2013; Cline et al., 2001; Cline & Wood, 2006; Haberthur et al., 2010; Lees et al., 1998; Mikkola et al., 2004; Mikkola & Clarkson, 2006; Naftolin et al., 2004; Nagpal, Shively, et al., 2018; Nagpal, Wang, et al., 2018; Register, 2009; Register et al., 2003; Shively & Clarkson, 2009; Sophonsritsuk et al., 2013; Walker et al., 2008; Wood et al., 2007). The cynomolgus model was remarkably accurate in predicting effects of hormone therapies on both cardiovascular disease and breast cancer later demonstrated in the very large Women’s Health Initiative (Adams et al., 1997; Clarkson et al., 2013; Naftolin et al., 2004; Shively & Clarkson, 2009; Wood et al., 2007). Cynomolgus macaque responses to other therapies (tamoxifen, selective estrogen receptor modulators, blood pressure medications, etc.) also have shown great similarities to those in humans (Cline et al., 2001). We have added additional text to the Abstract (lines 51-52), Introduction (lines 136-141), and Discussion (lines 531-542) to situate the current work in the extensive literature that uses cynomolgus macaques as a model to understand human health. We have also included discussion regarding the limitations of extrapolating these results to humans in lines 543-545 of the Discussion.

      We also tested the overlap of differential gene expression induced by the Western diet with genes implicated in human complex traits (Zhang et al., 2020). Genes implicated in numerous traits associated with cardiometabolic health were enriched in Western genes, while no traits were enriched in Mediterranean genes. We describe these findings in lines 206-215 of the Results section and in Figure 1—figure supplement 1, which depicts traits relevant to human health and disease identified by previous groups where gene expression profiles overlapped with the “Western genes” in the current study. Lines 668-672 of the Materials and Methods detail the statistical approach used.

      2) Limitations of this experimental design to test the evolutionary mismatch hypothesis.

      From Reviewer 2: My worry is that macaques are so ill-adapted to the Western human diet that the behavioral and inflammation differences seen are explained by this macaque-Western diet mismatch, which dwarfs the human-Western diet mismatch that likely nonetheless exists. This concern can be partially mitigated by careful discussion of this study limitation.

      From Reviewer 2: One critique of dietary interventions that attempt to correct the evolutionary mismatch (which would be useful to address when discussing human-macaque differences) is that human evolution continuing to the present day has been marked by putative selection regime changes associated with multiple major dietary shifts, including meat eating and those arising from cooking and domestication of plants and animals. Such selection may have differentiated humans from macaques in key ways that influence macaque suitability as a dietary model.

      From Reviewer 2: My recommendations for strengthening the work are minor, besides those outlined above to include caveats concerning the differences between macaques and humans that will hopefully prevent lay readers from over-interpreting the results. Specifically, species-level differences which warrant mention include gross differences in "natural" diet between the species, as well as known recent selection on diet-related genes in humans (reviewed in, e.g., Luca et al. 2010; doi:10.1146/annurev-nutr-080508-141048) and gut microbiome differences between the species (e.g., Chen et al. 2018; doi:10.1038/s41598-018-33950-6).

      From Reviewer 2: A simple analysis that begins to address this point analytically would be to compare what results exist for humans (e.g., Camargo et al, 2012; doi:10.1017/S0007114511005812) to those of your study.

      From Reviewer 2: Additionally, one could check whether the DE genes you identify are known to be selected in humans.

      We appreciate the suggestion to strengthen our discussion of the macaque model of human health. As with early hunter-gatherer humans, macaques are omnivorous in the wild, eating a variety of plants and animals. In addition, the cynomolgus macaque often co-exists with human populations, and in that respect may have co-evolved in many ways. Furthermore, cynomolgus macaques have been used in studies of dietary influences on chronic prevalent human disease for 50 years (Malinow et al., 1972), and nearly 700 papers in a PubMed literature search support the idea that cynomolgus responses to diet are remarkably similar to those of humans in all systems studied. Some of these studies are identified above. With respect to the microbiome, previous work by others has demonstrated that the gut microbiome of omnivorous nonhuman primates is similar to that of humans living a modern lifestyle (Ley et al., 2008), and we previously reported similarities in patterns of microbiome responses to Mediterranean vs. Western diets between humans and NHPs in the present study (Nagpal, Shively, et al., 2018). We have added discussion of the above and note limitations of extrapolation to humans due to species-level differences in natural diets and the role that selection may play in responses of humans to Western or Mediterranean dietary patterns (lines 543-545). Similarities between humans in DE genes are noted in responses above. In addition, we already had noted that our studies complement and extend the findings of Camargo (line 399), and we added more detail that we found similar effects of diet on expression of IL6 and NF-kB pathway members (line 397).

      3) Lack of control group maintained on a standard chow diet.

      From Reviewer 2: In future studies, it would be useful to have samples from proper control monkeys fed a standard primate diet.

      From Reviewer 3: Also, this is slightly unfortunate because there is no full control treatment where macaques are maintained in their regular diet (i.e., standard monkey chow) and then compared with groups switched to the Mediterranean vs western diet to estimate the relative deviations from their expected physiological processes and behavioural traits.

      We appreciate the concern regarding the lack of a standard monkey chow diet control group. All monkeys ate chow during the baseline phase and were thoroughly phenotyped, exhibiting minimal differences in monocyte gene expression profiles between groups subsequently assigned to the two diets, which involved stratified randomization based on key baseline characteristics while consuming the same diet. Importantly, monkey chow is unlike any historic or current human or nonhuman primate diet as is apparent in Table 1. It is quite low in fat, and rich in soy protein and isoflavones, which are known to alter physiology and immune system function. Therefore, parallel assessments of health measures in monkeys consuming chow long term do not provide data relevant to diet effects on human health. We have added discussion of the strengths of the study (lines 136-141, 531-542), which was designed in order to be able to draw causal inference about the diet manipulation, and we acknowledge limitations to assess directionality of changes (i.e. which experimental diet is driving a particular observed difference) in lines 545-553.

    1. Author Response:

      Joint Public Review:

      The Mismatch Repair (MMR) pathway removes mismatched bases from newly synthesized DNA strands. Strand discrimination is driven by single strand breaks in the daughter strands. MMR can also recognize some adducts formed by methylating chemotherapeutics, such as temozolomide (TMZ), the standard treatment for glioblastoma. TMZ, and the mimic N-methyl-N-nitrosourea (MNU), methylate guanine at N7 (7mG) and adenine at N3 (3mA). These account for 80-90% of total adducts and are repaired by the Base Excision Repair (BER) pathway. However, they also form 8-9% O6-methylguanine (O6-mG), which is cytotoxic and mutagenic and not repaired by BER. O6-mG can pair with T during replication giving rise to O6-mG:T lesions. This mismatch is recognized by MMR but provokes a "futile cycle" of repair in which the T, since it is in the daughter strand, is removed, after which repair synthesis restores the O6-mG:T. It has been proposed that, in the subsequent S phase, replication across gaps generated during the futile cycles results in toxic double strand breaks (DSBs). The key feature of this model is the requirement for two cycles of replication, the first to generate the provocative O6-mG:T mismatch, the second to produce the breaks. Versions of this scenario have been the primary concern of the field for many years.

      The submission from Fuchs and colleagues presents an additional and non-conventional model for MNU/TMZ toxicity. Their experimental approach departs from the requirement for replication and emphasizes the initial O6-mG:C lesion rather than O6-mG:T. They follow repair synthesis in plasmids treated with either MNU or methyl methane sulfonate (MMS) which produces high levels of 7mG and 3mA, but low levels of O6-mG:C. The plasmids were incubated in Xenopus egg extracts that support repair but not replication. They found that MMR proteins bound the MNU treated plasmid but not the MMS treated plasmid and that there was greater repair synthesis in the plasmid treated with MNU than with MMS. They also observed that the BER pathway was important for repair synthesis of the MNU treated plasmid. Experiments with a plasmid carrying a single defined O6-mG:C with or without MMS treatment supported this conclusion. Based on these and other observations they argue that BER of the 7mG and 3mA adducts introduced nicks that were exploited by MMR to drive gap formation and repair synthesis at sites of O6-mG:C. DSBs were formed in the plasmids undergoing both BER (against the N methyl adducts) and MMR against O6-mG:C. Their results support a model in which BER nicking at sites of N methyl adducts provides an enhanced opportunity for MMR of the O6-mG:C lesions. Extended exonuclease digestion by MMR reaches sites undergoing BER on the other strand thus generating DSBs.

      Although there is an extensive literature on replication-dependent production and processing of the MNU/TMZ O6-mG:T lesion, this report is novel in the attention to replication-independent repair of the primary mismatch product. Chemotherapy has typically been premised on targeting replicating cells. However, the majority of cells in a glioblastoma tumor are not proliferating, and insight into attacking non-dividing cells might be very useful in treating this almost always fatal tumor. The authors' data support their model, although some of the implications of their conclusions could be more fully developed. Additional data on two aspects would strengthen the paper.

      The first reflects the considerable interest in manipulations of DNA repair pathways that would enhance the toxicity towards tumors of DNA reactive chemotherapy drugs. The authors propose that the introduction of nicks during the early steps of BER are responsible for the enhanced efficacy of MMR in generating the DSBs. However, the later steps of BER act to reverse the nicks. The extract system would appear to lend itself to the identification of the later steps in the BER pathway which, if inhibited, would increase DSB formation by MMR mediated gap formation on one strand past nicks on the other.

      The second would extend the approach beyond the extracts. The authors have effectively exploited this system to identify key proteins responding to model substrates and address certain mechanistic questions with those substrates. However, the extracts cannot recapitulate all the features of repair/toxicity of MNU/TMZ adducts in the chromatin environment of the human genome. Although the authors allude to future cell-based assays, the paper would benefit by an initial test of the new model in a live cell system.

      The authors should also consider an apparent discrepancy with earlier work. Figure 1 describes the recovery of MMR proteins bound to the plasmid treated with MNU. This treatment would yield O6-mG:C in addition to the guanine and adenine N-alkylation products. Several years ago the Hsieh lab found that purified MutS alpha failed to bind O6-mG:C but recognized O6-mG:T (Mol Cell 22, 501, 2006). However, in this submission the authors report binding of MutS alpha to the plasmid with O6-mG:C. Current models suggest that mismatch binding by MutS alpha initiates the repair process (see Ortega, Cell Res. 31, 542, 2021). In the light of the report from the Hsieh lab the authors' results would seem to imply that something in the extract in addition to MutS alpha is required for that binding. The recognition of O6-mG:C is central to their model, and it would be useful for them to discuss how they reconcile their results with those of the Hsieh lab. In addition, there is a discrepancy with an earlier publication (Olivera Harris et al. 2015, DNA Repair) about the effectiveness of the MGMT inhibitor Patrin-2 in Xenopus extracts that should be reconciled.

      Indeed, in the Hsieh paper, purified MutSα is shown not to bind O6mG:C pairs. Our experiments involve extracts containing many proteins and there is probably synergy between MutSα and MutLα to achieve full MMR (as supported by the 2021 Ortega paper). Moreover, activation of MMR by a single O6mG:C lesion has been reported previously by the Modrich group as referenced in our paper (p7) (Duckett et al., 1999).

  3. Jun 2021
    1. Author Response:

      Reviewer #1 (Public Review):

      The physical principles underlying oligomerization of GPCRs are not well understood. Here, authors focused on oligomerization of A2AR. They found that oligomerization of A2AR is mediated by the intrinsically disordered, extramembraneous C-terminal tail. Using experiment and MD simulation, they mapped the regions that are responsible for oligomerization and dissected the driving forces in oligomerization.

      This is a nice piece of work that applies fundamental physical principles to the understanding of an important biological problem. It is a significant finding that oligomerization of A2AR is mediated by multiple weak interactions that are "tunable" by environmental factors. It is also interesting that solute-induced, solvent-mediated "depletion interactions" can be a key driving force in membrane protein-protein interactions.

      Although this work is potentially a significant contribution to the fields of GPCRs and molecular biophysics of membrane proteins in general, there are several concerns that would need to be addressed to strengthen the conclusions.

      1) How reasonably would the results obtained in the micellar environment be translated into the phenomenon in the cell membranes?

      1a) Here authors measured oligomerization of A2AR in detergent micelles, not in the bilayer or cellular context. Although the cell membranes would provide another layer of complexity, the hydrophobic properties and electrostatics of the negatively charged membrane surface may cooperate or compete with the interactions mediated by the C-terminal tail, especially if the oligomerization is mediated by multiple weak interactions.

      The translatability of properties of membrane proteins in detergent micelles to the cellular context is a valid concern. However, this shortcoming applies to all biophysical studies of membrane proteins in non-native environments. Even for membrane proteins reconstituted in liposomes, the question arises whether the artificial lipid composition that differs from that in the human plasma membrane would alter protein properties, especially as surface charges and cholesterol content can impact membrane protein dynamics, association, and stability. In that sense, this question cannot be answered satisfyingly, especially for GPCRs that are notoriously difficult to isolate. However, we can offer some perspectives. The propensity for membrane proteins to associate and oligomerize, if anything, is greater in lipid bilayers compared to that in detergent micelles, while detergent micelles can effectively solubilize membrane protein monomers (Popot and Engelman, Biochem 1990, 29 (17), 4031–4037). Hence, the findings that A2AR readily oligomerizes in detergent micelles and that the degree of oligomerization can be systematically tuned by the C-terminal length of A2AR in the same micellar system suggest that inter-A2AR interactions are modulating receptor oligomerization; we speculate that A2AR oligomers will be present or be enhanced in the lipid bilayer environment. In fact, in the cellular context, it has been shown that A2AR assembles into homodimers at the cell surface in transfected HEK293 cells (Canals et al, J Neurochem 2004, 88, 726–734) and into higher-order oligomers at the plasma membrane in Cath.A differentiated neuronal cells (Vidi et al, FEBS Lett 2008, 582, 3985–3990). Furthermore, C-terminally truncated A2AR has been demonstrated to show no protein aggregation or clustering on the cell surface, a process otherwise observed in the WT form (Burgueno et al, J Biol Chem 2003, 278 (39), 37545–37552). These results provide the research community with a valid starting point to discover factors that control oligomerization of A2AR in the cellular context.

      1b) Related to the point above (1a), I wonder if MD simulation could provide an insight into the role of the lipid bilayer in the inter- or intra-molecular interactions involving the tail. Although the neutral POPC bilayer was employed in the simulation, the tail-membrane interaction may affect oligomerization since the tail is intrinsically disordered and possesses a significant portion of nonpolar residues (Fig. S4).

      The reviewer brings up a valid point about the ability for MD simulations to provide insights into the role of membrane-protein interactions. In response to the reviewer, we performed additional analysis focusing on the interactions of the C-terminus with the lipid bilayer. Overall, as the C-terminus is extended, there is a decrease in its interaction with the cytoplasmic leaflet of the membrane (left figure below). More specifically, we find that the C-terminal segment associated with helix 8 (residues 291 to 314) interacts tightly with the membrane, while the rest of the C-terminus (an intrinsically disordered segment) more weakly interacts with the membrane, regardless of truncation (right figure below). As the C-terminus is extended, the inherent conformational flexibility leads to a decrease in the interactions between the protein and the bilayer. We also observe that shorter stretches of the disordered segment do have the ability to interact more closely with the membrane. While these portions include charged residues that can participate in formation of the dimer interface, no general trends are observed. We therefore cannot draw any conclusions regarding the role of C-terminal-membrane interactions on the dimerization of A2AR. What we do know is that the MD simulations presented here should be considered a model study that reveals that the charged and disordered C-terminus of A2AR can account for oligomerization via multiple and weak inter-protomer contacts.

      MD simulations showing (Left) average distance of all C-terminal residues and (right) average per-residue distance from the cytoplasmic membrane of the lipid bilayer.

      2) Ensuring that the oligomer distributions are thermodynamic products.

      Since authors interpret the SEC results on the basis of thermodynamic concepts (driving forces, depletion interactions, etc.), it would be important to verify that the distribution of different oligomeric states is the outcome of the thermodynamic control. There is a possibility that the distribution is the outcome of the "kinetic trapping" during detergent solubilization.

      This is an important question. As we have shown in the manuscript, the A2AR dimer level was found to be reduced in the presence of TCEP (Figure 2B), suggesting that disulfide linkages have a role in facilitating A2AR oligomerization. However, disulfide cross-linking reaction cannot be the sole driving force of A2AR oligomerization because (1) a significant population of A2AR dimer remained resistant to TCEP (Figure 2B), (2) A2AR oligomer levels decreased progressively with the shortening of the C-terminus (Figure 3), and (3) A2AR oligomerization is driven by depletion interactions enhanced with increasing ionic strength (Figure 5).

      To answer whether A2AR oligomer is a thermodynamic or kinetic product, we tested the stability and reversibility of the A2AR monomer and dimer/oligomer population. We used SEC to separate these populations of both the A2AR-WT and A2AR-Q372ΔC variants, then performed a second round of SEC to observe their repopulation, if any. The results are summarized in the figure below, which we will include in the revised manuscript as Figure 5-figure supplement 1.

      We find that the SEC-separated monomers repopulate measurably into dimer/oligomer, with the total oligomer level after redistribution comparable with that of the initial samples for both A2AR-WT (initial: 2.87; redistributed: 1.60) and A2AR-Q372ΔC (initial: 1.49; redistributed: 1.40) (Figure 5-figure supplement 1A). This observation indicates that A2AR oligomer is a thermodynamic product with a lower free energy compared with that of the monomer. This is consistent with the results we have shown in the manuscript that the oligomer levels of A2AR-WT are consistent (1.34–2.87; Table S1) and that A2AR oligomerization can be modulated with ionic strength via depletion interactions (Figure 5).

      Figure S5. The dimer/oligomerization of A2AR is a thermodynamic process where the dimer and HMW oligomer once formed are kinetically trapped. (A) SEC chromatograms of the consecutive rounds of SEC performed on A2AR-WT and Q372ΔC. The first rounds of SEC are to separate the dimer/oligomer population and the monomer population, while the second rounds of SEC are performed on these SEC-separated populations to assess their stability and reversibility. The total oligomer level is expressed relative to the monomeric population in arbitrary units. (B) Energy diagram depicting A2AR oligomerization progress. The monomer needs to overcome an activation barrier (EA), driven by depletion interactions, to form the dimer/oligomer. Once formed, the dimer/oligomer populations are kinetically trapped by disulfide linkages.

      Interestingly, the SEC-separated dimer/oligomer populations do not repopulate to form monomers (Figure 5-figure supplement 1). This observation is, again, consistent with a published study of ours on A2AR dimers (Schonenbach et al, FEBS Lett 2016, 590, 3295–3306). This observation furthermore indicates that once the oligomers are formed, some are kinetically trapped and thus cannot redistribute into monomers.

      We believe that disulfide linkages are likely candidates that kinetically stabilize A2AR oligomers, as demonstrated by their redistribution into monomers only in the presence of a reducing agent (Figure 2B). Taken together, we suggest that A2AR oligomerization is a thermodynamic process (Figure 5-figure supplement 1B), with the monomer overcoming the activation energy (EA) by depletion interactions to repopulate into dimer/oligomer with a slightly lower free energy (given that we see a distribution between the two). Once formed, the redistributed dimer/oligomer populations can be kinetically stabilized by disulfide linkages.

      3) The claim that the C-terminal tail is engaged in "cooperative" interactions is too qualitative (p. 11 line 274, p.12 line 279 and p.18 line 426).

      This claim seems derived from Fig. 3b and Figs. 4b-c. However, the gradual decrease in the dimer level and the number of interactions may indicate that different parts in the C-terminal tail contribute to dimerization additively rather than cooperatively. The large decrease in the number of interactions may stem from the large decrease in the length (395 to 354). Probably, a more quantitative measure would be the number of interactions (H-bonds/salt bridges) normalized to the tail length upon successive truncation. Even in that case, the polar/charged residues would not be uniformly distributed along the primary sequence, making the quantitative argument of cooperativity challenging.

      The request to clarify our basis to refer to a cooperative interaction is well taken. Figure 4B and 4C show that the truncation of one part of the C-terminus (segment 335–394) leads to a reduction in contacts of a different part (segment 291–334) of A2AR. Therefore, we conclude that the binding interactions that occur in segment 291–334 are altered by the interactions exerted by the segment 335–394. This characteristic is consistent with allosteric interactions. We believe that characterizing these interactions as “cooperative” is possible but is not fully justified in this work. We also agree with the comment that quantifying the role and segments involved in contacts would be challenging. The manuscript has been amended to use the term “allosteric” in place of “cooperative”.

      4) On the compactness and conformation of the C-terminal tail:

      Although the C-terminal tail is known as "intrinsically disordered", the results seem to indicate that its conformation is rather compact (or collapsed) with a number of intra- and intermolecular polar interactions (Fig. 4) and buried nonpolar residues (Fig. 6), which are subject to depletion interactions (Fig. 5). This raises the question of whether the tail is indeed "intrinsically disordered" as is known. Recent folding studies on IDPs (Riback et al. Science 2017, 358, 238-; Best, Curr Opin Struct Biol 2020, 60, 27-) suggest that IDPs are partially expanded or expanded rather than collapsed.

      We agree that our results seem to suggest that the conformation of the C-terminus could be partially compact. However, by stating that the C-terminus on average is an intrinsically disordered region (IDR), we do not exclude the possibility of partially structured regions, or greater compactness than that of an excluded volume polymer. IDR or IDP should refer to all proteins or protein regions that do not adopt a unique structure. By that standard, we know that the C-terminus of A2AR falls into that category according to our experiments and MD simulation, as well as the literature. In isolation, the majority of the C-terminus is indeed an IDR, as has been demonstrated not only by simulations but also by experimental data. In fact, the C-terminus exhibits partial alpha-helical structure, and transiently populates beta-sheet conformations, depending on its state and buffer conditions (Piirainen et al, Biophys J 2015, 108 (4), 903–917). The literature studies suggest that A2AR’s C-terminus may adopt a greater level of compactness when interactions are formed between the C-terminus and the rest of the A2AR oligomer.

      Reviewer #2 (Public Review):

      The authors expressed A2A receptor as wild type and modified with truncations/mutations at the C-terminus. The receptor was solubilized in detergent solution, purified via a C-terminal deca-His tag and the fraction of ligand binding-competent receptor separated by an affinity column. Receptor oligomerization was studied by size exclusion chromatography on the purified receptor solubilized in a DDM/CHAPS/CHS detergent solution. It was observed that truncation greatly reduces the tendency of A2A to form dimers and oligomers. Mechanistic insights into interactions that facilitate oligomerization were obtained by molecular simulations and the study of aggregation behavior of peptide sequences representing the C-terminus of A2A. It is concluded that a multitude of interactions, including disulfide linkages, hydrogen bonds, and electrostatic and depletion interactions, contribute to aggregation of the receptor.

      The general conclusions appear to be correct and the paper is well written. This is a study of protein association in detergent solution. It is conceivable that observations are relevant for A2A receptors in cell membranes as well. However, extrapolation of mechanisms observed on receptor in detergent micelles to receptor in membranes should proceed with caution. In particular, the spatial arrangement of oligomerized receptor molecules in micelles may differ from arrangement in lipid bilayers. The lipid matrix may have a profound influence on oligomerization.

      The ultimate question to answer is how oligomerization alters receptor function. This will have to be addressed in a future study.

      We could not agree more. We address the concern regarding the translatability of properties of membrane proteins in detergent micelles to the cellular context in our response to Reviewer 1. In short, we believe the general propensity for A2AR to form dimers/oligomers and the role of the C-terminus will hold in the cellular context. However, even if it does not, given that biophysical structure-function studies of GPCRs are conducted in detergent micelles and other artificial environments, it is critical to understand the role of the C-terminus in the oligomerization of reconstituted A2AR in detergent micelles. How oligomerization alters receptor function is a question that is always on our mind and should be the focus of future studies. Indeed, it has been demonstrated that truncation of the A2AR C-terminus significantly reduces receptor association with Gαs and cAMP production in cellular assays (Koretz et al, Biophys J 2021, https://doi.org/10.1016/j.bpj.2021.02.032). The results presented in this manuscript, which have demonstrated the impact of C-terminal truncation on A2AR oligomerization, will offer critical understanding for such study of the functional consequences of A2AR oligomerization.

      Reviewer #3 (Public Review):

      The work of Nguyen et al. demonstrates the relevant role of the C-terminus of A2AR for its homo-oligomerization. A previous work (Schonenbach et al. 2016) found that a point mutation of C394 in the C-terminus (C394S) reduces homo-oligomerization. Following this direction, more mutants were generated, the C-terminus was also truncated at different levels, and, using size-exclusion chromatography (SEC), the oligomerization levels of A2AR variants were assessed. Overall, these experiments support the role of the C-terminus in the oligomerization process. MD studies were performed and the non-covalent interactions were monitored. To 'identify the types of non-covalent interaction(s)', A2AR variants were also analysed modulating the ionic strength from 0.15 to 0.95 M. The C-terminus peptides were investigated to assess their interaction in absence of the TM domain.

      The SEC results on the A2AR variants strongly support the main conclusion of the paper, but some passages and methodologies are less convincing. The different results obtained for dimerization and oligomerization are little discussed. The MD simulations are performed on models that are not accurately described - structural information currently available may compromise the quality of the model and the validity of the results (i.e., applying MD simulations to low-resolution models may not be appropriate for the goal of this analysis; moreover, the formation of disulfide bonds cannot be simulated, but this can affect the conformation and consequently the interactions to be monitored). Although the C-terminus is suggested as 'a driving factor for the oligomerization', the TM domain is indeed involved in the process and if and how it will be affected by modulating the solvent ionic strength should be discussed.

      We thank the reviewer for the overall positive assessment and critical input. We will respond to the comments as follows.

      The qualitative trend for dimerization is consistent with that for oligomerization, as demonstrated in Figs. 2A, 3B, and 5. For example, a reduction in both dimerization and oligomerization was observed upon C394X mutations (Figure 2A), as well as upon systematic truncations (Figure 3B), while very similar trends were seen for the change in the dimer and oligomer levels of all four constructs upon variation of ionic strength (Figure 5).

      We agree that the experimental observation and MD simulation only incompletely describe the state of the A2AR dimer/oligomer. For example, we discover the impact of ERR:AAA mutations of the C-terminus (Figure 3C) on oligomer formation, but do not know whether this segment interacts with the TM domain or C-terminus of the neighboring A2AR. MD simulations suggest that the inter-protomer interface certainly involves inter-C-termini contact. We also mention that the A2AR oligomeric interfaces could be asymmetric, suggesting that the C-terminus can interact with other parts of the receptor, including the TM domain. However, we do not have evidence that the TM domains directly interact with each other to stabilize A2AR oligomers, and thus cannot discuss the effect of the solvent ionic strength on how the TM domain contributes to A2AR oligomerization. We minimize such discussion in our manuscript because we have incomplete insights. What we can say is that multiple and weak inter-protomer interactions that contribute to the dimer and oligomer interface formation prominently involve the C-terminus. Ultimately, the structure of the A2AR dimer/oligomer needs to be solved to answer the reviewer’s question fully.

      With respect to the validity of our model, we restricted ourselves to using the best-available X-ray crystal structure for A2AR. Since this structure (PDB 5G53) does not include the entire C-terminus, we resorted to using homology modeling software (i.e., MODELLER) to predict the structures of the C-terminus. In our model, the first segment of the C-terminus, consisting of residues 291 to 314, was modeled as a helical segment parallel to the cytoplasmic membrane surface while the rest of the C-terminus was modeled as intrinsically disordered. MODELLER is much more accurate in structural predictions for segments shorter than 20 residues. This limitation necessitated that we run an equilibrium MD simulation for 2 µs to obtain a well-equilibrated structure that possesses a more viable starting conformation. We have included this detailed description of our model in lines 641–650. To validate our models of all potential variants of A2AR, we calculated the RMSD and RMSF for each truncated variant. Our results clearly show that the transmembrane helical bundle is very stable, as expected, and that the C-terminus is more flexible (see figure below). This flexibility is somewhat consistent for lengths up to 359 residues, with a more noticeable increase in flexibility for the 394-residue variant of A2AR.

      Root mean square fluctuation (RMSF) from sample trajectories of truncated variants modeled from the crystal structure of the adenosine A2AR bound to an engineered G protein (PDB ID 5G53), and the root mean square deviation (RMSD) of the C-terminus of each variant starting from residue 291.
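
      To make the two validation metrics concrete, the sketch below computes RMSD and RMSF in plain Python. It is an illustration only, not the authors' analysis pipeline: it assumes the trajectory has already been aligned to a reference structure, and it uses synthetic coordinates with artificially increasing noise along the chain in place of real simulation data. The function names `rmsd` and `rmsf` are hypothetical.

```python
import math
import random


def rmsd(frame, reference):
    """Root mean square deviation between two lists of (x, y, z) atoms."""
    sq = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(frame, reference)]
    return math.sqrt(sum(sq) / len(sq))


def rmsf(trajectory):
    """Per-atom root mean square fluctuation about the time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that average.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


# Synthetic stand-in for an aligned trajectory (assumption, not real data):
# 200 frames of 20 atoms, with noise that grows along the chain so that
# later atoms mimic a more flexible tail.
random.seed(0)
reference = [(float(i), 0.0, 0.0) for i in range(20)]
trajectory = [
    [tuple(c + random.gauss(0.0, 0.1 * (i + 1)) for c in atom)
     for i, atom in enumerate(reference)]
    for _ in range(200)
]

rmsd_values = [rmsd(frame, reference) for frame in trajectory]  # one per frame
rmsf_values = rmsf(trajectory)                                  # one per atom
print(len(rmsd_values), len(rmsf_values))
```

      RMSD yields one value per frame and tracks drift from the starting structure, whereas RMSF yields one value per atom and shows which parts of the chain fluctuate most, which is how a stable helical bundle and a flexible C-terminus would be distinguished.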

    1. Author Response:

      A very sincere thanks to the Editors and Reviewers for their insightful and helpful feedback which undoubtedly strengthened the manuscript. We appreciate the opportunity to respond to these critiques and recommended revisions.

      Evaluation Summary:

      The authors ask why the APOE4 allele has persisted, often at high frequencies, in human populations despite its associations to heart disease and Alzheimer's disease. They consider the hypothesis that APOE4 may be advantageous in a high pathogen and high physical environment settings (as opposed to a low pathogen industrial lifestyle) through an in-depth characterization of the Tsimane in Bolivia. The study is of broad interest with an insightful dataset; the conclusions are somewhat limited by the nature and current description and treatment of the data.

      From Editor and Reviewer comments, we realized that the paper required some clarification regarding our position on APOE allele frequencies in the Tsimane population. As discussed in more depth in our responses to the reviewer comments, it was not our intent to make a purely adaptationist argument for E4, or to suggest that there are no costs to E4 in other aspects of life, or at different ages, but rather to suggest that the relative costs and benefits of E4 may be environmentally-dependent, such that having an E4 allele is more neutral, and/or may provide some benefits (and limited costs), in pathogenically-diverse and energy-limited environments.

      Reviewer #1 (Public Review):

      Garcia, AR et al. seek to test out the hypothesis that APOE4 is environmentally mediated and may be protective in a high-pathogen environment. The authors test the presence of at least a single APOE4 allele copy with baseline innate immune function in a Tsimane population in Bolivia by measuring various biomarkers. They showed that being an APOE4 allele carrier is associated with higher circulating levels of lipids combined with lower levels of CRP and eosinophils. This finding among the APOE4+ individuals of the Tsimane population demonstrates further support for the hypothesis that higher loads of lipids are protective in higher loads of infection. This work highlights not only connections to immune response but how we can interpret heart disease/Alzheimer's in an evolutionary context dependent on the environment. Furthermore, a strength is that this work was carried out ethically where work with human subjects was not only approved in US-based institutions, but also by the governing body of the Tsimane. Overall, this is a clear study using fieldwork methods to demonstrate connections difficult to replicate in a controlled laboratory setting.

      1) One of the underlying assumptions for the persistence of APOE4 alleles across human populations is because it is or was previously under selection and in the right environment, the APOE4 allele is advantageous. Presumably, in the Tsimane, where the APOE4 allele may be advantageous due to a higher pathogen load and high activity, then wouldn't we expect the allele frequencies to be higher? This section discussing evolution should be a little more fleshed out. Is there any evidence for genetic selection (positive/ balancing) at that locus or is it based on allele frequencies? Given that you do calculate allele frequencies, how do the allele frequencies in Tsimane populations compare to other populations that live in the same geographic region or environment? Would we expect these allele frequencies to be higher than in a post-industrial environment? Do they support selection?

      We thank you for this input, and also appreciate the opportunity to clarify our position; we do not assume that APOE4 was under positive selection in this population, and do not intend to make a purely 'adaptationist' argument for the persistence of APOE4 in this population. Nor do we make any attempts to assess signals of selection. Our goals for this paper are (1) questioning the assumption that APOE4 is a universally deleterious allele, rather than one whose effects on phenotype are environmentally moderated, and (2) assessing the relationships between APOE, lipids, and innate inflammation in a population living in a relatively unique (and underrepresented) environmental context. We contend that the benefits of APOE4 may be most appreciable in energetically-constrained and pathogenically-diverse environments, and that APOE4 may also not have the same harmful effects on health under such conditions. It was not our intent to suggest that there are no costs to APOE4 in other aspects of life, or at different ages, but rather to suggest that the relative costs and benefits of APOE4 may be environmentally-dependent, such that having an APOE4 allele is more neutral, and/or may provide some benefits (and limited costs), in pathogenically-diverse and energy-limited environments.

      The distribution of APOE allelic variants in populations around the world is likely due to a mosaic of factors, and potentially includes differences related to environmentally-dependent costs and benefits of functionally-distinct variants. Certainly some degree of genetic drift or founder effects (Gayà-Vidal et al., 2012; Singh et al., 2006), antagonistic pleiotropy (Smith et al., 2019; Van Exel et al., 2017) and other forces (e.g. genetic relatedness) play an important role in determining population and global frequencies and should be considered jointly to make inferences about the frequency of E4 in the Tsimane population. While the combination of high pathogen burden and energetic limitation is one context where the typical costs of APOE4 may not be expressed, the high frequency of the APOE4 allele in populations that are appreciably different in terms of latitude, ecology, and population history (e.g. northern European and central African populations) confirms that more is involved in understanding population differences in APOE4 frequency (Abondio et al., 2019). Nonetheless, the context we describe and show evidence for in our paper contrasts with the contemporary obesogenic environment with low pathogen diversity more typical of the Global North, where benefits like lipid buffering are no longer needed and may in fact incur costs. Wording throughout the paper has been modified to clarify this stance.

      2) Throughout the paper I was wondering if other models were also considered and tested (APOE3/APOE3, APOE3/APOE4, APOE4/APOE4), but I didn't see the reasoning for why the alleles were binned until the methods section. This information should come earlier in the paper, given the way it is structured. If the 3 genotypes were tested, it should be stated in the paper, even if there was no association or there was insufficient sample size and should be discussed in the discussion.

      Unfortunately, there are very few individuals who are homozygous for APOE4 (E3/3: n=998; E3/4: n=245; E4/4: n=23), making it impossible to conduct statistical tests with sufficient statistical power to make inferences. While we understand the interest in reporting findings from models that use 'unbinned' data, we are cautious against doing so, as statistical tests provide no clear or reliable inferences (whether non- or highly-significant), due to the sample size disparities. This is even more problematic in models that include interaction effects (the majority), which further split and lead to disparate sample sizes between groups. We have moved up the explanation for binning APOE genotypes to the beginning of the Results section, and noted the genotypic breakdown.
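
      For reference, the allele frequency implied by the genotype counts quoted above can be worked out directly. The sketch below is a minimal illustration, assuming the sample consists only of the three genotypes listed (E2-containing genotypes are not listed and are treated as absent here); the function name `allele_frequency` is hypothetical and not part of the authors' analysis.

```python
# Genotype counts as quoted in the response. E2-containing genotypes are not
# listed there and are assumed absent for this illustration.
genotype_counts = {("E3", "E3"): 998, ("E3", "E4"): 245, ("E4", "E4"): 23}


def allele_frequency(counts, allele):
    """Return the fraction of all allele copies that are `allele`."""
    total_copies = 2 * sum(counts.values())  # each person carries two copies
    allele_copies = sum(
        genotype.count(allele) * n for genotype, n in counts.items()
    )
    return allele_copies / total_copies


n_individuals = sum(genotype_counts.values())
e4_freq = allele_frequency(genotype_counts, "E4")
# Carriers are individuals with at least one E4 copy (E3/4 or E4/4).
carrier_fraction = (245 + 23) / n_individuals

print(f"individuals: {n_individuals}")
print(f"E4 allele frequency: {e4_freq:.3f}")
print(f"E4 carrier fraction: {carrier_fraction:.3f}")
```

      Under these assumptions the E4 allele frequency is 291/2532, or about 11.5%. Homozygotes make up only 23 of 1,266 individuals, which is the imbalance that motivates binning carriers against non-carriers.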

      Reviewer #2 (Public Review):

      This work investigates the impact of the APOE4 gene variant on inflammation and lipid profiles among the Tsimane subsistence population of Bolivia, a group facing energy constraints and heavy infectious disease burden. APOE4 is associated with greater inflammation, lipids, and downstream cardiovascular disease and Alzheimer's disease in energy-abundant post-industrial populations. Increasingly, human and other model research suggests that the impact of APOE4 on inflammation and lipids may vary under differing conditions of energy availability and infection. It is important to understand this variation to understand how APOE4 impacts disease risk across populations but also to understand why, from an evolutionary perspective, APOE4 frequency is up to 40% in some populations.

      Strengths:

      *The evolutionary medicine approach used in this study allows for a powerful analysis to probe both proximate ("how") and ultimate ("why") questions relating to variation in APOE4 frequency and associated disease risk.

      *The sample size is relatively large and is, it appears, the first to combine this set of measures in a subsistence population experiencing a wide range of energy availability. This allows for the testing of variable interactions and moderating effects using mixed models that can accommodate data clustering and missing data.

      *The paper is organized nicely. The findings, as currently described, have important implications for understanding evolved mechanisms of pathogen defense and the rapidly increasing burden of cardiovascular disease in many low-and middle-income countries.

      Weaknesses:

      *The observational design and correlative nature of the analysis limit causal inference. This is exacerbated by near-single measures of some key variables and the use of proxies of energy availability (e.g., BMI) and pathogen exposure (e.g., community) that lack specificity.

      We concur that the data available limit causal inference. We discuss this point in detail in the Limitations section (excerpt below), and offer ideas that we believe would be useful and necessary to extend this research, by designing experimental lab-based models to test some of the main findings.

      Regarding the measures used, while we agree that BMI is not a perfect proxy for energy availability, given that our main goal in the paper -- with regards to energy availability -- was to investigate APOE and lipids at the extreme tails of BMI (overweight vs. lean), we do feel that BMI can adequately capture broad differences in energetic availability between these two groups. For example, a previous paper showed that BMI and body fat were closely associated among adults in this population (Gurven et al. 2012: r=0.75 in women; r=0.57 in men; Fig. S4). Also, though we use BMI as a continuous measure for models, we plot the upper and lower tertiles from these models to distinguish these overweight vs. lean groups.

      Regarding justification for using community as a proxy for pathogen exposure, we have added the following sentence to Methods: "Because Tsimane villages vary in sanitation infrastructure, including access to soap and other hygienic products, and potentially prevalence by pathogen type (e.g. some living very close to the river versus farther out in the forest), individuals were clustered by community to account for variation in such community-level factors." We would also like to note that we include season and white blood cell count as additional covariates to adjust for individual-level differences in current pathogen exposure.

      Excerpt from Limitations: "Because these findings may be important for furthering evolutionary (i.e. why the APOE4 allele is maintained) and clinical (i.e. the role of APOE in disease pathogenesis) understanding, they require replication, and warrant experimental testing. The central thesis presented here – that persistent exposure to pathogens and obesogenic diets moderate the relationship between blood lipids and inflammation – is amenable to experimental manipulation under lab conditions. Specifically, a mammalian model system could be split into two treatments: those raised under sterile conditions versus regimented exposure to non-lethal pathogens. These treatments may then be crossed with dietary or physical activity conditions that produce differential levels of adiposity. Our hypothesis predicts that both decreased adiposity and increased life course pathogen exposure will reduce or even eliminate positive associations between blood lipids and chronic inflammation. Importantly, inflammatory biomarkers can be measured at more frequent intervals in lab conditions to assess long-term differences in the function of both pro- and anti-inflammatory pathways between experimental treatments."

      *There may be reporting errors in the key marker of inflammation (CRP) and, potentially, the sample sizes. This adds concern for the analysis.

      We had mistakenly reported CRP in mg/dL. We truly appreciate this Reviewer for catching the unit reporting error: CRP units have been updated and now correctly report in mg/L.

      We realized the lack of clarity regarding the sample size for each of the specific models, given that sample sizes differed across models, dependent upon the number of measurements/observations available per biomarker. For clarity, we have added sample sizes for each model (total observations and unique sample IDs) to tables in the main text of the document. To this end, raw data points have also been added to all figures. Full models with covariates are still included in the Supplement.

      *While the argument of the paper is based on "baseline" measures of inflammation and lipids, it is unclear given the nature of the data and analysis if representative measures are actually being used. If not, the interpretation of the data could change considerably.

      We see the problem of stating that the reported levels for biomarkers are specifically "baseline", particularly given the observational nature of the data. A main focus of the paper is in applying an evolutionary lens to understanding relationships between APOE variants, lipids, and immune functions-- including the widely-observed phenomenon that in healthy, non-obese, individuals, APOE4 is consistently associated with lower innate inflammation. We aim to apply an evolutionary theoretical framework to understand this relationship, however, with the existing data, we cannot make strong inferences or rule out the alternate explanations (lower baseline vs faster clearance, etc.) posed. We have deleted all uses of the term "baseline" to describe observed levels of immune function. We have also added sentences to the discussion section to illuminate some possible explanations for genotype differences in levels of innate immune function. Further, with regards to one of our main results (that APOE4 carriers have significantly lower CRP): to address the possibility that we are capturing a high number of acute inflammatory events, which may affect findings, we reran models constraining them to include only observations of CRP < 10mg/L. The median level for CRP for this subset is 2.5mg/L. Constraining the models does not alter the results, however, we report the results for both, and include constrained models in the Supplement.

      From Discussion (additions underlined): "Our finding that innate immune biomarkers are lower among APOE4 carriers is in line with prior reports (Lumsden et al., 2020; Martiskainen et al., 2018; Trumble et al., 2017; Vasunilashorn et al., 2011), however the causes are uncertain. One proximate explanation involves the mevalonate pathway, which plays a key role in multiple cellular processes, including modulating sterol and cholesterol biosynthesis and innate immune function (Buhaescu and Izzedine, 2007). Regarding the main finding for CRP, it is possible that APOE4 carriers experience a lower innate immune sensing (Dose et al., 2018) or have faster clearance following the resolution of an acute spike. While there is currently no direct evidence for the latter, some studies have found that higher circulating lipids were associated with more rapid clearance of active infections (Andersen, 2018; Pérez-Guzmán et al., 2005). The current study design did not allow analysis of these pathways."

      *The paper does not have the sample size to address the impact of having 1 vs. 2 copies of APOE4 and could better discuss population-level variation in APOE4 frequencies and why Tsimane frequency (12%) is, in fact, much lower than in many other populations (e.g., in Central Africa).

      As noted above, we agree that it would be informative to be able to tease apart phenotypic effects from having one or two E4 alleles versus none, and recognize that we are unfortunately not in the position to parse out such differences in this paper due to the relatively small sample size of homozygous E4/E4 individuals (E3/3: n=998; E3/4: n=245; E4/4: n=23). We have moved up the explanation for binning APOE genotypes to the beginning of the Results section, and noted the genotypic breakdown to explain why we needed to bin APOE4 carriers vs non-carriers for the statistical analyses.

      We completely agree that making inferences about APOE4 frequency across populations is interesting and would be a useful extension, but it is unfortunately beyond the scope of this manuscript, which provides data on associations between APOE genotype and lipid and immune phenotypes. Certainly some degree of genetic drift, founder effects, population bottlenecks, antagonistic pleiotropy and other forces should be considered jointly to make inferences about the frequency of E4 in the Tsimane population, and in comparison to other populations worldwide. Given that E4 is the ancestral allele, it is possible that a combination of lower costs (or even benefits) in pathogenically-diverse environments, and maintenance due to drift/lack of bottlenecks, may in part explain the high frequencies in some parts of Africa. While the combination of pathogen risk and energetic limitation is one context where the typical cardiovascular costs of E4 may not be expressed, the high frequency of the E4 allele in northern European populations confirms that more is involved in understanding population differences in E4 frequency. Nonetheless, the context we describe and show evidence for in our paper contrasts with the contemporary obesogenic environment with low pathogen diversity more typical of the Global North, where benefits like lipid buffering are no longer needed and may in fact incur costs. Wording throughout has been modified to clarify this stance.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper examines muscle activity at single muscle level during Drosophila ecdysis (adult hatching) behavior. The premise is that quantifying behavior or motor neuron activity is insufficient to understand how the CNS generates behavior - it is also critical to quantify muscle activity. They show that abdominal body wall muscles generate stereotyped patterns of activity during four developmental stages (phase 0, stochastic activity; phases 1-3, each with a different pattern of activity). Co-active groups of muscles form "syllables" which are used in different combinations to generate the stereotyped activity seen in phases 1-3. This analysis was facilitated by use of a convolutional neural network. Interestingly, they found examples where muscle contraction did not match muscle activity (GCaMP elevation), showing the importance of measuring both attributes.

      In addition to mapping the stereotyped muscle activity at single muscle resolution in the generation of ecdysis behavior, they find that phase 1 and 3 are quite variable, and speculate that other constraints on the CNS output (e.g. during larval locomotion) may prevent a sharpening up of muscle patterns. They show that the hormone ETH is required for initiating phase 1, and the neuromodulators bursicon and CCAP are required for initiating phase 2. Failure to initiate either phase is lethal. Lastly, they show that in addition to initiating phase 1 or 2, the hormone/neuromodulators result in more coherent muscle activity.

      Overall this study sets the stage for a detailed analysis of motor neuron function in driving muscle activity patterns, and then further into the CNS to understand the role of premotor neurons. Ecdysis behavior has the potential to be a powerful system for understanding how the CNS generates behavior at the single muscle /single motor neuron level, as well as for understanding how neuromodulators act to regulate muscle/motor neuron activity.

      The figures are almost all too small to see the salient information, and the color scheme is often difficult to resolve. Please enlarge the key aspects of the figures; and try to use more distinctive colors where critical comparisons need to be made. Some examples: left/right colored lines in 1G; panel 3D; lines in 3E; all data in 5G (this is the worst for tiny data); 6C,D,J; all of 7.

      Thank you for your thoughtful review and your suggestions on how to improve the manuscript. Some figure panels (e.g. 5G) have been completely replaced. The others mentioned have been divided into multiple figures or panels, which allowed us to enlarge the material in each. Fig. 7 was deleted from the revised manuscript because it was generally found unhelpful. We also felt that the other revisions rendered this figure unnecessary. The revised manuscript now has 11 main figures and 9 figure supplements with more generous layouts for individual panels so that details are more easily resolved. In addition, we attempted to improve the color scheme to facilitate clarity, using the color palette recommended for the color-blind. Other specific changes are referenced in our responses to individual concerns below.

      Reviewer #2 (Public Review):

      The manuscript by Diao et al. is an important extension of their eLife paper of 2017. They have developed new tools that allow them to follow Ca2+ transients in single muscle fibers over the whole animal through the behavioral sequence and also to independently monitor the Ca2+ transients in the endplates of the motor neurons that innervate these muscles. Their goal is to break down the movements that control the ecdysis sequence into elemental "syllables" and then to define the role of these syllables in constructing progressively complex behavioral programs and as targets of neuropeptide modulation.

      A crucial behavior that occurs during P1 in higher flies is the movement of the gas bubble, but this event is largely ignored in the paper. Prior to pupal ecdysis, gas is expelled into the posterior puparial space and then actively translocated, via muscular contractions of the body wall, to the anterior end of the puparium during the latter portion of P1 (shown nicely in the authors' 2017 video). A detailed study by C.G. Chadfield & J.C.Sparrow (1985. Dev. Genetics 5: 103) of pupal ecdysis in Drosophila emphasized the importance of this translocation for head eversion. When they simply removed the operculum at the start of bubble movement, then the gas bubble could not push the animal backwards in the puparial case and head eversion could not occur. However, they saw normal pupation and head eversion if the removed operculum was immediately replaced and sealed down with petroleum jelly.

      During translocation, the bubble moves in a fragmented fashion between the pupal cuticle and the puparium. Ignoring this movement leads to statements like on line 378 "Because pupal ecdysis is independent of environmental factors and executed in the absence of competing physiological needs, it is likely that its variability is intrinsic to the ecdysis network." For the pupating animal, its "environment" is the inside of the puparial case and the moving bubble is an unpredictable variable in this environment. The trajectory and route of bubble movement are not fixed, and it is likely that variation in sensory feedback from the gas movement explains the motor variability and reduced stereotypy during P1. The role for proprioception during this phase is likely to inform the CNS of the progression of the bubble fragments. The authors' finding that the blockage of proprioceptors suppresses the behavior progression could mean that this sensory information is needed to signal that an anterior space has been produced, and without this signal, the behavior does not progress to its next phase. This should be addressed in the text if not experimentally.

      We very much appreciate the reviewer’s point that the environment within the puparium may affect the pupa’s motor performance. We have now amended our comment on environmental influences to include this point (ll. 479-481 [515-517]), and we elaborate in the Discussion on conditions within the puparium that may influence movement and sensory processing (ll. 457-477 [493-513]). Following the reviewer’s advice, we note that the gas bubble and its dispersion during P1 must be considered a possible determinant of pupal movement. In addition, we mention other possible determinants that we did not previously discuss, namely substrate and surface tension interactions between the body wall, puparium, and residual molting fluid. In line with the reviewer’s point that understanding the environment of the puparium is critical, we stress the need to account for all external forces acting on the pupal body to achieve a complete understanding of the pupal motor output. In the Discussion, we also now mention the reviewer’s interesting hypothesis that creation of the anterior space at the end of P1 may provide sensory information necessary for progression of the behavioral sequence (ll. 534-535 [601-602]).

      Another aspect of the background that is missing is considering earlier studies on the ontogeny of behaviors leading up to ecdysis/hatching. Notable are studies of the progressive construction of the flight motor program during metamorphosis in moths (Kammer & Rheuben 1976 J. Exp. Biol. 65:65.) and a similar feature of assembly of motor programs prior to hatching in Drosophila (Crisp et al., 2008 Development 135:3707). In the moth studies, complex motor programs were gradually assembled during ontogeny with motor neurons firing but without muscle contraction (as the authors see in prepupae during P0 - Fig 2C). A lack of excitation-contraction coupling in the moth prevents muscle movement through most of development. This suppression of contraction is essential because prior to production of adult cuticle, muscle contraction would rip the developing animal apart. The same requirement to suppress muscle contraction would be seen in fly prepupa until sufficient pupal cuticle has been secreted to prevent rupture from actual muscle contractions! This should be addressed in the text.

      We thank the reviewer for these comments and for the references on motor program assembly. We agree that this topic deserved more attention than it was originally given. We have now amended our discussion of P0 to contextualize our observations, pointing to the previous literature on both suppressed muscle activity and latent motor programs observed in other developing animals (ll. 487-500 [523-536]).

      Besides not being explicit about how the syllables combine to build the eight basic movements, it is not clear how these basic movements then combine to support the major behaviors of each phase. This is seen in P1, where we see that swing and brace movements can co-occur (e.g., Fig 3D) but is a swing on one side always associated with a brace on the other? What are their phase relationships? Does their temporal association remain stable as the bouts progress? Another example is in Phase 3. There appear to be 5 basic behaviors associated with bouts in Phase 3. The example in Fig 1H shows double peak bouts in phase 3, and the bulk Ca data show a preponderance of double peaks. The different shapes suggest that there are different movements during the two peaks. Their discussion of P3 movements (around line 273), though, does not address this feature of the double peaks. The example in Fig 7A suggests that some movements, like the PostSwing occur at half the frequency of other movements such as the PostCon and AntComp. Is this the basis of the double peaks and how is that reflected in the movements that are finally produced? This should be addressed in the text.

      We regret the confusion on these points. As described there, we have made numerous changes to the manuscript to clarify how elements of behavior at one level (e.g. movements) derive from lower-level elements (e.g. syllables) and are used to build higher-level elements (e.g. phases). We describe the phase relationships at all levels for P1 and P2 and summarize the more variable constituents of P3 movements in the text (Fig. 7D, E and ll. 247-275 [274-302]). The specific questions raised by the reviewer are also now answered in the text. In brief, early P2 bouts (roughly those prior to head eversion) differ from later bouts in containing only a Swing. Later bouts contain, in addition to the Swing, a Brace performed concomitantly on the contralateral side of the body (ll. 182-183 [197-199]). The movements contributing to the peak-double peak motif common to P3 are now more carefully described (ll. 351-360 [383-393]).

      One approach that I did not find useful was dividing the analysis into compartments - anterior versus posterior and dorsal-lateral-ventral. This may provide a way of generating some statistical analysis, but it did not illuminate anything about the behavior. The line between anterior and posterior segments seems to be arbitrary. Of course, it is important to know if there is directionality of movement [waves going anteriorly versus posteriorly], but beyond that, I am not sure what it adds. [Indeed, it made Fig 7 very confusing!] Also, I could not see a rationale for considering separate dorsal-lateral-ventral compartments. This should be addressed in the text.

      We thank the reviewer for this question, which we now address in a revised section of the Discussion on the topic of neuromodulation and compartmentalization (ll. 539-588 [606-655]). To briefly expand upon our explanation there, we think that compartmental activity allows a useful coarse-grained description of the sequential body wall contractions that give rise to movement as indicated by the SequenceMatcher similarity scores (Fig. 6E in the revised manuscript). Second, and more important, we think that how activity flows across compartments provides clues about both the central organization and the neuromodulatory control of ecdysis behavior. Both ETHRB and CCAP neuron suppression exert selective effects on A-P compartments. ETHRB neuron suppression blocks the Lift, a movement of the posterior compartment, while suppressing CCAP neurons prematurely terminates the first (and only) swing-like movement by blocking its progression into the anterior compartment. Additionally, the distribution of CCAP-R appears to reflect mechanisms for selectively regulating distinct D-V compartments. Myotopic maps of larval motor neuron dendrites show that MNs innervating dorsal and ventral muscles are spatially segregated from those innervating lateral muscles and have distinct inputs. This suggests distinct regulation of activity in D-V and L compartments and likely distinct functions. Importantly, CCAP-R is expressed only in motor neurons of the D and V compartments, but in the L compartment it is expressed in muscles. As we suggest, this may allow the different regulatory mechanisms of compartmental regulation to synergize during P2. Finally, our subdivision of the A-P axis at the boundary between segments 5 and 6 has both anatomical and functional importance. At the pupal stage, selective muscle loss imposes differences in muscle composition of segments anterior and posterior to this boundary. 
Most importantly, anterior segments contain M12, which is a major contributor to behavior only after P1 and is targeted by neuromodulatory Type III terminals containing CCAP and Bursicon. In addition, the A-P boundary also conforms to the functionally and neuroanatomically defined “hinge” region of Tastekin et al. (2018, eLife), which regulates the switch from forward to backward movement in the larva. Because the compartmental subdivisions we define conform with neuroanatomical differences and appear to underlie functional differences, our working hypothesis is that they will be important landmarks for mapping behaviorally relevant CNS activity as we begin to image it in the next phase of our work.
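
      The SequenceMatcher similarity scores mentioned in this response can be illustrated with Python's standard-library difflib.SequenceMatcher. Whether this is the implementation behind Fig. 6E is an assumption, and the compartment labels and bout sequences below are invented purely for illustration. The sketch shows how two orderings of compartment activation could be compared with a single score between 0 and 1.

```python
from difflib import SequenceMatcher

def sequence_similarity(a, b):
    """Return the SequenceMatcher ratio (0..1) between two label sequences."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

# Hypothetical compartment activation orders for three bouts
# (e.g. "PD" = posterior-dorsal); not taken from the manuscript.
bout_1 = ["PD", "PL", "PV", "AD", "AL", "AV"]
bout_2 = ["PD", "PL", "PV", "AD", "AV", "AL"]   # last two compartments swapped
bout_3 = ["AV", "AL", "AD", "PV", "PL", "PD"]   # fully reversed order

sim_12 = sequence_similarity(bout_1, bout_2)
sim_13 = sequence_similarity(bout_1, bout_3)
print(f"bout 1 vs bout 2: {sim_12:.2f}")
print(f"bout 1 vs bout 3: {sim_13:.2f}")
```

      A bout that preserves most of the activation order scores close to 1, while a reversed wave scores much lower, which is the kind of coarse-grained comparison the compartmental description is meant to support.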

    1. Author Response:

      Reviewer #2 (Public Review):

      Summary:

      Frey et al develop an automated decoding method, based on convolutional neural networks, for wideband neural activity recordings. This allows the entire neural signal (across all frequency bands) to be used as decoding inputs, as opposed to spike sorting or using specific LFP frequency bands. They show improved decoding accuracy relative to standard Bayesian decoder, and then demonstrate how their method can find the frequency bands that are important for decoding a given variable. This can help researchers to determine what aspects of the neural signal relate to given variables.

      Impact:

      I think this is a tool that has the potential to be widely useful for neuroscientists as part of their data analysis pipelines. The authors have publicly available code on github and Colab notebooks that make it easy to get started using their method.

      Relation to other methods:

      This paper takes the following 3 methods used in machine learning and signal processing, and combines them in a very useful way. 1) Frequency-based representations based on spectrograms or wavelet decompositions (e.g. Golshan et al, Journal of Neuroscience Methods, 2020; Vilamala et al, 2017 IEEE international workshop on machine learning for signal processing). This is used for preprocessing the neural data; 2) Convolutional neural networks (many examples in Livezey and Glaser, Briefings in Bioinformatics, 2020). This is used to predict the decoding output; 3) Permutation feature importance, aka a shuffle analysis (https://scikit-learn.org/stable/modules/permutation_importance.html; https://compstat-lmu.github.io/iml_methods_limitations/pfi.html). This is used to determine which input features are important. I think the authors could slightly improve their discussion/referencing of the connection to the related literature.

      Overall, I think this paper is a very useful contribution, but I do have a few concerns, as described below.

      We thank the reviewer for the encouraging feedback and the helpful summary of the approaches we used. We are happy to read that they consider the framework to be a very useful contribution to the field of neuroscience. The reviewer raises several important questions regarding the influence measure/feature importance, the data format of the SVM and how the model can be used on EEG/ECoG datasets. Moreover, they suggest clarifying the general overview of the approach and connecting it more closely to the related literature. These are very helpful and thoughtful comments and we are grateful to be given the opportunity to address them.

      Concerns:

      1) The interpretability of the method is not validated in simulations. To trust that this method uncovers the true frequency bands that matter for decoding a variable, I feel it's important to show the method discovers the truth when it is actually known (unlike in neural data). As a simple suggestion, you could take an actual wavelet decomposition, and create a simple linear mapping from a couple of the frequency bands to an imaginary variable; then, see whether your method determines these frequencies are the important ones. Even if the model does not recover the ground truth frequency bands perfectly (e.g. if it says correlated frequency bands matter, which is often a limitation of permutation feature importance), this would be very valuable for readers to be aware of.

      2) It's unclear how much data is needed to accurately recover the frequency bands that matter for decoding, which may be an important consideration for someone wanting to use your method. This could be tested in simulations as described above, and by subsampling from your CA1 recordings to see how the relative influence plots change.

      We thank the reviewer for this really interesting suggestion to validate our model using simulations. Accordingly, we have now trained our model on simulated behaviours, which we created via linear mapping to frequency bands. As shown in Figure 3 - Supplement 2B, the frequency bands modulated by the simulated behaviour can be clearly distinguished from the unmodulated frequency bands. To make the synthetic data more plausible we chose different multipliers (betas) for each frequency component which explains the difference between the peak at 58Hz (beta = 2) and the peak at 3750Hz (beta = 1).

      To generate a more detailed understanding of how the detected influence of a variable changes based on the amount of data available, we conducted an additional analysis. Using the real data, we subsampled the training data from 1 to 35 minutes and fully retrained the model using cross-validation. We then used the original feature importance implementation to calculate influence scores across each cross-validation split. To quantify the similarity between the original influence measure and the downsampled influence, we calculated the Pearson correlation between the downsampled influence and the one obtained when using the full training set. As can be seen in Figure 3 - Supplement 2A, our model achieves an accurate representation of the true influence with as little as 5 minutes of training data (mean Pearson's r = 0.89 ± 0.06).
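
      As a minimal sketch of the similarity measure described here, the Pearson correlation between two influence profiles can be computed as below. The influence values are made up for illustration and are not the paper's data; the function name pearson_r is likewise only a label chosen for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical influence scores per frequency band (not the paper's values):
# one profile from the full training set, one from a 5-minute subsample.
influence_full = [0.02, 0.05, 0.30, 0.10, 0.03, 0.50]
influence_5min = [0.03, 0.04, 0.27, 0.12, 0.05, 0.49]

r = pearson_r(influence_full, influence_5min)
print(f"similarity to full-data influence: r = {r:.3f}")
```

      Repeating this for each cross-validation split and each subsample size would give one correlation per split, which is the quantity the response summarizes as a mean and spread.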

      Page 8-9: To further assess the robustness of the influence measure we conducted two additional analyses. First, we tested how results depended on the amount of training data (1 - 35 minutes, see Methods). We found that our model achieves an accurate representation of the true influence with as little as 5 minutes of training data (mean Pearson's r = 0.89 ± 0.06, Figure 3 - Supplement 2A). Secondly, we assessed influence accuracy on a simulated behaviour in which we varied the ground truth frequency information (see Methods). The model trained on the simulated behaviour is able to accurately represent the ground truth information (modulated frequencies 58 Hz & 3750 Hz, Figure 3 - Supplement 2B).

      Page 20: To evaluate if the influence measure accurately captures the true information content, we used simulated behaviours in which ground truth information was known. We used the preprocessed wavelet transformed data from one animal and created a simulated behaviour y_sb using uniform random noise. Two frequency bands were then modulated by the simulated behaviour using f_new = f_old × β × y_sb. We used β = 2 for 58 Hz and β = 1 for 3750 Hz. We then retrained the model using five-fold cross validation and evaluated the influence measure as previously described. We report the proportion of frequency bands that fall into the correct frequencies (i.e. the frequencies we chose to be modulated, 58 Hz & 3750 Hz).
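
      The ground-truth check described in this passage can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' pipeline: random numbers stand in for the wavelet data, six bands stand in for the full spectrum, an ordinary least-squares decoder replaces the convolutional network, and a single train/test split replaces five-fold cross-validation. Only the modulation rule (new band value = old value × β × simulated behaviour) and the shuffle-based influence score follow the text.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (bias term appended)."""
    rows = [row + [1.0] for row in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

def mse(w, X, y):
    """Mean squared error of the linear decoder w on data (X, y)."""
    preds = [sum(wi * xi for wi, xi in zip(w, row + [1.0])) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

rng = random.Random(0)
n_samples, n_bands = 2000, 6
modulated = {1: 2.0, 4: 1.0}  # band index -> beta (stand-ins for 58 Hz, 3750 Hz)

y_sb = [rng.random() for _ in range(n_samples)]             # simulated behaviour
X = []
for t in range(n_samples):
    row = [rng.uniform(0.5, 1.5) for _ in range(n_bands)]   # f_old (toy data)
    for band, beta in modulated.items():
        row[band] = row[band] * beta * y_sb[t]               # f_new
    X.append(row)

half = n_samples // 2
w = fit_linear(X[:half], y_sb[:half])                        # train on first half
X_test, y_test = X[half:], y_sb[half:]
baseline = mse(w, X_test, y_test)

# Influence of a band = increase in test error after shuffling that band.
influence = []
for band in range(n_bands):
    col = [row[band] for row in X_test]
    rng.shuffle(col)
    X_shuf = [row[:band] + [v] + row[band + 1:] for row, v in zip(X_test, col)]
    influence.append(mse(w, X_shuf, y_test) - baseline)

top_two = sorted(range(n_bands), key=lambda b: influence[b], reverse=True)[:2]
print("influence per band:", [round(v, 4) for v in influence])
print("top two bands:", sorted(top_two))
```

      One caveat of this toy version: a linear decoder is insensitive to the scale of a feature, so the different betas would not by themselves produce different peak heights here, unlike the result reported for the trained network.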

      New supplementary Figure:

      Figure 3 - Supplement 2: Decoding influence for downsampled models and simulations. (A) To measure the robustness of the influence measure we downsampled the training data and retrained the model using cross-validation. We plot the Pearson correlation between the original influence distribution using the full training set and the influence distribution obtained from the downsampled data. Each dot shows one cross-validation split. Inset shows influence plots for two runs, one for 35 minutes of training data, the other in which model training consisted of only 5 minutes of training data. (B) We quantified our influence measure using simulated behaviours. We used the wavelet preprocessed data from one CA1 recording and simulated two behavioural variables which were modulated by two frequencies (58Hz & 3750Hz) using different multipliers (betas 2 & 1). We then trained the model using cross-validation and calculated the influence scores via feature shuffling.

      3)

      a) It is not clear why your method leads to an increase in decoding accuracy (Fig. 1)? Is this simply because of the preprocessing you are using (using the Wavelet coefficients as inputs), or because of your convolutional neural network. Having a control where you provide the wavelet coefficients as inputs into a feedforward neural network would be useful, and a more meaningful comparison than the SVM. Side note - please provide more information on the SVM you are using for comparison (what is the kernel function, are you using regularization?).

      We thank the reviewer for this suggestion and are sorry for the lack of documentation regarding the support vector machine model. The support vector machine was indeed trained on the wavelet transformed data and not on the spike sorted data as we wanted a comparison model which also uses the raw data. The high error of the support vector machine on wavelet transformed data might stem from two problems: (1) the input by design loses all spatially relevant information, as the 3-D representation (frequencies x channels x time) needs to be flattened into a 1-D vector in order to train an SVM on it, and (2) the SVM therefore needs to deal with a huge number of features. For example, even though the wavelets are downsampled to 30Hz, one sample still consists of 212992 features (64 timesteps × 128 channels × 26 frequencies), which makes the SVM very slow to train and prone to overfitting the training set.

      This exact problem would also be present in a feedforward neural network that uses the wavelet coefficients as input. Any hidden layer connected to the input, using a reasonable amount of hidden units will result in a multi-million parameter model (e.g. 512 units will result in 109051904 parameters for just the first layer). These models are notoriously hard to train and won’t fit many consumer-grade GPUs, which is why for most spatial signals including images or higher-dimensional signals, convolutional layers are the preferred and often only option to train these models.
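
      The parameter counts quoted in the two paragraphs above follow from simple multiplication, as the short check below shows. The convolutional layer at the end is a hypothetical configuration (3x3 kernels, 64 filters, frequencies treated as input channels) added only for contrast; it is not the architecture described in the manuscript.

```python
# Input dimensions quoted in the response above.
timesteps, channels, frequencies = 64, 128, 26
hidden_units = 512  # width of the first dense layer in the quoted example

n_features = timesteps * channels * frequencies
dense_weights = n_features * hidden_units  # weight matrix only, biases excluded

# For contrast: one hypothetical 3x3 convolution that treats the 26
# frequencies as input channels and has 64 filters (sizes are assumptions).
kernel_h, kernel_w, conv_filters = 3, 3, 64
conv_weights = kernel_h * kernel_w * frequencies * conv_filters

print(f"flattened features per sample: {n_features:,}")
print(f"dense first-layer weights:     {dense_weights:,}")
print(f"3x3 conv layer weights:        {conv_weights:,}")
```

      Because convolutional weights are shared across positions, their count does not grow with the number of timesteps or channels, which is the practical argument made above for preferring convolutional layers on inputs of this size.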

      We have now included more detailed information about the SVM (including kernel function and regularization parameters) in the methods section of the manuscript.

      Page 19: To generate a further baseline measure of performance when decoding using wavelet transformed coefficients, we trained support vector machines to decode position from wavelet transformed CA1 recordings. We used either a linear kernel or a non-linear radial-basis-function (RBF) kernel to train the model, using a regularization factor of C=100. For the non-linear RBF kernel we set gamma to the default 1 / (num_features × var(X)) as implemented in the sklearn framework. The SVM model was trained on the same wavelet coefficients as the convolutional neural network.
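
      To make the gamma setting concrete, the sketch below computes 1 / (num_features × var(X)) on a tiny made-up feature matrix and evaluates the RBF kernel for one pair of samples. It uses plain Python rather than scikit-learn, the helper names are chosen for this example, and the matrix is not real wavelet data.

```python
import math

def scale_gamma(X):
    """gamma = 1 / (num_features * var(X)), variance taken over all entries."""
    values = [v for row in X for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (len(X[0]) * var)

def rbf_kernel(a, b, gamma):
    """RBF kernel value exp(-gamma * ||a - b||^2) for two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

# Tiny made-up feature matrix: 3 samples x 4 flattened features.
X = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 1.0, 2.0, 2.0],
    [4.0, 0.0, 1.0, 3.0],
]
gamma = scale_gamma(X)
k01 = rbf_kernel(X[0], X[1], gamma)
print(f"gamma = {gamma:.4f}, k(x0, x1) = {k01:.4f}")
```

      In scikit-learn this heuristic corresponds to passing gamma="scale" to an RBF-kernel SVM; which estimator class the authors used is not stated in the passage. With hundreds of thousands of features, gamma becomes very small, so the kernel changes slowly with distance between samples.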

      b) Relatedly, because the reason for the increase in decoding accuracy is not clear, I don't think you can make the claim that "The high accuracy and efficiency of the model suggest that our model utilizes additional information contained in the LFP as well as from sub-threshold spikes and those that were not successfully clustered." (line 122). Based on the shown evidence, it seems to me that all of the benefits vs. the Bayesian decoder could just be due to the nonlinearities of the convolutional neural network.

      Thanks for raising this interesting point regarding the linear vs. non-linear information contained in the neural data. Indeed, when training the model with a linear activation function for the convolutions and fully connected layers, model performance drops significantly. To quantify this we ran the model with three different configurations of its activation functions: (1) nonlinear activation functions only in the convolutional layers, (2) nonlinear activation functions only in the fully connected layers, or (3) linear activation functions throughout the whole model. As expected, the model with only linear activation functions performed the worst (linear activation functions 61.61cm ± 33.85cm, non-linear convolutional layers 22.99cm ± 18.67cm, non-linear fully connected layers 47.03cm ± 29.61cm, all layers non-linear 18.89cm ± 4.66cm). For comparison, the Bayesian decoder achieves a decoding accuracy of 23.25cm ± 2.79cm on this data.

      Thus it appears that the reviewer is correct - the advantage of the CNN model comes in part from the non-linearity of the convolutional layers. The corollary of this is that there are likely non-linear elements in the neural data that the CNN but not Bayes decoder can access. However, the CNN does also receive wider-band inputs and thus has the potential to utilize information beyond just detected spikes.

      In response to the reviewer's point and to the new analysis regarding the LFP models raised by reviewer 1, we have now reworded this sentence in the manuscript.

      Page 4: The high accuracy and efficiency of the model for these harder samples suggest that the CNN utilizes additional information from sub-threshold spikes and those that were not successfully clustered, as well as nonlinear information which is not available to the Bayesian decoder.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript presents new data and a model that extend our understanding of color vision. The data are measurements of activity in human primary visual cortex in response to modulations of activity in the L- and M-cone photoreceptors. The model describes the data with impressive parsimony. This elegant simplification of a complex data set reveals a useful organizing principle of color processing in the visual cortex, and it is an important step towards construction of a model that predicts activity in the visual cortex to more complex visual patterns.

      Strengths of the study include the innovative stimulus generation technique (which avoided technical artifacts that would have otherwise complicated data interpretation), the rigor of experimental design, the clear and even-handed data presentation, and the success of the QCM.

      The study could be improved by a more thorough vetting of the QCM and additional discussion on the biological substrate of the activation patterns.

      We thank the reviewer for the thoughtful summary of our work, for highlighting the strengths of our methodology and analysis, and for noting that our study will make a worthy contribution to understanding the organizing principles of visual cortex.

      Reviewer #2 (Public Review):

      The goal of this work is to advance knowledge of the neural bases of color perception. Color vision has been a model system for understanding how what we see arises from the coordinated action of neurons; detailed behavioral measurements revealed color vision's dependence upon three types of photoreceptors (trichromacy) and three second stage retinal circuits that compute sums and differences of the cone signals (color opponency). The processing of color at later, cortical stages has remained poorly understood however, and studies of human cortex have been hampered by methodologies that abandoned the detailed approach. Typical past work simply compared neural responses in two conditions, the presentation of colorful (formally, chromatic) vs grayscale (luminance) images. The present work returns to the older tradition that proved so successful.

      The project's specific goals were to measure functional MRI responses in human cortex to a large range of colors, and equally importantly, capture the pattern responses with a quantitative model that can be used to predict response to many additional colors with just a few parameters. The reported work achieved these goals, establishing both a comprehensive data set and a modeling framework that together will provide a strong basis for future investigations. I would not hesitate to query the data further or to use the QCM model the paper provides to characterize other data sets.

      The strengths of the work include its methodological rigor, which gives high confidence that the goals were achieved. Specifically:

      1) The visual presentation equipment was uniquely sophisticated, allowing it to correct for possible confounds due to differences in photoreceptor responses across the retina.

      2) The testing of the model was quite rigorous, aided by distinct replications of the experiment planned prior to data collection.

      3) The fMRI methods were also state of the art.

      The work was well-situated within the literature, comparing its findings to past results. The limitations and assumptions of the present work were also clearly stated, and conclusions were not overstated.

      Weaknesses of the current draft are, I believe, relatively minor:

      1) The data could be presented in a way to make them more comparable to prior fMRI work, e.g. by using percent change units in more places, comparing the R^2 of model fits reported here to those reported in other papers, and explaining and exploring how the spatially uniform stimuli, used here but not in other fMRI studies, limited responses in visual areas beyond V1.

      2) Comparison between the two models, the GLM and QCM is not quite complete.

      3) The present results are not discussed in context with past results using EEG, and Brouwer and Heeger's model of fMRI responses to color.

      4) Implications of the basic pattern of response for the cortical neurons producing the data are discussed less than they could be.

      We thank the reviewer for this clear summary of the paper, calling to attention our detailed approach to studying cortical color processing, and enthusiasm regarding the impact of our data and computational modeling.

      Reviewer #3 (Public Review):

      The authors describe a method for fitting a simple, separable function of contrast and cone excitation to a set of fMRI data generated from large, unstructured chromatic flicker stimuli that drive the L- and M- cone photoreceptors across a range of amplitudes and ratios. The function is of the form of a scaled ellipse – hereafter referred to as a 'Quadratic Color Model' (QCM). The QCM fits 6 parameters (ellipse orientation, ellipse elongation, and 4 parameters from a non-linear, saturating (Naka-Rushton) contrast response curve). The QCM fits the dataset well and the authors compare it (favorably) to a 40-parameter GLM that fits each separate combination of chromatic direction and contrast separately.
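
      The structure summarized in this paragraph (an elliptical norm followed by a saturating contrast response) can be sketched as follows. This is an illustrative reading only, not the authors' implementation: the parameterization of the ellipse (angle of the long axis, minor/major axis ratio) and the choice of amplitude, semi-saturation, exponent and offset as the four Naka-Rushton parameters are assumptions, as are all numeric values.

```python
import math

def equivalent_contrast(l, m, angle_deg, axis_ratio):
    """Map an (L, M) cone-contrast vector to a scalar via an elliptical norm.

    angle_deg is the direction of the long (least sensitive) axis of the
    iso-response ellipse; axis_ratio = minor/major, in (0, 1]. Both are
    assumed parameter names for this sketch.
    """
    a = math.radians(angle_deg)
    u = math.cos(a) * l + math.sin(a) * m    # component along the long axis
    v = -math.sin(a) * l + math.cos(a) * m   # component along the short axis
    return math.hypot(u, v / axis_ratio)

def naka_rushton(c, amplitude, semisat, exponent, offset):
    """Saturating response: amplitude * c^n / (c^n + semisat^n) + offset."""
    cn = c ** exponent
    return amplitude * cn / (cn + semisat ** exponent) + offset

def qcm_response(l, m, angle_deg, axis_ratio, amplitude, semisat, exponent, offset):
    """Six-parameter sketch: elliptical norm followed by Naka-Rushton."""
    c = equivalent_contrast(l, m, angle_deg, axis_ratio)
    return naka_rushton(c, amplitude, semisat, exponent, offset)

# Made-up parameters: long axis along L+M (45 degrees), strongly elongated.
params = dict(angle_deg=45.0, axis_ratio=0.2,
              amplitude=1.0, semisat=0.3, exponent=2.0, offset=0.0)

# Two stimuli with the same vector length in the L-M contrast plane.
resp_l_plus_m = qcm_response(0.05, 0.05, **params)
resp_l_minus_m = qcm_response(0.05, -0.05, **params)
print(f"L+M response: {resp_l_plus_m:.3f}")
print(f"L-M response: {resp_l_minus_m:.3f}")
```

      With these made-up values the L-M stimulus produces the larger response, mirroring the qualitative pattern the reviewer describes, in which the short axis of the ellipse marks the direction of greatest sensitivity.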

      The authors note three things that 'did not have to be true' (and which are therefore interesting):

      1) The data are well-fit by a separable ellipse+contrast transducer - consistent with the idea that the underlying neuronal computations that process these stimuli combine relatively independent L-M and L+M contrast.

      2) The short axis of the QCM tends to align with the L-M cone contrast direction (indicating that this direction is one of maximum sensitivity and the L+M direction (long axis) is least sensitive). This finding is qualitatively consistent with psychophysical measurements of chromatic sensitivity.

      3) Fit parameters do not change much across the cortical surface – and in particular they are relatively constant with respect to eccentricity.

      This is a technically solid paper – the data processing pipeline is meticulous, stimuli are tightly-calibrated (the ability to apply cone-isolating stimuli to fovea and periphery simultaneously is an impressive application of the 56-primary stimulus generator) and the authors have been careful to measure their stimuli before and after each experimental session. I have a few technical questions but I am completely satisfied that the authors are measuring what they think they are measuring.

      The analysis, similarly, is exemplary in many ways. Robust fitting procedures are used and model performance and generalizability are evaluated with leave-run-out and leave-session-out cross-validation procedures. Bootstrapped confidence intervals are generated for all fits and analysis code is available online.

      The paper is also useful: it summarises a lot of (similar) previous findings in the fMRI color literature going back to the late 90s and points out that they can, in general, be represented with far fewer parameters than conditions. My main concerns are:

      1) Underlying mechanisms: The QCM is a convenient parameterization of low spatial-frequency, high temporal-frequency L-M responses. It will be a useful tool for future color vision researchers but I do not feel that I am learning very much that is new about human color vision. The choice to fit an ellipse to these data must have been motivated at least in part by inspection. It works in this case (possibly because of the particular combination of spatial and temporal frequencies that are probed) but it is not clear that this is a generic parametric model of human color responses in V1. Even very early fMRI data from stimuli with non-zero spatial frequency (for example, Engel, Zhang and Wandell '97) show response envelopes that are ellipse-like but which might well also have additional 'orthogonal' lobes or other oddities at some temporal frequencies.

      2) Model comparison: The 40-parameter GLM model provides a 'best possible' linear fit and gives a sense of the noisiness of the data but it feels a little like a strawman. It is possible to reduce the dimensionality of the fit significantly with the QCM but was it ever really plausible that the visual system would generate separate, independent responses for each combination of color direction and contrast? I suspect that given the fact that the response data are not saturating, it would be possible to replace the Naka-Rushton part of the model with a simple power function, reducing the parameter space even further. It would be more interesting to use the data to compare actual models of color processing in retina/V1 and, potentially, beyond V1.

      3) Link to perception. As the authors note, there is a rich history of psychophysics in this domain. The stimuli they choose are also, I think, well suited to modelling in the sense that they are likely to drive a very limited class of chromatic cells in V1 (those with almost no spatial frequency tuning). It is a shame therefore that no corresponding psychophysical data are presented to link physiology to perception. The issue is particularly acute because the stimulus differs from those typically used in more recent psychophysical experiments: it flickers relatively quickly and it has no spatial structure. It may, however, be more similar to the types of stimuli used prior to the advent of color CRTs: Maxwellian view systems that presented a single spot of light.

      We thank the reviewer for their detailed comments on our paper and for highlighting our careful methodological approach and modeling of the data. We address the specific points.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper focuses on the role of historical evolutionary patterns that lead to genetic adaptation in cytokine production and immune mediated diseases including infectious, inflammatory, and autoimmune diseases. The overall goal of this research was to track the evolutionary trajectories of cytokine production capacity over time in a number of patients with different exposure to infectious organisms, infectious disease, autoimmune and inflammatory diseases using the 500 Functional Genomics cohort of the Human Functional Genomics Project. The identified cohort is made up of 534 individuals of Western European ancestry. Much of this focus is on the impact and limitations of certain datasets that they have chosen to use such as the "average genotyped dosage" to be substituted for missing variants and data interpretation.

      We fully agree with the reviewer. We replace missing variants in a sample with their average dosage across the entire dataset, so that missing variants do not bias the trends we observe over time. If we were to correct them using only samples from within their own era, we would inflate the differences between eras, whereas using only shared variants would increase the noise for older samples because of the higher error rates associated with DNA degradation.

      Moreover, some data pairings in the data set are not complete or had varying time points.

      The stimulation periods were chosen based on extensive studies showing that the time points used were best suited for assessing monocyte-derived and lymphocyte-derived cytokines for each stimulus. Not all stimuli induce the production of all cytokines, so cytokine-stimulus pairs were selected only where cytokine production could be measured (PMID: 1385767; PMID: 19380112; PMID: 27814509; PMID: 27814508; PMID: 27814507). The differences in cytokine availability and time points reflect the optimal time of production for each stimulus. Monocyte-derived cytokines (IL-1b, IL-6 and TNFa) are early response cytokines, produced by innate immune cells shortly after stimulation. IFNg, IL-17 and IL-22 are lymphocyte-derived cytokines, produced by adaptive immune cells, in this case T helper cells. These cells need to differentiate for several days before they start to produce these cytokines, which is why these cytokines were measured at 7 days. IFNg can also be produced by NK cells, so it was measured 48h after stimulation in whole blood samples. We have included these considerations in the new version of the text (lines 82 to 87).

      Similarly, a split was done to look at before and after the Neolithic era and the linear regressions correspond to those two eras. However, the authors do not comment or show the data to demonstrate why they chose that specific breakpoint as opposed to looking at every historical era transition, i.e., from early upper paleolithic to late upper paleolithic to Mesolithic to Neolithic to post-Neolithic to modern.

      We thank the reviewer for this remark and acknowledge that we did not sufficiently explain the rationale behind our choice to look at this split specifically. We hypothesized that the start of the Neolithic, with its increase in population density and contact with animals, would also be a turning point for many immune responses and immune-related traits. We added various analyses to better highlight this and to show differences between adjacent time periods.

      -The original figures showed only models using two separate linear regression lines, and the different thresholds for missing genotype rates showed consistent results. In the new figures we depict LOESS regression models to better show the difference in mean PRS at every point in time. We additionally show boxplots for the major age periods, pooling the Paleolithic and Mesolithic samples together as pre-Neolithic samples to account for the lower sample number in the earlier historical periods. To highlight this we have added a new section in lines 123 to 129 and new versions of Figures 1, 2, 3 and 4.

      -In the new Figure 2 we add LOESS regression models, which do not bias our analysis towards a break at a particular time period. We furthermore show boxplots with pairwise comparisons (Student's t-test) for broader time periods, highlighting the changes in PRS that would correspond with major changes in human lifestyle such as the shift from a hunter-gatherer to a Neolithic lifestyle or the rapid urbanization of human society.

      -In the new Figure 3 we confirm that the various traits showing a clear change in PRS start at the advent of the Neolithic or post-Neolithic era, using both the LOESS regression and pairwise comparisons (Student's t-test).

      -Similarly the heatmap in our original figure 4 has also been revised to only show the large sample set.

      Lastly, the authors should highlight additional limitations of this current study in terms of the generalizability to other populations or to clearly state that this is limited to the European population at the specified latitude and longitudes used.

      We thank the reviewer for his feedback and agree we should put more emphasis on this. In our study we focus on summary statistics obtained from European populations and only employ European aDNA samples, so our results should not be extrapolated to other populations from other geographical areas. We have included this in the Discussion of the new version of the manuscript (lines 289 to 292). However, our findings are mostly in agreement with previous studies in other populations, which adds robustness to the results of our study.

      Reviewer #2 (Public Review):

      In "Evolution of cytokine production capacity in ancient and modern European populations", Dominguez-Andrés et al. collect a large amount of trait association data from various studies on immune-mediated disorders and cytokine production, and use this data to create polygenic scores in ancient genomes. They then use the scores to attempt to test whether the Neolithic transition was characterized by strong changes in the adaptive response to pathogens. The impact of pathogens in human prehistory and the evolutionary response to them is an intriguing line of inquiry that is now beginning to be approachable with the rapidly increasing availability of ancient genomes.

      While the study shows a commendable collection of association data, great expertise in immune biology and an interesting study question, the manuscript suffers from severe statistical issues, which makes me doubt the validity and robustness of their conclusions. I list my concerns below, in rough order of how important I believe they are to the claims of the paper:

      —In addition to the magnitude of an effect away from the null, P-values are a function of the amount of data one has to fit a model or test a hypothesis. In this case, the authors have vastly more data after the Neolithic Revolution than before, and so have much higher power to reject the null hypothesis of "no relationship to time" after the revolution than before. One can see this in the plots the authors provided, which show vastly more data after the Neolithic, and consequently a greater ability to fit a significant linear model (in any direction) afterwards as well.

      We thank the reviewer for raising this very important point. In order to account for this difference in sample size for the different historical periods we pooled all samples prior to the neolithic era together to test for differences in mean PRS between neighbouring historical periods. This way we lose some strength in terms of the carbon-dated age of each sample but we gain the ability to compare more different pairings than just pre- and post-neolithic samples. We added various analyses to better highlight this and also show differences between different adjacent time periods:

      -The original figures showed only models using two separate linear regression lines, and the different thresholds for missing genotype rates showed consistent results. In the new figures we depict LOESS regression models to better show the difference in mean PRS at every point in time. We additionally show boxplots for the major age periods, pooling the Paleolithic and Mesolithic samples together as pre-Neolithic samples to account for the lower sample number in the earlier historical periods. To highlight this we have added a new section in lines 123 to 129 and new versions of Figures 1, 2, 3 and 4.

      -In the new Figure 2 we add LOESS regression models, which do not bias our analysis towards a break at a particular time period. We furthermore show boxplots with pairwise comparisons (Student's t-test) for broader time periods, highlighting the changes in PRS that would correspond with major changes in human lifestyle such as the shift from a hunter-gatherer to a Neolithic lifestyle or the rapid urbanization of human society.

      -In the new Figure 3 we confirm that the various traits showing a clear change in PRS start at the advent of the Neolithic or post-Neolithic era, using both the LOESS regression and pairwise comparisons (Student's t-test).

      -Similarly the heatmap in our original figure 4 has also been revised to only show the large sample set.

      —The authors argue that Figure S2 makes their results robust to sample size differences, but showing a consistency in direction before and after downsampling in the post-neolithic samples is not enough, because:

      1) you still lack power to detect changes in direction before the Neolithic.

      2) even for the post-Neolithic, the relationship may be in the same direction but no longer significant after downsampling. How much the significance of the linear model fit is affected by the downsampling is not shown.

      We thank the reviewer for pointing this out. The low sample count dating back to before the Neolithic era makes it indeed hard to accurately detect changes in PRS significantly correlated with time. Instead, we now aim to pool these samples together and compare the distribution of their PRS with those of Neolithic samples to better be able to detect significant differences in PRS between these historical time periods.

      To also show the significance of each linear model, we now report the -log10 of the P value multiplied by the sign of the correlation coefficient. This way we can better highlight the consistency in direction as well as in significance, and show that downsampling affects the order of magnitude of the significance. Please see the new Figure 4-figure supplement 1. We have also discussed this in more depth on lines 267-272 of the new version of the text.
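
      The signed significance measure described above can be sketched as follows. This is an illustrative helper only; the function name and the input validation are assumptions, not taken from the authors' analysis code.

```python
import math


def signed_neg_log10_p(p_value, correlation):
    """Return -log10(p) carrying the sign of the correlation coefficient.

    Large positive values indicate a significant increase over time, large
    negative values a significant decrease, and values near zero indicate
    no detectable trend.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    # copysign keeps the magnitude of -log10(p) and takes the sign of r.
    return math.copysign(-math.log10(p_value), correlation)
```

      Plotting this quantity for the full and the downsampled data side by side shows in a single value whether the direction is preserved and how much significance is lost.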

      —The authors chose to test "relationship between PRS with time" before and after the Neolithic as a way to demonstrate that "the advent of the Neolithic was a turning point for immune-mediated traits in Europeans". A more appropriate way to test this would be creating a model that incorporates both sets of scores together, accounts for both sample size and genetic drift in the change of polygenic scores, and shows a significant shift occurs particularly in the Neolithic, rather in any other time period, instead of choosing the Neolithic as an "a priori" partition of the data. My guess is that one could have partitioned the data into pre- and post-Mesolithic and gotten similar results, largely due to imbalances in data availability.

      We agree with the reviewer that the exact pairing of the groups might influence the conclusions, which shows the importance of remaining unbiased in our a priori partitioning of the data, as the reviewer accurately pointed out. We aim to account for sample imbalances by pooling the Paleolithic and Mesolithic samples together. Instead of testing only pre- versus post-Neolithic samples, we perform a pairwise comparison between neighbouring historical periods using a t-test, thereby taking into account the sample size of each group.

      —The authors only talk about partitions before and after the Neolithic, but plots are colored by multiple other periods. Why is the pre- and post-Neolithic the only transition that is mentioned?

      Our initial hypothesis was that the pre- versus post-Neolithic shift was a turning point for immune responses. However, based on the suggestions of the reviewers, we have decided to perform the analysis in a more unbiased way, so we show the comparison of the different individual eras. The new analyses and the new Figures provided address these issues.

      —Extrapolating polygenic scores to the distant past is especially problematic given recent findings about the poor portability of scores across populations (Martin et al. 2017, 2019) and the sensitivity of tests of polygenic adaptation to the choice of GWAS reference used to derive effect size estimates (Berg et al. 2019, Sohail et al. 2019). In addition to being more heavily under-represented, paleolithic hunter-gatherers are the most differentiated populations in the time series relative to the GWAS reference data, and so presumably they are also the genomes for which PGS estimates built using such a reference would have higher error (see, e.g. Rosenberg et al. 2019). Some analyses showing how believable these scores are is warranted (perhaps by comparing to phenotypes in distant present-day populations with equivalent amounts of differentiation to the GWAS panel).

      A similar study of standing height in ancient populations (PMID: 31594846) validated this approach by comparing polygenic scores based on modern populations with skeletal remains from ancient individuals. We do acknowledge that the absolute polygenic scores are less accurate for aDNA samples than for a modern European cohort. The effect size estimates gained using a modern cohort are less accurate for aDNA samples than for unrelated modern samples, and this is certainly an unavoidable limitation of the study. This is the reason why we focus on the direction of change of the trends and not on the absolute polygenic scores, since such subtle differences do not affect the conclusions of our study.

      —In multiple parts of the paper, the authors mention "adaptation" as equivalent to the patterns they claim to have found, but alternative hypotheses like genetic drift are not tested (see e.g. Guo et al. 2018 for a review of methods that could be used for this).

      We thank the reviewer for this feedback. Based on this, we have added an Fst-based test for selection to determine whether the changes we see in PRS over time are due to selection or to genetic drift. This test shows that changes from the pre-Neolithic to the Neolithic are not significantly different from drift, whereas after the onset of the Neolithic we do see a significant amount of selection. We have explained this further in the manuscript on lines 130-135 and included the new Table S2.

      New Table S2: Tests for selection as opposed to genetic drift were performed between populations from adjacent time periods. A two-tailed test was used to determine whether the mean trait Fst between pre-Neolithic and Neolithic, Neolithic and post-Neolithic, and post-Neolithic and modern samples differed significantly from 10,000 random LD- and MAF-matched mean Fst values calculated using the same number of SNPs.
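
      A comparison of this kind, one observed mean Fst against a set of matched random mean Fst values, can be expressed as an empirical two-tailed P value. The sketch below is one common way to compute it and is not necessarily the authors' procedure: centering on the mean of the null values and adding one to numerator and denominator are assumptions made here, and the function name is hypothetical.

```python
def empirical_two_tailed_p(observed, null_values):
    """Empirical two-tailed P value for `observed` against a null sample.

    null_values would hold, for example, the mean Fst of each random
    LD- and MAF-matched SNP set. A null value counts as "at least as
    extreme" when it lies at least as far from the null mean as the
    observed value does, in either direction.
    """
    n = len(null_values)
    if n == 0:
        raise ValueError("null_values must not be empty")
    null_mean = sum(null_values) / n
    observed_deviation = abs(observed - null_mean)
    extreme = sum(
        1 for value in null_values
        if abs(value - null_mean) >= observed_deviation
    )
    # Assumed +1 correction so the estimate is never exactly zero.
    return (extreme + 1) / (n + 1)
```

      With 10,000 random sets, the smallest P value this estimator can return is 1/10,001, which bounds the resolution of the test.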

      —250 kb window is too short a physical distance for ensuring associated loci that are included in the score are not in LD, and much shorter than standard approaches for building polygenic scores in a population genomic context (e.g. see Berg et al. 2019, Berisa et al. 2016). Is this a robust correction for LD?

      We thank the reviewer for this remark. We tested multiple thresholds for window size, increasing the window from 250 kb to 500 kb and 1000 kb (please see the new Figure 1-figure supplement 2 below). Although the level of significance changes for a few traits, the direction of the change remains stable across the three thresholds, demonstrating the robustness of our results. We chose this approach because the aDNA samples have too high an error rate and too much missing data to determine LD accurately, and determining LD using a modern reference cohort would bias our analysis by assuming that the aDNA samples have an LD structure similar to that of modern samples.

      New Figure 1-figure supplement 2: PRS correlation pre- and post-Neolithic revolution using polygenic scores calculated at varying window sizes.

      We have edited the manuscript accordingly to show the consistency between these varying window sizes on lines 111-113.
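
      For readers unfamiliar with distance-based pruning, the window thresholds discussed above can be illustrated with a greedy sketch. The response does not spell out the exact selection rule, so the scheme below (keep the most significant variant, then drop any other variant on the same chromosome within the window) is an assumption, as are the function name and the tuple layout.

```python
def prune_by_window(snps, window_bp):
    """Greedy distance-based pruning of associated variants.

    snps is a list of (chromosome, position_bp, p_value) tuples. Variants
    are visited from most to least significant; a variant is kept only if
    no already-kept variant on the same chromosome lies within window_bp.
    No LD estimates are needed, only physical positions.
    """
    kept = []
    for chrom, pos, p_value in sorted(snps, key=lambda snp: snp[2]):
        clashes = any(
            kept_chrom == chrom and abs(pos - kept_pos) <= window_bp
            for kept_chrom, kept_pos, _ in kept
        )
        if not clashes:
            kept.append((chrom, pos, p_value))
    # Return in genomic order for readability.
    return sorted(kept)


# Hypothetical example variants, not taken from the study.
example_snps = [
    ("1", 100_000, 1e-8),
    ("1", 300_000, 1e-6),
    ("1", 700_000, 1e-7),
    ("2", 150_000, 1e-5),
]
kept_250kb = prune_by_window(example_snps, 250_000)
kept_1000kb = prune_by_window(example_snps, 1_000_000)
```

      Wider windows retain fewer variants per score, which is one reason the level of significance can shift between thresholds even when the direction of change does not.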

      —If one substitutes dosage with the average genotyped dosage for a variant from the entire dataset, then one is biasing towards the partitions of the dataset that are over-represented, in this case, post-Neolithic samples.

      We fully agree with the reviewer; however, substituting missing dosages with average dosages prevents the bias that varying amounts of missing SNPs in the older samples would otherwise introduce into our models. Although our average scores on an absolute level are largely influenced by the more abundant post-Neolithic samples, this reduces the odds of wrongly observing significant trends caused by the sparsity of the data. While the absolute scores might be biased towards a certain value, the differences, and thus the direction of the change in PRS, are driven by the non-missing variants in each sample.
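
      The substitution described above can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: the data layout (one list of dosages per sample, with None marking a missing genotype), the function names, and the fallback of 0.0 for a variant missing in every sample are assumptions.

```python
def impute_mean_dosage(dosages):
    """Replace missing dosages with the dataset-wide mean for that variant.

    dosages is a list of per-sample lists; None marks a missing genotype.
    The mean is taken over all samples with an observed value, regardless
    of era.
    """
    n_variants = len(dosages[0])
    means = []
    for j in range(n_variants):
        observed = [row[j] for row in dosages if row[j] is not None]
        # Assumed fallback: a variant missing everywhere contributes 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if row[j] is None else row[j] for j in range(n_variants)]
        for row in dosages
    ]


def polygenic_score(dosage_row, effect_sizes):
    """Weighted sum of dosages, one effect size per variant."""
    return sum(d * beta for d, beta in zip(dosage_row, effect_sizes))
```

      Because an imputed variant contributes the same value to every sample that lacks it, differences between samples come only from their observed genotypes, although the dataset-wide mean itself is dominated by the better-represented eras, as the reviewer notes.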

      —It seems from Figure 2, that some scores are indeed very sensitive to the choice of P-value cutoff (e.g., Malaria, Tuberculosis) and to the amount of missing data (e.g. HIV). This should be highlighted in the main text.

      The reviewer is right, and this is largely due to the smaller number of SNPs that are included in the model at stricter P-value cutoffs, which is in part a limitation of the available GWAS summary statistics. Using fewer SNPs in our PRS calculations reduces the variability between samples, which weakens our ability to accurately model changes in these specific complex traits and to detect statistical significance. We have highlighted this in the main text on lines 193-196.

      —Some of the score distributions look a bit strange, like the Tuberculosis ones in Figure 2, which appear concentrated into particular values. Could this be because some of the scores are made with very few component SNPs?

      We thank the reviewer for pointing this out and this is indeed correct. At stricter thresholds fewer significant QTLs will be included in the polygenic score model. We chose to still show these plots to point out those results might more easily differ if more variants could be included. At more lenient thresholds more variants can be included increasing the power of the model but the score might be less informative for the trait that way.

    1. Author Response:

      Reviewer #1 (Public Review):

      The primary strength of this paper is the attempt to characterize the neurons injected by Toxoplasma and the electrophysiological changes that ensue. Three major problems are however noted.

      1) Figure 1 attempts to identify regions of the brain more profoundly impacted by Toxoplasma and does so by normalizing the numbers of injected neurons to the size of the region. But since the reporter system used requires the parasite-injected protein to interact with a neuron's nucleus, the authors' claims can only be valid after normalizing not to size but to the density of nuclei in a region. This is especially important in the cortex, where different layers have distinct architectures.

      We appreciate the Reviewer’s pointing out that neuron density may play a role in where we find TINs. For example, the limited number of neuron nuclei in/near white matter tracts is almost certainly what accounts for the “under enrichment” of TINs in white matter tracts (in a sense it is a nice built-in control that our system is working). We have noted this explanation in the results.

      We also agree that neuron density could explain why TINs somas are enriched in deeper cortical layers and not in more superficial cortical layers. We will incorporate such considerations into future studies/analyses but feel that such re-analysis is beyond the scope of this paper for several reasons. First, prior T. gondii studies that have quantified cyst locations have used region size for normalization (Berenreiterova et al 2011; Boillat et al 2020). Thus, by using size ourselves, we could be consistent with these papers and draw connections to them. Second, for most of the major brain regions, relative size correlates with the number of TINs found in those regions, suggesting that size may adequately explain why a certain percentage of TINs is found in those regions (even without accounting for nuclei density). From a statistical standpoint, for both type II and type III infection, size does not correlate with the number of TINs for only two regions beyond the white matter tracts: the cortex and cerebellum. Given that the cerebellum is a relatively large brain region and has a high density of neurons in the granular layer— the lack of TINs here suggests that some unknown factor (such as difference in vascular permeability (Daniels et al 2017)) rather than area size or neuron density accounts for the cerebellum’s relative resistance to infection. Third, unlike brain region size where the Allen Institute Mouse Brain Atlas is accepted as a standard, there is no accepted standard for neuron density, in part because counts vary widely (reviewed in Keller et al 2018, Tables 1-4). Given the lack of standard counts, the most appropriate way to normalize would be to do our own counts of host nuclei in each region and ideally in both uninfected and infected mice. 
      Counting host nuclei in infected tissue becomes complicated because of the infiltrating immune cells and dividing glia, an issue that cannot be solved by counting only neurons because of a loss of antigenicity for multiple neuron markers in inflamed brain tissue (Cekanaviciute et al 2014; David et al 2016). Pursuing such counts in uninfected tissue alone leads to a technical issue for us: when trying to count large numbers of cells using our program (Mendez, Potter et al 2018), our processing system cannot handle the workload (i.e., the program crashes). Thus, at this time, we feel that normalizing by size is an appropriate first step, and we will use the recommended normalization in subsequent work, which will pursue defining the mechanisms that lead to enrichment of TINs in the cortex and a lack of TINs in the cerebellum.

      2) The authors claim that inhibitory neurons are significantly less injected than excitatory ones. But how do they know that the inhibitory ones don't just die more quickly?

      We have added this possibility to our discussion.

      3) All of the electrophysiological changes that are reported to happen in the injected neurons can be most easily explained by the fact that they are unhealthy due to the injection. This does not mean that the data are insignificant since increased neuronal damage/death in injected neurons is a critical finding.

      We agree that the electrophysiology findings in TINs may be due to neuronal injury but, at this time, we cannot prove this assumption. We have amended our discussion to state more clearly that we do not know whether neuron health (or disease) is driving the TINs physiology or whether the TINs physiology ultimately results in neuron death.

      Reviewer #2 (Public Review):

      The location and longevity of Toxoplasma infection in neurons accompanied by continuous immune infiltration to the brain provides many specific questions about the long term implications of this common infection as well as a broadly applicable model for neurotropic infections. Here Mendez and colleagues continue the use of a reporter system to reveal neurons that have been injected with parasite proteins (TINs) to determine the anatomical and cellular localization of parasite-neuron interactions and the electrophysiological properties of these neurons. This is a technically impressive piece of work first using the Allen Brain Atlas to map the location of TINs that may be a new gold standard for this type of work. Secondly, for the first time there is a record of functional data from parasite manipulated neurons suggesting that these cells ultimately die due to infection. The full consequences of this data do not seem to be fully addressed and certain limitations of the data are worthy of discussion.

      1) Although acknowledged on the first page of the introduction, there is a distinction between TINs and infected neurons. This important distinction is not continued later in the paper. Indeed, one conclusion is a change in neurotropic dogma regarding the long-lived nature of neurons being an attractive location for infections. The combination of methods used in this study allows major conclusions to be made regarding Toxoplasma injected neurons and as a result is an exciting body of work however it cannot distinguish what is happening in infected/cyst containing neurons.

      The Reviewer is absolutely correct. With the current Cre system, in vivo, we cannot distinguish between aborted invasion and invasion followed by intracellular killing of the parasite. In addition, as many cysts are in distal neuronal processes (Cabral, Tuladhar et al 2016), we often cannot determine if a TIN is infected unless we do a full neuron reconstruction, which requires thick sections rather than the 40 micron sections we used for the immunohistochemical studies. For this reason, we do not make distinctions about infection status or how an uninfected but injected TIN arose. We clarified this issue in the text (i.e., TINs refers to both infected and uninfected, injected neurons) and included a new figure that more clearly explains this concept. In addition, we have noted that our prior work suggests that >90% of these TINs are not infected, which is consistent with our findings in thick sections where we have done whole neuron reconstructions (unpublished data).

      2) The electrophysiology study is well controlled by data from bystander neurons in the same tissue. These bystander neurons show a significant (p<0.001) increase in resting membrane potential. This striking significance seems underplayed for the rest of the study somewhat overshadowed by the extreme read outs on TINs. It would be interesting to hear what this mild depolarization functionally means for these bystander neurons. The data may suggest there is greater variation of membrane potential between neurons from uninfected mice and bystander neurons. The significance is lost later in the paper and the conclusions are summarized with bystander neurons being 'akin' to neurons from uninfected mice which seems not accurate.

      As noted by the Reviewer, we have focused on the TINs as opposed to the bystander neurons. In part, we have focused on the TINs because our lab is interested in how neuron-Toxoplasma interactions govern parasite persistence. We also did not focus on the bystanders because, compared to MSNs in uninfected mice, the difference in resting membrane potential was the only statistically significant difference we found in the bystander MSNs (which is also why we referred to them as akin to MSNs in uninfected mice; we have changed this word to “similar”). As you correctly point out, it is an interesting difference (and, as far as we know, the first time such studies have been done). We have amended our discussion to highlight that while the bystander physiology is relatively normal compared to TINs physiology, it is still abnormal. In addition, we have included a new figure (Supp Fig 6) which shows an expected consequence of the depolarized nature of bystanders: they require fewer steps of input current to trigger the first action potential. In the discussion, we have noted that such changes in other neurons (e.g., cortical neurons) might lead to an increase in seizures, which is seen in patients with symptomatic toxoplasmosis (i.e., congenital and recrudescent infections), or even to the various behavioral changes observed in rodents. We have expanded our discussion to include such possibilities but have stressed that these findings are speculations that require further studies to confirm or negate.

      3) Two pieces of data support the concept that neurons that have been infected with parasite proteins die - firstly that readings from TINs are highly depolarized and secondly there is a loss of neurons by 8 weeks post infection. This is a significant piece of new information that is not stated in the abstract. Some hesitation in making this conclusion may be from the difficulty in obtaining electrophysiology data from these cells. Another way to support this conclusion would be helpful for this shift in our thinking of the effects of Toxoplasma infection on the brain.

      We have updated the abstract to reflect that TINs die. We are actively working on other ways to support this conclusion but believe these data (determining how TINs die- so we can block TINs death, etc) are beyond the scope of this paper.

      4) The impressive data investigating the anatomical location and the cellular specificity of TINs is further strengthened by the use of two types of parasites, a Type II and a Type III. The properties of these different 'strains' that may lead to alterations in neurons are not fully explored, and no conclusions about similarities or differences are drawn.

      We agree with the Reviewer that one of the strengths of this paper is the use of genetically distinct strain types, as it allows us to determine which findings are likely universal vs. strain-specific. We have not focused on the differences or similarities because we do not have any data to support what factors might influence these differences and similarities. As such, we prefer to save such commentary for another paper in which we can more clearly identify these factors (or even use multiple strains from several haplotypes so that we can more definitively state whether these findings are strain type-specific or simply different between these two strains).

    1. Author Response:

      Evaluation Summary:

      This manuscript is of interest to scientists within the fields of actin cytoskeleton, cellular neurobiology and neurodevelopment. It explores how actin regulators are coordinated to trigger the formation of branches in the neuronal dendritic arbor. Experiments are very well performed. Conclusions of the manuscript are convincingly supported by the results, although the strict dependence of Cobl and Cobl-like in dendritic branch formation should perhaps be confirmed with additional experiments or toned down. Results concerning the spatiotemporal relationship between the molecular players involved are more preliminary, and a few findings already published by the same group in previous articles should be expunged from this manuscript.

      We thank the reviewers for the positive assessment of the quality and impact of our work.

      The functional dependence of Cobl on Cobl-like and vice versa of Cobl-like on Cobl is explained in our comment to the general points raised by the reviewers and also is more clearly described and discussed in the revised manuscript.

      We acknowledge that analysis of the spatiotemporal relationship of molecular players involved in dendritic branch induction is only in its infancy, as at the current stage of research not even all important players of this process are known and this type of analysis is technically challenging to do in a quantitative manner in neurons.

      The revised manuscript does not only contain representative 3D-live imaging data but now also clearly demonstrates by quantitative evaluations of peak signal intensities that all four components studied (Cobl, Cobl-like, syndapin I and CaM) indeed show accumulations at branch induction sites prior to branch initiation (revised Figure 5C). These data are quite well in line with the relative accumulation data collected for two of the components at the 30 s time point prior to protrusion initiation for Cobl (Hou et al., 2015 PLoS Biol.) and for Cobl-like (Izadi et al., 2018 J. Cell Biol.).

      Furthermore, the revised manuscript now contains a preliminary assessment of the average peak times of all four components studied here prior to dendritic branch induction (revised Figure 5D). These data highlight that the components show not only spatial but also temporal overlap at branch initiation sites, as can be expected from our finding that Cobl-like and Cobl can be interconnected by Cobl-like’s novel interaction partner syndapin I in a CaM-regulated mechanism converging on one particular motif of the three syndapin I binding motifs we identified in Cobl-like.

      Finally, the criticized side-by-side, software-based, detailed evaluation of Cobl and Cobl-like loss-of-function phenotypes during early dendritic arborization, has been moved to the Supplemental Material (Figure 1-Figure Supplement 1) in the revised manuscript, as one half of the data set of course indeed merely is a reproduction of the Cobl-like phenotype identified by the same method before (Izadi et al., 2018).

      However, the reviewers will acknowledge and readers will immediately understand that, without this comparison revealing the high degree of phenotypical copy, we would not have followed up and discovered the coordinated action of the two components powering actin filament formation during dendritic branch initiation we report here.

      Reviewer #1 (Public Review):

      This work investigates at the molecular and cellular levels the functional dependence of two actin filament nucleation factors, Cobl and Cobl-like proteins, in the formation of protrusive dendritic structures. Depletion of Cobl or Cobl-like leads to roughly similar phenotypes; overexpression of Cobl or Cobl-like induces excessive dendrite formation when the other protein is expressed at normal levels, but not when this other protein is depleted. Altogether, these observations lead the authors to conclude that these proteins work strictly interdependently. The authors then investigate how Cobl and Cobl-like are recruited, and identify syndapin as an essential component to bring Cobl and Cobl-like together at the membrane. This interaction is beautifully documented through a large number of pulldown experiments in vitro, and critical domains for these interactions are identified. These interactions are also confirmed in physiological conditions through ectopic localization experiments of those components to mitochondria. Syndapin I is identified as clusters at dendritic initiation sites by electron microscopy and all three components colocalize at the same nascent dendritic branch sites. In the last part of the manuscript, the authors further document the interaction between Cobl-like and syndapin, and find that calcium-dependent calmodulin binding to Cobl-like increases syndapin I's association through the first of the three KRAP domains.

      Comments to be addressed in a revised manuscript:

      1) Some results appear inconsistent between different Figures. For example, in Figure 1D, Cobl RNAi shifts numbers of dendritic branch points from 10 to 6, while in Figure 2E, Cobl RNAi leaves numbers of dendritic branch points pretty much unchanged (around 7 or 8). Could the authors make sure that all data are consistent between Figures or explain apparent inconsistencies?

      We thank the reviewer for his/her careful evaluation of our data. The discrepancy noticed, however, is only an apparent inconsistency, as the experimental set-ups and purposes were different.

      It is correct that both the absolute numbers of dendritic branches, terminal points and dendritic length and the relative effects of Cobl and Cobl-like RNAi differ, in particular between former Figures 1 and 2. Roughly, one can say that the RNAi effects in former Figures 1, 9 and 10 are about twice as strong as in former Figure 2. There are two simple reasons for this, which we unfortunately failed to communicate properly in our original manuscript:

      i) time frame of phenotype development and suppression, respectively and

      ii) expression of only one versus two plasmids in the different types of experiments

      Concerning i): The former Figure 2 and actually also the former Figure 4 (suppression by syndapin I RNAi) both are suppressions of gain-of-function phenotypes, whereas the former Figures 1, 9 and 10 are loss-of-function experiments. Because the gain-of-function effects represent strong and fast inductions of dendritic arborization it suffices to do a short transfection (max. 34 h) and then evaluate. Loss-of-function effects in conditions using the RNAi tools alone are not very strong at such short times when compared to control (all three analyzed parameters are -15-25% for the stronger Cobl-like RNAi and 0 to -10% for the weaker Cobl RNAi at this short time; former Figure 1, please see Figure 1-Figure Supplement 2 of the revised manuscript).

      The loss-of-function experiments (former Figure 1, 9, 10) are different. The times need to be longer, as the phenotype is the normal growth of the dendritic arbor in controls vs. the putative suppression of this developmental process upon RNAi. Thus, transfections in these experiments usually need to be substantially longer (37-46 h) to show loss-of-function phenotypes compared to control, which then also may be more obvious (Cobl-like RNAi, -30 to -40 %; Cobl RNAi, -33% (*), -20% () and -10% (n.s.); former Figure 1D-F, now Figure 1-Figure Supplement 2).

      Concerning ii): Suppression of gain-of-function experiments requires the coexpression of two plasmids (one for the induction of the gain-of-function phenotype and the second for the RNAi (including reporter expression)), whereas we are able to drive loss-of-function/rescue experiments from only one plasmid driving both RNAi and the expression of a reporter or rescue mutant. Such transfections with two plasmids usually lead to gain-of-function but also suppression effects that are weaker than the effects of either overexpressing or knocking down proteins alone.

      The revised manuscript now provides information on the different time frames of transfection (see improved and expanded Figure legends and Material and Method section) and also briefly touches on the coexpression issue leading to different numbers in the different types of experiments.

      2) I find the experiments of Figures 1 and 2 insufficient to conclude that Cobl and Cobl-like factors depend strictly on each other. One could imagine many scenarios where the effects of Cobl or Cobl-like are highly concentration dependent, and lead to detectable effects in cells above or below certain thresholds (especially for multi-domain binding proteins such as Cobl and Cobl-like, which are likely to undergo complex phase transition behaviors when clustering at the membrane). Therefore I would recommend that the authors be very careful with the wording and conclusions of their experiments, and stick to what can strictly be concluded.

      We share the reviewer’s concerns that suppression experiments are sometimes difficult to interpret if the experiments are not designed in a careful manner and/or show a complex outcome. This is not the case in our experiments, however (see details below), and we hope the reviewer will be content with the revised version of our manuscript.

      The following outcome would be of strong concern: RNAi alone has a significant effect compared to control, and the results of the suppression attempt and of the RNAi-only condition run for comparison are not equal, with the effects of RNAi alone being the stronger ones. With such an outcome, one should rather abstain from any interpretation and try to adapt the experimental design to reach a clear conclusion. The reason is that, in this particular case, two processes (one positive, the other one negative) could simply operate in parallel, may not necessarily have anything to do with each other directly and may potentially be affected by unspecifiable dose effects as well. The experiment would thus not be informative.

      In our experiments, the situation is different and the revised manuscript now contains an elucidation of the considerations required for a correct interpretation for the two vice versa suppression experiments we conducted and reported in the former Figure 2 (Figure 1C-P in the revised manuscript).

      In general, the reviewers will acknowledge that when component A is able to elicit a certain cell biological effect and this does not happen when component B is not present, then component A’s functions depend on B. This is a very classical experimental design and conclusion. The same can also be done with inhibitors; then A’s functions depend on B’s activity. However, it is absolutely crucial that the individual effects of the manipulations as well as the baseline control values are considered in the interpretation, too. If the suppression of the overexpression effect is larger than any putative RNAi effects compared to control, or there is no such RNAi effect, the experiment and its interpretation are very straightforward.

      In our study, this is the case for Cobl RNAi in the suppression of Cobl-like functions (Figure 1C-I in the revised manuscript): We observed complete suppression of Cobl-like’s effects with Cobl RNAi. Yet, the effects of GFP+Cobl RNAi expression are not distinguishable from control and the result thus is straightforward to interpret. We actually designed the experiment in a way that the effects of the individual RNAi conditions remained negligible to reach this straightforward interpretation scenario.

      The same applies to the suppression of Cobl-like effects by syndapin I RNAi (Figure 3 in the revised manuscript). Under the conditions shown, syndapin I RNAi would not cause any phenotypes, yet, it completely suppressed the strong Cobl-like-mediated effects on all four parameters of dendritic arborization determined (former Figure 4; now Figure 3 in the revised manuscript).

      For the suppression of the Cobl gain-of-function phenotypes by Cobl-like RNAi (Figure 1J-P in the revised manuscript) the situation is a bit less obvious and we understand the concern of the reviewer that this may need a more detailed look. Here, in all three parameters shown, GFP+Cobl-like RNAi causes a relatively mild but significant phenotype when compared to GFP+Scrambled control. However, the reviewer will acknowledge that the RNAi effects deviating negatively from the GFP+Scrambled control are much smaller than the suppression of the Cobl-mediated effects on dendritic arborization, which are twice as high (branch points; total dendritic length) and three times as high (terminal branches), respectively. Thus, also here, we clearly observe a suppression of specifically Cobl functions and can exclude additive actions in opposite directions. Importantly, this conclusion is formally further underscored by the fact that in all three phenotypical analyses GFP-Cobl+Cobl-like RNAi and GFP+Cobl-like RNAi are not statistically different from one another but equal (Figure 1J-P in the revised manuscript). This makes the interpretation of the results of this suppression experiment straightforward as well.

      Other mentions such as (line 328) "their functions were cooperative", should also be avoided without any further explanations; Mentions such as (line 101) "Functional redundancy seemed unlikely, because both individual loss-of-function phenotypes were severe." should be explained so that readers can assess whether functional redundancy is indeed unlikely or not (for example by referencing a paper describing mild versus severe phenotypes).

      As already written in the Essential Revision list above, we apologize for the overly condensed argumentation in the original manuscript. This paragraph has been changed in the revised manuscript and now explains better why parallel action of Cobl and Cobl-like appeared unlikely and why we thus addressed the alternative hypothesis.

      3) One missing experiment in this story is whether this important effect of Ca2+/CaM signaling promoting syndapin I's association with the first of the three "KRAP" motifs is key to account for Cobl-like's clustering at the plasma membrane. Could the authors measure the effect of calcium for Cobl-like (KRAP1 deleted) clustering at the plasma membrane (as compared to wild-type Cobl-like)?

      We thank the reviewer for his/her suggestion of experiments suitable to significantly strengthen the manuscript.

      In order to address a putative impact of the first, Ca2+/CaM-regulated KRAP motif on the membrane recruitment of Cobl-like, we knocked down endogenous Cobl-like and then quantified the membrane association of reexpressed, RNAi-insensitive Cobl-like lacking KRAP1 at the plasma membrane of neurons in comparison to wild-type Cobl-like. Although KRAP1 is only one out of three identified syndapin I binding sites, we observed that deletion of merely this one site had a profound, statistically significant (p<0.0001; **) impact on Cobl-like’s membrane localization in developing hippocampal neurons. These data obtained in our revision work are reported as Figure 9G-I in the revised manuscript.

      In brief, this type of experimentation was done as part of our revision efforts during the last weeks. It demonstrated a remarkably strong impact of deletion of KRAP1 on Cobl-like’s membrane localization in developing hippocampal neurons and is now reported in the newly added revised Figure 9G-I.

      4) I regret sometimes the lack of quantification for some experiments. For example, protein colocalization in cells should be quantified (for example by calculating Pearson's correlation coefficients of red and green signals at mitochondrial sites) because colocalization (or absence of) is not always obvious for non-expert eyes.

      It may have been overlooked that calculating Pearson's correlation coefficients is not useful in our case, as we are not addressing a correlation of the occurrence of individual signals of one type with another type but are addressing coaccumulations of components under a given condition versus a more diffuse localization under other conditions.

      The original manuscript highlighted such coaccumulations by false-color heat map representations and marking sites of interest in two of our main figures.

      In order to also comply with the reviewer’s request concerning the other figures (the in vivo protein complex reconstitutions at mitochondrial membrane surfaces), we added high-magnification insets to all of these figures in the main manuscript and in the Supplementary information. These insets show, in a more easily accessible manner than the small full-size images, whether the respective mitochondrial patterns occur or only a diffuse localization pattern prevails. We furthermore conducted line scans to quantitatively visualize coincidences of elevated or diminished signal intensities. We hope that the reviewer is content with these additional figure panels added to many of our revised figures.
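
      The two quantification approaches discussed in this exchange, Pearson's correlation coefficient between two channels and a line scan across a site of interest, can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the synthetic images, the function names `pearson_colocalization` and `line_scan`, and all numeric values below are assumptions chosen only for illustration.

```python
import numpy as np

def pearson_colocalization(red, green):
    """Pearson's correlation coefficient between two equally shaped channel images."""
    r = np.asarray(red, dtype=float).ravel()
    g = np.asarray(green, dtype=float).ravel()
    r = r - r.mean()
    g = g - g.mean()
    denom = np.sqrt((r ** 2).sum() * (g ** 2).sum())
    if denom == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return float((r * g).sum() / denom)

def line_scan(image, row, col_start, col_stop):
    """Intensity profile along a horizontal line (one row, a range of columns)."""
    return np.asarray(image, dtype=float)[row, col_start:col_stop]

# Hypothetical synthetic data: noisy background plus one shared bright spot.
rng = np.random.default_rng(0)
red = rng.normal(10.0, 1.0, size=(32, 32))
green = rng.normal(10.0, 1.0, size=(32, 32))
red[12:16, 12:16] += 50.0
green[12:16, 12:16] += 50.0

pcc = pearson_colocalization(red, green)   # one number for the whole image
red_profile = line_scan(red, 14, 0, 32)    # intensity profile through the spot
green_profile = line_scan(green, 14, 0, 32)
```

      In this sketch the coefficient condenses the whole image into a single value, whereas the two profiles show where along the scanned line both signals rise together, which is closer to the coaccumulation question raised in the response.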

      5) Figure 6 is beautiful, but I am wondering if these data could be exploited better. Is it possible to record data at shorter time intervals? It seems that Cobl-like appears before syndapin. Is that correct and if so, how is this coherent with a recruitment of Cobl-like through syndapin?

      We acknowledge that analysis of the spatiotemporal relationship of molecular players involved in dendritic branch induction is only in its infancy, as at the current stage of research not even all important players of this process are known and this type of analysis is technically challenging to do in a quantitative manner in neurons. The revised manuscript does now clearly demonstrate by quantitative evaluations of peak signal intensities that all four components studied (Cobl, Cobl-like, syndapin I and CaM) indeed show accumulation at branch induction sites prior to branch initiation. These data are quite well in line with the relative accumulation data collected for two of the components at the 30 s time point prior to protrusion initiation for Cobl (Hou et al., 2015 PLoS Biol.) and for Cobl-like (Izadi et al., 2018 J. Cell Biol.). Furthermore, the revised manuscript now contains a preliminary assessment of the average peak times of all four components, highlighting that they show not only spatial but also temporal overlap at branch initiation sites, as can be expected from our finding that Cobl-like and Cobl can be interconnected by Cobl-like’s novel interaction partner syndapin I in a CaM-regulated mechanism converging on one particular motif of the three syndapin I binding motifs we identified in Cobl-like. The Cobl-like and the syndapin I data hereby showed significant variances and a surprisingly early appearance of both components together. The data obtained thus far do not suggest that Cobl-like is recruited before syndapin; on average, both showed the same peak time (please see revised Figure 5C,D; former Figure 6).
      Thus, while we honestly do not claim that we have detailed enough data on the different aspects of the spatiotemporal behaviors of all players in dendritic branch initiation, and this will definitely require further studies focusing on these aspects specifically, there at least is no discrepancy with any of the molecular mechanisms involving Cobl-like and syndapin I, which we demonstrate in this manuscript.

      Reviewer #2 (Public Review):

      The manuscript by Izadi et al., "Functional interdependence of the actin nucleator Cobl and Cobl-like in dendritic arbor development" deals with the fundamental question of how actin regulators are orchestrated to control the formation of membrane protrusions during cell morphogenesis. In particular, the authors explored how actin nucleators are coordinated to trigger the formation of branches in the neuronal dendritic arbor.

      In that context, Cobl has a crucial role in dendritic arbor formation in neuronal cells. Cobl contains a repeat of three WH2 domains interacting with actin and enabling nucleation of new actin filaments (F-actin). The initial idea was that a tandem repeat of WH2 domains could be sufficient to trigger F-actin nucleation. However, other studies have shown that the WH2 repeat of Cobl has no nucleation activity of its own. Importantly, Cobl activity was shown to work in coordination with other actin regulators including the F-actin-binding protein Abp1 (Haag, J Neuro 2012) and the BAR domain protein syndapin (Schwintzer, EMBO J 2011).

      The manuscript of Izadi et al. builds on previous articles from the same group, in particular a study demonstrating that Cobl-like, an evolutionary ancestor of Cobl, is also crucial for dendritic branching (Izadi et al., 2018 JCB). This previous article showed that like Cobl (Haag, J Neuro 2012), Cobl-like protein works in coordination with the F-actin-binding protein Abp1 and Ca2+/CaM to promote dendritic branching through regulation of F-actin nucleation or/and assembly. In the current manuscript the authors showed that the two actin nucleators Cobl and Cobl-like proteins are interdependent to trigger dendritic branching.

      The authors used functional assays by quantifying the formation of dendritic branches in primary hippocampal neurons. Using fluorescence microscopy and siRNA-based knockdowns, the authors showed that Cobl and Cobl-like are functionally interdependent during dendritic branch formation in dissociated hippocampal neurons. They showed that siRNA decreasing Cobl or Cobl-like expression reduced the number of dendritic branch points to the same extent. Fluorescence time-lapses indicated that Cobl and Cobl-like proteins co-localized at abortive and effective branching points. Furthermore, they showed that the increase in branching induced by Cobl-like overexpression is reversed by using a siRNA that decreases Cobl expression; they also performed the reciprocal experiments. Using a variety of biochemistry assays (co-immunoprecipitation, in vitro reconstitutions with purified components…) the authors demonstrated that Cobl and Cobl-like do not interact directly, but that Cobl-like associates with syndapins, as previously shown for Cobl (Schwintzer et al., 2011; Hou et al., 2015). Thus, syndapin is the molecular and functional link between Cobl and Cobl-like proteins. The authors performed a very thorough characterisation of the biochemical interactions between the Cobl-like protein and syndapins. Syndapin and Cobl-like interactions were direct and based on SH3 domain/proline-rich motif interactions, on syndapins and Cobl-like, respectively. The proline-rich motifs were located in three KRAP domains at the N-terminus of Cobl-like proteins. The authors also showed that the interaction of the N-terminus-proximal KRAP domain with syndapin is Ca2+/CaM dependent, and that this Ca2+/CaM dependent interaction is crucial for the function of the Cobl-like protein in the regulation of dendritic arbor formation. The authors confirmed most of their biochemical results by visualizing the formation of protein complexes on the surface of mitochondria in intact COS-7 cells.
They also used time-lapse fluorescent microscopy to demonstrate that Syndapin and Cobl-like are co-localized at sites of dendritic branch induction. Importantly, the authors used Immunogold labeling of freeze-fractured plasma membranes combined with electron microscopy. Using this strategy, they showed that membrane-bound syndapin nanoclusters are preferentially located at the base of protrusive membrane topologies in developing neurons. Throughout the manuscript, the authors confronted their biochemistry experiments with functional assays quantifying the formation of dendritic branches.

      The overall conclusion of the manuscript is that a molecular complex involving Cobl, Cobl-like and syndapin and regulated by Ca2+/CaM, promotes the formation of actin networks leading to dendritic protrusions to initiate dendritic branches. Importantly, this manuscript demonstrated that multiple actin nucleators can be coordinated in neurons to trigger the formation of subcellular structures.

      The conclusions of the manuscript are, in most cases, convincingly supported by the results. In particular, the authors have performed a very comprehensive characterization of the biochemical interactions between Cobl, Cobl-like and syndapin, which are well supported by the functional results. However, the results found concerning the spatiotemporal relationship between Cobl, Cobl-like and syndapin during dendritic branch formation are more preliminary and do not take into account the roles of Ca2+/CaM. In addition, some of the findings presented in this manuscript have already been published by the same group, which diminishes the inherent originality of this manuscript. Apart from the main points raised above, the manuscript is experimentally solid and contains interesting results that are likely to stimulate further experiments in the fields of actin cytoskeleton but also in the fields of cellular neurobiology and neurodevelopment.

      We thank the reviewer for the positive assessment of the quality and impact of our work.

      Concerning the first point of the reviewer, the spatiotemporal relationship between Cobl, Cobl-like and syndapin I:

      We acknowledge that analysis of the spatiotemporal relationship of molecular players involved in dendritic branch induction is only in its infancy, as at the current stage of research not even all important players of this process are known and this type of analysis is technically challenging to do in a quantitative manner in neurons.

      The revised manuscript does now clearly demonstrate by quantitative evaluations of peak signal intensities that all four components studied (Cobl, Cobl-like, syndapin I and CaM) indeed show accumulation at branch induction sites prior to branch initiation. These data are quite well in line with the relative accumulation data collected for two of the components at the 30 s time point prior to protrusion initiation for Cobl (Hou et al., 2015 PLoS Biol.) and for Cobl-like (Izadi et al., 2018 J. Cell Biol.).

      Furthermore, the revised manuscript now contains a preliminary assessment of the average peak times of all four components, highlighting that they show not only spatial but also temporal overlap at branch initiation sites, as can be expected from our finding that Cobl-like and Cobl can be interconnected by Cobl-like’s novel interaction partner syndapin I in a CaM-regulated mechanism converging on one particular motif of the three syndapin I binding motifs we identified in Cobl-like. The Cobl-like and the syndapin I data hereby showed significant variances and a surprisingly early appearance of both components together. The data obtained thus far suggest that Cobl-like and syndapin I are on average recruited at the same peak time, whereas Cobl may perhaps peak a bit later (n.s.) and CaM overlaps with both (please see revised Figure 5C,D).

      However, even with these additional efforts we made during our revision work, one has to honestly admit that it is too early to claim that we have detailed enough data on the different aspects of the spatiotemporal behaviors of all players in dendritic branch initiation (which currently we may not all even have identified, yet). Although technically challenging to do at high enough resolution, with large enough time frames to capture the only relatively rare events of dendritic branch induction, at sufficient frame rates to not miss key events and with high enough numbers of transfected primary neurons of suitable developmental stages to reach sound quantitative data, this will require further comprehensive studies focusing on these aspects specifically.

      Concerning the second point of the reviewer, the criticism that some of the findings presented in this manuscript have already been published:

      As all other points presented are novel, this probably refers to the side-by-side, software-based, detailed evaluation of Cobl and Cobl-like loss-of-function phenotypes during early dendritic arborization originally presented in Figure 1. This data has been moved to the Supplemental Material (Figure 1-Figure Supplement 1) in the revised manuscript, as one half of the data set of course indeed merely is a reproduction of the Cobl-like phenotype identified by the same method before (Izadi et al., 2018).

      However, the reviewers will acknowledge and readers will immediately understand that, without this comparison revealing the high degree of phenotypical copy, we would not have followed up and discovered the coordinated action of the two components powering actin filament formation during dendritic branch initiation we report here.

    1. Author Response:

      Evaluation Summary:

      This paper compares the properties of UV cone output synapses in different regions of the zebrafish retina using a combination of electron microscopy, quantitative imaging and computational modeling. They relate these differences to ultrastructural differences in synaptic ribbons and evaluate them using a previously-developed biophysical model for the operation of the synapse. The finding of regional differences in ribbon behavior is novel and suggests an under-appreciated degree of control of release by ribbon structure and behavior. The presentation of some of the results, particularly the model, could be strengthened.

      We thank the reviewers for their valuable inputs. In response, we have substantially extended and restructured the description of preprocessing steps and modelling to aid clarity. Moreover, we include new analysis of “old” GCaMP6f data to show the similarity of calcium dynamics across retinal regions. Additionally, we worked on the description of the simulation-based inference method and provided more intuitive explanations. Finally, we updated the discussion of the model results. We hope to have addressed the helpful critique of the reviewers and strengthened our conclusions and the whole manuscript.

      Reviewer #1 (Public Review):

      Preprocessing of glutamate traces. The bulk of the analysis in the paper uses "scaled and denoised" traces. It is important to verify that this process did not either introduce or obscure any differences across regions. This should include some validation of the assumptions that go into the scaling process (such as whether a sufficiently low calcium level is achieved to use that as a standard). An example of how this concern could impact the conclusions is that the AZ glutamate traces look less rectified than the others, perhaps due to an elevated baseline, as suggested in the text. But the conclusion about the elevated baseline relies on the scaling process creating a proper alignment such that it is accurate to superimpose the traces as in Figure 3a.

      Thank you for giving us the opportunity to clarify this point. AZ UV-cones indeed have an elevated baseline, as explicitly shown in our previous publication (Yoshimatsu et al. 2020 Neuron). The scaling process recapitulates this baseline shift, as expected. In this previous work we also show how the lower rectification of AZ cones is directly linked to this baseline shift, and it includes experiments specifically designed to find the “true” minimum calcium levels achievable in UV-cones in different parts of the eye, as suggested by the reviewer.

      However, we fully agree that the scaling/denoising process could be described more clearly, and we expanded the explanation in the method section and added a figure (Fig. S3) to visualize all steps explicitly.

      Model fitting. Some key aspects of the model fitting were difficult to evaluate and follow. For example, is the loss function the same as the discrepancy defined in the methods (I assumed that is the case - if not the loss function needs to be defined)? The definition of the discrepancy could be clearer (e.g. be careful about using x here and as the offset of the calcium trace). Related, the results would benefit from a more intuitive description of the fitting, rather than just a reference to the methods (which is a bit dense to go through for that intuitive-level explanation of the model development).

      We added an overview of the simulation-based inference method to the main section of the manuscript. Additionally, we updated the definition of the loss function and tried to give more intuitive explanations. We hope that these changes will help the reader to better understand the computational methods used.

      Some statements seem too strong given the state of current knowledge. E.g. lines 79-80 I think goes too far about the functional role of the ribbon. Similarly lines 97-98 are quite explicit about the connection to prey capture. Lines 276-279 are a particularly important example; I would argue that the statement there requires showing uniqueness of the model.

      We agree that the mentioned statements were perhaps quite strong and we have toned them down in the revised manuscript.

      Could fixation of the retina for EM change the distribution of vesicles in different compartments? I realize this may not be answerable, but a caution about that possibility might be warranted.

      We are not aware of such an effect in previous works. As the reviewer notes it may not be answerable. However, in a way we have an “internal control” for such a possibility, since the different eye regions were treated equally for fixation, yet vesicle distributions differ across eye regions. It seems unlikely that the fixation would have disproportionately distorted vesicle distributions in one eye region without also affecting the others. This is now noted when first discussing the EM approach.

      Line 159: it is not clear how similar the calcium signals are. Specifically, could differences in calcium signal get amplified when passed through simple nonlinearity (e.g. due to the calcium dependence of transmitter release) to account for the differences in glutamate output? Maybe rewording here to leave open that possibility unless you have reason to reject it.

      We agree that this statement was perhaps too strong at this point of the manuscript. We softened it and included a detailed analysis of additional calcium data later to investigate the regional differences of the calcium signal (Fig. 3k-n)

      Can you quantify the fits in Figure 4f,g? For example, can you give a probability of a particular experimental trace or summary parameters for that experimental trace given the parameter probability distributions from the same area and from a different area?

      A quantification of the fits is shown in Fig. S4b,c (previously S3b,c). As we perform “likelihood-free inference”, we cannot give probabilities for the model traces, but we show two different loss functions for the model fits as well as for the linear model: the relevant loss, on which the models are optimized (which is based on the summary statistics) and for comparison the MSE to the experimental traces. We apologize if this was not clearly mentioned in the manuscript. We added it more prominently in the revised version.

      Reviewer #2 (Public Review):

      This study images synaptic calcium and glutamate release from larval zebrafish UV-sensitive cones in vivo. They also study the ultrastructure of ribbon synapses from UV cones in different regions of the retina. They find differences in ribbon dimension and light-evoked glutamate release from cones in different regions of the retina. Cones from dorsal retina show a more pronounced transient component of glutamate release than those from nasal retina. Those in the acute zone in the center of the retina showed intermediate kinetics. Ultrastructural reconstructions of UV-sensitive cones from those regions showed fewer and smaller ribbons in dorsal cones vs. those in the nasal region or acute zone. Light-evoked changes in the kinetics of synaptic calcium were not significantly different, suggesting that differences in release kinetics may be related to differences in ribbon behavior in cones from different regions. To relate these different measurements to one another, the authors modified an existing model of cone release to incorporate a simulation-based Bayesian inference approach for estimating best-fit parameters. The model suggested that the differences in glutamate release kinetics could be explained by differences in the rates of transfer between vesicle pools on and off the ribbon. By fixing different parameters, the authors then used the model to explore the parameter space and general properties of ribbon tuning. They also provide a link to the model for others to use.

      The main new experimental finding is that glutamate release properties differ among cones in different regions. The finding that kinetics of glutamate release and ribbon ultrastructure vary systematically in different regions of the retina is interesting. They relate these data using a model of ribbon release. While the model is not novel in its general design, the incorporation of Bayesian inference is new. The most interesting finding from the model is that the kinetic differences in release between cones are not due to calcium kinetics but arise primarily from differences in transitions between vesicle pools. Nevertheless, using the model, the authors show that calcium levels and kinetics matter, since if they hold other parameters fixed, calcium levels and kinetics are the most important factors in shaping response detectability and response kinetics. This is consistent with a lot of earlier work that calcium kinetics are important for shaping response kinetics at ribbon synapses.

      1) The measured changes in glutamate and calcium are small and noisy and there is considerable overlap in the data from cones in different regions. While the example waveforms show considerable differences, the scatter in the data is less persuasive. If I understand correctly, the imaging data comes from 30 AZ, 16 dorsal, and 9 nasal UV cones. With such noisy data, 9 cones seems like a particularly small sample. With imaging data, it should be possible to record from dozens or hundreds of cells and a larger sample would strengthen the conclusions.

      We agree that the sample size is quite small; however, the dual-color experiments are technically extremely challenging. This is partly related to the laser wavelength compromise that needs to be reached for concurrent excitation of red and green fluorescent probes, and to the fact that red probes generally give comparatively poor SNR. Notably, to our knowledge concurrent 2P imaging of presynaptic calcium and consequent glutamate release in an in vivo scenario is quite novel, and still very much on the edge of experimental possibilities.

      The green glutamate recordings based on iGluSnFR, which are particularly central to our work, do have a reasonably high SNR; the “problem” is rather more obviously linked to the calcium recordings. For a better understanding of the calcium handling, we therefore now reanalysed an “old” dataset from Yoshimatsu et al., 2020, Neuron (see Fig. 3k-n) that was recorded with SyGCaMP6f, which provides much higher SNR (and is a little faster albeit also more nonlinear). Notably, the SyGCaMP6f calcium dynamics were also analysed in some detail in Yoshimatsu et al., 2020, Neuron, and we built on these conclusions.

      We hope that the analysis of the additional calcium dataset which is now included in the manuscript adds to more persuasive conclusions.

      2) Calcium and iGluSnFR measurements are both single wavelength measurements and thus sensitive to differences in expression of the indicator. In Fig. 3, the authors show that dorsal cones exhibit larger calcium responses than nasal cones (3c) and that AZ cones show larger glutamate responses than nasal cones (3d). Please address the potential impact of differences in expression on these measurements.

      Thank you for this comment. In Yoshimatsu et al., 2020, Neuron we compared “live 2p” and “fixed confocal” data of the same sample to show that biosensor expression in UV-cones was uniform across regions, and that the different brightness levels were rather a result of variations in calcium levels. We extrapolated this knowledge to the biosensors used in the new experiments. We now note this explicitly in the revised manuscript.

      3) Please describe controls performed to assess the potential for spectral overlap between the red and green channels. Is there any bleed-through of one dye into the other channel?

      The expression profile of the two indicators is very different, the red fluorescence signal appears in cones, the green in HCs. We illustrated this separation in an additional figure (Fig. S2a,b) which shows that there was no obvious spectral mixing of the two fluorescence channels. We clarified this now in the revised manuscript.

      4) I am not a modeler and while I understand the general approach used for the model, I am not competent to critique specific details of the implementation, particularly the Bayesian inference. However, the fact that the linear statistical model seems to perform just as well as the more ornate model is comforting since it says that the Bayesian inference approach didn't lead the model into an unrealistic parameter space. However, while to my eye the linear model appears to perform just as well as the fancier model, the text says otherwise (Figure 4, lines 270-273). Please clarify.

      Indeed, the linear model captures the general shape of the glutamate response. However, it fails to recover adaptational processes, more precisely the transient components and adaptation over several steps. The model performances are quantified in Fig. S4 (previously S3), and especially with respect to the relevant loss, which is measuring the relevant features, the biophysical model outperforms the linear model. We expanded the discussion on these points in the manuscript and made a more prominent reference to the quantification figure.

      5) Adding a diagram to show where the different regions (dorsal, nasal, acute zone) are located in the eye would be helpful. Is there a difference in the number or size of UV cones from different regions of the retina in larval zebrafish?

      A diagram has been added to Figure 1 as requested. Regarding UV-cone numbers, indeed they do vary across the eye to specifically peak in the acute zone, and to a lesser extent also nasally. This relationship was explored in some detail in Zimmermann et al. 2018 Curr Biol, and also touched upon in Yoshimatsu 2020 Neuron. This known density difference is now noted in the introduction.

      6) Are differences in ribbon morphology, glutamate responses or calcium changes retained in adult zebrafish retina? While it may not be feasible to perform similar experiments in adult, some discussion of possible differences and similarities with adult retina would be helpful for putting the results in a more general context.

      The reviewer raises an interesting point. Adult zebrafish display a much broader array of visual behaviours than larvae, and moreover have a rather different diet (meaning that the UV-dependence of prey capture - see Yoshimatsu et al., 2020 Neuron - may be different). Unfortunately, the visual ecology of adult zebrafish remains poorly explored so at this point we can only speculate. Notably, unlike larvae, adults also feature a crystalline mosaic of all cones, meaning that at least numerical anisotropies in cones as they occur in larvae (Zimmermann et al. 2018) are not expected. However, this does not preclude the possibility that UV-cones have different properties across the retina; perhaps this would be the most straightforward way to regionally tune outer retinal outputs in adults. Accordingly, we fully agree that this topic would be exciting to explore; however, it would go beyond what could be achieved within a reasonable revision cycle.

      We now added a summarising note of the above into the discussion section.

      Reviewer #3 (Public Review):

      The strengths of the manuscript: It contains a thorough characterization of the anatomical and physiological differences of UV cone ribbons at different locations using state-of-the-art techniques including serial block-face scanning EM reconstruction and dual-color, simultaneous calcium and glutamate imaging. The Bayesian simulation-based inference model captured the key features of the calcium responses and glutamate release dynamics and provided distributions for each biophysical parameter, which gave insights into their interactions and their impacts on ribbon function. The online tool for ribbon synapse modeling is quite useful. Overall, it is a great effort to understand the function of the ribbon synapse with a suitable system that allows multifaceted data collection and a new modeling approach.

      The weaknesses of the manuscript: 1) Overall the writing/formatting of the manuscript can be much improved - there are many imprecise, hard to understand descriptions in the manuscript; figure legends/descriptions are often inadequate for easy understanding; inconsistencies between description in the main text and methods; and above all, the descriptions of model itself and the results from the model are not communicated in a way that facilitates the understanding of process and implications. In contrast, the previous papers from the same group employing similar modeling approaches are much better explained. 2) Based on the intuitions from the modeling, there has not been a strong connection established between the anatomical data and the functional data to which the model is built to fit. More clearly identifying the consistencies and discrepancies between the data and the model will help the readers to understand the pros and cons of the model and the limitations of the generalizations from the model.

      Specific questions and recommendations for the authors:

      1) It will be helpful to have a retina diagram indicating the locations of three different regions.

      The requested diagram has been added to Figure 1.

      2) Fig 1d,e,f (and other figure panels in general) there is no need to mark n.s. On the other hand, in the Statistical Analysis section, GAMs models are mentioned only for Fig 1g, but not other results - needs a clarification.

      We find the “n.s.” labels useful, in part because in some panels none of the differences were significant and the label makes this quite explicit. Accordingly, we have opted to retain them. GAMs were indeed only used for Figure 1g - this is motivated by the difference in data structure of this panel compared to others (i.e. a comparison between continuous rather than discrete distributions). We now clarified this in the methods and added a short paragraph on the used testing procedure.

      3) Fig 1h is quite confusing, with a mixture of 3D and 2D plot, schematic drawing and statistical marks. What comparisons are these marks for? The legend is not specific and the Suppl Fig S1 doesn't clarify much.

      The asterisks are meant to indicate a statistically significant difference in the indicated property (e.g. ribbon size/number) relative to the acute zone. We apologise for not making this clear in the previous version, it is now directly noted in the panel. Regarding the 2D/3D representation, we agree that it may be a little confusing, but we cannot think of a “better” way of summarising all properties analysed by EM in a single panel, so we opted to keep it. We did however expand on the related explanation in the legend to further clarify what is shown.

      4) It will be good to discuss the properties of the calcium sensor. Deconvolution of the calcium signal (lines 617-619) notwithstanding, presumably, the sensor has neither the temporal nor spatial resolution to catch the nano-domain calcium peak near the vesicles in RRP, which is critical for the release of RRP.

      This point seems to link to the ongoing debate on to what extent release from ribbons is driven by micro- and/or nano-domain calcium signalling. It is our understanding that this debate remains unresolved in a truly general sense. Rather, the two seem to be non-mutually exclusive (i.e. both micro- and nano-domain signals working together), and moreover quite specific to each ribbon synapse in question. In larval zebrafish cones, the pedicle has a rather small cytoplasmic volume, there is only one invagination from postsynaptic processes, and all ribbons inside the cone are opposed to this single invagination. Accordingly, on a possible “sliding scale” of micro- vs nano-domain dominance, we think it is likely that in larval zebrafish cones microdomains will have a notable impact on release. While we are not aware of any data directly looking at this question in zebrafish larval UV-cones, there is good data available from systems that are perhaps quite similar, such as mammalian rods (which also have a single invagination site). For example, from Thoreson et al., 2004, Neuron, Figure 3.

      Already at low micromolar concentrations of calcium that are readily achieved at the level of bulk calcium in the terminal (e.g. 1-2 microM), release is driven to a substantial degree.

      However, we fully agree that we cannot detect possible nano-domain calcium signalling with our imaging method (in fact we are unsure that with currently available technology it is technically possible in an in-vivo preparation). We therefore now further emphasise the possibility of nanodomains acting on release in the discussion.

      Notably, we do already allow exploring the possible influence of nanodomain-type calcium kinetics in the online model, and we think this usefully adds to our exploration of links between calcium signalling and glutamate release.

      5) Likewise, the kinetics of iGluSnFR and of glutamate concentration in the cleft. Admittedly, figs 2a, 3c etc. show that the glutamate signal drops rapidly following the transition from dark to light; however, the rates of vesicle pool replenishment are a topic in the field. Some discussion of how glutamate clearance from the cleft and the kinetics of the sensor will influence your estimates of replenishment rates would help future readers better interpret your findings in the context of their own observations.

      We agree that there are technical limitations as to what the iGluSnFR signal can tell us about the exact dynamics of glutamate in an unperturbed situation. Likely this will never be fully addressable. Rather, we use the iGluSnFR signals in a comparative fashion across eye regions, where presumably any distortion of the signals as alluded to by the reviewer would be approximately equal. Following the reviewer’s suggestion, we now explain this more directly in the main text.

      6) In Fig 2d, the rising phase kinetics of the Glu for that nasal cone is strikingly different from that of the acute zone cone. However, such difference is not seen in Fig 3. Therefore, the one in Fig 2d may not be a good representation?

      Thanks, we agree. We have replaced the nasal example with a more representative trace.

      7) In Fig 3a, c.u. and v.u. (only defined in Fig 4 in the context of the model) were used here but not S.D. as in Fig 2, any explanation?

      After scaling, SD adopts arbitrary units. For consistency with the model later we decided to use c.u. and v.u. here (i.e. “calcium units”, and “vesicle units”). We agree that this could be explained better, and have now rephrased as follows: “We show the rescaled traces in c.u. (calcium units) and v.u. (vesicle units) respectively, to be consistent with the used units in the model later.”

      8) Lines 186-188, how were traces "normalized with respect to the UV-bright stimulus periods"?

      The traces were rescaled such that the UV-bright stimulus periods had a mean of zero and a standard deviation of one. We have added this missing piece of information and further expanded the explanation of the pre-processing.
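
      As an illustration of the rescaling just described, a minimal sketch follows. The function name, the boolean mask marking the UV-bright periods, and the toy trace are assumptions made for this example; they are not taken from the manuscript's code.

```python
import numpy as np

def normalize_to_reference(trace, ref_mask):
    """Rescale `trace` so that the samples selected by `ref_mask`
    (here standing in for the UV-bright stimulus periods) have
    mean 0 and standard deviation 1."""
    trace = np.asarray(trace, dtype=float)
    ref = trace[np.asarray(ref_mask, dtype=bool)]
    sd = ref.std()
    if sd == 0:
        raise ValueError("reference period has zero variance")
    return (trace - ref.mean()) / sd

# Toy data (hypothetical): a noisy trace with an arbitrary offset and gain.
rng = np.random.default_rng(0)
trace = rng.normal(loc=5.0, scale=2.0, size=1000)

# Hypothetical mask: samples 200..399 are treated as the UV-bright period.
uv_bright = np.zeros(1000, dtype=bool)
uv_bright[200:400] = True

scaled = normalize_to_reference(trace, uv_bright)
```

      Only the reference period is forced to zero mean and unit SD; the rest of the trace is expressed relative to it, which is why baseline shifts outside that period remain visible after scaling.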

      9) Lines 194-195, "In addition, the glutamate release baseline of AZ UV-cones was increased during 50% contrast at the start of the stimulus" - it is unclear whether higher glutamate baseline occurred during the adaptation step (i.e. it increased during that period) or said increase was the level during adaptation compared to that during bright periods?

      Thank you, we meant the former (i.e. glutamate release “is” higher during the adaptation step). This is now clarified in the text.

      10) Lines 219-220, "a sigmoidal non-linearity with slope k and offset x0 which drives the final release" - this sentence is not clear, needs to clarify that it is referring to the relationship between calcium and release.

      Thanks, this is now clarified in the manuscript.

      11) Lines 230-232, "x0 can be understood as the inverted calcium baseline (see Methods)" - Methods don't cover this point, though it is described in the f(Ca) equation, but it isn't obvious how x0 should be the inverted baseline, as if Ca=x0, f(Ca) = 0.5 (i.e., the point of half-release probability). Please clarify this. In general, there are places where explanations of model found in methods don't match those described in the main text (also see some of the points below). Please go over carefully to ensure consistency.

      x0 can be seen as an inverted baseline as it shifts the whole non-linearity to a different operating point: the smaller x0, the less additional calcium is needed to trigger vesicle release. If we assume a fixed calcium affinity, this implies an increased baseline level. We apologise for having omitted these explanations in the initial manuscript; we have expanded the explanation in the Methods of the revised manuscript.
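
      To make the role of x0 concrete, here is a small sketch. It assumes a standard logistic form for the sigmoid, which is consistent with the reviewer's remark that f(Ca) = 0.5 at Ca = x0 but is not necessarily the exact parameterisation used in the manuscript; the numerical values are arbitrary.

```python
import numpy as np

def release_nonlinearity(ca, k, x0):
    """Logistic sigmoid (assumed form): equals 0.5 at ca == x0,
    and becomes steeper as the slope k increases."""
    ca = np.asarray(ca, dtype=float)
    return 1.0 / (1.0 + np.exp(-k * (ca - x0)))

# Same resting calcium level and slope, two different offsets (arbitrary values).
ca_rest = 0.0
f_high_x0 = release_nonlinearity(ca_rest, k=5.0, x0=0.5)
f_low_x0 = release_nonlinearity(ca_rest, k=5.0, x0=0.1)
```

      With the smaller offset, the same resting calcium sits higher on the curve, so release at rest is larger. This is the sense in which lowering x0 acts like raising the baseline.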

      12) Fig 4e suggests a 5-10 times difference in RRP size between acute zone and nasal UV cones, which is not in line with the anatomical data (Fig 1h). Some discussions and clarifications will be helpful.

      As we note in the manuscript, it is difficult to quantitatively link anatomical structures to functional data. However, the small RRP size in the nasal zone inferred by the model (Fig. 4e) matches very well to the low vesicle densities at a small distance from the ribbon in the nasal zone in Fig. 1. Our model thus picks up the right trends for an anatomical structure from pure functional recordings, which is in our opinion already remarkable given the experimental noise and fine-grained differences. We commented on this point in the revised manuscript.

      13) From Fig 4h, and Fig S3b,c, the linear model doesn't look too bad (unless I misunderstand the figure panels, which are not explained in great detail). The explanation in lines 272-274 needs some work to make it clearer.

      Compared to the “best model”, the linear model clearly lacks in accuracy, perhaps most intuitively visible when looking at adaptation kinetics. This is especially the case for the relevant loss, which is based on the summary statistics. We extended the mentioned lines and hope to clarify it now in the manuscript.

      14) Sobol indices and their explanation are lacking. Are they computed using Ca2+ and glutamate signals, or just glutamate? It is hard to parse their relative "contributions" to model behavior as described in the text, when the methods caution against interpreting this analysis as determining the "importance" of parameters (lines 805-806).

      The first-order Sobol indices measure the direct effect of each parameter on the variance of the model output. More specifically, they tell us the expected reduction in relative variance of the output if we fix one parameter. For the computation, broadly speaking, many parameter sets were drawn from the posterior distribution and the model was evaluated on these samples. Afterwards, the reduction in variance of the model evaluations was computed when one dimension of the parameter space was fixed. We agree that the indices are non-intuitive to interpret for a single time point; however, their temporal changes give us insight into the time-dependent influence on the model output. Often Sobol indices are computed by drawing random samples from a uniform distribution on a high-dimensional cuboid [r1,s1] x … x [rn,sn], where each interval [ri,si] is simply defined by the mean ± 10% of the parameter fit; the definition of 10% leaves much room for interpretation and may not be meaningful in the same way for all parameters. We believe that the inferred posterior distributions are much better suited probability distributions, as they encode all parameter combinations which agree with the experimental data.

      We expanded our explanation on this point in the manuscript.
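
      For readers unfamiliar with the quantity, a first-order Sobol index for parameter i is Var(E[Y | X_i]) / Var(Y). The sketch below estimates it from existing samples by averaging the output within quantile bins of one parameter. This binning estimator, the toy linear "model", and the independent Gaussian samples are all assumptions chosen for illustration; they are not the estimator or the posterior samples used in the manuscript, and with correlated posterior samples the interpretation of such an index changes.

```python
import numpy as np

def first_order_sobol_binned(x, y, n_bins=20):
    """Estimate Var(E[y | x]) / Var(y) by averaging y within
    equal-probability (quantile) bins of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    # Bin index in 0..n_bins-1 for every sample.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    total_mean = y.mean()
    var_cond_mean = 0.0
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            # Weight each bin by the fraction of samples it contains.
            var_cond_mean += in_bin.mean() * (y[in_bin].mean() - total_mean) ** 2
    return var_cond_mean / y.var()

# Toy stand-in for parameter samples: two independent standard normals.
rng = np.random.default_rng(1)
theta = rng.normal(size=(20000, 2))

# Toy "model output": the first parameter matters more than the second.
output = 2.0 * theta[:, 0] + 1.0 * theta[:, 1]

s1 = first_order_sobol_binned(theta[:, 0], output)
s2 = first_order_sobol_binned(theta[:, 1], output)
```

      For this toy model the analytical values are 4/5 and 1/5, so the estimates should land near 0.8 and 0.2. Repeating the calculation at each time point of a simulated trace would give a time-resolved index of the kind discussed above.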

      15) The sensitivity analysis suggests that vesicle transitions are more important than pool sizes or their calcium dependence. Thus, it appears that one intuition from the model is that ribbon size - the main anatomical difference of the UV cone ribbons from different regions - is not very important for the functional difference observed (also see discussion in lines 438-439). Although, it has been discussed that ribbon size does not necessarily correlate with IP or RRP size, but this appears to be the hallmark of the acute zone.

      As the reviewer notes, one potentially interesting hint from our work is that ribbon size does not necessarily translate 1:1 to vesicle pool sizes, or their relative transition rates. One particularly clear example of this might come from comparing Figs. 1d-f and Fig. 1h, between nasal and acute zone. Both have similar ribbon geometry (Fig. 1d-f), but nasal ribbons nevertheless appear to pack fewer vesicles (Fig. 1h). Linking with our functional data and modelling, it then appears that perhaps on top of that, vesicles simply move at different rates between the pools, a property that is impossible to pick up from a static EM reconstruction.

      More generally, as mentioned in the manuscript and discussed in the previous point, it is difficult to judge the overall importance of a parameter from the sensitivity analysis. However, we clearly see time-dependent effects of the different parameters, and especially the RRP size matters for the transient component, which can be seen in Fig. 5. Indeed, the pattern for IP size seems to be different and it may be the case that the stimulus used is not optimal to infer this parameter from functional recordings.

      How the ribbon size relates to different vesicle densities and how these densities could potentially influence the changing is however still an open question and cannot be answered in the scope of this manuscript.

      16) Lines 460-461, intuitively, a slower RRP refill rate will result in more transient response - after the depletion of RRP, less refilled vesicles to give the sustained component of the response. This is the opposite of what model predicted (a faster RRP). Some explanation and discussion will be helpful.

      The RRP refill rate indeed influences the transience in the mentioned way. However, its influence starts earlier and also affects the overall amplitude (if some minimal background activation is assumed), so it particularly shapes the sustained component. For the nasal model, the inferred RRP size is already the smallest, and it seems that a small RRP refill rate is sufficient to produce the sustained response behaviour which we see in Fig. 4f. We thank the reviewer for this thoughtful comment and mentioned this behaviour in the discussion.
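
      The interplay between refill rate, transience, and the sustained component can be illustrated with a deliberately simplified one-pool toy model. This is not the manuscript's ribbon model: the single pool, the constant release rate after stimulus onset, the forward-Euler integration, and all parameter values are assumptions made for this sketch.

```python
import numpy as np

def simulate_pool(refill_rate, release_rate, rrp_max=1.0, dt=0.001, t_end=5.0):
    """Toy single-pool model, integrated with forward Euler:
        d(rrp)/dt = refill_rate * (rrp_max - rrp) - release_rate * rrp
    The pool starts full; the returned trace is release_rate * rrp."""
    n_steps = int(round(t_end / dt))
    rrp = rrp_max
    release = np.empty(n_steps)
    for i in range(n_steps):
        release[i] = release_rate * rrp
        rrp += dt * (refill_rate * (rrp_max - rrp) - release_rate * rrp)
    return release

def transience(release):
    """Ratio of the initial (peak) release to the final (sustained) release."""
    return release[0] / release[-1]

slow_refill = simulate_pool(refill_rate=0.5, release_rate=5.0)
fast_refill = simulate_pool(refill_rate=5.0, release_rate=5.0)
```

      In this toy the steady-state release is release_rate * refill_rate * rrp_max / (refill_rate + release_rate), so a slower refill lowers the sustained level and makes the response more transient, while the initial peak is set by the pool size alone.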

      17) Also, the model simplifies vesicle transition rates by removing their calcium dependence. The Methods section indicates that this choice resulted from early fitting results that essentially "dialed out" the calcium dependence. Given the relative freedom that the model seems to have in finding suitable solutions, how is the lack of calcium dependence justified, and what potential impact might it have on the modeling results?

      Identifying model (mis-)specification is a non-trivial task in general. The presented model is complex enough to replicate the recorded data but can easily be extended to more complex dynamics (e.g. more complex calcium handling) in future studies, as it is publicly available online. Further added components could even act as “distractors” to compare the other parameters across zones and we thus decided to use an “as simple as possible” model. Interestingly, our previous study (Schröder et al., 2019, Approximate Bayesian inference for a mechanistic model of vesicle release at a ribbon synapse, NeurIPS) showed that even at a temporal resolution of single released glutamate vesicles, it was not necessary to include calcium dependency for the refilling of the vesicle pools. This study thus supports our model choice.

      18) Lines 503-508, "In combination with the approximately equal and opposite effects of calcium baseline on the detectability of On- and Off-events (Fig. 7b,f), this suggest(s) that the calcium baseline may present a key variable that enables ribbons to trade-off the transmission of high frequency stimuli against providing an approximately balanced On- and Off- response behaviour." - what will be the physiological relevance for such conditions, perhaps the level of adaptation? Any existing data or predictions?

      The reviewer raises an interesting but ultimately perhaps unanswerable point, given the scarcity of available data on temporal natural image statistics in the UV band across the larval zebrafish visual field. It is of course tempting to speculate that the ecological need to tune kinetics and On/Off preferences might be linked (e.g. detecting a “dark looming predator” might disproportionately benefit from a rapid Off response). However, to truly understand this idea at a useful level of detail would likely be a rather involved study in its own right. Accordingly, we here prefer to simply point at the possibility to “tune” the ribbon using calcium baseline, and what effects this might have on kinetics if all else was kept equal.

      19) I am slightly skeptical of the predictions that the model might make about the ribbon's frequency tuning (Fig. 7) in light of the fact that the AZ model in particular seems unable to reliably capture the fast transient response to dark flashes (Fig. 4c,f).

      The noted effect in the fast transient components in Fig. 4c,f is partially due to the slow calcium recordings which act as an input for the model in Fig. 4. As mentioned, and discussed above, there is an ongoing discussion to what extent nanodomain or more global calcium concentration drives the release. For this reason, we added a simple calcium model for the simulations for Fig. 7 which includes a variable time constant for calcium (nanodomains would presumably have much faster calcium transients than used for the model default). This allows us to explore the influence of different possible calcium handlings. Although this extrapolation to new stimuli is based on the fitted model, it allows for varying all essential parameters. In the online simulation it can be observed that for fast calcium handlings the ribbon is able to also follow higher frequency stimuli. However, we agree that experimentally testing the influence of different ribbon configurations on frequency tuning is an interesting research direction but goes beyond the scope of this manuscript.
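
      Why a shorter calcium time constant lets the synapse follow faster stimuli can be sketched with a first-order low-pass filter. The leaky-integrator form, the 5 Hz test stimulus, and the two time constants below are assumptions for illustration only; the calcium model used in the online simulation may differ.

```python
import numpy as np

def lowpass_calcium(stimulus, tau, dt):
    """First-order leaky integrator (assumed form), forward Euler:
        tau * dCa/dt = stimulus - Ca"""
    ca = np.empty(len(stimulus))
    state = stimulus[0]
    for i, s in enumerate(stimulus):
        state += dt / tau * (s - state)
        ca[i] = state
    return ca

def steady_amplitude(signal):
    """Half the peak-to-peak range over the second half of the trace,
    i.e. after the initial transient has decayed."""
    tail = signal[len(signal) // 2:]
    return (tail.max() - tail.min()) / 2.0

dt = 0.001                           # seconds per sample
t = np.arange(0.0, 10.0, dt)
stim = np.sin(2 * np.pi * 5.0 * t)   # 5 Hz sinusoidal drive (arbitrary choice)

slow_ca = lowpass_calcium(stim, tau=0.5, dt=dt)    # slow, bulk-like kinetics
fast_ca = lowpass_calcium(stim, tau=0.01, dt=dt)   # fast, nanodomain-like kinetics

amp_slow = steady_amplitude(slow_ca)
amp_fast = steady_amplitude(fast_ca)
```

      The slow filter strongly attenuates the 5 Hz modulation while the fast one passes it almost unchanged, so any downstream release stage sees very different inputs depending on the assumed calcium kinetics.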

    1. Author Response:

      Reviewer #1 (Public Review):

      In this study, the authors sought to characterize protein turnover in young/growing and skeletally mature mice. These authors were most interested in protein turnover in three tissues rich in collagen, proteoglycans and glycoproteins. These tissues were articular cartilage, bone and skin. They also examined protein turnover in peripheral blood as a comparison/control tissue. To accomplish this, male C57BL/6 mice were fed a heavy isotope diet for 3 weeks at immature (4 to 7 weeks of age), young adult (12 to 15 weeks of age) and older adult (42 to 45 weeks of age) ages. Tibial bone, articular cartilage and skin were collected, and mass spectrometry was used to identify labelled (heavier) and therefore newer proteins relative to lighter/residual proteins. They observed that turnover decreased with age and there was far less protein turnover in bone and cartilage relative to skin. The study design is appropriate, and the ages of the mice are justified, but to be clear, the oldest group of mice used were not old and do not reflect a comparable period of old age/elderly in humans. Rather, the oldest group of mice in this study reflects mature adulthood. The results of this paper are not overly surprising given previous work in the field, but what sets this work apart is the level of detail that this method afforded. This work provides detailed information about what collagens and other cellular proteins turn over with aging in these matrix-rich tissues, providing information that is complementary to what is collected with other omics methods such as RNA-seq. A limitation of this work is that only male mice were used, although there are known differences in bone turnover and aging as a function of sex.

      We thank the reviewer for their comments. The initial purpose of our study was to consider how matrix-rich connective tissues alter their synthetic activity with age and to see whether this related to how prone these tissues are to developing age-related disease. We thought very carefully about which ages of mice to choose for our analysis; we correctly hypothesised that the period of rapid skeletal growth would provide a good positive control for synthetic activity within articular cartilage and bone. Once skeletal maturity had occurred, these tissues, as expected, reduced their synthetic activity, particularly for stable proteins of the matrisome such as the fibrillar collagens. These tissues also had a more evident adult ‘ageing’ phenotype compared with skin and plasma, although it is worth saying that all tissues still maintained quite high synthetic rates up to 45 weeks of age. We didn’t go beyond this age as mice will spontaneously develop osteoarthritis and osteoporosis, which we felt would confound our analysis. We also only looked at male mice initially, but are very keen to compare male and female profiles now that we have seen the data. Whilst our data are able to confirm previous findings from other labs using different methods to measure protein turnover (Heinemeier et al., 2016; Verzijl et al., 2000), the breadth of our analysis allowed us to look at much larger numbers of regulated proteins at a given time and to look for clusters of proteins sharing common pathways. In doing so we identified both common and distinct ageing protein signatures among the 3 collagen-rich tissues.
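
      As a rough illustration of how heavy-label incorporation can be summarised for a single protein in a pulsed labelling experiment, the sketch below computes the heavy fraction H / (H + L) from heavy and light signal intensities. The function name and the intensity values are hypothetical, and this is only one common way of expressing incorporation; it is not necessarily the calculation used in the study.

```python
def heavy_fraction(heavy: float, light: float) -> float:
    """Fraction of the total signal carried by the heavy (newly synthesised) form.

    Assumption: `heavy` and `light` are non-negative summed intensities for
    one protein; both names are illustrative and not taken from the study.
    """
    if heavy < 0 or light < 0:
        raise ValueError("intensities must be non-negative")
    total = heavy + light
    if total == 0:
        raise ValueError("no signal: heavy + light is zero")
    return heavy / total


if __name__ == "__main__":
    # Made-up intensities for a slowly and a rapidly renewed protein.
    print(heavy_fraction(5.0, 95.0))
    print(heavy_fraction(80.0, 20.0))
```

      A value near 0 would indicate that little new protein was incorporated during the labelling window, and a value near 1 that most of the detected protein was newly synthesised.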

      Reviewer #2 (Public Review):

      The manuscript entitled "Age-dependent changes in protein incorporation into collagen-rich tissues of mice by in vivo pulsed SILAC labelling" by Ariosa-Morejon and co-workers describes the incorporation of the stable amino acid Lys6 into different tissues in living mice. The authors used different time points during development and the adult stage and measured Lys6 incorporation rates using state-of-the-art mass spectrometry. Although protein turnover is an important issue for assessing protein stability and activity, the authors compared different tissues that differ greatly in their cellular composition and proliferation. It is known from previous studies that dividing tissues can incorporate labelled amino acids into their proteome compared to post-mitotic cells. However, this does not represent protein turnover but rather tissue turnover. A weakness of this paper is the scant attention paid to this critical point.

      Thank you for this important comment, which we did not address adequately in the first draft. The reviewer is correct that these tissues are very different in their composition. Articular cartilage contains just chondrocytes, which are largely regarded as post-mitotic cells in healthy adult tissue. Bone contains a mixture of post-mitotic cells (osteocytes), renewable cells derived from blood monocytes (osteoclasts), and proliferating cells (osteoblasts). Skin fibroblasts are mitotic cells. We have stressed this in the revised manuscript and indicate that this may in part account for some of the changes we note as the animal ages. It is perhaps surprising that, when one considers global synthetic activity, this is maintained at quite high levels in all tissues, indicating that, even in cartilage, the non-collagenous tissue still turns over even though there is no recognised ‘shedding’ as seen with skin. Our analysis really highlights that it is specific groups (clusters) of proteins that change in an age-dependent and tissue-dependent manner rather than proteins generally. This has been emphasised in the revised manuscript. The other important point to make is that connective tissue cells, in particular, rely on their native matrix to maintain their phenotype, and that is why it is important to do these sorts of analyses in the native tissue (whatever its cellular makeup) rather than trying to extrapolate from studies in isolated populations of cells in vitro.

      Reviewer #3 (Public Review):

      The authors have conducted an elegant study to monitor proteostasis in collagen rich tissues from immature, adult and ageing mice using SILAC labelling combined with mass spectrometry analysis. Resulting data demonstrate rapid turnover of extracellular matrix proteins in immature tissues, which declines with ageing, particularly in bone and cartilage, with network analysis revealing alterations in regulatory elements which may be driving this process. The methods used in this study are highly appropriate, and the data analysis is sound. The main conclusions are supported by the data presented and the study description is clear. Establishing how proteostasis is altered with ageing at the level of the proteome provides information crucial to developing strategies to prevent age-related diseases and promote healthy ageing.

      A weakness of this work is the lack of a comprehensive analysis of the signaling pathways and upstream regulators involved in the observed age-related decline in protein turnover, which would provide potential targets for age-related diseases that are common in these tissues. Establishing any alterations in the abundance of specific proteins with ageing, as well as alterations in their turnover rate, would identify the proteins most impacted by ageing, which are therefore likely to play a role in age-associated diseases.

      Thank you for this important comment. As part of our original analysis we performed a number of bioinformatic analyses for pathway enrichment, enrichment for ageing and relevant age-related diseases (osteoarthritis, osteoporosis, wound healing) and STRING clustering. We used a variety of software packages, including IPA, DAVID and the recently developed Clinical Knowledge Graph (CKG), and supplemented this with manual searches in PubMed. As expected, we found that we were underpowered for most of the bioinformatic pathway analyses (more often used for larger transcriptomic datasets). In the revised manuscript we have included supplementary data showing these results, which identify pathways of potential interest, albeit not reaching statistical significance after correction. Both the STRING protein clusters and cluster enrichment using DAVID showed similar results that appeared robust and biologically sensible.

    1. Author Response:

      Evaluation Summary:

      This study, which will be of interest to neuroscientists in the fields of learning and memory, somatosensation, and motor behavior, uses systems neuroscience tools to expand our view of how the postero-medial (POm) nucleus of the thalamus contributes to goal-directed behavior. The reviewers suggested additional optogenetic experiments to clarify the nature and specificity of those roles. They also indicated that certain alternative explanations of the experimental observations could be addressed for a more balanced presentation and interpretation of the results.

      We thank the editors and reviewers for their constructive comments. We have now performed additional analysis and revised the text which we believe has improved the manuscript.

      Reviewer #1 (Public Review):

      1) Fig 1 - Supp 1 suggests that virus expression was always limited to POm. Drawing the borders of expressing areas from epifluorescence images is probably very dependent on imaging parameters. The Methods indicate that the authors scaled so that no pixels were saturated. This could mean that there was some weak expression of GCaMP6f or ArchT outside of POm. As I understand it, the authors set exposure/gains by the brightest points in the image. The limited extent of the infection in the figures might just reflect its center, which is brightest, rather than its full extent. If there were GCaMP or ArchT in VPL, some results would need to be reinterpreted.

      We agree with the reviewer that the determined expression areas are dependent on imaging parameters; however, we are confident that the virus expression used for analysis in this study is confined to the POm. In this study, our analysis of targeting of POm is three-fold. First, we optimized the volume of virus loaded to the minimum necessary to observe POm projections in S1 (a single targeted injection of 60 nl). Second, we analyzed the fluorescence spread using fluorescence microscopy after every experiment. We set exposure to use the full dynamic range of the image as previously described (Gambino et al., 2014). Occasionally, the virus spread to the adjacent VPM nucleus and this was easily recognizable by the characteristic VPM projections to the barrels of the barrel cortex. These animals were excluded from this study and not further analyzed. The VPL nucleus is located further caudally with respect to the VPM and, again, we were able to identify whether the virus had spread to this nucleus via post hoc fluorescence microscopy. These animals were excluded from this study and not further analyzed. We note that our stereotaxic injections were not flawless and the virus occasionally spread along the injection pipette track and into the higher-order visual thalamic nuclei LP and LD, superficial to POm. This is shown in Figure 1. These two nuclei, however, do not target S1 (Kamishina et al., 2009; van Groen and Wyss, 1992) and were therefore not imaged within our study. Third, we analyzed the projection profile in forepaw S1 to ensure that it corresponded to the projection profile of POm and not VPL. If there was fluorescence in non-targeted areas, the experiments were excluded from analysis.

      An additional degree of precision is offered by our imaging and optogenetic strategy. Calcium imaging was performed in layer 1 which is targeted by POm (Meyer et al., 2010), and not VPL which targets layer 4. Therefore, spillover into VPL would not influence our imaging results as we only image axons in layer 1 which is targeted by POm. Furthermore, during the optogenetic experiments, the fiber optic was targeted to the POm (not the VPL), thus providing a secondary POm localization of the photo-inhibited region. This is now discussed in the revised manuscript.

      2) Calcium responses are weaker during the naïve state than the expert state (Fig.1D,E), similar to the start of the reversal training (Fig.4G,H). If POm encodes correct actions, why is there any response at all in naïve mice? Is that not also a sign of stimulus encoding? Might there be another correlate of correctness with regard to the task, such as an expert mouse holding their paw more firmly or still on the stimulating rod? This could alter the effective stimulus or involve different motor signals to POm.

      We agree with the reviewer that the POm is encoding the stimulus in the naïve state. This is evident in our study, and others, which show that the POm increases activity during sensory input in naïve mice. In the expert state, stimulus encoding may also be performed by a subset of POm axons, however, our findings show that, overall, there is a significant increase in the POm activity which is dependent on the behavioral performance (HIT, MISS), and not on the presentation of the stimulus. This is not due to licking motion as there was similar POm activity during the action and suppression tasks which involved licking and not licking for reward (Figure 3E). Furthermore, all experiments were monitored online via a behavioral camera to examine the location of the forepaw on the stimulus during all trials, and trials where the paw was not clearly resting on the stimulating rod were excluded from analysis. However, we cannot rule out that non-detectable changes in postures/paw grip may occur which may alter the effectiveness of the stimulus. This is now discussed in the revised manuscript.

      3) The authors are rightly concerned that licking might contribute to POm activity and expend some good effort checking this. The reversal is a good control, but doesn't produce identical POm activity. The other licking analyses, while good, did not completely rule out licking effects. First, lines 110-111 state "…as there was no correlation between licking frequency and POm axonal activity (Figure 1I)", but Fig.1I doesn't seem to support that statement. Second, the authors analyze isolated spontaneous licks, but these probably involve less licking and less overall motion than during a real response.

      We thank the reviewer for acknowledging the effort we made to assess the influence of licking behavior on POm axonal activity. We now include a more direct analysis in the revised manuscript illustrating the relationship between the licking response and POm activity. This analysis shows there is no correlation between licking and POm axonal activity (linear regression, p = 0.9228), further suggesting that POm axonal activity is not simply due to licking behavior.
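
      For readers unfamiliar with this kind of check, the sketch below shows how a least-squares slope and Pearson correlation between lick rate and axonal activity could be computed. It is a generic illustration: the function name and the numbers are hypothetical and are not data from the study, and the significance test reported above (the p value) would additionally require a statistics library.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, pearson_r) of an ordinary least-squares fit of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Made-up example values: lick rate (Hz) and evoked Ca2+ transient probability.
    lick_rate = [2.0, 4.0, 6.0, 8.0, 10.0]
    activity = [0.31, 0.28, 0.33, 0.29, 0.30]
    slope, intercept, r = fit_line(lick_rate, activity)
    print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.3f}")
```

      A slope and correlation close to zero would be in line with activity that does not scale with licking, although the absence of a linear relationship does not by itself exclude other forms of dependence.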

      4) Many figures (Fig.1F, 2B, 3C, 4C) make it apparent that a population of axons respond very early to the stimulus itself. I understand the authors point that many of their analyses show that on average the axons are not strongly modulated by this stimulus, but this is not true of every axon. Either some of these axons are coming from cells outside of POm (see #1) or some POm cells are stimulus driven. In either case, if some axons are strongly stimulus driven, the activity of these axons will correlate with correct choices. The stimulus and correct choices are themselves highly correlated because the animals perform so well. I do not understand how stimulus encoding and choice encoding can be disentangled by either behavior or the two behaviors in comparison. Simple stimulus encoding might be further modulated by arousal or reward expectation that increases with task learning (see #6).

      In this study, we are able to disentangle stimulus encoding and choice encoding by comparing the POm axonal activity across the different behavioral outcomes (HIT or MISS). Here, the same stimulus is always presented (tactile, 200 Hz); however, the mouse response differs. Despite receiving the same tactile stimulus, POm signaling in forepaw S1 is significantly increased during correct HIT trials compared with MISS trials in both the action and suppression tasks. Therefore, we do not believe POm axonal activity is predominantly encoding sensory information in this task. We agree with the reviewer that individual POm axons are heterogeneous and a subset of axons may respond to the sensory stimulus during the behavior. We now state this in the revised manuscript. However, if some axons are strongly stimulus driven, the activity of these axons should correlate with both correct and incorrect choices, as the same stimulus is also delivered during MISS trials. We now highlight this in the revised manuscript.

      Simple stimulus encoding might be further modulated by arousal or reward expectation that increases with task learning. In our study, the increase in POm activity during HIT behaviour was not due to elevated task engagement as, despite similar levels of arousal (Figure 4B), POm activity in expert mice differed in comparison to chance performance (switch behaviour; Figure 4G, H). This is now discussed in detail in the revised manuscript.

      5) I was unable to understand the author's conclusion about what POm is doing. They use terms like "behavioral flexibility" to describe its purpose, but the connection of this term to POm is not explained. Is a role as a flexibility switch really supported? Why does S1 need POm to signal a correct choice? Fig.6 did not seem helpful here. Couldn't S1 just detect the stimulus on its own and transmit consequent signals to wherever they need to be to generate behavior?

      We have now revised the manuscript to clearly define behavioral flexibility and to improve the clarity of our conclusions. We believe that S1 needs POm to signal a correct choice as behavior needs to be dynamically modulated at all times. If S1 simply detected the stimulus on its own and transmitted consequent signals to generate behavior, then important modulatory processes that lead to dynamic changes in behavior would not be processed. Along with other feedback projections, the POm targets the upper layers of the cortex, whereas external sensory information targets the layer 4 input layer. At the level of a single pyramidal neuron, this means POm input lands on the tuft dendrites whereas external sensory information lands on the proximal basal dendrites. This segregation of input provides a cellular mechanism for increasing the computational capabilities of neurons. Since the POm is most active in the expert state during correct behavior, we believe the POm plays a vital role in providing behaviorally relevant information. Our findings illustrate that the POm is not simply conveying a ‘Go’ signal, as POm activity was not increased during correct behavior at chance performance.

      6) Arousal or reward expectation may be better explanations than flexibility. Lines 323-324 say that POm activity increased with pupil diameter normally but reversed during reward delivery. Which data support this statement? With regards to pupil, the Results only seem to indicate that there is no difference in diameter between the two conditions (expert and 50% chance) using 3 bins of data. However, I could not find the time windows used for computing these. Pupil is known to be lagged and the timing could be critical.

      The statement that ‘POm activity increased with pupil diameter normally but reversed during reward delivery’ stems from data illustrated in Figure 1I and 3B. For space and flow of the manuscript, we weren’t able to show them on the same graph as per below. Here, you can see that during reward (blue), POm activity decreased compared to response (green) whereas the pupil diameter was maximum during reward delivery. We now include more information in the methods regarding pupil tracking (see line 908 to 916, Data analysis and statistical methods; Pupil tracking).

      7) There are other possible interpretations of the results when the authors target POm for optogenetic suppression (around lines 246-248). The effects here are also consistent with preventing tonic and evoked POm activity from reaching lots of target structures other than S1: S2, PPC, motor cortex, dorsolateral striatum, etc. Maybe one of these cannot respond to the stimulus as well and Hits decrease?

      We now include a discussion in the revised manuscript that ‘since the POm targets many cortical and subcortical regions (Alloway et al., 2017; Oh et al., 2014; Trageser and Keller, 2004; Yamawaki and Shepherd, 2015), target-specific photo-inhibition is required to illustrate which POm projection pathway specifically influences goal-directed behavior.’

      8) Line 689. What alerts the mouse that a catch trial is happening? Is there something like an audio cue for onset of stimulus trials and catch trials? If there is no cue, wouldn't mice be in a different behavioral state during catch trials than during stimulus trials? The trial types could differ by more than the presence of the stimulus.

      There is broadband noise during the trial that acts as a cue. This is detailed in the methods and text.

      Reviewer #2 (Public Review):

      In this manuscript, D LaTerra et al. explored the function of POm neurons during a tactile-based, goal-directed reward behavior. They target POm neurons that project to forepaw S1 and use two-photon Ca2+ imaging in S1 to monitor activity as mice performed a task where forepaw tactile stimulation (200 Hz, 500 ms) predicted a reward if mice licked at a reward port within 1.5 seconds. If mice did not lick, there was a time-out instead of a reward. The authors found that POm-S1 axons showed enhanced responses during the baseline period, the response window after the cue, and during reward delivery. They then showed that a subset of neurons were active during the response window during correct trials when the tactile stimulus served as a cue, but not on catch trials where animals spontaneously licked for a reward.

      They then showed that POm axonal activity in S1 increased during the response window for "HIT" trials, where animals correctly responded to the tactile stimulus with licking, but the activity was lower during "MISS" trials, where animals did not respond. In order to probe whether this activity in the response window was being driven by motor activity, they designed a suppression task in which animals had to learn to suppress licking in response to the tactile stimulus in order to receive a reward. POm neurons also showed increased activity during the response window even though action was being suppressed. However, this activity was lower than during the action task. Thus, although POm activity is not encoding action, its activity is significantly different during an action-based task than during an action suppression one. They then analyzed calcium activity during the training period between the action task and the suppression task, in which animals were learning the new contingency and were not performing as experts. In this non-expert context there was no difference in POm axonal activity between "HIT" and "MISS" trials.

      Lastly, they used ArchT to inhibit POm cell body activity during the tactile stimulus and response window of some trials and showed that they reduced performance during the trials when light was on.

      Altogether, this paper provides evidence that POm neurons are not simply encoding sensory information. They are modulated by learning and their activity is correlated to performance in this goal-directed task. However, the actual role of the POm input to S1 is not discernable from the current experiments. Subsets of neurons show significant activity during the response window as well as reward. In addition, the role of this input is different during the switch task than during expert performance. There are a number of outstanding questions, which, if answered, would help to directly define the role of these neurons in this specific paradigm. For instance, the authors record specifically from POm axons in S1. How distinct is this activity from other neurons in the POm? Some POm neurons still show significant activity during MISS trials. Do these neurons have a different function than those that show a preferential response during HIT trials? Does POm activity during the switch task, which has a component of extinction training, differ from when the animals are first learning the action-based task? Likewise, are the same neurons that acquire a response during the initial learning of the action-based task, the same neurons that are responding during the action suppression task?

      The authors provide strong evidence that POm neurons that project to S1 do not simply encode sensory information or actions, but are instead signaling during correct performance. However, inhibition of cell bodies did not dramatically affect performance and it is still unclear what role this circuit actually plays in this behavior. Finer-tuned optogenetic experiments and analysis of cell bodies within POm may provide greater detail that will help define this circuit's role.

      We thank the reviewer for their comments. We have now revised the manuscript to clearly state the role of the POm during the goal-directed behavioral tasks used in this study. We have provided more information regarding the range of activity patterns in POm axons within S1.

      The POm contains a heterogeneous population of neurons and, since it projects to multiple cortical and subcortical regions, the activity of POm axonal projections in S1 may indeed differ from that in other projection targets.

      The activity of POm axons during MISS behavior may have a different function than those that show a preferential response during HIT trials, however, this evoked rate is not significantly different to baseline and therefore is hard to differentiate from spontaneous activity (see Figure 2). Furthermore, the evoked rate of POm activity during the switch task is not significantly different compared to naïve mice (p = 0.159; Kruskal-Wallis test). This information is now included in the manuscript.

      It is unknown whether the same neurons that acquire a response during the initial learning of the action-based task are the same neurons that are responding during the action suppression task as we were unable to conclusively determine whether or not the same POm axons were imaged in the different protocols.

      Reviewer #3 (Public Review):

      In their paper "Higher order thalamus flexibly encodes correct goal-directed behavior", LaTerra et al. investigate the function of projections from the thalamic nucleus POm to primary somatosensory cortex (S1) in the performance of goal-directed behaviors. The authors performed in vivo calcium imaging of POm axons in layer 1 of the forepaw region of S1 (fpS1) to monitor the activity of POm-fpS1 projections while mice performed a tactile detection task. They report that the activity of POm-fpS1 axons on successful ('hit') trials was increased in trained mice relative to naïve mice. Additionally, the authors used an action suppression variant of the task to show that POm-fpS1 axon activity was higher on successful trials over unsuccessful ('miss') trials regardless of the correct motor response required. During transition between task conditions, when mice perform at chance levels, the increase of POm-fpS1 activity during correct trials is no longer seen. Finally, the authors use inhibitory optogenetic tools to suppress POm activity, revealing a modest suppression in behavioral success. The authors conclude from these data that POm-fpS1 axons preferentially "encode and influence correct action selection" during tactile goal-oriented behavior.

      This study presents several interesting findings, particularly with respect to the change in activity of POm-fpS1 axons during successful execution of a trained behavior. Additionally, the similarity in responses of POm-fpS1 on both the 'goal-directed action' and 'action suppression' tasks provides convincing evidence that POm-fpS1 activity is not likely to encode the motor response. Overall, these results have important implications for how activity in higher order thalamic nuclei corresponds to learning a sensorimotor behavior, and the authors use several clever experiments to address these questions. Yet, the major claim that POm encodes 'correct performance' should be defined more clearly. As is, there are alternative explanations that could be raised and should be discussed in more depth (Points 1), especially as it relates to any causal role the authors ascribe to POm (Point 2). In addition some clarification as to which types of signals (i.e. frequency of active axons vs. amplitude of signal in the active axons) the authors feel are most informative would be helpful (Point 3).

      We thank the reviewer for their helpful comments and assessment of our study. We have now addressed all comments and revised the manuscript accordingly.

      1) The authors argue that POm activity reflects 'correct task performance' and that the increased activity of POm-fpS1 axons in the response epoch is not due to sensory encoding. An alternative explanation is that POm-fpS1 axons do convey sensory information, and these connections are facilitated with learning - meaning the activity of pathways conveying sensory signals that are correlated with task success could be facilitated with training, and this facilitation could be disrupted during the switching task. In this sense, the activity profiles do not encode 'correct action' per se, but rather represent the sensory responses whose correlation to rewarded action have been reinforced with training (which would also be a very interesting finding). This would be quite distinct from the "cognitive functions" they ascribe to this pathway (line 341). It might have helped to introduce a delay period in between the sensory stimulus and response epoch to try to distinguish responses that encode information about the sensory stimulus from those that might be involved in encoding task performance. However, as is, it is difficult to distinguish between these two scenarios with this data, and thus the interpretations the authors present could be rephrased with alternatives discussed in more depth.

      Based on multiple findings within this study, we suggest that the POm does not predominantly encode sensory information. This is most evident when comparing POm activity during correct (HIT) and incorrect (MISS) behavior in both the action and suppression tasks. As shown in Figures 2 and 3, there is a considerable difference in activity during correct (HIT) and incorrect (MISS) trials, even though the same stimulus was delivered in both trial types. This finding suggests that POm axons do not convey sensory information which is facilitated with learning as, if this were true, it could be expected that HIT and MISS responses would be similarly increased in expert (HIT and MISS) compared to naïve mice. This is now discussed in detail in the revised manuscript.

      We agree that it would have been beneficial to separate the stimulus from the response period in the behavioral paradigm. However, to avoid engaging working memory, we did not wish to enforce a delay period in this study. We have, in another study, enforced a short delay period (500 ms) between the stimulus and response epoch. Here, the evoked rate of POm axonal activity in expert mice was three-fold greater in the (now clearly separated) response epoch compared to the stimulus epoch (0.30 ± 0.02 vs. 0.099 ± 0.01, n = 196 dendrites; p < 0.0001; Wilcoxon matched-pairs signed rank test). Although outside the scope of this study, these unpublished results provide further confirmation of, and confidence in, the analysis performed and conclusions made in this study.

      2) Similarly, while the authors attempt to establish a causal role for POm in task performance by optogenetically inhibiting POm during the response epoch, the results are also consistent with a deficit in sensory processing, and cannot be interpreted strictly as a disruption of the encoding of 'correct action' task performance signals. Furthermore, these perturbation studies do not demonstrate that the POm-fpS1 projections they are studying are implicated in the modest behavioral deficits. As the authors state, POm projects to many targets (lines 63-66), and similar sensory-based, goal-directed behaviors do not require S1 (lines 302-305). In light of these points, some of the statements ascribing a causal role for these projections in task success could be rephrased (e.g. line 33 "to encode and influence correct action selection", line 252 "a direct influence", line 340 "plays an active role during correct performance").

      We agree that the decrease in correct performance during optogenetic inhibition of POm cell bodies may also be explained by a deficit in sensory processing. However, in this study, we went to great lengths to illustrate that the POm is encoding correct action, and not sensory information (detailed in response to 1). This is further expanded upon in the revised manuscript. We also agree that the perturbation studies do not directly demonstrate that the POm to S1 projections are driving the behavioral deficits. We therefore only refer to the POm itself when discussing the influence on behavior and we have now revised the manuscript accordingly.

      3) Event amplitude and probability were both quantified, but were not consistently reported throughout the manuscript and figures. For example, Figure 1 reports both probability and amplitude (Figure 1G and H), whereas Figure 2 only reports probability. Thus, it was not always clear as to whether the authors were ascribing biological significance to one or both of these measures, given that in some cases differences were found in one and not the other, and which of the measures were reported was occasionally switched. It would be helpful for the authors to clarify the significance they assign to each measure, and report both measures side by side for all experiments if they interpret them both as relevant.

      We thank the reviewer for this observation and have now included a statement discussing the reporting of Ca2+ transient probability and/or amplitude in the methods. Throughout the Figures, we typically illustrated probability of an evoked transient as this is a reliable measure which was dramatically altered within this study. We now report the Ca2+ transient peak amplitudes during HIT and MISS trials for direct comparison of both measures (Figure 2).

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors build off their previous data where they have identified differences in the sst1 locus as responsible for differences in susceptibility of B6 and C3HeB/Fej mice to Mycobacterium tuberculosis infection. The authors have previously shown that this susceptibility is attributed to higher levels of type I IFN signaling and in particular, the ISG IL-1Ra. The sst1 locus contains many genes that could be contributing to the differential susceptibility in C3HeB/Fej mice, and the model in the field was that differences in Sp110 expression were a likely candidate to explain the susceptibility. However, in this manuscript, the authors show that it is not lower expression of Sp110, but instead decreased expression of another gene in the sst1 locus, Sp140, that contributes to the increased susceptibility of mice carrying the sst1S sequence to bacterial infections. This is a very significant and surprising finding, supported by very clear and convincing data from experiments performed with a high level of rigor. Although identification of the gene responsible for differences in susceptibility and outcomes during bacterial infections is an advance for the field, the manuscript stops there in terms of new insight and falls short of providing any additional information beyond what has already been published regarding how this gene or locus is functioning to regulate immune responses to infection. This limited scope embodies the major concern for this otherwise strong manuscript.

      We thank the reviewer for recognizing the importance of our discovery that loss of Sp140 (not Sp110) confers susceptibility to M. tuberculosis. Our generation of Sp140 deficient mice allows us to demonstrate, for the first time, that Sp140 is a negative regulator of type I IFNs. By generating crosses between Sp140–/– and Ifnar–/– mice, we further demonstrate that type I IFNs mediate the susceptibility of Sp140–/– mice to M. tuberculosis and Legionella. The reviewer appears to believe that, because IFNs were previously shown to mediate the phenotype of Sst1S mice, the function of Sp140 was somehow already known. By contrast, we feel that the function of Sp140 was not at all clear prior to our work, and that our work does indeed provide important mechanistic insight into the function of Sp140 as a regulator of type I IFNs. Sst1S mice contain many genetic differences compared to B6 mice. It is only because of our work that we can now go back and reinterpret the prior work on Sst1S mice; this would not be possible without the work we have reported in this paper. Of course we would love to be able to describe more about the molecular mechanism by which Sp140 represses interferon transcription. This is indeed something we are working on. However, our preliminary experiments indicate this is not likely to be straightforward and will require considerable effort that is certainly beyond the scope of this current paper. It should be noted, for example, that Sp140 is in the same protein family as the well-known transcriptional regulator Aire. The mechanism by which Aire regulates gene expression has been studied for almost two decades and is still not entirely clear (and was certainly not clear in the initial foundational paper on Aire function published by Anderson et al in Science in 2002). We expect the mechanism of Sp140 to be similarly complex. Importantly, we now know for the first time which protein to study mechanistically, i.e., SP140 instead of SP110.

      Reviewer #2 (Public Review):

      The authors have suggested the importance of SP140 for resistance to Mtb, Legionella infections in mice. They also provide evidence for IFNaR signalling in mediating the increased susceptibility of SP140-/- mice. While they attribute an important function of the transcriptional regulator SP140 to regulation of type I IFN responses by demonstrating the dysregulation of these responses in the SP140-/- mice, more direct evidence for this is needed.

      We appreciate the reviewer’s succinct summary of the main conclusions of our manuscript. While we would agree that there is more to learn about the mechanism of SP140 function, it is not entirely clear to us what the reviewer means when they say that more “direct” evidence is needed for our claim that Sp140 regulates the IFN response during bacterial infection. We feel that the genetic experiments we provide are clear on this point. The reviewer may be thinking that we are proposing a specific mechanism, e.g., that our model is that Sp140 regulates IFN production by binding to the IFN beta gene; although that is an appealing possibility, we agree that is not shown in our manuscript, and indeed, we are careful not to make any such claim. Indeed, we explicitly state that a more indirect mechanism is possible (line 390). What is clear, though, is that loss of Sp140 mediates susceptibility to infection via (direct or indirect) increases in type I IFN. We observe increased type I IFN responses in Sp140–/– mice in vivo, and moreover, we find that a cross of Sp140–/– mice to Ifnar–/– mice reverses susceptibility to infection. These results demonstrate that the dysregulation of type I IFN in the absence of Sp140 is not merely correlative, but in fact drives susceptibility to bacterial infection in vivo.

      Reviewer #3 (Public Review):

      In this manuscript Ji et al carefully examine candidate genes driving a previously described susceptibility within the severe susceptibility to tuberculosis (sst1) locus. Surprisingly, mice deficient in the original candidate gene within this locus, SP110, showed no change in susceptibility to infection with M. tuberculosis. In contrast, the authors found that loss of a second gene in this locus, SP140, recapitulated many phenotypes seen in the SST1 mouse, including increased Type I IFN. SP140 susceptibility was reversed by blocking these exacerbated type I IFNs, similar to SST1 mice. RNAseq analysis identifies changes in pro-inflammatory cytokines and type I IFNs. The strengths of this paper are the careful and controlled experiments to target and analyze mouse mutants within a notoriously challenging region with homopolymers. Their results are robust, convincing and will be of broad interest to the field of immunology and host-pathogen interactions. Convincingly identifying a single gene within this region that recapitulates many aspects of the SST1 mouse is very important. While a minor weakness is the lack of any mechanistic understanding of how SP140 functions, this is overcome by the impact of the other findings and it is anticipated that this mouse will now be a key resource to dissect the mechanisms of susceptibility in much greater detail.

      We thank the reviewer for their generous evaluation. Mechanistically, we do show that Sp140 affects resistance to bacterial infection via regulation of the interferon response, which we think is an important and technically non-trivial advance that provides insight into the function of Sp140. However, we agree that the mechanism for how Sp140 regulates type I IFN is not shown (nor is it claimed to be shown) and addressing this mechanism is now an important and exciting question for future studies.

    1. Author Response:

      Reviewer #1 (Public Review):

      The paper by Ma et al. uses a combination of proteomics, morpholinos and inhibitor studies to organize a pathway by which matriptase overactivation in Hai-1 mutants leads to epidermal clumps and defects, peroxidation, and inflammation. They find that Hai-1 mutants have upregulated H2O2, calcium signalling and pERK activation, which are mediated through Gq and RSK. Other studies have suggested how Hai-1 mutants over-activate matriptase to cause epidermal clumping and shedding, associated with increased inflammation. This study uses a series of morpholino and inhibitor studies to more mechanistically order this pathway. Understanding the downstream pathway of matriptase activation, upregulated in wound healing and cancer, could reveal a better understanding of its roles in these processes. Overall, it is an interesting study, logically laid out, with convincing data.

      The model at the end and the discussion propose that the inflammation and the epidermal pathways are in parallel, but it seems more likely that inflammation results directly from epidermal defects, which should act like a wound. If this is true and if these embryos soak up more fluorescent dextran compared to wild type embryos, it would seem that this could be a linear rather than parallel pathway.

      We have tested this using fluorescein dextran and methylene blue permeability assays (Zhang, J et al., (2015) Exp Dermatol, 24: 605-610; Richardson, R., et al. (2013) The Journal of Investigative Dermatology, 133(6), 1655–1665). Whilst larval fin wounds show strong uptake of the dyes, we were unable to show robust staining in hai1a mutants. These data have been added as Figure 9 - figure supplement 2. The lack of overt dye penetration made it difficult to draw conclusions from this, as it is still possible that there are permeability problems that we cannot detect through these assays. We think more pertinent is the fact that we can rescue the epithelial defects robustly (e.g. with MAPK or PKC inhibition) without completely rescuing the inflammation phenotype, whereas a complete rescue would be expected if inflammation were purely a consequence of epithelial defects. Finally, new data added in response to a request by reviewer #2 show that the increase in Ca++ and H2O2 occurs very early, prior to epidermal phenotypic presentation (new panels 1L and 3H). As these are well-described pro-inflammatory drivers in zebrafish, we believe this adds to the case that the inflammation is, to some extent, independent of the epithelial defects. We thus think it unlikely that inflammation in hai1a is solely due to epithelial defects. However, there is clear evidence from us and others that PKC and MAPK activation can also promote inflammation. We have added remarks on this to the discussion.

    1. Author Response:

      Reviewer #2 (Public Review (required)):

      Using high-speed holographic methodology, the swimming trajectories of two Leishmania life cycle stages are measured. Significant differences between the life stages become apparent. In addition, the authors show in a chemotaxis experiment that the infectious metacyclics respond chemotactically to the presence of macrophages.

      The physics part of the study is flawless, and the holography is very impressive, especially in view of the comparatively simple setup. The analysis and presentation of the data is also flawless.

      What is not so clear is the biological interpretation of the data. Chemotactic behavior has been repeatedly postulated for Leishmania, trypanosomes, and other parasites. However, there have been no experiments to date that allow conclusions to be drawn about in vivo relevance. Unfortunately, this does not really change with this study.

      It has been shown in trypanosomes that the swimming behavior of different species and life stages is influenced by the mechanical conditions of their microenvironments. Viscosity, obstacles, and hydrodynamics can all play a critical role in determining motility. These factors are ignored in the study. Cell culture medium with the viscosity of water cannot reflect the situation in the vector or body fluids such as blood or lymph. A chemotactic gradient such as the one generated here by rather simple means cannot arise at all in vivo, simply because everything is in flux and parasites and macrophages move continuously. Moreover, one may wonder why Leishmania should actively move chemotactically toward macrophages when they come into contact with target cells much more rapidly by chance due to self-stirring properties of body fluids. I am not questioning the finding at all. I am merely questioning its biological relevance. Perhaps it would be better to describe this aspect of the paper more cautiously and to discuss it quite openly and critically. Otherwise, the result might enter our knowledge as evidence for biologically relevant chemotaxis, and that would be problematic.

      We thank the reviewer for their perspective and agree that providing formal evidence for chemotaxis in vivo is complicated. The reviewer is right that mechanical stimulus, viscosity, elasticity etc. are present in body tissues, and that they will affect the motion of the flagellum, and that there is evidence that physical obstructions interrupt the flagellar beat (though ‘stirring’ does not play a role in Leishmania’s motion through tissue). At any rate, we contend that an in vitro study such as ours decouples the mechanical heterogeneity of the in vivo environment from the parasite’s cellular response. If a chemotactic response is present in the parasite, then it will be most sensitively and uniquely tested in an isotropic environment such as a bulk Newtonian fluid - indeed, this is what we find. Chemical gradients are known to occur and persist in cutaneous infections, as damage to tissue, sand fly saliva and Leishmania-derived molecules have been shown to recruit immune cells by this mechanism - we have added references and words to this effect on lines 211-214.

      Reviewer #3 (Public Review (required)):

      The authors describe a clever and powerful assay to show chemotactic behavior in metacyclic Leishmania, which is an important result. The data seem mostly solid, but some results are confusing (perhaps partly an issue with presentation?) and overall conclusions seem like they need to be toned down a little. It is expected that this work will have long-lasting impact on the research community, and the new methods developed will be widely utilized.

      Major concerns:

      • "Pre-Adaptation", e.g. lines 149-150: A major message of the work is to suggest that motility behavior and chemotaxis is a "pre-adaptation". However, I don't agree that the current studies show that "…flagellar motility is a …preadaptation to infection of human hosts." What are the data to support this? The authors do a very good job of defining motility features of PCF and META forms, including quantitative analysis of motility features in 3D. They find that motility differs in PCF vs META forms. They also demonstrate chemotaxis in META forms. But, I don't see how these combined results demonstrate a "pre-adaptation" to infection of human hosts. As such, the "pre-adaptation" statement should be moved to speculation. Notably, I did not see tests for chemotaxis in PCF. Thus, it is not even formally demonstrated whether chemotaxis itself is an "adaptation" specific to META forms, or rather (and quite likely) is a fundamental property of all life cycle stages.

      o To test if chemotaxis is an 'adaptation', the authors would need to provide an analysis of PCFs. To be an adaptation, one would expect to find either that PCFs do not exhibit chemotaxis, or that they do not chemotax toward macrophages in the assay used. Without this, the authors cannot say whether chemotaxis is a stage-specific behavior, much less a "pre"-adaptation.

      We have moderated the language around claims of ‘pre-adaptation’ (please see next point for locations), and provided additional results from chemotaxis assays in PCF. Consistent with previous studies (e.g. Oliveira et al, Exp. Parasitol. (2000), Leslie et al., Exp. Parasitol. (2002), Barros et al., Exp. Parasitol. (2006)), we find a different chemo/osmotactic response in which PCF cells are drawn towards the agar in the pipette tip even in the absence of an embedded stimulant such as macrophages. We speculate that this result is due to the presence of small carbohydrate molecules from the unrefined agar, and note that the response is distinct from that of META, which show no such attraction. However, as suggested, this has been made more speculative in the revised discussion.

      o Note, I think the work would not be negatively affected if the whole concept of "adaptation" were omitted and the work was framed around the very important results of developing a new and powerful approach to investigate Leishmania motility in 3D; quantitative definition of motility parameters; demonstration of chemotaxis in META forms.

      We thank the reviewer for their suggestion (and their positive words), and have modified the language around claims of pre-adaptation. We have rephrased the claims in the abstract, and around lines 188-90 in the summary/conclusions.

      • Chemotaxis: The work would benefit from some commentary on chemotaxis in kinetoplastids. A 'suggestion' for a potential advantage provided by chemotaxis (lines 153-155) is not unwarranted, but that should be kept to speculation at this point, and the implication that this is an 'adaptation' is not supported by the current data. With the report of chemotaxis being a major message, the paper would benefit from a brief discussion on what's been demonstrated regarding chemotaxis in trypanosomatids, as this is an important, yet under-represented area of research on these organisms. Without this, the novelty and significance of the authors' rigorous, novel and very interesting work are not brought out.

      We thank the reviewer for this suggestion, and have added another paragraph to the introduction (lines 53-81), giving additional context to our results by providing an overview of more experiments in the field. We have also changed the word ‘suggest’ to ‘speculate’ in the summary and conclusions (line 243).

      • Lines 125 - 129: How is it that tumble frequency decreases, but run duration is unaffected? I would think that less frequent tumbles would lead to longer runs? This warrants more comment.

      We thank the reviewer for pointing out the apparent confusion here. This stems from the fact that (as stated in the subsequent sentence) in the majority of the population, the tumble rate is significantly suppressed, to either one or zero tumbles per track. We require at least two tumbles per track to measure run duration, so the small fraction of the population unaffected by the stimulus contributes the bulk of the measurable runs. We have revised this section of the text to explain how we measure run duration.
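
      The point made in this response can be illustrated with a small sketch. It assumes that a run is the interval between two consecutive tumbles within one track, which is our reading of "at least two tumbles per track"; the function names and the toy tracks are hypothetical and are not taken from the authors' analysis code.

```python
from statistics import mean


def run_durations(tumble_times):
    """Intervals between consecutive tumbles in one track.

    Returns an empty list when the track has fewer than two tumbles,
    because no complete run can be measured in that case.
    """
    ordered = sorted(tumble_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def summarize(tracks):
    """Pooled tumble rate and mean run duration.

    tracks: list of (track_length_s, [tumble times in s]) tuples.
    """
    total_time = sum(length for length, _ in tracks)
    total_tumbles = sum(len(times) for _, times in tracks)
    runs = [r for _, times in tracks for r in run_durations(times)]
    return {
        "tumble_rate": total_tumbles / total_time,
        "mean_run": mean(runs) if runs else None,
        "n_runs": len(runs),
    }


# Toy data (hypothetical): every control track tumbles four times.
control = [(10.0, [2.0, 4.0, 6.0, 8.0])] * 4
# Toy data (hypothetical): most stimulated tracks tumble once or never,
# and one unaffected track behaves like the controls.
stimulated = [
    (10.0, [5.0]),
    (10.0, []),
    (10.0, [3.0]),
    (10.0, [2.0, 4.0, 6.0, 8.0]),
]

if __name__ == "__main__":
    print(summarize(control))
    print(summarize(stimulated))
```

      In this toy case the pooled tumble rate falls while the mean run duration is unchanged, because only the unaffected track contributes measurable runs.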

      • Fig 3 and Lines 135-139: How does one reconcile the finding that murine macrophages and human macrophages both induce taxis toward the pipet tip (3A), but there is opposite impact on speed profiles, with murine macrophages causing slower speeds, and human macrophages causing faster speeds (3H,K vs 3I,L)? Perhaps analysis done for human macrophages must also be done for murine macrophages. Some more commentary, and analysis needs to be provided on this point.

      We thank the reviewer for this suggestion, and in the light of their comments, we have revised our description of the murine data, highlighting that the results are not statistically significant. To further emphasise this point to the reader, we have recast the error bars in figure 3a in terms of 95% confidence intervals rather than using the standard error on the mean, as in the previous version. Although one may be calculated directly from the other without any further assumptions, the 95% CI representation might be more familiar to the readership. In this light, the fairly modest decrease in average swimming speed (also seen in absolute terms in the DMEM case) reinforces the revised conclusion that the null hypothesis (META are not stimulated by mMφ) cannot be rejected.
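
      For readers who want to see the conversion mentioned in this response, the following is a minimal sketch. It uses the normal approximation (a 95% interval of about ±1.96 standard errors), which is an assumption of this sketch; the response does not state which critical value was used, and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Two-sided 95% critical value of the standard normal (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def standard_error(values):
    """Standard error of the mean, from the sample standard deviation."""
    return stdev(values) / math.sqrt(len(values))


def ci95(center, sem):
    """95% confidence interval around `center` under a normal approximation."""
    half_width = Z_95 * sem
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical swimming speeds, used only to exercise the functions.
    speeds = [18.2, 20.1, 19.4, 21.0, 17.8, 19.9]
    m = mean(speeds)
    sem = standard_error(speeds)
    print("mean:", m, "SEM:", sem, "95% CI:", ci95(m, sem))
```

      For small samples, a Student's t critical value would give a wider interval than the normal approximation used here.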

      • Regarding replicates: While the number of cells tracked are clearly indicated, I did not see a description of how many different chambers were imaged for each condition, or how many different fields per chamber.

      This has been amended in the Methods section, under the subheading “Chemotaxis Assay”.

    1. Author Response:

      Evaluation Summary:

      The authors assessed multivariate relations between a dimensionality-reduced symptom space and brain imaging features, using a large database of individuals with psychosis-spectrum disorders (PSD). Demonstrating both high stability and reproducibility of their approaches, this work shows promise that diagnosis or treatment of PSD can benefit from the proposed data-driven brain-symptom mapping framework. It is therefore of broad potential interest across cognitive and translational neuroscience.

      We are very grateful for the positive feedback and the careful read of our paper. We would especially like to thank the Reviewers for taking the time to read this lengthy and complex manuscript and for providing their helpful and highly constructive feedback. Overall, we hope the Editor and the Reviewers will find that our responses address all the comments and that the requested changes and edits improved the paper.

      Reviewer 1 (Public Review):

      The paper assessed the relationship between a dimensionality-reduced symptom space and functional brain imaging features based on the large multicentric data of individuals with psychosis-spectrum disorders (PSD).

      The strength of this study is that i) in every analysis, the authors provided high-level evidence of reproducibility in their findings, ii) the study included several control analyses to test other comparable alternatives or independent techniques (e.g., ICA, univariate vs. multivariate), and iii) correlating to independently acquired pharmacological neuroimaging and gene expression maps, the study highlighted neurobiological validity of their results.

      Overall the study has originality and several important tips and guidance for behavior-brain mapping, although the paper contains heavy descriptions about data mining techniques such as several dimensionality reduction algorithms (e.g., PCA, ICA, and CCA) and prediction models.

      We thank the Reviewer for their insightful comments and we appreciate the positive feedback. Regarding the descriptions of methods and analytical techniques, we have moved these descriptions out of the main Results text and figure captions. Detailed descriptions are still provided in the Methods, so that they do not detract from the core message of the paper but can still be referenced if a reader wishes to look up the details of these methods within the context of our analyses.

      Although relatively minor, I also have a few points on the weaknesses, including i) an incomplete description about how to tell the PSD effects from the normal spectrum, ii) a lack of overarching interpretation for other principal components rather than only the 3rd one, and iii) somewhat expected results in the stability of PC and relevant indices.

      We are very appreciative of the constructive feedback and feel that these revisions have strengthened our paper. We have addressed these points in the revision as follows:

      i) We are grateful to the Reviewer for bringing up this point as it has allowed us to further explore the interesting observation we made regarding shared versus distinct neural variance in our data. It is important not to confuse the neural PCA (i.e. the independent neural features that can be detected in the PSD and healthy control samples) with the neuro-behavioral mapping. In other words, both PSD patients and healthy controls are human and therefore there are a number of neural functions that both cohorts exhibit that may have nothing to do with the symptom mapping in PSD patients. These include, for instance, basic regulatory functions such as control of cardiac and respiratory cycles, motor functions, vision, etc. We hypothesized therefore that there are more common than distinct neural features that are on average shared across humans irrespective of their psychopathology status. Consequently, there may only be a ‘residual’ symptom-relevant neural variance. Therefore, in the manuscript we bring up the possibility that a substantial proportion of neural variance may not be clinically relevant. If this is in fact true then removing the shared neural variance between PSD and CON should not drastically affect the reported symptom-neural univariate mapping solution, because this common variance does not map to clinical features and therefore is orthogonal statistically. We have now verified this hypothesis quantitatively and have added extensive analyses to highlight this important observation made by the Reviewer. We first conducted a PCA using the parcellated GBC data from all 436 PSD and 202 CON (a matrix with dimensions 638 subjects x 718 parcels). We will refer to this as the GBC-PCA to avoid confusion with the symptom/behavioral PCA described elsewhere in the manuscript. This GBC-PCA resulted in 637 independent GBC-PCs. Since PCs are orthogonal to each other, we then partialled out the variance attributable to GBC-PC1 from the PSD data by reconstructing the PSD GBC matrix using only scores and coefficients from the remaining 636 GBC-PCs ($\widehat{GBC}_{woPC1}$). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. The results are shown in Fig. S21 and reproduced below. Removing the first PC of shared neural variance (which accounted for about 15.8% of the total GBC variance across CON and PSD) from PSD data attenuated the statistics slightly (not unexpected as the variance was by definition reduced) but otherwise did not strongly affect the univariate mapping solution.
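
      The reconstruction step described above can be sketched in a few lines of NumPy. This is an illustration on synthetic data, not the authors' pipeline: the function names (`remove_leading_pcs`, `univariate_betas`), the use of an SVD to obtain the PCA, and the plain least-squares slope standing in for the univariate regression are all assumptions made for the example.

```python
import numpy as np


def remove_leading_pcs(all_data, target_rows, n_remove=1):
    """Rebuild selected rows without the first `n_remove` principal components.

    all_data:    (subjects x parcels) matrix used to fit the PCA
                 (in the text above: PSD and CON subjects together).
    target_rows: indices of the rows to reconstruct (e.g. the PSD subjects).
    Returns the reconstructed rows and the explained-variance ratio per PC.
    """
    col_mean = all_data.mean(axis=0)
    centered = all_data - col_mean
    # SVD of the centered matrix: rows of vt are the PC coefficients,
    # and u * s gives the PC scores for every subject.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s
    # Keep only the scores and coefficients of the remaining PCs.
    rebuilt = scores[target_rows, n_remove:] @ vt[n_remove:, :]
    explained = s**2 / np.sum(s**2)
    return rebuilt + col_mean, explained


def univariate_betas(symptom_score, gbc):
    """Least-squares slope of each parcel's GBC on one symptom score."""
    x = symptom_score - symptom_score.mean()
    y = gbc - gbc.mean(axis=0)
    return (x @ y) / (x @ x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in: 60 subjects x 40 parcels, first 40 rows "patients".
    data = rng.normal(size=(60, 40))
    patients = np.arange(40)
    score = rng.normal(size=patients.size)  # hypothetical symptom PC score

    without_pc1, explained = remove_leading_pcs(data, patients, n_remove=1)
    beta_original = univariate_betas(score, data[patients])
    beta_without = univariate_betas(score, without_pc1)
    # Similarity of the two maps across parcels, as in panel C of the figures.
    similarity = np.corrcoef(beta_original, beta_without)[0, 1]
    print("variance explained by PC1:", explained[0])
    print("map correlation:", similarity)
```

      Setting `n_remove` to 2, 3 or 4 corresponds to the repeated analyses with more components removed.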

      We repeated the symptom-neural regression next with the first 2 GBC-PCs partialled out of the PSD data (Fig. S22), with the first 3 PCs parsed out (Fig. S23), and with the first 4 neural PCs parsed out (Fig. S24). The symptom-neural maps remain fairly robust, although the similarity with the original $\beta_{PC}GBC$ maps does drop as more common neural variance is parsed out. These figures are also shown below:

      Fig. S21. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first neural PC parsed out. If a substantial proportion of neural variance is not clinically relevant, then removing the shared neural variance between PSD and CON should not drastically affect the reported symptom-neural univariate mapping solution, because this common variance will not map to clinical features. We therefore performed a PCA on CON and PSD GBC to compute the shared neural variance (see Methods), and then parsed out the first GBC-PC from the PSD GBC data ($\widehat{GBC}_{woPC1}$). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The first GBC-PC accounted for about 15.8% of the total GBC variance across CON and PSD. Removing GBC-PC1 from PSD data attenuated the $\beta_{PC1}GBC$ statistics slightly (not unexpected as the variance was by definition reduced) but otherwise did not strongly affect the univariate mapping solution. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S22. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first two neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first two GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-2}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S23. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first three neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first three GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-3}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S24. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first four neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first four GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-4}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      For comparison, we also computed the $\beta_{PC}GBC$ maps for control subjects, shown in Fig. S11. In support of the $\beta_{PC}GBC$ maps in PSD being circuit-relevant, we observed only mild associations between GBC and PC scores in healthy controls:

      Results: All 5 PCs captured unique patterns of GBC variation across the PSD (Fig. S10), which were not observed in CON (Fig. S11). ... Discussion: On the contrary, this bi-directional “Psychosis Configuration” axis also showed strong negative variation along neural regions that map onto the sensory-motor and associative control regions, also strongly implicated in PSD (1, 2). The “bi-directionality” property of the PC symptom-neural maps may thus be desirable for identifying neural features that support individual patient selection. For instance, it may be possible that PC3 reflects residual untreated psychosis symptoms in this chronic PSD sample, which may reveal key treatment neural targets. In support of this circuit being symptom-relevant, it is notable that we observed a mild association between GBC and PC scores in the CON sample (Fig. S11).

      ii) In our original submission we spotlighted PC3 because of its pattern of loadings on to hallmark symptoms of PSD, including strong positive loadings across Positive symptom items in the PANSS and conversely strong negative loadings on to most Negative items. It was necessary to fully examine this dimension in particular because these are key characteristics of the target psychiatric population, and we found that the focus on PC3 was innovative because it provided an opportunity to quantify a fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. This is a powerful demonstration of how data-driven techniques such as PCA can reveal properties intrinsic to the structure of PSD-relevant symptom data which may in turn improve the mapping of symptom-neural relationships. We refrained from explaining each of the five PCs in detail in the main text as we felt that it would further complicate an already dense manuscript. Instead, we opted to provide the interpretation and data from all analyses for all five PCs in the Supplement. However, in response to the Reviewers’ thoughtful feedback that more focus should be placed on other components, we have expanded the presentation and discussion of all five components (both regarding the symptom profiles and neural maps) in the main text:

      Results: Because PC3 loads most strongly on to hallmark symptoms of PSD (including strong positive loadings across PANSS Positive symptom measures and strong negative loadings onto most Negative measures), we focus on this PC as an opportunity to quantify an innovative, fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional symptom axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. We found that the PC3 result provided a powerful empirical demonstration of how using a data-driven dimensionality-reduced solution (via PCA) can reveal novel patterns intrinsic to the structure of PSD psychopathology.

      iii) We felt that demonstrating the stability of the PCA solution was extremely important, given that this degree of rigor has not previously been tested using broad behavioral measures across psychosis symptoms and cognition in a cross-diagnostic PSD sample. Additionally, we demonstrated reproducibility of the PCA solution using independent split-half samples. Furthermore, we derived stable neural maps using the PCA solution. In our original submission we show that the CCA solution was not reproducible in our dataset. Following the Reviewers’ feedback, we computed the estimated sample sizes needed to sufficiently power our multivariate analyses for stable/reproducible solutions, using the methods in (3). These results are discussed in detail in our resubmitted manuscript and in our response to the Critiques section below.
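A split-half reproducibility check of the kind mentioned here can be sketched as follows. This is only an illustration under stated assumptions: the subjects are split at random into two non-overlapping halves, PCA is fit to each half separately, and matching components are compared by the absolute cosine similarity of their loading vectors. The helper names and the synthetic data are hypothetical, and the authors' actual similarity metric is not specified in this response.

```python
import numpy as np

def pca_loadings(data, n_components):
    """Return the first n_components principal axes (rows) of the data."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]

def split_half_similarity(data, n_components, rng):
    """Fit PCA on two random non-overlapping halves and compare loadings."""
    idx = rng.permutation(data.shape[0])
    half = data.shape[0] // 2
    a = pca_loadings(data[idx[:half]], n_components)
    b = pca_loadings(data[idx[half:]], n_components)
    # Absolute value because the sign of a principal axis is arbitrary.
    return np.abs(np.sum(a * b, axis=1))

# Synthetic symptom-like data with two strong latent dimensions (hypothetical).
rng = np.random.default_rng(0)
loadings, _ = np.linalg.qr(rng.normal(size=(10, 2)))     # orthonormal columns
latent = rng.normal(size=(400, 2)) * np.array([5.0, 3.0])
data = latent @ loadings.T + rng.normal(size=(400, 10))
similarity = split_half_similarity(data, n_components=2, rng=rng)
```

Values near 1 indicate that a component's loading pattern is recovered in both halves. In practice the split would be repeated many times to obtain a distribution rather than a single estimate.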

      Reviewer 2 (Public Review):

      The work by Ji et al is an interesting and rather comprehensive analysis of the trend of developing data-driven methods for developing brain-symptom dimension biomarkers that bring a biological basis to the symptoms (across PANSS and cognitive features) that relate to psychotic disorders. To this end, the authors performed several interesting multivariate analyses to decompose the symptom/behavioural dimensions and functional connectivity data. To this end, the authors use data from individuals from a transdiagnostic group of individuals recruited by the BSNIP cohort and combine high-level methods in order to integrate both types of modalities. Conceptually there are several strengths to this paper that should be applauded. However, I do think that there are important aspects of this paper that need revision to improve readability and to better compare the methods to what is in the field and provide a balanced view relative to previous work with the same basic concepts that they are building their work around. Overall, I feel as though the work could advance our knowledge in the development of biomarkers or subject level identifiers for psychiatric disorders and potentially be elevated to the level of an individual "subject screener". While this is a noble goal, this will require more data and information in the future as a means to do this. This is certainly an important step forward in this regard.

      We thank the Reviewer for their insightful and constructive comments about our manuscript. We have revised the text to make it easier to read and to clarify our results in the context of prior works in the field. We fully agree that a great deal more work needs to be completed before achieving single-subject level treatment selection, but we hope that our manuscript provides a helpful step towards this goal.

      Strengths:

      • Combined analysis of canonical psychosis symptoms and cognitive deficits across multiple traditional psychosis-related diagnoses offers one of the most comprehensive mappings of impairments experienced within PSD to brain features to date
      • Cross-validation analyses and use of various datasets (diagnostic replication, pharmacological neuroimaging) is extremely impressive, well motivated, and thorough. In addition the authors use a large dataset and provide "out of sample" validity
      • Medication status and dosage also accounted for
      • Similarly, the extensive examination of both univariate and multivariate neuro-behavioural solutions from a methodological viewpoint, including the testing of multiple configurations of CCA (i.e. with different parcellation granularities), offers very strong support for the selected symptom-to-neural mapping
      • The plots of the obtained PC axes compared to those of standard clinical symptom aggregate scales provide a really elegant illustration of the differences and demonstrate clearly the value of data-driven symptom reduction over conventional categories
      • The comparison of the obtained neuro-behavioural map for the "Psychosis configuration" symptom dimension to both pharmacological neuroimaging and neural gene expression maps highlights direct possible links with both underlying disorder mechanisms and possible avenues for treatment development and application
      • The authors' explicit investigation of whether PSD and healthy controls share a major portion of neural variance (possibly present across all people) has strong implications for future brain-behaviour mapping studies, and provides a starting point for narrowing the neural feature space to just the subset of features showing symptom-relevant variance in PSD

      We are very grateful for the positive feedback. We would like to thank the Reviewers for taking the time to read this admittedly dense manuscript and for providing their helpful critique.

      Critiques:

      • Overall I found the paper very hard to read. There are abbreviations everywhere for every concept that is introduced. The paper is methods heavy (which I am not opposed to and quite like). It is clear that the authors took a lot of care in thinking about the methods that were chosen. That said, I think that the organization would benefit from a more traditional Intro, Methods, Results, and Discussion formatting so that it would be easier to parse the Results. The figures are extremely dense and there are often terms that are coined or used that are not or poorly defined.

      We appreciate the constructive feedback on how to reduce the density of the content and on the frequency of abbreviations, which impacts readability. We implemented the strategies suggested by the Reviewer and have moved the Methods section after the Introduction to make the subsequent Results section easier to understand and contextualize. For clarity and length, we have moved methodological details previously in the Results and figure captions to the Methods (e.g. descriptions of dimensionality reduction and prediction techniques). This way, the Methods are now expanded for clarity without detracting from the readability of the core results of the paper. We have also simplified the text in places where there was room for more clarity. To make the numerous abbreviations easier to follow, we have added a table to the Supplement (Supplementary Table S1).

      • One thing I found conceptually difficult is the explicit comparison to the work in the Xia paper from the Satterthwaite group. Is this a fair comparison? The sample is extremely different as it is non clinical and comes from the general population. Can it be suggested that the groups that are clinically defined here are comparable? Is this an appropriate comparison and standard to make? To suggest that the work in that paper is not reproducible is flawed in this light.

      This is an extremely important point to clarify and we apologize that we did not make it sufficiently clear in the initial submission. Here we are not attempting to replicate the results of Xia et al., which we understand were derived in a fundamentally different sample than ours, both demographically and clinically, and which tested very different questions. Rather, this paper is just one example out of a number of recent papers which employed multivariate methods (CCA) to tackle the mapping between neural and behavioral features. The key point here is that this approach does not produce reproducible results due to over-fitting, as demonstrated robustly in the present paper. It is very important to highlight that we did not single out any one paper when making this point. In fact, we do not mention the Xia paper explicitly anywhere and we were very careful to cite multiple papers in support of the multivariate over-fitting argument, which is now a well-known issue (4). Nevertheless, the Reviewers make an excellent point here and we acknowledge that while CCA was not reproducible in the present dataset, this does not explicitly imply that the results in the Xia et al. paper (or any other paper for that matter) are not reproducible by definition (i.e. until someone formally attempts to falsify them). We have made this point explicit in the revised paper, as shown below. Furthermore, in line with the provided feedback, we also applied the multivariate power calculator derived by Helmer et al. (3), which quantitatively illustrates the statistical point around CCA instability.

      Results: Several recent studies have reported “latent” neuro-behavioral relationships using multivariate statistics (5–7), which would be preferable because they simultaneously solve for maximal covariation across neural and behavioral features. However, concerns have emerged about whether such multivariate results will replicate, due to the size of the feature space relative to the size of the clinical samples (4). Given the possibility of deriving a stable multivariate effect, here we tested if results improve with canonical correlation analysis (CCA) (8), which maximizes relationships between linear combinations of symptom (B) and neural features (N) across all PSD (Fig. 5A).

      Discussion: Here we attempted to use multivariate solutions (i.e. CCA) to quantify symptom and neural feature co-variation. In principle, CCA is well-suited to address the brain-behavioral mapping problem. However, symptom-neural mapping using CCA across either parcel-level or network-level solutions in our sample was not reproducible even when using a low-dimensional symptom solution and parcellated neural data as a starting point. Therefore, while CCA (and related multivariate methods such as partial least squares) are theoretically appropriate and may be helped by regularization methods such as sparse CCA, in practice many available psychiatric neuroimaging datasets may not provide sufficient power to resolve stable multivariate symptom-neural solutions (3). A key pressing need for forthcoming studies will be to use multivariate power calculators to inform sample sizes needed for resolving stable symptom-neural geometries at the single subject level. Of note, though we were unable to derive a stable CCA in the present sample, this does not imply that the multivariate neuro-behavioral effect may not be reproducible with larger effect sizes and/or sample sizes. Critically, this does highlight the importance of power calculations prior to computing multivariate brain-behavioral solutions (3).

      • Why was PCA selected for the analysis rather than ICA? Authors mention that PCA enables the discovery of orthogonal symptom dimensions, but don't elaborate on why this is expected to better capture behavioural variation within PSD compared to non-orthogonal dimensions. Given that symptom and/or cognitive items in conventional assessments are likely to be correlated in one way or another, allowing correlations to be present in the low-rank behavioural solution may better represent the original clinical profiles and drive more accurate brain-behaviour mapping. Moreover, as alluded to in the Discussion, employing an oblique rotation in the identification of dimensionality-reduced symptom axes may have actually resulted in a brain-behaviour space that is more generalizable to other psychiatric spectra. Why not use something more relevant to symptom/behaviour data like a factor analysis?

      This is a very important point! We agree with the Reviewer that an oblique solution may better fit the data. For this reason, we performed an ICA as shown in the Supplement. We chose to show PCA for the main analyses here because it is a deterministic solution and the number of significant components could be computed via permutation testing. Importantly, certain components from the ICA solution in this sample were highly similar to the PCs shown in the main solution (Supplementary Note 1), as measured by comparing the subject behavioral scores (Fig. S4), and neural maps (Fig. S13). However, notably, certain components in the ICA and PCA solutions did not appear to have a one-to-one mapping (e.g. PCs 1-3 and ICs 1-3). The orthogonality of the PCA solution forces the resulting components to capture maximally separated, unique symptom variance, which in turn map robustly on to unique neural circuits. We observed that the data may be distributed in such a way that highly correlated independent components emerge in the ICA, which do not maximally separate the symptom variance associated with neural variance. We demonstrate this by plotting the relationship between parcel beta coefficients for the β_PC3 GBC map versus the β_IC2 GBC and β_IC3 GBC maps. The sigmoidal shape of the distribution indicates an improvement in the Z-statistics for the β_PC3 GBC map relative to the β_IC2 GBC and β_IC3 GBC maps. We have added this language to the main text Results:

      Notably, independent component analysis (ICA), an alternative dimensionality reduction procedure which does not enforce component orthogonality, produced similar effects for this PSD sample (see Supplementary Note 1 & Fig. S4A). Certain pairs of components between the PCA and ICA solutions appear to be highly similar and exclusively mapped (IC5 and PC4; IC4 and PC5) (Fig. S4B). On the other hand, PCs 1-3 and ICs 1-3 do not exhibit a one-to-one mapping. For example, PC3 appears to correlate positively with IC2 and equally strongly negatively with IC3, suggesting that these two ICs are oblique to the PC and perhaps reflect symptom variation that is explained by a single PC. The orthogonality of the PCA solution forces the resulting components to capture maximally separated, unique symptom variance, which in turn map robustly on to unique neural circuits. We observed that the data may be distributed in such a way that highly correlated independent components emerge in the ICA, which do not maximally separate the symptom variance associated with neural variance. We demonstrate this by plotting the relationship between parcel beta coefficients for the β_PC3 GBC map versus the β_IC2 GBC and β_IC3 GBC maps (Fig. ??G). The sigmoidal shape of the distribution indicates an improvement in the Z-statistics for the β_PC3 GBC map relative to the β_IC2 GBC and β_IC3 GBC maps.
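The response above notes that the number of significant PCs can be determined by permutation testing. One common scheme, sketched here as an assumption rather than as the authors' exact procedure, shuffles each measure independently across subjects to destroy the correlations between measures, and then compares the observed variance explained by each PC with its null distribution. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def explained_variance_ratio(data):
    """Fraction of total variance captured by each principal component."""
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return var / var.sum()

def n_significant_pcs(data, n_perm, alpha, rng):
    """Count leading PCs that explain more variance than column-shuffled data."""
    observed = explained_variance_ratio(data)
    null = np.empty((n_perm, observed.size))
    for i in range(n_perm):
        # Shuffle each measure independently to break cross-measure correlations.
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        null[i] = explained_variance_ratio(shuffled)
    p_values = (1 + (null >= observed).sum(axis=0)) / (1 + n_perm)
    count = 0
    for p in p_values:          # stop at the first non-significant component
        if p >= alpha:
            break
        count += 1
    return count, p_values

# Synthetic data with two correlated-measure dimensions plus noise (hypothetical).
rng = np.random.default_rng(0)
loadings = np.vstack([np.ones(8), np.tile([1.0, -1.0], 4)]) / np.sqrt(8)
latent = rng.normal(size=(300, 2)) * np.array([3.0, 2.0])
data = latent @ loadings + rng.normal(size=(300, 8))
n_sig, p_values = n_significant_pcs(data, n_perm=200, alpha=0.05, rng=rng)
```

Because PCA is deterministic, this test returns the same observed spectrum on every run, which is part of why it is convenient for selecting the number of components; an ICA solution generally depends on its initialization.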

      Additionally, the Reviewer raises an important point, and we agree that orthogonal versus oblique solutions warrant further investigation especially with regards to other psychiatric spectra and/or other stages in disease progression. For example, oblique components may better capture dimensions of behavioral variation in prodromal individuals, as these individuals are in the early stages of exhibiting psychosis-relevant symptoms and may show early diverging of dimensions of behavioral variation. We elaborate on this further in the Discussion:

      Another important aspect that will require further characterization is the possibility of oblique axes in the symptom-neural geometry. While orthogonal axes derived via PCA were appropriate here and similar to the ICA-derived axes in this solution, it is possible that oblique dimensions more clearly reflect the geometry of other psychiatric spectra and/or other stages in disease progression. For example, oblique components may better capture dimensions of neuro-behavioral variation in a sample of prodromal individuals, as these patients are exhibiting early-stage psychosis-like symptoms and may show signs of diverging along different trajectories.

      Critically, these factors should constitute key extensions of an iteratively more robust model for individualized symptom-neural mapping across the PSD and other psychiatric spectra. Relatedly, it will be important to identify the ‘limits’ of a given BBS solution – namely a PSD-derived effect may not generalize into the mood spectrum (i.e. both the symptom space and the resulting symptom-neural mapping is orthogonal). It will be important to evaluate if this framework can be used to initialize symptom-neural mapping across other mental health symptom spectra, such as mood/anxiety disorders.

      • The gene expression mapping section lacks some justification for why the 7 genes of interest were specifically chosen from among the numerous serotonin and GABA receptors and interneuron markers (relevant for PSD) available in the AHBA. Brief reference to the believed significance of the chosen genes in psychosis pathology would have helped to contextualize the observed relationship with the neuro-behavioural map.

      We thank the Reviewer for providing this suggestion and agree that it will strengthen the section on gene expression analysis. Of note, we did justify the choice for these genes, but we appreciate the opportunity to expand on the neurobiology of selected genes and their relevance to PSD. We have made these edits to the text:

      We focus here on serotonin receptor subunits (HTR1E, HTR2C, HTR2A), GABA receptor subunits (GABRA1, GABRA5), and the interneuron markers somatostatin (SST) and parvalbumin (PVALB). Serotonin agonists such as LSD have been shown to induce PSD-like symptoms in healthy adults (9) and the serotonin antagonism of “second-generation” antipsychotics is thought to contribute to their efficacy in targeting broad PSD symptoms (10–12). Abnormalities in GABAergic interneurons, which provide inhibitory control in neural circuits, may contribute to cognitive deficits in PSD (13–15) and additionally lead to downstream excitatory dysfunction that underlies other PSD symptoms (16, 17). In particular, a loss of prefrontal parvalbumin-expressing fast-spiking interneurons has been implicated in PSD (18–21).

      • What the identified univariate neuro-behavioural mapping for PC3 ("psychosis configuration") actually means from an empirical or brain network perspective is not really ever discussed in detail. E.g., in Results, "a high positive PC3 score was associated with both reduced GBC across insular and superior dorsal cingulate cortices, thalamus, and anterior cerebellum and elevated GBC across precuneus, medial prefrontal, inferior parietal, superior temporal cortices and posterior lateral cerebellum." While the meaning and calculation of GBC can be gleaned from the Methods, a direct interpretation of the neuro-behavioural results in terms of the types of symptoms contributing to PC3 and relative hyper-/hypo-connectivity of the DMN compared to e.g. healthy controls could facilitate easier comparisons with the findings of past studies (since GBC does not seem to be a very commonly-used measure in the psychosis fMRI literature). Also important since GBC is a summary measure of the average connectivity of a region, and doesn't provide any specificity in terms of which regions in particular are more or less connected within a functional network (an inherent limitation of this measure which warrants further attention).

      We acknowledge that GBC is a linear combination measure that by definition does not provide information on connectivity between any one specific pair of neural regions. However, as shown by highly robust and reproducible neurobehavioral maps, GBC seems to be suitable as a first-pass metric in the absence of a priori assumptions of how specific regional connectivity may map to the PC symptom dimensions, and it has been shown to be sensitive to altered patterns of overall neural connectivity in PSD cohorts (22–25) as well as in models of psychosis (9, 26). Moreover, it is an assumption-free method for dimensionality reduction of the neural connectivity matrix (which is a massive feature space). Furthermore, GBC provides neural maps (where each region can be represented by a value, in contrast to full functional connectivity matrices), which were necessary for quantifying the relationship with independent molecular benchmark maps (i.e. pharmacological maps and gene expression maps). We do acknowledge that there are limitations to the method, which we now discuss in the paper. Furthermore, we agree with the Reviewer that the specific regions implicated in these symptom-neural relationships warrant a more detailed investigation and we plan to develop this further in future studies, such as with seed-based functional connectivity using regions implicated in PSD (e.g. thalamus (2, 27)) or restricted GBC (22), which can summarize connectivity information for a specific network or subset of neural regions. We have provided elaboration and clarification regarding this point in the Discussion:

      Another improvement would be to optimize neural data reduction sensitivity for specific symptom variation (28). We chose to use GBC for our initial geometry characterizations as it is a principled and assumption-free data-reduction metric that captures (dys)connectivity across the whole brain and generates neural maps (where each region can be represented by a value, in contrast to full functional connectivity matrices) that are necessary for benchmarking against molecular imaging maps. However, GBC is a summary measure that by definition does not provide information regarding connectivity between specific pairs of neural regions, which may prove to be highly symptom-relevant and informative. Thus symptom-neural relationships should be further explored with higher-resolution metrics, such as restricted GBC (22) which can summarize connectivity information for a specific network or subset of neural regions, or seed-based FC using regions implicated in PSD (e.g. thalamus (2, 27)).
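To make the summary nature of GBC concrete, the following minimal sketch computes one value per parcel as the mean of that parcel's connectivity with every other parcel. The use of Pearson correlation followed by a Fisher z-transform is an assumption made for illustration; the exact pipeline is described in the authors' Methods, and the function name and synthetic time series are hypothetical.

```python
import numpy as np

def global_brain_connectivity(timeseries):
    """Mean connectivity of each parcel with all other parcels.

    timeseries: (n_timepoints, n_parcels) array
    returns:    (n_parcels,) array, one GBC value per parcel
    """
    corr = np.corrcoef(timeseries, rowvar=False)       # (n_parcels, n_parcels)
    n = corr.shape[0]
    # Fisher z-transform (assumed here); clip so the diagonal stays finite.
    z = np.arctanh(np.clip(corr, -0.999999, 0.999999))
    off_diagonal = ~np.eye(n, dtype=bool)
    # Average over the other n - 1 parcels, excluding self-connectivity.
    return np.where(off_diagonal, z, 0.0).sum(axis=1) / (n - 1)

# Synthetic time series: 200 time points, 6 parcels (hypothetical sizes).
rng = np.random.default_rng(0)
timeseries = rng.normal(size=(200, 6))
gbc = global_brain_connectivity(timeseries)
```

The full connectivity matrix has n(n − 1)/2 unique entries while the GBC map has only n, which is why pairwise specificity is lost and why restricted GBC or seed-based analyses are the natural follow-ups named above.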

      • Possibly a nitpick, but while the inclusion of cognitive measures for PSD individuals is a main (self-)selling point of the paper, there's very limited focus on the "Cognitive functioning" component (PC2) of the PCA solution. Examining Fig. S8K, the GBC map for this cognitive component seems almost to be the inverse for that of the "Psychosis configuration" component (PC3) focused on in the rest of the paper. Since PC3 does not seem to have high loadings from any of the cognitive items, but it is known that psychosis spectrum individuals tend to exhibit cognitive deficits which also have strong predictive power for illness trajectory, some discussion of how multiple univariate neuro-behavioural features could feasibly be used in conjunction with one another could have been really interesting.

      This is an important piece of feedback concerning the cognitive measure aspect of the study. As the Reviewer recognizes, cognition is a core element of PSD symptoms and the key reason for including this symptom into the model. Notably, the finding that one dimension captures a substantial proportion of cognitive performance-related variance, independent of other residual symptom axes, has not previously been reported and we fully agree that expanding on this effect is important and warrants further discussion. We would like to take two of the key points from the Reviewers’ feedback and expand further. First, we recognize that upon qualitative inspection PC2 and PC3 neural maps appear strongly anti-correlated. However, as demonstrated in Fig. S9O, PC2 and PC3 maps were anti-correlated at r=-0.47. For comparison, the PC2 map was highly anti-correlated with the BACS composite cognitive map (r=-0.81). This implies that the PC2 map in fact reflects unique neural circuit variance that is relevant for cognition, but not necessarily an inverse of the PC3.

      In other words, these data suggest that there are PSD patients with more (or less) severe cognitive deficits independent of any other symptom axis, which would be in line with the observation that these symptoms are not treatable with antipsychotic medication (and therefore should not correlate with symptoms that are treatable by such medications; i.e. PC3). We have now added these points into the revised paper:

      Results: Fig. 1E highlights loading configurations of symptom measures forming each PC. To aid interpretation, we assigned a name for each PC based on its most strongly weighted symptom measures. This naming is qualitative but informed by the pattern of loadings of the original 36 symptom measures (Fig. 1). For example, PC1 was highly consistent with a general impairment dimension (i.e. “Global Functioning”); PC2 reflected more exclusively variation in cognition (i.e. “Cognitive Functioning”); PC3 indexed a complex configuration of psychosis-spectrum relevant items (i.e. “Psychosis Configuration”); PC4 generally captured variation in mood and anxiety related items (i.e. “Affective Valence”); finally, PC5 reflected variation in arousal and level of excitement (i.e. “Agitation/Excitation”). For instance, a generally impaired patient would have a highly negative PC1 score, which would reflect low performance on cognition and elevated scores on most other symptomatic items. Conversely, an individual with a high positive PC3 score would exhibit delusional, grandiose, and/or hallucinatory behavior, whereas a person with a negative PC3 score would exhibit motor retardation, social avoidance, possibly a withdrawn affective state with blunted affect (29). Comprehensive loadings for all 5 PCs are shown in Fig. 3G. Fig. 1F highlights the mean of each of the 3 diagnostic groups (colored spheres) and healthy controls (black sphere) projected into a 3-dimensional orthogonal coordinate system for PCs 1,2 & 3 (x,y,z axes respectively; alternative views of the 3-dimensional coordinate system with all patients projected are shown in Fig. 3). Critically, PC axes were not parallel with traditional aggregate symptom scales. For instance, PC3 is angled at 45° to the dominant direction of PANSS Positive and Negative symptom variation (purple and blue arrows respectively in Fig. 1F). ...
Because PC3 loads most strongly on to hallmark symptoms of PSD (including strong positive loadings across PANSS Positive symptom measures and strong negative loadings onto most Negative measures), we focus on this PC as an opportunity to quantify an innovative, fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional symptom axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. We found that the PC3 result provided a powerful empirical demonstration of how using a data-driven dimensionality-reduced solution (via PCA) can reveal novel patterns intrinsic to the structure of PSD psychopathology.

      Another nitpick, but the Y axes of Fig. 8C-E are not consistent, which causes some of the lines of best fit to be a bit misleading (e.g. GABRA1 appears to have a more strongly positive gene-PC relationship than HTR1E, when in reality the opposite is true.)

      We had scaled each axis to best show the data in each plot, but we see how this is confusing and recognise the need to correct it. We have remade the plots with consistent axis scaling.

      • The authors explain the apparent low reproducibility of their multivariate PSD neuro-behavioural solution using the argument that many psychiatric neuroimaging datasets are too small for multivariate analyses to be sufficiently powered. Applying an existing multivariate power analysis to their own data as empirical support for this idea would have made it even more compelling. The following paper suggests guidelines for sample sizes required for CCA/PLS as well as a multivariate calculator: Helmer, M., Warrington, S. D., Mohammadi-Nejad, A.-R., Ji, J. L., Howell, A., Rosand, B., Anticevic, A., Sotiropoulos, S. N., & Murray, J. D. (2020). On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations (p. 2020.08.25.265546). https://doi.org/10.1101/2020.08.25.265546

      We deeply appreciate the Reviewer’s suggestion and the opportunity to incorporate the methods from the Helmer et al. paper. We now highlight the importance of having sufficiently powered samples for multivariate analyses in our other manuscript first-authored by our colleague Dr. Markus Helmer (3). Using the method described in the above paper (GEMMR version 0.1.2), we computed the estimated sample sizes required to power multivariate CCA analyses with 718 neural features and 5 behavioral (PC) features (i.e. the feature set used throughout the rest of the paper):

      As argued in Helmer et al., r_true is likely below 0.3 in many cases, thus the estimated sample size of 33k is likely a lower bound for the required sample size for sufficiently-powered CCA analyses using the 718+5 features leveraged throughout the univariate analyses in the present manuscript. This number is two orders of magnitude greater than our available sample (and at least one order of magnitude greater than any single existing clinical dataset). Even if r_true is 0.5, a sample size of ∼10k would likely be required.

      We also computed the estimated sample sizes required for 180 neural features (symmetrized neural cortical parcels) and 5 symptom PC features, consistent with the CCA reported in our main text:

      Assuming that r_true is likely below 0.3, this minimal required sample size remains at least an order of magnitude greater than the size of our present sample, consistent with the finding that the CCA solution computed using these data was unstable. As a lower limit for the required sample size plausible using the feature sets reported in our paper, we additionally computed for comparison the estimated N needed with the smallest number of features explored in our analyses, i.e. 12 neural functional network features and 5 symptom PC features:

      These required sample sizes are closer to the N=436 used in the present sample and to samples reported in the clinical neuroimaging literature. This is consistent with the observation that, when using 12 neural and 5 symptom features (Fig. S15C), the detected canonical correlation for CV1 (r = 0.38) is much lower (and likely not inflated due to overfitting) and may be closer to the true effect, because this effect is resolvable with N=436. This is in contrast to the CCA solution with 180 neural features and 5 symptom features, where we observed a null CCA effect around r > 0.6 across all 5 CVs. This clearly highlights the inflation of the effect as the feature space grows. There is no a priori plausible reason to believe that the effect for the 180 vs. 5 feature mapping is literally double the effect for the 12 vs. 5 feature mapping, especially as the 12 features are networks derived from the 180 parcels (i.e. the two effects should be comparable rather than differing by a factor of two). Consequently, if the true CCA effect with 180 vs. 5 features were actually closer to the more comparable r = 0.38, we would need >5,000 subjects to resolve a reproducible neuro-behavioral CCA map (an order of magnitude more than in the BSNIP sample). Moreover, to confidently detect effects if rtrue is actually less than 0.3, we would require a sample size of >8,145 subjects. We have added this to the Results section on our CCA results:

      Next, we tested if the 180-parcel CCA solution is stable and reproducible, as done with PC-to-GBC univariate results. The CCA solution was robust when tested with k-fold and leave-site-out cross-validation (Fig. S16), likely because these methods use CCA loadings derived from the full sample. However, the CCA loadings did not replicate in non-overlapping split-half samples (Fig. 5L, see Supplementary Note 4). Moreover, a leave-one-subject-out cross-validation revealed that removing a single subject from the sample affected the CCA solution such that it did not generalize to the left-out subject (Fig. 5M). This is in contrast to the PCA-to-GBC univariate mapping, which was substantially more reproducible for all attempted cross-validations relative to the CCA approach. This is likely because substantially more power is needed to resolve a stable multivariate neuro-behavioral effect with this many features. Indeed, a multivariate power analysis using 180 neural features and 5 symptom features, and assuming a true canonical correlation of r = 0.3, suggests that a minimal sample size of N = 8145 is needed to sufficiently detect the effect (3), an order of magnitude greater than the available sample size. Therefore, we leverage the univariate neuro-behavioral result for subsequent subject-specific model optimization and comparisons to molecular neuroimaging maps.
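
      For illustration, the two resampling schemes named above (non-overlapping split halves and leave-one-subject-out) can be sketched as index bookkeeping. This is a minimal sketch, not the code used for the manuscript: the function names and the group labels "A", "B", "C" are hypothetical, and the split is assumed to preserve the proportion of each diagnostic group in both halves, as Supplementary Note 4 describes. The CCA fit itself is left out.

```python
import random
from collections import defaultdict


def stratified_split_half(group_labels, seed=0):
    """Split subject indices into two non-overlapping halves (H1, H2),
    keeping each group's share roughly equal in both halves."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for subject, label in enumerate(group_labels):
        by_group[label].append(subject)

    h1, h2 = [], []
    for members in by_group.values():
        rng.shuffle(members)  # random assignment within each group
        mid = len(members) // 2
        h1.extend(members[:mid])
        h2.extend(members[mid:])
    return sorted(h1), sorted(h2)


def leave_one_subject_out(n_subjects):
    """Yield (training_indices, held_out_index) once per subject."""
    for held_out in range(n_subjects):
        training = [s for s in range(n_subjects) if s != held_out]
        yield training, held_out


# Hypothetical toy labels; the real sample has N = 436 patients.
labels = ["A"] * 10 + ["B"] * 6 + ["C"] * 4
h1, h2 = stratified_split_half(labels, seed=1)
folds = list(leave_one_subject_out(len(labels)))
```

      In a full analysis, CCA would be fit separately on H1 and H2 and the loadings compared, and in each leave-one-out fold the loadings fit on the training indices would be applied to the held-out subject.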

      Additionally, we added the following to Supplementary Note 4: Establishing the Reproducibility of the CCA Solution:

      Here we outline the details of the split-half replication for the CCA solution. Specifically, the full patient sample was randomly split (referred to as “H1” and “H2” respectively), while preserving the proportion of patients in each diagnostic group. Then, CCA was performed independently for H1 and H2. While the loadings for behavioral PCs and original behavioral items are somewhat similar (mean r ≈ 0.5) between the two CCAs in each run, the neural loadings were not stable across H1 and H2 CCA solutions. Critically, CCA results did not perform well for leave-one-subject-out cross-validation (Fig. 5M). Here, one patient was held out while CCA was performed using all data from the remaining 435 patients. The loadings matrices Ψ and Θ from the CCA were then used to calculate the “predicted” neural and behavioral latent scores for all 5 CVs for the patient that was held out of the CCA solution. This process was repeated for every patient and the final result was evaluated for reproducibility. As described in the main text, this did not yield reproducible CCA effects (Fig. 5M). Of note, CCA may yield higher reproducibility if the neural feature space were to be further reduced. As noted, our approach was to first parcellate the BOLD signal and then use GBC as a data-driven method to yield a neuro-biologically and quantitatively interpretable neural data reduction, and we additionally symmetrized the result across hemispheres. Nevertheless, in sharp contrast to the PCA univariate feature selection approach, the CCA solutions were still not stable in the present sample size of N = 436. Indeed, a multivariate power analysis (3) estimates that the following sample sizes will be required to sufficiently power a CCA between 180 neural features and 5 symptom features, at different levels of true canonical correlation (rtrue):

      To test if further neural feature space reduction may improve reproducibility, we also evaluated CCA solutions with neural GBC parcellated according to 12 brain-wide functional networks derived from the recent HCP driven network parcellation (30). Again, we computed the CCA for all 36 item-level symptoms as well as 5 PCs (Fig. S15). As with the parcel-level effects, the network-level CCA analysis produced significant results (for CV1 when using 36 item-level scores and for all 5 CVs when using the 5 PC-derived scores). Here the result produced much lower canonical correlations (~0.3-0.5); however, these effects (for CV1) clearly exceeded the 95% confidence interval generated via random permutations, suggesting that they may reflect the true canonical correlation. We observed a similar result when we evaluated CCAs computed with neural GBC from 192 symmetrized subcortical parcels and 36 symptoms or 5 PCs (Fig. S14). In other words, data-reducing the neural signal to 12 functional networks likely averaged out parcel-level information that may carry symptom-relevant variance, but may be closer to capturing the true effect. Indeed, the power analysis suggests that the current sample size is closer to that needed to detect an effect with 12 + 5 features:

      Note that we do not present a CCA conducted with parcels across the whole brain, as the number of variables would exceed the number of observations. However, the multivariate power analysis using 718 neural features and 5 symptom features estimates that the following sample sizes would be required to detect the following effects:

      This analysis suggests that even the lowest bound of 10k samples exceeds the present available sample size by two orders of magnitude.

      We have also added Fig. S19, illustrating these power analyses results:

      Fig. S19. Multivariate power analysis for CCA. Sample sizes were calculated according to (3), see also https://gemmr.readthedocs.io/en/latest/. We computed the multivariate power analyses for three versions of CCA reported in this manuscript: i) 718 neural vs. 5 symptom features; ii) 180 neural vs. 5 symptom features; iii) 12 neural vs. 5 symptom features. (A) At different levels of features, the ratio of samples (i.e. subjects) required per feature to derive a stable CCA solution remains approximately the same across all values of rtrue. As discussed in (3), at rtrue = 0.3 the number of samples required per feature is about 40, which is much greater than the ratio of samples to features available in our dataset. (B) The total number of samples required (nreq) for a stable CCA solution given the total number of neural and symptom features used in our analyses, at different values of rtrue. In general these required sample sizes are much greater than the N=436 (light grey line) PSD in our present dataset, consistent with the finding that the CCA solutions computed using our data were unstable. Notably, the ‘12 vs. 5’ CCA assuming rtrue = 0.3 requires only 700 subjects, which is closest to the N=436 (horizontal grey line) used in the present sample. This may be in line with the observation of the CCA with 12 neural vs 5 symptom features (Fig. S15C) that the canonical correlation (r = 0.38 for CV1) clearly exceeds the 95% confidence interval, and may be closer to the true effect. However, to confidently detect effects in such an analysis (particularly if rtrue is actually less than 0.3), a larger sample would likely still be needed.
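
      The samples-per-feature comparison in panel A can be checked with simple arithmetic on the numbers quoted in this response. The sketch below is illustrative only: it assumes that "features" counts neural plus symptom features together, that all three required sample sizes refer to rtrue = 0.3, and that 33k is a rounded value. The variable and function names are hypothetical.

```python
# Required sample sizes at r_true = 0.3, as quoted in the text above
# (33k is a rounded figure; the feature counting is an assumption).
AVAILABLE_N = 436
required_n = {
    (718, 5): 33_000,
    (180, 5): 8_145,
    (12, 5): 700,
}


def samples_per_feature(n_samples, n_neural, n_symptom):
    """Samples per feature, counting both feature sets together."""
    return n_samples / (n_neural + n_symptom)


summary = {}
for (n_neural, n_symptom), n_req in required_n.items():
    summary[(n_neural, n_symptom)] = {
        "required_per_feature": samples_per_feature(n_req, n_neural, n_symptom),
        "available_per_feature": samples_per_feature(AVAILABLE_N, n_neural, n_symptom),
        "shortfall_factor": n_req / AVAILABLE_N,
    }

for features, row in summary.items():
    print(features, {key: round(value, 1) for key, value in row.items()})
```

      Under these assumptions the required ratio stays in the low-to-mid 40s for all three feature sets, in line with the "about 40" quoted from (3), while the available ratio falls below it in every case.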

      We also added the corresponding methods in the Methods section:

      Multivariate CCA Power Analysis. Multivariate power analyses to estimate the minimum sample size needed to sufficiently power a CCA were computed using methods described in (3), using the Generative Modeling of Multivariate Relationships tool (gemmr, https://github.com/murraylab/gemmr (v0.1.2)). Briefly, a model was built by: 1) Generating synthetic datasets for the two input data matrices, by sampling from a multivariate normal distribution with a joint covariance matrix that was structured to encode CCA solutions with specified properties; 2) Performing CCAs on these synthetic datasets. Because the joint covariance matrix is known, the true values of estimated association strength, weights, scores, and loadings of the CCA, as well as the errors for these four metrics, can also be computed. In addition, statistical power that the estimated association strength is different from 0 is determined through permutation testing; 3) Varying parameters of the generative model (number of features, assumed true between-set correlation, within-set variance structure for both datasets) and determining the required sample size Nreq in each case such that statistical power reaches 90% and all of the above described error metrics fall to a target level of 10%; and 4) Fitting and validating a linear model to predict the required sample size Nreq from parameters of the generative model. This linear model was then used to calculate Nreq for CCA in three data scenarios: i) 718 neural vs. 5 symptom features; ii) 180 neural vs. 5 symptom features; iii) 12 neural vs. 5 symptom features.
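
      A toy version of steps 1 and 2 shows why the estimated association is inflated when there are many features relative to samples. This is a deliberately simplified sketch and does not use or reproduce gemmr: it assumes independent unit-variance features, a single Y feature (in which case the first canonical correlation equals the multiple correlation of Y on X), and small hypothetical sizes (200 samples, 2 vs. 80 X features) so that it runs quickly in pure Python.

```python
import math
import random


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def simulate(n_samples, n_x, r_true, rng):
    """Draw n_x independent standard-normal X features and one Y feature
    whose true correlation with the first X feature is r_true."""
    x_cols = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_x)]
    noise_scale = math.sqrt(1.0 - r_true ** 2)
    y = [r_true * x0 + noise_scale * rng.gauss(0.0, 1.0) for x0 in x_cols[0]]
    return x_cols, y


def estimated_canonical_correlation(x_cols, y):
    """With a single Y feature, the first canonical correlation equals the
    multiple correlation of Y on X; compute it by Gram-Schmidt projection."""
    y = center(y)
    basis = []
    for col in x_cols:
        col = center(col)
        for q in basis:  # remove components along earlier columns
            proj = dot(q, col)
            col = [c - proj * b for c, b in zip(col, q)]
        norm = math.sqrt(dot(col, col))
        basis.append([c / norm for c in col])
    explained = sum(dot(q, y) ** 2 for q in basis)
    return math.sqrt(explained / dot(y, y))


rng = random.Random(0)
R_TRUE = 0.3
few = estimated_canonical_correlation(*simulate(200, 2, R_TRUE, rng))
many = estimated_canonical_correlation(*simulate(200, 80, R_TRUE, rng))
print(f"true r = {R_TRUE}, 2 features: {few:.2f}, 80 features: {many:.2f}")
```

      The true association is identical in both runs; only the number of X features differs. The larger estimate in the 80-feature run is therefore overfitting, which is the effect the power analysis is designed to quantify.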

      • Given the relatively even distribution of males and females in the dataset, some examination of sex effects on symptom dimension loadings or neuro-behavioural maps would have been interesting (other demographic characteristics like age and SES are summarized for subjects but also not investigated). I think this is a missed opportunity.

      We have now provided additional analyses for the core PCA and univariate GBC mapping results, testing for effects of age, sex, and SES in Fig. S8. Briefly, we observed a significant positive relationship between age and PC3 scores, which may be because older patients (who presumably have been ill for a longer time) exhibit more severe symptoms along the positive PC3 – Psychosis Configuration dimension. We also observed a significant negative relationship between Hollingshead index of SES and PC1 and PC2 scores. Lower PC1 and PC2 scores indicate poorer general functioning and cognitive performance respectively, which is consistent with higher Hollingshead indices (i.e. lower-skilled jobs or unemployment and fewer years of education). We also found significant sex differences in PC2 – Cognitive Functioning, PC4 – Affective Valence, and PC5 – Agitation/Excitement scores.

      Fig. S8. Effects of age, socio-economic status, and sex on symptom PCA solution. (A) Correlations between symptom PC scores and age (years) across N=436 PSD. Pearson’s correlation value and uncorrected p-values are reported above scatterplots. After Bonferroni correction, we observed a significant positive relationship between age and PC3 score. This may be because older patients have been ill for a longer period of time and exhibit more severe symptoms along the positive PC3 dimension. (B) Correlations between symptom PC scores and socio-economic status (SES) as measured by the Hollingshead Index of Social Position (31), across N=387 PSD with available data. The index is computed as (Hollingshead occupation score × 7) + (Hollingshead education score × 4); a higher score indicates lower SES (32). We observed a significant negative relationship between Hollingshead index and PC1 and PC2 scores. Lower PC1 and PC2 scores indicate poorer general functioning and cognitive performance respectively, which is consistent with higher Hollingshead indices (i.e. lower-skilled jobs or unemployment and fewer years of education). (C) The Hollingshead index can be split into five classes, with 1 being the highest and 5 being the lowest SES class (31). Consistent with (B) we found a significant difference between the classes after Bonferroni correction for PC1 and PC2 scores. (D) Distributions of PC scores across Hollingshead SES classes show the overlap in scores. White lines indicate the mean score in each class. (E) Differences in PC scores between (M)ale and (F)emale PSD subjects. We found a significant difference between sexes in PC2 – Cognitive Functioning, PC4 – Affective Valence, and PC5 – Agitation/Excitement scores. (F) Distributions of PC scores across M and F subjects show the overlap in scores. White lines indicate the mean score for each sex.
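
      The index in panel B is a weighted sum of two scores. As a small worked example, assuming the 7 and 4 in the caption are multiplicative weights and that both scores run from 1 (highest) to 7 (lowest), as in the two-factor Hollingshead index, the calculation can be sketched as follows. The function name is hypothetical.

```python
def hollingshead_index(occupation_score, education_score):
    """Weighted sum from the caption: occupation x 7 + education x 4.
    Higher values indicate lower socio-economic status.
    The 1-7 range of both scores is an assumption, not stated in the text."""
    return 7 * occupation_score + 4 * education_score


highest_ses = hollingshead_index(1, 1)  # best score on both scales
lowest_ses = hollingshead_index(7, 7)   # worst score on both scales
```

      Under the assumed 1-7 scales the index ranges from 11 to 77, so a higher index (lower SES) pairing with lower PC1 and PC2 scores yields the negative correlation reported in the caption.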

      Bibliography

      1. Jie Lisa Ji, Caroline Diehl, Charles Schleifer, Carol A Tamminga, Matcheri S Keshavan, John A Sweeney, Brett A Clementz, S Kristian Hill, Godfrey Pearlson, Genevieve Yang, et al. Schizophrenia exhibits bi-directional brain-wide alterations in cortico-striato-cerebellar circuits. Cerebral Cortex, 29(11):4463–4487, 2019.
      2. Alan Anticevic, Michael W Cole, Grega Repovs, John D Murray, Margaret S Brumbaugh, Anderson M Winkler, Aleksandar Savic, John H Krystal, Godfrey D Pearlson, and David C Glahn. Characterizing thalamo-cortical disturbances in schizophrenia and bipolar illness. Cerebral cortex, 24(12):3116–3130, 2013.
      3. Markus Helmer, Shaun D Warrington, Ali-Reza Mohammadi-Nejad, Jie Lisa Ji, Amber Howell, Benjamin Rosand, Alan Anticevic, Stamatios N Sotiropoulos, and John D Murray. On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations. bioRxiv, 2020.
      4. Richard Dinga, Lianne Schmaal, Brenda WJH Penninx, Marie Jose van Tol, Dick J Veltman, Laura van Velzen, Maarten Mennes, Nic JA van der Wee, and Andre F Marquand. Evaluating the evidence for biotypes of depression: Methodological replication and extension of. NeuroImage: Clinical, 22:101796, 2019.
      5. Cedric Huchuan Xia, Zongming Ma, Rastko Ciric, Shi Gu, Richard F Betzel, Antonia N Kaczkurkin, Monica E Calkins, Philip A Cook, Angel Garcia de la Garza, Simon N Vandekar, et al. Linked dimensions of psychopathology and connectivity in functional brain networks. Nature communications, 9(1):3003, 2018.
      6. Andrew T Drysdale, Logan Grosenick, Jonathan Downar, Katharine Dunlop, Farrokh Mansouri, Yue Meng, Robert N Fetcho, Benjamin Zebley, Desmond J Oathes, Amit Etkin, et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nature medicine, 23(1):28, 2017.
      7. Meichen Yu, Kristin A Linn, Russell T Shinohara, Desmond J Oathes, Philip A Cook, Romain Duprat, Tyler M Moore, Maria A Oquendo, Mary L Phillips, Melvin McInnis, et al. Childhood trauma history is linked to abnormal brain connectivity in major depression. Proceedings of the National Academy of Sciences, 116(17):8582–8590, 2019.
      8. David R Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural computation, 16(12):2639–2664, 2004.
      9. Katrin H Preller, Joshua B Burt, Jie Lisa Ji, Charles H Schleifer, Brendan D Adkinson, Philipp Stämpfli, Erich Seifritz, Grega Repovs, John H Krystal, John D Murray, et al. Changes in global and thalamic brain connectivity in LSD-induced altered states of consciousness are attributable to the 5-HT2A receptor. eLife, 7:e35082, 2018.
      10. Mark A Geyer and Franz X Vollenweider. Serotonin research: contributions to understanding psychoses. Trends in pharmacological sciences, 29(9):445–453, 2008.
      11. H Y Meltzer, B W Massey, and M Horiguchi. Serotonin receptors as targets for drugs useful to treat psychosis and cognitive impairment in schizophrenia. Current pharmaceutical biotechnology, 13(8):1572–1586, 2012.
      12. Anissa Abi-Dargham, Marc Laruelle, George K Aghajanian, Dennis Charney, and John Krystal. The role of serotonin in the pathophysiology and treatment of schizophrenia. The Journal of neuropsychiatry and clinical neurosciences, 9(1):1–17, 1997.
      13. Francine M Benes and Sabina Berretta. Gabaergic interneurons: implications for understanding schizophrenia and bipolar disorder. Neuropsychopharmacology, 25(1):1–27, 2001.
      14. Melis Inan, Timothy J. Petros, and Stewart A. Anderson. Losing your inhibition: Linking cortical gabaergic interneurons to schizophrenia. Neurobiology of Disease, 53:36–48, 2013. ISSN 0969-9961. What clinical findings can teach us about the neurobiology of schizophrenia?
      15. Samuel J Dienel and David A Lewis. Alterations in cortical interneurons and cognitive function in schizophrenia. Neurobiology of disease, 131:104208, 2019.
      16. John E Lisman, Joseph T Coyle, Robert W Green, Daniel C Javitt, Francine M Benes, Stephan Heckers, and Anthony A Grace. Circuit-based framework for understanding neurotransmitter and risk gene interactions in schizophrenia. Trends in neurosciences, 31(5):234–242, 2008.
      17. Anthony A Grace. Dysregulation of the dopamine system in the pathophysiology of schizophrenia and depression. Nature Reviews Neuroscience, 17(8):524, 2016.
      18. John F Enwright III, Zhiguang Huo, Dominique Arion, John P Corradi, George Tseng, and David A Lewis. Transcriptome alterations of prefrontal cortical parvalbumin neurons in schizophrenia. Molecular psychiatry, 23(7):1606–1613, 2018.
      19. Daniel J Lodge, Margarita M Behrens, and Anthony A Grace. A loss of parvalbumin-containing interneurons is associated with diminished oscillatory activity in an animal model of schizophrenia. Journal of Neuroscience, 29(8):2344–2354, 2009.
      20. Clare L Beasley and Gavin P Reynolds. Parvalbumin-immunoreactive neurons are reduced in the prefrontal cortex of schizophrenics. Schizophrenia research, 24(3):349–355, 1997.
      21. David A Lewis, Allison A Curley, Jill R Glausier, and David W Volk. Cortical parvalbumin interneurons and cognitive dysfunction in schizophrenia. Trends in neurosciences, 35(1):57–67, 2012.
      22. Alan Anticevic, Margaret S Brumbaugh, Anderson M Winkler, Lauren E Lombardo, Jennifer Barrett, Phillip R Corlett, Hedy Kober, June Gruber, Grega Repovs, Michael W Cole, et al. Global prefrontal and fronto-amygdala dysconnectivity in bipolar i disorder with psychosis history. Biological psychiatry, 73(6):565–573, 2013.
      23. Alex Fornito, Jong Yoon, Andrew Zalesky, Edward T Bullmore, and Cameron S Carter. General and specific functional connectivity disturbances in first-episode schizophrenia during cognitive control performance. Biological psychiatry, 70(1):64–72, 2011.
      24. Avital Hahamy, Vince Calhoun, Godfrey Pearlson, Michal Harel, Nachum Stern, Fanny Attar, Rafael Malach, and Roy Salomon. Save the global: global signal connectivity as a tool for studying clinical populations with functional magnetic resonance imaging. Brain connectivity, 4(6):395–403, 2014.
      25. Michael W Cole, Alan Anticevic, Grega Repovs, and Deanna Barch. Variable global dysconnectivity and individual differences in schizophrenia. Biological psychiatry, 70(1):43–50, 2011.
      26. Naomi R Driesen, Gregory McCarthy, Zubin Bhagwagar, Michael Bloch, Vincent Calhoun, Deepak C D’Souza, Ralitza Gueorguieva, George He, Ramani Ramachandran, Raymond F Suckow, et al. Relationship of resting brain hyperconnectivity and schizophrenia-like symptoms produced by the nmda receptor antagonist ketamine in humans. Molecular psychiatry, 18(11):1199–1204, 2013.
      27. Neil D Woodward, Baxter Rogers, and Stephan Heckers. Functional resting-state networks are differentially affected in schizophrenia. Schizophrenia research, 130(1-3):86–93, 2011.
      28. Zarrar Shehzad, Clare Kelly, Philip T Reiss, R Cameron Craddock, John W Emerson, Katie McMahon, David A Copland, F Xavier Castellanos, and Michael P Milham. A multivariate distance-based analytic framework for connectome-wide association studies. Neuroimage, 93 Pt 1:74–94, Jun 2014.
      29. Alan J Gelenberg. The catatonic syndrome. The Lancet, 307(7973):1339–1341, 1976.
      30. Jie Lisa Ji, Marjolein Spronk, Kaustubh Kulkarni, Grega Repovš, Alan Anticevic, and Michael W Cole. Mapping the human brain’s cortical-subcortical functional network organization. NeuroImage, 185:35–57, 2019.
      31. August B Hollingshead et al. Four factor index of social status. 1975.
      32. Jaya L Padmanabhan, Neeraj Tandon, Chiara S Haller, Ian T Mathew, Shaun M Eack, Brett A Clementz, Godfrey D Pearlson, John A Sweeney, Carol A Tamminga, and Matcheri S Keshavan. Correlations between brain structure and symptom dimensions of psychosis in schizophrenia, schizoaffective, and psychotic bipolar i disorders. Schizophrenia bulletin, 41(1):154–162, 2015.
    1. Author Response:

      Reviewer #1 (Public Review):

      This study demonstrates with analytical methods and simulations a new approach to estimate pairwise noise and signal correlations in two-photon calcium imaging data. This approach compensates for biases introduced by the dynamics of calcium signals, without deconvolution and for low trial numbers. Simulations based on idealized calcium signals demonstrate the efficiency of the method, and application to auditory cortex imaging data leads to mild changes in the results shown in the past based on less accurate estimates. This study has the merit to identify biases that can arise when evaluating noise and signal correlations across neurons with indirect signals. Moreover, the solution provided may become a useful addition to the neuroscientist's signal analysis toolbox. Noise and signal correlation are related to functional connectivity between neurons, and thereby give insights about the functional structure of the underlying network. They do not necessarily account for the full complexity of neural interactions but are used in numerous studies, which would be improved by this tool. A potential improvement of the study could be to indicate how this approach could be generalized to other neuron to neuron interaction measurements or data-driven neural network modeling.

      We would like to sincerely thank Reviewer 1 for his supportive stance towards our work, and for providing helpful feedback to improve our manuscript.

      The main weakness of the study is that the efficiency of the method is only assessed with simulated datasets. Finding real ground-truth data for a validation beyond that would be difficult if not impossible. However, authors could further convince the reader by showing the effect of relaxing certain assumptions of their surrogate data generation model (e.g. absence of temporal correlation in measurement noise), and show the robustness and limits of the methods.

      Thank you for this suggestion. Motivated by this comment, and a related comment by Reviewer 2, we have now substantially enhanced our performance analyses in the revised manuscript and compiled them in a new subsection titled “Analysis of Robustness with respect to Modeling Assumptions” for better clarity and consistency. In summary:

      1) We first examined the robustness of our proposed method with respect to model mismatch in the stimulus integration model. As suggested, we generated data according to a non-linear (i.e., quadratic sum of linear filters) receptive field model:

      but assumed a linear stimulus integration model in our inference procedure

      The comparison of the correlations estimated under this setting by each method is shown in Figure 2 – Figure Supplement 3. While the performance of our proposed signal correlation estimates under this setting degrades as compared to that in Figure 2 with no model mismatch, our proposed estimates still outperform the other methods and recover the ground truth signal correlation structure reasonably well.

      It is noteworthy that the model mismatch in the stimulus integration component does not affect the accuracy of noise correlation estimates in our method, as is evident from the noise correlation estimates in Figure 2 – Figure Supplement 3. In comparison, the biases induced in the other methods due to model mismatch and various other factors, such as observation noise, temporal blurring, and neglecting the non-linear mappings between spikes and underlying covariates, result in significantly larger errors in both signal and noise correlation estimates.

      2) We incorporated our previous analysis of robustness with respect to calcium decay model mismatch in this subsection, which is shown in Figure 2 – Figure Supplement 4.

      3) In response to a related comment by Reviewer 2, we then performed extensive simulations to evaluate the effects of SNR and firing rate on the performance of our method. Overall, while the performance of all algorithms degrades at low SNR or firing rate values (SNR < 10 dB, firing rate < 0.5 Hz), our algorithm outperforms the existing methods in a wide range of SNR and firing rate values considered. The results are summarized in Figure 2 – Figure Supplement 5.

      4) Finally, we considered two observation noise model mismatch conditions, namely, white noise + low frequency drift and pink noise, similar to the treatment in Deneux et al. (2016). For each noise mismatch model, we also varied the SNR level and firing rate and compared the performance of the different algorithms as reported in Figure 2 – Figure Supplement 6. These new analyses demonstrate that our proposed estimates outperform the existing methods, under correlated generative noise models, and also with respect to varying levels of SNR and firing rate. As clearly evident in panels C and F of Figure 2 – Figure Supplement 6, even though the estimated calcium concentrations are contaminated by the temporally correlated fluctuations in observation noise, the putative spikes estimated as a byproduct of our iterative method closely match the ground truth spikes, which in turn results in accurate estimates of signal and noise correlations.

      To address this comment, we performed extensive simulations to evaluate the robustness of different algorithms under model mismatch conditions induced by 1) non-linearity in the stimulus integration model, 2) calcium decay, 3) SNR and firing rate, and 4) temporal correlation of observation noise. We have now compiled these results in a new subsection called “Analysis of Robustness with respect to Modeling Assumptions” (Pages 6-7).

      Also, further intuitions about why this method outperforms others would be of great help for the non-specialist readers.

      Thank you for this suggestion. There are two sources for the performance gap between our proposed method and existing approaches:

      1) Favorable soft decisions on the timing of spikes achieved by our method, as a byproduct of the iterative variational inference procedure: an accurate probabilistic decoding of spikes results in better estimates of the signal/noise correlations, and conversely having more accurate estimates of the signal/noise covariances improves the probabilistic characterization of spiking events. This is in contrast with both the Pearson and Two-Stage methods: in the Pearson method, spike timing is heavily blurred by the calcium decay; in the two-stage methods, erroneous hard (i.e., binary) decisions on the timing of spiking events result in biases that propagate to and contaminate the downstream signal and noise correlation estimation and thus result in significant errors.

      2) Explicit modeling of the non-linear mapping from stimulus and latent noise covariates to spiking through a canonical point process model (which is in turn tied to a two-photon observation model in a multi-tier Bayesian fashion) results in robust performance under a limited number of trials and observation duration. As we have shown in Appendix 1, as the number of trials L and trial duration T tend to infinity, conventional notions of signal and noise correlation indeed recover the ground truth signal and noise correlations, as the biases induced by non-linearities average out across trial repetitions. However, as shown in Figure 2 – Figure Supplement 2, in order to achieve comparable performance to our method using 20 trials, the conventional correlation estimates require ~1000 trials.

      To address this comment, we have now included the aforementioned items in the revised Discussion section, highlighting the key aspects of our method that make it outperform existing approaches (Pages 17-18).

      Reviewer #2 (Public Review):

      This manuscript describes a new method for estimating signal and noise correlations from two-photon recordings of calcium activity in large neuronal networks. Unlike existing methods that first require inferring spikes from calcium transients before estimating the correlations, the proposed method performs the correlation estimation directly from the fluorescence traces. It treats the different inputs to each neuron as latent variables to be inferred from its observed fluorescence activity, and divides these inputs according to whether they are provided by stimulus-dependent (signal) or stimulus-independent (noise) inputs. The authors showed with simulations that proper definitions of signal and noise correlations based on these inferred variables converge with trial repetition much faster to the true correlations than conventional estimates. They are not sensitive to blurring produced by inaccurate spike deconvolution and are less prone to erroneously mixing the signal and noise components of the correlations. By applying this new method to real optical recordings from the auditory cortex of awake mice, the authors shed new light on the structure of the circuitry underlying the processing of sound information in this brain region. Circuits processing sound-related and sound-independent information appear to be more orthogonal than previously thought, with a spatial signature that changes between thalamorecipient layer 4 and supragranular layers 2/3.

      This is a mathematical manuscript that introduces a promising new analysis approach. It is designed to be applied to two-photon experiments, which typically produce recordings of calcium activity of several hundred neurons simultaneously. Because of their massive parallel recordings, which do not rely on spike sorting to identify single units, these optical techniques naturally provide access to correlation between units. They have given rise to a field of active research that attempts to link these correlations to elementary functional circuits in the brain. However, as the authors point out, the low efficiency of spike inference from calcium traces raises the need for correlation estimation approaches that circumvent this problem, as the method presented here does. As such, it could have a significant impact if the community succeeds in using it (see below).

      We would like to sincerely thank Reviewer 2 for his/her supportive stance towards our work, and for providing helpful feedback to improve our manuscript.

      Weaknesses and strengths

      1) Public availability of the code implementing the new method is clearly necessary for the two-photon microscopy community to adopt it, and this is indeed the case at https://github.com/Anuththara-Rupasinghe/Signal-Noise-Correlation. However, it is also crucial that any end-user be able to get a clear picture of the conditions under which the method can or cannot be applied before diving in. The fact that such an applicability domain is not well defined is a major concern. Notably, each Real Data Study presented in the paper uses a preliminary selection of "highly active cells" (1st study: N = 16; 2nd study: N = 10; 3rd study: N~20 per field), as the authors succinctly discuss that performance is expected to degrade "in the regime of extremely low spiking rate and high observation noise" (l. 518-519). But no precise criteria are provided to specify what is meant by "highly active cells". On the other hand, the authors also assume that there is at most one spiking event per time frame for each neuron, which seems to exclude bursting neurons. The latter assumption seems to be a challenge with respect to the example traces shown on Fig. 4C (ΔF/F reaches 400%) and on Fig. 6C (ΔF/F reaches 100%), considering that the GCaMP6s signal for a single spike is expected to peak below 10-20%. This forces the authors to take a scaling factor of the observations A = 1 x I (Real Data Study 1 and 3) or A = 0.75 x I (Real Data Study 2) compared to the A = 0.1 x I taken in the Simulation Studies. Therefore, it looks as if the Real Data Studies were performed on mainly bursting cells and each burst was counted as one spiking event. A detailed discussion of the usable range of firing rates, whether in spike or burst units, as well as the usable range of SNR should be added to the main text to allow future users to assess the suitability of their data for this analysis.

      Thank you for pointing out the issues related to the applicability domain of our method. We agree that clarifying the rationale behind our model parameter choices is key to facilitating its usage by future users. In response to this comment, we have made three major revisions:

      1) Adding a new subsection to the Methods and Materials called “Guidelines for model parameter settings” that includes our rationale and criteria for choosing the number of neurons (N), stimulus integration window length (R), observation noise covariance (Σ_w), scaling matrix A, state transition parameter (α), and mean of the latent noise process (μ_x);

      2) Inspecting the capability of our proposed method in compensating for rapid increase of firing rate;

      3) Performing extensive new simulations to evaluate the effect of SNR level and firing rate on the performance of our proposed method, included in a new subsection in the Results section called “Analysis of robustness with respect to modeling assumptions”.

      We will next describe these changes in a point-by-point fashion.

      -Criterion for selecting the number of neurons. While our proposed method scales well with the population size owing to the low-complexity update rules involved, including neurons with negligible spiking activity in the analysis would only increase the complexity and potentially contaminate the correlation estimates. Thus, we performed an initial pre-processing step to extract N neurons that exhibited at least one spiking event in at least half of the trials considered. This criterion is now clearly stated in the subsection “Guidelines for model parameter settings”. We have also reworded “highly active cells” to “responsive cells (according to the selection criterion described in Methods and Materials)” for clarity.
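
      The selection criterion above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the repository: the function name `select_responsive_neurons`, the array layout (neurons x trials x frames), and the use of precomputed putative spike estimates as input are all hypothetical choices made here.

```python
import numpy as np

def select_responsive_neurons(spikes, min_fraction=0.5):
    """Return indices of neurons with >= 1 spiking event in >= min_fraction of trials.

    spikes: array of shape (n_neurons, n_trials, n_frames); any positive entry
    is treated as a putative spiking event (assumed layout, for illustration).
    """
    spikes = np.asarray(spikes)
    # True where a neuron has at least one event in a given trial.
    active_in_trial = (spikes > 0).any(axis=2)      # (n_neurons, n_trials)
    # Fraction of trials in which each neuron was active.
    fraction_active = active_in_trial.mean(axis=1)  # (n_neurons,)
    return np.flatnonzero(fraction_active >= min_fraction)
```

      With `min_fraction=0.5` the function keeps neurons that are active in at least half of the trials, which is the threshold stated in the response; other thresholds are shown only as a parameter of the sketch.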

      -Evaluating the effects of SNR level and firing rate. We had previously noted that the performance degrades at low SNR and firing rate values, with little quantitative justification. In response to this comment, and a related comment by Reviewer 1, we performed extensive simulations to evaluate the robustness of the different methods under varying SNR levels, firing rates, and observation noise model mismatch (including white noise + drift and pink noise models). These results are included in a new subsection called “Analysis of robustness with respect to modeling assumptions” and shown in Figure 2 – Figure Supplement 5 and 6.

      While the performance of all methods (including ours) degrades at low SNR levels or firing rates (SNR < 10 dB, firing rate < 0.5 Hz), our proposed method outperforms the existing methods in a wide range of SNR and firing rate values and under the considered observation noise model mismatch conditions. To quantify this comparison, we have also indicated the mean and standard deviation of the relative performance gain of our proposed estimates across SNR levels and firing rates as insets in Figure 2 – Figure Supplement 5 and 6.

      -Choosing the scaling matrix A. In each case, we set A=aI, and estimated a by considering the average increase in fluorescence after the occurrence of isolated spiking events. Specifically, we derived the average fluorescence activity of multiple trials triggered to the spiking onset and set a as the increment in the magnitude of this average fluorescence immediately following the spiking event.
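
      A minimal sketch of this onset-triggered estimate of a is given below. It assumes that the onsets of isolated spiking events are already known and that the baseline is the mean of a few frames just before onset; the function name `estimate_scaling` and the window lengths `pre` and `post` are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def estimate_scaling(fluorescence, onsets, pre=2, post=1):
    """Estimate the scalar a (with A = a * I) from an onset-triggered average.

    fluorescence: 1-D fluorescence trace of one neuron.
    onsets: frame indices of isolated spiking events (assumed to be given).
    pre, post: frames kept before and after each onset (illustrative values).
    """
    f = np.asarray(fluorescence, dtype=float)
    # Collect windows around each onset, skipping those that run off the trace.
    windows = [f[t - pre : t + post + 1] for t in onsets
               if t - pre >= 0 and t + post < f.size]
    if not windows:
        raise ValueError("no onset has a complete window inside the trace")
    triggered_avg = np.mean(windows, axis=0)
    baseline = triggered_avg[:pre].mean()
    # Increment of the averaged fluorescence right after the spiking onset.
    return triggered_avg[pre:].max() - baseline
```

      Averaging before taking the increment, rather than averaging per-event increments, follows the order described in the response; either choice is an assumption about details that the text does not spell out.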

      -Compensation for rapid increase of firing rate. The comment of the reviewer regarding the sudden increase of ∆F/F in Fig. 4C prompted us to inspect the performance of the algorithm in such scenarios where the choice of A may underestimate the rapid increase of firing rate (e.g., A = I). In the new supplementary figure to Fig. 4, called Figure 4 – Figure Supplement 2, we show a zoomed-in view of the time-domain estimates of the latent processes obtained by our proposed method (replicated here for discussion):

      Notably, the fluorescence activity rises up to a magnitude of ∼ 14, while we have set a=1. Thus, as the reviewer pointed out, this activity is induced by a burst-like event due to successive closely-spaced spikes. Due to the low firing rate of A1 neurons, we believe this is not a bursting event (in the electrophysiological sense), but a rapid increase in firing rate that may result in the occurrence of more than one spike per frame. From the estimates of the latent calcium concentration (purple) and putative spikes (green), we clearly see that our proposed method is still capable of matching the observed fluorescence activity through two mitigatory mechanisms that we describe next:

      1) The proposed method predicts spiking events in adjacent time frames to compensate for rapid increase of firing rate (see the green trace following the vertical dashed line) and thus infers calcium concentration levels that match the observed fluorescence activity;

      2) Even though our generative model assumes that there is only one spiking event in a given time frame, this assumption is implicitly alleviated in our inference framework by relaxing the constraint

      as explained in the section Methods and Materials - Low-complexity parameter updates (Page 23). While this relaxation was performed in order to make the inverse problem tractable, we see that it in fact leads to improved estimation results under such settings, by allowing the putative spike magnitudes

      to be greater than 1, as it is also evident in the magnitude of the inferred spikes right after the rise of fluorescence activity (the horizontal dashed line corresponds to spiking magnitude equal to 1).

      We have now discussed this observation in the Results section (Page 10).

      To address this comment, we have added a new subsection to Methods called “Guidelines for model parameter settings” that includes our rationale and criteria for choosing key model parameters (Page 24), have performed new simulation studies to evaluate the effects of SNR and firing rate on the performance of the proposed method (Pages 6-7), and closely inspected the performance of our method under rapid increase of firing rate (Page 10).

      2) Another parameter seems to be set by the authors on a criterion that is unclear to me: the number of time lags R to be included in the sound stimulus vector s_t. It seems to act as a memory of the past trajectory of the stimulus and probably serves to enhance the effect of stimulus onset/offset relative to the rest of the sound presentation. It is consistent with the known tendency of neurons in the primary auditory cortex to respond to these abrupt changes in sound power. However, this R is set at 2 in the Simulation Study 1, whereas it is set at 25 in the Real Data Studies 1 and 3, and at 40 in the Real Data Study 2. What leads to these differences escaped me and should be explained more clearly.

      Thank you for pointing out this lack of clarity in explaining the rationale behind choosing R. In addressing this comment, we have now added an entry in the new subsection “Guidelines for model parameter settings”. Furthermore, we have unified our choice of R in the three real data studies. We will explain these changes in a point-by-point fashion next.

      -Choice of R in simulation studies. The stimulus used in the simulation was a 6th-order autoregressive process whose present and immediate past values contributed to spiking in our generative model (i.e., R=2). Given that the ground truth value of R was known in the simulations, we used R=2 for inference as well.

      -Choice of R for real data application. The number of lags R considered in stimulus integration is a key parameter that can be set through data-driven approaches or using prior domain knowledge. Examples of common data-driven criteria include cross-validation, Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), which balance the estimation accuracy and model complexity.

      To quantify the effect of R on model complexity, we first describe the stimulus encoding model in our framework. Suppose that the onset of the pth tone in the stimulus set (p=1,⋯,P , where P is the number of distinct tones) is given by a binary sequence

      The choice of R implies that the response at time t post-stimulus depends only on the R most recent time lags. As such, the effective stimulus at time t corresponding to tone p is given by

      By including all the P tones, the overall effective stimulus at the tth time frame is given by

      The stimulus modulation vector d_j would thus be RP-dimensional. As a result, the number of parameters (M=RP) to be estimated linearly increases with R. By using additional domain knowledge, we chose R to be large enough to capture the stimulus effects, and at the same time to be small enough to control the complexity of the algorithm.
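
      To make the dimension count concrete, the lagged stimulus described above can be sketched as follows. This is an illustration only: the function name `effective_stimulus`, the (P, T) input layout, the zero-padding before the first frame, and the ordering of entries (tone by tone, most recent lag first) are assumptions made here rather than details taken from the manuscript.

```python
import numpy as np

def effective_stimulus(onsets, R):
    """Stack the R most recent onset values of each of P tones at every frame.

    onsets: binary array of shape (P, T); onsets[p, t] = 1 if tone p starts at frame t.
    Returns an array of shape (T, R * P) whose row t is the effective stimulus at frame t.
    """
    onsets = np.asarray(onsets)
    P, T = onsets.shape
    # Zero-pad the past so that the first frames also have R lags (assumption).
    padded = np.concatenate([np.zeros((P, R - 1), dtype=onsets.dtype), onsets], axis=1)
    S = np.empty((T, R * P), dtype=onsets.dtype)
    for t in range(T):
        # Columns t .. t+R-1 of the padded array are original frames t-R+1 .. t;
        # reversing puts the most recent lag first.
        window = padded[:, t : t + R][:, ::-1]
        S[t] = window.reshape(-1)
    return S
```

      Each row has R * P entries, so a stimulus modulation vector that multiplies it needs M = RP coefficients per neuron, which is the linear growth in R noted above.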

      As an example, given that the typical response duration of mouse primary auditory neurons is < 1 s, with a sampling frequency of f_s=30 Hz, we surmised that a choice of R∼30 would suffice to capture the stimulus effects. We further examined the effect of varying R on the proposed correlation estimates in Figure 4 – Figure Supplement 1. As shown, small values of R (e.g., R = 1 or 10) may not be adequate to fully capture the effects of stimuli. By considering values of R in the range 25 − 50, we noticed that the correlation estimates remain stable. We thus chose R=25 for our real data analyses. Notably, the results of real data study 2 (that previously used R = 40) are nearly unchanged with the new choice of R=25, which is in accordance with our observation in Figure 4 – Figure Supplement 1.

      To address this comment, we have added a new subsection to Methods called “Guidelines for model parameter settings” (Page 24) that includes our rationale for choosing the stimulus integration window length R and have performed a new analysis to evaluate the effect of R on the performance of the proposed method in real data study 1 (Page 10).

      3) This memory of the past stimulus trajectory appears to be specific to the proposed method and is not accounted for in the 2-stage Pearson estimation, for example. Since it probably helps to reflect the common sensitivity of neurons to onset/offset, it alone provides an advantage to the proposed method over the 2-stage Pearson estimation. It would be instructive to also perform this comparison with R set to 1 to get an idea of the magnitude of this advantage.

      We agree that explicit modeling of stimulus integration is a key advantage of our proposed method in comparison to the conventional ones. We have now explained this virtue in the discussion of the role of R in real data study 1 (Page 10). Additionally, as explained in our responses to the previous comment, we have included a new analysis of the sensitivity of our proposed estimates to the choice of R as a supplementary figure to Figure 4. As the reviewer suggested, we see that R=1 indeed fails to capture the underlying structure in the signal correlations. However, when R is sufficiently large (R>20), the estimates become stable.

      To address this comment, we have now discussed the advantage of including the stimulus history in our model and probed the sensitivity of our estimates to the choice of R in Figure 4 – Figure Supplement 1 (Page 10).

      4) Finally, although the example of ground truth signal and noise correlation matrices taken to illustrate the method in the simulation study on Fig. 2A has been chosen to have almost no overlap in their non-zero coefficients, there is no fundamental reason why this separation should be the rule for real data. These coefficients reflect the patterns of stimulus-dependent and stimulus-independent functional connectivity in the recorded network. As such, these patterns could have different degrees of overlap, depending on the brain areas recorded. It is therefore particularly striking that the authors find in their data a strong dissimilarity and almost no covariance between signal and noise correlation coefficients, throughout all the different sets of experiments they present here (Fig. 4E, Table 1, 2, 3, and Fig. 6A&B). This makes a strong and compelling statement on the likely separation of the corresponding circuits in the primary auditory cortex of the mouse.

      We agree with the assessment of the reviewer. We suspect that some of the reported similarities between signal and noise correlations in existing literature could be due to leakage in estimating these two quantities, likely induced by a limited number of trials, short observation duration, and neglect of the effects of calcium dynamics and non-linearities.

      Likely impact on the field

      It is now well established that sound processing is modulated, even at the level of primary auditory cortex, by locomotion (Schneider et al. Nature 2018), task engagement (Fritz et al. Nat. Neurosci. 2003), or several other factors. Applying the proposed method to these situations could help understand how sound processing circuits are remodeled, without confounding other coexisting processes. In general, whenever a brain structure makes associations between multiple processes within the same network, the presence of multiple circuits makes the observation of correlations difficult to attribute to the signature of a single circuit. By significantly improving the estimation of signal and noise correlations, the proposed method should help distinguish the boundaries of these circuits as well as their intersections. The exploration of the role of many secondary sensory and associative cortical structures could be renewed by this work.

      We would like to thank Reviewer 2 again for his/her supportive stance towards our work and for fairly summarizing our contributions.

    1. Author Response:

      Reviewer #2 (Public Review):

      Dieterle et al set out to determine the receptors needed by hantaviruses to infect human endothelial cells. Prior to this publication, the authors identified protocadherin-1 (PCDH1) as a putative viral receptor for New World hantaviruses, but not Old World hantaviruses. Additionally, both integrins and DAF have been reported as receptor candidates for hantaviruses. However, whether these molecules function alone or in combination to promote hantavirus entry and infection remains unclear. Dieterle et al generate and validate single and combinatorial knockouts of these 4 genes (PCDH1, DAF, ITGB1, ITGB3) and test the ability of the resulting cells to support viral replication in two independent assays. Dieterle et al confirm that New World hantaviruses require PCDH1 for infection. Furthermore, Dieterle et al fail to find a functional role for integrins (Beta 3/Beta 1) or DAF in hantavirus infection, even when knocked out in combination. Overall, the data are clearly presented and well controlled. The findings help clarify entry mechanisms used by hantaviruses and provide a foundation to identify receptor candidates for Old World hantaviruses. A few minor points are worth mentioning.

      1) The authors clearly demonstrate a lack of genetic requirement for (DAF, ITGB1, ITGB3). However, a second orthogonal approach to block access to Integrins or DAF would strengthen the conclusion and alleviate any minor concerns of incomplete genetic knockout.

      We agree that additional approaches would be helpful to rule out some caveats to our study and that they are warranted. For this short report, we chose to focus on a loss-of-function genetic approach. Our genetic and biochemical evidence indicate that incomplete genetic knockout is unlikely to explain our findings. However, we cannot rule out more complex scenarios, including those we articulate in the manuscript. We concur that more work will be required to fully rule out (or ‘rule in’) the involvement of the various entry factors that have been proposed.

      2) The authors are commended for a nuanced conclusion. In particular, in lines 181-185, the authors state "We note that our results do not rule out that one or more of these proteins is involved in hantavirus entry into other cell types not examined herein, or that they are involved in endothelial cell subversion post-viral entry, as shown previously (Gavrilovskaya et al. 1998; Gavrilovskaya et al. 1999; Krautkrämer and Zeier 2008)." It should be noted that studies demonstrating a requirement for DAF used polarized cells. This would suggest that, in addition to cell type, growth conditions may be an important factor in receptor utilization studies. Nonetheless, under the conditions tested, the authors clearly demonstrate that DAF is not absolutely required for hantavirus infection in human endothelial cells.

      We have added language to the “Results and Discussion” indicating as an additional caveat that some of the previously proposed receptors could play roles in polarized cell layers, as indeed has been proposed for DAF (Krautkrämer and Zeier, 2008).

    1. Author Response:

      Reviewer #1:

      In this study, Sias and colleagues examined the neural mechanism underlying stimulus-outcome associations using a Pavlovian-to-instrumental transfer (PIT) task in rats. Rats were first trained in a Pavlovian conditioning task in which two different auditory stimuli (white noise or tone) predicted different outcomes (sucrose solution or food pellet). The rats were then subjected to an instrumental conditioning and a PIT test to examine stimulus-outcome associations. The authors first used fiber photometry to examine the bulk calcium signals from the basolateral amygdala (BLA) during Pavlovian conditioning, and found that a population of BLA neurons are activated at the onset of a conditioned stimulus and at the time of reward retrieval. The response was observed from the first day and the magnitude was relatively constant over the entire period (8 days), indicating that the population activity contained responses to novel auditory stimuli. The authors then performed optogenetic inhibitions of BLA neurons at the time of reward delivery and consumption during Pavlovian conditioning. Although the BLA inhibition did not affect the acquisition of Pavlovian approach to the reward port, it impaired a facilitation of pressing the lever associated with a specific outcome predicted by an auditory cue, supporting a role of BLA in learning to predict specific outcomes, not just reward generally. The authors also examined the role of interactions between BLA and the lateral orbitofrontal cortex (lOFC), first by inactivating lOFC axons in BLA, and then by a serial circuit disconnection experiment combining optogenetic and pharmacogenetic inhibitions of specific projections.

      Although the role of BLA and lOFC in learning has been studied extensively, this study extends these studies by performing temporally specific inhibitions using optogenetics, axonal inactivation, and serial disconnection experiments. The finding that the BLA-lOFC circuit is not necessary for the acquisition of simple Pavlovian approaches but critical for outcome-specific stimulus-outcome associations is surprising. The authors performed sophisticated and difficult experiments, and the experiments are generally well done. The manuscript is clearly written, and the results are discussed carefully.

      We appreciate this thoughtful evaluation of our manuscript.

      I have one relatively minor concern regarding the description of the serial disconnection experiment. Overall, the manuscript provides interesting results and warrants publication at eLife.

      1) The use of a serial circuit disconnection experiment (Figure 5) is elegant and informative. However, the authors could have achieved almost the same goal by bilateral inactivation of axonal terminals of lOFC->BLA projections during the encoding phase or BLA->lOFC projections during the retrieval phase.

      We did these bilateral axonal terminal inactivation experiments. They showed us that the lOFC→BLA pathway is involved in the learning (Figure 4) and the BLA→lOFC pathway is involved in the retrieval (Lichtenberg et al., 2017) of stimulus-outcome memories. But these experiments are not capable of providing information on whether these pathways form a circuit. That is, whether BLA→lOFC projection activity mediates the use of the associative information that is learned via activation of lOFC→BLA projections or whether these pathways tap into independent information streams. Our goal with the serial disconnection experiment was to address this specific circuit question. We have clarified the logic of this experiment.

      • Results on Pg. 10: “But it remains unknown whether BLA→lOFC projection activity mediates the use of the associative information that is learned via activation of lOFC→BLA projections. That is, whether lOFC→BLA→lOFC is a functional stimulus-outcome memory encoding and retrieval circuit or whether lOFC→BLA and BLA→lOFC projections tap into independent, parallel information streams. Indeed, stimulus-outcome memories are highly complex including multifaceted information about outcome attributes (e.g., value, taste, texture, nutritional content, category, probability, timing, etc.) and related consummatory and appetitive responses (Delamater & Oakeshott, 2007). Therefore, we next asked whether the lOFC→BLA and BLA→lOFC pathways form a functional stimulus-outcome memory encoding and retrieval circuit, i.e., whether the sensory-specific associative information that requires lOFC→BLA projections to be encoded also requires activation of BLA→lOFC projections to be used to guide decision making, or whether these are independent, parallel pathways, tapping into essential but independent streams of information. To arbitrate between these possibilities, we multiplexed optogenetic and chemogenetic inhibition to perform a serial circuit disconnection. We disconnected lOFC→BLA projection activity during stimulus-outcome learning from BLA→lOFC projection activity during the retrieval of these memories at the PIT test (Figure 5a)… …If BLA→lOFC projection activity mediates the retrieval of the sensory-specific associative memory that requires activation of lOFC→BLA projections to be encoded, then we will have bilaterally disconnected the circuit, attenuating encoding in one hemisphere and retrieval in the other, thereby disrupting the ability to use the stimulus-outcome memories to guide choice behavior during the PIT test. If, however, these pathways mediate parallel information streams, i.e., independent components of the stimulus-outcome memory, the expression of PIT should be intact because one of each pathway is undisrupted to mediate its individual component during each phase.”

      Furthermore, if there are contralateral projections, the experimental design might have a problem. Please clarify these points.

      This is a great point that we did not discuss as clearly as we could have. We appreciate the opportunity to clarify our logic. There are both ipsilateral and contralateral lOFC→BLA projections. For this reason, we optically inactivated both the ipsilateral and contralateral lOFC input to the BLA of one hemisphere, leaving both the ipsilateral and contralateral lOFC→BLA projections to the BLA of the other hemisphere intact. To achieve this, we expressed the inhibitory opsin ArchT bilaterally into the lOFC and placed the optical fiber unilaterally in the BLA. BLA→lOFC projections are largely ipsilateral and so we expressed the inhibitory designer receptor hM4Di unilaterally in the BLA and put a guide cannula for CNO infusion over the hemisphere opposite to that in which we had placed the optical fiber. We have clarified this logic in the revised results and methods:

      • Results Pg. 10 ¶2: “For the disconnection group (N = 10), we again expressed ArchT bilaterally in lOFC neurons (Figure 5b-d) to allow expression in lOFC axons and terminals in the BLA. This time, we implanted the optical fiber only unilaterally in the BLA (Figure 5b-d), so that green light (532 nm, ~10 mW), delivered again during Pavlovian conditioning for 5 s during the delivery and retrieval of each reward during each cue, would inhibit both the ipsilateral and contralateral lOFC input to the BLA of only one hemisphere. In these subjects, we also expressed the inhibitory designer receptor human M4 muscarinic receptor (hM4Di) unilaterally in the BLA of the hemisphere opposite to the optical fiber and in that same hemisphere placed a guide cannula over the lOFC near hM4Di-expressing BLA axons and terminals (Figure 5b-d). This allowed us to infuse the hM4Di ligand clozapine-n-oxide (CNO; 1 mM in 0.25 µl) prior to the PIT test to unilaterally inhibit BLA terminals in the lOFC, which are largely ipsilateral (Lichtenberg et al., 2017), in the hemisphere opposite to that for which we had inhibited lOFC→BLA projection activity during Pavlovian conditioning. Thus, we optically inhibited the lOFC→BLA stimulus-outcome learning pathway in one hemisphere at each stimulus-outcome pairing during Pavlovian conditioning, and chemogenetically inhibited the putative BLA→lOFC retrieval pathway in the opposite hemisphere during the PIT test in which stimulus-outcome memories must be used to guide choice.”
      • Methods on Pg. 19 ¶1: “The disconnection group (N = 10) was infused with AAV encoding the inhibitory opsin ArchT (rAAV5-CAMKIIa-eArchT3.0-eYFP; 0.3 µl) bilaterally at a rate of 0.1 µl/min into the lOFC (AP: +3.3; ML: ±2.5; DV: -5.4 mm from bregma) using a 28-gauge injector tip. Injectors were left in place for an additional 10 minutes. An optical fiber (200 µm core, 0.39 NA) held in a ceramic ferrule was implanted unilaterally (hemisphere counterbalanced across subjects) in the BLA (AP: -2.7; ML: ±5.0; DV: -7.7 mm from dura) to allow subsequent light delivery to both the ipsilateral and contralateral ArchT-expressing axons and terminals in the BLA of only one hemisphere. During the same surgery, in the hemisphere contralateral to optical fiber placement, a second AAV was infused unilaterally at a rate of 0.1 µl/min into the BLA (AP: -3.0; ML: ±5.1; DV: -8.6 from bregma) to drive expression of the inhibitory designer receptor human M4 muscarinic receptor (hM4Di; pAAV8-hSyn-hM4D(Gi)-mCherry, Addgene; 0.5 µl). A 22-gauge stainless-steel guide cannula was implanted unilaterally above the lOFC (AP: +3.0; ML: ±3.2; DV: -4.0) of the BLA-hM4Di hemisphere to target the hM4D(Gi)-expressing axonal terminals, which are predominantly ipsilateral.”

      Also, the control experiments are now shown in Figure 5-2. It would be useful to have it in a main figure.

      We have incorporated the ipsilateral control group data into the main Figure 5 (Pg. 11). As you can see below, because there were no differences between the two control groups (contralateral fluorophore only eYFP/mCherry & ipsilateral ArchT/hM4Di), we combined them into a single control group for comparison to the disconnection group. The individual data points in Figure 5 are coded by control group (eYFP/mCherry solid lines and circles, ipsilateral ArchT/hM4Di dashed lines and triangles). We also provide the data with the control groups disaggregated, showing a comparison between all three groups, in Figure 5-2 (Pg. 43).

      • Results on Pg. 10 ¶2: “The control group received identical procedures with the exception that viruses lacked ArchT and hM4Di (N = 8). To control for unilateral inhibition of each pathway without disconnecting the circuit, a second control group (N = 8) received the same procedures as the experimental contralateral ArchT/hM4Di disconnection group, except with BLA hM4Di and the lOFC guide cannula in the same hemisphere as the optical fiber used to inactivate lOFC axons and terminals in the BLA (Figure 5-1). Thus, during the PIT test, for this group the BLA→lOFC pathway was chemogenetically inactivated in the same hemisphere in which the lOFC→BLA pathway had been optically inactivated during Pavlovian conditioning, leaving the entire circuit undisrupted in the other hemisphere. These control groups did not differ on any measure and so were collapsed into a single control group [(Pavlovian training, Training: F(2.2,31.3) = 12.96, P < 0.0001; Control group type: F(1,14) = 0.02, P = 0.89; Group x Training: F(7,98) = 0.76, P = 0.62) (PIT Lever presses, Lever: F(1,14) = 14.68, P = 0.002; Control group type: F(1,14) = 0.38, P = 0.55; Group x Lever: F(1,14) = 0.43, P = 0.52) (PIT Food-port entries, t14 = 0.72, P = 0.48)]. See also Figure 5-2 for disaggregated control data.”

      Reviewer #2:

      This manuscript aimed to dissociate two potential roles of the basolateral amygdala (BLA) in choice behavior: (1) contributing to sensory-specific stimulus-outcome memories or (2) assigning general valence to a reward-predictive cue. The authors used a well-validated Pavlovian-to-instrumental transfer (PIT) test with a series of circuit manipulations to show that lateral OFC to BLA projections are necessary for learning specific cue-outcome associations, rather than general valence, and that return BLA to lateral OFC projections are important for using that learned information in the PIT test.

      Overall, this paper addresses a question that is important to anyone studying amygdala or orbitofrontal function. The study is well-designed, the multiplexed opto-chemogenetics experiment is particularly creative, and there are convincing results with appropriate controls.

      We appreciate this thoughtful evaluation of our manuscript.

      I only have a few minor questions about the calcium signals reported in the first portion of the manuscript. First, there is a steep rise in calcium signal in panel 1f, suggesting that the signal is time-locked to the cue. However, there is a qualitatively different response to rewards in 1g. Is this just because it's more difficult to time-lock to the animal's movements than an experimentally-controlled cue? Or is it possible that there's another source in the experimental set-up that could be triggering the response? For example, does the reward delivery make an audible sound?

      You are absolutely right that the signal is not as time-locked to the reward collection because the rats collected the reward at somewhat variable times after delivery, which is, indeed, signaled by a subtle but audible cue (pellet dispenser click or pump onset). To clarify this, we have now included Figure 1-4 (Pg. 35) showing the BLA calcium response to reward delivery. As you can see, the BLA reward response is also detectable when the data are aligned to the reward delivery, but the response is still not as sharp as that to the cue onset, likely owing to slight variability in the precise moment that the reward is perceived.

      • Reference in Results Pg. 5 ¶2: “The same BLA reward response could also be detected when the data were aligned to reward delivery (Figure 1-4).”

      Second, in Fig 2, is there any change in the reward response across training sessions, or is this signal also stable?

      This is an interesting question, but unfortunately one we are not able to answer because we only recorded during one unpredicted reward delivery session after the last CSØ session. Because we saw that the BLA GCaMP response to the CSØ decreased and was nearly completely absent on the last day of exposure, we wanted to make sure that this was not due to signal degradation over time, so we recorded during an unexpected reward session to serve as a positive control. We have now clarified this logic in the results.

      • Results Pg. 6 ¶1: “To check whether the decline of the CSØ response was due simply to signal degradation over time, following the last CSØ session we recorded BLA calcium responses to unpredicted reward delivery. Rewards were capable of robustly activating the BLA (Figure 2g-i; peak; t5 = 2.93, P = 0.03; AUC; t5 = 4.07, P = 0.01). This positive control indicates that the decline of the BLA CSØ response was due to stimulus habituation, not signal degradation.”

      Reviewer #3:

      Summary:

      This work tests the hypothesis that the reciprocal connections between the BLA and lOFC are needed to encode sensory-specific reward memories, as well as retrieve this same information once it has been learned in order to guide decision making. The authors first use fiber photometry to measure the activity of excitatory BLA neurons during Pavlovian conditioning of two specific cues with two specific reward outcomes and find that transient responses are evident in BLA at cue onset and each time there is a cue-contingent attempt to retrieve a reward. Using this information about event encoding in BLA, the authors go on to use optogenetics to inhibit BLA activity driven by lOFC inputs to BLA following reward retrieval attempts without affecting overall conditioned approach behavior. This manipulation has the effect of disrupting encoding of sensory-specific reward memories as it impairs the animals' subsequent performance on an outcome-specific Pavlovian instrumental transfer test. Since the authors have previously demonstrated that BLA inputs to lOFC are important for retrieving sensory-specific reward memories to affect decision making in the same PIT procedure, they go on to use an innovative serial disconnection approach using chemogenetic and optogenetic tools to show that inhibiting either pathway in opposing hemispheres, simultaneously, has comparable effects on outcome-specific PIT performance as bilateral inhibition of either pathway in isolation. Overall this is a compelling demonstration that inputs from BLA to lOFC and lOFC to BLA act in a coordinated manner to facilitate appetitive decision making.

      Strengths:

      These experiments build directly on the authors' prior demonstrations that lOFC projections to BLA are important for encoding incentive value but not for the retrieval of appetitive reward associations.

      An elegant use of an outcome-specific Pavlovian instrumental transfer (PIT) procedure to demonstrate the important contributions of projections between the BLA and lOFC in encoding and retrieving stimulus-outcome reward associations.

      The use of GCaMP measurements of BLA activity to temporally constrain optogenetic inhibition of lOFC inputs to BLA following reward retrieval, allowing specific conclusions about how encoding of stimulus-outcome memories is mediated by lOFC inputs to BLA.

      The authors utilize a measure of Pavlovian conditioned approach behavior to convincingly demonstrate that the effect of their optogenetic manipulations during Pavlovian conditioning on behavior during PIT is sensory specific and not due to potentially confounding changes in motivation or learning.

      We appreciate this thoughtful evaluation of our manuscript.

      Weaknesses:

      The conditioned approach responses appear to asymptote after two out of the eight Pavlovian conditioning sessions. Although the authors have run a control experiment in which they show that novelty contributes to the GCaMP responses measured in BLA at cue onset in early sessions, they do not clearly demonstrate learning related changes in GCaMP responses across sessions to either cue or reward retrieval. Thus, it isn't necessarily clear how quickly the sensory-specific reward memories are formed in BLA and if repeated stimulus-outcome pairings, particularly once general approach behavior reaches asymptote, actually serve to increasingly strengthen the memory.

      We agree with this limitation of our report and are actively working to address these interesting questions in our ongoing work. Indeed, a learning-related change in the BLA response can only be inferred from the present data and is not directly demonstrated. In the present experiment the nature of the memory is tested after learning, precluding understanding of the precise time course of the development of the sensory-specific stimulus-outcome memory. Future work should incorporate an online neural and/or behavioral assessment of sensory-specific reward memory encoding during learning to better address this important question.

      No explanation is provided for how the transient BLA GCaMP responses at cue onset sustain stimulus-outcome memory encoding at the time of reward. A straightforward account would be a sustained response to the cue that overlaps with the GCaMP response to reward retrieval. In addition there is no attempt to transiently inactivate the entire BLA or specific pathways at cue onset to determine how simple cue encoding affects subsequent performance in the PIT paradigm.

      This is an excellent point. We were somewhat surprised to see only a transient response to the CS onset. This suggests to us that perhaps there is a more sustained response elsewhere in the brain (or even in a different cell type in the BLA). Perhaps this sustained response follows the transient response detected here.

      We also agree that whether the BLA response to the cue is important for S-O memories is an important question (and a limitation of the current work). This is also a question we are addressing in our ongoing work. We have acknowledged this limitation and open question in the revised manuscript.

      • Discussion Pg. 13 ¶1: “Future work is needed to reveal the precise information content encoded by BLA neurons during reward experience that confers their function in the formation of stimulus-outcome memories, though BLA neurons will respond selectively to unique food rewards (Liu et al., 2018), which could support the generation of sensory-specific reward memories. Whether BLA cue responses are also important for encoding stimulus-outcome memories is another important question exposed by the current results.”

      The multiplexed chemogenetic and optogenetic serial disconnection approach is too coarse a manipulation to support the claim that reciprocal connections between the BLA and lOFC support encoding and retrieval of the same information. To make this claim it is necessary to use detailed functional assays of the activity in each pathway to determine what information they code during the Pavlovian conditioning and PIT procedures.

      We completely agree with this excellent point. We appreciate the reviewer pointing out how our language led to an interpretation that is not supported by the current data. Indeed, the data do not show whether the same information is transmitted between lOFC→BLA and BLA→lOFC and that need not be the case for these projections to function in a circuit. To remedy this, we have removed the ‘same information’ language throughout the manuscript, including in the abstract (Pg. 2), results (Pg. 9-11), discussion (Pg. 13-14), and methods (Pg. 20-21). We have brought our framing and interpretation of the disconnection results much closer to the present data. For example:

      • Results Pg. 10 ¶1: “Therefore, we next asked whether the lOFC→BLA and BLA→lOFC pathways form a functional stimulus-outcome memory encoding and retrieval circuit, i.e., whether the sensory-specific associative information that requires lOFC→BLA projections to be encoded also requires activation of BLA→lOFC projections to be used to guide decision making, or whether these are independent, parallel pathways, tapping into essential but independent streams of information.”

      • Results Pg. 11 ¶1: “…indicating that the lOFC and BLA form a bidirectional circuit for the encoding (lOFC→BLA) and use (BLA→lOFC) of appetitive stimulus-outcome memories.”

      • Discussion Pg. 14 ¶1: “Here, using a serial disconnection procedure, we found that during reward choice BLA→lOFC projection activity mediates the use of the sensory-specific associative information that is learned via activation of lOFC→BLA projections. Thus, lOFC→BLA→lOFC is a functional circuit for the encoding (lOFC→BLA) and subsequent use (BLA→lOFC) of sensory-specific reward memories to inform decision making.”

      We have also included the important caveat that future work with detailed characterization of the activity of each pathway is needed to draw conclusions on the information content conveyed by each pathway:

      • Discussion Pg. 14 ¶2: “The precise information content conveyed by each component of the lOFC-BLA circuit and how it is used in the receiving structure is a critical follow-up question that will require a cellular resolution investigation of the activity of each pathway.”
    1. Author Response:

      Reviewer #1:

      This study reports the novel and interesting finding that AKAP220 knockout leads to a dramatic increase in primary cilia in renal collecting ducts. AKAP220 is known to sequester PKA, GSK3, the Rho GTPase effector IQGAP-1 and PP1. Previous work from this group demonstrated that AKAP220-/- mice exhibit reduced accumulation of apical actin in the kidney attributable to less GTP-loading of RhoA. Relatedly, AKAP220-/- mice display mild defects in aquaporin 2 trafficking. In this work, Gopalan et al. examine the effects of AKAP220 mutation on cilia. They demonstrate increased numbers of primary cilia decorating AKAP220-/- collecting ducts. This phenotype is striking as little is known about negative regulators of cilium biogenesis.

      The authors also provide evidence that interaction of AKAP220 with protein phosphatase 1 (PP1) is critical for its function. Through PP1, AKAP220 may regulate HDAC6, which may in turn inhibit tubulin acetylation, which may in turn control cilia stability. Aberrant cilia function is implicated in autosomal dominant polycystic kidney disease. The authors also speculate that AKAP220 and tubulin acetylation may have clinical relevance for autosomal dominant polycystic disease. However, it remains unclear how increased cilia biogenesis may affect cell or tissue physiology. This work is of interest to cell biologists seeking to understand the biogenesis of the primary cilium, and to others interested in ciliopathies (i.e., disorders of the primary cilium).

      We thank reviewer 1 for their insightful comments and concur with their assessment that “it remains unclear how increased cilia biogenesis may affect cell or tissue physiology”. This is clearly a topic for further study within the field, by ourselves and by other laboratories.

      Reviewer #2:

      The authors show that AKAP220 knockout in kidney collecting ducts leads to a pronounced increase in primary cilia. They go on to demonstrate that this effect holds true in multiple different preparations, before clearly demonstrating that the PP1 anchoring site is critical for the normal role of AKAP220 in limiting primary cilia formation.

      Although the key overall finding is well supported, I did not find the specific mechanism concerning an AKAP220-PP1-HDAC6 signaling complex/axis sufficiently convincing. The authors propose that AKAP220 interacts with HDAC6 via PP1, and that within the complex HDAC6 is stabilised through phosphorylation. The knock-on effect is efficient deacetylation. Although this complicated mechanism is consistent with the data, three supporting observations towards this specific mechanism come with caveats: (i) in figure 2C, they show an increase in acetyl tubulin by immunoblotting, but the densitometry seems to be the ratio of acetyl tubulin to GAPDH - would it not be more appropriate to reference to total tubulin?

      We are encouraged that this reviewer considers that our “overall findings are well supported”. In response to their comments, we have bolstered our evidence that AKAP220 interacts with HDAC6 via PP1 by including new biochemical and imaging data showing that recruitment of the histone deacetylase is attenuated in kidney cells engineered to express a PP1-binding defective mutant of the anchoring protein. These new data are incorporated into figure 3D and supplemental figures S3D-L.

      The mechanism investigated in this paper is concerned with absolute levels of acetylated tubulin. Since the levels of both control proteins (alpha tubulin and GAPDH) do not change between wildtype and AKAP220KO, we chose to normalize to GAPDH. It is important to note that normalizing to total tubulin does not change the result.

      Reviewer #3:

      The authors had previously generated a mouse line with inactivation of AKAP220, which encodes an A-kinase anchoring protein, and observed defects in their collecting ducts (CD) leading to defects in trafficking of aquaporin 2. While further characterizing the samples, they observed that CD epithelia had increased numbers and length of their primary cilia compared to CD cells of control mice. While some AKAP proteins have been localized to the primary cilium, AKAP220 was not one of them, so the authors pursued a systematic series of experiments to determine how AKAP220 has these effects. Using a combination of CRISPR-manipulated renal epithelial cell lines (IMCD cells), drugs/compounds, and 3D and organ-on-a-chip cell culture systems, they present compelling data that show that AKAP220 anchors a complex of HDAC6 and Protein Phosphatase-1 (PP1) that controls the polymerization of actin and thereby affects cilia formation and elongation. Genetic or pharmacologic manipulations that disrupt AKAP220 or its ability to bind to PP1, inhibit HDAC6, or affect actin stability result in a similar phenotype of enhanced ciliogenesis and ciliary length. Given that polycystic kidney disease has been described as a ciliopathy, with the gene products of the two most common forms of the disease (polycystin-1 and polycystin-2) localized to the cilia, they tested whether inhibiting HDAC6 activity might affect cyst growth using a human iPSC organoid system. They found that organoids lacking polycystin-2 treated with tubacin had smaller cyst size compared to vehicle-treated mutants, leading them to propose manipulation of HDAC6 as a tentative therapeutic strategy for human autosomal dominant polycystic kidney disease and for ciliopathies characterized by defects in ciliogenesis.

      Strengths: These findings will be of interest to the ciliary community. They have identified a new factor and its associated partners that appear to regulate ciliogenesis. The studies follow a logical progression and are generally well-done with suitable controls, rigorous quantitation, and a reasonable level of replication (all done at least three times). They have used complementary methods (i.e., genetic manipulation, pharmacologic inhibition) to support their model, sometimes in combination to show that the underlying factor targeted by either genetics or drugs works through the same mechanism.

      Weaknesses: The major weakness of the report is in its attempt to be translational. Here, the report has a number of serious theoretical and experimental limitations. On the theoretical level, the rationale behind using an HDAC6 inhibitor is unclear given their data and their model. On the one hand, a prior study had reported that a non-specific inhibitor of HDACs slowed cyst growth in an orthologous mouse model of ADPKD. The current work could suggest that HDAC6 was the actual target in the prior work and that a specific inhibitor for HDAC6 should confer the same benefits. On the other hand, there are compelling reports that show that genetic inhibition of ciliogenesis actually attenuates cystic disease in orthologous mouse models of human ADPKD. The current paradigm is that preserved ciliary activity in the absence of Polycystin-1 or Polycystin-2 promotes cystic growth. This would suggest that any intervention that boosts ciliary function could actually worsen disease. And while the authors never directly comment on the functional properties of the "mutant" cilia that result from deletion of AKAP220 or inhibition of HDAC6, they imply that these "enhanced" cilia are functional by suggesting the use of HDAC6 inhibitors as therapy for ciliopathies that are the result of defective biogenesis. Their prior work also provides indirect support for the notion that the enhanced cilia are functional. AKAP220 knock-out mice are reported to be generally functional, apparently lacking phenotypes commonly associated with defective cilia structure or function. These contradictory observations suggest one or more of the following conclusions: the "mutant" cilia are in fact poorly functional, the HDAC inhibitors are working through a different mechanism than that which has been proposed, or the assay as used in this report is not a good read-out of cyst-modulating effects. The last point is particularly relevant for this report. The investigators scored effectiveness of tubacin based on the relative rate of growth of cysts treated with different concentrations of tubacin vs vehicle. In this assay, cyst growth is principally driven by rates of cellular proliferation. Tubacin is an anti-proliferative agent with some toxicity, and while it might be highly selective for HDAC6, these studies cannot distinguish between effects mediated through the AKAP220-HDAC6 pathway versus others. In sum, while tubacin or a similarly-acting drug may or may not be effective for slowing cyst growth, there are multiple reasons to think it isn't through the mechanism the authors propose.

      We are encouraged that reviewer 3 considers “our studies follow a logical progression and are generally well-done with suitable controls, rigorous quantitation, and a reasonable level of replication”. In terms of weaknesses, our reading of the reviewer’s detailed passage has identified two specific points that we can address.

      1) Lesions in cilia and polycystins are linked to Autosomal Dominant Polycystic Kidney Disease (Hughes et al., 1995; Mochizuki et al., 1996). Although there is general agreement on this point, the molecular details remain unclear and are inherently paradoxical. For example, loss of morphologically intact cilia favors a less severe cystic phenotype (Ma et al., 2013). In contrast, other investigators report that loss of intact primary cilia results in renal cystogenesis (Kolb and Nauli, 2008; Lin et al., 2003). How primary cilia can be pro-cystogenic in one context yet anti-cystogenic in another context remains an unsolved paradox for the field. We appreciate the need for further clarification on this point as raised by reviewer 3. This conundrum is now noted in the discussion on page 34, line 3.

      2) Searching for a therapeutic approach to restore functional primary cilia is the rationale behind our concluding studies. However, the complexity of genetic models for ADPKD and the above-mentioned “cilia paradox” limit our ability to accurately predict how pharmacological agents targeting cilia might affect cellular models of cystogenesis. That being said, we realize that HDAC6 inhibitors have been used by other groups to target cyst size (Cebotaru et al., 2016; Yanda et al., 2017). The reviewer is correct in pointing out that the mechanism by which HDAC6 inhibitors act to inhibit cystogenesis could be less than straightforward given the multitude of functions for HDAC6. We have amended the discussion on page 34, line 5 to reflect the reviewer’s valid point.

    1. Author Response:

      Reviewer #1:

      In this paper, the authors study one of the understudied aspects of the evolutionary transition to multicellularity: the evolution of irreversible somatic differentiation of germ cells. Division of labour via functional specialisation of cells to perform different tasks is pervasive across the tree of life. Various studies assume that the differentiation of reproductive cells ("germ-role cells" in this manuscript) into a non-reproducing cell type ("soma-role cells") is irreversible. In reality, the conditions that promote the evolution of this irreversible transition are unclear. Here, the authors set out to fill in this knowledge gap. They model a population of organisms that grow from a single germ-role cell and find the optimal developmental strategy in terms of differentiation probabilities, under different scenarios. Under their model assumptions, they show that irreversible somatic differentiation can evolve when 1) cell differentiation is costly, 2) somatic cells' contribution to growth rate is large, 3) organismal body size is large.

      Overall, I think the authors identified an interesting and neglected aspect of cellular differentiation and division of labour. I enjoyed reading the paper; I thought the writing was clear and the modelling approach was adequate to address the authors' question.

      Thank you for a detailed and constructive review.

      Some aspects that can be improved:

      1) Throughout the manuscript, I was somewhat confused about what system the authors have in mind: a colony with division of labour or a multicellular organism? While their model can potentially capture both, their Introduction and Discussion seem to be geared towards colonies at the transition to multicellularity, whereas the Results section gives the impression that the authors have multicellular organisms in mind (e.g. very large body sizes).

      We are interested in the transition from colonial life, where tasks are distributed in time, to multicellular organisms, where tasks are divided between cells. As such, our model covers these scenarios as two limit cases. In the context of our study, we discuss examples from nature where this transition is observed, e.g. among Volvocales algae. For the purpose of the necessary colony/organism size, we do not need to go further than 2^6 = 64 cells. However, to infer the patterns of the composition effect Fcomp (Fig.3 C,D), we consider organisms undergoing four more rounds of cell division before reproduction, leading to a maturity size of 2^10 = 1024 cells. There, irreversible somatic differentiation can occur over a wide range of differentiation costs (see Fig.4 A). Also, smaller sizes put stronger restrictions on the composition effect Fcomp, so the distribution of parameters presented in Fig.3 C,D, if taken at n=6 instead of n=10, would have far fewer data points, and this could obscure the pattern found in this study. Overall, the scale of about 1000 cells, for which we report most of our modeling results, features entities of very diverse complexity: from undifferentiated colonies (the ocean alga Phaeocystis antarctica), to intermediary life forms (slime mold slugs), to paradigmatic multicellular organisms (higher Volvocales and C. elegans). We think that the chosen range of organism size is adequate for comparing entities with undifferentiated and differentiated cells. In the updated manuscript, we extend the exposition of organism size to reflect this aspect.

      2) From the point of view of someone who works on topics related to cancer and senescence, I think these fields are very much connected to the evolution of multicellularity. Maybe because I had multicellular organisms in mind rather than colonies with division of labour (above), I thought the manuscript missed this connection. Damage accumulation is key to Weismann and Kirkwood's theories of germ-soma divide and disposable soma, respectively, whereas dysregulated differentiation is one of the important aspects of tumour development (e.g. Aktipis et al. 2015). Making these links could also be relevant to discuss some of the model assumptions. For instance, the authors assume that fast growth comes with no cost in terms of cell damage, which may not always be the case (e.g. Ricklefs 2006) and reversibility of somatic differentiation can come at a cost of increased risk of somatic "cheaters" or cancerous cell lines.

      Thank you for this suggestion. Indeed, the aspect of cancer risk was not considered in the initially submitted manuscript. In the updated manuscript, we introduce a model where differentiation is linked to the risk of organism death instead of a delay in development. The results with this model exhibit a very similar pattern, see Fig.5. Hence, the term “cost of differentiation” can be interpreted more broadly than just the cell division delay suggested by our main model.

      3) The authors assume the differentiation strategy (D) does not change over the lifetime (which equates to ontogenesis in their model, i.e. they do not consider mature lifespan). I wonder if this is really the case, or whether organisms/cells can respond to the composition of cells they perceive. For instance, at least in some animal tissues, a small number of stem cells are kept to replenish differentiated tissue cells when needed. I understand that making D plastic can make the model really complicated, but maybe it is worth talking about what strategy would evolve if D was not stable through ontogenesis (and mature lifespan). My initial guess is that if differentiation probabilities can change through life and if one considers cellular damage accumulation, senescence and cancer (as above), the conditions that favour irreversible somatic differentiation would expand.

      Indeed, we assume the differentiation strategy to be constant in our model. We do not know whether this is true at the brink of multicellularity and, for sure, once evolution makes a species complex enough, this assumption will become inadequate. Yet, when we consider a dynamic differentiation strategy, we find a very efficient but unrealistic solution: at the very beginning of a life cycle a germ-role cell gives rise to two soma-role cells, then these soma-role cells produce only soma-role cells and finally, at the very last round of cell division, they give rise to as many germ cells as possible. This scenario is the most efficient in terms of the rate of organism development (100% useful soma-role cells during growth), the amount of offspring produced (every cell becomes a germ cell at the end of the day), and differentiation costs/risks (differentiation occurs only twice in a lifetime). Still, it is unrealistic. There must be some constraints on the flexibility of the dynamic differentiation strategy. We think that exploring the space of dynamic differentiation strategies and their constraints goes beyond the scope of the current study. Nevertheless, we are very interested in exploring this topic further in follow-up projects.
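
      The schedule described above can be written out as a minimal sketch. It assumes synchronous doubling and only tracks cell counts; the function names are hypothetical, and the code is an illustration of the described scenario, not the model used in the manuscript.

```python
def dynamic_strategy_composition(n_rounds):
    """Return [(germ, soma), ...] after each synchronous division round.

    Schedule: the founding germ-role cell yields two soma-role cells, soma-role
    cells then produce only soma-role cells, and in the last round every
    daughter cell takes the germ role.
    """
    if n_rounds < 2:
        raise ValueError("the schedule needs at least two division rounds")
    composition = [(1, 0)]  # a single germ-role cell founds the organism
    for round_index in range(1, n_rounds + 1):
        total = 2 ** round_index
        if round_index < n_rounds:
            composition.append((0, total))  # growth phase: soma-role cells only
        else:
            composition.append((total, 0))  # final round: germ-role cells only
    return composition


def switching_rounds(composition):
    """Count rounds in which the cell type present changes.

    This simple check is only meaningful for schedules like the one above,
    where every stage is either all germ-role or all soma-role.
    """
    switches = 0
    for (g_prev, _), (g_next, _) in zip(composition, composition[1:]):
        if (g_prev > 0) != (g_next > 0):
            switches += 1
    return switches


if __name__ == "__main__":
    stages = dynamic_strategy_composition(10)
    print("final composition (germ, soma):", stages[-1])
    print("rounds with differentiation:", switching_rounds(stages))
```

      Under these assumptions, all intermediate stages consist of soma-role cells only, the mature organism consists of germ-role cells only, and differentiation happens in exactly two rounds.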

      Reviewer #2:

      This work seeks to determine the conditions in which simple multicellular groups can evolve irreversibly somatic cells, that is: a replicating cell lineage that provides cooperative benefits as the group grows and cannot de-differentiate into reproductive germ cells.

      This question is addressed with a well-constructed model that is easy to understand and provides intuitive results. Groups are composed of germ and soma cells that replicate synchronously until the group has reached a maximal size. When each type of cell divides, they may have different probabilities of producing daughter cells of each type, and the analysis determines the optimal differentiation probabilities for each type of cell depending on a variety of factors. Critically, irreversible somatic differentiation arises when the optimal probability for soma cells is to produce only soma cells.

      The elegance of the model means that the predictions are easy to interpret. First, when there is a higher cost for soma cells to produce germ cells, then a dedicated lineage of somatic cells is more favourable. Second, when soma cells produce only soma cells and germ cells can produce both types, the proportion of soma cells in the group will increase with each division. Consequently, for irreversible somatic cells to be optimal, germ cells must produce a small number of soma cells and these few must provide large benefits. Third, larger group sizes are required for a small number of soma cells to arise and provide sufficient benefits to the group.
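
      The second prediction can be checked with a minimal sketch of the expected group composition under synchronous division. The sketch makes a simplifying assumption that is not taken from the manuscript: each daughter cell switches type independently with a fixed probability (`p_germ_to_soma`, `p_soma_to_germ`, both hypothetical parameter names), and differentiation costs and growth benefits are ignored.

```python
def expected_composition(n_divisions, p_germ_to_soma, p_soma_to_germ):
    """Expected (germ, soma) cell counts after each synchronous division.

    Assumption for illustration: every daughter of a germ cell becomes soma
    with probability p_germ_to_soma, and every daughter of a soma cell
    becomes germ with probability p_soma_to_germ.
    """
    germ, soma = 1.0, 0.0  # the group starts from a single germ cell
    history = [(germ, soma)]
    for _ in range(n_divisions):
        new_germ = 2.0 * (germ * (1.0 - p_germ_to_soma) + soma * p_soma_to_germ)
        new_soma = 2.0 * (germ * p_germ_to_soma + soma * (1.0 - p_soma_to_germ))
        germ, soma = new_germ, new_soma
        history.append((germ, soma))
    return history


def soma_fractions(history):
    """Fraction of soma cells at each stage."""
    return [soma / (germ + soma) for germ, soma in history]


if __name__ == "__main__":
    # Irreversible soma: soma cells never produce germ cells.
    history = expected_composition(10, p_germ_to_soma=0.1, p_soma_to_germ=0.0)
    for division, fraction in enumerate(soma_fractions(history)):
        print(f"division {division:2d}: expected soma fraction {fraction:.3f}")
```

      With `p_soma_to_germ = 0`, the expected soma fraction after t divisions is 1 - (1 - p_germ_to_soma)^t, so it rises with every division; this is why a low germ-to-soma probability is needed to keep somatic cells rare in this simplified setting.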

      Inevitably, there is a trade-off between the benefits of a simple model and the costs of idealised assumptions.

      Among other assumptions, the model assumes that germ cells and soma cells replicate synchronously and at the same rate, and that soma cells provide benefits throughout the growth of the group, but do not increase the fecundity of germ cells in the last generation. Consequently, it is not clear to what extent the predictions of the model apply to the notable empirical cases where these assumptions do not hold. For instance, in the often-cited Volvocine algae, soma cells do not provide any benefits until the last generation of the group life cycle. This may help to explain why many Volvocine species have a very large number of somatic cells, counter to the second prediction of the model.

      Overall, this analysis is targeted and provides clear predictions within the bounds of its assumptions. Thus, these results provide a compelling framework or stepping-stone against which future models of germ-soma differentiation in alternate scenarios can be compared and evaluated.

      Thank you for the kind words and the well-thought-out review. Indeed, our model makes a number of simplifying assumptions. In the revised manuscript, we consider a model in which the strongest of our simplifications, that of simultaneous cell divisions, is violated. This asynchronous cell division model shows that irreversible differentiation may evolve, at least under asymmetric differentiation costs. However, its evolution is observed less often than in the synchronous model.

      We absolutely agree that the design of our model does not replicate the details of Volvocine life cycles. However, our work is not intended to be a model of germ-soma differentiation in Volvocales. Instead, we developed a simplistic model implementing features from a diverse range of organisms. While in higher Volvocales young colonies develop within a maternal organism, there is a wide range of colonial organisms that grow from an independently living single cell, e.g. colonial diatoms, the haptophyte Phaeocystis antarctica, and the amoebozoan Phalansterium. We agree that protection by the maternal organism should play a major role in Volvocales, and we look forward to investigating a follow-up model that takes this factor into account.

      Reviewer #3:

      This paper provides a theoretical investigation of the evolution of somatic differentiation. While many studies have considered this broad topic, far fewer have specifically modelled the evolutionary dynamics of the reversibility of somatic differentiation. Within this subset, the conditions that select for irreversible somatic differentiation have appeared conspicuously restrictive. This paper suggests that an overly simplified fitness function (mapping the soma-germline composition of an organism to its growth rate) may be partly to blame. By allowing for a more complex fitness function (that captures the effect of upper and lower bounds for the contribution of somatic cells to organism fitness) the authors are able to identify three conditions for the evolution of irreversible somatic differentiation: costly cell differentiation (particularly for the redifferentiation of soma-cell lineages to germ line); a high/near maximal organismal growth advantage imbued by a small proportion of soma cells; a large maturity size for the organism (typically greater than 64 cells).

      The model presented is simple and elegant, and succeeds in its aim of providing biologically feasible conditions for the evolution of irreversible somatic differentiation. Although the observation arising from the first condition (that high costs to reversible somatic differentiation promote the evolution of irreversible somatic differentiation) is perhaps unsurprising, the remaining conditions on the fitness function and the organism maturity size are interesting and initially non-obvious. Particularly tantalising is the prospect of testing these conditions, either against available empirical data, or in an experimental setting.

      The model does however make a number of simplifying assumptions, the effects of which may limit the broad applicability of the results.

      The first is to assume that cell division is synchronous, so that the costs of cell differentiation can be straightforwardly averaged across the organism at each division. While the authors present a convincing biological justification for this assumption for algae such as Eudorina illinoiensis and Pleodorina californica, it is not immediately clear that this assumption should hold more widely.

      The second is to assume that the development strategy (i.e. the rates of differentiation between somatic and germ-line cell types) is constant throughout the organism's growth. For instance, there may be a growth advantage in the current model (aside from the advantages with respect to reduced mutation accumulation) of producing more germ cells early in the developmental programme, before transitioning to producing more soma cells in later development.

      Exploring such extensions to this model presents a seam of potential avenues for investigation in future theoretical studies.

      Thank you for the kind assessment of our findings. In the updated manuscript, we additionally investigated a model with asynchronous cell divisions. However, due to computational limitations, we are unable to fully replicate the investigation protocol of the original synchronous model. The execution time of the synchronous model scales linearly with the number of generations (n), and it still takes about a week to compute a single map like Fig.2A on a 2000-node cluster. The asynchronous model, in turn, scales linearly with the number of cell divisions and hence exponentially with the number of generations (as 2^n), which makes the calculations take much longer. For instance, the map in Fig.2A requires about 160 times more computer time with the asynchronous model. Nevertheless, we were able to implement this model for smaller organisms, with less extensive statistics. There, we found that the asynchronous model allows the evolution of irreversible somatic differentiation. However, it is suppressed compared with the synchronous model: the fraction of Fcomp profiles promoting irreversible differentiation is much smaller and the organism size restriction is higher.
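
      The scaling argument in this response can be illustrated with a minimal sketch. It assumes an idealised organism that grows from one cell to 2^n cells by binary division and counts only update events; the function names are hypothetical, and this is not the authors' simulation code. It does not attempt to reproduce the roughly 160-fold figure quoted above, which depends on details of the actual implementation.

```python
def synchronous_updates(n_generations: int) -> int:
    """Assumed cost model: one update per round of simultaneous divisions."""
    return n_generations


def asynchronous_updates(n_generations: int) -> int:
    """Assumed cost model: one update per individual cell division.

    Growing from 1 cell to 2^n cells takes 1 + 2 + ... + 2^(n-1) = 2^n - 1
    divisions, because each division adds exactly one cell.
    """
    return 2 ** n_generations - 1


if __name__ == "__main__":
    # Compare the two counts for a few illustrative organism sizes.
    for n in (4, 6, 8, 10):
        print(n, synchronous_updates(n), asynchronous_updates(n))
```

      Under these assumptions the ratio of asynchronous to synchronous updates grows as (2^n - 1)/n, which is why only smaller organisms remain tractable in the asynchronous setting.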

      Studying a dynamic differentiation strategy would be wonderful. Early on, we considered studying this scenario. The crucial factor here is how flexible the strategy can be. In a naïve situation with complete flexibility between every cell generation, the most successful strategy would be for all cells of an organism first to turn completely to the soma role to gain the maximal benefits, and then, at the last step, to convert back to germ to produce the maximal number of offspring. This is not observed in natural species; hence the flexibility of a dynamic differentiation program must be constrained. We are curious to study what kinds of constraints can lead to irreversible soma, but this task is beyond the scope of the current study. Our work with a constant differentiation program is the beginning of a future line of research. We already look forward to exploring the space of dynamic differentiation programs in later projects.

    1. Author Response:

      Reviewer #1:

      Miyamoto and colleagues study the role of various oncogenes including MYC, HOXA9 and SOX4 in transformation of haematopoietic cells in vitro and in vivo. The authors analyze gene expression profiles and characterize leukemogenesis and cell survival resulting from manipulation of MLL-AF10 expression in myeloid leukemias. The experiments largely utilise ectopic over-expression of transgenes; hence results comparing relative "potency" of individual genes must be interpreted with caution due to supraphysiological levels of expression.

      Specific comments:

      1) In Figure 1A, the authors attempt to identify direct target genes of the MLL fusion protein MLL-ENL by performing ChIPseq using an anti-MLL antibody. Whether or not the signal can be attributed to MLL-ENL or wild-type MLL is unclear. Furthermore, genome-wide MLL-occupancy patterns are not shown. The work would be stronger if the authors could reconcile current data with other publicly available datasets for MLL or MLL-fusion protein occupancy in comparable contexts.

      We appreciate the input from the Editor and reviewers. Here we provide a point-by-point response to the comments.

      We performed ChIP-seq analysis of HB1119 cells in which wildtype MLL, but not MLL-ENL, was specifically knocked down by shRNA (Figure 1A, Figure 1-figure supplement-1B,C), as shown in our previous publication (Okuda et al., 2017). Depletion of wildtype MLL did not affect the ChIP signals. Thus, we concluded that most of the MLL ChIP signals can be attributed to MLL-ENL. These data were presented in our previous report (Okuda et al., 2017) and partially adopted in the revised manuscript. MLL and MLL fusion proteins localize near transcription start sites (TSSs) (Figure 1-figure supplement-1C) because MLL has a CXXC domain that recognizes unmethylated CpGs (Okuda et al., 2014). Such TSS-centric localization of MLL is observed in many other non-MLL-rearranged cell lines such as HEK293T (embryonic kidney) and REH (leukemia) cells (Miyamoto et al., 2020), in addition to HB1119 cells (MLL-rearranged leukemia cells) (Okuda et al., 2017). We mentioned this in the revised manuscript.

      2) It would appear (based on capitalisation), that the authors are over-expressing human transgenes in mouse cells. This is not necessarily a concern, but should be considered when interpreting the data. Likewise, whether the primers used for qPCR are detecting expression of the transgenes, the endogenous genes or both is important (for some of the figures such as Fig. 1C there seems to be a mix e.g. Myc vs HoxA9/HOXA9).

      We used human transgenes in the presented experiments. The qPCR probes for mouse Hoxa9 and Meis1 detected human HOXA9 and MEIS1, respectively. Hence, we described HOXA9/Hoxa9 and MEIS1/Meis1 to clearly indicate that these probes detect both human and mouse genes. The qPCR probe for mouse endogenous Myc did not detect the human MYC transgene. The samples producing qPCR signals for both endogenous murine genes and exogenous human transgenes are highlighted by # and faded color (Figure 1C).

      3) Most of the in vivo transplantation experiments have not been performed using fluorescent reporters or congenic recipients that would enable identification of donor-derived cells. Differences between the groups could be attributed to differential engraftment, or potentially even immune rejection (assuming ectopic expression of human transgenes in an immune-competent context). Disease features in recipient mice (beyond survival) are also not shown and expression of transgenes at end-point not confirmed.

      As for the possibility of immune rejection of the cells expressing human transgenes:

      As shown in Figure 3D, the mouse Myc gene was tested in addition to human MYC and did not induce leukemia in vivo, supporting the view that enhanced MYC function alone is insufficient to induce leukemia under these experimental conditions. Kroon et al. showed that the mouse Hoxa9 gene is also a weak oncogene in vivo, whereas it induced leukemia in combination with Meis1 (Kroon et al., 1998). The human HOXA9 transgene phenocopied mouse Hoxa9 in our assays. These results do not support the possibility of immune rejection of the human transgene-expressing cells. We mentioned this in the revised manuscript.

      As for the possibility of differential engraftment:

      We did not mean to exclude differential engraftment as a reason why a given oncogene failed to induce leukemia. It is likely that HOXA9 promotes engraftment of MYC-transduced cells by conferring a survival advantage through BCL2/SOX4-mediated anti-apoptotic properties. It is possible that HOXA9 mediates additional functions that promote engraftment beyond providing anti-apoptotic properties. However, we chose to focus on the HOXA9-mediated anti-apoptotic functions in this paper.

      As for the disease features:

      We have added the expression and immune phenotype data in Figure 4-figure supplement-3B and Figure 6-figure supplement-1B.

      In contrast to MLL-AF10 and HOXA9 containing gene sets (HOXA9-MEIS1, HOXA9-MYC), MYC-BCL2 induced lymphoid leukemia in vivo, consistent with the previous report (Luo et al., 2005). We speculate that HOXA9 and SOX4 are more functional in the myeloid lineage, while BCL2 functions more efficiently in the lymphoid lineage than in the myeloid lineage. Consequently, the MYC-BCL2 combination tended to induce lymphoid leukemia.

      As for the expression of the transgene at end-point:

      Regarding the expression of the transgenes in Figure 3D and 4E, we have provided the RT-qPCR data for the transgenes in Figure 4-figure supplement-3B. Regarding the expression of the transgenes in Figure 6B, the protein expression of the transgenes is shown in Figure 6-figure supplement-1A. Regarding the expression of the transgenes in Figure 7A, B, we have provided the RT-qPCR data for the transgenes in Figure 7-figure supplement 2A.

      4) The authors propose that the data in Figure 5B confirms direct regulation of Bcl2, Sox4 and Igf1 by HOXA9. However, the regulation could also be indirect e.g. HOXA9 could regulate a transcription factor that regulates those genes, or HOXA9 depletion could induce differentiation that may result in downregulation of those genes.

      The regulatory mechanisms by which HOXA9 controls the expression of its target genes are of great interest. Indeed, the expression of BCL2 and/or SOX4 could be regulated indirectly by HOXA9. We changed the wording by removing the word “direct” in the revised manuscript.

      Reviewer #2:

      The manuscript of Miyamoto et al. describes the synergistic function between HOXA9 and MYC downstream of MLL fusions in myeloid leukemogenesis. They show that MLL-AF10 expression up-regulates both HOXA9 and MYC expression. Gene expression profiles of immortalized cells (IC) indicate that distinct genetic pathways are driven by HOXA9 and MYC. Cooperativity in in vivo leukemogenesis between HOXA9 and MYC is shown. Apoptotic cell death is increased in MYC-IC and it is cancelled by overexpression of BCL2 or SOX4 that are up-regulated in HOXA9-IC but not in MYC-IC, suggesting that these genes are downstream of HOXA9 and responsible for cooperativity between MYC and HOXA9. Moreover, deletion of BCL2 or SOX4 inhibited MLL-AF10- or HOXA9/MEIS1-induced leukemogenesis. This study is well designed and experimental results are clearly presented. These results provide useful information for our understanding of the mechanisms of HOX-associated leukemogenesis.

      We appreciate the comments from the reviewer and hope our study is useful for the understanding of leukemogenesis.

    1. Author Response:

      Reviewer #1:

      The use of DREADDs to modulate astrocyte signaling and evaluate the contribution of these glial cells to the control of the GnRH system is relevant, timely and innovative. The authors provide a combination of compelling neuroanatomical data, electrophysiological recordings and LH measures that support their key findings in males. The calcium imaging experiments are rigorously performed but the data need to be validated on a larger number of animals. The authors also explore possible sex differences in the process but several caveats need to be overcome before reaching a conclusion on this aspect. Several additional points should be addressed in order to improve the manuscript, as elaborated below.

      1) It would be relevant to provide an estimation of the fraction of GnRH or KNDy neuron populations surrounded by infected astrocytes. This data would be interesting to discuss in relation with previous work showing that activation of only a fraction of GnRH neurons can induce LH release.

      This information has been added to the results and discussion sections (lines 64-69, 245-247, and 255-256).

      2) In the characterization of cell targeting, the authors should specify whether GFAP+ alpha tanycytes lining the dorsal part of the arcuate nucleus were also infected by viral constructs injected into the arcuate nucleus.

      This point has been added to the results, in the section on the characterization of cell targeting (lines 76-77).

      3) Calcium imaging analyses were performed on 1 to 2 animals per group, which is below the minimum number of animals required for statistical analyses. In all experiments, a minimum of 3 animals per group is required.

      We agree philosophically that the statistical analysis of small n is not ideal. Our goal with the calcium imaging was to test whether treatment with CNO altered intracellular calcium as a technical control, not to make any statement about the dynamics of this response. We were frankly surprised that significant P values emerged given the low numbers. We included these for transparency but could delete them if that is the wish of the editorial group.

      4) To evaluate whether PGE2 mediates the effect of astrocyte activation on GnRH neuron firing, the authors pretreated slices with a mix of EP1 and EP2 antagonists. The rationale for choosing this combination should be explained considering that EP1 was previously shown not to be involved in the stimulatory effect of PGE2 on GnRH neuronal activity (Clasadonte et al., 2011, PMID: 21896757).

      As the reviewer points out, the main effect of PGE2 on GnRH neurons is likely through EP2. We chose a mix of EP1 and EP2 because EP1 is also expressed in GnRH neurons and we wanted to block all potential PGE2 receptors present on GnRH neurons.

      5) It is not clear whether the characterization data shown in figure 1 are also applicable for the experiments performed on females. If it is not the case, the data obtained in females should be added.

      The characterization done in Figure 1 was in males. A similar colocalization pattern emerged in Figure 6 with the co-injection of AAV5-GCaMP6 and AAV5-Gq, with the exception of higher infection of cells with neuronal morphology likely attributable to the longer post injection period.

      6) As fairly pointed out by the authors, there are major caveats in the experiments performed in females. They indicate that recordings were not made at the same moment of the day between males and females but also that the time post-surgery significantly differed between the 2 sexes (less than 2 months in males vs 5 months in females). Therefore, any conclusion about a possible sex difference can unfortunately not be drawn from these data. These experiments need to be reproduced in a rigorously controlled manner in order to reach a definitive conclusion.

      We agree the female data are limited by caveats, which we attempted to be transparent about. As mentioned in our response above, we still feel providing these data while acknowledging their limitations may be of use to others working in this area. It is likely to be quite some time before we have a replacement set of female data and if the present data can inform the experimental design of others, we’d like to share it.

      7) No electrophysiological recordings are shown. Representative recordings of GnRH and KNDy neuronal activity should be added to the figures.

      We have added representative recordings of GnRH and KNDy neurons to figure 4.

    1. Author Response:

      Reviewer #1:

      Zappia et al investigate the function of E2F transcriptional activity in the development of Drosophila, with the aim of understanding which targets the E2F/Dp transcription factors control to facilitate development. They follow up two of their previous papers (PMID 29233476, 26823289) that showed that the critical functions of Dp for viability during development reside in the muscle and the fat body. They use Dp mutants, and tissue-targeted RNAi against Dp to deplete both activating and repressive E2F functions, focussing primarily on functions in larval muscle and fat body. They characterize changes in gene expression by proteomic profiling, bypassing the typical RNAseq experiments, and characterize Dp loss phenotypes in muscle, fat body, and the whole body. Their analysis revealed a consistent, striking effect on carbohydrate metabolism gene products. Using metabolite profiling, they found that these effects extended to carbohydrate metabolism itself. Considering that most of the literature on E2F/Dp targets is focused on the cell cycle, this paper conveys a new discovery of considerable interest. The analysis is very good, and the data provided supports the authors' conclusions quite definitively. One interesting phenotype they show is low levels of glycolytic intermediates and circulating trehalose, which is traced to loss of Dp in the fat body. Strikingly, this phenotype and the resulting lethality during the pupal stage (metamorphosis) could be rescued by increasing dietary sugar. Overall the paper is quite interesting. Its main limitation in my opinion is a lack of mechanistic insight at the gene regulation level. This is due to the authors' choice to profile protein, rather than mRNA effects, and their omission of any DNA binding (chromatin profiling) experiments that could define direct E2F1/ or E2F2/Dp targets.

      We appreciate the reviewer’s comment. Based on previously published chromatin profiling data for E2F/Dp and Rbf in thoracic muscles (Zappia et al 2019, Cell Reports 26, 702–719) we discovered that both Dp and Rbf are enriched upstream of the transcription start site of both cell cycle genes and metabolic genes (Figure 5 in Zappia et al 2019, Cell Reports 26, 702–719). Thus, our data are consistent with the idea that E2F/Rbf binds to the canonical target genes in addition to a new set of target genes encoding proteins involved in carbohydrate metabolism. We think that E2F takes on a new role rather than being re-targeted away from cell cycle genes. We agree that the mechanistic insight would be relevant to explore further.

      Reviewer #2:

      The study sets out to answer which tissue-specific mechanisms in fat and muscle regulated by the transcription factor E2F are central to organismal function. The study also tries to address which of these roles of E2F are cell intrinsic and which of these mechanisms are systemic. The authors look into the mechanisms of E2F/Dp through knockdown experiments in both the fat body* (see weakness) and muscle of Drosophila. They identify that muscle E2F contributes to fat body development but fat body KD of E2F does not affect muscle function. To then dissect the cause of adult lethality in flies, the authors perform proteomic and metabolomic profiling of fat and muscle to gain insights. While in the muscle the cause seems to be an as yet undetermined systemic change, the authors do conclude that adult lethality in fat body specific Dp knockdown is the result of decreased trehalose in the hemolymph and defects in lipid production in these flies. The authors then test this model by presenting fat body specific Dp knockdown flies with a high sugar diet and showing that adult survival is rescued. This study concurs with and adds to the emerging idea from human studies that E2F/Dp is critical for more than just its role in the cell cycle and functions as a metabolic regulator in a tissue-specific manner. This study will be of interest to scientists studying inter-organ communication between muscle and fat.

      The conclusions of this paper are partially supported by data. The weaknesses can be mitigated by specific experiments and will likely bolster conclusions.

      1) This study relies heavily on the tissue specificity of the Gal4 drivers to study fat-muscle communication by E2F. The authors have convincingly confirmed that the cg-Gal4 driver is never turned on in the muscle and vice versa for Dmef2-Gal4. However, the cg-Gal4 driver itself is capable of turning on expression in the fat body cells and is also highly expressed in hemocytes (macrophage-like cells in flies). In fact, cg-Gal4 is used in numerous studies, e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4125153/, to study the hemocytes and fat in combination. Hence, it is difficult to assess what contribution hemocytes provide to the conclusions for fat-muscle communication. To mitigate this, the authors could test Lpp-Gal4>Dp-RNAi (Lpp-Gal4 drives expression exclusively in fat body in all stages) or use ppl-Gal4 (which is expressed in the fat, gut, and brain but is a weaker driver than cg). It would be good if they could replicate their findings in a subset of the experiments performed in Figures 1-4.

      This is indeed an important point. We apologize for previously not including this information. Reference is now on page 7.

      Another fat body driver, which is specifically expressed in fat body and not in hemocytes as cg-GAL4 is, was tested in previous work (Guarner et al Dev Cell 2017). The driver FB-GAL4 (FBti0013267), and more specifically the stock yw; P{w[+mW.hs]=GawB}FB P{w[+m*] UAS-GFP 1010T2}#2; P{w[+mC]=tubP-GAL80[ts]}2, was used to induce the loss of Dp in fat body in a time-controlled manner using tubGAL80ts. The phenotype induced in larval fat body of FB>DpRNAi,gal80TS recapitulates findings related to the DNA damage response characterized in both Dp -/- and cg>Dp-RNAi (see Figure 5A-B, Guarner et al Dev Cell 2017). The activation of the DNA damage response upon the loss of Dp was thoroughly studied in Guarner et al Dev Cell 2017. The appearance of binucleates in cg>DpRNAi is presumably the result of the abnormal transcription of multiple G2/M regulators in cells that have been able to repair DNA damage and to resume S-phase (see discussion in Guarner et al Dev Cell 2017). More details regarding the fully characterized DNA damage response phenotype were added on pages 6 and 7 of the manuscript.

      Additionally, r4-GAL4 was also used to drive Dp-RNAi specifically to fat body. But since this driver is weaker than cg-GAL4, the occurrence of binucleated cells in r4>DpRNAi fat body was mild (see Figure R1 below).

      As suggested by the reviewer, Lpp-GAL4 was used to knock down the expression of Dp specifically in fat body. All Lpp>DpRNAi animals died at the pupal stage. New viability data were included in Figure 1-figure supplement 1. Also, larval fat bodies were dissected and stained with phalloidin and DAPI to visualize overall tissue structure. Binucleated cells were present in Lpp>DpRNAi fat body but not in the control Lpp>mCherry-RNAi (Figure 2-figure supplement 1B). These results were added to the manuscript on page 7.

      Furthermore, Dp expression was knocked down using a hemocyte-specific driver, hml-GAL4. No defects were detected in animal viability (data not shown).

      Thus, overall, we conclude that hemocytes do not seem to contribute to the formation of binucleated-cells in cg>Dp-RNAi fat body.

      Finally, since no major phenotype was found in muscles when E2F was inactivated in fat body (please see point 3 for more details), we consider that the inactivation of E2F in both fat body and hemocytes did not alter the overall muscle morphology. Thus, exploring the contribution of cg>Dp-RNAi hemocytes in muscles would not be very informative.

      2) The authors perform a proteomics analysis on both fat body and muscle of control or the respective tissue specific knockdown of Dp. However, the authors denote technical limitations to procuring enough third instar larval muscle to perform proteomics and instead use thoracic muscles of the pharate pupa. While the technical limitations are understandable, this does raise a concern of comparing fat body and muscle proteomics at two distinct stages of fly development and likely contributes to differences seen in the proteomics data. This may impact the conclusions of this paper. It would be important to note this caveat of not being able to compare across these different developmental stage datasets.

      We appreciate the suggestion of the reviewer. This caveat was noted and included in the manuscript. Please see page 11.

      3) The authors show that the E2F signaling in the muscle controls whether binucleate fat body nuclei appear. In other words, the endocycling process in fat body is affected if muscle E2F function is impaired. However, they conclude that impairing E2F function in fat does not affect muscle. While muscle organization seems fine, it does appear that nuclear levels of Dp are higher in muscles during fat specific knock-down of Dp (Figure 1A, column 2 row 3, for cg>Dp-RNAi). Also there is an increase in muscle area when fat body E2F function is impaired. This change is also reflected in the quantification of DLM area in Figure 1B. But the authors don't say much about elevated Dp levels in muscle or increased DLM area of Fat specific Dp KD. Would the authors not expect Dp staining in muscle to be normal and similar to mCherry-RNAi control in Cg>dpRNAi? The authors could consider discussing and contextualizing this as opposed to making a broad statement regarding muscle function all being normal. Perhaps muscle function may be different, perhaps better when E2F function in fat is impaired.

      The overall muscle structure was examined in animals staged at third instar larva (Figure 1A-B). No defects were detected in muscle size between cg>Dp-RNAi animals and controls. In addition, the expression of Dp was not altered in cg>Dp-RNAi muscles compared to control muscles. The best developmental stage to compare the muscle structure between Mef2>Dp-RNAi and cg>Dp-RNAi animals is actually third instar larva, prior to their lethality at the pupal stage (Figure 1-figure supplement 1).

      Based on the reviewer’s comment, we set up a new experiment to further analyze the phenotype at the pharate stage. However, when we repeated this experiment, we did not recover cg>Dp-RNAi pharates, even though 2/3 of Mef2>Dp-RNAi animals survived up to the late pupal stage. We think that this is likely due to the change in fly food provider. Since most cg>DpRNAi animals die at the early pupal stage (>75% of animals, Figure 1-figure supplement 1), the pharate stage is not a representative developmental stage at which to examine phenotypes. Therefore, the panels were removed.

      Text was revised accordingly (page 6).

      4) In lines 376-380, the authors make the argument that muscle-specific knockdown can impair the ability of the fat body to regulate storage, but evidence for this is not robust. While the authors refer to a decrease in lipid droplet size in figure S4E this is not a statistically significant decrease. In order to make this case, the authors would want to consider performing a triglyceride (TAG) assay, which is routinely performed in flies.

      Our conclusions were revised and adjusted to match our data. The paragraph was reworded to highlight the outcome of the triglyceride assay, which was previously done. We realized that the reference to Figure 6H, which shows the triglyceride (TAG) assay, was missing on page 17. Please see page 17 and page 21 of the discussion.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors have studied mutations in the K13 gene that is linked to Artemisinin resistance in a range of African parasites. They show that these mutations can confer resistance in an in vitro survival assay but that they are often linked to reduced fitness. The authors also show that different parasites have less of an impact on fitness when the K13 mutations are introduced in line with the suggestion that the overall genetic background is critical for transmission of K13 mutations. The paper also shows evidence that genes potentially contributing to the genetic background are not involved.

      The study involves a significant amount of work to generate a wide range of different parasite lines that allow a detailed assessment of how different mutations interact with the genetic background of the parasite. This provides a significant amount of new insights. A key conclusion the authors draw from this work relates to the relationship between fitness and resistance and by inference on why artemisinin resistance has occurred in SE Asia. While this indeed would be a striking conclusion I think the data at this stage is not strong enough to make this claim. The claim is mainly based on Figure 3 E and F as well as 5 C and D. While indeed, initially it looks like RSA has much less of a survival impact in Dd2 there is some concern that the data is generated using different baselines (isogenic WT parasite in Figure 3 and Dd2eGFP in Figure 5 D). This is noteworthy as in Figure 5C the Dd2wt parasite is used and the fitness cost appears to be different.

      Please see our reply below to Reviewer 1 Comment #2.

      A striking finding is that the UG659C560Y line appears to have a relatively small fitness cost - especially if looked at for the whole 40 generations rather than the somewhat arbitrarily picked 38 days. This data could suggest that there are parasites in Africa that have the capacity to acquire resistance with minimal cost to fitness.

      We thank the Reviewer for this suggestion and have now recalculated our fitness data using a 36-day period, which we have adopted as a standardized timeline and which allows us to compare across all prior and newly acquired fitness assays. We note that this is already relatively lengthy compared to a number of other reports in the literature. For example, Baragana et al. (2015, Nature) measured competitive growth rates over a 14-day period. Gabryszewski et al. (2016, Mol Biol Evol) used 20-day assays. Siddiqui et al. (2020, mBio) used longer 48-day assays. We agree with the Reviewer that our data suggest that some African strains can achieve in vitro ART resistance with a minimal cost to fitness. In support of this, our new data presented in the revised Figure 3 provide evidence for the R561H mutation having little to no fitness cost in 3D7 parasites that are closely related to Rwandan isolates (see our response above to Comment #2 from the Editors).

      As pointed out above, we now include new fitness data on the R561H variant in African parasites, based on competition assays with an eGFP reporter line. To standardize our fitness data, we now have analyzed our data to day 36 across assays, as follows:

      Methods lines 538-539: “Cultures were maintained in 12-well plates and monitored every four days over a period of 36 days (18 generations) by harvesting at each time point a fraction of each co-culture for saponin lysis.”

      Figure 3 Legend lines 920-921: “K13 mutant clones were co-cultured at 1:1 starting ratios with isogenic K13 wild-type controls over a period of 36 days.”
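
      As a quick arithmetic check on the standardized timeline quoted above, the following minimal sketch derives the implied generation length and the sampling days. It assumes that sampling starts on day 0 and includes day 36, which the quoted text does not state explicitly; the variable names are hypothetical.

```python
ASSAY_DAYS = 36               # assay length quoted in the Methods
GENERATIONS = 18              # number of generations quoted in the Methods
SAMPLING_INTERVAL_DAYS = 4    # "monitored every four days"

# Implied length of one generation, in days.
days_per_generation = ASSAY_DAYS / GENERATIONS

# Assumed schedule: first sample on day 0, last sample on day 36.
sampling_days = list(range(0, ASSAY_DAYS + 1, SAMPLING_INTERVAL_DAYS))

if __name__ == "__main__":
    print(days_per_generation)
    print(sampling_days)
```

      Under these assumptions each sampling interval spans two generations, so the same day-36 endpoint can be read off every assay regardless of how long it originally ran.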

      The selective sweep to C580Y in SE Asia is something that has been known for a while. It is striking that it has been selected, as based on the data presented here P553L has a similar fitness and RSA profile. The authors could explore this further.

      The Reviewer highlights the important point that RSA values and fitness were comparable for C580Y and P553L, yet only the former swept across Southeast Asia. This would argue for additional factors that contribute to the successful dissemination of C580Y. These could include favorable genetic backgrounds that help propagate C580Y mutant parasites, or increased transmission rates, relative to P553L. To date, reasons for C580Y’s success beyond its moderate resistance and relatively minor fitness cost have not been firmly established. One possibility might be related to piperaquine pressure that selected for amplification in plasmepsins II and III as well as novel mutations in PfCRT, which emerged in parasites harboring K13 C580Y and which have been shown to spread as a series of genetically closely related sublineages (referred to as KEL1/PLA1; Hamilton et al. 2019, Lancet Infect Dis; Imwong et al. 2020, Lancet Infect Dis). These points are discussed as follows:

      Discussion lines 361-369: “Our studies into the impact of K13 mutations on in vitro growth in Asian Dd2 parasites provide evidence that the C580Y mutation generally exerts less of a fitness cost relative to other K13 variants, as measured in K13-edited parasites co-cultured with an eGFP reporter line. A notable exception was P553L, which compared with C580Y was similarly fitness neutral and showed similar RSA values. P553L has nonetheless proven far less successful in its regional dissemination compared with C580Y (Menard et al., 2016). These data suggest that additional factors have contributed to the success of C580Y in sweeping across SE Asia. These might include specific genetic backgrounds that have favored the dissemination of C580Y parasites, possibly resulting in enhanced transmission potential (Witmer et al., 2020), or ACT use that favored the selection of partner drug resistance in these parasite backgrounds (van der Pluijm et al., 2019).”

      Overall, the main conclusion that there are K13 mutations that can confer resistance to ART in the context of African parasites is clearly presented and convincing, and this highlights the risk that exists for public health officials in African nations. What would be interesting from a reader's perspective is how likely it is that this loss-of-fitness hurdle is going to be overcome in Africa and whether the risk of resistance development will increase as transmission rates drop.

      We appreciate this suggestion from the Reviewer. Our revised manuscript now addresses this topic as follows:

      Discussion lines 393-399: “It is nonetheless possible that secondary determinants will allow some African strains to offset fitness costs associated with mutant K13, or otherwise augment K13-mediated ART resistance. Identifying such determinants could be possible using genome-wide association studies or genetic crosses between ART-resistant and sensitive African parasites in the human liver-chimeric mouse model of P. falciparum infection (Vaughan et al., 2015; Amambua-Ngwa et al., 2019). Reduced transmission rates in areas of Africa where malaria is declining, leading to lower levels of immunity, may also benefit the emergence and dissemination of mutant K13 (Conrad and Rosenthal, 2019).”

      Reviewer #2 (Public Review):

      In this paper, the investigators performed two large-scale surveys of the propeller domain mutations in the K13 gene, a marker of artemisinin (ART) resistance, in African (3299 samples) and Cambodian (3327 samples) Plasmodium falciparum populations. In the African parasite population, they identified the K13 R561H variant in Rwanda, while parasites from other areas had the wild-type K13. In Cambodia, however, they documented a hard genetic sweep of C580Y mutation that occurred rapidly. They generated the C580Y and M579I mutations in four different parasite strains with different genetic backgrounds and found that these mutations conferred varying degrees of in vitro ART resistance. They further edited the SE Asian parasite strains Dd2 and Cam3.II with 7 K13 mutations and found that all the propeller domain mutations conferred ART resistance in the Dd2 parasite, whereas three of the mutations did so in the Cam3.II background. The R561H and C580Y mutations were also evaluated in several parasites collected from Thailand. In vitro growth competition analysis showed that K13 mutations caused substantial fitness costs in the African parasite background, but much less fitness costs in the SE Asian parasites. This study demonstrated the potential emergence of ART resistance in African parasite populations and offered insights into the importance of the parasite's genetic background in the emergence of ART resistance.

      We thank the Reviewer for this thorough summary and favorable assessment of our work.

      Reviewer #3 (Public Review):

      Stokes et al address the question: Why have mutations in the K13 gene spread rapidly across South East Asia and led to widespread treatment failure with artemisinin-based antimalarials? In contrast, why do K13 mutations remain quite rare in Africa, and artemisinin-based antimalarials remain effective?

      The work combines a number of different studies on different parasites of different origins. Gene editing has been used to assess the effects of K13 mutations in different parasite backgrounds, leading to a very complex view of the competing factors of level of resistance conferred and fitness cost.

      The authors put forward the hypothesis that fitness costs associated with K13 mutations select against their dissemination in the high malaria transmission settings in Africa. However, the complexity of the genetic backgrounds of the parasites makes it difficult to tease out the contributing factors.

      We agree that these are complex and multifactorial areas of investigation and appreciate the Reviewer’s summary.

    1. Author Response:

      Reviewer #1 (Public Review):

      This work described a novel approach, host-associated microbe PCR (hamPCR), to both quantify microbial load compared to the host and describe interkingdom microbial community composition with the same amplicon library preparation. The authors used the host single (low-copy) genes as PCR targets to set the host reference for microbial amplicons. To handle the problem that in many cases, the host DNA is excessive compared to the microbiome DNA, the authors adjusted the host-to-microbe amplicon ratio before sequencing. To prove the concept, hamPCR was tested with synthetic communities, compared to shotgun metagenomics results, and applied in biological systems involving interkingdom microbial communities (oomycetes and bacteria), diverse hosts, and crop hosts with large genomes. Substantial data from diverse biological systems confirmed that the hamPCR approach is accurate, versatile, easy to set up, and low in cost, improving sample capacity and revealing phenomena that are invisible to regular microbial amplicon sequencing approaches.

      Since the amplification of host genes would be the key step for this hamPCR approach, the authors might also include more strategy discussions about the selection of single (low copy) genes for a specific host and the primer design for the host genes to guarantee the hamPCR usage in the biological systems other than those mentioned in the manuscript.

      A deeper discussion about the design of suitable host primers has been added to the Supplementary Information as Supplementary Discussion 3, and is now mentioned in the main text in the first section of the Methods.

      Reviewer #2 (Public Review):

      Lundberg and colleagues provide a detailed set of data showing the utility of host-associated microbe PCR. By simultaneously amplifying microbial community and host DNA, hamPCR provides an opportunity to measure the microbial load of a sample. I was largely convinced about the robustness of this approach after seeing the many different optimization datasets that were presented in the paper. I also appreciated the various applications of hamPCR that were demonstrated and compared to other standard approaches (CFU counting and shotgun metagenomics, for example). As clearly illustrated in Figure 6f, hamPCR could dramatically improve our understanding of interactions within microbiomes as it helps remove issues of relative abundance data.

      One challenge about the approach presented is that it cannot be quickly adapted to a new system. Unlike most primers for 'standard' microbial amplicon sequencing, considerable time will be required to determine which host gene to target, how to make that host gene size larger than the size of the microbial amplicon, etc. This may limit wide adoption of hamPCR in the field. I do appreciate the authors providing some details in the Supplement on how they developed hamPCR for the several different systems described in this paper. The helpful tips may make it easier for others to develop hamPCR for their own systems.

      Additional primer design strategy is addressed in our response to the Public Review of Reviewer #1.

      An issue that repeatedly came up is that at high and low ends of host:microbe ratios, inaccurate estimates can occur. For example, with high levels of microbial infection, the authors note that hamPCR has reduced accuracy. The authors propose three solutions to this problem (1. altering host:microbe amplicon ratio, 2. use a host gene with higher copy number, 3. and adjust concentrations of host primers), but only present data for #1 and 3. Do they have any data to show that #2 would actually work?

      One instance of potential unreliable load that sticks out in the paper is in Figure 5b. The authors note that this is likely due to unreliable load calculation. Is this just one of 4 replicates? What are other potential reasons this would be an outlier and how can the authors rule this out? Did they repeat the hamPCR for this outlier to confirm the striking difference from the other three samples in the eds1-1 Hpa + Pto sample?

      Both qPCR and amplicon sequencing can be used to detect copy number variation in genomes [1]. Because amplicon-based methods are known to be sensitive to small differences in gene copy number, we are confident, without generating additional data on the topic, that #2 would work.

      Furthermore, bacterial genomes from different taxa are known to vary in their copy number of 16S rDNA, usually from 1 to about 15 copies [2]. These variations are reflected in sequence counts from amplicon sequencing, biasing the counts towards taxa with more 16S rDNA gene copies [2, 3, 4]. This well-documented phenomenon distorts the description of microbial communities, and has therefore led to some efforts to correct 16S rDNA gene amplicon data by dividing the counts from each taxon by the (estimated) 16S rDNA copy number of that taxon, so that the counts better reflect the numbers of bacterial cells.
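
      The copy-number correction described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (counts divided by estimated copy number); the taxon names, read counts, and copy numbers are hypothetical and are not data from the study.

```python
def correct_for_copy_number(counts, copy_numbers):
    """Divide each taxon's read count by its (estimated) 16S rDNA copy number."""
    return {taxon: reads / copy_numbers[taxon] for taxon, reads in counts.items()}


def relative_abundance(values):
    """Convert per-taxon values to fractions of the total."""
    total = sum(values.values())
    return {taxon: value / total for taxon, value in values.items()}


# Hypothetical example: taxon_A carries 7 copies of the 16S gene, taxon_B only 1.
raw_counts = {"taxon_A": 700, "taxon_B": 300}
copy_numbers = {"taxon_A": 7, "taxon_B": 1}

corrected = correct_for_copy_number(raw_counts, copy_numbers)
raw_fractions = relative_abundance(raw_counts)
corrected_fractions = relative_abundance(corrected)
```

      In this toy case the raw counts suggest that taxon_A dominates, whereas the corrected values suggest the opposite, which is the same sensitivity to template copy number that a higher-copy host gene would exploit.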

      Because amplicon methods are sensitive to copy number variation (whether those copies are from inside the same cell, or coming from different cells), we reasoned that choosing a host gene with a higher copy number, similar to the effects of copy number variation on 16S rDNA gene counts, will increase the representation of that host amplicon in the final library (because there will be more template host DNA molecules available to amplify). We did not test this explicitly - we think the evidence from literature is strong support on its own. We have added to the paper a statement that now references the Kembel 2012 paper, which we hope adequately supports our claim:

      “Second, a host gene with a higher copy number could be chosen for HM-tagging throughout the entire project, which would increase host representation by a factor of that copy number (Kembel et al., 2012).”

      1) Martins, W.F.S., Subramaniam, K., Steen, K. et al. Detection and quantitation of copy number variation in the voltage-gated sodium channel gene of the mosquito Culex quinquefasciatus . Sci Rep 7, 5821 (2017). https://doi.org/10.1038/s41598-017-06080-8

      2) Kembel, S. W., Wu, M., Eisen, J. A., & Green, J. L. (2012). Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance. PLoS Computational Biology, 8(10), e1002743. https://doi.org/10.1371/journal.pcbi.1002743

      3) Starke, R., Pylro, V. S., & Morais, D. K. (2021). 16S rRNA Gene Copy Number Normalization Does Not Provide More Reliable Conclusions in Metataxonomic Surveys. Microbial Ecology, 81(2), 535–539. https://doi.org/10.1007/s00248-020-01586-7

      4) Louca, S., Doebeli, M., & Parfrey, L. W. (2018). Correcting for 16S rRNA gene copy numbers in microbiome surveys remains an unsolved problem. Microbiome, 6(1), 41. https://doi.org/10.1186/s40168-018-0420-9

      Could the DNA extraction method used cause biases in hamPCR for/against either the host or the microbiome? If two different labs study the same system (let's say bacterial communities growing on Arabidopsis leaves) but use different DNA extraction approaches, would we expect them to obtain different answers using hamPCR? Did the authors try several different DNA extraction methods to see if this is an issue? Or has another team of researchers considered this and addressed it in a separate paper? I would appreciate seeing either data to address this or a discussion paragraph that reasons through this.

      Differences in DNA extraction method will certainly change the results, not only of the microbe-to-plant ratio, but also in the representation of microbes, because microbes differ in their sensitivity to different lysis methods. This is a well-documented concern in microbiome studies and has been demonstrated by using different methods on the same mock community in papers such as the following:

      Yuan, S., Cohen, D. B., Ravel, J., Abdo, Z., & Forney, L. J. (2012). Evaluation of methods for the extraction and purification of DNA from the human microbiome. PloS One, 7(3), e33865. https://doi.org/10.1371/journal.pone.0033865

      Albertsen, M., Karst, S. M., Ziegler, A. S., Kirkegaard, R. H., & Nielsen, P. H. (2015). Back to Basics--The Influence of DNA Extraction and Primer Choice on Phylogenetic Analysis of Activated Sludge Communities. PloS One, 10(7), e0132783. https://doi.org/10.1371/journal.pone.0132783

      In short, if the DNA is not extracted because plant or microbial cells are not lysed, it cannot be amplified in PCR. However, there is a good overall strategy to minimize the problem, as also proposed in the above papers, and that is to err on the side of a harsher lysis (using strong bead beating, as we have done), since this will leave fewer cells unlysed (and thus less information will be hidden). We note that similar concerns about lysis methods changing results also apply to DNA extraction for qPCR and live bacterial isolation for CFU counting (for which too harsh a lysis will kill bacteria, but too gentle a lysis will not release them from host tissue).

      We addressed this in two places. First, in the results section we mention briefly the following:

      “All DNA preps employed heavy bead beating to ensure thorough lysis of both host and microbes, as an incomplete DNA extraction can lead to underrepresentation of hard-to-lyse cells (Albertsen et al., 2015; Yuan et al., 2012).”

      Second, we added a paragraph to the discussion about sample selection and DNA extraction as follows:

      “Because hamPCR can only quantify the DNA available in the template, choice of sample and appropriate DNA extraction methods are very important. In particular, the sample must in the first place include a meaningful quantity of host DNA. For example, although there is some host DNA in mammalian fecal samples or in plant rhizosphere soil samples, this host DNA does not accurately represent the sample volume, and therefore relating microbial abundance to host abundance probably has less value in these cases. Further, the DNA extraction method chosen must lyse both the host and microbial cell types. An enzymatic lysis suitable for DNA extraction from pure cultures of E. coli may not lyse host cells or even other microbes. Appropriate DNA preparation methods for metagenomics have been thoroughly evaluated elsewhere (Albertsen et al., 2015; Yuan et al., 2012), and a common point of agreement is that strong bead-beating increases the yield and completeness of the DNA extraction, but comes at the cost of some DNA fragmentation. Especially for short reads, as we have used here, this fragmentation is not a problem, and we recommend to err on the side of a harsher lysis, using strong bead beating potentially preceded by grinding steps using a mortar and pestle as necessary for tougher tissue.”

      One emerging theme in microbiome science is to have consistent methodologies that are used across studies/labs to allow direct comparisons of microbiome datasets. Standardization of approaches may make microbiome science more robust in the long-term. Given much of the nuance in developing hamPCR for different systems, my impression is that this method is best for comparing samples within a particular host-microbe system and not across systems. For example, it may be challenging to directly compare my bacterial load hamPCR data from Arabidopsis to another lab's if we used different Arabidopsis host genes or if we used different 16S gene regions. Can the authors unpack this a bit in a discussion paragraph? If it is widely adopted, is there a way to standardize hamPCR so that it can be consistently used and compared across datasets? Or should that not be the goal?

      There appears to be considerable non-specific amplification or dimers in the gels presented throughout the manuscript. Could this non-specific amplification vary across host-microbe primer combinations? Would this impact quantification of host and microbial amplicons?

      Non-specific amplification / dimers do vary across host-microbe primer combinations. Indeed, they also vary between common 16S rRNA primer pairs used on their own (not shown). Fortunately, non-specific amplicons amplified during the exponential PCR step do not, at least with our method, seem to impact quantification of host and microbial amplicons.

      One reason is that non-specific amplicons can be recognized by their sequence and ignored. After the sequences of the amplicons have been extracted from the short read data, only those that match expected length and sequence patterns of the targeted amplicons need to be counted. Non-specific amplicons are certainly a nuisance because they represent wasted sequencing resources, but they can be excluded bioinformatically and therefore do not change the accuracy of the microbial load measurement. This is in contrast to ddPCR/qPCR, for which any off-target amplicons are also quantified!
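
      As a rough illustration of this bioinformatic exclusion, the Python sketch below keeps only reads that match an expected length range and sequence pattern, then reports the microbe-to-host ratio. The length windows, the leading sequence patterns, and the toy reads are all hypothetical placeholders chosen for the example; they are not the amplicons or the pipeline used in the study.

```python
import re

# Hypothetical amplicon definitions: expected length window and leading pattern.
EXPECTED = {
    "host": {"min_len": 20, "max_len": 24, "pattern": re.compile(r"^ACGTAC")},
    "microbe": {"min_len": 14, "max_len": 18, "pattern": re.compile(r"^TTGACC")},
}


def classify(seq):
    """Return 'host' or 'microbe' if the read matches a target, else None."""
    for name, spec in EXPECTED.items():
        if spec["min_len"] <= len(seq) <= spec["max_len"] and spec["pattern"].match(seq):
            return name
    return None


def microbial_load(seqs):
    """Count on-target reads and return (microbe/host ratio, counts)."""
    counts = {"host": 0, "microbe": 0, "discarded": 0}
    for seq in seqs:
        label = classify(seq)
        counts[label if label else "discarded"] += 1
    if counts["host"] == 0:
        raise ValueError("no host reads: load is undefined")
    return counts["microbe"] / counts["host"], counts


# Toy reads: two host, three microbe, one unrelated read, one short dimer-like read.
host_read = "ACGTAC" + "G" * 16      # length 22, matches the host definition
microbe_read = "TTGACC" + "A" * 10   # length 16, matches the microbe definition
off_target = "GGGGGGGG"              # wrong pattern and length
dimer_like = "ACGTAC" + "G" * 4      # right prefix, but too short to count

reads = [host_read] * 2 + [microbe_read] * 3 + [off_target, dimer_like]
load, counts = microbial_load(reads)
```

      The discarded reads still consume sequencing depth, but they do not enter either the numerator or the denominator of the load estimate.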

      A second reason is that the sensitive exponential amplification step of hamPCR is done with a single primer pair. Off-target sequences do squander PCR reagents including primers and dNTPs, such that they become limiting at earlier cycles than without off-target sequences, but because the exponential PCR step is done with a single primer pair, such inferior amplification conditions are shared by all molecules, and therefore do not differentially affect the host or microbial amplicon. Any off-target binding occurring in the initial tagging reaction (before the PCR step) would certainly be a concern if the reaction was carried on long enough, because for example the microbial primer pair might become limiting at an earlier cycle number, leading to underestimates of microbial load. However, limiting the tagging reaction to a low number of cycles ensures that – should primers targeting a particular host or microbial amplicon be non-specific – the fraction still available to bind the correct sequence remains in excess.

    1. Author Response:

      Reviewer #1:

      The limitations of the approach could be included in the last paragraph of the introduction. It would similarly be useful in the discussion to not only compare photonic stimulation with other approaches, but to an ideal approach.

      We have reviewed the main limitations in the Discussion section.

      Is it possible to modulate the hair bundle position continuously - e.g. sinusoidally? If not, this would be useful to state as a limitation.

      We have included a new illustration—Figure 4—in which we show hair bundle responses to sinusoidal sweeps in frequencies between 10 Hz and 2000 Hz. We have also included in the Appendix a new figure (Figure 4) in which we show hair-bundle responses to continuously increasing and decreasing ramps. We have added a new section entitled “Variety of stimuli”:

      “The fiber's power delivered onto a hair bundle can be modulated by changing the laser's power at the source. By combining analog and digital signals to drive the laser's output, we were able to stimulate hair bundles with an assortment of stimuli: sine waves, frequency sweeps, step pulses of various magnitudes, and continuously ascending and descending ramps (see Appendix 1, Figure 4). The bullfrog's hair bundles responded to sinusoidal frequency sweeps at frequencies up to 2 kHz (Fig. 4). In this case the upper boundary in the stimulus frequency was set by the ability of the hair bundle to follow, rather than by the limitations of the stimulation method.”

      The caption of the new Figure 4 reads: “Responses of hair bundles from the bullfrog's sacculus to sinusoidal frequency sweeps between 10 Hz and 200 Hz (A), 100 Hz and 500 Hz (B), and 1 kHz and 2 kHz (C). Each stimulus was achieved by driving the laser's source such that the amplitude of the sweep peaked at the maximum power output (12.5 mW for this fiber) while keeping its minimum above 0 mW. Each hair bundle was stimulated in the positive direction with 561 nm laser light; each trace is the average of 25 responses. Panel C portrays two 20 ms-long representative segments of the stimulus waveform, which would be unintelligible if displayed in full. These segments, located near the beginning and end of the sweep, are aligned with the magnification of the simultaneous hair-bundle response (red dashed boxes).”

      The caption of the new Figure 4 in Appendix 1 reads: “Example of the variety of stimuli offered by photonic-force stimulation. (A) Frog hair bundles were stimulated in the positive direction with 561 nm light. The fiber's power output was increased in five steps from 0 mW to the maximum, 12.5 mW for this fiber. Each colored trace shown is the average of 25 responses tracking the hair bundle movement to a pulse of 50 ms at a constant laser power. (B) Frog's hair-bundle responses to increasing (left) and decreasing (right) ramps of 100 ms, in which the laser power was varied continuously between 0 mW and 12.5 mW.”
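
      To make the sweep stimulus concrete, the following Python sketch generates a laser-power command for a sinusoidal frequency sweep whose peak equals the maximum output and whose minimum stays above 0 mW. It is a sketch under stated assumptions: the linear sweep profile, the 0.5 mW floor, the 100 ms duration, and the 100 kHz sample rate are all hypothetical, since the responses above do not specify them.

```python
import math


def power_sweep(f_start_hz, f_end_hz, duration_s, sample_rate_hz, p_min_mw, p_max_mw):
    """Return laser-power samples (mW) for a linear sinusoidal frequency sweep.

    The waveform oscillates between p_min_mw and p_max_mw, so the command
    never asks for negative power.
    """
    n_samples = round(duration_s * sample_rate_hz)
    mid = (p_max_mw + p_min_mw) / 2.0    # mean power level
    half = (p_max_mw - p_min_mw) / 2.0   # oscillation amplitude
    sweep_rate = (f_end_hz - f_start_hz) / duration_s  # Hz per second

    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        # Phase of a linear chirp: integral of the instantaneous frequency.
        phase = 2.0 * math.pi * (f_start_hz * t + 0.5 * sweep_rate * t * t)
        samples.append(mid + half * math.sin(phase))
    return samples


# Assumed parameters: 10 Hz to 200 Hz sweep, 0.5 mW floor, 12.5 mW peak.
sweep = power_sweep(10.0, 200.0, 0.1, 100_000, 0.5, 12.5)
```

      Replacing the sine term with a constant or a linearly increasing value would give the step and ramp stimuli mentioned in the Appendix caption.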

      First paragraph of results. Could you elaborate a little here (a few additional sentences is probably enough)? The methods describes nicely why reflection alone is not sufficient, and some of the argument given there would demystify this paragraph.

      We have modified the text: “Although an analysis based on reflection alone would indicate that a hair bundle is relatively insensitive to radiation pressure, geometric considerations reveal that multiple modes of light propagation occur in a hair bundle by virtue of the cylindrical shape of its stereocilia (see Material and methods). Each of these modes is capable of transferring momentum and therefore of mechanically stimulating the bundle.”

      Reviewer #2:

      The manuscript by Kozlov et al., entitled Rapid mechanical stimulation of inner-ear hair cells by photonic pressure, is another in the long series of elegant publications from the Hudspeth lab. The manuscript addresses the long-standing problem of engineering a stimulation method for individual sensory hair cells in vitro that adequately provides a uniform and rapid stimulus characteristic of the native stimulus in the inner ear. The authors address this unmet need with development and characterization of a light-based stimulus to generate rapid photonic force capable of deflecting a range of hair bundle geometries, including amphibian and mammalian vestibular and auditory hair bundles. The writing is straightforward and easy to follow and figures are beautifully illustrated and informative. There are several shortcomings, attention to which could further improve the manuscript and the utility of the photonic stimulation method.

      Major:

      1) While the manuscript provides a significant technical advance, the end result does not necessarily inspire confidence that it can be widely implemented. For example, to be useful, the stimulator would need to provide a range of stimulus amplitudes to a single hair bundle. Likewise, a range of stimulus waveforms, steps, sinewaves of various frequencies, etc, would enhance the broad utility of the approach. Since the introduction section highlights the shortcomings of current hair bundle stimulation methods, it would also be of value for the results/discussion section to address whether the current photonic stimulation method has overcome those shortcomings or whether further technical development will be needed.

      This comment is similar to that of Reviewer 1; please see our response above.

      2) In general, the results section is loosely quantified. For example, Figure 2A demonstrates significant cell-to-cell variability in the amplitude of the motion. What is the source of that variability? Biological variability in hair bundle stiffness, or variability in stimulus, probe position, light intensity, etc. Furthermore, what is the trial-to-trial variability for a single hair bundle? Fig. 2 legend states each trace in panel 2A is an average of 25 responses, thus some representation of trial-to-trial variability could be quantified and presented. This would add value and provide the reader with a better sense of stimulus reproducibility.

      We have added the 25 individual traces that constitute the average response displayed in Figure 2C. The traces are shown in gray and this is now mentioned in the caption.

      3) A technical concern needs to be addressed to reassure readers that the photodiode signal is an accurate representation of hair bundle position. This has been well established in prior publications, but needs to be revisited here, either with additional experimentation or a sufficiently persuasive explanation. The concern is, since the stimulus is light itself and the response (bundle position) depends on a measurement of light signal, the stimulus could contaminate measurement of the response. This issue needs to be addressed in the results section. If it's buried in the methods section, I missed it, so please clarify.

      The lights from the laser and LED light source had distinct wavelengths and were separated with filters and dichroic mirrors. Therefore, no laser light reached the photodiode. When the laser was pointed at a hair bundle adjacent to the one whose position was being tracked by the photodiode, no signal was detected by the photodiode, even though the laser's light brightened the entire field of view.

      4) The section entitled "Survival of mechanotransduction after laser irradiation" is important but somewhat unfulfilling. Measurement of spontaneous bundle motion is just one measure of intact mechanotransduction. It would be reassuring to know that other measures are also intact following hair bundle irradiation. Recordings of hair cell transduction current or receptor potentials, uptake of FM1-43, etc. could provide more direct evidence.

      The fact that hair bundles continued to oscillate spontaneously after irradiation means that not only the mechanotransduction apparatus was not damaged but, more subtly, that the hair bundles continued to reside in the same region of the phase space—meaning that the control parameter governing the transition to the oscillatory regime was not perturbed by irradiation. In Appendix 1, Figure 6, we have now added results showing the persistent uptake of FM1-43 by hair bundles after extensive irradiation.

      We have added a paragraph to the section “Survival of mechanotransduction after laser irradiation”:

      “To further assess the health of the hair bundles exposed to laser irradiation, we compared their intake of FM1-43, a fluorescent dye that enters a hair cell through open mechanotransduction channels (Gale et al., 2001), with that of surrounding undisturbed hair cells and that of a mechanically damaged bundle (Appendix 1, Fig. 6). The fluorescence signal from laser-irradiated hair bundles showed no visible difference with respect to those of the unscathed cells, whereas mechanically damaged bundles were visibly dimmer. The diminished fluorescence likely resulted from breakage of tip links that reduced the opening of the mechanotransduction channels, thereby limiting the intake of the dye.”

      The caption of the new Figure 6 in Appendix 1 reads: “The fluorescence signal of bullfrog saccular hair bundles loaded with FM1-43 after exposure to laser irradiation. We stimulated successive hair bundles in a row with 50 ms pulses at a power of 12.5 mW, the full power available for the fiber. The five hair bundles in the green box were subjected to 10, 20, 30, 40, and 50 pulses, as indicated. A sixth hair bundle in the same row, labeled in red, was damaged by being crushed a few times by the end of the optical fiber. Immediately after those procedures, the sample was exposed to 1 mM of FM1-43 for one minute, after which fluorescence was recorded. The stimulated hair bundles show comparable levels of brightness between each other and with respect to those in the surroundings. The loading of the dye is visibly reduced in the mechanically damaged cell owing to breakage of the tip links.”

      Reviewer #3 (Public Review):

      There are only small modifications to be made to the manuscript in order to better characterize the variability of the responses induced in the hair bundle, a discussion on how the method could be used and validated in mammalian hair cells and a request to provide additional paths to check the viability of the cells and the robustness of the mechanosensory response after multiple optical stimulations have been performed.

      Major comments:

      1) The variability of the displacement to the 25 stimulations at 30mW @561nm in Figure 2A should be added as standard deviation (as a shade of light color) on top of the average depicted here. The variability in displacement for the rising as well as for the relaxation in B should also be depicted across stimulations for one cell and across cells.

      This comment is similar to that of Reviewer 2; as noted above, we have added 25 individual traces to illustrate the small variability.

      Same indication of variability across trials and cells should apply for other figures where the average of 25 stimulations is depicted.

      Showing individual traces for all the figures would clutter them extensively. For each recording condition, the variability of responses across trials was always small. Because we measured displacements in the nanometer range, the principal purpose of repeated stimulation was to average away the noise owing to thermal agitation of the bundles and to photon shot noise in the photodiode system. Please also see our response to the related comment below.

      2) The authors make a point that mechanical stimulations are too slow to match the optimal frequency of activation of mammalian hair cells. However, if there is such variability in amplitude & kinetics of the displacement induced by the photonic force through the optic fiber, how can this technique be calibrated in small mammalian hair bundles?

      The variability is inevitable because the method applies force and not displacement. Therefore, the motion of each hair bundle is determined by its stiffness and drag coefficient (and, at higher frequencies, its mass). Because hair-bundle stiffness is known to vary—and can even become negative owing to the phenomenon of gating compliance—this variability is expected, and indeed a welcome indication of healthy bundles. Given the rich panoply of known active hair-bundle behaviors, it would indeed be strange if all hair bundles moved identically in response to a force step.

      We discuss the issue of calibration at the end of the Discussion. In addition, we have added Figure 7 of Appendix 1, in which we determine the stiffness of an individual hair bundle with a calibrated glass fiber, then ascertain what photonic stimulus yields a similar deflection. Although in principle a bundle's stiffness can be also deduced from its Brownian motion, mammalian hair bundles are relatively stiff and it is difficult to measure their Brownian motion without laser interferometry.

      Perhaps most importantly, one needs to know the stimulus force exactly when one wishes to measure hair-bundle stiffness accurately. But the stiffness of bundles is already well known; for example, hair bundles of outer hair cells have stiffness values that vary between about 1 mN·m⁻¹ and 4 mN·m⁻¹. So if such a hair bundle (for example, the one illustrated in Figure 3) was deflected by 10 nm with photonic force, the associated force was 10-40 pN. The precise force is not critical in estimates of the time constant of hair-bundle deflection.
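
      The force range quoted above follows from treating the bundle as a linear spring, F = k·x. The short sketch below reproduces that arithmetic from the stiffness range and deflection given in the response; the function name is illustrative and not part of any published code.

```python
def bundle_force_pN(stiffness_mN_per_m: float, deflection_nm: float) -> float:
    """Force on a linear spring, F = k * x, returned in piconewtons.

    1 mN/m * 1 nm = 1e-3 N/m * 1e-9 m = 1e-12 N = 1 pN,
    so with these units the product is already in pN.
    """
    return stiffness_mN_per_m * deflection_nm


# Stiffness range and deflection taken from the text above.
low_force = bundle_force_pN(1.0, 10.0)
high_force = bundle_force_pN(4.0, 10.0)
print(f"{low_force:.0f}-{high_force:.0f} pN")
```

      A linear spring is only an approximation here, since the response itself notes that bundle stiffness varies and can even become negative through gating compliance.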

      Finally, stiffness can be estimated by applying forces using slower traditional methods. Our method is designed to apply whatever force it takes to rapidly move a hair bundle by a desired and directly measured distance.

      3) The authors should check the viability of the cells and the robustness of the mechanosensory response after multiple optical stimulations have been performed. Currently they compare the spontaneous oscillations before and after a stimulation to illustrate that the method is not disrupting the function of the hair cell. However, spontaneous oscillations are not visible on all cells. Are there other means (calcium imaging? electrophysiology?) by which the authors could illustrate that the technique is not damaging the cell and altering the mechanosensory response in the hair bundle?

      We concur with this suggestion, and have performed an additional control experiment. Kindly see our response to comment 4 of Reviewer 1.

    1. Author Response:

      Reviewer #1 (Public Review):

      The experimental data and modeling are highly robust. The conclusions of the paper are clearly supported by the results. The sensitivity analysis is particularly impressive and suggests a system that is highly conserved across a wide parameter space. Model validation with CD8+ depletion is a nice addition that leads to interesting and surprising conclusions. The figures are highly instructive and easy to read.

      An area where the paper could be improved is conveying the actual scientific conclusions more clearly and precisely with more focused review of existing literature. The relevance of the paper's conclusions for human influenza could be discussed with more careful language.

      Thank you for the suggestions. As mentioned above, we updated the abstract and the text to better highlight the biological conclusions in addition to the mathematical conclusions. We also included some additional explanation on the specific points below.

      First, the mechanistic conclusions of the work could be emphasized along with the methodology of the work. At present, these are completely lacking from the abstract which somewhat blandly just says that the paper describes a model which fits to data. From my perspective, currently underemphasized and novel / interesting conclusions are that:

      1) CD8+ mediated killing becomes much more rapid on a per capita basis (40000 fold increase) when infected cells dip below several hundred cells approximately 7 days post infection.

      2) There is a negative correlation between infected cell clearance by innate versus CD8+ mediated mechanisms, implying that poorer initial clearance of virus may result in more effective later killing by acquired immune mechanisms.

      3) Even ~80% reduction in maximal CD8E+ levels could prolong infection by 10 days though delay in attaining these threshold CD8E+ levels due to experimental or in silico CD8+ depletion only delays viral elimination by a day.

      In our CD8 depletion data, the entire infection is altered (see Lines 270-303). That is, our results suggest that fewer cells are initially infected, which leads to lower viral loads at d2 and will automatically result in fewer CD8s. However, the depletion antibody itself is known to directly alter CD8s (see Lines 282-284), so one cannot make direct, quantified conclusions about the precise reduction percentage under this experimental condition.

      4) Most interesting and counterintuitively, CD8+ depletion allows for considerable reductions in the size of lung lesions as well as inflammation scores and degree of weight loss during primary influenza infection. This result suggests that CD8+ T cells have the potential to create significant bystander damage in the lung.

      While bystander damage may be present, our CD8 depletion data do not directly show bystander damage. As we mention above, one important aspect of depleting CD8s prior to infection was that it reduced the viral loads early on. We believe this is likely a result of immune activation from the initial CD8 kill-off (noted in Lines 284-285). A decrease in target/infected cells will automatically reduce the number of total cells that become infected and, thus, reduce the lesioned area of the lung. This was verified by the reduced weight loss, and no modifications were made to the predictions (Fig 4) or correlation to weight loss (Fig 5).

      Second, the introduction and discussion continue to not differentiate whether past experimental results are from humans or mice. It is somewhat misleading to cite mouse studies without acknowledging that these are from a model that in no way captures the totality of human infection conditions. For all animal models of human infection, the strengths of the model (ability to control experimental inputs and obtain frequent measurements) are counter-balanced by lack of realism. Humans have a complex background of immunity based on past vaccination and infection, different modes of exposure and other innumerable differences. In most human infections, the degree of lung involvement is minimal. Please stipulate in the review of existing literature which papers were done in mice versus humans. Please also frame conclusions of this paper in the discussion in terms of how it may or may not be relevant to human infection.

      As mentioned above, we updated the text to better highlight the human relevance and distinguish results between different host species in Lines 43, 61, 69, 74, 77, 88, 99, 101, 111, 116, 118, 120, 400-406, 410-411, 413, 435, 437, 446-450, 452, 463, 471, 485, and 846. We use the words ‘human’, ‘clinical’, and ‘patient’ to denote humans and the words ‘animal’, ‘murine’, and ‘experimental’ to denote animal models. Further distinction in some areas comes from the methods noted (e.g., CT scans/imaging is used/relevant for humans but not animals, for which this would be a microCT). Because many features of the infection are observed in both humans and animals (see, for example, 10.1016/j.jim.2014.03.023 and 10.3390/pathogens3040845), underscoring the strong relevance of animal models to study influenza (noted in Lines 453-454), we limited specifying this in every sentence as we feel it would reduce the readability. In addition, we feel that splitting up the references mid-sentence would also reduce readability given the journal’s reference style, and have left most at the end of a sentence. The references themselves should allow a reader to easily distinguish between the two.

      Third, this is a primary infection model, and this point also should be emphasized. The greatest relevance of the mouse model in the paper may be for pediatric infection in humans, rather than adults who have had multiple prior influenza exposures and possibly vaccinations. Presumably CD8+ responses can be expected to be more rapid with availability of a pre-existing population of tissue resident CD8+ T cells as would occur with re-infection. The results of CD8+ depletion prior to re-infection would potentially be very different (likely harmful) in a re-infection model and this should be discussed. This is mentioned in Line 467 but is given short attention elsewhere.

      We added text to highlight that we are studying a primary infection in Lines 46, 130, and 176-177. In general, primary infections may not necessarily always equate to children as any novel strain or strain novel to that individual may act as a primary infection. It is also feasible that waning immunity would appear similar to the dynamics of a primary infection.

      Because CD8 depletion would also deplete out resident and other T cells (>99% efficiency as noted in Line 273), one would expect the exact same results as we showed here. Thus, it would not be the appropriate experimental design to study recall responses. As we mentioned in our prior response, we might expect some adjustments to account for primed responses and added text that highlights this in Lines 424-427 and 498.

      Line 60: stating that other studies have had limited success is rather insulting. Please rephrase and be more specific about why this study breaks new ground.

      We did not mean to be insulting and reworded Lines 66-67 to “…but have had not yet found the appropriate mathematical relation with the available data.” Of note, even the authors of Price et al. noted their inability to capture the dynamics: “Some trajectories, notably activated macrophages and epithelial damage are not well captured by the model, suggesting that the immunophenotype we selected for active macrophages may not be accurate, and that using animals weight as a proxy for epithelial damage may not be appropriate”. The novelty of our study and the gaps in the field are stated throughout the introduction and discussion.

      Line 81: "viral loads in the upper respiratory tract do not reflect the lower respiratory tract environment. " Please include a citation, remove or clarify that this is a possible confounding variable in the analysis.

      We’ve added several citations from both human and animal studies to Lines 90-91.

      Line 91: define lung histomorphometry. This is a fairly novel approach for most readers.

      This was defined in our prior revision and remains in Lines 102-104.

      Line 101: This is a strong statement about viral load. Unless formal correlate studies have been done in humans (which they have not), I would say "may not be correlated" or remove altogether.

      We updated Line 110 to “…may not be directly correlated…”.

      Line 201: involved with what? I am not sure what this sentence means.

      We were referencing effector-mediated killing and memory generation as noted earlier in the sentence. We updated Lines 210-212 to clarify: “One benefit of using the total CD8+ T cells is that the model automatically deduces the dynamics of effector-mediated killing and memory generation without needing to specify which phenotypes might be involved in these processes as they may be dynamically changing”

      Line 209: I would suggest denoting a separate section to the sensitivity analysis versus the parameter fitting as the fitted correlation between delta and delta_e appears separate mechanistically from the relationship between delta and viral clearance / total # of CD8E

      We appreciate the suggestion but have chosen to leave this part of the text unaltered.

      Line 251: Please cite the clinical correlate of this in the discussion. Immunocompromised humans often shed influenza (and SARS CoV-2) for months. See work from Jesse Bloom's group published in eLife on this subject.

      The suggested article has been added to Line 474.

      Line 321: should this read "clear infected cells from the lung"? I am confused about what this sentence means.

      Line 332 has been updated to “…of the infected areas within the lung”.

      Fig 5D: why are the dots yellow? Is the magenta line CD8 depleted?

      We inadvertently left off the explanation, but added clarification to the caption and text (Lines 371-374 and 380-384). The yellow markers are interstitial inflammation while the white markers are alveolar inflammation. The magenta markers/lines are the CD8 depletion prediction.

      Line 386: Has antiviral therapy been linked with extent of radiologic lung lesions in clinical trials? This would be a very atypical clinical trial endpoint so please be more precise with language. It is possible as previously mentioned in the paper that viral load may not predict lesion size or disease severity in humans.

      To our knowledge, CT images are not typically taken in many contexts for a variety of clinical reasons (e.g., cost, exposure to the patient, etc.), but antivirals have been linked to reductions in disease severity (e.g., see 10.1056/NEJMoa1716197). In that particular line, we mention that minor reductions in viral load are paired with more significant reductions in disease/symptom, which is reported in the referenced clinical and experimental data.

      Line 477: add degree of immunity from prior infections as a critical variable

      This has been added to Line 478.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this work, Panigrahi et al. develop a powerful deep-learning-based cell segmentation platform (MiSiC) capable of accurately segmenting bacteria cells densely packed within both homogeneous and heterogeneous cell populations. Notably, MiSiC can be easily implemented by a researcher without the need for high-computational power. The authors first demonstrate MiSiC's ability to accurately segment cells with a variety of shapes including rods, crescents and long filaments. They then demonstrate that MiSiC is able to segment and classify dividing and non-dividing Myxococcus cells present in a heterogeneous population of E. coli and Myxococcus. Lastly, the authors outline a training workflow with which MiSiC can be trained to identify two different cell types present in a mixed population using Myxococcus and E. coli as examples.

      While we believe that MiSiC is a very powerful and exciting tool that will have a large impact on the bacterial cell biological community, we feel explanations of how to use the algorithm should be more greatly emphasized. To help other scientists use MiSiC to its fullest potential, the range of applications should be clarified. Furthermore, any inherent biases in MiSiC should be discussed so that users can avoid them.

      We thank the reviewer for the positive feedback and for the comments, which will help disseminate MiSiC to the broad bacterial cell biology community as intended. As described above, we have largely addressed this comment by writing a comprehensive handbook. As detailed below, we now also provide precise measurements of the MiSiC segmentation accuracy compared to ground truth for the various imaging modalities and bacterial species.

      Major Concerns:

      1) It is unclear to us how a MiSiC user should choose/tune the value for the noise variance parameter. What exactly should be considered when choosing the noise variance parameter? Some possibilities include input image size, cell size (in pixels), cell density, and variance in cell size. Is there a recommended range for the parameter? These questions along with our second minor correction can be addressed with a paragraph in the Discussion section.

      Setting the noise parameters is now detailed in the handbook (section 1.d). A set of rules of thumb and recommendations is provided. In addition, a paragraph explaining the importance of noise addition for images with sparse bacterial cell density has been added to the Results section.

      “Associated Figure S1. Background noise can lead to spurious cell detection by MiSiC. SI images retain the shape/curvature information of the intensities in a raw image through eigenvalues of the Hessian of the image and an arctan function, creating the smooth areas corresponding to cell bodies and propagating noisy regions where there is no shape information. Thus, MiSiC segments the cells by discriminating between “smooth” and “rough” regions. In effect, when adjusting the size parameter, scaling smooths out the image noise, leading to background regions that have a smoother SI than in the raw image. Some of these areas could be falsely detected as bacterial cells. This effect is shown here: When an image with uniform and random intensity values is segmented with MiSiC with increasing smoothing (here using a Gaussian blur filter), spurious cell detection becomes apparent. In addition, since the SI keeps the shape information and not the intensity values, background objects that are of relatively low contrast (i.e., dead cells or debris) may be detected as cells. All these artifacts can be mitigated by adding synthetic noise to the scaled images.”
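
      To make the "eigenvalues of the Hessian and an arctan function" description concrete, the sketch below computes a shape index at one pixel using the standard Koenderink-van Doorn definition, s = (2/π)·arctan((k2 + k1)/(k2 − k1)) with k1 ≥ k2. This is an illustrative assumption: the function names are hypothetical, the Hessian is taken by plain finite differences on a toy image, and the smoothing and scaling actually used inside MiSiC are not reproduced here.

```python
import math


def hessian_at(img, r, c):
    """Second derivatives at an interior pixel by central finite differences."""
    hrr = img[r + 1][c] - 2 * img[r][c] + img[r - 1][c]
    hcc = img[r][c + 1] - 2 * img[r][c] + img[r][c - 1]
    hrc = (img[r + 1][c + 1] - img[r + 1][c - 1]
           - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
    return hrr, hrc, hcc


def shape_index_at(img, r, c):
    """Shape index in [-1, 1] from the Hessian eigenvalues k1 >= k2."""
    hrr, hrc, hcc = hessian_at(img, r, c)
    # Eigenvalues of the symmetric 2x2 matrix [[hrr, hrc], [hrc, hcc]].
    mean = (hrr + hcc) / 2.0
    root = math.sqrt(((hrr - hcc) / 2.0) ** 2 + hrc ** 2)
    k1, k2 = mean + root, mean - root
    if k1 == k2:
        if k1 == 0.0:
            return float("nan")  # flat patch: no shape information
        return 1.0 if k1 < 0 else -1.0  # limit of the formula: cap or cup
    return (2.0 / math.pi) * math.atan((k2 + k1) / (k2 - k1))


# Toy 5x5 images whose intensity depends only on the column index.
bright_rod = [[-(c - 2) ** 2 for c in range(5)] for _ in range(5)]
dark_rod = [[(c - 2) ** 2 for c in range(5)] for _ in range(5)]
flat = [[0.0] * 5 for _ in range(5)]

si_bright = shape_index_at(bright_rod, 2, 2)  # ridge, close to +0.5
si_dark = shape_index_at(dark_rod, 2, 2)      # rut, close to -0.5
si_flat = shape_index_at(flat, 2, 2)          # undefined on a flat patch
print(si_bright, si_dark, si_flat)
```

      The value depends on local shape rather than on absolute intensity, and it changes sign when contrast is inverted, so a bright rod and a dark rod map to mirror-image values.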

      2) Could the authors expand on using algorithms like watershed, conditional random fields, or snake segmentation to segment bacteria when there is not enough edge information to properly separate them? How accurate are these methods at segmenting the cells? Should other MiSiC parameters be tuned to increase the accuracy when implementing these methods?

      We thank the reviewer for raising this point as it is important to make clear that post-processing algorithms can certainly improve the accuracy of MiSiC masks downstream. To show this specifically, we further processed MiSiC masks of Bacillus subtilis filamentous cells to resolve division septa using the watershed algorithm. This example is now provided as Figure S3. Importantly, there is no particular MiSiC adjustment that needs to be performed prior to running these processing steps, which can be done directly in ImageJ or its bacterial cell analysis plug-in, MicrobeJ. It is worth noting that the post-processing strategy may depend on the scientific question under consideration. In the handbook, we also give an example of post-processing methods that may be used.

      “Associated Figure S3. Refining cell separations with watershed. Watershed methods may be used to obtain a more accurate segmentation of septate filaments such as Bacillus subtilis. In this example applying this method to the MiSiC mask effectively resolves cell boundaries that are not captured in the prediction but are visible by eye (arrows).”

      3) Can the MiSiC's ability to accurately segment phase and brightfield images be quantitatively compared against each other and against fluorescent images for overall accuracy? A figure similar to Fig. 2C, with the three image modalities instead of species would nicely complement Fig. 2A. If the segmentation accuracy varies significantly between image modalities, a researcher might want to consider the segmentation accuracy when planning their experiments. If the accuracy does not vary significantly, that would be equally useful to know.

      This is a very important issue that was also raised by reviewer 3 and which we decided to address in full. For each imaging modality and distinct species, we measured the Jaccard Index as a function of the threshold set for the Intersection over Union (IoU). The resulting curves are now provided in two separate Figures 2 and 3 and a supplemental Figure S2; they provide a robust measure of the segmentation for each modality/tested species.

      “Figure 2. MiSiC predictions under various imaging modalities. a) MiSiC masks and corresponding annotated masks of fluorescence, phase contrast and bright field images of a dense E. coli microcolony. b) Jaccard index as a function of IoU threshold for each modality determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained Jaccard score curves are the average of analyses conducted over three biological replicates and n=763, 811, 799 total cells for Fluorescence, Phase Contrast and Bright Field, respectively (bands are the maximum range, the solid line is the median). The fluorescence images were pre-processed using a Gaussian of Laplacian filter to improve MiSiC prediction (see methods).”

      “Associated Figure S2. MiSiC predictions under various imaging modalities. a) MiSiC masks and corresponding annotated masks of fluorescence, phase contrast and bright field images of a dense M. xanthus microcolony. b) Jaccard index as a function of IoU threshold for each modality determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained curves are the average of analyses conducted over three biological replicates and n=193, 206, 211 total cells for Fluorescence, Phase Contrast and Bright Field, respectively (bands are the maximum range, the solid line is the median). The fluorescence images were pre-processed using a Gaussian of Laplacian filter to improve MiSiC prediction (see methods). c) A human observer is slightly less performant than MiSiC. The same ground truth as used in Figure 2 (dashed lines) was compared to an independent observer’s annotation (solid lines) and Jaccard score curves were constructed as shown in Figure 2. BF: Bright Field, PC: Phase Contrast, Fluo: Fluorescence.”

      “Figure 3. MiSiC predictions in various bacterial species and shapes. a) MiSiC masks and corresponding annotated masks of phase contrast images of another Pseudomonas aeruginosa (rod-shape), Caulobacter crescentus (crescent shape) and Bacillus subtilis (filamentous shape). b) Jaccard index as a function of IoU threshold for each species determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained Jaccard score curves are the average of analyses conducted over three biological replicates and n=1149,101,216 total cells for P. aeruginosa, B. subtilis and C. crescentus, respectively (bands are the maximum range, solid line the median). Note that the B. subtilis filaments are well predicted but edge information is missing for optimal detection of the cell separations.”
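
      For readers unfamiliar with Jaccard-versus-IoU curves, the sketch below shows one common way to build such a curve: predicted and ground-truth objects are matched one-to-one by IoU, a match counts as a true positive only if its IoU reaches the threshold, and the Jaccard index is TP / (TP + FP + FN). This is an assumed, generic definition; the exact procedure is described in the authors' Methods and may differ. Objects are represented as sets of pixel coordinates, and all names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two objects given as sets of (row, col)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jaccard_at_threshold(truth, pred, threshold):
    """Jaccard index TP / (TP + FP + FN) with greedy one-to-one matching."""
    pairs = sorted(
        ((iou(t, p), i, j) for i, t in enumerate(truth) for j, p in enumerate(pred)),
        reverse=True,
    )
    used_truth, used_pred = set(), set()
    tp = 0
    for score, i, j in pairs:
        if score < threshold or score == 0.0:
            break  # pairs are sorted, so no later pair can qualify
        if i in used_truth or j in used_pred:
            continue
        used_truth.add(i)
        used_pred.add(j)
        tp += 1
    fp = len(pred) - tp   # predicted objects with no acceptable match
    fn = len(truth) - tp  # ground-truth objects that were missed
    total = tp + fp + fn
    return tp / total if total else 1.0  # both empty: trivially identical


def rect(r0, c0, height, width):
    """Helper: a filled rectangle as a set of pixel coordinates."""
    return {(r, c) for r in range(r0, r0 + height) for c in range(c0, c0 + width)}


# Two annotated cells; one exact prediction, one shifted, one spurious object.
truth = [rect(0, 0, 2, 10), rect(5, 0, 2, 10)]
pred = [rect(0, 0, 2, 10), rect(5, 2, 2, 10), rect(20, 20, 2, 2)]

curve = {t: jaccard_at_threshold(truth, pred, t) for t in (0.5, 0.75, 0.9)}
print(curve)
```

      Raising the threshold demands tighter overlap for a match, so the curve can only stay flat or fall; how quickly it falls indicates how precisely object boundaries are reproduced.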

      4) The ability of MiSiC to segment dense clusters of cells is an exciting advancement for cell segmentation algorithms. However, is there a minimum cell density required for robust segmentation with MiSiC? The algorithm should be applied to a set of sparsely populated images in a supplemental figure. Is the algorithm less accurate for sparse images (perhaps reflected by an increase in false-positive cell identifications)? Any possible biases related to cell density should be noted.

      In fact, MiSiC performs well with both densely and sparsely populated images. In sparsely populated images, however, non-cell objects can occasionally appear in the MiSiC mask. As mentioned above, inclusion of noise can help remove these objects. This issue is now fully explained in a supplemental Figure S1. Of note, any non-cell objects that remain after noise addition can be eliminated using additional general morphometric filters or specific models fitting bacterial cells, for example those included in MicrobeJ and Oufti. These points are now clarified in the text.

      “Associated Figure S1. Background noise can lead to spurious cell detection by MiSiC. SI images retain the shape/curvature information of the intensities in a raw image through eigenvalues of the Hessian of the image and an arctan function, creating the smooth areas corresponding to cell bodies and propagating noisy regions where there is no shape information. Thus, MiSiC segments the cells by discriminating between “smooth” and “rough” regions. In effect, when adjusting the size parameter, scaling smooths out the image noise, leading to background regions that have a smoother SI than in the raw image. Some of these areas could be falsely detected as bacterial cells. This effect is shown here: When an image with uniform and random intensity values is segmented with MiSiC with increasing smoothing (here using a Gaussian blur filter), spurious cell detection becomes apparent. In addition, since the SI keeps the shape information and not the intensity values, background objects that are of relatively low contrast (i.e., dead cells or debris) may be detected as cells. All these artifacts can be mitigated by adding synthetic noise to the scaled images.”

      and:

      “Along similar lines, non-cell objects can appear in the MiSiC masks and while some can be removed by the introduction of noise, an easy way to do it is to apply a post-processing filter, for example using morphometric parameters to remove objects that are not bacteria. This can be easily done using Fiji, MicrobeJ or Oufti.”
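
      The morphometric filtering mentioned above can be pictured with a minimal sketch: measure each segmented object and keep only those whose size is plausible for a bacterial cell. The thresholds, function names and the axis-aligned bounding-box measurement are all illustrative assumptions; Fiji, MicrobeJ and Oufti provide their own, more complete filters.

```python
def morphometrics(obj):
    """Area and axis-aligned bounding-box size of an object (set of pixels).

    The bounding box is a crude proxy: it overestimates the width of rods
    that are not aligned with the image axes.
    """
    rows = [r for r, _ in obj]
    cols = [c for _, c in obj]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return {"area": len(obj), "length": max(height, width), "width": min(height, width)}


def keep_cell_like(objects, min_area, max_area, max_width):
    """Keep objects whose area and short-axis width fall in the given bounds."""
    kept = []
    for obj in objects:
        m = morphometrics(obj)
        if min_area <= m["area"] <= max_area and m["width"] <= max_width:
            kept.append(obj)
    return kept


def rect(r0, c0, height, width):
    """Helper: a filled rectangle as a set of pixel coordinates."""
    return {(r, c) for r in range(r0, r0 + height) for c in range(c0, c0 + width)}


rod = rect(0, 0, 4, 20)       # plausible rod-shaped cell
debris = rect(30, 30, 2, 2)   # too small
blob = rect(50, 50, 30, 30)   # too large and too wide

# Hypothetical bounds, in pixels; real values depend on magnification.
cells = keep_cell_like([rod, debris, blob], min_area=20, max_area=400, max_width=12)
print(len(cells))
```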

      5) It is exciting to see the ability of MiSiC to segment single cells of M. xanthus and E. coli species in densely packed colonies (Fig. 4b). Although three morphological parameters after segmentation were compared with ground truth, the comparison was conducted at the ensemble level (Fig. 4c). Could the authors use the Mx-GFP and Ec-mCherry fluorescence as a ground truth at the single cell level to verify the results of segmentation? For example, for any Ec cells identified by MiSiC in Fig. 4b, provide an index of whether its fluorescence is red or green. This single-cell level comparison is most important for the community.

      We have now performed this comparison and determined Jaccard indexes for E. coli and Myxococcus detection using the individual fluorescence images as a reference (Figure 5b). Since we were only able to make this comparison in relatively small fields, we also kept the comparison of expected morphometric parameters in large images. Taken together, these data now demonstrate that the semantic classification, as performed, separates Myxococcus cells from E. coli cells well (see more details in our response to reviewer 3).

      Reviewer #2 (Public Review):

      Panigrahi and co-authors introduce a program that can segment a variety of images of rod-shaped bacteria (with somewhat different sizes and imaging modalities) without fine-tuning. Such a program will have a large impact on any project requiring segmentation of a large number of rod-shaped cells, including the large images demonstrated in this manuscript. To my knowledge, training a U-Net to classify an image from the image's shape index maps (SIM) is a new scheme, and the authors show that it performs fairly well despite a small training set including synthetic data that, based on Figure 1, does not closely resemble experimental data other than in shape. The authors discuss extending the method to objects with other shapes and provide an example of labelling two different species - these extensions are particularly promising.

      The authors show that their network can reproduce results of manual segmentation with bright field, phase and fluorescence input. Performance on fluorescence data in Fig. 1 where intensities vary so much is particularly good and shows benefits of the SIM transformation. Automated mapping of FtsZ show that this method can be immediately useful, though the authors note this required post-processing to remove objects with abnormal shapes. The application in mixed samples in Fig. 4 shows good performance. However, no Python workflow or application is provided to reproduce it or train a network to classify mixtures in different experiments.

      We thank the reviewer for the positive comment. As discussed in our answer to reviewer 1, the classification presented in Figure 4 (now Figure 5) is meant to provide an example of how MiSiC can be further used to train networks to classify species in interspecies communities by generating two datasets, one per species of interest, to further train a U-Net. Here, the secondary U-Net was developed to specifically discriminate Myxococcus from E. coli, which is a very specialized application. Hence it was not included in the MiSiC package. Nevertheless the code is accessible at https://github.com/pswapnesh/MyxoColi (which is mentioned in the Methods).

      Performance was compared between SuperSegger with default parameters and MiSiC with tuned parameters for a single data set. Perhaps other SuperSegger parameters would perform better with the addition of noise, and it's unclear that adding Gaussian noise to a phase contrast image is the best way to benchmark performance. An interesting comparison would be between MiSiC and other methods applying neural networks to unprocessed data such as DeepCell and DeLTA, with identical training/test sets and an attempt to optimize free parameters.

      In fact, we believe that it does make sense to test how MiSiC performs in the presence of noise and show that it is robust, making it suitable for use on complex multi-tile images. For this analysis we kept the comparison with SuperSegger, which provides a reference as it is done on a data set optimized for SuperSegger segmentation. Importantly, we keep the parameters constant throughout the analysis because it would not be feasible to tweak parameters tile-by-tile in a multi-tile image. This analysis shows that MiSiC is better suited to this application.

      INSTALLATION: I installed both the command line and GUI versions of MiSiC on a Windows PC in a conda environment following provided instructions. Installation was straightforward for both. MiSiCgui gave one error and required reinstallation of NumPy as described on GitHub. Both give an error regarding AVX2 instructions. MiSiCgui gives a runtime error and does not close properly. These are all fairly small issues. Performance on a stack of images was sufficiently fast for many applications and could be sped up with a GPU implementation.

      We have updated the pip install script for MiSiCgui available on GitHub, which remedies some of these issues: the NumPy error no longer occurs, the program closes properly, and the only remaining messages are warnings about future deprecations in the napari packages. We have tested it on Windows 10, Ubuntu Linux 18, and macOS Catalina. For the moment it seems impossible to install on macOS Big Sur, possibly because of the Python 3.7 requirement. We will work on this problem in the near future. We have removed the command line interface, as we are developing a future version with an easier way to provide MiSiC as a napari or FIJI/ImageJ plugin.

      TESTING: I tested the programs using brightfield data focused at a different plane than data presumably used to train the MiSiC network, so cells are dark on a light background and I used the phase option which inverts the image. With default settings and a reasonable cell width parameter (10 pixels for E. coli cells with 100-nm pixel width; no added noise since this image requires no rescaling) MiSiCgui returned an 8-bit mask that can be thresholded to give segmentation acceptable for some applications. There are some straight-line artifacts that presumably arise from image tiling, and the quality of segmentation is lower than I can achieve with methods tuned to or trained on my data. Tweaking magnification and added noise settings improved the results slightly. The MiSiC command line program output an unusable image with many small, non-cell objects. Looking briefly at the code, it appears that preprocessing differs and it uses a fixed threshold.

      We thank the reviewer for testing the programs. Tiling-related artifacts may now be avoided by excluding a few pixels at the border in the new version of the MiSiC code. This is now implemented in the MiSiC.segment function as segment(im,invert = False,exclude = 16). Without seeing the reviewer's data it is difficult for us to see how the segmentation (which is said to be acceptable) could be further improved. The command line program has now been removed in favor of continuous development on the graphical interface.

      Reviewer #3 (Public Review):

      The authors aimed to develop a 2D image analysis workflow that performs bacterial cell segmentation in densely crowded colonies, for brightfield, fluorescence, and phase contrast images. The resulting workflow achieves this aim and is termed "MiSiC" by the authors.

      I think this tool achieves high-quality single-cell segmentations in dense bacterial colonies for rod-shaped bacteria, based on inspection of the examples that are shown. However, without a quantification of the segmentation accuracy (e.g. Jaccard coefficient vs. intersection over union, false positive detection, false negative detection, etc), it is difficult to pass a final judgement on the quality of the segmentation that is achieved by MiSiC.

      We thank the reviewer for this comment. To address it, we divided the previous Figure 2 into two figures (and associated supplemental figures) that separately show how MiSiC performs (i) when segmenting two very distinct bacterial species, E. coli and Myxococcus, under various imaging modalities, and (ii) when segmenting other bacterial species: rods (P. aeruginosa), filaments (B. subtilis) and crescent shapes (C. crescentus). The results now clearly show both the strengths and the limitations of the system.
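
      The accuracy measures the reviewer lists can be illustrated with a minimal sketch. It assumes pixel-wise comparison of two binary masks of equal size; the function name, the mask layout (nested lists of 0/1) and the toy masks are hypothetical and are not taken from MiSiC. Note that the Jaccard coefficient and intersection over union are the same quantity. Cell segmentation is often scored per object rather than per pixel, which this sketch does not attempt.

```python
def segmentation_scores(pred, truth):
    """Pixel-wise scores for two binary masks given as equal-sized 2D lists of 0/1.

    Hypothetical helper for illustration only; not part of MiSiC.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # pixel labelled as cell in both masks
            elif p and not t:
                fp += 1          # predicted cell, but background in ground truth
            elif t and not p:
                fn += 1          # missed ground-truth cell pixel
    union = tp + fp + fn
    # Jaccard coefficient = intersection over union; two empty masks agree fully.
    jaccard = tp / union if union else 1.0
    return {"jaccard": jaccard, "false_positive": fp, "false_negative": fn}


# Toy masks (made up for the example).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
scores = segmentation_scores(pred, truth)
```

      In this toy case three pixels overlap, one is a false positive and one is a false negative, so the Jaccard coefficient is 3/5.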

      A particular strength of the MiSiC workflow arises from the image preprocessing into the "Shape Index Map" images (before the neural network analysis). These shape index maps are similar for images that are obtained by phase contrast, brightfield, and fluorescence microscopy. Therefore, the neural network trained with shape index maps can apparently be used to analyze images acquired with at least the above three imaging modalities. It would be important for the authors to unambiguously state whether really only a single network is used for all three types of image input, and whether MiSiC would perform better if three separate networks would be trained.

      A single network is used, and it takes a shape index map rather than the original image as input. As mentioned by the reviewer, this is a major strength of the workflow because it permits segmentation independent of the imaging modality, which we now measure for each modality.

      As the reviewer hints, three different models specific to each modality (CP, Fluorescence and BF) could also be used to train three networks, allowing the direct end-to-end segmentation of raw images. In theory, this could improve the segmentation (although this might lead to negligible benefits given the actual segmentation quality).
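
      Why a shape index map can look similar across modalities can be sketched as follows. This is a sketch under assumptions: it uses the standard shape index computed from the two eigenvalues of the local image Hessian, and whether MiSiC uses exactly this form, and at which smoothing scale, is not stated in this response. The function names are hypothetical.

```python
import math


def hessian_eigenvalues(hxx, hxy, hyy):
    """Eigenvalues (k1 >= k2) of the symmetric 2x2 Hessian [[hxx, hxy], [hxy, hyy]]."""
    mean = (hxx + hyy) / 2.0
    radius = math.hypot((hxx - hyy) / 2.0, hxy)
    return mean + radius, mean - radius


def shape_index(hxx, hxy, hyy):
    """Shape index in [-1, 1] from local second derivatives of the image.

    Assumed form: (2 / pi) * atan((k2 + k1) / (k2 - k1)) with k1 >= k2.
    """
    k1, k2 = hessian_eigenvalues(hxx, hxy, hyy)
    if k1 == k2:
        if k1 == 0:
            return float("nan")          # flat patch: shape index undefined
        return -1.0 if k1 > 0 else 1.0   # limiting values for a cup or a cap
    return (2.0 / math.pi) * math.atan((k2 + k1) / (k2 - k1))


# A dark groove on a bright background (intensity ~ x**2 locally).
groove = shape_index(2.0, 0.0, 0.0)
# The same groove with intensities scaled sevenfold: the shape index is unchanged.
groove_scaled = shape_index(14.0, 0.0, 0.0)
# The inverted image (bright ridge on a dark background): the sign flips.
ridge = shape_index(-2.0, 0.0, 0.0)
```

      Because the index depends only on the ratio of curvatures, it is insensitive to intensity scale and offset, while inverting the contrast flips its sign, which is consistent with the need for an invert option when cells are dark on a light background.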

    1. Author Response:

      Reviewer #1 (Public Review):

      The study by Diebold et al. describes a fast and scalable method that makes it possible to link bacterial plasmids to the organisms that harbor them. The authors then go on to apply this technique to track horizontal gene transfer in a complex bacterial population originating from clinical samples. There is no doubt that the development of such methodologies for better tracking plasmidic resistance genes and following horizontal gene transfer events is very important. The authors do a good job in optimizing their method to be a one-step process that has high sensitivity and relatively low error, while it can also be scaled, automated and used with multiplex primers. Subsequently, they apply this method to two clinical patient samples for which metagenomic data is available. In this case, they correctly identify expected relationships between beta-lactamase genes and specific bacterial taxa (and in particular K. pneumoniae), but also find that the same beta-lactamase genes are associated with organisms of the microbiome. With the exception of providing evidence that the association of particular genes with multiple organisms is not due to physical association of the bacteria in question, this is an interesting study putting forward a much needed technique for the study of antibiotic resistance but also other relationships in complex bacterial mixtures.

      We are very thankful for the positive review and the reviewer’s suggestion that we distinguish between gene transfer and physical association. We provide a detailed response to this in major point #1 of the review summary, but to summarize, we performed an OIL-PCR experiment to confirm that the results are indeed due to physical association of the bacteria and updated our manuscript accordingly.

      Reviewer #2 (Public Review):

      Diebold et al. developed a simplified and improved version of the epicPCR method applied to environmental samples. The results section describes well how they perform their development and supports the easy-to-use application. They clearly demonstrate that their methods could be used to screen association of specific genes to taxonomic markers in environmental microbial populations. They then apply their methods on human gut samples from hospitalized patients and demonstrate the utility of their methods to characterize the hosts of different targeted genes (notably AMR and plasmid-related genes). However, most of their results are based on previous studies on the same sample. Therefore, it appears difficult to know how their method can be used on new samples. Do they need to redo a classical metagenomic analysis in order to obtain data on new samples? What kind of metagenomic analysis is mandatory before performing their methods? What is the depth of the metagenomic analysis? Those are important questions as it will be clearly more expensive to perform the whole metagenomic analysis.

      Thank you for pointing out the need to explain possible screening methods for OIL-PCR on unsequenced samples. We chose to use sequenced stool samples for testing the method in order to provide parallel validation of our results; however, we agree that metagenomic sequencing is not a practical or cost-effective way to select samples for OIL-PCR. qPCR is a more practical method to pre-screen samples for target genes before performing OIL, but we failed to include this important point in our discussion.

      Since drafting and submitting the manuscript, we have demonstrated that the three primers designed for OIL (forward, fusion, and nested primers) can easily be converted into probe-based qPCR assays by designing a fluorescent probe with the nested primer sequence. We have updated the discussion to convey this important feature of OIL-PCR.

      The conclusion of the paper is well supported by data, but the overall approach for new samples is never discussed. Moreover, the title appears somewhat misleading, as their methods do not allow plasmids to be clearly identified but rather link some targeted genes to taxonomic markers.

      Reviewer #3 (Public Review):

      This manuscript is composed of two parts. The first part describes development of an emulsion-based PCR fusion method, called OIL-PCR, for matching two specific gene sequences from the same cell. In this report these are beta-lactamase genes from the V4 section of rRNA, allowing the matching of this horizontally transferred gene with its donor sequence. The second part is a demonstration project that features the use of OIL-PCR to monitor horizontal transfer of beta-lactam genes between gut bacteria from the metagenomes of two neutropenic patients. OIL-PCR was set to multiplexed class A beta-lactam genes. This is a descriptive study that largely recapitulates a previously published work on these samples showing that the relatively unstudied Romboutsia commensal genus is a carrier of these plasmid-borne genes in patient metagenomes.

      Overall, this is a well-written manuscript. Data were comprehensively analyzed with appropriate controls. The figures are excellent.

      OIL-PCR is derived from other fusion PCR methods, especially epicPCR. There are some nice technical improvements described here, e.g., efficient lysis within emulsion droplets using Ready-Lyse lysozyme. This is an incremental technical advance for a fairly niche application (where you have known target genes and are concerned about potential culture-bias) but it may be useful in particular for understanding HGT in microbiomes. There are some problems with the method that are brought to the foreground by the authors rather than quietly dropped, which is commendable.

      Thank you for acknowledging our effort to be up front about the strengths and weaknesses of OIL-PCR. We hope that this information will help inform other researchers in applying this method.

      One problem appears to be that the necessary dilution for single-cell PCR reduces the taxonomic diversity of the metagenome. The only way around this that still allows efficient sampling appears to be to perform multiple independent sequencing experiments and pool the results. Another feature of the system is that the accuracy falls slightly as the proportion of the target sequence in the community increases, for reasons that are not discussed. However, this effect is not great (97% accuracy at 10% proportion), and in most applications the target cells will be a much lower proportion of the community.

      The results of the demonstration study on metagenomes from neutropenic patients are clearly described and provide a nicely worked example of combining this directed method with metagenome sequencing. The significance is limited, but the study gives some descriptive hints about the mechanism of HGT between Romboutsia and Klebsiella.

      Other points:

      Unfortunately, there was no comparative test where the same samples were run against "competing" technologies (e.g., sequencing of cultured beta-lactam resistant strains, epicPCR, Hi-C or single-cell) to directly compare strengths (and weaknesses) of OIL-PCR.

      Thank you for this fair criticism that we did not compare OIL-PCR to other available methods. We address comparing OIL-PCR to Hi-C in our response to major point #4 (above). With regard to epicPCR, we did consider comparing OIL-PCR to epicPCR, but decided against it for two main reasons: 1) acquiring all the reagents necessary to perform epicPCR was cost-prohibitive (over $1,000 for the one demonstration experiment), and 2) a large motivation for the development of OIL-PCR is the difficulty of performing epicPCR. Although we believe that both epicPCR and OIL-PCR are robust methods, OIL-PCR is a shorter protocol that does not rely on hazardous, costly and difficult-to-obtain reagents. We were concerned that an inexperienced attempt by us to perform epicPCR would likely have yielded poor results and would not provide a fair comparison. Overall, we feel that the validation experiments we perform with OIL-PCR are enough to highlight both the strengths and weaknesses of the method.

      As protocol development is central to this manuscript, and one of the main advantages claimed for OIL-PCR is ease of use, the supplement should contain a detailed protocol for a control sample with a list of equipment and reagents needed and what results should be obtained. This could easily be adapted from the methods section, which is highly detailed. What is the estimated cost per sample of this procedure, and how does it compare roughly with other methods (epicPCR and culture-based)?

      Thank you for the suggestion that we provide a detailed protocol. We hope that the inclusion of this step-by-step protocol will enable more labs to adopt the method. The cost of OIL is approximately $15 per replicate. The cost is largely driven by the large amount of Phusion polymerase needed, which is the same as in epicPCR. Culturing may be less expensive depending on the cost of reagents needed for media, antibiotics etc, but we do not feel the two are comparable. For example, even though we show that Romboutsia did not acquire resistance genes in this case, even if it had, culturing would not have captured it due to the difficult and specific culturing conditions required for growing most Romboutsia strains.

      Line 197-198 reference needed to the Kent et al study here? What is the reason that the Hi-C results from this manuscript are not compared to the results of the OIL-PCR experiments?

      Thank you for this suggestion. The congruence of our results highlights the strengths of both approaches. As we discuss in detail for major point 4 (above), the Hi-C and OIL-PCR results both correctly identify Klebsiella as a carrier of the plasmid with CTX-M and TEM. We have now added this to the manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript by Chakraborty focuses on methods to direct dsDNA to specific cell types within an intact multicellular organism, with the ultimate goal of targeting DNA-based nanodevices, often as biosensors within endosomes and lysosomes. Taking advantage of the endogenous SID-2 dsRNA receptor expressed in C. elegans intestinal cells, the authors show that dsDNA conjugated to dsRNA can be taken into the intestinal endosomal system via feeding and apical endocytosis, while dsDNA alone is not an efficient endocytic cargo from the gut lumen. Since most cells do not express a dsRNA receptor, the authors sought to develop a more generalizable approach. Via phage display screening they identified a novel camelid antibody 9E that recognizes a short specific DNA sequence that can be included at the 3' end of synthesized dsDNAs. The authors then showed that this antibody can direct binding, and in some cases endocytosis, of such DNAs when 9E was expressed as a fusion with transmembrane protein SNB-1. This approach was successful in targeting microinjected dsDNA pan-neuronally when expressed via the snb-1 promoter, and to specific neuronal subsets when expressed via other promoters. Endocytosed dsDNA appeared in puncta moving in neuronal processes, suggesting entry into endosomes. Plasma membrane targeting appeared feasible using 9E fusion to ODR-2.

      The major strength of the paper is in the identification and testing of the 9E camelid antibody as part of a generalizable dsDNA targeting system. This aspect of the paper will likely be of wide interest and potentially high impact, since it could be applied in any intact animal system subject to transgene expression. A weakness of the paper is the choice of "nanodevice". It was not clear what utility was present in the DNAs used, such as D38, that made them "devices", aside from their fluorescent tag that allowed tracking their localization.

      We used a DNA nanodevice, denoted pHlava-9E, that uses pHrodo as a pH-sensitive dye. pHlava-9E is designed to provide a digital output of compartmentalization, i.e., its pH profile is such that even if it is internalized into a mildly acidic vesicle, the pH readout is as high as one would observe with a lysosome. This gives an unambiguous readout that distinguishes surface-immobilized probe from endocytosed probe.

      Another potential weakness is that the delivered DNA is limited to the cell surface or the lumen of endomembrane compartments without access to the cytoplasm or nucleus. In general, the data appeared to be of high quality and were well controlled, supporting the authors' conclusions.

      We completely agree that we cannot target DNA nanodevices to sub-cellular locations such as the cytoplasm or the nucleus with this strategy. However, we do not see this as a “weakness”, but rather as a limitation of the current capabilities of DNA nanotechnology. It must be mentioned that though fluorescent proteins were first described in 1962, it was 30 years before others targeted them to the endoplasmic reticulum (1992) or the nucleus (1993) (Brini et al., 1993; Kendall et al., 1992). Probe technologies undergo stage-wise improvements/expansions. We have therefore added a short passage to the conclusions section outlining the future challenges in sub-cellular targeting of DNA nanodevices.

      Reviewer #2 (Public Review):

      The authors demonstrate the tissue-specific and cell-specific targeting of double-stranded DNA (dsDNA) using C. elegans as a model host animal. The authors focused on two distinct tissues and delivery routes: feeding dsDNA to target a class of organelles within intestinal cells, and injecting dsDNA to target presynaptic endocytic structures in neurons. To achieve efficient intestinal targeting, the authors leveraged dsRNA uptake via endogenous intestinal SID-2 receptors by fusing dsRNA to a fluorophore-labeled dsDNA probe. In contrast, neuronal endosome/synaptic vesicle (SV) targeting was achieved by designing a nanobody that specifically binds a short dsDNA motif fused to the fluorophore-labeled dsDNA probe. Combining dsDNA probe injection with nanobody neuronal expression (fused to a neuronal vSNARE to achieve synaptic targeting), the authors demonstrated that the injected dsDNA could be taken up by a variety of distinct neuronal subtypes.

      Strengths:

      While nanodevices built on dsDNA platforms have been shown to be taken up by scavenger receptors in C. elegans (including previous work from several of these authors), this strategy will not work in many tissue types lacking these receptors. The authors successfully circumvented this limitation using distinct strategies for two cell types in the worm, thereby providing a more general approach for future efforts. The approaches are creative, and the nanobody development in particular allows for endocytic delivery in any cell type. The authors exploited quantitative imaging approaches to examine the subcellular targeting of dsDNA probes in living animals and manipulated endogenous receptors to demonstrate the mechanism of dsRNA-based dsDNA uptake in intestinal cells.

      Weaknesses:

      To validate successful delivery of a functional nanodevice, one would ideally demonstrate the function of a particular nanodevice in at least one of the examples provided in this work. The authors have successfully used a variety of custom-designed dsDNA probes in living worms in numerous past studies, so this would not be a technical hurdle. In the current study, the reader has no means of assessing whether the dsDNA is intact and functional within its intracellular compartment.

      We now demonstrate the use of a functional nanodevice to detect pH profiles of a given microenvironment. This functional nanodevice contains two fluorescent reporter dyes, each attached to one of the strands of a DNA duplex. In order to obtain pH readouts, the device integrity is essential for ratiometric sensing.

      Coelomocytes are cells known for their scavenging and degradative lysosomal machinery. Previous studies of the stability of variously structured DNA nanodevices in coelomocytes have shown that DNA devices based on 38 bp DNA duplexes have a half-life of >8 hours in actively scavenging cells such as coelomocytes (Chakraborty et al., 2017; Surana et al., 2013). Given that our sensing in the gut as well as in the neuron is performed <1 hour post feeding or injection, pHlava-9E is >97% intact.

      Another minor weakness is the lack of a quantitative assessment of colocalization in intestinal cells or neurons in an otherwise nicely quantitative study. Since characterization of the targeting described here is an essential part of evaluating the method, a stronger demonstration of colocalization would significantly buttress the authors' claims.

      We have now quantified colocalization in each cellular system. Please see Figure R1 below (Figure 1 Supplementary figure 1 and Figure 4 Supplementary figure 2 of the revised manuscript).

      Figure R1: a) Pearson’s correlation coefficient (PCC) calculated for the colocalization between R50D38 (red) and lysosomal markers LMP-1 or GLO-1 (green) in the indicated transgenic worms; b) & d) Representative images of nanodevice nD647 uptake (red) in transgenics expressing both prab-3::gfp::rab-3 (green) and psnb-1:snb-1::9E; c) & e) Normalized line intensity profiles across the indicated lines in b and d; f) Percentage colocalization of nD647 (red) with RAB3:GFP (green). Error bars represent the standard deviation between two data sets.
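
      The Pearson’s correlation coefficient used in panel a can be sketched as follows. This is a minimal illustration, assuming the two channels have already been reduced to paired per-pixel intensity lists over the same region of interest; the function name and the example intensities are hypothetical, and the thresholding and background handling used for the figure are not described here.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length intensity lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant channel has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-pixel intensities for a red and a green channel.
red = [10.0, 20.0, 30.0, 40.0]
green_same_pattern = [25.0, 45.0, 65.0, 85.0]   # rises with red: PCC near +1
green_opposite = [85.0, 65.0, 45.0, 25.0]       # falls as red rises: PCC near -1

pcc_colocalized = pearson(red, green_same_pattern)
pcc_excluded = pearson(red, green_opposite)
```

      Values near +1 indicate that the two signals vary together pixel by pixel, values near 0 indicate no linear relationship, and values near -1 indicate mutual exclusion.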

      While somewhat incomplete, this study represents a step forward in the development of a general targeting approach amenable to nanodevice delivery in animal models.

    1. Author Response:

      Reviewer #1:

      In this paper, Alhussein and Smith set out to determine whether motor planning under uncertainty (when the exact goal is unknown before the start of the movement) results in motor averaging (average between the two possible motor plans) or in performance optimization (one movement that maximizes the probability of successfully reaching to one of the two targets). Extending previous work by Haith et al. with two new, cleanly designed experiments, they show that performance optimization provides a better explanation of motor behaviour under uncertainty than the motor averaging hypothesis.

      We thank the reviewer for the kind words.

      1) The main caveat of experiment 1 is that it rules out one particular extreme version of the movement averaging idea, namely that the motor programs are averaged at the level of muscle commands or dynamics. It is still consistent with the idea that participants first average the kinematic motor plans and then retrieve the associated force field for this motor plan. This idea is ruled out in Experiment 2, but nonetheless I think this is worth adding to the discussion.

      This is a good point, and we have now included it in the paper as suggested – both in motivating the need for Expt 2 in the Results section and when interpreting the results of Expt 1 in the Discussion section.

      2) The logic of the correction for variability between the one-target and two-target trials in Formula 2 is not clear to me. It is likely that some of the variability in the two-target trials arises from the uncertainty in the decision, i.e., based on recent history one target may internally be assigned a higher probability than the other. This is variability the optimal controller should know about and therefore discard in the planning of the safety margin. How big was this correction factor? What is the impact when the correction is dropped?

      Short Answer:

      (1) If decision uncertainty contributed to motor variability on 2-target trials as suggested, 2-target trials should display greater motor variability than 1-target trials. However, 1-target and 2-target trials display levels of motor variability that are essentially equal – with a difference of less than 1% overall, as illustrated in Fig R2, indicating that decision uncertainty, if present, has no clear effect on motor variability in our data.

      (2) The sigma2/sigma1 correction factor is, therefore, very close to 1, with an average value of 1.00 or 1.04 depending on how it’s computed. Thus, dropping it has little impact on the main result as shown in Fig R1.

      Longer, more detailed, answer:

      We agree that it could be reasonable to think that if motor variability on 2-target trials were consistently higher than that on 1-target trials, then the additional variability seen on 2-target trials might result from uncertainty in the decision, which should not affect safety margins if the optimal controller knew about this variability. However, detailed analysis of our data suggests that this is not the case. We present several analyses below that flesh this out.

      We apologize in advance that the response we provide to this seemingly straightforward comment is so lengthy (4+ pages!), especially since capitulating to the reviewer’s assertion that the “correction” for the motor variability differences between 1-target and 2-target trials should be removed from our analysis would make essentially no difference in the main result, as shown in Fig R1 above. Note that the error bars on the data show 95% confidence intervals. However, taking the difference in motor variability (or more specifically, its ratio) between 1-target and 2-target trials into account is crucial for understanding inter-individual differences in motor responses in uncertain conditions. As this reviewer (and reviewer 2) points out below, we did a poor job of presenting the inter-individual differences analysis in the original version of this paper, but we have improved both the approach and the presentation in the current revision, and we think that this analysis is important, despite being secondary to the main result about the group-averaged findings.

      Therefore, we present analyses here showing that it is unlikely that decision uncertainty accounts for the individual-participant variability differences we observe between 1-target and 2-target trials in our experiments (Fig R2). Instead, we show that the variability differences we observe in different conditions for individual participants are due to (largely idiosyncratic) spatial differences in movement direction (Fig R3), which when taken into account, afford a clearly improved ability to predict the size of the safety margins around the obstacles, both in 1-target trials where there is no ‘decision’ to be made (Figs R4-R6) and in 2-target trials (Figs R5-R6).

      Variability is, on average, nearly identical on 1-target & 2-target trials, indicating no measurable decision-related increase in variability on 2-target trials

      At odds with the idea that decision uncertainty is responsible for a meaningful fraction of the 2-target trial variability that we measure, we find that motor variability on 2-target trials is essentially unchanged from that on 1-target trials overall, as shown in Fig R2 (error bars show 95% confidence intervals). This is the case for the data from Expt 2a (6.59±0.42° vs 6.70±0.96°, p > 0.8), for the critical data from Expt 2b that was designed to dissociate the MA hypothesis from the PO hypothesis (4.23±0.17° vs 4.23±0.27°, p > 0.8), and when the data from Expts 2a-b are pooled (4.78±0.24° vs 4.81±0.35°, p > 0.8). Note that the nominal difference in motor variability between 1-target and 2-target trials was just 1.7% in the Expt 2a data, 0.1% in the Expt 2b data, and 0.6% in the pooled data. This suggests little to no overall contribution of decision uncertainty to the motor variability levels we measured in Expt 2.

      Correspondingly, the sigma2/sigma1 ‘correction factor’ (which serves to scale the safety margin observed on 1-target trials up or down based on increased or decreased motor variability on 2-target trials) is close to 1. Specifically, this factor is 1.01±0.13 (mean±SEM) for Expt 2a and 1.04±0.09 for Expt 2b if measured as mean(sigma2_i/sigma1_i), where sigma1_i and sigma2_i are the SDs of the initial movement directions on 1-target and 2-target trials for participant i. This factor is 1.02 for Expt 2a and 1.00 for Expt 2b if instead measured as mean(sigma2_i)/mean(sigma1_i), and thus in either case, dropping it has little effect on the main population-averaged results for Expt 2 presented in Fig 4b in the main paper. Fig R1 shows versions of the PO model predictions in Fig 4b computed with or without the sigma2/sigma1 ‘correction factor’ that the reviewer asks about. The with and without versions are quite similar for the results from both Expt 2a and Expt 2b. In particular, the comparison between our experimental data and the population-average-based model predictions for the MA vs the PO hypotheses shows highly significant differences between the abilities of the MA and PO models to explain the experimental data in Expt 2b (Fig R1, right panel), whether or not the sigma2/sigma1 correction is included for the comparison between MA and PO predictions (p<10^-13 whether or not the sigma2/sigma1 term is included, p=4.31×10^-14 with it vs p=4.29×10^-14 without it). Analogously, for Expt 2a (where we did not expect to show meaningful differences between the MA and PO model predictions), we also find highly consistent results when the sigma2/sigma1 term is included vs not (Fig R1, left panel) (p=0.37 for the comparison between PO and MA predictions with the sigma2/sigma1 term included vs 0.38 without it).
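
      The two ways of computing the sigma2/sigma1 factor described above can be sketched as follows. This is a minimal illustration with made-up per-participant SDs; the function names are hypothetical, and the scaling function only mirrors the verbal description of the factor (scaling the 1-target safety margin by relative variability), not the full model in the paper.

```python
from statistics import mean


def mean_of_ratios(sigma1, sigma2):
    """mean(sigma2_i / sigma1_i): average the per-participant ratios."""
    return mean(s2 / s1 for s1, s2 in zip(sigma1, sigma2))


def ratio_of_means(sigma1, sigma2):
    """mean(sigma2_i) / mean(sigma1_i): ratio of the group-average SDs."""
    return mean(sigma2) / mean(sigma1)


def scaled_safety_margin(margin_1target, factor):
    """Scale a 1-target safety margin up or down by the variability factor."""
    return margin_1target * factor


# Made-up SDs (degrees) for two participants with opposite variability patterns.
sigma1 = [4.0, 5.0]   # 1-target trials
sigma2 = [5.0, 4.0]   # 2-target trials

factor_a = mean_of_ratios(sigma1, sigma2)   # (1.25 + 0.80) / 2
factor_b = ratio_of_means(sigma1, sigma2)   # 4.5 / 4.5
margin = scaled_safety_margin(10.0, factor_b)
```

      The toy numbers show why the two definitions can differ slightly: when participants deviate from 1 in opposite directions, the mean of ratios sits a little above 1 even though the ratio of means is exactly 1.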

      Analysis of left-side vs right-side 1-target trial data indicates the existence of participant-specific spatial patterns of variability.

      With the participant-averaged data showing almost identical levels of motor variability on 1-target and 2-target trials, it is not surprising that about half of participants showed nominally greater variability on 1-target trials and about half showed nominally greater variability on 2-target trials. What was somewhat surprising, however, was that 16 of the 26 individual participants in Expt 2b displayed significantly higher variability in one condition or the other at α=0.05 (and 12/26 at α=0.01). Why might this be the case? We found an analogous result when breaking down the 1-target trial data into +30° (right-target) and -30° (left-target) trials that could offer an explanation. Note that the 2-target trial data come from intermediate movements toward the middle of the workspace, whereas the 1-target trial data come from right-side or left-side movements that are directed even more laterally than the +30° or -30° targets themselves (the average movement directions to these obstacle-obstructed lateral targets were +52.8° and -49.0°, respectively, in the Expt 2b data; see Fig 4a in the main paper for an illustration). Given the large separation between 1-target and 2-target trials (~50°) and between left and right 1-target trials (~100°), differences in motor variability would not be surprising. The analyses illustrated in Figs R3-R6 show that these spatial differences indeed have large intra-individual effects on movement variability (Fig R3) and, critically, a large subsequent effect on the ability to predict the safety margin observed in one movement direction from motor variability observed at another (Figs R4-R6).

      Fig R3 shows evidence for intra-individual direction-dependent differences in motor variability, obtained by comparing the across-participant similarity of motor variability for within-participant spatially-matched data (e.g. left vs left or right vs right, Fig R3a) with that for spatially-mismatched data (left vs right, Fig R3b). To perform this analysis fairly, we separated the 60 left-side obstacle 1-target trial movements for each participant into those from odd-numbered vs even-numbered trials (30 each) to be compared. We did the same thing for the 60 right-side obstacle 1-target trial movements. Fig R3a shows that there is a large (r=+0.70) and highly significant (p<10^-6) across-participant correlation between the variability measured in the spatially-matched case, i.e. for the even vs odd trials from same-side movements, indicating that the noise in measuring movement variability using n=30 movements (movement variability was measured by standard deviation) did not overwhelm inter-individual differences in movement variability.

      The strength of this correlation would increase/decrease if we had more/less data from each individual because that would decrease/increase the noise in measuring each individual’s variability. Therefore, to be fair, we maintained the same number of data points for each variability measurement (n=30) for the spatially-mismatched cases shown in Fig R3b and R3c. The strong positive relationship between odd-trial and even-trial variability across individuals that we observed in the spatially-matched case is completely obscured when the target direction is not controlled for (i.e. not maintained) within participants, even though left-target and right-target movements are randomly interspersed. In particular, Fig R3b shows that there remains only a small (r=+0.09) and non-significant (p>0.5) across-participant correlation between the variability measured for the even vs odd trials from opposite-side movements that have movement directions separated by ~100°. This indicates that idiosyncratic intra-individual spatial differences in motor variability are large and can even outweigh inter-individual differences in motor variability seen in Fig R3a. Fig R3c shows that an analogous effect holds between the laterally-directed 1-target trials and the more center-directed 2-target trials that have movement directions separated by ~50°. In this case, the correlation that remains when the target direction is not maintained within participants is also near zero (r=-0.13) and non-significant (p>0.3). It is possible that some other difference between 1-target & 2-target trials might also be at play here, but there is unlikely to be a meaningful effect from decision variability given the essentially equal group-average variability levels (Fig R2).
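
      The odd-trial vs even-trial comparison described above can be sketched as follows. This is a minimal illustration, assuming each participant's initial movement directions are stored in trial order; the function names and the toy data are hypothetical. In the spatially-matched case both halves come from same-side movements, whereas the mismatched case would pair one half from each side.

```python
import math
from statistics import stdev


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def split_half_sds(directions):
    """Return (SD of odd-numbered trials, SD of even-numbered trials)."""
    odd = directions[0::2]    # trials 1, 3, 5, ... in 1-based numbering
    even = directions[1::2]   # trials 2, 4, 6, ...
    return stdev(odd), stdev(even)


def split_half_correlation(per_participant_directions):
    """Across-participant correlation between odd-trial and even-trial SDs."""
    pairs = [split_half_sds(d) for d in per_participant_directions]
    odd_sds = [p[0] for p in pairs]
    even_sds = [p[1] for p in pairs]
    return pearson(odd_sds, even_sds)


# Toy data: three participants whose spread differs by a fixed scale factor.
base = [0.0, 0.0, 1.0, 1.0, -1.0, -1.0]
participants = [[scale * v for v in base] for scale in (1.0, 2.0, 3.0)]
r_matched = split_half_correlation(participants)
```

      With only 30 movements per half, measurement noise in each SD estimate lowers the observed correlation, which is why the number of data points is held fixed across the matched and mismatched comparisons.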

      Analysis of left-side vs right-side 1-target trial data indicates that participant-specific spatial patterns of variability correspond to participant-specific spatial differences in safety margins.

      Critically, dissection of the 1-target trial data also shows that the direction-dependent differences in motor variability discussed above for right-side vs left-side movements predict direction-dependent differences in the safety margins. In particular, comparison of panels a & b in Fig R4 shows that motor variability, if measured on the same side (e.g. the right-side motor variability for the right-side safety margin), strongly predicts interindividual differences in safety margin (r=0.60, p<0.00001, see Fig R4b). However, motor variability, if measured on the other side (e.g. the right-side motor variability for the left-side safety margin), fails to predict interindividual differences in safety margin (r=0.15, p=0.29, see Fig R4a). These data show that taking the direction-specific motor variability into account allows considerably more accurate individual predictions of the safety margins used for these movements. In line with that idea, we also find that interindividual differences in the % difference between the motor variability measured on the left side vs the right side predict inter-individual differences in the % difference between the safety margin measured on the left side vs the right side, as shown in Fig R4c (r=0.52, p=0.006).

      Analyses of both 1-target trial and 2-target trial data indicate that participant-specific spatial patterns of variability correspond to participant-specific spatial differences in safety margins.

      Not surprisingly, the spatial/directional specificity of the ability to predict safety margins from measurements of motor variability observed in the 1-target trial data in Fig R4 is present in the 2-target data as well. Comparison of panels a-d in Fig R5 shows that motor variability from 1-target and 2-target trial data in Expt 2b strongly predict interindividual differences in 1-target and 2-target trial safety margins (r=0.72, p=3×10⁻⁵ for the 2-target trial data (see Fig R5d), r=0.59, p=1×10⁻³ for the 1-target trial data (see Fig R5a)).

      This is the case even though the 1-target and 2-target trial data display essentially equal population-averaged levels of motor variability. However, in Expt 2b, motor variability, if measured on 1-target trials, fails to predict inter-individual differences in the safety margin on 2-target trials (r=0.18, p=0.39, see Fig R5c), and motor variability, if measured on 2-target trials, fails to predict inter-individual differences in the safety margin on 1-target trials (r=-0.12, p=0.55, see Fig R5b). As an aside, note that Fig R5a is similar to Fig R4b in content, in that 1-target trial safety margins are plotted against motor variability levels in both cases. But in Fig R5a, the left-target and right-target data are averaged, whereas in Fig R4b the left-target and right-target data are both plotted, resulting in 2N data points. Also note that the correlations are similar, r=+0.59 vs r=+0.60, indicating that in both cases the amount of motor variability predicts the size of the safety margin.

      A final analysis indicating that the spatial specificity of motor variability rather than the presence of decision variability accounts for the ability to predict safety margins is shown in Fig R6. This analysis makes use of the contrast between Expt 2b (where there is a wide spatial separation (51° on average) between 1-target trials and 2-target trials because participants steer laterally around the Expt 2b 1-target trial obstacles, i.e. away from the center), and Expt 2a (where there is only a narrow spatial separation (10.4° on average) between the movement directions of 1-target trials and 2-target trials because participants steer medially around the Expt 2a 1-target trial obstacles, i.e. toward the center). If the spatial specificity of motor variability drove the ability to predict safety margins (and thus movement direction) on 2-target trials, then such predictions should be noticeably improved in Expt 2a compared to Expt 2b, because the spatial match between 1-target trials and 2-target trials is five-fold better in Expt 2a than in Expt 2b. Fig R6 shows that this is indeed the case. Specifically, comparison of the 3rd and 4th clusters of bars (i.e. the data on the right side of the plot) shows that the ability to predict 2-target trial safety margins from 1-target trial variability and conversely the ability to predict 1-target trial safety margins from 2-target trial variability are both substantially improved in Expt 2a compared to Expt 2b (compare the grey bars in the 4th vs the 3rd clusters of bars).

      Moreover, comparison of the 1st and 2nd clusters of bars (i.e. the data on the left side of the plot), shows that the ability to predict left 1-target trial safety margins from right 1-target trial variability and conversely the ability to predict right 1-target trial safety margins from left 1-target trial variability are also both substantially improved in Expt 2a compared to Expt 2b (compare the grey bars in the 1st vs the 2nd clusters of bars). This corresponds to a spatial separation between the movement directions on left vs right 1-target trials of 20.7° on average in Expt 2a in contrast to a much greater 102° in Expt 2b.

      The analyses illustrated in Figs R4-R6 make it clear that accurate prediction of interindividual differences in safety margins critically depends on spatially-specific information about motor variability, and we have, therefore, included this information for the analyses in the main paper, as it is especially important for the analysis of inter-individual differences in motor planning presented in Fig 5 of the manuscript.

      3) Equation 3 then becomes even more involved and I believe it constitutes somewhat of a distraction from the main story - namely that individual variations in the safety margin in the 1-target obstacle-obstructed movements should lead to opposite correlations under the PO and MA hypotheses with the safety margin observed in the uncertain 2-target movements (see Fig 5e). Given that the logic of the variance-correction factor (pt 2) remains shaky to me, these analyses seem to be quite removed from the main question and of minor interest to the main paper.

      The reviewer makes a good point. We agree that the original presentation made Equation 3 seem overly complex and possibly like a distraction as well. Based on the comment above and a number of comments and suggestions from Reviewer 2, we have now overhauled this content – streamlining it and making it clearer, in both motivation and presentation. Please see section 2.2 in the point-by-point response to reviewer 2 for details.

      Reviewer #2:

      The authors should be commended on the sharing of their data, the extensive experimental work, the experimental design that allows them to get opposite predictions for both hypotheses, and the detailed analyses of their results. Yet, the interpretation of the results should be more cautious as some aspects of the experimental design offer some limitations. A thorough sensitivity analysis is missing from experiment 2 as the safety margin seems to be critical to distinguish between both hypotheses. Finally, the readability of the paper could also be improved by limiting the use of abbreviations and motivating some of the analyses further.

      We thank the reviewer for the kind words and for their help with this manuscript.

      1) The text is difficult to read. This is partially due to the fact that the authors used many abbreviations (MA, PO, IMD). I would get rid of those as much as possible. Sometimes, having informative labels could also help: FFcentral and FFlateral would be better than FFA and FFB.

      We have reduced the number of abbreviations used in the paper from 11 to 4 (Expt, FF, MA, PO), and we thank the reviewer for the nice suggestion about changing FFA and FFB to FFLATERAL and FFCENTER. We agree that the suggested terms are more informative and have incorporated them.

      2) The most difficult section to follow is the one at the end of the result sections where Fig.5 is discussed. This section consists of a series of complicated analyses that are weakly motivated and explained. This section (starting on line 506) appears important to me but is extremely difficult to follow. I believe that it is important as it shows that, at the individual level, PO is also superior to MA to predict the behavior but it is poorly written and even the corresponding panels are difficult to understand as points are superimposed on each other (5b and e). In this section, the authors mention correcting for Mu1b and correcting for Sig2i/Sig1Ai but I don't know what such correction means. Furthermore, the authors used some further analyses (Eq. 3 and 4) without providing any graphical support to follow their arguments. The link between these two equations is also unclear. Why did the authors use these equations on the pooled datasets from 2a and 2b? Is this really valid? It is also unclear why Mu1Ai can be written as the product of R1Ai and Sig1Ai. Where does this come from?

      We agree with the reviewer that this analysis is important, and the previous explanation was not nearly as clear as it could have been. To address this, we have now overhauled the specifics of the content in Figure 5 and the corresponding text, streamlining the text and making it clearer in both motivation and presentation (see lines 473-545 in the revised manuscript). In addition to the improved text, we have clarified and improved the equations presented for analysis of the ability of the performance optimization (PO) model to explain inter-individual differences in motor planning in uncertain conditions (i.e. on 2-target trials) and have provided more direct graphical support for them. Eq 4 from the original manuscript has been removed, and instead we have expanded our analyses on what was previously Eq 3 (now Eq 5 in the revised manuscript). We have more clearly introduced this equation as a hybrid between using group-averaged predictions and participant-individualized predictions, where the degree of individualization for all parameters is specified with the individuation index 𝑘. For example, a value of 1 for 𝑘 would indicate complete weighting of the individuated model predictors. The equation that follows in the revised manuscript, Eq 6, is a straightforward extension of Eq 5 where each model parameter was instead multiplied by a different individuation index. With this, we now present the partial-R2 statistic associated with each model predictor (see revised Figs 5a and 5e) to elucidate the effect of each. We have, additionally, now plotted the relationships between each of the 3 model predictors and the inter-individual differences that remain when the other two predictors are controlled (see revised Figs 5b-d and Fig 5f-h). These analyses are all shown separately for each experiment, as per the reviewer’s suggestion, in the revised version of Fig 5.

      Overall, this section is now motivated and discussed in a more straightforward manner, and now provides better graphical support for the analyses reported in the manuscript. We feel that the revised analysis and presentation (1) more clearly shows the extent to which inter-individual differences in motor planning can be explained by the PO model, and (2) does a better job of breaking down how the individual factors in the model contribute to this. We sincerely thank the reviewer for helping us to make the paper easier to follow and better illustrated here.
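
      The idea of an individuation index can be sketched with a toy example. Eq 5 itself is not reproduced in this letter, so the functional form below (a linear blend between the group mean and each participant's own value) is an assumption chosen only to match the verbal description above, and the function name `hybrid_predictor` and the example values are hypothetical.

```python
def hybrid_predictor(individual_values, k):
    """Blend each participant's predictor value with the group mean.

    Assumed form: k = 0 gives the group-averaged predictor for everyone,
    k = 1 gives the fully individuated predictor.
    """
    group_mean = sum(individual_values) / len(individual_values)
    return [group_mean + k * (v - group_mean) for v in individual_values]


values = [1.0, 2.0, 6.0]  # hypothetical per-participant predictor values
group_only = hybrid_predictor(values, 0.0)    # every entry is the group mean
halfway = hybrid_predictor(values, 0.5)       # halfway between mean and own value
individuated = hybrid_predictor(values, 1.0)  # each participant's own value
```

      Using a separate index for each predictor, as described for Eq 6, would amount to calling such a blend once per predictor with its own value of 𝑘.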

      3) In experiment 1, does the presence of a central target not cue the participants to plan a first movement towards the center, while such a central target was never present in other motor averaging experiments?

      Unfortunately, the reviewer is mistaken here, as central target locations were present in several other experiments that advocated for motor averaging which we cite in the paper. The central target was not present on any 2-target trials in our experiments, in line with previous work. It was only present on 1-target center-target trials.

      In the adaptation domain, people complain that asking where people are aiming would induce a larger explicit component. Similarly, one could wonder whether training the participants to a middle target would not induce a bias towards that target under uncertainty.

      Any “bias” of motor output towards the center target would predict an intermediate motor output which would favor neither model because our experiment designs result in predictions for motor output on different sides of center for 2-target trials in both Expt 1 and Expt 2b. Thus we think any such effect, if it were to occur, would simply reduce the amplitude of the result. However, we found an approximately full-sized effect, suggesting that this is not a key issue.

      4) The predictions linked to experiment 2 are highly dependent on the amount of safety margin that is considered. While the authors mention these limitations in their paper, I think that it is not presented with enough details. For instance, I would like to see a figure similar to Fig.4B when the safety margin is varied.

      We apologize for any confusion here. The reviewer seems to be under the impression that we can specifically manipulate safety margins around the obstacle in making model predictions for experiment 2. This is, however, not the case for either of the two safety margins in the performance-optimization (PO) modelling. Let us clarify. First, the safety margin on 1-target trials, which serves as input to the PO model, is experimentally measured on obstacle-present 1-target trials, and thus cannot be manipulated. Second, the predicted safety margin on 2-target trials is the output of the PO model and thus cannot be manipulated. There is only one parameter in the main PO model (the one for making the PO prediction for the group-average data presented in Fig 4b, see Eq 4), and that is the motor cost weighting coefficient (𝛽). 𝛽 is implicitly present in Eq 2 as well, fixed at 1/2 in this baseline version of the PO model. It is of course true that changing the motor cost weighting will affect the model output (the predicted 2-target trial safety margin), but we do not think that the reviewer is referring to that here, since he or she asks about that directly in section 2.4.4 and in section 2.4.6 below, where we provide the additional analysis requested.

      For exp1, it would be good to demonstrate that, even when varying the weight of the two one-target profiles for motor averaging, one never gets a prediction that is close to what is observed.

      Here the reviewer is referring to an apparent inconsistency between our analysis of Expts 1 and 2, because in Expt 2 (but not in Expt 1) we examine the effect of varying the relative weight of the two 1-target trials for motor averaging. However, we only withheld this analysis in Expt 1 because it would have little effect. Unlike Expt 2, the measured motor output on left and right 1-target trials in Expt 1 is remarkably similar (see the left panel in Fig R7a below, which is based on Fig 2b from the manuscript). This is because left and right 1-target trials in Expt 1 were adapted to the same FF perturbation (FFLATERAL in both cases), whereas left and right 1-target trials in Expt 2 received very different perturbation levels, because one of these targets was obstacle-obstructed and the other was not. Therefore, varying the relative weightings in Expt 1 would have little effect on the MA prediction, as shown in Fig R7b at right. We now realize that this point was not explained to readers, and we have now modified the text in the results section where the analysis of Expt 1 is discussed in order to include a summary of the explanation offered above. We thank the reviewer for surfacing this.
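
      The argument above rests on a simple property of weighted averages: at each time point, changing the weights can move the averaged profile by at most the difference between the two profiles being averaged. The following minimal sketch illustrates this with made-up force profiles; the numbers and the function names `weighted_average` and `prediction_spread` are hypothetical and are not taken from the data in Fig R7.

```python
def weighted_average(left, right, w_left):
    """Weighted average of two force profiles sampled at the same time points."""
    return [w_left * l + (1.0 - w_left) * r for l, r in zip(left, right)]


def prediction_spread(left, right, weights):
    """Largest change in the averaged profile, at any time point, across weights."""
    predictions = [weighted_average(left, right, w) for w in weights]
    return max(max(column) - min(column) for column in zip(*predictions))


weights = [0.0, 0.25, 0.5, 0.75, 1.0]

# Hypothetical profiles: nearly identical left/right responses
# versus left/right responses of opposite sign.
left_profile = [0.0, 1.0, 2.0, 1.0, 0.0]
similar_right = [0.0, 1.1, 2.1, 0.9, 0.0]
different_right = [0.0, -1.0, -2.0, -1.0, 0.0]

spread_similar = prediction_spread(left_profile, similar_right, weights)
spread_different = prediction_spread(left_profile, different_right, weights)
```

      In this toy case the weighting changes the averaged profile by no more than about 0.1 units when the two profiles are similar, but by up to 4 units when they differ strongly.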

      It is unclear in the text that the performance optimization prediction simply consists of the force-profile for the center target. The authors should motivate this choice.

      We’re a bit unclear about this comment. This specific point is addressed in the first paragraph under the Results section, the second paragraph under the subsection titled “Adaptation to novel physical dynamics can elucidate the mechanisms for motor planning under uncertainty”, the Figure 2 captions, and in the second paragraph under the subsection titled “Adaptation to a multi-FF environment reveals that motor planning during uncertainty occurs via performance-optimization rather than motor averaging”. Direct quotes from the original manuscript are below:

      Line 143: “However, PO predicts that these intermediate movements should be planned so that they travel towards the midpoint of the potential targets in order to maximize the probability of final target acquisition. This would, in contrast to MA, predict that intermediate movements incorporate the learned adaptive response to FFB, appropriate for center-directed movements, allowing us to decisively dissociate PO from MA.”

      Line 200: “In contrast, PO would predict that participants produce the force pattern (FFB) appropriate for optimizing the planned intermediate movement since this movement maximizes the probability of successful target acquisition5,34 (Fig 1d, right).”

      Line 274: “The 2-target trial MA prediction corresponds to the average of the force profiles (adaptive responses) associated with the left and right 1-target EC trials plotted in Fig 2b, whereas the 2-target trial PO prediction corresponds to the force profile associated with the center target plotted in Fig 2b, as this is appropriate for optimizing a planned intermediate movement.”

      For the second experiment 2, the authors do not present a systematic sensitivity analysis. Fig. 5a and d is a good first step but they should also fit the data on exp2b and see how this could explain the behavior in exp 2a. Second, the authors should present the results of the sensitivity analysis like they did for the main predictions in Fig.4b.

      We thank the reviewer for these suggestions. We have now included a more complete analysis in Fig R8 below, and presented it in the format of Fig 4b as suggested. Please note that we have included the analysis requested above in a revised version of Fig 4b in the manuscript, and a related analysis requested in section 2.4.6 in the supplementary materials.

      Specifically, the partial version of the analysis that had been presented (where the cost weighting for PO as well as the target weighting for MA were fit on Expt 2a and cross-validated using the Expt 2b data, but not conversely fit on Expt 2b and tested on Expt 2a) was expanded to include cross-validation of the Expt 2b fit using the Expt 2a data. As expected, the results from the converse analysis (Expt 2b → Expt 2a) mirror the results from the original analysis (Expt 2a → Expt 2b) for the cost weighting in the PO model, where the self-fit mean squared prediction errors increased only modestly, by 11% for the Expt 2a data and by 29% for the Expt 2b data. In contrast, for the target weighting in the MA model, the cross-validated predictions did not explain the data well, increasing the self-fit mean squared prediction errors by 115% for the Expt 2a data, and by 750% for the Expt 2b data. Please see lines 411-470 in the main paper for a full analysis.
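
      The percent increases quoted above compare a cross-validated error with a self-fit error on the same data. A minimal sketch of that comparison follows, assuming a one-parameter toy model (a least-squares slope) as a stand-in for the PO cost weighting or the MA target weighting; the model, the two small data sets, and the function names are hypothetical and do not reproduce the fits reported in the manuscript.

```python
def fit_slope(x, y):
    """Least-squares slope for the toy model y = slope * x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def mse(slope, x, y):
    """Mean squared prediction error of the toy model on (x, y)."""
    return sum((b - slope * a) ** 2 for a, b in zip(x, y)) / len(x)


def cross_validation_penalty(train, test):
    """Percent increase in test-set MSE when the parameter is fit on `train`
    rather than on `test` itself (the self-fit)."""
    x_test, y_test = test
    self_fit_error = mse(fit_slope(*test), x_test, y_test)
    cross_fit_error = mse(fit_slope(*train), x_test, y_test)
    return 100.0 * (cross_fit_error - self_fit_error) / self_fit_error


# Hypothetical data sets standing in for two experiments.
dataset_a = ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
dataset_b = ([1.0, 2.0, 3.0], [2.5, 5.1, 7.4])

penalty_a_to_b = cross_validation_penalty(dataset_a, dataset_b)
penalty_b_to_a = cross_validation_penalty(dataset_b, dataset_a)
```

      Because the self-fit minimizes the squared error on its own data, the penalty is never negative; a small penalty indicates that the parameter fit in one data set transfers well to the other.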

      While I understand where the computation of the safety margin in eq.2 comes from, reducing the safety margin would make the predictions linked to the performance optimization look more and more towards the motor averaging predictions. How bad becomes the fit of the data then ?

      We think that this is essentially the same question as that asked above in section 2.4.1. Please see our response in that section above. If that response doesn’t adequately answer this question, please let us know!

      How does the predictions look like if the motor costs are unbalanced (66 vs. 33%, 50 vs. 50% (current prediction), 33 vs. 66% ). What if, in Eq.2 the slope of the relationship was twice larger, twice smaller, etc.

      Fig R8 above shows how the PO prediction would change using the 2:1 (66:33) and 1:2 (33:66) weightings suggested by the reviewer here, in comparison to the 1:1 weighting present in the original manuscript, the Expt 2a best-fit weighting present in the original manuscript, and the Expt 2b best-fit weighting that the reviewer suggested we include in section 2.4.2. Please note that this figure is now included as a supplementary figure to accompany the revised manuscript.

      The safety margin is the crucial element here. If it gets smaller and smaller, the PO prediction would look more and more like the MA predictions. This needs to be discussed in details. I also have the impression that the safety margin measured in exp 2a (single target trials) could be used for the PO predictions as they are both on the right side of the obstacle.

      We again apologize for the confusion. We are already using safety margin measurements to make PO predictions. Specifically, within Expt 2a, we use safety margin measurements from 1-target trials (in conjunction with variability measurements on 1 & 2 target trials) to estimate safety margins on 2-target trials. And analogously within Expt 2b, we use safety margin measurements from 1-target trials (in conjunction with variability measurements on 1 & 2 target trials) to estimate safety margins on 2-target trials. Fig 4b in the main paper shows the results of this prediction (and it now also includes the cross-validated predictions of the refined models as requested in Section 2.4.4 above). Relatedly, Fig R1 in this letter shows that, at the group-average level, these predictions for 2-target trial behavior in both Expt 2a and Expt 2b are essentially identical whether they are based solely on the safety margins observed on 1-target trials or on these safety margins corrected for the relative motor variabilities on 1-target and 2-target trials.

      5) On several occasions (e.g. line 131), the authors mention that their results prove that humans form a single motor plan. They don't have any evidence for this specific aspect as they can only see the plan that is expressed. They can prove that the latter is linked to performance optimization and not to the motor averaging one. But the absence of motor averaging does not preclude the existence of other motor plans…. Line 325 is the right interpretation.

      Thanks for catching this. We agree and have now revised the text accordingly (see for example, lines 53, 134, and 693-695 in the revised manuscript).

      6) Line 228: the authors mention that there is no difference in adaptation between training and test periods but this does not seem to be true for the central target. How does that affect the interpretation of the 2-target trials data ? Would that explain the remaining small discrepancy between the refined PO prediction and the data (Fig.2f) ?

      There must be some confusion here. The adaptation levels in the training period and the test period data from the central target are indeed quite similar, with only a <10% nominal difference in adaptation between them that is not close to statistically significant (p=0.14). We also found similar adaptation levels between the training and test epochs for the lateral targets (p=0.65 for the left target and p=0.20 for the right target). We further note that the PO predictions are based on test period data. And so, even if there were a clear decrease in adaptation between training and test periods, it would not affect the fidelity of the predictions or present a problem, except in the extreme hypothetical case where the reduction was so great that the test period adaptation was not clearly different from zero (as that would infringe on the ability of the paradigm to make clearly opposite predictions for the MA and PO models), but that is certainly not the case in our data.

      Reviewer #3:

      In this study, Alhussein and Smith provide two strong tests of competing hypotheses about motor planning under uncertainty: Averaging of multiple alternative plans (MA) versus optimization of motor performance (PO). In the first experiment, they used a force field adaptation paradigm to test this question, asking if observed intermediate movements between competing reach goals reflected the average of adapted plans to each goal, or a deliberate plan toward the middle direction. In the second experiment, they tested an obstacle avoidance task, asking if obstacle avoidance behaviors were averaged with respect to movements to non-obstructed targets, or modulated to afford optimal intermediate movements based on a computed "safety margin." In both experiments the authors observed data consistent with the PO hypothesis, and contradictory of the MA hypothesis. The authors thus conclude that MA is not a feasible hypothesis concerning motor planning under uncertainty; rather, people appear to generate a single plan that is optimized for the task at hand.

      I am of two minds about this (very nice) study. On the one hand, I think it is probably the most elegant examination of the MA idea to date, and presents perhaps the strongest behavioral evidence (within a single study) against it. The methods are sound, the analysis is rigorous, and it is clearly written/presented. Moreover, it seems to stress-test the PO idea more than previous work. On the other hand, it is hard for me to see a high degree of novelty here, given recent studies on the same topic (e.g. Haith et al., 2015; Wong & Haith, 2017; Dekleva et al., 2018). That is, I think these would be more novel findings if the motor-averaging concept had not been very recently "wounded" multiple times.

      We thank the reviewer for the kind words and for their help with this manuscript.

      The authors dutifully cite these papers, and offer the following reasons that one of those particular studies fell short (I acknowledge that there may be other reasons that are not as explicitly stated): On line 628, it is argued that Wong & Haith (2017) allowed for across-condition (i.e., timing/spacing constraints) strategic adjustments, such as guessing the cued target location at the start of the trial. It is then stated that, "While this would indeed improve performance and could therefore be considered a type of performance-optimization, such strategic decision making does not provide information about the implicit neural processing involved in programming the motor output for the intermediate movements that are normally planned under uncertain conditions." I'm not quite sure the current paper does this either? For example, in Exp 1, if people deliberately strategize to simply plan towards the middle on 2-target trials and feedback-correct after the cue is revealed (there is no clear evidence against them doing this), what do the results necessarily say about "implicit neural processing?" If I deliberately plan to the intermediate direction, is it surprising that my responses would inherit the implicit FF adaptation responses from the associated intermediate learning trials, especially in light of evidence for movement- and/or plan-based representations in motor adaptation (Castro et al., 2011; Hirashima & Nozaki, 2012; Day et al., 2016; Sheahan et al., 2016)?

      The reviewer has a completely fair point here, and we agree that the experiments in the current study are amenable to explicit strategization. Thus, without further work, we cannot claim that the current results are exclusively driven by implicit neural processing.

      As the reviewer alludes to below, the possibility that the current results are driven by explicit processes in addition to or instead of implicit ones does not directly impact any of the analyses we present – or the general finding that performance-optimization, not motor averaging, underlies motor planning during uncertainty. Nonetheless, we have added a section in the discussion section to acknowledge this limitation. Furthermore, we highlight previous work demonstrating that restriction of movement preparation time suppresses explicit strategization (as the reviewer hints at below), and we suggest leveraging this finding in future work to investigate how motor output during goal uncertainty might be influenced under such constraints. This portion of the discussion section is quoted below:

      “An important consideration for the present results is that sensorimotor control engages both implicit and explicit adaptive processes to generate motor output47. Because motor output reflects combined contributions of these processes, determining their individual contributions can be difficult. In particular, the experiments in the present study used environmental perturbations to induce adaptive changes in motor output, but these changes may have been partially driven by explicit strategies, and thus the extent to which the motor output measured on 2-target trials reflects implicit vs explicit feedforward motor planning requires further investigation. One method for examining implicit motor planning during goal uncertainty might take inspiration from recent work showing that in visuomotor rotation tasks, restricting the amount of time available to prepare a movement appears to limit explicit strategization from contributing to the motor response48–51. Future work could dissociate the effects of MA and PO on intermediate movements in uncertain conditions at movement preparation times short enough to isolate implicit motor planning.”

      In that same vein, the Gallivan et al 2017 study is cited as evidence that intermediate movements are by nature implicit. First, it seems that this consideration would be necessarily task/design-dependent. Second, that original assumption rests on the idea that a 30˚ gradual visuomotor rotation would never reach explicit awareness or alter deliberate planning, an assumption which I'm not convinced is solid.

      We generally agree with the reviewer here. We might add that in addition to introducing the perturbation gradually, Gallivan and colleagues enforced a short movement preparation time (325ms). However, we agree that the extent to which explicit strategies contribute to motor output should clearly vary from one motor task to another, and on this basis alone, the Gallivan et al 2017 study should not be cited as evidence that intermediate movements must universally reflect implicit motor planning. We have explained this limitation in the discussion section (see quote below) and have revised the manuscript accordingly.

      “We note that Gallivan et al. 2017 attempted to control for the effects of explicit strategies by (1) applying the perturbation gradually, so that it might escape conscious awareness, and (2) enforcing a 325ms preparation time. Intermediate movements persisted under these conditions, suggesting that intermediate movements during goal uncertainty may indeed be driven by implicit processes. However, it is difficult to be certain whether explicit strategy use was, in fact, effectively suppressed, as the study did not assess whether participants were indeed unaware of the perturbation, and the preparation times used were considerably larger than the 222ms threshold shown to effectively eliminate explicit contributions to motor output.”

      The Haith et al., 2015 study does not receive the same attention as the 2017 study, though I imagine the critique would be similar. However, that study uses unpredictable target jumps and short preparation times which, in theory, should limit explicit planning while also getting at uncertainty. I think the authors could describe further reasons that that paper does not convince them about a PO mechanism.

      We had omitted a detailed discussion of the Haith et al 2015 study as we think that the key findings, while interesting, have little to do with motor planning under uncertainty. But we now realize that we owe readers an explanation of our thoughts about it, which we have now included in the Discussion. This paragraph is quoted below, and we believe it provides a compelling reason why the Haith et al. 2015 study is not fully convincing as evidence of PO for motor planning during uncertainty.

      “Haith and colleagues (2015) examined motor planning under uncertainty using a timed-response reaching task where the target suddenly shifted on a fraction (30%) of trials 150-550ms before movement initiation. The authors observed intermediate movements when the target shift was modest (±45°), but direct movements towards either the original or shifted target position when the shift was large (±135°). The authors argued that because intermediate movements were not observed under conditions in which they would impair task performance, motor planning under uncertainty generally reflects performance-optimization. This interpretation is somewhat problematic, however. In this task, like in the current study, the goal location was uncertain when initially presented; however, the final target was presented far enough before movement onset that this uncertainty was no longer present during the movement itself, as evidenced by the direct-to-target motion observed when the target location was shifted by ±135°. Therefore the intermediate movements observed when the target location shifted by ±45° are unlikely to reflect motor planning under uncertain conditions. Instead, these intermediate movements likely arose from a motor decision to supplement the plan elicited by the initial target presentation with a corrective augmentation when the plan for this augmentation was certain. The results thus provide beautiful evidence for the ability of the motor system to flexibly modulate the correction of existing motor plans, ranging from complete inhibition to conservative augmentation, when new information becomes available, but provide little information about the mechanisms for motor planning under uncertain conditions.”

      If the participants in Exp 2 were asked both "did you switch which side of the obstacle you went around" and "why did you do that [if yes to question 1]", what do the authors suppose they would say? It's possible that they would typically be aware of their decision to alter their plan (i.e., swoop around the other way) to optimize success. This is of course an empirical question. If true, it wouldn't hurt the authors' analysis in any way. However, I think it might de-tooth the complaint that e.g. the Wong & Haith study is too "explicit."

      The participants in Expts 1, 2a, and 2b were all distinct, so there was no side-switching between experiments per se. However, the reviewer’s point is well taken. Although we didn’t survey participants, it’s hard to imagine that any were unaware of which side they traveled around the obstacle in Expt 2. Certainly, there was some level of awareness in our experiments, and while we would like to believe that the main findings arose from low-level, implicit motor planning, we frankly do not know the extent to which our findings may have depended on explicit planning. We have now clarified this key point and discussed its implications in the discussion section of the revised paper. That said, we do still think that the direct-to-target movements in the Wong and Haith study were likely the result of a strategic approach to salvaging some reward in their task. Please see the new section in the discussion titled “Implicit and explicit contributions to motor planning under uncertainty”, which for convenience is copied below:

      Implicit and explicit contributions to motor planning under uncertainty

      An important consideration for the present results is that sensorimotor control engages both implicit and explicit adaptive processes to generate motor output. Because motor output reflects combined contributions of these processes, determining their individual contributions can be difficult. In particular, the experiments in the present study used environmental perturbations to induce adaptive changes in motor output, but these changes may have been partially driven by explicit strategies, and thus the extent to which the motor output measured on 2-target trials reflects implicit vs explicit feedforward motor planning requires further investigation. One method for examining implicit motor planning during goal uncertainty might take inspiration from recent work showing that in visuomotor rotation tasks, restricting the amount of time available to prepare a movement appears to limit explicit strategization from contributing to the motor response. Future work could dissociate the effects of MA and PO on intermediate movements in uncertain conditions at movement preparation times short enough to isolate implicit motor planning.

      We note that Gallivan et al. 2017 attempted to control for the effects of explicit strategies by (1) applying the perturbation gradually, so that it might escape conscious awareness, and (2) enforcing a 325ms preparation time. Intermediate movements persisted under these conditions, suggesting that intermediate movements during goal uncertainty may indeed be driven by implicit processes. However, it is difficult to be certain whether explicit strategy use was, in fact, effectively suppressed, as the study did not assess whether participants were indeed unaware of the perturbation, and the preparation times used were considerably larger than the 222ms threshold shown to effectively eliminate explicit contributions to motor output.

    1. Author Response:

      Reviewer #1:

      This meta-analysis addresses a double-edged sword in evolutionary biology. Group living may be beneficial for many reasons, but has costs in terms of increased rates of parasitism. Furthermore, if groups are highly related, parasites that are genetically able to infect one member of the group may be able to infect all of them, putting the entire group at risk. In the meta-analysis presented here, many original studies working on questions related to parasitism, relatedness and group living are brought together in one unifying framework. The authors find that indeed, group living can facilitate the spread of infectious diseases. However, they also find that the negative effects of disease can be overcompensated by the benefits of being social. The authors stress that experimental studies are necessary to disentangle these effects. The study is of high standard and well-conducted. The take-home message is clear and of general interest.

      The study highlights that experimental work is important to understand the relationship between parasitism, relatedness and living in groups. However, I missed an important aspect here. Experiments tend to stretch factors (sometimes to extremes), which may run counter to the biology of the species. In some cases, this results in non-social organisms being pressed into a group environment. For example, the monoculture effect as we know it in agriculture is highly artificial. Clonal crop lines are planted at high density, promising high yield if pathogens stay out. These plants do not have a history of evolving mechanisms to deal with the effect of high relatedness. In contrast, animals living in social groups may never experience settings with non-relatives. Social insects evolved to deal with parasites by expressing specific adaptations, such as grooming, hygiene and social structure in the colony. Many social insects may never experience conditions of low relatedness. Thus, I expect it makes a difference if you experimentally force a non-social organism to be social, or a social organism to be asocial. I would be happy if this factor could be included in the reasoning, and maybe even analyzed quantitatively. For example, I would expect that non-social species made artificially to grow in groups of relatives, suffer much more from parasites than typical social animals with the same degree of relatedness.

      This is an important point. One of the main motivations for conducting this study was to test if species that typically live with kin have evolved adaptations to minimise any increase in susceptibility to pathogens brought about by living in groups with relatives. We therefore collected data on whether species are: a) typically social or non-social, and b) average levels of relatedness between individuals in groups under natural conditions (see Methods section ‘Data on species characteristics’).

      a) Testing differences between social and non-social species. All species included in our dataset had some part of their life-cycle where they were social (note we specifically excluded any studies on non-natural systems such as crops and domesticated species). This meant that only comparisons between species that are obligately social versus species that are social during specific life stages could be made. This is problematic as assumptions need to be made about the strength of selection during different life cycle phases. For example, mortality caused by pathogens may be particularly high during the social juvenile phases of otherwise non-social species, resulting in selection for adaptations to reduce pathogen spread being similar to that in species that are obligately social. An additional problem was that experimental studies (a key factor highlighted by our analyses) of species that are non-social apart from specific life-cycle phases were rare (n=1, Rana latastei), precluding any meaningful comparisons.

      We have now added the following sentences to the methods to clarify this point:

      “We also collected data on whether species always lived in social groups (‘obligately social’) or whether species were only social during specific life stages (‘periodically social’). However, it was not possible to analyse this data as experimental manipulations of pathogens, a key factor influencing the relationship between relatedness and mortality and pathogen abundances, were only performed for one periodically social species (Rana latastei)” (Lines 425-430).

      b) Testing differences between species that typically live with kin and non-kin. The third aim of the paper was to test if species that typically live with kin have evolved to deal with pathogens as the referee suggests. We found that species that live with kin, such as social insects, have similar rates of mortality and pathogen abundances to species that live with non-kin (Figure 3). However, species that typically live with kin had lower rates of mortality in groups with higher relatedness when pathogens were absent compared to species that typically live with non-kin. This suggests that pathogens represent an omnipresent threat to all species, but that adaptations have evolved to reap the benefits of living with relatives in social species.

      In summary, as suggested by the referee we analysed whether “species made artificially to grow in groups of relatives, suffer much more from parasites than typical social animals with the same degree of relatedness” as much as was possible given the limitations of the published data. We have edited parts of the manuscript to emphasise that this was a key aim of the paper (Lines 66-74; 92-94; 136-153).

      The term (and concept) "monoculture" is typically used to describe clonal populations, predominantly in agricultural settings. I understand that the authors like to expand this term (as have others done before) to include social animals. However, for most people this would be a change in terminology and may cause misunderstandings. I would prefer if you could stick with the mainstream terminology and avoid pressing this concept into a new costume.

      We included the term “monoculture effect” to facilitate links to existing literature, both in the fields of agriculture and evolutionary biology (e.g. Ekroth et al. 2019). While we think that making the reader aware of relevant work in other fields is valuable, we understand its prominence could give the impression that we included agricultural studies. Therefore, we have removed it from the abstract, but have chosen to keep one reference to the monoculture effect in the introduction.

      Reviewer #2:

      This study uses an unusually broad comparative data set to disentangle the positive (relatedness) and negative (pathogen pressure) effects of living in groups. The authors largely succeed in this task even though the data do not allow answers to all outstanding issues. Not unexpectedly, experimental manipulation studies appear to be most informative. The results are broadly consistent with expectations based on kin-selection theory and clarify the effects of a number of important covariables. The study is thoroughly executed and innovative in its approach. I expect this study to be interesting for a broad readership and this method of searching literature data to have considerable impact. Some suggestions strengthening this paper are below:

      • I think it would be helpful for readers to have the Discussion start with a few lines on what your study achieved in language that is complementary to the abstract, perhaps followed by a brief explanation of which angles/ambiguities/challenges you will be taking up in the paragraphs to follow.

      We have now edited the beginning of the discussion in accordance with this suggestion. It reads:

      “Our analyses show that pathogens can increase rates of mortality in groups of relatives. The detrimental effects of pathogens were, however, counteracted by high relatedness reducing mortality when pathogens were rare, particularly in species that live in kin groups. Such contrasting effects of relatedness meant that experimental manipulations were crucial for detecting the costs and benefits of living with relatives when the presence of pathogens varied. Additionally, high relatedness resulted in more even abundances of pathogens across groups, but more variable rates of mortality, highlighting the importance of population genetic structure in explaining the epidemiology of diseases. We discuss these findings in relation to the environments favouring the evolution of different social systems, the mechanisms that have evolved to prevent disease spread in social groups, and the types of study system where more experimental data are required” (Lines 171-181).

      • The rationale of this study is (often implicitly) that the tendency to live with relatives or not is a continuous variable. This surprised me because the senior author has written influential papers showing that family groups are different from non-family groups. In some contexts of this study it seems crucial to make that distinction. For example, a number of data points come from studies of social insects (bumblebees, honeybees, ants). Here, living with relatives is not an option but a given. It is well documented that these caste-differentiated colonies originated from ancestors that had exclusively full-sib colonies, so maximal relatedness was ancestral and became only diluted secondarily in some lineages. Would it be possible to check statistically whether the social insect data points always showed the same pattern as the other data points? That would test whether it matters that low relatedness is either derived or ancestral (as I think we implicitly assume to be the case in all other organisms).

      The primary studies included in our analyses were conducted on a diverse set of species where relatedness was often reported and measured on a continuous scale (range 0 to 1). Our rationale and statistical treatment of the data (the effect size of Pearson’s correlation coefficient captures continuous variation in relatedness) reflect the measures reported in the primary studies. This does not mean, however, that we believe groups evolve along a continuum of within-group relatedness.
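
      As a minimal sketch of how a correlation-based effect size of this kind is commonly computed, the snippet below assumes that the Zr reported in the figures is Fisher's z-transformation of Pearson's r, with the usual approximate sampling variance 1/(n - 3). That reading, the function names and the example numbers are illustrative assumptions, not details confirmed by the response.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher z-transform of a Pearson correlation r (requires |r| < 1)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)  # equivalent to 0.5 * ln((1 + r) / (1 - r))


def fisher_z_variance(n: int) -> float:
    """Approximate sampling variance of Zr for a study with n observations."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    return 1.0 / (n - 3)


def z_to_r(z: float) -> float:
    """Back-transform Zr to the correlation scale."""
    return math.tanh(z)


# Hypothetical study: correlation between relatedness and mortality
# of r = 0.5, measured across n = 28 groups.
zr = fisher_z(0.5)
var_zr = fisher_z_variance(28)
```

      Working on the Zr scale rather than on r directly is a common choice in meta-analysis because the sampling variance then depends only on sample size.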

      As the referee points out, there are two distinct routes to group formation that set the limits to relatedness within groups. In species where offspring do not disperse from their natal patch (‘family’ groups) the opportunity for interacting with relatives is high, whereas in species where groups form after individuals disperse from natal patches (‘non-family’ groups) relatedness is typically low. Some variation in within-group relatedness subsequently arises within these two categories because of a number of modifying factors (breeder turnover, number of males and females founding groups, ‘budding’ dispersal and so on). However, the potential for kin selection to favour adaptations, including those that limit pathogen spread, remains fundamentally different between family and non-family groups. We tried to capture such differences by classifying species as typically living with kin and non-kin using life-history information (dispersal patterns, mating systems) and direct estimates of relatedness.

      We used the terms kin and non-kin rather than family and non-family because across such a diverse set of study species, with variable types of information (e.g. some species had only molecular genetic estimates of relatedness, whereas others had only life-history information), it was not possible to ascertain exactly how groups form for each species. Nevertheless, our analyses are aimed at addressing whether species that typically live with kin, such as the social insects, have more effective mechanisms for reducing the impact of pathogens amongst relatives than species that live with non-kin.

      The referee makes an additional valuable point that for social insects ancestral levels of relatedness in groups are known to be high, with lower levels of relatedness being derived. Comparing species with low versus high contemporary estimates of relatedness may therefore shed light on the importance of current versus past evolutionary responses to pathogens.

      Unfortunately, the sample sizes are just too limited to conduct any meaningful analyses. Only one species of social insect in our dataset was classified as living with non-kin (r <0.25). We also examined finer scale predictors of relatedness applicable to social insects (queen mating frequency: monogamous (r = 0.5) versus polyandrous (r > 0.25 & <0.5)). Sample sizes for crucial comparisons were again too small for formal analysis (Number of monogamous species with experimental data: pathogens present = 3, Pathogens absent = 3. Number of polyandrous species with experimental data: pathogens present = 2, Pathogens absent = 1).

      We have extended the discussion to highlight that more work on species in which high or low relatedness is ancestral versus derived will aid our understanding of the evolutionary history of adaptations to minimise pathogen spread in groups (Lines 248-250). We have also checked and edited the manuscript to remove any implication that groups originate from a continuum of relatedness.

      • I wondered whether you could (interpretationally, i.e. in the discussion) do more with comparative data on pathogen pressure in the wild. The 1987 Hamilton chapter that you cite has lots of interesting natural history observations, which are now often supported by better data. I think he speculates about how altruistic soldiers evolved in aphids and thrips and connects their sociality with living in their own food (galls), which should mean low parasite pressure. The same is true for the lower termites. Would your results allow you to conjecture that all independent lineages that evolved differentiated castes (only possible in families with full siblings; or clones as in aphids) likely had to do that in disease free habitats?

      This is an interesting point and an area where further research would be very valuable. It fits in nicely with our current discussion of how the evolution of groups with high relatedness may be more likely to occur in environments where pathogens are rare. This was rather vertebrate-focused before, so we are grateful for the referee’s suggestion, which has broadened this point. The section now reads:

      “Parallel arguments have been made for social insects. Species with sterile worker castes, that only evolved in groups with high levels of relatedness, are thought to have arisen in environments protected from pathogens (Hamilton 1987). For example, sterile soldier castes have evolved at least six independent times in clonal groups of aphids, and the majority of these cases form galls that provide protection against pathogens (Hamilton, 1987; Stern and Foster, 1996). Escape from pathogens may therefore be a general feature governing the evolutionary origin, as well as the current ecological niches, of species living in highly related groups” (Lines 190-197).

      • I think some effort should be made to make Figures 2, 3 and 4 easier to interpret. The ultra-brief acronyms along the y-axis take a while to digest and to realize the nestedness of the analyses. Could you give one piece of information on the left axis (spelled out like 'experimental data' and 'observational data') and the other piece on the right axis (spelled out as 'pathogens absent' and 'pathogens present')? It would also be helpful if the reader could fully understand the figures without first having to go through the entire method section, so I recommend you extend the legend to explain: 1. What Zr stands for. 2. What the directionality is (so the cryptic line just below Zr can become a proper sentence in the legend), and 3. The rationale of the multifactorial analyses with four or eight combinations (as you describe in the methods; I believe Figure 4 is an example of eight, but this remains rather hazy).

      Many thanks for these suggestions. We have now revised the axis labels and figure legends to improve interpretability.

    1. Author Response:

      Reviewer #1:

      Skrapits et al., report on a population of GnRH neurons in the putamen that dwarfs the commonly studied hypothalamic population that regulates fertility. This laboratory performs very careful immunohistochemical studies and has included a number of controls to support this claim. These primarily include comparison of an overlapping staining pattern with multiple polyclonal antibodies, in situ hybridization and measurements of GnRH decapeptide with LC-MS/MS. While these are supportive, the question of the degradation product GnRH1-5, which has been brought up as a potential caveat in prior studies of extrahypothalamic populations as pointed out by the authors, does remain. This cleavage product was detected in their samples from the forebrain, albeit at lower levels. Even the identification of a large population of cells producing the cleavage product would be of interest, but knowledge of the GnRH-related peptides in these cells is needed to point future studies in a fruitful direction.

      We are grateful to the Reviewer for the careful revision of our manuscript and we appreciate the critical remarks and helpful suggestions. Our point-by-point responses follow the reviewer’s comments.

      These immuno studies present a more complete and state-of-the-art characterization of populations that have been hinted at in past work not only in primates, which is cited, but also in rodents (Skynner et al., J Neurosci 19:5955-5966), citation of which was overlooked. The authors should also comment on the extended exposure to primary antibodies in these studies, which has been reported to increase the number of GnRH neurons visualized during development in rodents (Wu et al., J Neurobiol 1997 Dec;33(7):983-98.) Also relevant to this point, the statement on lines 379-380 is incorrect; the fluorescence of eGFP in these regions in the GnRH-GFP mice used has indeed been reported (Endocrinology, September 2008, 149(9):4596-4604) as has GnRH-GFP signal in another line of mice (Prog Neurobiol 63: 673-686), and cells were also identified using GnRH promoter to drive beta-galactosidase (J Neurosci 19:5955-5966).

      Thank you very much for bringing these papers to our attention. We have read them carefully. Indeed, all of these transgenic models suggested extrahypothalamic GnRH expression in the developing mouse brain. However, none of them mentioned or showed GnRH (transgene) expression specifically in the CPU which would be more closely related to our work. As we have agreed to entirely omit the mouse data from the revised manuscript, citing these articles would not be relevant any more to our work.

      The authors should also comment on the extended exposure to primary antibodies in these studies, which has been reported to increase the number of GnRH neurons visualized during development in rodents (Wu et al., J Neurobiol 1997 Dec;33(7):983-98.).

      We have improved the Methods section by adding missing technical details. In the revised manuscript we indicate that the use of extended exposure times was only necessary when using the 100-μm-thick sections. We believe that this approach is still less sensitive than the standard protocol performed on much thinner floated sections ensuring better antibody penetration. Unfortunately, the use of thick sections was a necessary compromise to reduce excessive overcounting, which would be difficult to correct using Abercrombie's principle, where optimal section thickness should exceed the Z dimension of the counted particles (Figure 1 – figure supplement 1). In our revised manuscript we have also made a statement in Discussion that the quantification results on thick sections likely represent an underestimate of real extrahypothalamic GnRH cell numbers, due to multiple technical factors that would be difficult to eliminate when studying postmortem tissues (Lines 308-311).
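
      To illustrate why section thickness matters for the overcounting issue mentioned above, here is a minimal sketch of the classical Abercrombie correction, in which a raw profile count is scaled by T / (T + h), where T is the section thickness and h is the mean particle height along the Z axis. The function name and the example values for the raw count and particle height are hypothetical; only the 100-μm section thickness comes from the response, and the authors' actual counting procedure may differ.

```python
def abercrombie_corrected_count(raw_count: float,
                                section_thickness_um: float,
                                mean_particle_height_um: float) -> float:
    """Abercrombie correction: N = n * T / (T + h).

    raw_count               -- profiles counted in the sections (n)
    section_thickness_um    -- section thickness T, in micrometres
    mean_particle_height_um -- mean Z dimension h of the counted particles
    """
    if section_thickness_um <= 0 or mean_particle_height_um < 0:
        raise ValueError("thickness must be positive and height non-negative")
    factor = section_thickness_um / (section_thickness_um + mean_particle_height_um)
    return raw_count * factor


# Hypothetical numbers: 120 profiles counted, particles about 20 um tall.
thick = abercrombie_corrected_count(120, 100.0, 20.0)  # 100-um sections
thin = abercrombie_corrected_count(120, 20.0, 20.0)    # 20-um sections
```

      With these illustrative values the correction factor is much closer to 1 for the thick sections than for the thin ones, which is the sense in which thicker sections reduce the overcounting that has to be corrected.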

      The authors also support their claims with RNAseq data. Performing these studies in human tissues is difficult because of the difficulty in controlling conditions and the data largely support their claims but some of the admitted quality limitations may warrant being more circumspect in their conclusions.

      To extend their findings beyond enhanced anatomical characterization, the authors perform electrophysiologic studies of both putamen GnRH neurons and other putamen neurons identified in young mice. These data are not currently presented in a manner that allows a reader to determine if their conclusions from these studies are justified. Past work on GnRH action on hypothalamic GnRH neurons has indicated a dose dependence (Endocrinology 145(2):728-735), thus the current work should also examine dose effects before a putative direction of action for GnRH can be posited. Discussion of the central localization of GnRH receptors from other studies relative to their findings should also be discussed (Endocrinology 152: 1515-1526).

      We appreciate these critical comments, with which we agree. We have agreed to remove the neonatal mouse study from the revised manuscript in view of its poor relevance to the adult human putamen. We have also restricted the citation of rodent literature in the revised manuscript, which now contains human studies only.

      In the discussion, possible therapeutic actions of GnRH analogues are suggested. While exciting, this is not new and prior work examining patients on analogue therapy (for example Almeida et al., Psychoneuroendocrinology.2004;29(8):1071-1081 and Gandy et al JAMA.2001;285:2195-2196) should be cited.

      We appreciate these suggestions, too. To avoid being too speculative, we now cite the suggested article by Almeida et al. in the context of the predictable reproductive side effects of GnRH analogues: “...reproductive side effects of GnRH analogues would limit the use of this strategy in clinical practice...” (Lines 424-427).

      Reviewer #2:

      The study beautifully illustrates the detection of a rather large population of GnRH neurons in the basal ganglia, by a convincing combination of neuroanatomical techniques in human brain specimens; techniques which are mastered by the authors and are well suited in terms of characterization of the GnRH neuronal system. The more conventional neuroanatomical techniques are further backed-up by modern molecular (RNA-seq) and biochemical (HPLC-MS) approaches. In addition, incorporation of a mouse model expressing GFP under the GnRH promoter adds some mechanistic dimension to the descriptive contents of the paper, which is a potential advantage, albeit it is not always clear that mouse and human data are fully convergent.

      We are grateful to the Referee for the work devoted to the careful review of our manuscript.

      Despite the strengths of the paper, this referee has identified several limitations, which need further elaboration, in order to avoid over-interpretation of the current dataset. Among these weaknesses, the authors should better clarify the number of individuals used for each analysis, and how representative the current findings are for both sexes and range of ages (and even pathological conditions) in humans.

      The number of individuals is now clearly stated for all studies in the Results and the Methods sections. Tissue sources for each study are stated in Supplementary File 1. Several biological and methodological factors that may contribute to the heterogeneous labeling of human tissue samples are recognized (Lines 447-449). Unfortunately, it would not be possible to distinguish the effects of sex, age, health conditions, perimortal period and postmortem time on the detection of extrahypothalamic GnRH neurons in the different samples used in our study.

      In addition, further discussion about the potential origin and relation (similarities and dissimilarities) with the hypophysiotropic population of GnRH neurons is deserved.

      To address this comment, we have extended the discussion to better support our conclusion that extrahypothalamic GnRH neurons, as opposed to hypothalamic GnRH cells, are unlikely to originate from the olfactory placodes. We discuss that while Quanbeck et al. (1997) initially suggested that the equivalent neurons in the embryonic/fetal monkey brain might originate from the dorsal olfactory placode before olfactory pit formation, the later report of this laboratory by Terasawa and co-workers suggested that the increasing number of extrahypothalamic GnRH neurons might rather be derived from the ventricular wall of the telencephalic vesicle (Lines 305-340).

      Further, combination of human and mouse data is difficult at some places, since the mouse model does not express GFP in adulthood, and no confirmation is provided that striatum neurons expressing GFP are actually producing GnRH at the neonatal period in the mouse.

      In accordance with the above and other critical comments about the mouse study and also considering its limited relevance to the adult human brain, we have agreed to omit all mouse data from this manuscript.

      Finally, although the implications of current findings are potentially large, the extended discussion of the present dataset in the context of neurological disease makes the paper over-speculative.

      We appreciate the critical comment. We have eliminated the speculation from the Abstract and Discussion about the use of GnRH/GnRHR1 signaling to influence neurological disorders. Likely reproductive side effects of this approach have been brought up (Lines 424-427). An extended Discussion section points out that the “Receptor profile of human cholinergic interneurons may offer new therapeutic targets to treat neurodegenerative disorders” (Lines 397-427).

      Reviewer #3:

      The impetus for the study was the relatively recent demonstration by Casoni et al that, in man, a large number of GnRH neurons (approx. 8000) migrating from the olfactory placode during embryonic development follow a dorsal migratory route that takes them towards pallial and/or subpallial structures, rather than along the more established ventral pathway that leads them to the hypothalamus where they subserve reproduction. The primary purpose of the experiments described was to determine the fate of the embryonic GnRH neurons that follow this dorsal pathway and to begin to examine the biology of this interesting group of cells.

      By and large, the varied array of contemporary imaging and molecular methods used are well described and the results are robust. Indeed, the application of such an armamentarium of approaches to study GnRH neurons in the human brain is a major strength of the paper.

      Quantification of extrahypothalamic GnRH neuron number was performed using IHC with a guinea pig antibody, #1018. However, it appears that the standard procedure to establish specificity of an antibody, namely pre-absorption with authentic GnRH in the case of #1018, was not performed here nor presented in the original paper cited as describing this antibody (Hrabovszky et al 2011).

      We are also very grateful to this Reviewer for the time and work invested in the critical review of our manuscript. We appreciate the suggestion to carry out a pre-absorption validation of antibody #1018, in addition to the positive control studies. We now report that pre-absorption of the primary antibody working solution with 0.1 μg/ml GnRH decapeptide eliminated all labeling from the human putamen. We have described (Lines 89-90; Lines 529-533; Lines 937-938) and illustrated (Figure 2A) these negative results.

      The significance of the electrophysiological data derived from brain slices containing caudate-putamen (CPU) of a transgenic mouse (GnRH-GFP), in which GFP expressing cells were observed transiently in the CPU around postnatal day 4-7, is unclear. Regardless of what the outcome of the mouse experiments might have been, it seems highly unlikely that the discussion and implications of the data obtained from extrahypothalamic GnRH neurons in the human brain would have changed. Also the authors themselves "recognize that the neonatal mouse model has severe limitations."

      In light of the criticism of the mouse study by the reviewers and the editor, we have agreed to remove the mouse electrophysiology and morphology blocks from the revised manuscript.

      The aims of the authors have been more than realized: they have 1) provided novel and convincing characterization of extra-hypothalamic GnRH neurons in the human brain, 2) discovered that this population of neurons (>100,000) is far larger than previously considered, and 3) tentatively suggested that the additional extrahypothalamic GnRH neurons they have discovered may not originate from the olfactory placode.

      The authors' findings will almost certainly lead to further examination of the function of extrahypothalamic GnRH in normal brain function and neurodegenerative disorders associated with aging, which in turn may lead to new therapeutic applications of GnRH1 receptor ligands.

      Returning to the authors' suggestion that the additional extrahypothalamic GnRH neurons they have discovered may not originate from the olfactory placode, the paragraph discussing this issue (beginning Line 319) confused me. Here, the authors state that it is unlikely that the large number of extrahypothalamic GnRH neurons in the putamen and related areas are identical to the 8000 observed by Casoni et al (2016) along the dorsal migratory route (the authors' original aim was to follow the fate of these cells). Instead they suggest that they are homologous to the GnRH cells that, in the monkey, leave the olfactory placode before E30 (termed "early" GnRH neurons). If "early" GnRH neurons originate from the olfactory placode, then why are the large numbers of GnRH neurons observed in the human Pu, and argued unlikely to be of placode origin, considered to be homologous to "early" GnRH neurons? In this regard, the relationship between the ChAT negative GnRH neurons in the nasal region of the GW11 human fetus and the "early" and "late" GnRH cells in the monkey fetus should be provided. In clarifying the above issue, the fact that Terasawa's studies utilized fetal rhesus monkeys should be explicitly stated in the Introduction and reinforced when they are discussed with the authors' results. As written, the reader does not discover the developmental origin of Terasawa's monkeys until the Discussion.

      We recognize the problem in our writing. We have improved the revised Results and Discussion sections in order to better articulate that extrahypothalamic GnRH neurons, either in monkeys or humans, are very unlikely to be of placodal origin. This conclusion is based on the high number of neurons both in monkeys (Terasawa et al.) and humans (our present study) (Lines 305-340).

      In the Discussion the authors refer to GnRH deficient patients (Chan 2011). Homozygous mutations of GnRH1 are very rare and therefore it's perhaps not surprising that patients with such mutation have shed little light on function of extrahypothalamic GnRH. However, GnRHR1 loss of function mutations are much more common and have been known for nearly 25 years. Surely, a review of this literature would be worthwhile to see if any insight into dysfunction unrelated to reproduction emerges.

      We greatly appreciate this comment. We have analyzed the clinical literature. We found that synkinesia, which would be a challenging candidate symptom in striatal dysfunctions, is quite common in Kallmann syndrome. However, synkinesia or other non-reproductive dysfunctions have not been characterized in GnRH deficient patients (Chan, 2011) or in the more common cases of GnRHR1 deficiency (Seminara, 1998; Chevrier, 2011), as we recognize (Lines 400-402).

    1. Author Response:

      Reviewer #1:

      In this paper, authors did a fine job of combining phylogenetics and molecular methods to demonstrate the parallel evolution across vRNA segments in two seasonal influenza A virus subtypes. They first estimated phylogenetic relationships between vRNA segments using Robinson-Foulds distance and identified the possibility of parallel evolution of RNA-RNA interactions driving the genomic assembly. This is indeed an interesting mechanism in addition to the traditional role for proteins for the same. Subsequently, they used molecular biology to validate such RNA-RNA driven interaction by demonstrating co-localization of vRNA segments in infected cells. They also showed that the parallel evolution between vRNA segments might vary across subtypes and virus lineages isolated from distinct host origins. Overall, I find this to be excellent work with major implications for genome evolution of infectious viruses; emergence of new strains with altered genome combination.

      Comments:

      I am wondering if leaving out sequences (not resolving well) in the phylogenetic analysis interferes with the true picture of the proposed associations. What if they reflect the evolutionary intermediates, with important implications for the pathogen evolution which is lost in the analyses?

      We fully appreciate this concern and have explored this extensively. One principal assumption underlying the approach we outline in this manuscript is that the trees analyzed are robust and well-resolved. We use tree similarity as a correlate for relationships between genomic segments, so the trees must be robust enough to support our claims, as we have clarified in lines 128-131. We initially set out to examine a broader range of viral isolates in each set of trees, but larger trees containing more isolates consistently failed to be supported by bootstrapping. Bootstrapping is by far the most widely used methodology for demonstrating support for tree nodes. We provided the closest possible example to the trees presented in this manuscript for comparison. We took all 84 H3N2 strains from 2005-2014 analyzed in replicate trees 1-7 and collapsed these sequences into one tree for each vRNA segment. Figure X-A, specifically provided for the reviewers, illustrates the resultant collapsed PB2 tree, with bootstrap values of 70 or higher shown in red and individual strains coded by cluster and replicate. As expected, the majority of internal nodes on such a tree are largely unsupported by bootstrapping, indicating that relaxing our constraint of 97% sequence identity increases the uncertainty in our trees.

      Because we agree with Reviewers #1 and #3 on the critical importance of validating our approach, we determined the distances between these new collapsed trees using a complementary approach, Clustering Information Distances (CID), that is independent of tree size (Supplemental Figure 4B and Figure X-B & X-C). Larger trees containing all sequences yielded pairwise vRNA relationships that are largely similar to those we report in the manuscript (R² = 0.6408; P = 3.1E-07; Figure X-B vs. X-C), including higher tree similarity between PB2 and NA over NS. This observation strengthens the rationale to focus on these segments for molecular validation and correlate parallel evolution to intracellular localization in our manuscript (Figure 7). However, tree distances are generally higher in Figure X-C than in Figure X-B, which we might expect if poorly supported nodes in larger trees artificially inflate phylogenetic signal. Given the overall similarity between Figures X-B and X-C, both methods yield largely comparable results. We ultimately relied upon the more robust replicate trees with stronger bootstrap support.

      Lines 50-51: Can you please elaborate? I think this might be useful for the reader to better understand the context. Also, a brief description on functional association between different known fragments might instigate curiosity among the readers from the very beginning. At present, it largely caters to people already familiar with the biology of influenza virus.

      We have added additional information to reflect the complexity of intersegmental interactions and the current standing of the field (lines 49-52).

      Lines 95-96 Were these strains all swine-origin? More details on these lineages will be useful for the readers.

      We have clarified that all strains analyzed were isolated from humans, but were of different lineages (lines 115-120).

      Lines 128-132: I think it will be nice to talk about these hypotheses well in advance, maybe in the Introduction, with more functional details of viral segments.

      We incorporated our hypotheses regarding tree similarity into the existing discussion of epistasis in the Introduction (lines 74-75 and 89-106).

      Lines 134-136: Please rephrase this sentence to make it more direct and explain the why. E.g. "... parallel evolution between PB1 and HA is likely to be weaker than that of PB1 and PA".

      The text has been modified (lines 165-168).

      Lines 222-223: Please include a set of hypotheses to explain your results. Please add a perspective in the Discussion on how this might contribute to the pandemic potential of H1N1.

      We have added in our interpretation of the results (lines 259-264) and expanded upon this in the Discussion (lines 418-422).

      Lines 287-288: I am wondering how likely is this to be true for H1N1.

      We have expanded on this in the Discussion (lines 409-410).

      Reviewer #2:

      The influenza A genome is made up of eight viral RNAs. Despite being segmented, many of these RNAs are known to evolve in parallel, presumably due to similar selection pressures, and influence each other's evolution. The viral protein-protein interactions have been found to be the mechanism driving the genomic evolution. Employing a range of phylogenetic and molecular methods, Jones et al. investigated the evolution of the seasonal influenza A virus genomic segments. They found the evolutionary relationships between different RNAs varied between two subtypes, namely H1N1 and H3N2. The evolutionary relationships in the case of H1N1 were also temporally more diverse than in H3N2. They also reported molecular evidence that indicated the presence of RNA-RNA interactions driving the genomic coevolution, in addition to the protein interactions. These results not only provide additional support for the presence of parallel evolution and genetic interactions in the influenza A genome but also advance the current knowledge of the field by providing novel evidence in support of RNA-RNA interactions as a driver of genomic evolution. This work is an excellent example of hypothesis-driven scientific investigation.

      The communication of the science could be improved, particularly for viral evolutionary biologists who study emergent evolutionary patterns but do not specialise in the underlying molecular mechanisms. The improvement can be easily achieved by explaining jargon (e.g., deconvolution) and methodological logics that are not immediately clear to a non-specialist.

      We have clarified or eliminated jargon wherever possible throughout the text.

      The introduction section could be better structured. The crux of this study is the parallel molecular evolution in influenza genome segments and interactions (epistasis). The authors spent the majority of the introduction section leading to those two topics and then treated them summarily. This structure, in my opinion, is diluting the story. Instead, introducing the two topics in detail at the beginning (right after introducing the system) then discussing their links to reassortments, viral emergence etc. could be a more informative, easily understandable and focused structure. The authors also failed to clearly state all the hypotheses and predictions (e.g., regarding intracellular colocalisation) near the end of the introduction.

      We restructured the Introduction with more background on genomic assembly in influenza viruses, as requested by two reviewers (lines 43-52), more discussion of epistasis (lines 58-63) and provided a more thorough discussion of all hypotheses (lines 74-77, 88-92, 94-95, 97-106).

      The authors used the Robinson-Foulds (RF) metric to quantify topological distance between phylogenetic trees, a key variable of the study. But they did not justify using the metric despite its well-known drawbacks, including lack of biological rationale and lack of robustness, particularly when more robust measures, such as generalised RF, are available.

      We agree that RF has drawbacks. To address this, we performed a companion analysis using the Clustering Information Distance (CID) recently described by Smith, 2020. The mean CID can be found in Figure S4, the standard error of the mean in Figure S5, and networks depicting overall relationships between segments by CID in Figure S7E-S7H. To better assess how well RF and CID correlate with each other across influenza virus subtypes and lineages, we reanalyzed all data from both sets of distance measures by linear regression (Figure 3B, 4B-C, 5B, S6 and S9). Our results from both methods are highly comparable, which we believe strengthens our conclusions. Both analyses are included in the resubmission (lines 86-89; 162; 164; 187-188; 199-200; 207-208; 231-234; 242-244; 466-470).
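
      For readers unfamiliar with the metric under discussion, the following is a minimal sketch of how an RF distance can be computed, assuming both trees carry the same tip labels and are written as nested tuples. It is not the authors' pipeline; the tree encoding, the function names, and the tip labels are hypothetical and chosen only for illustration. Each tree is reduced to its set of non-trivial bipartitions (splits), and the distance is the number of splits found in one tree but not the other.

```python
def _collect_clades(tree, clades):
    """Return the leaf set under `tree`, appending every internal clade to `clades`."""
    if isinstance(tree, str):          # a tip label
        return frozenset([tree])
    leaves = frozenset().union(*(_collect_clades(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def splits(tree):
    """Return (all tip labels, set of non-trivial splits) for a nested-tuple tree.

    A split is stored as the side that does NOT contain a fixed anchor tip,
    so the result does not depend on where the tree happens to be rooted.
    """
    clades = []
    all_leaves = _collect_clades(tree, clades)
    anchor = min(all_leaves)
    result = set()
    for clade in clades:
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits (one tip versus the rest) and the root clade.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return all_leaves, result


def robinson_foulds(tree_a, tree_b):
    """Unnormalised RF distance: number of splits present in exactly one tree."""
    leaves_a, splits_a = splits(tree_a)
    leaves_b, splits_b = splits(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("RF distance requires identical tip labels in both trees")
    return len(splits_a ^ splits_b)


# Hypothetical five-tip trees standing in for two segment trees.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(tree_1, tree_2))
```

      Because every disagreeing split counts equally, a single misplaced tip can change many splits at once, which is one reason information-based measures such as CID are often preferred as a complement.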

      Figure 1 of the paper is extremely helpful to understand the large number of methods and links between them. But it could be more useful if the authors could clearly state the goal of each step and also included the molecular methods in it. That would have connected all the hypotheses in the introduction to all the results neatly. I found a good example of such a schematic in a paper that the authors have cited (Fig. 1 of Escalera-Zamudio et al. 2020, Nature communications). Also this methodological scheme needs to be cited in the methods section.

      We provided the molecular methods in a schematic in Figure 1D and the figure is cited in the Methods (lines 310; 440; 442; 456; 501).

      Finally, I found the methods section to be difficult to navigate, not because it lacked any detail. The authors have been excellent in providing a considerable amount of methodological details. The difficulty arose due to the lack of a chronological structure. Ideally, the methods should be grouped under research aims (for example, Data mining and subsampling, analysis of phylogenetic concordance between genomic segments, identifying RNA-RNA interactions etc.), which will clearly link methods to specific results on the one hand and to the hypotheses on the other. This structure would make the article more accessible, for a general audience in particular. The results section appeared to achieve this goal and thus often repeats or explains methodological detail, which ideally should have been restricted to the methods section.

      We organized the Methods section by research aims as suggested. However, some discussion of the methods was retained in the Results section to ensure that the manuscript is accessible to audiences without formal training in phylogenetics.

      Reviewer #3:

      The authors sought to show how the segments of influenza viruses co-evolve in different lineages. They use phylogenetic analysis of a subset of the complete genomes of H3N2 or the two H1N1 lineages (pre and post 2009), and use a method - Robinson-Foulds distance analysis - to determine the relationships between the evolutionary patterns of each segment, and find some that are non-random.

      1) The phylogenetic analysis used leaves out sequences that do not resolve well in the phylogenetic analysis, with the goal of achieving higher bootstrap values. It is difficult to understand how that gives the most accurate picture of the associations - those sequences represent real evolutionary intermediates, and their inclusion should not alter the relationships between the more distantly related sequences. It seems that this creates an incomplete picture that artificially emphasizes differences among the clades for each segment analyzed?

      Reviewer #1 raised the same concern. Please refer to our response at the beginning of this letter where we address this issue in depth.

      2) It is not clear what the significance is of finding sequences that share branching patterns in the phylogeny, and how that informs our understanding of the likelihood of genetic segments having some functional connection. What mechanism is being suggested - is this a proxy for the gene segments having been present in the same viruses - thereby revealing the favored gene segment combinations? Is there some association suggested between the RNA sequences of the different segments? The frequently evoked HA:NA associations may not be a directly relevant model as those are thought to relate to the balance of sialic acid binding and cleavage associated with mutations focused around the receptor binding site and active site, length of NA stalk, and the HA stalk - does that show up in the overall phylogeny of the HA and NA segments? Is there co-evolution of the polymerase gene segments, or has that been revealed in previous studies, as is suggested?

      We clarified our working hypotheses in the Introduction (lines 89-106) and what is known about the polymerase subunits (lines 92-93). Our data do suggest that polymerase subunits share similar evolutionary trajectories that are more driven by protein than RNA (lines 291-293; Figure 2A and 6). The point about epistasis between HA and NA arising from indirect interactions is entirely fair, but these studies are nonetheless the basis for our own work. We have clarified the distinction between these prior studies and our own in the text (lines 60-63 and 74-75). Moreover, our protein trees built from HA and NA recapitulate what has been shown previously, which we highlight in the text (lines 293-296; Figure 6 and Figure S10). We also clarified our interpretation of tree similarity throughout the text (lines 165-168; 190-191; 261-264; 323-326; 419-423).

      The mechanisms underlying the genomic segment associations described here are not clear. By definition they would be related to the evolution of the entire RNA segment sequence, since that is being analyzed - (1) is this because of a shared function (seems unlikely but perhaps pointing to a new activity), or is it (2) because of some RNA sequence-associated function (inter-segment hybridization, common association of RNA with some cellular or viral protein)? (3) Related to specific functions in RNA packaging - please tell us whether the current RNA packaging models inform about a possible process. Is there a known packaging assembly process based on RNA sequences, where the association leads to co-transport and packaging - in that case the co-evolution should be more strongly seen in the region involved in that function and not elsewhere? The apparent increased association of the subset of genes examined for the single virus appears mainly in the cytoplasm close to the nucleus - suggesting function (2) and/or (3)?

      It is difficult to figure out how the data found correlates with the known data on reassortment efficiency or mechanisms of systems for RNA segment selection for packaging or transport - if that is not obvious, maybe you can suggest processes that might be involved.

      We provided more context on genomic packaging in the Introduction, including the current model in which direct RNA interactions are thought to drive genomic assembly (lines 43-53). Although genomic segments are bound by viral nucleoprotein (NP), accurate genomic assembly is theorized to be a result of intersegment hybridization rather than driven by viral or cellular protein. We further clarified our hypotheses regarding the colocalization data in the Results section to make the proposed mechanism clearer (lines 313-326).

    1. Author Response:

      Reviewer #2:

      Weaknesses:

      The principal result isn't hugely surprising: inclusion of the HTR2A map in the model produces ΔGBC changes with a similar spatial topography to that map in the model. The empirical ΔGBC maps are also similar to the HTR2A maps, and so the simulated ΔGBC give a good fit to the empirical ΔGBC data. Yes, the authors demonstrate convincingly that this simulated-empirical ΔGBC fit is stronger than the similarity to the HTR2A map itself, and also to that of various alternative receptor maps and surrogate null models. But the central result does have an element of 'getting out what you put in'.

      The ΔGBC metric is a bit weak as a stand-alone outcome variable. The usual quantity used in this type of model is the goodness-of-fit of simulated to empirical FC. Indeed, the authors have used this calculation in the initial calibration step for their model, where they identified the global coupling strength parameter that yielded the best fit of empirical to simulated FC in the placebo condition, achieving reasonably good fit (Spearman rank correlation r=0.45). However the authors don't report how this FC fit changes with the inclusion of the HTR2A map modulations. It is an open question whether a model with HTR2A-modulations added that improved ΔGBC but not FC fit should be regarded as a better model than one without.

      We believe this issue relates to spatial scales of this class of large-scale models, which operate at the level of brain regions (parcels) and therefore are not well suited to capturing both inter-regional and intra-regional changes in connectivity, because intra-regional connectivity is not explicitly modeled. We consider this bridging across spatial scales (from voxelwise to inter-regional) to be an important direction for future model development to capture pharmacological neuroimaging effects. We now state in the Discussion: “Because our model is defined at the level of cortical parcels, it cannot speak to changes in connectivity that occur over smaller spatial scales, particularly among neurons within the parcels themselves. Our findings indicate that this coarse dynamical description is sufficient to capture regional GBC differences, but future work that goes beyond regional mean-field modeling may be needed to fully resolve the fine-grained effects of pharmacology on within- and between-region FC.”

      The authors do not make clear why it is necessary, and/or why it makes sense to perform GSR on the mathematical model FC anyway. The artifactual contributions to FC that make this necessary for empirical data are by construction not present in modelled data, after all.

      We now state in the Methods: “GSR was also performed in the model, as GSR not only removes artifactual signal components but neuronal signal as well.”
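
      To make the operation under discussion concrete, the following is a minimal time-series sketch of global signal regression (GSR) followed by a simple global brain connectivity (GBC) summary. It rests on several assumptions that are not taken from the manuscript: the global signal is the across-region mean, GBC is the mean Pearson correlation of a region with all other regions (no Fisher z-transform), and the function names and toy values are hypothetical. The model described above derives covariances algebraically rather than from simulated time series, so this illustrates the general operation only.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def regress_out(signal, regressor):
    """Residuals of `signal` after least-squares regression on `regressor` plus an intercept."""
    m_s, m_r = mean(signal), mean(regressor)
    var_r = sum((r - m_r) ** 2 for r in regressor)
    beta = sum((r - m_r) * (s - m_s) for r, s in zip(regressor, signal)) / var_r
    return [(s - m_s) - beta * (r - m_r) for r, s in zip(regressor, signal)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    m_x, m_y = mean(x), mean(y)
    cov = sum((a - m_x) * (b - m_y) for a, b in zip(x, y))
    return cov / sqrt(sum((a - m_x) ** 2 for a in x) * sum((b - m_y) ** 2 for b in y))


def global_signal_regression(regions):
    """Remove the across-region mean signal from each region's time series."""
    n_time = len(regions[0])
    global_signal = [mean([region[t] for region in regions]) for t in range(n_time)]
    return [regress_out(region, global_signal) for region in regions]


def global_brain_connectivity(regions):
    """Mean correlation of each region with every other region (assumed GBC definition)."""
    n = len(regions)
    return [
        mean([pearson(regions[i], regions[j]) for j in range(n) if j != i])
        for i in range(n)
    ]


# Hypothetical toy data: 3 regions, 6 time points each.
toy = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    [0.0, 3.0, 1.0, 5.0, 2.0, 7.0],
]
cleaned = global_signal_regression(toy)
gbc_after_gsr = global_brain_connectivity(cleaned)
print(gbc_after_gsr)
```

      Because the regression removes whatever covaries with the mean signal regardless of its origin, the same step changes the correlation structure of artifact-free simulated data as well, which is why applying it to both model and empirical FC keeps the two comparable.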

      The model description is very comprehensive but it omits the actual equations used, which are (I believe) the algebraic neural activity covariance equations at ~Eq. 21 in Deco et al. 2014. After 10 equations leading up to this, the methods section simply says "Simulated BOLD covariance matrices were derived by linearizing these equations and then algebraically transforming the linearized synaptic covariance matrix, using a procedure which we previously reported in Demirtaş et al. (2019)." The final algebraic equations should be added, and also emphasize that they are the ones used. Readers less familiar with these models could otherwise be forgiven for thinking that the neural and haemodynamic differential equations listed in Eqs 1-10 were the ones used, which is not the case.

      We now include an expanded description of this in the Methods section.

    1. Author Response:

      Reviewer #1:

      In this manuscript, Mouat et al. investigated the contribution of viral infection to the severity of arthritis in mice. Epstein-Barr virus (EBV) infection is associated with rheumatoid arthritis (RA). By assessing arthritis progression in type II collagen-induced arthritis (CIA) induced mice with or without latent γHV68 (murine gammaherpesvirus 68) infection, the authors showed that latent γHV68 exacerbates progression of CIA. Additionally, the profile of immune cells infiltrating the synovium was altered in γHV68-CIA subjects - these subjects presented with a Th1-skewed immune profile, which is also observed in human RA patients. Assessment of immune cells in the spleen and inguinal lymph nodes also showed that latent γHV68 infection alters the T cell response towards a pathogenic profile during CIA. Lastly, the authors showed that age-associated B cells (ABCs) are required for the exacerbating effect of latent γHV68 infection on arthritis progression.

      The findings presented in the manuscript provide important insights and a resource for clinical RA research.

      There are some statistical analyses that need to be updated for completeness and appropriateness of use. In addition, the authors will need to highlight that all analyses were conducted in young mice, whereas RA occurs in aged individuals.

      We appreciate the thoughtful feedback from this reviewer. In response to their suggestions, we have updated our statistical analyses throughout the manuscript. In addition, we have added information on the age of primary EBV infection and age of RA onset to clarify that our age of infection and CIA induction model the timing in humans of EBV infection during adolescence and arthritis development typically during adulthood. We thank the reviewer for their feedback which has aided in strengthening this manuscript.

      Reviewer #2:

      In this study, the authors investigate the long-appreciated but little understood link between chronic infection with Epstein-Barr virus and rheumatoid arthritis (RA). Using a collagen-induced (CI)-model of arthritis and a natural murine analog of EBV (gammaherpesvirus 68, HV68), the authors demonstrate that latent infection with HV68 exacerbates clinical progression of CI-arthritis and is associated with changes in the immune cell and cytokine profile in the spleens and joints of HV68 infected mice. The most compelling findings are that an infection can indeed exacerbate the progression of secondary diseases, and that age-associated B-cells (ABCs) are required for the severe disease progression. While this study addresses a timely and important question (how chronic infections affect subsequent or secondary disease progression), additional work as well as a clarification of the experimental design is encouraged to understand some of the key conclusions.

      We thank this reviewer for their helpful comments and agree that further understanding the link between chronic infections and subsequent diseases is important.

      Based on their helpful comments we have clarified experimental approaches throughout the manuscript, such as timing of disease induction following ACRTA-γHV68 infection and further explanations of why certain parameters were examined, which have improved the manuscript. We appreciate the time this reviewer took to provide us with thoughtful and helpful comments.

      Reviewer #3:

      The authors developed an in vivo model of EBV's contribution to RA that recapitulates aspects of human disease. They examined the role of age-associated B cells and find that they are critical mediators of the viral-enhancement of arthritis. The manuscript is written in a well-structured form that facilitates the reading and following the incremental experimental setups. The manuscript is appropriate for publication after revisions.

      Some of the statistical measures did not show significant values, yet the authors based several statements on them as if there were a difference (using phrases such as increased/fold change). Whether this is strong enough to support their statements is not clear.

      Overall, this report provides important insights regarding the association between latency, age-associated B cells, and the enhancement of RA in a mouse model. Whether these insights are translatable to RA immunology in humans is to be further investigated.

      We thank the reviewer for their comments and appreciation of our work. We have edited the text to more accurately describe the differences that we observe in support of our conclusions.

    1. Author Response:

      Reviewer #1:

      The authors use dense electrode recordings in young mice and EEG recordings in human infants to quantitatively describe the transition from immature patterns of brain activity in sleep to more mature patterns. Interestingly, they find an intervening period when overall activity declines in both species. Although primarily concerned with describing the phenomenology of this transition, this study is interesting because it enriches our relatively impoverished view of how mature activity patterns emerge during development.

      Reviewer #2:

      The authors employ sophisticated electrophysiological techniques and analyses to investigate ontogenetic patterns of brain activity in sleep. This is a major strength of the study.

      Although this topic has been explored many times over the last 50-60 years, the authors make some interesting observations. The first is that there is a window of time when immature cortical activity changes from immature forms to more mature forms. The 2nd major finding is a transient condition of diminished brain activity that appears between these stages.

      Major weaknesses:

      The first finding seems incremental in nature, especially as no mechanistic insights are provided. It is well known that the 2nd postnatal week in rodents is when many cortical and subcortical events coincide with a change in sleep organization--including cortical manifestations. Therefore, the first finding is more detailed than earlier studies, but not especially surprising when put in proper context.

      Our goal in this work was to investigate emergence of network processes associated with cognitive functions in adults over the course of development. Such features are not necessarily strictly linked to sleep organization. In our opinion, identifying the timelines for such processes merits specific investigation due to the likely implications for derangement of cortical function in neurodevelopmental disorders. Furthermore, we show that these processes change abruptly over a short period of time, rather than progressively or in a staggered fashion during the second postnatal week.

      The 2nd finding is interesting, but its significance is unknown. The significance of this 'state' or 'condition' is a bit overstated. For example, the authors state in their discussion that this state 'enables' the emergence of mature brain organization, but they provide no evidence for this. Their study, as interesting as it is in places, is descriptive and provides no direct evidence of mechanism or function.

      We agree that a key opportunity for future investigation provided by this work is for perturbation of the transition period to identify functional ramifications. Most likely, multiple molecular, genetic, and cellular mechanisms underlie such a profound transition in electrophysiological features. We have now highlighted both of these points in our Discussion. We would suggest, however, that the fact that network properties change swiftly and simultaneously during a quiescent state provides key clues about the ways by which neural networks can shift their properties. On a practical level, an identifiable marker of developmental maturation, such as a quiescent state, allows matching of cortical development timelines across species, and in instances of putative cortical pathology. Therefore, we posit that identification and characterization of this state are functionally useful, regardless of whether a specific function is ascribed to the state.

      There are also methodological issues that make the interpretation of the mouse data extremely difficult.

      Performing in vivo electrophysiologic recordings in immature organisms remains challenging due to various experimental and technical considerations. We employed evidence-based practices and verified the health of the neural networks being monitored to minimize confounders related to any specific methodology. We have included more detail on each of these practices in the Methods section of the manuscript to facilitate robust experimentation in developing organisms.

      Overall, the analyses are meticulous and suggest an important phase of brain organization occurs at about the 2nd postnatal week in rodents--and possibly humans. This study could be very informative, provided that additional control experiments are performed, and direct mechanistic or functional questions are addressed.

      Reviewer #3:

      This paper is, to my knowledge, the first to suggest that there may be 'regressive' or at least non-progressive steps in the general thrust of early activity and functional development, at least before the later stages of net synaptic elimination. The authors show that in mouse somatosensory cortex that the period after spindle-burst elimination (an early activity pattern associated with sensory stimulation either self-generated or evoked) is characterized by a 2-day 'nadir' in total activity before firing rates and synchronization as well as surface EEG power and spread begin again to increase toward adult levels. This pattern was echoed in EEG recordings from human infants, which showed a similar decrease in activity around 45 weeks of gestation (on parietal electrodes). This careful analysis of activity done similarly in the two species is a real strength and overall my confidence is high that this is a real phenomenon in the regions examined. The number of animals and analysis methods are impressive and largely appropriate. Overall the data presented make a solid and important contribution to our understanding of the developmental dynamics of neural activity development.

      To my mind, there are a couple of critical analyses that need to be included to fully support the authors' conclusions.

      1) The mouse experiments call for some control of developmental changes in arousal state especially as regards twitching and other movement. With the current presentation, the quiescent period could as easily be a result of reduced twitching at P8 before extensive volitional (and whisking) emerges starting on P10 as it could be explained by circuit changes in the ascending pathways. Likewise, shifts in the proportion of quiet and active sleep (which are related to twitch amount) could largely account for the differences.

      Thank you for identifying potential confounders for our observations. To address these, we first quantified twitching rate in each animal and examined whether there were any systematic changes across age groups. There was no significant difference in twitching rates across age groups (ANOVA p = 0.0861), though a weak trend toward decrease in twitching over time (P5 to P14) was found, in agreement with other studies of twitches in neonatal rodents (1-3). The lack of statistically significant change in twitch rate across groups, and the lack of nadir in twitching during our identified transition period argues against our results being a function of less twitching. This data is presented in Supplementary Figure 5D with relevant statistical testing.
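
      The group comparison described above can be illustrated with a minimal sketch of a one-way ANOVA F statistic. The twitch rates below are made-up numbers, the function name is hypothetical, and the response does not state which software or exact procedure was used; this only shows how between-group and within-group variance are compared.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical twitches-per-minute values per age group (illustrative only).
twitch_rates = {
    "P5-7": [4.0, 5.0, 6.0],
    "P8-9": [4.0, 5.0, 3.0],
    "P10-12": [3.0, 4.0, 2.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(twitch_rates.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      The p-value would then be read from the F distribution with these degrees of freedom, a step omitted here to keep the sketch dependency-free.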

      We furthermore analyzed the proportion of time spent in active/quiet sleep across this developmental period. As known from the literature, the most mature animals had less active sleep than the most immature animals (3-5). Although the exact amount of quiet sleep in early development remains unclear, our results fit the increasing trend of quiet sleep reported and described by other groups (4). These data are presented in Supplementary Figure 5C with relevant statistical testing. ANOVA with post-hoc testing did not reveal a significant difference in active sleep proportion between P5-7 and P8-9 animals, or between P8-9 and P10-12 animals, indicating lack of an abrupt change in sleep proportions during the transition period that could explain our results. Furthermore, we specifically analyzed data from periods of immobility lasting 10 seconds or more to facilitate analysis of comparable states given the difficulty in precise scoring of active and quiet sleep in neonatal rodents (5-6). Therefore, any potential effects related to sleep state are minimized.
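
      The selection of immobility periods lasting 10 seconds or more could look like the sketch below. It assumes a boolean per-sample movement trace and a known sampling rate; both the input format and the function name are hypothetical, since the response does not describe how movement was detected.

```python
def immobility_bouts(moving, sample_rate_hz, min_duration_s=10.0):
    """Return (start_s, end_s) for each run of non-movement samples
    that lasts at least `min_duration_s` seconds."""
    bouts = []
    start = None  # index where the current immobile run began
    for i, is_moving in enumerate(moving):
        if not is_moving and start is None:
            start = i
        elif is_moving and start is not None:
            if (i - start) / sample_rate_hz >= min_duration_s:
                bouts.append((start / sample_rate_hz, i / sample_rate_hz))
            start = None
    # Close a run that extends to the end of the recording.
    if start is not None and (len(moving) - start) / sample_rate_hz >= min_duration_s:
        bouts.append((start / sample_rate_hz, len(moving) / sample_rate_hz))
    return bouts


# Hypothetical 1 Hz trace: 12 s still, 3 s moving, 5 s still, 2 s moving, 10 s still.
trace = [False] * 12 + [True] * 3 + [False] * 5 + [True] * 2 + [False] * 10
bouts = immobility_bouts(trace, sample_rate_hz=1.0)
print(bouts)
```

      Only the first and last runs qualify in this toy trace; the 5-second pause is discarded because it falls below the threshold.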

      There was no sharp transition (or statistically significant group difference) in either feature that could account for the unique electrophysiologic features exhibited by the animals at the beginning of the second postnatal week. It would also be difficult to explain the differences in oscillation spatial extent, interspike interval, phase locking, and cross-frequency coupling that we observe during this time as a function of twitching or sleep state. Taken together, these data do not support the notion that the pausing phenomenon is an artifact of twitching or sleep state distributions across ages.

      2) The location of the analyzed contacts is incompletely described and justified. In the mouse they are described as 'somatosensory cortex' but the pictures suggest that barrel cortex is the most likely location. Better descriptions of how the locations for analysis were chosen and controlled over the wide age range are necessary. Were the contacts analyzed verified as barrel cortex by whisker deflection? Is there any possibility the quiescent period is a result of shifting the location of the grid or analyzed channels? The infant data surprisingly are taken primarily from parietal electrodes, which are not the location of sensory-evoked twitches (Milh et al 2007). Why was the analysis limited to parietal? Are the results dependent on this localization?

      We used vGLUT2 immunohistochemistry to identify primary somatosensory and primary visual cortex. Barrel cortex has the most striking histological appearance using this method, and we centered our NeuroGrids over this particular region of primary somatosensory cortex. However, we did not perform functional testing by whisker deflection, so we prefer the more generic term “somatosensory cortex” to “barrel cortex”: we cannot exclude that some channels were in forelimb, hindlimb or other regions of somatosensory cortex. We note in Supplementary Figure 3 that channels identified histologically as recording from somatosensory cortex displayed spindle bursts in the immature mice, concordant with literature on this region (for instance, 13). NeuroGrids were large enough to extend past somatosensory cortex in all ages, allowing us to consistently identify channels recording from this region and making it essentially impossible to “miss” somatosensory cortex during surgical placement.

      For the human data, we used electrodes that relatively correspond to somatosensory cortex in rodents. In Milh et al 2007, a double distance neonatal montage is used because the recordings are from very premature infants (29-31 wks), where head size precludes placement of a full 10-20 electrode montage. In this case, the “C” or central electrodes are located over the somatomotor area. In a conventional 10-20 montage, the somatosensory area is expected to lie between the central electrodes and the parietal electrodes. We chose to use parietal electrodes because they had the most consistent high-quality data across our patient group, but similar results are obtained if central electrodes are used. We replicated power analysis based on central electrodes in Supplementary Figure 11D, and there is no change to the result.

      We have included this additional information regarding location of the analyzed contacts in the Methods section.

      3) The authors do a number of analyses of cross-frequency co-modulation and spike-frequency modulation that are limited to 'spindle frequencies'. These results are often extrapolated to make general statements about the precision of spiking or spread of activity etc but are really just smaller snapshots of the larger activity. This would be justified if there was good reason to believe that early spindle-bursts and later sleep spindles are the same network activity. However this proposition has only weak support (and is not argued for explicitly here). In essence, the authors end up analyzing three different patterns: spindle-bursts in P5-7, unknown activity in spindle band (P8-10), and sleep spindles (P11+). That these are in the same broad range of frequencies doesn't mean they are making similar measurements across ages. It would strengthen the case that P8-10 is a unique quiescent period to show differences in power spectra and spiking not limited to spindle frequencies. Some of these are presented, but difficult to extract from the spindle analyses. In addition, spiking data from layers 4-6 are used, but these layers are both very diverse in their behavior, and the least likely to be strongly correlated with spindle-bursts (maximal in layer 2-4). A more consistent and limited analysis of spiking is important to confirm the general vs specific nature of this quiescence.

      We do make several analyses that are independent of spindle activity:

      • Continuity (Figure 2C, Figure 6D)
      • Wide band power (Figure 2D, Figure 6C)
      • Spiking rate (Figure 3A)
      • Interspike interval (Figure 3B)
      • Spike autocorrelation (Figure 3C-D)

      Therefore, the transient quiescent period is not limited to spindle band oscillations. To clarify this point, we have included power spectra as suggested by the reviewer, which demonstrate a paucity of oscillatory power across the physiologic frequency spectrum between P8-9 (Supplementary Figure 6), as well as in humans during the transient period (Supplementary Figure 13). We have also clarified in the Results and Methods that these analyses are derived from any activity above the noise floor, not just those in the spindle band. We have also rearranged the results text to improve the clarity of these analyses.
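
      Two of the spindle-independent spiking measures listed above, firing rate and interspike interval, can be sketched from a list of spike timestamps. The spike times, recording duration, and function names are illustrative assumptions, not values or code from the study.

```python
def firing_rate_hz(spike_times_s, duration_s):
    """Mean firing rate: number of spikes divided by recording duration."""
    return len(spike_times_s) / duration_s


def interspike_intervals(spike_times_s):
    """Differences between consecutive spike times, in seconds."""
    ordered = sorted(spike_times_s)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical spike times (seconds) from one unit over an 8 s window.
spikes = [0.5, 1.0, 2.0, 4.0]
rate = firing_rate_hz(spikes, duration_s=8.0)
isis = interspike_intervals(spikes)
print(rate, isis)
```

      Neither quantity depends on filtering the signal into a frequency band, which is why such measures can speak to a general rather than spindle-specific quiescence.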

      The rationale for subsequently focusing on the spindle band frequency is that well identified oscillations exist in this band in immature and mature animals. Certainly, this does not presuppose that these oscillations are serving a similar purpose or are generated by similar underlying mechanisms, and as the reviewer notes, we do not espouse this notion here. However, it does allow us to reliably detect discrete oscillations across development for the purpose of investigating the spike/LFP relationship in a relatively controlled fashion.

      We quantified spiking activity from superficial cortical layers to address the last point mentioned here. We used the grouping of layers 2-3 and 4-6 for this purpose to maximize integration with the data obtained using surface arrays, which capture activity primarily from the superficial cortical layers (Khodagholy et al., Nature Neuroscience 2015). We were also mindful of the precision of the histological methods used, and thus did not separate into more than 2 groups. We found that the superficial cortical layers followed a similar pattern to the deeper layers in regard to the spiking measures analyzed (firing rate, interspike interval, recruitment into spindle-band oscillations). The results of these analyses are presented, with complete quantification, in Supplementary Figure 10 and referenced in the Results text. A nadir at the beginning of the second postnatal week was demonstrated in each analysis.

      4) How generalizable these results are, and how they comport with previous studies is unclear. The paper is written as if this quiescent state is universal, and its identification in two species in likely different regions adds to the argument that this is the case. However, it has not been observed in similarly detailed developmental studies in other rodent regions (multiple papers by the Hanganu-Opatz lab, Minlebaev et al Science 2011, Shen and Colonnese J Neuro 2016) nor in the clinical literature. Some more careful and nuanced discussion of the relationship between these findings or expansion of the regions surveyed to show they were wrong would help situate the current findings and better comport the claims and evidence.

      We have carefully reviewed the literature to address this point.

      Hanganu-Opatz and colleagues have performed detailed work on the development of prefrontal cortex (some examples in 18-20). Whether this association cortex, which does not receive sensory information directly from thalamus, would be expected to follow a similar developmental trajectory to a primary sensory cortex is unclear. They group ages across multiple developmental delays (18), or sample with wider intervals (19). With such a method, it is possible that short transitions in neural activity could be obscured. From such studies, it can be discerned that a discontinuous pattern is present in the prefrontal cortex of rats as late as P9, with a continuous appearance by P12. However, the transition between these states is not delineated. Interestingly, their recent study showed that optogenetically increasing mPFC activity around the beginning of the second postnatal week disrupts developmental trajectory and results in functional deficits in adulthood (20).

      Minlebaev and colleagues characterized gamma band activity (14). Importantly, they reported an abrupt decrease in gamma band activity at P8 and the disappearance of early gamma oscillations (EGOs) around P8 before “adult” gamma patterns emerge later during development. This is concordant with our timeline of spindle band activities, potentially suggesting a shift in cortical dynamics around this timepoint.

      Shen and Colonnese investigated elicited responses from primary visual cortex during the first weeks of neonatal development (21). Although they did not identify a transient nadir in continuity, they reported that LFP continuity in superficial layers did rise sharply after P8 with a highly non-linear trajectory. The depth profile of spontaneous activity changed in cortex between P8-10, accompanied by a change in anticorrelated activity and spectral features at this timepoint. Also concordant with our results, an earlier work by Colonnese and colleagues investigated correlation of neural firing in mouse visual cortex, and reported that activity is the least synchronous at the beginning of the second postnatal week (22), similar to our reported nadir in temporal precision of spiking.

      This literature supports an abrupt change in cortical network function around a similar timepoint to what we identify, with some indicators of a nadir in synchrony, spectral features, and oscillatory patterns. It is possible that our use of surface arrays, which sample summated local field potential activity from the undisturbed superficial cortical layers (I-III) highlights the transient quiescent state compared to penetrating probes that disrupt the cortical surface upon implantation. Furthermore, our conducting polymer-based electrodes have lower impedance than the silicon probes used in most neonatal rodent studies, potentially increasing the sensitivity to changes in oscillatory power and continuity.

      From a clinical perspective, longitudinal studies of neonatal EEG activity are understandably few. The disappearance of trace alternant and emergence of sleep spindles have been used as boundaries for “perinatal” and “infantile” pattern classifications, and this transition is described to occur over a few weeks between 44-49 wks (23), coinciding with our observed transition period. Trace alternant is identified by a pronounced change in amplitude of the EEG signal with a characteristic burst and interburst interval. We propose that the high amplitude activity is lost during the transition period, leading to a low voltage state with lack of organized oscillatory activity, which is replaced by the typical organized patterns of more mature NREM sleep. When quantified, such a trajectory would resemble that of our data in Figure 6. We support this notion by including power spectra derived from the human data in Supplementary Figure 13, which show a paucity of periodic or aperiodic activity during the transition period. Therefore, we posit that our results are not at odds with clinical literature, but more clearly define the transition period.

    1. Author Response:

      Reviewer #1:

      The manuscript by Jasmien Orije and colleagues has used advanced Diffusion Tensor and Fixel-Based brain imaging methods to examine brain plasticity in male and female European starlings. Songbirds provide a unique animal model to interrogate how the brain controls a complex, learned behaviour: song. The authors used DT imaging to identify known and uncover new structural changes in grey and white matter in male and female brains. The choice of the European starling as a model songbird was smart as this bird has a larger brain to facilitate anatomical localization, clear sex differences in song behavior and well-characterized photoperiod-induced changes in reproductive state. The authors are commended for using both male and female starlings. The photoperiodic treatment used was optimal to capture the key changes in physiological state. The high sampling frequency provides the capability to monitor key changes in physiology, behaviour and brain anatomy. Two exciting findings were the increased role of the cerebellum and hippocampal recruitment in female birds engaged in singing behaviour. The development of non-invasive, multi-sampling brain imaging in songbirds provides a major advancement for studies that seek to understand the mechanisms that control the motivation and production of singing behavior. The methods described herein set the foundation to develop targeted hypotheses to study how vocal learning, such as language, is processed in discrete brain regions. Overall, the data presented in the study are extensive and include a comprehensive analysis of regulated changes in brain microstructural plasticity in male and female songbirds.

      Reviewer #2:

      Orije et al. employed diffusion weighted imaging to longitudinally monitor the plasticity of the song control system during multiple photoperiods in male and female starlings. The authors found that both sexes experience similar seasonal neuroplasticity in multisensory systems and cerebellum during the photosensitive phase. The authors' findings are convincing and rely on a set of well-designed longitudinal investigations encompassing previously validated imaging methods. The authors' identification of a putative sensitive window during which sensory and motor systems can be seasonally re-shaped in both sexes is an interesting finding that advances our understanding of the neural basis of seasonal structural neuroplasticity in songbirds.

      Overall, this is a strong paper whose major strengths are:

      1) The longitudinal and non-invasive measure of plasticity employed

      2) The use of two complementary MR assays of white matter microplasticity

      3) The careful experimental design

      4) The sound and balanced interpretation of the imaging findings

      I do not have any major criticism but just a few minor suggestions:

      1) Pp 6-7. While the comparative description of canonical DTI with respect to fixel-based analysis is well written and of interest to readers with formal training in MR imaging, I found this entire section (and especially the paragraphs in page 7) too technical and out of context in a manuscript that is otherwise fundamentally about neuroplasticity in song birds. The accessibility of this manuscript to non-MR experts could be improved by moving this paragraph into the methods section, or by including it as supplemental material.

      The main purpose of this section was to introduce and explain the diffusion parameters which are used throughout the rest of the paper. Furthermore, we wanted to familiarize the reader with the concept of the population based template and the different structures that can be visualized by them. We agree that the technical details might have distracted from this main message. Therefore, we have trimmed the technical details out of this section and left a short explanation of the biological relevance of the different diffusion parameters and the anatomical structures visible on the population template. The technical details that were taken out are now a part of the material and methods section.

      The section now reads as follows:

      In the current study, we analyzed the DWI scans in two distinct ways: 1) using the common approach of diffusion tensor derived metrics such as fractional anisotropy (FA) and 2) using a novel method of fiber orientation distribution (FOD) derived fixel-based analysis. Both techniques infer the microstructural information based on the diffusion of water molecules, but they are conceptually different (table 1). Common DTI analysis extracts for each voxel several diffusion parameters, which are sensitive to various microstructural changes in both grey and white matter specified in table 1. Fixel-based analysis on the other hand explores both microscopic changes in apparent fiber density (FD) and macroscopic changes in fiber-bundle cross-section (log FC) (table 1). Positive fiber-bundle cross-section values indicate expansion, whereas negative values reflect shrinkage of a fiber bundle relative to the template (Raffelt, Tournier et al. 2017).

      A population-based template created for the fixel-based analysis can be used as a study based atlas in which many of the avian anatomical structures can be identified (figure 2). We recognize many of the white matter structures such as the different lamina, occipito-mesencephalic tract (OM) and optic tract (TrO) among others. Interestingly, many of the nuclei within the song control system (i.e. HVC, robust nucleus of the arcopallium (RA), lateral magnocellular nucleus of the anterior nidopallium (LMAN), and Area X), auditory system (i.e. intercollicular nucleus complex, nucleus ovoidalis) and visual system (i.e. entopallium, nucleus rotundus) are identified by the empty spaces between tracts. The applied fixel-based approach is inherently sensitive to changes in white matter and cannot report on the microstructure within grey matter like brain nuclei; but rather sheds light on the fiber tracts surrounding and interconnecting them. As such, it provides an excellent tool to investigate neuroplasticity of different brain networks, and in the case of a nodular song control system focusing on changes in the fibers surrounding the song control nuclei, referred to as HVC surr, RA surr and Area X surr.

      2) Similarly, many sections, especially results, are in my opinion too detailed and analytical. While the employed description has the benefit of being systematic and rigorous, the ensuing narrative tends to be very technical and not easily interpretable by non experts. I think the manuscript may be substantially shortened (by at least 20% e.g. by removing overly technical or analytical descriptions of all results and regions affected) without losing its appeal and impact, but instead gaining in strength and focus especially if the new result narrative were aimed to more directly address the interesting set of questions the authors define in the introductory sections.

      We rewrote the Results section, removing statistical reporting from the text where it is also reported in a figure, to reduce the bulk of this section and make it more readable. We made some of the descriptions of the affected regions more approachable by replacing them with parts of the discussion. This way we incorporated some of the explanations of why certain findings are unexpected or relevant, as suggested by reviewer #3. Parts of text that were originally in the discussion are indicated in purple.

      3) The possible effect of brain size has been elegantly controlled by using a medial split approach. Have the authors considered using tensor-based morphometry (i.e. using the 3D RARE scans they acquired) to account for where in the brain the small differences in brain size occur? That could be more informative and sensitive than a whole-brain volume quantification.

      We considered adding tensor-based morphometry, but we feel that log FC calculated with MRtrix3 can provide a similar account of the localization of these brain differences. Both methods are based on the Jacobian of the warps created between the individual images and the population template. They only differ in the starting images they use (3D RARE images in tensor-based morphometry or diffusion weighted images in the log FC metric of MRtrix3) and the fact that MRtrix3 limits itself to the volume changes along a certain tract.

      The log FC difference in figure 4 gives a similar account of the differences in brain size between both sexes. Additionally, figure 6 indicates the log FC differences between small and large brain birds.
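
      The idea shared by both methods, reading local expansion or shrinkage from the Jacobian of the warp to the template, can be sketched as follows. This is the whole-volume log-determinant familiar from tensor-based morphometry, shown under the assumption of a single 3x3 Jacobian per voxel with made-up values; it is not the MRtrix3 log FC computation, which, as noted above, restricts the change to a given tract.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def log_jacobian_determinant(jacobian):
    """Signed local volume change: > 0 expansion, < 0 shrinkage, 0 no change."""
    det = det3(jacobian)
    if det <= 0:
        raise ValueError("Jacobian determinant must be positive for a valid warp")
    return math.log(det)


# Hypothetical per-voxel Jacobians of a subject-to-template warp.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
expanded = [[1.1, 0.0, 0.0], [0.0, 1.1, 0.0], [0.0, 0.0, 1.0]]
shrunken = [[0.9, 0.0, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 1.0]]

for name, jac in [("identity", identity), ("expanded", expanded), ("shrunken", shrunken)]:
    print(name, round(log_jacobian_determinant(jac), 3))
```

      Taking the logarithm makes expansion and shrinkage symmetric around zero, which matches the sign convention quoted for log FC (positive for expansion, negative for shrinkage relative to the template).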

      4) I think Figures Fig. 3 and Fig. 4 may benefit from a ROI-based quantification of parameters of interests across groups (similar to what has been done for Fig. 7 and its related Fig. 8). This could help readers assess the biological relevance of the parameter mapped. For instance, in Fig. 3, most FA differences are taking place in low FA (i.e. gray matter dense?) regions.

      We supplied the figures with extracted ROI-based parameters of figure 3 and figure 4. In line with this reasoning we also added the same kind of supplementary figures for figure 5 and 6.

      Figure 3 - figure supplement 1: Overview of the fractional anisotropy (FA) changes over time extracted from the relevant ROI-based clusters with significant sex differences. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant sex differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the fractional anisotropy values are not significantly different from each other.

      Figure 4 – figure supplement 2: Overview of the fiber density (FD) changes over time extracted from the relevant ROI-based clusters with significant sex differences. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant sex differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the FD values are not significantly different from each other. Abbreviations: surr, surroundings.

      Figure 4 – figure supplement 3: Overview of the fiber-bundle cross-section (log FC) changes over time extracted from the relevant ROI-based clusters with significant sex differences. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant sex differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the log FC values are not significantly different from each other. Abbreviations: surr, surroundings.

      Figure 5 – figure supplement 1: Overview of the fractional anisotropy (FA) changes over time extracted from the relevant ROI-based clusters with significant differences in brain size. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant brain size differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the fractional anisotropy values are not significantly different from each other. Abbreviations: C, caudal; surr, surroundings.

      Figure 6 – figure supplement 2: Overview of the fiber density (FD) changes over time extracted from the relevant ROI-based clusters with significant differences in brain size. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant brain size differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the FD values are not significantly different from each other. Abbreviations: C, caudal; surr, surroundings.

      Figure 6 – figure supplement 3: Overview of the fiber-bundle cross-section (log FC) changes over time extracted from the relevant ROI-based clusters with significant differences in brain size. The grey area indicates the entire photosensitive period of short days (8L:16D). Significant brain size differences are reported with their p-value under the respective ROI-based cluster. Different letters denote significant differences by comparison with each other in post-hoc t-tests with p < 0.05 (Tukey’s HSD correction for multiple comparisons) comparing the different time points to each other. If two time points share the same letter, the log FC values are not significantly different from each other. Abbreviations: C, caudal; surr, surroundings.

      5) In Abstract: "We longitudinally monitored the song and neuroplasticity in male.." Perhaps something should be specified after the "the song"? Did the authors mean "the neuroplasticity of song system"?

      No, this is not what we meant; we monitored song behavior and neuroplasticity independently. In our study, we do not limit ourselves to the neuroplasticity of the song system, but instead use a whole brain approach. The monitoring of the song behavior in itself might be useful for other songbird researchers.

      We clarified this in the abstract as follows:

      We longitudinally monitored the song behavior and neuroplasticity in male and female starlings during multiple photoperiods using Diffusion Tensor and Fixel-Based techniques.

      Reviewer #3:

      In their paper, Orije et al. used MRI to study sexual dimorphisms in the brains of European starlings during multiple photoperiods and how this seasonal neuroplasticity depends on brain size, song rates and hormonal levels. The authors' main findings include differences in hemispheric asymmetries between the sexes, multisensory neuroplasticity in the song control system and beyond it in both sexes, and some dependence of singing behavior on brain size in females with large brains. The authors use different methods to quantify the changes in the MRI data to support various possible mechanisms that could be the basis of the differences they see. They also record the birds' song rates and hormonal levels to correlate the neural findings with biologically relevant variables.

      The analysis is very impressive, taking into account the massive data set that was recorded and processed. Whole-brain data-driven analysis prevented the authors from being biased to well-known sexually dimorphic brain areas. Sampling of a large number of subjects across many time points allowed for averaging in cases where individual measurements could not show statistical significance. The conclusions of the paper are mostly well supported by data (except for some confounds that the authors mention in the text). However, the extensive statistically significant results that are described in the paper make it hard to follow at times.

      1) In the introduction the authors mention the pre-optic area as a mediator for increased singing and therefore seasonal neuroplasticity. Did the authors find any differences in that area or other well-known nuclei that are involved in courtship (PAG for example)?

      Interestingly, we did not detect any seasonal changes in the pre-optic area or PAG, whereas prior studies reported volume changes in the POM within 1-2 days after testosterone administration in canaries (Shevchouk, Ball et al. 2019). In male European starlings, POM volumes changed seasonally, although this seems to depend on whether or not the males possessed a nest box (Riters, Eens et al. 2000). In our setup, the starlings were not provided with nest boxes. The lack of seasonal change in POM could therefore have a biological reason, in addition to the limitations of our methodology: since these are small, grey matter-like regions, they are less likely to be picked up with our diffusion MRI methods.

      2) Following the first comment, what is the minimum volume of an area of interest that could be detected using the voxel analysis?

      The up-sampled voxel size is (0.175 × 0.175 × 0.175) mm³. In the voxel-based statistical analysis, the significance threshold is set at a minimum cluster size of 10 voxels, i.e. approximately 0.05 mm³.
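
As a quick check of the arithmetic above, the minimum cluster volume follows from the voxel edge length and the cluster-size threshold. The sketch below is illustrative only: the function and variable names are hypothetical and are not part of the analysis pipeline, and it assumes isotropic voxels.

```python
# Illustrative arithmetic only; names are hypothetical, values are taken from the text.
def min_cluster_volume_mm3(voxel_edge_mm: float, min_voxels: int) -> float:
    """Volume of the smallest cluster passing the threshold, assuming isotropic voxels."""
    voxel_volume = voxel_edge_mm ** 3
    return min_voxels * voxel_volume


volume = min_cluster_volume_mm3(0.175, 10)
print(f"{volume:.3f} mm^3")  # 0.054 mm^3, i.e. roughly 0.05 mm^3
```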

      3) It would be useful to have a figure describing the song system in European starlings and how the auditory areas, the cerebellum and the hippocampus are connected to it, before describing the results. It would make it easier for the broader community to make a better sense of the results.

      An additional figure was added to the introduction to give a schematic overview of the song control system, the auditory system and the proposed cerebellar and hippocampal projections. This scheme includes both a 2D, and a 3D representation as well as a movie of the 3D representation of the different nuclei and the tractography.

      Figure 1: Simplified overview of the experimental setup (A), schematic overview of the song control and auditory system of the songbird brain and the cerebellar and hippocampal connections to the rest of the brain (B) and unilateral DWI-based 3D representation of the different nuclei and the interconnecting tracts as deduced from the tractogram (C). Male and female starlings were measured repeatedly as they went through different photoperiods. At each time point, their songs were recorded, blood samples were collected and T2-weighted 3D anatomical and diffusion weighted images (DWI) were acquired. The 3D anatomical images were used to extract whole brain volume (A). The song control system is subdivided in the anterior forebrain pathway (blue arrows) and the song motor pathway (red arrows). The auditory pathway is indicated by green arrows. The orange arrows indicate the connection of the lateral cerebellar nucleus (CbL) to the dorsal thalamic region further connecting to the song control system as suggested by (Person, Gale et al. 2008, Pidoux, Le Blanc et al. 2018) (B,C). Nuclei in (C) are indicated in grey, the tractogram is color-coded according to the standard red-green-blue code (red = left-right orientation (L-R), blue = dorso-ventral (D-V) and green = rostro-caudal (R-C)). For abbreviations see abbreviation list.

      Figure 1 – figure supplement 1: Movie of the unilateral 3D representation of the different nuclei and the interconnecting tracts rotating along the vertical axis.

      4) In the results section the authors clearly describe which brain areas are sexually dimorphic or change during the photoperiod and what is the underlying reason for the difference. However, only in the discussion section it is clearer why some of those differences are expected or surprising. It would be useful to incorporate some of those explanations in the results section other than just having a long list of brain areas and metrics. For example, I found the involvement of visual and auditory areas in the female brain in the mating season very interesting.

      In addition to the reductions in technical explanation suggested by reviewer #2, we replaced some of the description of significant regions with parts of the discussion and vice versa (indicated in purple). This way we incorporated some of the explanations of why certain findings are unexpected or relevant. Furthermore, we added some extra information on why these changes are relevant for the visual system and the cerebellum.

      In line 420: Neuroplasticity of the visual system could be relevant to prepare the birds for the breeding season, where visual cues like ultraviolet plumage colors are important for mate selection (Bennett, Cuthill et al. 1997).

      In line 424: This shows that multisensory neuroplasticity is not limited to the cerebrum, but also involves the cerebellum, something that has not yet been observed in songbirds.

    1. Author Response:

      We first posted this manuscript on bioRxiv in Nov 2020. While it was under consideration, the bioRxiv posting caused another group to submit a competing manuscript to Nucleic Acids Research. Their manuscript was recently published (PMID: 33754639; DOI: 10.1093/nar/gkab158). Although that manuscript is less comprehensive than ours (limited only to the study of the S. pombe SMN complex), the paper included a high-resolution crystal structure of a non-native fusion of SMN (involving a large internal deletion of the yeast protein). Publication of that structure caused us to pause submission of a revised version to another journal and to carry out a disulfide crosslinking experiment aimed at directly testing the two possible models of SMN oligomerization (parallel vs antiparallel). As shown in our revised manuscript, the data are incompatible with the antiparallel model proposed in 10.1093/nar/gkab158.

      Overall, the eLife referees were extremely thorough, and we thank them for their detailed comments that certainly improved the manuscript. The critiques were all readily addressable and no additional experiments were suggested, so it's mostly a subjective decision. We do, however, disagree with one major point regarding overall impact of the study (see below).

      Summary:

      The manuscript describes a very detailed mutagenesis analysis of the dimerization / oligomerization behavior of the protein Survival Motor Neuron. Mutations in this protein cause Spinal Muscular Atrophy. Analysis of disease causing mutations show a correlation with their impact on oligomerization. A structural model that includes different domains of the protein involved in oligomerization is built from these analyses.

      This analysis is an excellent source for researchers working in the field of SMN proteins. A mechanistic interpretation of how changes in the oligomerization lead to the disease or impact the formation of membraneless organelles, is however missing. Thus, the manuscript provides an enormous amount of important mutational analysis data but does not lead to a significant advancement in our understanding of the disease mechanism.

      The referees feel that our study represents an incremental advance in terms of “our understanding of disease mechanism.” Why is the bar placed so high? We did not set out to solve the disease mechanism here. There is a great deal of misconception and misinformation regarding the molecular etiology of SMA. After >25 years of intensive study, researchers still have essentially no idea why low levels of SMN protein cause the disease.

      We know that SMN binds to itself (along with many other proteins) and that it forms a large, heterogeneous complex in vivo. The assumption has been that YG box-mediated oligomerization of SMN is important for its essential function. Is it? Nobody knows for sure. So we set out to ask and answer this question. As detailed below, we disagree with the contention that the manuscript fails to provide mechanistic evidence regarding the formation of SMN oligomers.

      Reviewer #1:

      Gupta et al. provide a very detailed and in depth analysis of the dimerization / oligomerization behavior of the protein Survival Motor Neuron (SMN). The protein is able to use a modified glycine zipper motif to form tightly packed dimers and additional hydrophobic amino acids for higher oligomeric states. Mutations in SMN cause Spinal Muscular Atrophy and the authors show that mutations leading to this disease affect the oligomerization state of the protein.

      Overall, this is a very detailed study using several biophysical techniques and extensive mutagenesis. The data are of high importance for researchers working in the field of SMN proteins.

      A mechanistic link of how these differences in oligomeric states change the cellular behavior leading to Spinal Muscular Atrophy is unfortunately missing.

      There is no evidence that formation of SMN-containing membraneless organelles has anything to do with SMA. The mechanism whereby SMN forms nuclear and cytoplasmic foci is a separate but interesting biological question, well worth investigating. But first, we need to know more about how SMN makes oligomers. We identified the residues involved in this process and, as Referee 3 points out, we presented a “plausible model for higher-order oligomers.”

      The authors stress several times that SMN is part of membraneless organelles. Multivalent interactions are characteristic of such organelles, although they are typically based on "fuzzy" interactions involving low complexity regions (and not all dimerization / oligomerization events can be classified as liquid-liquid phase separation). This limits the impact of this detailed analysis.

      We do not wish to de-emphasize the fact that SMN forms membraneless organelles. Our findings are certainly pertinent to this subject because we know so little about how SMN or any other nuclear body protein (e.g. Coilin, NPAT) self-interact and multimerize. However, so as not to distract the reader in the Introduction, we have moved the subject of biomolecular condensation to the Discussion.

      As for this Referee’s other comments, we agree that not all oligomerization events can be classified as LLPS. We note that the sequence of SMN also contains large regions of low complexity, particularly in the area located between the YG box and Tudor domains of metazoan SMN proteins. The extent to which any of these domains (including the more structured ones) participate in biomolecular condensation is not known. Therefore, understanding the mechanism whereby the SMN YG box forms oligomers represents a critical first step in the exploration of many downstream aspects of SMN function, including LLPS.

      We note that the opto-droplet technique most investigators use to study LLPS inside mammalian cells involves the use of a very well structured self-interaction motif (Cry2) that is then tethered to a low-complexity “tester domain” (together with a fluorescent reporter). This is not unlike the natural arrangement of domains within SMN. Although we did not focus on the “fuzzy” parts of SMN in this manuscript, we feel that the current study provides an important mechanistic foundation for other researchers working on nuclear and cytoplasmic bodies whose proteins may well employ similar strategies.

      While this very detailed analysis is an excellent source for researchers working in this field, the interest beyond SMN proteins will be limited. The paper could also be written in a less dense manner, which would make its reading easier. The main weakness is a missing mechanistic model that can explain how differences in the oligomerization behavior relate to the function of the protein and cause Spinal Muscular Atrophy. The impact of oligomerization on the formation of membraneless organelles would be important.

      This summary paragraph is a re-emphasis of the same points. See our responses above.

      Reviewer #2:

      The current study nicely demonstrates that high-order assembly of SMN protein oligomerization is necessary for animal survival and is dependent on a motif exposed to YG zipper dimers. Mutations in the human SMN1 gene have been shown to cause a neurodegenerative disease named Spinal Muscular Atrophy (SMA). About 50% of the SMA-causing mutations are located in the YG zipper domain. The authors used multi-disciplinary approaches such as biophysical, bioinformatic, computational and genetic approaches to demonstrate that a set of YG box amino acids in SMN protein are not involved in the dimerization process and that formation of high-order oligomers is dependent on these residues. Importantly, mutating key residues within this new structural domain impairs SMN dimerization and causes motor dysfunction as well as viability defects in Drosophila. Overall, this is a well-written paper that offers new insights into the structural and functional aspects of SMN protein. The authors should consider addressing the following issues:

      1) The authors should discuss the impact of the YG zipper domain mutations on snRNP biogenesis. SMN protein is a master regulator of snRNP biogenesis. It is a little surprising that the authors did not mention snRNP biogenesis in the whole manuscript.

      Apologies if we did not mention it prominently enough; the biogenesis of small nuclear RNPs is an area of great interest to our group. However, using the same Drosophila model system, we have already devoted several entire manuscripts to this very subject (e.g. Garcia et al. 2016; Garcia et al. 2013; Praveen et al. 2012). Using qRT-PCR and RNA-seq we showed that hypomorphic point mutations that cause milder forms of the human disease show few overall defects in pre-mRNA splicing in flies. They do, however, cause SMA-like phenotypes (e.g. locomotor defects in larvae and adults, reduced adult lifespan, etc). For additional details, see Spring et al. (2019).

      Thus, to the best of our current ability, we feel we have already addressed the question of snRNP biogenesis. That is: the pre-mRNA splicing-related phenotypes seen in the null mutants can indeed be separated from the neuromuscular defects observed in the hypomorphs.

      2) The authors should provide evidence that their transgenic lines express the desired transgene. A WB or qPCR would be great (even as supplementary data).

      Again, this is an extremely well characterized model system. We have already published WB and even RNA-seq data showing that the transgenic lines express the desired transgenes. See Praveen et al. 2014; 2012. For each of the transgenic fly lines, we PCR genotype DNA from the founder lines and test each of them with anti-Flag and anti-SMN antibodies prior to using them in experiments.

      3) Page 12: The authors stated "Both missense mutations display early onset SMA-like phenotypes". Was it age-dependent phenotype? Did adult animals show a more severe motor dysfunction?

      We changed the language here to be more precise. The mutations in question are Y208C and Y208A. The Y208C mutation causes SMA in humans and the fly model of it has been described (Spring et al. 2019). Y208A has not been identified in humans. Both alleles are considered Class 2 in flies (Figs. 5A and S4). These animals display locomotor phenotypes during larval stages and undergo developmental arrest during pupation. As shown in Figs 6C and 6D, Y208A is more severe than Y208C; we never observe Y208A adults, whereas a small fraction of Y208C animals can complete development and eclose.

      4) There are few statements that the authors should consider making clear. Here is an example "Presumably, the structural changes associated with Cys and Val substitutions do interfere with some aspect of SMN biology, leading to the intermediate and severe SMA phenotypes observed". What do you mean by some aspects? Oligomerization, stability or anything else?

      Thank you. This issue has been addressed throughout the text. With respect to the specific example noted above, that sentence was deleted.

      5) There are few typos throughout the manuscript that the authors should correct (western should be written as Western).

      It turns out that, in reference to various blotting techniques, the referee is incorrect: western and northern blotting are not capitalized. The only one that is capitalized is Southern blotting. This procedure is named after Edwin Southern, who first described it. When techniques for blotting RNAs and proteins were subsequently developed, those investigators cheekily named them in homage to Ed. In this context, western and northern are not compass directions, and so they are NOT generally capitalized. We are nevertheless happy to follow whatever editorial house style the journal uses. Also note that we have thoroughly re-checked the rest of manuscript for typographical errors.

      Reviewer #3:

      In "Assembly of higher-order SMN oligomers is essential for animal viability, requiring a motif exposed in YG zipper dimers," Gupta et al. present an impressive amount of data regarding the solution behavior of constructs of the protein SMN1 (or just SMN) from Homo sapiens, Drosophila melanogaster, and Schizosaccharomyces pombe. Defects in the Hs protein are known to cause the neuromuscular disease "Spinal Muscular Atrophy" (SMA). They also present experiments in genetically modified organisms (fission yeast and fruit flies) to test their hypotheses. Bioinformatics are used to generate and refine hypotheses. The potential power of these complementary methods is substantial, if employed well.

      The main finding of these researchers is that the oligomerization potential of SMN and its disease-causing variants (usually in complex with the protein Gemin 2 or "G2") mostly correlates with phenotype severity. In humans, this is correlated with the Type of SMA (I/0 for severe disease, ranging to IV for a milder form), and in fruit flies and yeast, it is correlated with viability and, in some cases, animal behavior. The results are extended through the creation of a model that purports to show how higher-order SMN oligomers can form.

      Strengths:

      The experiments appear to have been carried out competently. There is a virtual mountain of data presented in this paper, and, for the most part, they are summarized in a digestible fashion. The effort to correlate the biophysical solution data with observable phenotypes in human patients or genetically modified organisms is laudable, and it is done in a thoughtful fashion. The authors' structural intuition and savvy enables the generation of testable models that are explored in the paper. A plausible model for higher-order oligomers is presented.

      We thank the referee for their thorough reading and insightful summary of the approach and its strengths.

      Weaknesses:

      The most serious weakness of the paper is that the data cannot support the conclusion stated in the title, i.e. that multimerization of SMN is necessary for organismic viability. Instead, the data support an already-stated, decades-old conclusion (see their reference 21): that multimerization correlates with disease (viability). Even if the reader takes into account the new information about a multimerization interface that is separate from the dimerization one, the advance seems incremental.

      We understand why the referee might take the position that the overall advance seems incremental. We chose this manuscript title because the decades-old ‘conclusion’ to which the referee refers is based on a false assumption. The false assumption is that the self-oligomerization activity of the YG box is required for SMN function. Note that Ref. 21 (Lefebvre et al. 1995) is the positional cloning paper for the SMN locus, not the publication that first showed the correlation between SMN self-binding and SMA severity (Lorson et al. 1998). To date, no one has shown that oligomerization of SMN (n > 2) is actually required for its function.

      Thus, the whole point of our going through a painstaking mutational analysis was to uncover a separation-of-function allele: namely, one that maintains the dimerization activity of SMN but interferes with its higher-order oligomerization activity. Y208A (Y277A in human) is one such allele. We show that this allele fails to form higher-order oligomers in vitro and that it causes locomotor dysfunction and early lethality in vivo. The referee does not explain the basis for discounting our conclusion. Is it because we did not specifically demonstrate a lack of tetramer formation for this mutant in vivo (a nearly impossible experiment, btw)?

      Given the overall slow rate of progress on SMN ultrastructure in the literature, we felt that the identification of this second tetramer-forming interface is an important finding. We understand if the editors feel that we have not reached the rather subjective threshold necessary for the manuscript to be published in eLife. But we disagree with the referee on the idea that our data fail to support the conclusion.

      The large amount of data leads to numerous difficulties for the reader in the text:

      As the referee mentions above, there is a veritable mountain of data in this manuscript and presenting it clearly and succinctly has been a challenge. Finding the right balance between too much detail for the casual reader and too little detail for the cognoscenti is a difficult task. Add to that the breadth of techniques employed, and it is nearly impossible to please everybody. We thank the referee for pointing out some of the areas they found to be problematic.

      1) Complex biophysical measurements, due to space, are usually summarized by one or two words in tabular format.

      2) When these measurements are shown, there is no visual context for the reader to assess the pre-digested conclusions that are included in the figures. For example, all SEC-MALS data show a conclusion ("Tetramer-Octamer"), but there is no visual cue for the reader to know what the theoretical masses for these species are (so that the reader may draw an independent conclusion).

      There are more than eighty different SEC-MALS experiments presented in this manuscript. We simply cannot display a visual representation for all of them. Most of these constructs contain point substitutions and so they have equivalent theoretical masses. We showed traces for the wild-type constructs for each of the four species analyzed (human, fly, nematode and yeast), as well as for the mutations that are highlighted in Figs 5 and 6.

      To provide a better visual cue, we re-formatted the Y-axes of all of the SEC-MALS traces in the paper. As you can now see, the tick marks for the molar masses are expressed as units of a single SMN•Gem2 heterodimer. In this way, the oligomerization of yeast, fly, human and nematode heterodimers can all be compared by simply counting up the rungs on the right hand side of the Y-axis.
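
The rescaling described above amounts to dividing each measured molar mass by the theoretical mass of one SMN•Gem2 heterodimer. A minimal sketch follows, in which the function name, the heterodimer mass and the example readings are all hypothetical placeholders rather than values from the manuscript:

```python
# Hypothetical numbers for illustration; not measurements from the manuscript.
def to_heterodimer_units(molar_mass_kda: float, heterodimer_mass_kda: float) -> float:
    """Express a SEC-MALS molar mass as a multiple of one SMN-Gem2 heterodimer."""
    if heterodimer_mass_kda <= 0:
        raise ValueError("heterodimer mass must be positive")
    return molar_mass_kda / heterodimer_mass_kda


HETERODIMER_KDA = 50.0  # assumed placeholder mass of a single heterodimer
for measured in (100.0, 200.0, 400.0):  # assumed example readings
    units = to_heterodimer_units(measured, HETERODIMER_KDA)
    print(f"{measured:.0f} kDa -> {units:.1f} heterodimer units")
```

Because each species has its own heterodimer mass, the divisor would differ per construct, which is what makes the rescaled axes directly comparable across species.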

      In some cases, the conclusions reached in the paper are not clearly supported by the data or are self-contradictory. An example is the discussion of the residue H273 (human numbering). In Fig. 4B, the mutation H273R is said to have a wild-type "Oligomer Status". But in Fig. 5B, it is "Dimer-Tetramer+". The text says that H273R is "only partially impaired" in forming oligomers; the authors apparently mean the data presented in Fig. 5B but refer to the contradictory result in Fig. 4B.

      The disconnect between Fig 4B and 5B was a simple cut/paste error that has been corrected. First of all, the primary data are the same. Those data are shown visually in Fig. 5B and in tabular form in Fig. 4B. The biophysical parameters of this particular mutant (H273R) are, however, unusual and hard to describe. Although high MW species are detected (some of which appear to be very high MW aggregates that are on the leading edge of the peak), the mutation also clearly pushes the equilibrium toward dimers (and in the case of the fly ortholog, monomers). We have revised the text and the display elements to so indicate.

      Another example centers on the discussion of the putative "dominant-negative" effect of some human missense mutations. But they do not point to any human data that support this contention (SMA-associated missense mutations are usually discovered in mixed heterozygotes that have a deletion in the other copy of the Smn gene), but they cite data that suggest a more nuanced position regarding negative dominance would be appropriate.

      Due to the variable copy number of SMN2 genes and the extremely low frequency of SMA patients bearing SMN1 point mutations, the human data on this topic are a complete minefield. So the fact that we do not point to any human data on this is irrelevant. Less complex model systems often point the way to uncover nuances like this that are later identified in humans.

      We were the first to show a dominant phenotypic effect for certain SMN point mutants in vivo (Praveen et al. 2014). Specifically, we showed that two YG box missense mutants (dmY203C/hsY272C and dmM194R/hsM263R) have a more severe phenotype than the SmnX7 micro-deletion null mutation. That is, animals that contain no zygotic SMN protein actually live longer than the ones that express these missense mutations. The fact that the dmG206S/hsG275S mutation fails to bind to the wild-type protein is consistent with the slightly milder phenotype we observe for G206S vs Y203C and M194R (Spring et al. 2019). All three of these mutant proteins (human or fly) fail to self-interact (many previous papers), but only G206S fails to interact with wild-type SMN (Fig 4C).

      We included the data here because they help illustrate the conservation between human and fly systems. We also note that, in their detailed comments (see detailed subpoint 9c, below), this same referee called this finding “ground breaking.” We apologize for the lack of clarity in this section and have revised the text.

      Finally, the paper suffers throughout from a lack of precision of language that undercuts its conclusions at numerous points. They continually rely on qualitative statements rather than hard, statistically rigorous facts, e.g. "more intimate," "a bit of a sequence outlier," "very modest."

      This has been addressed.

    1. Author Response:

      Reviewer #1:

      This manuscript by Gabor Tamas' group defines features of ionotropic and metabotropic output from a specific cortical GABAergic cell type, so-called neurogliaform cells (NGFCs), by using electrophysiology, anatomy, calcium imaging and modelling. Experimental data suggest that NGFCs converge onto postsynaptic neurons with sublinear summation of ionotropic GABAA potentials and linear summation of metabotropic GABAB potentials. The modelling results suggest a preferential spatial distribution of GABA-B receptor-GIRK clusters on the dendritic spines of postsynaptic neurons. The data provide the first experimental quantitative analysis of the distinct integration mechanisms of GABA-A and GABA-B receptor activation by the presynaptic NGFCs, and especially give insights into the logic of the volume transmission and the subcellular distribution of postsynaptic GABA-B receptors. Therefore, the manuscript provides novel and important information on the role of the GABAergic system within cortical microcircuits.

      We have made all changes humanly possible under the current circumstances and we are open to further suggestions deemed necessary.

      Reviewer #2:

      The authors present a compelling study that aims to resolve the extent to which synaptic responses mediated by metabotropic GABA receptors (i.e. GABA-B receptors) summate. The authors address this question by evaluating the synaptic responses evoked by GABA released from cortical (L1) neurogliaform cells (NGFCs), an inhibitory neuron subtype associated with volume neurotransmission, onto Layer 2/3 pyramidal neurons. While response summation mediated by ionotropic receptors is well-described, metabotropic receptor response summation is not, thereby making the authors' exploration of the phenomenon novel and impactful. By carrying out a series of elegant and challenging experiments that are coupled with computational analyses, the authors conclude that summation of synaptic GABA-B responses is linear, unlike the sublinear summation observed with ionotropic, GABA-A receptor-mediated responses.

      The study is generally straightforward, even if the presentation is often dense. Three primary issues worth considering include:

      1) The rather strong conclusion that GABA-B responses linearly summate, despite evidence to the contrary presented in Figure 5C.

      2) Additional analyses of data presented in Figure 3 to support the contention that NGFCs co-activate.

      3) How the MCell model informs the mechanisms contributing to linear response summation.

      These and other issues are described further below. Despite these comments, this reviewer is generally enthusiastic about the study. Through a set of very challenging experiments and sophisticated modeling approaches, the authors provide important observations on both (1) NGFC-PC interactions, and (2) GABA-B receptor mediated synaptic response dynamics.

      The differences between the sublinear, ionotropic responses and the linear, metabotropic responses are small. Understandably, these experiments are difficult – indeed, a real tour de force – from which the authors are attempting to derive meaningful observations. Therefore, asking for more triple recordings seems unreasonable. That said, the authors may want to consider showing all control and gabazine recordings corresponding to these experiments in a supplemental figure. Also, why are sublinear GABA-B responses observed when driven by three or more action potentials (Figure 5C)? It is not clear why the authors do not address this observation considering that it seems inconsistent with the study's overall message. Finally, the final readout – GIRK channel activation – in the MCell model appears to summate (mostly) linearly across the first four action potentials. Is this true and, if so, is the result inconsistent with Figure 5C?

      We investigated GABAB responses elicited by three and four presynaptic NGFC action potentials to better understand the extremes of the NGFC-PC connection. Although our spatial model suggests that, at a single point in the L1 volume, one or two NGFCs could provide a GABAB response through their respective volume transmission, it is still important that in a minority of cases three or more NGFCs could converge their output. The experiments in Fig 5 not only offer the mechanistic insight that possible HCN channel activation and GABA reuptake do not significantly influence the summation of metabotropic receptor-mediated responses, but also provide additional information about extensive GABAB signaling from more than two NGFC outputs. Interestingly, in this experiment summation up to two action potentials shows linear integration very similar to that seen in the triplet recordings. This result suggests that temporal and spatial summation are identical when limited inputs arrive at the postsynaptic target cell. A similar summation can be seen in our model up to two consecutive GABA releases. Whereas three or four consecutive GABA releases in our model still produce linear summation, our experiments show moderate sublinearity. One possible explanation for this inconsistency is vesicle depletion in NGFCs after multiple rapid releases of GABA, which was not taken into account in our model.

      Presumably, the motivation for Figure 3 is that it provides physiological context for when NGFCs might be coactive, thereby providing the context for when downstream, PC responses might summate. This is a nice, technically impressive addition to the study. However, it seems that a relevant quantification/evaluation is missing from the figure. That is, the authors nicely show that hind limb stimulation evokes responses in the majority of NGFCs. But how many of these neurons are co-active, and what are their spatial relationships? Figure 3D appears to begin to address this point, but it is not clear if this plot comes from a single animal, or multiple? Also, it seems that such a plot would be most relevant for the study if it only showed alpha-actin 2-positive cells. In short, can one conclude that nearby, presumptive NGFCs co-activate, and is this conclusion derived from multiple animals?

      The aim of Fig. 3D was to indicate that the active cells, presumably NGFCs, are located close to each other. The figure comes from a single animal. We agree with the reviewer and have therefore replaced the scatter plot in Fig. 3D with one that provides information about the molecular profiles of the active and inactive cells. We also further analyzed our in vivo data and the spatial localization of the monitored interneurons (see Author response image 3). The results are from 4 different animals; in these experiments numerous L1 interneurons are active during the sensory stimulus, as shown in the scatter plot. We calculated the shortest distance between all active cells and all ɑ-actinin2+ cells that were active in the experiments. The data suggest that, for identified active ɑ-actinin2+ cells, the interneuron somas were on average 182.69 ± 60.54 or 305.135 ± 34.324 μm from each other. Data from Fig. 2D indicate that the average axonal arborization of NGFCs reaches ~200-250 μm. Taking these two results together, it is in theory probable that the spatial localization would allow neighboring NGFCs to interact directly at the same point in space.
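      The distance analysis described above can be illustrated with a minimal sketch. It assumes soma positions are available as 2-D coordinates in micrometres. The coordinates, the function name, and the 250 μm reach threshold (taken from the upper end of the ~200-250 μm range quoted above) are illustrative assumptions, not data or code from the study.

```python
import math

def nearest_neighbor_distances(points):
    """For each soma, return the distance to the closest other soma."""
    distances = []
    for i, p in enumerate(points):
        others = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        distances.append(min(others))
    return distances

# Hypothetical soma coordinates in micrometres (not measurements from the study).
active_somas = [(0.0, 0.0), (150.0, 0.0), (150.0, 200.0), (400.0, 200.0)]

nn = nearest_neighbor_distances(active_somas)
mean_nn = sum(nn) / len(nn)

# Assumed axonal reach: upper end of the ~200-250 um range quoted in the response.
AXON_REACH_UM = 250.0
within_reach = [d <= AXON_REACH_UM for d in nn]
```

      Comparing each nearest-neighbor distance with the assumed axonal reach gives a rough indication of whether two cells could release GABA onto the same point; it says nothing about whether they actually do.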

      The inclusion of the diffusion-based model (MCell) is commendable and enhances the study. Also, the description of GABA-B receptor/GIRK channel activation is highly quantitative, a strength of the study. However, a general summary/synthesis of the observations would be helpful. Moreover, relating the simulation results back to the original motivation for generating the MCell model would be very helpful (i.e. the authors asked whether "linear summation was potentially a result of the locally constrained GABAB receptor - GIRK channel interaction when several presynaptic inputs converge"). Do the model results answer this question? It seems as if performing "experiments" on the model wherein local constraints are manipulated would begin to address this question. Why not use the model to provide some data – albeit theoretical – that begins to address their question?

      We re-formulated the problem to be addressed in this Results section. We acknowledge in the Discussion that our model has several limitations and, consequently, we restricted its application to a limited set of quantitative comparisons paired to our experimental dataset or directly related to pioneering studies on GABAB efficacy on spines vs shafts. We believe that a proper answer to the reviewer’s suggestion would be worth a separate and dedicated study with an extended set of parameters and an elaborated model.

      In sum, the authors present an important study that synthesizes many experimental (in vitro and in vivo) and computational approaches. Moreover, the authors address the important question of how synaptic responses mediated by metabotropic receptors summate. Additional insights are gleaned from the function of neurogliaform cells. Altogether, the authors should be congratulated for a sophisticated and important study.

      Reviewer #3:

      The authors of this manuscript combine electrophysiological recordings, anatomical reconstructions and simulations to characterize synapses between neurogliaform interneurons (NGFCs) and pyramidal cells in somatosensory cortex. The main novel finding is a difference in summation of GABAA versus GABAB receptor-mediated IPSPs, with a linear summation of metabotropic IPSPs in contrast to the expected sublinear summation of ionotropic GABAA IPSPs. The authors also provide a number of structural and functional details about the parameters of GABAergic transmission from NGFCs to support a simulation suggesting that sublinear summation of GABAB IPSPs results from recruitment of dendritic shaft GABAB receptors that are efficiently coupled to GIRK channels.

      I appreciate the topic and the quality of the approach, but there are underlying assumptions that leave room to question some conclusions. I also have a general concern that the authors have not experimentally addressed mechanisms underlying the linear summation of GABAB IPSPs, reducing the significance of this most interesting finding.

      1) The main novel result of broad interest is supported by nice triple recording data showing linear summation of GABAB IPSPs (Figure 4), but I was surprised this result was not explored in more depth.

      We have chosen the approach of studying GABAB-GABAB interactions through the scope of neurogliaform cells and explored how neurogliaform cells as a population might give rise to the summation properties studied with triple recordings. This was a purposeful choice admittedly neglecting other possible sources of GABAB-GABAB interactions which possibly take place during high frequency coactivation of homogeneous or heterogeneous populations of interneurons innervating the same postsynaptic cell. We agree with the reviewer that the topic of summation of GABAB IPSPs is important and in-depth mechanistic understanding requires further separate studies.

      2) To assess the effective radius of NGFC volume transmission, the authors apply quantal analysis to determine the number of functional release sites to compare with structural analysis of presynaptic boutons at various distances from PC dendrites. This is a powerful approach for analyzing the structure-function relationship of conventional synapses but I am concerned about the robustness of the results (used in subsequent simulations) when applied here because it is unclear whether volume transmission satisfies the assumptions required for quantal analysis. For example, if volume transmission is similar to spillover transmission in that it involves pooling of neurotransmitter between release sites, then the quantal amplitude may not be independent of release probability. Many relevant issues are mentioned in the discussion but some relevant assumptions about QA are not justified.

      Indeed, pooling of neurotransmitter between release sites may affect quantal amplitude. We therefore examined quantal amplitude under low release probability conditions, using 0.7-1.5 mM [Ca]o to detect postsynaptic uniquantal events initiated by neurogliaform cell activation (Author response image 7). The quantal current amplitudes measured this way did not differ significantly from those obtained with the BQA method (4.46±0.83 pA, n=4, P=0.8, Mann-Whitney Test).

      3) The authors might re-think the lack of GABA transporters in the model since the presence and characteristics of GATs will have a large effect on the spread of GABA in the extracellular space.

      We agree that the presence of GAT could effectively shape the GABA exposure, e.g. (Scimemi 2014). During the development of the model, we took into consideration different possibilities and solutions to create the model’s environment. To our knowledge, there is no detailed electron microscopic study that would provide ultrastructural measurements of structural elements around the NGFC release sites and postsynaptic pyramidal cell dendrites in layer 1 while preserving the extracellular space. Moreover, quantitative information is scarce about the exact localization and density of the GATs along the membrane surface of glial processes around confirmed NGFC release sites. We felt that developing a functional environment that would contain GABA transporters without possessing such information would be speculative. Furthermore, during the development of the model it became clear that incorporating thousands of differentially located GABA transporters would massively increase the processing time of single simulations including monitoring each interaction between GATs and GABA molecules, and requiring computational power calculating the diffusion of GABA molecules in the extracellular space, even if GABA molecules are far from the postsynaptic dendritic site without any interaction.

      As an admittedly simple and constrained alternative, we decided to set a decay half-life for the GABA molecules released. This approach allows us to mimic the GABA exposure time of 20-200 ms, based on experimental data (Karayannis et al 2010). In the model the GABA exposure time was 114.87 ± 2.1 ms with decay time constants of 11.52 ± 0.14 ms. After ~200 ms all the released GABA molecules disappeared from the simulation environment.
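      As a rough illustration of this simplification, the sketch below assumes that released GABA is removed by single-exponential decay with the reported time constant of 11.52 ms. The single-exponential form and the function name are assumptions made for illustration; the response does not specify how the decay was implemented in the model, and the sketch does not attempt to reproduce the reported exposure time.

```python
import math

TAU_MS = 11.52  # decay time constant reported in the response (ms)

def fraction_remaining(t_ms, tau_ms=TAU_MS):
    """Fraction of released GABA still present after t_ms, assuming exp(-t/tau)."""
    return math.exp(-t_ms / tau_ms)

# Half-life implied by a single-exponential decay: t_half = tau * ln(2).
half_life_ms = TAU_MS * math.log(2)

# Fraction left at 200 ms, the time by which the response says all GABA is gone.
remaining_at_200ms = fraction_remaining(200.0)
```

      Under this assumption the implied half-life is about 8 ms and the fraction remaining at 200 ms is negligible, which is consistent with the statement that all released molecules have disappeared by then.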

      A detailed treatment of extracellular diffusion was outside the scope of our model; we were interested in investigating how the subcellular localization of receptors and channels determines the summation properties.

      4) I'm not convinced that the repetitive stimulation protocol of a single presynaptic cell shown (Figure 5) is relevant for understanding summation of converging inputs (Figure 4), particularly in light of the strong use-dependent depression of GABA release from NGFCs. It is also likely that shunting inhibition contributes to sublinear summation to a greater extent during repetitive stimulation than summation from presynaptic cells that may target different dendritic domains. The authors claim that HCN channels do not affect integration of GABAB IPSPs but one would not expect HCN channel activation from the small hyperpolarization from a relatively depolarized holding potential.

      Use-dependent synaptic depression of NGFC-induced postsynaptic responses was nicely documented by Karayannis and coworkers (2010), although they investigated the GABAA component of the responses and found that the depression is caused by desensitization of postsynaptic GABAA receptors. We are not aware of published experiments on the short-term plasticity of GABAB responses. In our experiments represented in Fig 5 we found linearity in the summation of GABAB responses up to two action potentials and sublinearity for 3 and 6 action potentials. In fact, our results show that no synaptic depression is detectable in response to paired pulses, since the amplitudes of the voltage responses were doubled compared to a single pulse, which means that the paired pulse ratio is around 1. To verify our result, we repeated our dual recording measurements with one, two, three and four spikes initiated in the presynaptic neurogliaform cell (Author response image 6). Measuring both the amplitude and the overall charge of GABAB responses, we again found a linear relationship between the one- and two-spike initiation protocols.
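      The notion of linear versus sublinear summation used here can be made concrete with a minimal sketch: the measured response to n action potentials is divided by n times the single-spike response, so a ratio near 1 indicates linear summation and a ratio below 1 indicates sublinearity. The amplitudes, the function names, and the 5% tolerance are hypothetical values chosen for illustration, not measurements or criteria from the study.

```python
def linearity_ratio(single_response, summed_response, n_inputs):
    """Measured summed response divided by the arithmetic sum of n single responses."""
    expected = n_inputs * single_response
    return summed_response / expected

def classify(ratio, tolerance=0.05):
    """Label a ratio; the 5% tolerance is an arbitrary illustrative choice."""
    if ratio < 1.0 - tolerance:
        return "sublinear"
    if ratio > 1.0 + tolerance:
        return "supralinear"
    return "linear"

# Hypothetical normalized response amplitudes for 1 to 4 presynaptic spikes.
single = 1.0
measured = {1: 1.0, 2: 2.0, 3: 2.7, 4: 3.2}

ratios = {n: linearity_ratio(single, amp, n) for n, amp in measured.items()}
labels = {n: classify(r) for n, r in ratios.items()}
```

      With these made-up numbers, a paired pulse ratio of about 1 corresponds to a linearity ratio of 1 for two spikes, while the three- and four-spike responses fall below the arithmetic sum.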

      Author response image 6 - Integration of GABAB receptor-mediated synaptic currents (A) Representative recording of a neurogliaform synaptic inhibition on a voltage clamped pyramidal cell. Bursts of up to four action potentials were elicited in NGFCs at 100 Hz in the presence of 1 μM gabazine and 10 μM NBQX (B) Summary of normalized IPSC peak amplitudes (left) and charge (right). (C) Pharmacological separation of neurogliaform initiated inhibitory current.

  4. May 2021
    1. Author Response:

      Reviewer #1:

      The primary objective of this manuscript was to examine if multi-kinase inhibitor YKL-05-099 can inhibit salt inducible kinases (SIKs) with the goal to examine a new class of bone anabolic agents for the treatment of osteoporosis. They found that YKL-05-099 was successful in increasing anabolism and, surprisingly, decreasing bone resorption, leading them to investigate why this inhibitor differed from the effects of deletion of SIK2 and SIK3. They found that YKL-05-099 also inhibited the CSF1 (M-CSF) receptor, thus, inhibiting osteoclast activity. This is an interesting manuscript but there are some flaws in the conduct of the experiments and in the analyses which lessen its impact. Nevertheless, it opens the way for another possible oral therapeutic for osteoporosis.

      Reviewer #2:

      This work tests the ability of a kinase inhibitor to increase bone mass in a mouse model of osteoporosis. The inhibitor, which targets SIK and other kinases, was shown previously by these investigators to increase trabecular bone mass in young intact mice. Here they show that it increases trabecular, but not cortical, bone in oophorectomized mice and that this is associated with increased bone formation and little or no effect on bone resorption. In contrast, postnatal deletion of SIK2 and SIK3 increased both bone formation and resorption, suggesting that the inhibitor targets other kinases to control resorption. Indeed, the authors confirm that the inhibitor effectively suppressed the activity of CSF1R, a receptor tyrosine kinase essential for osteoclast formation. The authors also provide some evidence of unwanted effects of the inhibitor on glucose homeostasis and kidney function.

      Overall, the studies are performed well with all the necessary controls. The effects of the inhibitor on CSF1R inhibition are convincing and provide a compelling explanation for the net effects of the compound on the skeleton.

      1) The ability of the inhibitor to increase trabecular but not cortical bone mass will likely limit its appeal as an anabolic therapy. Indeed, the authors show that PTH, but not the inhibitor, increases bone strength. However, this limitation is not addressed in the manuscript. In addition, the mechanisms leading to these site-specific effects were not explored.

      We thank the reviewer for bringing up this important point. We have expanded the fifth paragraph of our discussion to include this important limitation, and to review potential mechanisms explaining this apparent compartment-selective effect of YKL-05-099:

      "Finally, our OVX study demonstrated that YKL-05-099 treatment increased trabecular, but not cortical, bone mass (Figure 1). In contrast, sclerostin antibody treatment increases both trabecular and cortical bone mass (58). At this point, we do not understand the mechanistic basis of this compartment-selective effect of this small molecule. Analysis of cortical bone at multiple time points after YKL-05-099 treatment is needed to conclusively demonstrate the absence of a cortical bone effect of this compound. However, Sik2/3 gene deletion appears to preferentially increase remodeling and bone formation on cancellous bone surfaces (Figure 4D). Therefore, it is possible that SIK inhibitors may stimulate remodeling-based bone formation. Further studies using larger animals with dedicated assays to measure modeling versus remodeling based bone formation (59) are needed to assess this possibility. Moreover, our current studies do not assess whether or not SIK inhibitors, like PTH or sclerostin antibody treatment, stimulate bone formation by activation of previously-quiescent bone lining cells (60, 61). The relative contribution of sclerostin suppression to YKL-05-099-mediated bone formation also remains to be determined."

      2) The mechanisms by which YKL-05-099 increases bone formation remain unclear. The authors point out that their previous studies indicate that the compound stimulates bone formation by suppressing expression of sclerostin. However, YKL-05-099 increased trabecular bone in the femur but not spine of intact mice and did not increase cortical bone in intact or OVX mice. In contrast, neutralization of sclerostin increases trabecular bone at both sites in intact mice as well as increases cortical bone thickness. These differences do not support the idea that YKL-05-099 increases bone formation by suppressing sclerostin.

      This is also a very important point, and has been addressed in the revised Discussion as detailed immediately above in point #1.

      3) The authors repeatedly state that the kinase inhibitor uncouples bone formation and bone resorption. However, the authors do not provide any direct evidence that this is the case. Although the term coupling is used to refer to a variety of phenomena in skeletal biology, the most common definition, and the one used in the review cited by the authors, is the recruitment of osteoblasts to sites of previous resorption. The authors certainly provide evidence that the kinase inhibitor independently targets bone formation and bone resorption, but they do not provide evidence that the mechanisms leading to recruitment of osteoblasts to sites of previous resorption has been altered. The resorption that takes place in the inhibitor-treated mice likely still leads to recruitment of osteoblasts to sites of resorption. Thus coupling remains intact.

      We thank the reviewer for raising this very important perspective. Here, we have relied upon serum bone turnover markers and histomorphometry on trabecular bone surfaces to measure bone formation and bone resorption. Others in the field have performed dynamic histomorphometry to assess calcein labeling adjacent to associated cement lines in order to distinguish between modeling and remodeling-based bone formation. Such analysis is operator dependent and nearly impossible to perform on trabecular surfaces in mice. The vast majority of published studies where this endpoint is reported are in non-human primates (ref 59) or human biopsy samples. Our claim here that YKL-05-099 treatment stimulates bone formation without increasing bone resorption is based upon the analytic methods that were used which demonstrate that this compound increases bone formation (BFR/BS by histomorphometry, osteoblast numbers by histomorphometry, and serum P1NP) without simultaneously stimulating bone resorption (osteoclast numbers and eroded surface by histomorphometry and serum CTX). Therefore, we have added the following new text in the Discussion:

      "Our current claim that YKL-05-099 ‘uncouples’ bone formation and bone resorption is based upon our histomorphometry and serum bone turnover marker data which clearly show that this agent increases bone formation without increasing the measured resorption-related parameters. Future study is needed using dynamic histomorphometry in association with cement line visualization (62) to see if this compound can stimulate modeling-based bone formation independent of bone resorption."

      4) The results of the current study nicely confirm previous findings by the same authors, demonstrating the reproducibility of the effects of the inhibitor. They also provide a compelling explanation for the net effect of the inhibitor on bone resorption (it stimulates RANKL expression but inhibits CSF1 action). While this latter finding will likely be of interest to those exploring SIK inhibitors for therapeutic uses, overall this study may be of limited appeal to a broader audience.

      We thank the reviewer for raising this point, and would like to take this opportunity to highlight the novel aspects of the current work:

      1) The first demonstration of dual kinase targeting by YKL-05-099 and the potential of this dual targeting to be exploited for therapeutic purposes in bone

      2) The first investigation of the effects of SIK inhibitor treatment in an osteoporosis disease model

      3) The first characterization of postnatal Sik2/3 mutant mice and demonstration that these animals only display obvious changes in bone biology

      4) Use of conformation-specific kinase modeling to demonstrate shared common features between how YKL-05-099 might engage two distinct kinases

      Given the dearth of novel bone anabolic targets, the enormous public health problem of osteoporosis, and the widespread interest in SIKs for multiple disease indications, we believe that the novel findings presented here are important for a broad audience.

      Reviewer #3:

      In this study, Tang and colleagues report that the multikinase inhibitor YKL-05-099 increases bone formation and decreases bone resorption in hypogonadal female mice with mechanisms that are likely to involve inhibition of SIKs and CSF1R, respectively. The authors also report that postnatal mice with inducible, global deletion of SIK2 and SIK3 show an increase of bone mass that is associated with an augmentation of both bone formation and bone resorption.

      The paper provides novel and interesting information with potentially highly relevant translational implications. The quality of the data is outstanding and most of the authors' conclusions are supported by the data as shown.

    1. Author Response:

      Reviewer #1:

      Sierra M. Barone developed an automated, quantitative toolkit for immune monitoring that would span a wide range of possible immune changes, identify and phenotype statistically significant cell subsets, and provide an overall vector of change indicating both the direction and magnitude of shifts, either in the immune system as a whole or in a key cell subpopulation. The machine learning workflow Tracking Responders Expanding (T-REX) is a modular data analysis workflow including UMAP, KNN, and MEM. T-REX is designed to capture both very rare and very common cell types and place them into a common context of immune change. T-REX was applied to several data types, including a new spectral flow cytometry dataset and three existing mass cytometry datasets.

      The conclusions of this paper are mostly well supported by data, but one aspect needs to be clarified and extended. Cytometry tools like SPADE, FlowSOM, Phenograph, Citrus, and RAPID generally work best to characterize cell subsets representing >1% of the sample and are less capable of capturing extremely rare cells or subsets distinguished by only a fraction of measured features. Tools like t-SNE, opt-SNE, and UMAP embed cells or learn a manifold and represent these transformations as algorithmically-generated axes.

      We appreciate this point and have added a new Figure 7 that quantifies performance vs. commonly used analysis tools.

      The advantages of T-REX tool were not very clear.

      We believe the newly added Figure 7 and associated text in the Results help to show the advantages of T-REX (and specific choices in the algorithm) over other tools and algorithm choices. We make 9 different comparisons and use four widely-used algorithms from cytometry, including t-SNE, FlowSOM, Phenograph, Citrus, and SPADE.

      Reviewer #2:

      This study presents a novel machine learning tool (termed T-REX) for automated analysis of single cell cytometric data that is capable of identifying rare cell populations, such as antigen-specific T cells. This ability to detect low frequency cells is a distinct advantage over existing tools. The demonstration of this ability is appropriately shown by examining antigen-specific CD4+ T cells before and after rhinovirus infection in a challenge study. Useful demonstrations are also included for examining SARS-CoV-2-specific T cells and changes in cellular populations in cancer patients upon treatment. These examples use both mass cytometry and fluorescence-based cytometry. Since both of these are commonly-used single-cell technologies that generate highly complex data sets, new automated analysis methods such as T-REX are needed.

      The first data set examined changes in cell phenotype before and 7 days after rhinovirus infection in healthy adults. The flow cytometric staining panel included markers of T cell differentiation and activation as well as rhinovirus-specific tetramers. The results of T-REX convincingly demonstrate "hotspots" that are expanded at 7 days and enriched for tetramer-staining cells. Thus, this study succeeds at demonstrating the utility of this method for identification of rare cells and the authors use this data set to appropriately determine the model parameters. Combining the results of this algorithm with the "Marker Enrichment Modeling (MEM)" method to characterize the markers expressed on those cell populations identified through T-REX is also very informative since this automates the characterization (that traditionally needs to be done by manual investigation).

      This first data set is relevant for this demonstration, but in some aspects it represents a best case scenario. "Phenotypic" identification of antigen-specific T cells in this way is only possible because the time point was chosen to capture the relatively narrow window when T cells would be activated, and there was access to a baseline sample for comparison. The authors do address the second point, and perform the analysis comparing day 7 to a later time point, day 28, as an appropriate alternative. The first concern limits the generalizability of this approach. In fact, the second example dataset examining mass cytometry data in patients with COVID-19 does in fact demonstrate limited ability to detect change in cell populations for many study participants.

      We appreciate the point that the rhinovirus dataset timepoints were selected carefully to help reveal the antigen-specific T cells and that in other disease settings one may not know the optimal timepoint and may need to sample multiple times. We have added a note addressing this in the Discussion. To the second point, we agree that in the COVID-19 response example it is difficult to pinpoint the antigen-specific cells. We believe this is due to the high degree of overall change in the immune system and agree that the sampled timepoint might also not be ideal (i.e., there are multiple reasons it is a different scenario). This is also captured in the updated Discussion text thinking through the applicability of T-REX to different scenarios. Notably, we predict that in the COVID-19 vaccine response setting, where there is far less overall immune change, T-REX might fruitfully identify antigen-specific cells (this is beyond the scope of the present manuscript, but mentioned as a next step in the Discussion).

      Reviewer #3:

      Barone, Paul et al. present a new computational method, named T-REX, to detect changes in immune cell populations from repeated cytometry measurements (before and after infection or treatment). The proposed method is designed to detect changes in rare and common cells with particular focus on the former. T-REX detects subpopulations of cells showing marked differences in abundance between the proportion of cells from different time points (before and after infection) from a single individual. The method relies on a dimensionality reduction step using UMAP followed by a K-nearest neighbor (KNN) search to identify cells that have a large fraction (>0.95) of neighbors from one time point, indicating expansion or shrinkage of certain cell populations. Areas in the UMAP with clustered expanding or shrinking neighborhoods are labeled as hotspots. Cells in these hotspots were further characterized and enriched markers were identified using MEM, a method published earlier by the same authors. T-REX was applied to a newly collected dataset of rhinovirus infection and three publicly available datasets of SARS-CoV-2 infections, melanoma immunotherapy and AML chemotherapy. The results are presented clearly and the authors discuss in detail several examples in which the cells identified by T-REX have a phenotypic profile which aligns with previous knowledge, indicating the relevance of the results.
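      The KNN step summarized above can be sketched as follows. This is a toy illustration only, not the authors' implementation: it skips UMAP and MEM entirely, uses randomly generated 2-D points as a stand-in for UMAP coordinates, and relies on a brute-force neighbor search. All function names, the cluster layout, and the choice of k = 15 are assumptions; the 0.95 cutoff is the one quoted in the review.

```python
import math
import random

def knn_fraction_after(coords, labels, k):
    """For each cell, fraction of its k nearest neighbours from the 'after' sample."""
    fractions = []
    for i, p in enumerate(coords):
        # Brute-force neighbour search; adequate for a toy data set.
        neighbours = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )[:k]
        fractions.append(sum(labels[j] for j in neighbours) / k)
    return fractions

def flag_change(fractions, cutoff=0.95):
    """Label each cell's neighbourhood using a symmetric cutoff."""
    calls = []
    for f in fractions:
        if f >= cutoff:
            calls.append("expanded")
        elif f <= 1.0 - cutoff:
            calls.append("contracted")
        else:
            calls.append("unchanged")
    return calls

rng = random.Random(0)

# Toy 2-D embedding standing in for UMAP axes; labels: 0 = before, 1 = after.
# 230 background cells (130 before, 100 after) plus a rare cluster of 30 cells
# seen only after, so both time points contribute 130 cells.
background = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(230)]
background_labels = [0] * 130 + [1] * 100
rare_cluster = [(rng.gauss(10, 0.2), rng.gauss(10, 0.2)) for _ in range(30)]

coords = background + rare_cluster
labels = background_labels + [1] * 30

fractions = knn_fraction_after(coords, labels, k=15)
calls = flag_change(fractions)
```

      In this toy data the rare cluster is flagged as expanded because every neighbor of its cells comes from the later time point, whereas neighborhoods in the mixed background mostly stay below the cutoff. With k = 15, a cutoff of 0.95 effectively requires all 15 neighbors to share a time point, which illustrates why the choice of k and cutoff interact.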

      Strengths:

      • T-REX is based on a simple pipeline including UMAP and KNN. This is an advantage especially given the large number of cells collected. Further, the proposed approach has a key advantage since it allows the analysis of one sample at a time, which is practical if one wants to analyze a new sample. There is no need to rerun the analysis on an aggregate of a large number of samples.

      • The new rhinovirus dataset is of great value to the community.

      Weaknesses:

      • The paper lacks a comparison to other methods for differential abundance testing. In particular, it is not clear how T-REX differs from the Differential abundance test proposed by Lun et al. (https://doi.org/10.1038/nmeth.4295). Similarly, there are no experiments or results to support the authors' initial claim that T-REX outperforms current clustering-based methods (SPADE, FLOWSOM, Phenograph,…etc.) in capturing changes in rare (<1%) cell populations.

      We appreciate this point and have selected a few of the top algorithms, as well as algorithms we and collaborators use regularly, and we made direct performance comparisons of accuracy and time to support our initial claim (new Figure 7 and associated Results paragraph).

      • T-REX relies on arbitrary cutoffs (0.95 and 0.5 %) to define expansion or shrinkage in the neighborhood of each cell rather than a formal statistical test. These cut-offs were defined based on the ability to detect tetramer positive cells in one subject only. This greatly limits the generalizability of the method.

      We appreciate these points that the ‘optimal’ k value was determined using a single individual and that a single one size fits all cutoff may not be ideal for all situations. In the case of rhinovirus, we can use the tetramer+ cells as a type of truth, and we proceeded to use the additional patients to test for optimal k-values. The results of this analysis are referred to as numbers in the Results text and we have added text to highlight this point. Briefly, in continued analysis of the rest of the infected rhinovirus subjects, optimal k values ranged from 30 to 80. Additional optimization based on a formal statistical test is something we would like to explore in a subsequent study, but is beyond the scope of this manuscript. We have noted both points around the k-value and the cutoff in new text added to the Discussion. In other biological studies, we have seen a T-REX change cutoff of 85% be useful, so we imagine users will need to test this on their own for their biological questions.

      • The authors do not motivate the use of UMAP prior to the KNN graph reconstruction. While UMAP is a clearly powerful method to visualize single cell data, the resulting embedding can potentially show distinct groups of points when the high dimensional manifold is more continuous. For this reason, KNN graphs are usually built using the high-dimensional data (or principal components).

      We have added a new figure that includes comparing KNN on the UMAP coordinates to KNN on the original high-dimensional feature space as well as other comparisons (Figure 7). The analysis on UMAP or t-SNE axes outperformed analysis of high-dimensional data (termed “original features” in the text) when using KNN and Phenograph. SPADE was the only algorithm that identified both significant regions of change when starting with the original features.

      • Given that T-REX is mainly developed to detect changes in rare cell populations, the paper lacks an assessment of the method's sensitivity. For instance, cells were subsampled equally from each time point. An assessment of the effects of this subsampling step is necessary. In general, a guide to the users indicating the limitations of T-REX will be greatly helpful.

      We appreciate this point and have added a new Supplemental Figure 7 to assess the sensitivity of T-REX with subsampling. We have also expanded on the limitations and uses of T-REX in the discussion.

      • Given that the main aim of T-REX is to detect differences in rare cells, the rationale for performing a separate analysis for CD4 positive cells is not clear. One would expect these differences to be identified also in the analysis performed using all cells.

      We appreciate this point and agree that T-REX using all the cells in the rhinovirus study could be interesting. The main focus in this study was on CD4+ T cells specifically, as rhinovirus is known to induce expansion of circulating virus-specific CD4+ T cells in the blood and these T cells were the only ones marked by tetramers in this study. We did run T-REX on all the cells from a rhinovirus subject (RV001) and were still able to capture the rare virus-specific CD4+ T cells. We have added a new Supplemental Figure 1 with that result. However, CD4+ T cells were the target of the analysis used to test the algorithm and compare to a known “truth” in that population of cells.

      • The paper lacks a discussion on the effects of batch effects between the different time points on the performance of T-REX.

      We appreciate this point and have addressed limitations and considerations with batch effects in the Discussion. Notably, in the rhinovirus study, each subject is from a single batch, so different batches were not pooled for T-REX. The results of T-REX were nevertheless comparable across batches, as seen by the same MEM label phenotype being revealed across subjects. Separate batches would need batch normalization before being run through T-REX, especially when using t-SNE or UMAP, since T-REX is designed to be very sensitive to slight changes.

    1. Author Response:

      Reviewer #1:

      In the early days of the pandemic there was unqualified enthusiasm for convalescent plasma therapy. This enthusiasm shifted dramatically as several trials showed no apparent benefit. Although this manuscript does not show a causal relationship between convalescent plasma therapy and prognosis it is provocative and suggests that further work is needed to assess its utility.

      Strengths of the manuscript include the comprehensive review of existing datasets and the use of state-of-the-art statistical methods for examining potential confounders such as patient age, seasonal variation in hospital admissions that might have impacted quality of care, and the emergence of SARS-CoV-2 variants. Weaknesses include lack of data that might have enabled identification of patients who are likely to benefit from convalescent plasma and characteristics of plasma (such as neutralization titers) that may be associated with efficacy. These weaknesses do not indicate a lack of effort on the part of the team; there is simply no way to obtain the data.

      We appreciate the reviewer's kind words. We are aware of efforts by others who are doing extensive electronic health record analysis and are expected to have data on plasma characteristics and outcome.

      Reviewer #2:

      The use of convalescent plasma (CCP) to treat patients with Covid-19 has changed over the course of the pandemic (from rates as high as 40% of hospitalized patients in October 2020 to a low of less than 10% by March 2021). To explore the efficacy of CCP therapy and the impact of the drop in CCP use, the authors assess whether there was a link between CCP use and patient mortality rates over time in the U.S. Using information from blood centers to estimate CCP usage and population level information on deaths from public databases, they found a strong inverse correlation between CCP usage per hospital admission and deaths due to Covid-19 after admission. The model estimates that the case fatality rate decreased by 1.8 percentage points for every 10 percentage point increase in the rate of CCP use. The detailed analysis suggests that the observed effect could not be attributed to changes in patient ages over time or the emergence of variant viruses. Other confounders such as changes in the use of additional therapeutic agents or clinical interventions were not analyzed. The authors acknowledge the main limitation of this type of analysis, i.e. that establishing a correlation does not prove a causal role. With that caveat, they conclude that the decline in usage may have resulted in excess deaths, possibly 29,000 to 36,000 over the past year in the U.S. Because the decreased usage of CCP occurred during the time that several randomized clinical trials and some media coverage reported no benefit of CCP, the authors suggest that resultant "plasma hesitancy" may have contributed to increased mortality. These findings add an important perspective to future considerations for clinical care, treatment guidelines and regulatory approvals of CCP.
      Emphasizing the importance of using high-titer units and administering CCP early in the disease course, the authors urge a more nuanced interpretation of the available evidence and a holistic approach to decisions about the use of CCP in individual patients.
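      The slope quoted above (1.8 percentage points of case fatality rate per 10 percentage points of CCP use) can be turned into a small worked example. This is only an arithmetic sketch of a linear relationship; the function name `cfr_change` is hypothetical, the usage figures are the approximate 40% and 10% rates mentioned in the summary, and the result is an illustration of the slope, not the authors' excess-death estimate.

```python
def cfr_change(usage_before_pct, usage_after_pct, slope_pp_per_10pp=-1.8):
    """Change in case fatality rate (percentage points) implied by a linear
    relationship between CCP usage and mortality.

    slope_pp_per_10pp: change in case fatality rate for every 10 percentage
    point increase in CCP usage (negative means more usage, fewer deaths).
    """
    return slope_pp_per_10pp * (usage_after_pct - usage_before_pct) / 10.0


# Usage falling from about 40% to about 10% of hospitalized patients.
change = cfr_change(40.0, 10.0)
print(f"Implied change in case fatality rate: {change:+.1f} percentage points")
```

      Under these assumptions, a 30 percentage point drop in usage corresponds to a rise of about 5.4 percentage points in the case fatality rate. As the reviewer notes, this describes a correlation and does not establish a causal effect.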

      Reviewer #3:

      This is an important manuscript on COVID-19 convalescent plasma (CCP) that challenges the findings of the larger Mayo Clinic CCP study demonstrating a lack of efficacy. Their main findings are that there is a strong inverse correlation between CCP use and mortality for admitted patients in the USA. Overall this is a well written manuscript without any overt weaknesses.

      We appreciate the kind words.

    1. Author Response:

      Reviewer #2:

      The authors investigated how alternative polyadenylation (APA) is modulated in yeast using appropriate transcriptomic methodologies.

      The authors found that mutants for mRNA 3' end formation factors and cordycepin treatment alter alternative polyadenylation in the same manner, generating transcripts with longer 3'UTRs, due to a switch to distal polyadenylation sites (PAS). Most mutants analyzed cause a PAS switch, in particular mutants for RNA14, PCF11, YSH1, FIP1, NAB4 and PAP1. They also found that MPA and an rpb1 mutant, with a slower transcription elongation rate, revert the cordycepin effect of distal PAS selection. This implies that in yeast, as in higher organisms, APA is modulated by RNAPII elongation. There is nucleosome depletion in the 3' end of convergent genes that undergo cordycepin-driven APA alterations, which is a new finding.

      On the basis of their data, the authors propose a kinetic model for APA in yeast that is regulated by the concentration of core mRNA 3' end factors and nucleotide levels, which in turn modulates RNAPII elongation. This integrative model has been already described in higher organisms, but not in yeast, and overall this study covers an impressive body of work that makes an important contribution to the field.

      1) The authors show that cordycepin has the same effect on APA as most of the 3' end factor mutants used, but there is a lack of integration between the two sets of PAS-seq data. The cordycepin APA effect may be due to decreased expression of mRNA 3' end factors but this hypothesis was not fully explored. Treating those mRNA 3' end mutants with cordycepin could shed some light on this.

      The effect of cordycepin on cleavage factor expression is explored in supplemental Figure 2A. If anything, the increased expression of the CFIA- and CFIB-encoding genes PCF11, RNA14 and HRP1/NAB4 supports our model, i.e., that due to an increased transcriptional rate in cordycepin-treated cells, the level of the cleavage and polyadenylation machinery fails to meet transcriptional demand. Their expression is therefore increased to re-balance the connection between transcription and 3’ end formation.

      2) A new role for SEN1 in APA for a subset of protein coding was observed. The SEN1 mechanism could be clarified if the authors show that SEN1 is within the subset of convergent genes analyzed, and also if SEN1 expression changes upon cordycepin treatment.

      SEN1 expression changes upon cordycepin treatment were shown in supplemental figure 2. There was an increase in SEN1 expression. As mentioned in the text, “this might reflect a need to process the elevated level of cryptic unstable transcripts (CUTs) seen following cordycepin treatment (Holbein et al., 2009), or suggest a compensatory mechanism whereby cells under 3’ end stress co-opt an alternative pathway to appropriately cleave mRNA transcripts (Rondon et al., 2009).” SEN1 is also included within Supplemental file 5 as a convergent gene that does not undergo significant APA following cordycepin treatment.

    1. Author Response:

      Evaluation Summary:

      This paper will be of interest to biologists who study mechanisms of cell-to-cell variability in gene expression and those who wish to have a tool to alter variability in mammalian cells. Key regulators of gene expression variability in mammalian cells are identified and noise modulation in a synthetic system is shown. The data quality is high. A model for the origin of the observed noise is proposed, but will require some additional experimental evidence.

      We thank the reviewers for their thorough reviews, insightful critiques, and very constructive suggestions for our manuscript. They have genuinely helped us improve our work and manuscript. We have performed all the additional experiments suggested. We believe that our new results and revised manuscript answer the questions raised by the reviewers and editors.

      Reviewer #1 (Public Review):

      The manuscript aims to identify origins of stochasticity ('noise') in mammalian gene expression focused on the case when a single transcription factor controls the expression of a target gene. It also aims to devise strategies to control mean and variance of gene expression independently.

      The experimental approach uses a light-induced transcriptional activator in two stimulation modes, namely amplitude modulation (AM: time-constant light input) and pulse width modulation (PWM: periodic light inputs in the form of a pulse train). Perturbation experiments target histone-modifying enzymes to influence epigenetic states, with corresponding measurements of single-cell epigenetic states and mRNA dynamics to dissect mechanisms of noise control. Beyond this synthetic setting, the study is complemented by endogenous gene expression noise in human and mouse cells under the same perturbations.

      Major strengths of the study are:

      • The experimental demonstration that, and under which conditions, PWM can reduce gene expression noise in mammalian cells; the corresponding data sets could be very valuable for further quantitative analysis.
      • Providing strong evidence via perturbation studies that the extent of gene expression noise is linked to chromatin-modifying activities, specifically opposing HDAC4/5 histone deacetylase activities and CBP/p300 histone acetyltransferase activities.
      • Proposing a positive-feedback model established by these two opposing activities that is consistent with the reported data from perturbation experiments and on chromatin accessibility / modification states.
      • Providing evidence that also in the natural (human and mouse cell) setting, the regulators HDAC4/5 and CBP/p300 contribute to the control of gene expression noise.

      We thank the reviewer for the careful analysis of our manuscript.

      Major weaknesses are:

      We appreciate that the reviewer pointed out two studies in E. coli and yeast that used similar PWM. We believe that their concepts are different. The concept of “stabilized unstable steady states” was specifically developed for the control of chaos in physics by Ott, Grebogi, and Yorke (OGY theory, https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.64.1196 ). Their motivation was feedback control of chaos with small perturbations to the system. Non-feedback control with small periodic perturbations has also been shown to control chaos by stabilizing an unstable steady state. The E. coli work to stabilize an unstable steady state could be considered an extension of these concepts to complex biological systems. In addition, the location of the unstable steady state in a bistable system would decrease with increasing light intensity, as shown by the black dashed line in Figure 2E, which is inconsistent with our result that the mean mRuby is monotonically correlated with the mean light intensity (Figure 1C).

      The hypothesis proposed by Benzinger and Khammash in their yeast paper is correct: a cooperative TF-gene expression curve is sufficient to generate a bimodal distribution from a highly variable TF distribution, as shown in Figure 1G. However, this is not the case in our study. In our experiment, GAVPO and mRuby expression do not exhibit clear cooperativity. In addition, the authors did not show bimodality unless a non-isogenic cell population was used (Fig. 3h in Benzinger and Khammash’s paper).

      • Insufficient evidence for the postulated bistability caused by positive feedback on chromatin states in the mammalian system analyzed, which has implications for the mechanistic explanations provided (e.g., if PWM allows rapid cell switching between 'high' and 'low' states as postulated).

      We agree with the reviewer that current technology limits the possibility of obtaining more direct evidence of bistability in chromatin states. Our scATAC-seq data show that chromatin openness oscillated between the light “on” and “off” phases with reduced heterogeneity compared with the dark control. Our bulk data suggest that H3K27ac has larger differences between the “high” and “low” states. A better measurement would be single-cell ChIP-seq for H3K27ac. However, current single-cell ChIP-seq technologies provide coverage that is too low (~1% of scATAC-seq reads) to support measurements at specific loci (https://www.nature.com/articles/s41592-021-01060-3, https://www.nature.com/articles/s41587-021-00869-9 ).

      • Limited theoretical support for the proposed (not directly observable) mechanisms that uses a mathematical model illustrating the potential consistency, but the model is not directly linked to the experimental data and hence of limited use for their interpretation.

      Our ODE model wasn’t built to fit the experimental data. We used it to generate hypotheses about perturbations of HDAC4/5 and CBP/p300. The model prediction that inhibiting p300 reduces heterogeneity was validated in experiments.

      We have built a stochastic model containing all the processes in our ODE model and considering nine independent promoters, written the code for a stochastic simulation algorithm similar to that in the yeast paper, and performed optimization. However, we don’t have enough CPU time to fit the experimental data and find the “global minimum” using the parallel tempering Monte-Carlo method (https://pubmed.ncbi.nlm.nih.gov/19810318/).

      Overall, the authors achieved their aim of elucidating mechanisms for noise control in mammalian gene expression by identifying specific, opposing regulators of chromatin states, with clear support in the synthetic setting, and evidence in endogenous expression control. Conceptual advances regarding strategies for the external control of gene expression noise appear limited because of prior work, which includes more in-depth theoretical analysis in simpler (bacterial, yeast) systems.

      Hence, the likely impact of the work will be primarily on the more detailed (in terms of histone regulators, etc.) study of noise control in mammalian cells, while the data sets presented in the study could prove valuable for follow-up quantitative (model-based) analyses because they are unique in combining different readouts such as single-cell protein and mRNA abundances as well as histone and chromatin states.

      We appreciate that the reviewer finds that this manuscript supports the molecular mechanisms that control mammalian gene expression noise in both synthetic and endogenous gene regulation.

      Reviewer #2 (Public Review):

      The manuscript describes a tool to independently tune mean protein expression levels and noise. Light induces dimerization and subsequent activation of transcriptional activator GAVPO. By introducing 5xUAS (a target sequence for dimerized GAVPO) upstream a mRuby reporter gene, the effect of light can be measured on mRuby mean and noise.

      By pulsing light at different periods (from 100-400 minutes), the authors reduce the mRuby noise for intermediate average light intensities. Notably, the pulses are all applied at an absolute light intensity of 100 uW/cm2, with the average light intensity being modulated through the light-off time-periods. Therefore, as all periods tend towards 100 uW/cm2 average light intensity, the PWM duty cycles become more similar to the 100 uW/cm2 AM case.
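      The relationship between pulse timing and average light intensity described above can be written out in a few lines. This is a sketch under simple assumptions (rectangular pulses, zero intensity in the dark phase); the function name `pwm_mean_intensity` and the example on-time and period are illustrative and not taken from the manuscript.

```python
def pwm_mean_intensity(peak_uw_cm2, on_min, period_min):
    """Time-averaged light intensity of a rectangular pulse train.

    peak_uw_cm2: intensity while the light is on (uW/cm2)
    on_min:      light-on time per cycle (minutes)
    period_min:  full cycle length, on plus off (minutes)
    """
    if period_min <= 0 or not 0 <= on_min <= period_min:
        raise ValueError("need 0 <= on_min <= period_min and period_min > 0")
    duty_cycle = on_min / period_min
    return peak_uw_cm2 * duty_cycle


# Illustrative values: 100 uW/cm2 pulses, 100 minutes on in a 400-minute period.
mean_pwm = pwm_mean_intensity(100.0, on_min=100.0, period_min=400.0)

# With no dark phase the pulse train reduces to constant (AM) illumination.
mean_am_limit = pwm_mean_intensity(100.0, on_min=400.0, period_min=400.0)
```

      With these example values the time-averaged intensity is 25 uW/cm2, and as the dark phase shrinks the average approaches the 100 uW/cm2 AM case, which is the convergence the reviewer describes.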

      Strengths:

      The proposed method is an elegant way to independently tune protein mean and noise. This would have a broad application in the field and is much needed to be able to study the consequence of protein expression noise, independently of mean. In addition, the authors use multiple powerful single-cell techniques to try and determine the mechanism underpinning the light-induced noise modulation.

      During constant exposure to light, increased light intensity increases the mean expression of mRuby, while decreasing the noise. This high noise is mostly due to observed bimodality in mRuby expression. Through ODEs and by using small molecule inhibitors, the authors show that this bimodality is caused by some cells being stably off, while other cells enter an on state. In this on state a positive feedback can occur where initial binding of dimerized GAVPO induces histone acetylation and chromatin accessibility, and thus stimulates further GAVPO binding. Bistability induced by constant light exposure is disrupted using small molecule inhibitors of CBP/p300 HAT activity, indicating that histone regulation is a cause for this observed bistability. The stable on state is demonstrated to be more active and accessible through ChIP-seq and ATAC-seq respectively.

      We appreciate that the reviewer recognizes that our method for independently tuning protein mean and noise has broad application and is much needed, and that we integrated multiple single-cell analyses to determine the noise control mechanism. We believe that this method will prove especially useful in cell fate control studies, in vitro with stem cell differentiation or in vivo with embryo development.

      Weakness:

      The single-cell ATAC-seq data indicate that pulsing light induces switching from an accessible (light on) to inaccessible (light off) chromatin state. The authors argue that the switching back into a chromatin inaccessible state prevents the positive feedback to occur and thus reduces noise. However, there are weaknesses in the description of the mechanism by which the pulses modulate (i.e., reduce) noise. Overall, since these sections in the manuscript are not easy to understand, it is difficult to parse what mechanism the authors attributed to the observed noise reduction and to assess if the data supports the conclusions.

      We apologize for the lack of clarity in this aspect. We have extensively rewritten the descriptions in the related sections. The PWM light intensities alternate between 100 uW/cm2 and dark, which are located at the high and low monostable states, respectively. We need to show whether the fraction of time spent in each state is sufficient. The scATAC-seq data indicate that one 150-minute pulse of 100 uW/cm2 light is sufficient to elevate chromatin accessibility while reducing cell-to-cell variation, two features of the high monostable state. The 450-minute dark period reduces chromatin accessibility. In this dark period, the cells fall back to the low monostable state without sufficient activated GAVPO. H3K27ac has a larger dynamic range between the low and high states (Figure 3J), but single-cell ChIP-seq methods don’t provide sufficient coverage to assess H3K27ac heterogeneity at the 5xUAS-mRuby loci. Nevertheless, indirect evidence from perturbation of p300 activation or GAVPO-p300 interactions supports this picture.

      The data from the single-mRNA live-cell imaging experiments are somewhat ambiguous and do not necessarily support some of the arguments. The conclusion that transcription, nuclear export, and mRNA degradation flatten the pulsatile chromatin caused by the PWM is not clear from the data. Especially, since most cells do not show any pulsatile behavior both in the single-cell ATAC-seq and the live-cell imaging data.

      We improved the presentation of the data. With the data presented on a logarithmic scale, it is visible that most cells exhibit pulsatile behavior (new Figure 5C). This can be further visualized by averaging over subpopulations of cells. As shown in Figure 5G in the revised manuscript, approximately 57% of cells show oscillations. The mean mRNA shows a damped periodic oscillation. The statement that nuclear export and mRNA degradation flatten the pulsatile chromatin dynamics caused by the PWM was postulated from rate constants in the literature and has been removed from the revised manuscript. The half-life of mRuby is about 24 hours, considerably longer than the period of PWM. We have added an analysis of single-cell mRuby dynamics with 400 min PWM, which do not exhibit periodic oscillations (Figure 5-figure supplement 2).

      Reviewer #3 (Public Review):

      The authors use a synthetic light-controlled transcription factor (GAVPO) to test a model of bistable gene expression that is hypothesized to originate from positive feedback via local histone modifications by trans-activator recruitment of CBP/p300 to facilitate open chromatin, which facilitates GAVPO binding, etc… Their proposed model for the origin of bistability is important because it should apply to any trans-activator that recruits CBP/p300 to modify chromatin and activate gene expression. The authors show that periodic modulation of light reduces the bimodal distribution at intermediate light-intensity levels to a unimodal distribution. This is an elegant demonstration of how GAVPO and different temporal patterns of light can reduce cell-to-cell variability in gene expression, if needed.

      Strengths:

      The authors generate an impressive amount of single-cell data of gene expression and chromatin state (flow cytometry, single-cell sequencing, live-cell MS2-tagging) at different intensity levels. The periodic modulation of GAVPO activity by light is a practical demonstration of how to sculpt the gene expression output in useful ways. This may be a very useful tool for future biologists.

      We thank the reviewer for the positive comments on the mammalian noise control mechanism we discovered and its broad implications.

      Weakness:

      The proposed model for bistability is not convincingly tested or supported by the existing data. Each reporter should exhibit a bistable response because the positive feedback is localized to the promoter via cis-effects on gene expression by local chromatin state/GAVPO binding. The authors show a bimodal distribution of gene expression in a population of cells, which is consistent with a bistable response in a single reporter gene. However, their strain has 9 independent reporters integrated into the genome. Thus, I would expect to see up to 10 peaks, not 2 peaks. Moreover, the mathematical model used to validate their observations does not model the total expression from 9 independent promoters, which is a critical omission given the cis-nature of the positive feedback loop. The fact that these 9 promoters generate 2 peaks at intermediate light intensity suggests that the GAVPO bistability likely originates from a trans-effect, i.e., either all 9 promoters are OFF or all 9 promoters are ON, not a cis-effect.

      We appreciate the reviewer’s insight. We agree that theoretically there could be up to 10 peaks. The separation between two adjacent “high” peaks is at most about 2-fold. The lowest CV of an experimentally measured high mRuby peak is about 0.47 (cells under maximum light with LMK-235 and A485, Figure 3B). This variation could overshadow the 2-fold differences in mean mRuby and prevent the recognition of multiple “high” peaks. On the other hand, the difference between the low state and any of the high states is large enough to be recognized as separate peaks. We emulated the case in which the 9 sites choose “low” and “high” states stochastically (Figure 3-figure supplement 2). The 9 potential high peaks are convolved into a broader peak, similar to experimental observations.
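      The argument that a CV of about 0.47 can hide 2-fold differences can be checked with a short calculation. The sketch below is not the authors' simulation: it assumes log-normal expression noise, treats two neighboring peaks as equal-weight and equal-width, and uses the standard result that such a pair of Gaussian peaks (here in log space) is bimodal only when their means are more than two standard deviations apart. The 100-fold separation used for the low versus high comparison is a hypothetical value chosen for illustration.

```python
import math


def lognormal_sigma(cv):
    """Standard deviation of ln(x) for a log-normal variable with the given CV."""
    return math.sqrt(math.log(1.0 + cv * cv))


def resolvable(fold_change, cv):
    """True if two equal-weight log-normal peaks separated by fold_change
    would appear as two modes rather than one (separation > 2 sigma in ln units)."""
    return math.log(fold_change) > 2.0 * lognormal_sigma(cv)


cv = 0.47  # lowest CV reported for a high mRuby peak

# Adjacent "high" states: n versus n + 1 active reporters, n = 1..8.
adjacent = {n: resolvable((n + 1) / n, cv) for n in range(1, 9)}

# Hypothetical 100-fold gap between the low state and a high state.
low_vs_high = resolvable(100.0, cv)

print(adjacent)
print(low_vs_high)
```

      Under these assumptions none of the adjacent high states, including the 2-fold step from one to two active reporters, is resolvable at a CV of 0.47, while a large gap between the low state and the high states is. This is consistent with the two broad peaks described above.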

      We agree that our model is very simple and didn’t model the total expression from independent promoters. We have built a stochastic model containing all the processes in our ODE model and considering nine independent promoters. Unfortunately, fitting it to the experimental data using the parallel tempering Monte-Carlo method costs too much time.

      We performed additional experiments to mutate p65AD of GAVPO to specifically reduce its interaction with CBP/p300. The disappearance of the bimodal distribution validates that the direct interaction between UAS-binding GAVPO and CBP/p300 causes the bistability, not a trans-effect through intermediates. We measured single-cell mRuby dynamics and selected cells with nearly identical GAVPO (Figure 2H). The mRuby-high cells were elevated earlier and stayed in the high state (red lines in Figure 2G), and the mRuby-low cells remained low (blue lines in Figure 2G). A few cells seem to make transitions between the two states. These data are consistent with a bistability model with small rates of stochastic transition between the states. Prior exposure to 100 uW/cm2 light also tilted the distribution toward the “high” state, validating the hysteresis properties of the bistability (Figure 2I-J).

    1. Author Response:

      Reviewer #2:

      Motivated behaviors, such as food seeking when hungry, can also occur spontaneously at irregular intervals. Understanding how this irregular expression arises is important for understanding behavior and is relatively little investigated. The present work thus addresses an important and under-investigated area in neurobiology. Its demonstration of a potential cellular mechanism for irregular behavioral production has wide relevance, ranging from how cells make "decisions" to how whole organisms do so.

      Intact Aplysia occasionally produce bites even in the absence of food, and isolated buccal ganglia (which contain the biting central generator circuit) will occasionally spontaneously produce fictive bite motor patterns. The activity of central pattern generator networks has almost exclusively been ascribed to the actions of the voltage-gated channels in the network neuron cell membranes and the synaptic connectivity among the network's neurons. Bédécarrats et al. show that a small, highly regular cell membrane voltage oscillation occurs in a neuron (B63) in the biting neural network, and that occasionally this oscillation becomes large enough to trigger a plateau potential in B63 and a single fictive bite from the entire circuit. They show that this oscillation is not due to cell membrane voltage dependent conductances, but instead arises from a process involving the endoplasmic reticulum, mitochondria, or both. Although organelle-driven changes in cellular or tissue activity have been observed in other cell types, this is, to my knowledge, its first observation in a neural network. These data thus are potentially of great importance in understanding how neural networks function, most of which do not show the great regularity of central pattern generated behaviors.

      The presented data seem, to me, strong with respect to the small potential oscillations not being generated by voltage-dependent cell membrane conductances, and somehow involving the intracellular organelles. What is less clear to me is how local release of Ca from endoplasmic reticulum or mitochondria would result in changes in ion composition under the cell membrane, which is what gives rise to the cell membrane potential. Ca is highly buffered in the cytoplasm. It is thus unclear to me that free Ca would remain so for any length of time after release. It does, of course, in muscles, but these are evolved for this to occur. The authors themselves raise a variant of these concerns in the Discussion when considering how the B63 cell membrane voltage oscillations are transmitted to neurons electrically coupled to B63, invoking as a possibility Ca activation of second messengers, which would then themselves be responsible for the cell to cell communication. It seems to me that the same concerns arise with respect to how Ca release at sites distant from the cell membrane could charge the membrane's capacitance.

      A second remarkable observation is that B63 depolarization and firing does not reset the organelle-derived slow oscillation. B63 firing should result in substantial Ca concentration changes, at least in a shell under the cell membrane, so a possible feedback mechanism can be imagined. Most biological processes contain multiple feedback processes that link cause and effect (e.g., the sequential current activations that return a cell to rest after an action potential, the interactions between sympathetic and parasympathetic system activity that maintain functionally proper body activation, the interactions that regulate hormonal levels). One possibility the authors mention is that the organelle-derived oscillation is used only for intermittent bite activities, and in feeding bites are instead generated solely by standard cell-membrane voltage-dependent processes. Regardless, it is a striking observation that merits additional investigation.

      These issues, however, do not change the data, which show a clear association of disruption of endoplasmic reticulum and mitochondrial function and cessation of the cell membrane voltage oscillation. Nor is it reasonable to expect an article like this, showing an organelle-driven cell membrane potential oscillation for the first time in a neuron, to describe every aspect of the mechanism by which it occurs. Indeed, it is a measure of the article's interest that it prompts such thinking. It will be very interesting to see the effects of similar organelle-disrupting treatment on the activity of other well-defined neural networks.

      The reviewer’s concern about how a local release of Ca from the endoplasmic reticulum or mitochondria could alter the ion composition under the cell membrane, and thereby cell membrane potential, raises an important issue in the context of our paper. However, we should point out that there is a considerable body of experimental evidence indicating that plasma membrane ionic conductances are altered by organelle-released Ca in many different non-muscle cells and tissues, including Aplysia and crustacean stomatogastric ganglion neurons (e.g., Hickey et al. 2010, J Neurophysiol 103:1543–1556; Knox et al., 1996, J Physiol 494:627-39; Kadiri et al., 2011, J Neurophysiol 106:1288–1298). Nonetheless, how intracellular Ca is able to alter plasma membrane conductance despite the presence of strong buffering mechanisms remains an open question. As the reviewer posits in a subsequent comment, this capability may indeed be due to a close proximity of the organelle and plasma membranes (see response to comment 8 below), and/or is mediated indirectly by a Ca-dependent activation of second messenger cascades (Lorenzetti et al. 2008 Neuron 59: 815–828).

      The reviewer also mentioned a striking observation that B63 depolarization and firing does not reset the organelle-derived slow oscillation. We can only speculate at this stage, but this lack of resetting may be related to cell compartmentalization. As now discussed in the manuscript, the Ca and associated voltage oscillations may be generated in a compartment (e.g., the neuropile) that is remote from the site where production of plateau potentials occurs (e.g., the soma). A propagation of the voltage oscillation from the first compartment (similar to postsynaptic potentials) could trigger the plateau at the second distant locus. Conversely, local calcium influxes induced by depolarization or plateau production at this latter site may be insufficient to alter the distant organelle-derived Ca oscillation. A paragraph in which this idea is expanded further has been added to the Discussion.

      Reviewer #3:

      In this report the authors characterize a mechanism that plays a role in inducing the rhythmic depolarizations that are observed in identified neurons that are part of the feeding CPG in Aplysia. The neurons studied (B63 neurons) are of interest because previous work has established that they play an important role in triggering cycles of motor activity. Further, previous work from this group has demonstrated that activity in the B63 neurons can be modified by operant conditioning.

      The authors present this study as though previous work had established that plateau potentials generated in the B63 neurons play an important role in driving network activity. For example, in line 102 they state "This essential role played by B63 is partly mediated by a bistable membrane property, which allows the sudden switching of the neuron's resting membrane potential to a depolarized plateau…" To support this statement, they reference Susswein et al. 2002, which does not support this statement. In the Susswein et al. study it is the B31/32 neurons that are modeled as having plateau properties.

      If previous work has not established the role of the B63 plateau potentials, the only data that speak to this issue are presumably in the current report. In this study the authors do provide data that indicate that the B63 neurons generate low amplitude oscillations that are not likely to depend on input from the electrically coupled neurons studied (notably B31). The authors also show that in some instances, these depolarizations do trigger plateau potentials in B63. It is, however, not clear that the B63 generated plateau potentials are then responsible for triggering network activity (e.g., as opposed to a situation where depolarizing input from B63 triggers plateau potentials in B31/32 and the depolarization in B31/32 drives the rest of the feeding circuit). For example, in Figs. 6A and Supplemental Fig. 4A it does not appear that the plateau depolarization in B63 is being transmitted to other electrically coupled neurons to any large extent.

      A clarification of this issue is important because it potentially impacts thinking concerning how 'decision making' is occurring. If decision making means induction of a motor program and this does not occur unless the depolarization in B63 is transmitted to B31/32, the process is more complicated than what the manuscript currently suggests.

      The title is misleading since there are no studies of behavior in this report.

      In part, interest in the mechanisms that drive spontaneous oscillatory activity in the B63 neurons stems from the overall context of this work. Namely, the authors have previously established that oscillatory activity can be modified through associative learning. In the Sieling et al. 2014 study they demonstrate that two aspects of plasticity are accounted for by changes in synaptic properties and an effect on a leak current. For readers trying to understand this body of work as a whole, the Discussion should more clearly indicate how the results of the present study integrate with these previous findings.

      We agree with the reviewer that the present study is the first to establish that B63 is intrinsically capable of generating plateau potentials. We have therefore modified the manuscript to clarify this point:

      Lines 80-86 now state: “Thus, deciphering the mechanisms underlying the bursting activity of these key decision neurons is critical to understanding the process of radula motor pattern expression. Although earlier modeling evidence suggested that B63 bursting might rely on the cell’s electrical synapses with other circuit neurons that possess a plateau potential-generating capability (Susswein et al., 2002), the actual triggering process for spontaneous B63 bursts and consequently the irregular emission of buccal CPG output remains unknown.”

      Lines 98-106 now state: “This essential role played by B63 is partly mediated by sustained, large amplitude membrane depolarizations that activate high frequency bursts of action potentials (Figure 1C; see also Nargeot et al., 2009). Consistent with these underlying depolarizations arising from a bistable membrane property (Russell and Hartline, 1978), a brief intracellular injection of depolarizing current into an otherwise silent B63 neuron can trigger a depolarizing plateau and accompanying burst discharge that far outlasts the initiating stimulus (Figure 1D). The stimulated B63 in turn activates a similar burst-generating depolarization in the contralateral B63 cell and elicits a single BMP by the buccal CPG network.”

      Lines 177-180 now explicitly draw a conclusion on the endogenous origin of B63 plateau potentials: “Significantly, the continued expression of this burst-generating capability under functional synaptic isolation confirmed that the underlying plateau potentials, as suggested by evidence reported above (see Figure 1C), arose from an endogenous membrane property of the B63 neurons themselves.”

      The reviewer also raises the issue of the decision-making process leading to buccal motor pattern genesis. Our previous study found that experimental depolarization or hyperpolarization of B63 with intracellular current injection respectively either triggers or prevents buccal motor pattern genesis (Nargeot et al., 2009). Earlier modeling data indicated how experimentally elicited activity in a passive B63 neuron could trigger plateau potential generation in the electrically coupled B31/32 cells (Susswein et al., 2002). The present study, essentially conducted with chemical synapses blocked, including the strong excitatory synapse from B63 to B31/32, investigated the origin of the spontaneous intrinsic, rather than extrinsically elicited activity of B63 and did not address the cell’s dynamic relationship with other network neurons. Our data indicate that B63 generates a spontaneous pacemaker activity that can trigger endogenous plateau potentials without the involvement of any extrinsic influences. That said, however, due to our experimental conditions with the functional suppression of chemical synapses, we agree with the reviewer that the present study does not establish whether B63 is uniquely sufficient in the decision process for the induction of BMPs.

      This point is now made in the revised manuscript (Lines 577-583), which states: “However, because our experiments were mainly conducted with all the network’s chemical synapses blocked, we were unable to establish whether B63’s endogenous oscillatory and plateau properties are alone sufficient in the decision process for BMP genesis. Nevertheless, in normal saline conditions with the network remaining functionally intact, in contrast to all other identified circuit cells, the B63 neuron pair are the only elements found to be necessary and sufficient for triggering motor pattern expression and resultant food-seeking movement (Hurwitz et al., 1997; Nargeot et al., 2009)”.

      The title of the revised manuscript, now “Organelle calcium-derived voltage oscillations in pacemaker neurons drive the motor program for food-seeking behavior in Aplysia” hopefully satisfies the reviewer’s concern.

      We fully acknowledge the importance of ultimately placing our present findings in context with those reported previously, especially in relation to B63’s oscillatory behavior and associative learning. However, at this stage our data are insufficient to allow such an integrated assessment, although obviously this is a major goal of our future research.

    1. Author Response:

      Reviewer #1:

      Gentile A et al show a novel role of Snai1b in growth regulation of zebrafish myocardial wall. Specifically, authors show that zebrafish lacking Snai1b exhibit cardiac looping defects (~50% penetrance), consistent with previously described morpholino mediated Snai1b knockdown phenotype. Extruding cardiomyocytes away from cardiac lumen, mostly in the atrioventricular canal region were observed in remaining 50% of Snai1b knockout zebrafish. Using RNA-seq, authors identified several dysregulated genes, including enrichment of intermediate filament genes in Snai1b knockout zebrafish. Among these dysregulated genes, authors suggest that increased Desmin expression and its aberrant localization promote cardiomyocyte extrusion in Snai1b knockout zebrafish hearts. Overall, present manuscript describes a novel phenomenon during cardiac development, hence, it is of interest to developmental biologists.

      1) Snai1 is known to affect cushion formation in the atrioventricular canal region. It would be helpful to establish a cause-and-effect relationship for Snai1b in this region. The zebrafish lack Snai1b expression globally, so it would be helpful to show whether a defective cushion promotes cardiomyocyte extrusion in the atrioventricular canal region. The Tnnt2 morpholino experiments provide some insights; however, they do not rule out a role for a defective atrioventricular cushion (defective EMT).

      We thank the reviewer for these suggestions. While we agree that this point is interesting, it must be noted that atrioventricular (AV) valve formation in zebrafish starts at ~56 hpf, with the collective migration of the valve endothelial cells (Gunawan et al., 2019; Gunawan et al., 2020). Zebrafish heart valves are functional starting at approximately 72 hpf, when they can efficiently close the lumen (Gunawan et al., 2019; Gunawan et al., 2020). Thus, the endothelial-to-mesenchymal transition process in the AV canal takes place after CMs start extruding in snai1b mutants (48 hpf).

      Nevertheless, we examined valve formation in snai1b mutants and observed that the early stages of valve development seem unaffected, and that wild-type like valve leaflets appear by 72 hpf (Figure R1). Together, these data indicate that the CM extrusion defects in snai1b mutants are not a secondary effect of valve dysfunction.

      2) For Figure 2 - additional histology / immunohistology to show extrusion, cohesion, and orientation of cardiomyocytes at a section level (2D) in Snai1b knockout hearts could help to characterize the phenotype at a cellular level. It is assumed that all cardiomyocytes lack Snai1b protein (immunostaining would help); however, only a few cardiomyocytes show extrusion. Minor point - the cartoon images in Figure 2 are somewhat disconnected from the immunostaining images.

      We thank the reviewer for these suggestions. While we agree with the point regarding the Snai1b immunostaining, there are no commercially available or published antibodies that detect zebrafish Snai1b, particularly one that differentiates between Snai1a and Snai1b. To better characterize the snai1b mutant phenotype at the cellular level, we have now included quantification of CM apical surface areas and aspect ratios at 52 and 74 hpf. We found that the CMs in snai1b mutants appear smaller and more rounded compared with those in wild-type embryos. We also found a smaller ventricular volume in snai1b mutant hearts, potentially due to the changes in CM shape. These new results are shown in Figure 1 - supplement 5. We also changed the cartoons in Figure 2, as suggested.

      3) It is unclear whether Snai1b knockout hearts exhibit defective contractile phenotype and whether there is a cardiac phenotype in surviving adult zebrafish. It is also unclear whether RNA-seq and SEM from adult zebrafish heart represent embryonic extrusion and intermediate filament defects.

      We thank the reviewer for these comments. However, characterizing the snai1b mutant adult phenotypes is beyond the scope of this manuscript. It is important to clarify that the RNA-seq and SEM experiments were performed in embryonic hearts.

      4) It is unclear why only a few cardiomyocytes show extrusion when most cardiomyocytes, if not all, are overexpressing the Desmin gene.

      We agree with the reviewer. As we showed by immunostaining in wild-type hearts (Figure 3I), a subset of CMs – the few extruding wild-type CMs – exhibit Desmin enrichment in their basal domain. We speculate that only this subset of CMs exhibits basal Desmin enrichment because their position within the myocardium exposes them to higher mechanical forces due to increased blood flow and looping morphogenesis, which in turn raises their propensity to extrude. Indeed, as we show in Figure 1 – supplement 1F, most of the CM extrusions in snai1b mutants are observed at the AV canal, where CMs experience the highest level of mechanical forces (Lombardo et al., 2019; Campinho et al., 2020).

      5) Molecular link connecting Snai1b and cardiac filaments genes is not determined.

      We have now used a luciferase assay in HEK293T cells to test the regulation of desmb expression by Snai1b. It was previously shown by ChIP-seq in mouse skeletal myoblasts that Snai1 can bind to the proximal promoter of Desmin (Soleimani et al., 2012). Our in silico analysis uncovered an 800 base pair region upstream of the start codon of zebrafish desmb that exhibits a high degree of similarity (>45%) with the mammalian sequence and is thus a promising proximal promoter for desmb. Furthermore, Kürekçi et al. recently reported that the zebrafish desmb promoter contains putative Snai1b-binding sites (Kayman Kürekçi et al., 2021). We cloned this 800 bp region upstream of a luciferase reporter and co-transfected the resulting plasmid with a plasmid expressing zebrafish Snai1b, which led to a significant decrease of luciferase activity compared with the proximal promoter alone. These data suggest that Snai1b binds to the proximal promoter of desmb and represses its transcription, potentially implicating Snai1b as a direct regulator of desmb expression. This new result is shown in Figure 3 – supplement 1D.

      Reviewer #2:

      An intact myocardium is essential for cardiac function, yet much remains unknown regarding the cell biological mechanisms maintaining this specialized epithelium during embryogenesis. In this manuscript, Gentile and colleagues discover a novel role for the repressive transcription factor Snai1b in supporting myocardial integrity. In the absence of Snai1b, cardiomyocytes exhibit an enrichment of intermediate filament genes, including desmin b. In addition, the authors detect mislocalization of Desmin, along with adherens junction and actomyosin components, to the basal membrane in snai1b mutant cardiomyocytes, and these mutant cells exhibit an increased likelihood of extrusion from the myocardium. Ultimately, the authors put forward a model wherein Snai1b protects cardiomyocytes from extrusion at least in part by regulating the amount and organization of Desmin in the cell, thereby supporting myocardial integrity.

      Overall, the authors highlight an important aspect of epithelial maintenance in an environment that experiences significant biomechanical stress due to cardiac function. By generating a promoter-less allele of snai1b, the authors have created a clean genetic model in which to work. Coupled with beautiful microscopy and transcriptomics, this story has the potential to enlighten both cell biologists and cardiovascular biologists on the underpinnings of myocardial integrity. However, clarifications regarding the overall model would be particularly beneficial for the reader.

      We thank the reviewer for their highly supportive comments.

      1) A clearer discussion of the proposed molecular mechanism for Snai1b function would aid a reader's overall contextualization of this work. At one point, the authors suggest that Snai1b regulates N-cadherin localization to adherens junctions, thereby stabilizing actomyosin tension at cell junctions. Later, it is suggested that Desmin activates the actomyosin contractile network at the basal membrane. It is unclear whether the authors believe that these are separate events or whether they may be coupled, perhaps through Desmin disruption at the lateral membranes, leading to modifications in nearby adherens junctions. A more thorough investigation of the phenotype resulting from desmin b overexpression may clarify this relationship.

      We thank the reviewer for these comments. We have now included Desmin and p-myosin immunostaining of desmb overexpressing CMs. We observed an increased level of Desmin protein in the desmb overexpressing CMs, as well as its basal localization. We also observed increased p-myosin localization basally, as we did in snai1b mutant CMs. N-cadherin immunostaining at the junctions was reduced in desmb overexpressing CMs, as in snai1b mutant CMs. Altogether, these results indicate that myocardial-specific desmb overexpression phenocopies snai1b mutants. We have also included deeper quantitative analysis of the immunostaining, now distinguishing the results between extruding and non-extruding desmb overexpressing CMs. These new results are shown in Figure 4F-I’ and Figure 4 – supplement 2.

      Although both reduced junctional N-cadherin and abnormal basal localization of actomyosin factors are consistently observed in snai1b mutants and in desmb overexpressing embryos, additional tools will need to be developed and used to determine whether they are separate or coupled events.

      2) It appears that extruded cells do not bud off from the myocardium, but rather remain on the apical surface of the existing myocardium. However, it is unclear whether this change in tissue architecture affects cardiac function or the overall morphology of the chamber. A brief discussion of these possibilities would have helped to contextualize the significance of this phenotype.

      We thank the reviewer for this interesting point. We have now included a time-lapse spinning disk movie of wild-type and snai1b mutant hearts from 52 to 70 hpf. At the starting timepoint (t0) in snai1b mutant hearts, we observed extruding CMs that were still embedded within the myocardium. Within 6 hours, we did not observe extruding CMs in the same location as we had at t0, but instead found CMs outside of the myocardial wall and they remained in the pericardial cavity for several hours. These new results suggest that CMs do indeed extrude out of the myocardium in snai1b mutant hearts, and they are shown in Figure 1 – supplement 1I-K and video 1.

      Additionally, we quantified the heart rate, ejection fraction, and fractional shortening at 52 and 74 hpf. At 52 hpf, we did not find significant differences between wild-type and snai1b mutants, but at 74 hpf, the heart rate, ejection fraction, and fractional shortening were significantly lower in snai1b mutants compared to wild types. Furthermore, snai1b mutants exhibited reduced ventricular volume at 52 and 74 hpf. As the reduction in cardiac function occurs after CMs start to extrude, these data indicate that CM extrusion has an impact on the overall morphology of the ventricle and cardiac function. These new results are shown in Figure 1 – supplement 5.

      3) The authors show that cardiomyocyte extrusion is most prevalent near the atrioventricular canal, and they suggest that this regionalized effect is due to the different types of extrinsic factors, like biomechanical forces, that this region experiences. However, it is also possible that regional differences in certain intrinsic factors are involved, such as junctional plasticity, actomyosin activity at the basal membrane, etc. To distinguish between these possibilities, it would have been informative to know whether the extent of N-cadherin/α-18/p-Myosin/Desmin mislocalization varies depending on the regional location of cardiomyocytes within the snai1b mutant heart. For example, do cardiomyocytes near the atrioventricular canal exhibit more extreme effects on N-cadherin/α-18/p-Myosin/Desmin localization than cardiomyocytes in further away portions of the ventricle? Or, do these cells exhibit similar degrees of protein mislocalization, but cells near the atrioventricular canal have a lower threshold for extrusion?

      We thank the reviewer for this interesting point. We hypothesize that CMs closer to the AV canal exhibit more severe effects on N-cadherin/α-catenin epitope α-18/p-myosin/Desmin localization, due to the higher mechanical forces they experience (Lombardo et al., 2019; Campinho et al., 2020). However, our immunostaining procedure for zebrafish embryos requires deyolking to allow access of the tissue to the antibody, which unfortunately leads to the loss of the atrium and the part of the AV canal closest to the atrium; thus, this procedure renders it difficult to perform quantitative analysis of the immunostaining signal throughout the heart.

      Reviewer #3:

      Gentile, A. et al. generated snai1b mutant zebrafish embryos and showed that loss of Snai1b led to two mutant phenotypes in the heart: i) hearts with clear looping defects, ii) hearts without looping defects that displayed abnormal cardiomyocyte (CM) extrusion. The authors focused on the second class of mutants and found that loss of Snai1b led to reduction of N-cadherin at cell junctions and basal accumulation of phosphorylated myosin light chain and the α-18 epitope of α-catenin, indicative of mechanical activation. Bulk RNA-sequencing of isolated hearts revealed an upregulation of intermediate filament (IF) genes in Snai1b mutants, and of particular interest, the authors identified upregulation of the muscle-specific IF gene desmin b. Immunofluorescent imaging revealed that Desmin was not only upregulated in Snai1b mutants, but mis-localized away from cell junctions and accumulated at the basal side of extruding cells along with actomyosin machinery. Accordingly, CM-specific overexpression of Desmin was sufficient to promote cell extrusion.

      The presented work is particularly interesting because it identifies a new role for the Snai1b transcription factor in maintaining proper tissue structure, independent of its typical function in regulating epithelial to mesenchymal transition (EMT). Overall, the experiments were well designed and controlled, and the data is clearly and logically presented. However, some of the findings could be explained by alternative hypotheses and other interesting aspects of the data were left unexplored.

      One hypothesis that was not sufficiently discussed is that loss of Snai1b may prevent cardiomyocytes from undergoing the EMT that is necessary for normal delamination and trabeculation, and thus cells are instead extruded away from the lumen to prevent overcrowding in the developing myocardium. In fact, the authors present evidence that EMT is blocked and acknowledge that extrusion is a known mechanism for preventing overcrowding. It would be interesting to see whether extrusion away from the lumen also occurs if EMT is blocked through other means.

      We thank the reviewer for these interesting questions regarding CM overcrowding, EMT, and CM extrusion. To test the hypothesis that CMs in snai1b mutants are extruding to prevent overcrowding in the developing myocardium, we treated embryos with an ErbB2 inhibitor to reduce CM proliferation (however, ErbB2 also regulates EMT). We did not observe a significant difference in the number of extruding CMs in treated snai1b mutants compared with control (Figure R2), while when we treated wild-type embryos with the ErbB2 inhibitor, we observed an increase in CM extrusion (data not shown). In addition, previous cell transplantation studies (Liu et al., 2010) analysed wild-type hearts with a few erbb2 mutant CMs and did not report CM extrusion. Altogether, these data suggest that increased CM extrusion in snai1b mutants is not caused by increased CM proliferation or defective EMT. However, additional analysis using tools that specifically block CM proliferation versus EMT will be needed to further investigate these interesting questions.

      The authors show that extruding cells do not seem to be dead or dying, and that a small number of CMs do extrude in wild type embryos. This raises the intriguing possibility that some amount of CM extrusion is necessary for normal development and that these cells may give rise to epicardial or other cell types. Live-imaging and lineage-tracing studies would inform whether the extrusion observed in mutant embryos is an enhancement of a normal morphogenetic process or an additional abnormal response to loss of Snai1 function.

      We thank the reviewer for these interesting comments. We have now included a time-lapse spinning disk movie of the snai1b mutant hearts from 52 to 70 hpf, in which we observed CMs outside of the myocardial wall and they remained in the pericardial cavity for several hours (see our rebuttal to Major Point #2 in reviewer 2’s comment).

      To test the hypothesis that CM extrusion is a normal process that gives rise to other cell types outside the myocardial wall, we performed a lineage-tracing experiment in wild-type embryos. We treated Tg(myl7:creERT2);(-3.5ubb:loxP-EGFP-loxP-mCherry) embryos with tamoxifen from 24 to 72 hpf, and imaged the larvae at 96 hpf. We found no switched (EGFP to mCherry) CM-derived cells in the pericardial cavity (Figure R3). From our lineage tracing analysis, we believe that the extruded CMs do not contribute to other cardiac cells. Our time-lapse movies also show that in snai1b mutants, extruded CMs are not attached to and are positionally distant from the heart (Figure 1 – figure supplement 1I-K), thereby indicating that it is unlikely the extruded CMs give rise to epicardial or other cardiovascular cells.

      One particularly interesting observation that was left unexplored was the identification of a second class of Snai1b mutants with defective heart looping. It isn't clear whether these embryos also display enhanced CM extrusion, or if there are other clearly aberrant cell behaviors. Furthermore, it would be very interesting to know whether there is any evidence that the defective looping is due to the same changes in cytoskeletal gene expression and protein organization observed in the class of Snai1b mutants that were detailed throughout the manuscript.

      We thank the reviewer for these comments. We have now examined in more detail CM morphology in unlooped snai1b mutant hearts and have included some quantification in the revised manuscript. We found that the number of extruding CMs is similar in all snai1b mutants regardless of the looping phenotype. These new data are included in Figure 1 – figure supplement 1C-E. It will indeed be interesting to investigate whether changes in cytoskeletal gene and protein expression are also an underlying cause of the cardiac looping phenotype in snai1b mutants.

      The authors suggest that Snai1b regulates Desmin in two ways: 1) overall expression levels, and 2) post-translationally to control its localization at cell junctions. Although the first claim is sufficiently supported, the second claim lacks experimental evidence. An alternative explanation is that overexpression of Desmin in response to loss of Snai1b leads to mislocalization independent of an interaction with Snai1b. This point could be clarified by examining Desmin localization in the desmb overexpression system. In addition, assaying for co-IP of Snai1b and Desmin could demonstrate a direct interaction between the two and better support a role for Snai1 in regulating post-translational localization of Desmin.

      We thank the reviewer for these comments. We have now performed anti-Desmin immunostaining in desmb overexpressing CMs, and found that Desmin is enriched basally. This result suggests that an overabundance of Desmin can lead to its basal enrichment. However, whether Snai1b and Desmin interact at the protein level will need additional tools and analyses, and thus we have removed the corresponding sentence from the manuscript.

      Although the authors convincingly show that Desmin accumulates with other contractile machinery at the basal side of extruding CMs in Snai1b mutants, additional evidence is needed to support a causal link between basal Desmin accumulation and extrusion. For instance, if knockdown or inhibition of Desmin prevents extrusion in the Snai1b mutants, the causal relationship would be much clearer.

      We thank the reviewer for this suggestion. We used a desmb ATG morpholino to knock down desmb in snai1b mutants. Although we used a low concentration of the morpholino (0.5 ng), we surprisingly observed an increased number of extruding CMs in both wild types and snai1b mutants compared with standard control morpholino injections (Figure R4). We hypothesize that the right balance of Desmin expression is needed to preserve myocardial wall integrity; too much or too little Desmin increases CM extrusion. However, we cannot exclude the possibility that the effects observed are due to off-target effects of the morpholino. Due to these uncertainties, we did not include the desmb morpholino data in the manuscript.

    1. Author Response:

      The authors thank the editors and reviewers for their thoughtful review. We focus our response on three points the editors considered essential revisions:

      The authors will work on the following: 1) Providing direct evidence that DENV sfRNA from mosquito saliva is delivered into saliva-exposed human cells. 2) Although the effect of sfRNA has been well established in many cell types, we will test effects on human cells found in the skin.

      We expressed disagreement over a third request to compare the outcomes of high sfRNA:gRNA infection with those of low sfRNA:gRNA infection in mice. The mouse model is not an appropriate model to test an anti-innate immune mediator in dengue and indeed results obtained with this model would not be biologically relevant. We are pleased that the editors have agreed with this rebuttal.

      We believe that the data presented already are very exciting and deserve publication in eLife and by further enhancing these with the first two revisions we believe the paper will be an even stronger contribution.

    1. Author Response:

      Reviewer #1:

      After infection, new HIV-particles assemble at the host cell plasma membrane in a process that requires the viral protein Gag. Here, Inamdar et al. showed that a component of the host cell, the membrane curvature-inducing protein IRSp53, contributes to efficiently promote the formation of viral particles in synergy with the viral Gag protein.

      In cells depleted of IRSp53, the formation of HIV-1 Gag viral-like particles (VLPs) was compromised. The authors showed in compelling electron micrographs that the formation of VLPs was arrested at about the halfway stage of particle budding. Biochemical data (co-IPs and analysis of VLPs and HIV particle content), super-resolution nanoscopy (single molecule localization microscopy) data, and in vitro biophysics measurements (in GUVs), all seem to indicate a functional connection between Gag and the iBAR-domain containing protein IRSp53. The combination of the different techniques and approaches is a clear strength of this manuscript. However, in my opinion, the interpretation of some of the experimental data is somewhat limited by the lack of some appropriate controls (that are lacking for different reasons, as the authors state in some parts of the text). These are:

      1) Specificity of the IRSp53 siRNA. Although the authors showed that the siRNA used can deplete the expression of the protein (both endogenous and ectopic), they did not presented any rescue experiments of the phenotypes (or corroboration with different siRNA oligoes).

      We tried several commercial and home-designed siRNAs targeting IRSp53 from different companies (supplied as single siRNAs or as siRNA mixes); the results are summarized in Figure R1 (see below). Only two siRNAs were effective in extinguishing the IRSp53 gene: one from Invitrogen, effective on both endogenous IRSp53 and ectopic IRSp53-GFP, and one from Dharmacon, effective only on ectopic IRSp53-GFP, as revealed by western blot (Fig. R1A). Furthermore, the specificity of the siRNA was challenged by testing siRNA IRSp53 on human IRSp53-GFP and on mouse I-BAR-GFP in transfected HEK293T cells, visualized by fluorescence microscopy. Figure R1B shows that only siIRSp53 extinguishes human IRSp53-GFP, and that it does not extinguish mouse I-BAR-GFP. siIRTKS and siCtrl extinguish neither gene. Overall, these results confirm the specificity of the IRSp53 siRNA-mediated knockdowns.

      Figure R1: Specificity of siRNA-mediated knockdowns: (A) Western blots of HEK293T cell lysates probed with anti-IRSp53 antibody (and the house-keeping gene GAPDH), showing the efficacy of a series of different IRSp53 siRNAs (and control siRNAs, CTRL, from Invitrogen, Dharmacon or Sigma) in specifically down-regulating endogenous and ectopic IRSp53 in human HEK293T cells. (B) siRNA IRSp53 from Invitrogen was tested for its specificity in extinguishing human IRSp53-GFP protein, but not mouse I-BAR-GFP, expressed in transfected HEK293T cells, compared with siRNA control and siRNA IRTKS, as revealed by fluorescence imaging (GFP).

      To further answer the reviewers’ comments, we also performed one rescue experiment of the phenotype, shown in Figure R2 below. Upon co-transfection of pGag+pIRSp53-GFP+siRNA IRSp53 (lane 2), about 50% of the ectopic IRSp53-GFP was extinguished (since this construct is not siRNA resistant), leaving 50% of this ectopic protein expressed in the cells. In this context, Gag-VLP release is ~50% (lane 2), similar to the pGag+siCTRL condition (lane 3). Compared with pGag+siIRSp53 (lane 4), in which release is reduced 2-3 fold (data from Figure 1b of the manuscript), the remaining IRSp53-GFP in lane 2 appears to rescue the defect caused by extinction of the endogenous IRSp53. In the pGag+pIRSp53-GFP+siCTRL condition, VLP-Gag release was slightly reduced. This is an atypical rescue experiment, since we do not have an IRSp53-GFP that is resistant to the siRNA IRSp53 used in this study (Figure R1B), but it suggests that when IRSp53-GFP is overexpressed in the presence of Gag and the siRNA IRSp53, VLP-Gag release is at the normal 50% level, in contrast to the condition without IRSp53-GFP (compare lane 2 with lane 4). Unfortunately, because of limited time, and because the siRNA IRSp53 was out of stock and its supply delayed, we could provide only one experiment. We therefore decided to show it in answer to the reviewers but not as part of a figure in the final manuscript.

      Figure R2: Rescue of siRNA IRSp53 knock-down with overexpression of IRSp53-GFP: 293T cells were transfected with pGag, pIRSp53 and siRNA control (siCTRL, lane 1) or siRNA IRSp53 (lane 2); cell lysates and VLPs were loaded on SDS-PAGE gels, and immunoblots were revealed with anti-GFP (for IRSp53-GFP) and anti-CAp24 (for HIV-1 Gag). The graph on the left shows the percentage of IRSp53-GFP expression upon siRNA IRSp53 treatment (lane 2) compared with siRNA CTRL (lane 1). The graph on the right shows the resulting gel quantification of the % of Gag-VLP release upon siRNA IRSp53 treatment (lane 2) compared with siRNA CTRL (lane 1) in the presence of IRSp53-GFP overexpression, or without it (lanes 3 and 4, as in Figure 1b). N=1 rescue experiment.

      2) In the co-IPs (IRSp53 IP + Gag co-IP) there is no assessment of the IRSp53 IP efficiency in the different conditions. The authors argued that IgG signal masking precluded them from doing that.

      See the new Figure 2. In the new Figure 2b, we assessed the efficiency of the IRSp53-GFP IP and Gag co-IP by adding a complete experiment showing that an anti-GFP antibody pulls down IRSp53-GFP very efficiently (lanes 2 and 3) and co-immunoprecipitates Gag efficiently (lane 3), as judged against the input and the remaining flowthrough. Using IRSp53-GFP and an anti-GFP antibody, we could bypass the IgG signal that masks endogenous IRSp53 when the IP is performed with the IRSp53 antibody.

      3) The authors observed an increase in the membrane-bound pool of IRSp53 when Gag is present (Fig. 2c). It is not clear whether this is specific for IRSp53 or other IBAR proteins can also be more membrane-bound as a result of Gag expression.

      See the new Figure 2. In the new Figure 2d, we re-loaded all the gel fractions on new SDS-PAGE gels and probed the corresponding immunoblots for Gag, IRSp53, IRTKS, Tsg101 and the cellular markers Lamp2 (for membrane fractions) and ribosomal S6 protein (for cytosolic fractions). Quantification of the IRSp53 and IRTKS bands in control HEK293T cells and in Gag-expressing cells shows that only IRSp53, and not IRTKS, increases at the cell membranes upon Gag expression.

      Reviewer #3:

      Inamdar et al. used biochemical and microscopy assays to investigate the role of I-BAR domain host proteins on HIV-1 assembly and release from HEK 293T and Jurkat cells. They show that siRNA knockdown of IRSp53, but not a similar I-BAR domain protein IRTKS, inhibits HIV-1 particle release from 293T cells after transfection of the HIV-1 provirus or HIV-1 Gag in cells. The authors then show that HIV-1 Gag associates with IRSp53 in the host cell membrane and cytoplasm, using biochemical assays and super resolution microscopy. In addition, IRSp53 is incorporated into HIV-1 particles along with other previously identified host proteins. Then using in vitro-derived membrane vesicles ("giant unilamellar vesicles" or GUVs), the authors indicate that HIV-1 Gag can associate with IRSp53, particularly on highly curved structures.

      The conclusions are largely supported by the data, with the virology and biochemical results being particularly strong, but the mechanistic studies in GUVs appear somewhat preliminary and are not entirely clear. The GUV experiments would benefit from better quantification of measurements and manipulation to simulate actual cellular scenarios. In addition, while it is appreciated that the HEK 293T cell line is convenient for biochemical and imaging studies, these cells are not biologically relevant HIV-1 target cells. While the authors present examples of reproducibility of their results in a CD4+ T cell line, these data are buried in the supplemental figures, whilst it would have been better to highlight them and perhaps include primary CD4+ T cells.

      1) Immortalized cell lines do not always recapitulate primary cells. It is unclear what the role of IRSp53 is in the membrane curvature of CD4+ T cells and whether expression levels and localization are consistent with Jurkat T cells.

      Please see our general response to the Editors, reproduced here:

      We have published (Thomas et al., JVI 2015) that IRSp53 is involved, as shown using siRNA, in HIV-1 particle release from primary T cells (PBMC-derived T cells), so it is highly probable that the same holds across cell types: transfected HEK293T cells, transfected or infected Jurkat T cells, and infected primary T cells. However, we have not performed extensive super-resolution microscopy on infected primary T cells because this would be an excessively time-consuming study. We are currently setting up conditions with an infectious HIV-1 virus carrying the mEOS2 photoactivatable protein, so that we can infect primary T cells and pursue further research combining a relevant infectious system with super-resolution microscopy, but this is not ready for the current manuscript, as it would require months of additional work and experiments.

      We agree with Reviewer #3 that the localization of Gag in Jurkat T cells and in primary CD4 T cells differs at the cellular level (in primary T cells, HIV-1 Gag is more polarized at uropods, as reported in the literature; see, for example, Bedi et al. from Ono’s lab). At the nanoscopic level of the budding sites, however, it is likely to be similar, although this needs to be checked in future studies.

      2) Description of some of the microscopy measurements could be improved. In lines 204-206 of the text and Figure S5, it is unclear how the localization precision was determined to be approximately 16 nm for PALM-STORM.

      These lines have been changed in the main text, as they were not needed to understand how we determine the size of the VLP clusters. However, we now detail in Figure S5 how we measure localisation precision.

      The following text has been added to the legend of Fig. S5:

      “Distribution of localisation precisions for PALM (in green) or STORM (in red) as given by Thunderstorm analysis in Fiji : Localisation precision distribution exhibit maxima at 16 nm and a mean±sd value of 20±5 nm for PALM, and a maxima of 26 nm, corresponding to a mean±sd value of 27±10 nm for STORM. The localization precision is obtained by eq 17 of (Thompson et al., 2002).”

      We also added the reference to the original paper (Thompson et al., 2002, Biophysical Journal).
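
      For readers unfamiliar with this estimate, the expression usually cited as Eq. 17 of Thompson et al. (2002) can be sketched as below. The function name and all parameter values in the usage note are illustrative assumptions, not the settings used in this study; ThunderSTORM applies its own implementation of the estimate.

```python
import math


def thompson_precision_nm(s_nm, a_nm, n_photons, b_photons):
    """Localization precision (nm) from the Thompson et al. (2002) estimate.

    s_nm      -- standard deviation of the point-spread function (nm)
    a_nm      -- effective pixel size in the sample plane (nm)
    n_photons -- number of photons collected from the molecule
    b_photons -- background noise per pixel (photons)
    """
    # Photon shot noise plus pixelation noise.
    shot_and_pixel = (s_nm ** 2 + a_nm ** 2 / 12.0) / n_photons
    # Contribution of the background.
    background = (8.0 * math.pi * s_nm ** 4 * b_photons ** 2) / (
        a_nm ** 2 * n_photons ** 2
    )
    return math.sqrt(shot_and_pixel + background)
```

      Calling `thompson_precision_nm(150, 100, 100, 0)` with these hypothetical inputs gives a precision of roughly 15 nm; the value improves as the photon count rises and worsens with background.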

      In Figure 4b, it is understood from the text (lines 252-256) that the red bars denote the Mander's coefficient for colocalization of the GFP-tagged proteins with Gag-mCherry (presumably the average of multiple experiments with standard deviations or errors of the mean, although this is not stated in the figure legend), it is unclear what the green bars are showing.

      Yes, the red bars denote the Manders' coefficient for colocalization of Gag-mCherry with the GFP-tagged proteins, and the green bars denote the coefficient for colocalization of the GFP-tagged proteins with Gag-mCherry. Measured on more than 300 green and red vesicles, they show that all the Gag-VLPs are green in the case of IRSp53-GFP (red bar), but that not all the IRSp53-GFP “green” vesicles are positive for Gag: this indicates that transfected HEK cells produce Gag/IRSp53 VLPs but also IRSp53-GFP vesicles. We thank the reviewer for pointing this out. We have added the explanation to the main text (page 12, lines 272-282) and to the legend of Figure 4b.
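
      The asymmetry between the two bars follows from how the two Manders' coefficients are defined. A minimal sketch, assuming per-vesicle intensities in two channels and simple intensity thresholds (the function name, thresholds and toy values are hypothetical and are not the analysis pipeline used for Figure 4b):

```python
def manders_coefficients(red, green, red_threshold=0.0, green_threshold=0.0):
    """Return (m_red, m_green) for paired per-object intensities.

    m_red   -- fraction of total red signal found where green is present
    m_green -- fraction of total green signal found where red is present
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    red_total = sum(red)
    green_total = sum(green)
    # Red signal located in objects that also carry green signal.
    red_coloc = sum(r for r, g in zip(red, green) if g > green_threshold)
    # Green signal located in objects that also carry red signal.
    green_coloc = sum(g for r, g in zip(red, green) if r > red_threshold)
    m_red = red_coloc / red_total if red_total else 0.0
    m_green = green_coloc / green_total if green_total else 0.0
    return m_red, m_green


# Toy data: every red (Gag) object is also green, but two green objects lack red.
toy_red = [5, 0, 3, 0]
toy_green = [4, 6, 2, 8]
m_red, m_green = manders_coefficients(toy_red, toy_green)
```

      In this toy case the red coefficient is 1 while the green coefficient is well below 1, the same pattern as described for IRSp53-GFP: all Gag-positive vesicles are green, but not all green vesicles contain Gag.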

      Also, the histograms for IRSp53 and IRTKS colocalized with Gag look similar in Figure S10, suggesting that they are not different in Jurkat cells, but this is not addressed.

      Yes. We have now addressed this particular point in the global response to the reviewers. Figures 3 and 4 were reorganized: the new Figure 3 shows the HEK and Jurkat cell results together, and the new Figure 4 shows the simulation results. Overall, the PALM/STORM microscopy results on Gag/IRSp53 colocalization are very similar in both cell types.

      3) GUVs are first referenced on page 7 after description of Figure 2, the significance of which is confusing to the reader. However, the actual experimental data are described on pages 12-13 and Figures 5 and S11. A better description of these structures would be warranted for an audience that is unfamiliar with them. In addition, the biologic concentrations of I-BAR proteins at cell membranes are not provided and it is unclear what conditions used in Figures 5 and S11 represent a "normal CD4+ T cell" situation. It appears that the advantage of this in vitro system is that different factors can be provided or removed to simulate different cellular scenarios. For example, relatively low IRSp53 concentrations may simulate siRNA knockdown experiments in Figure 1, which could recapitulate those results that fewer viral particles are released from the membrane. In addition, the authors state that HIV-1 Gag preferentially colocalizes with IRSp53 at the tips of the GUV tubular structures (Figure 5b,c), but this is not actually shown or quantified. Similar quantification as shown in Figure 1e could be performed to strengthen this argument.

      We thank the reviewer for pointing this out. We now describe all the GUV results in section 5.

      Regarding the biological concentrations of I-BAR proteins in cells, to the best of our knowledge they have not been measured. We therefore could not relate the concentrations used in the GUV experiments to those in cells.

      We could not perform quantification as in Figure 1e because the majority of the tubes in GUVs were moving too rapidly, preventing us from acquiring images with higher spatial resolution (see Fig. S11, and Movies 2 and 3). However, we would like to point out that the Gag signals appeared punctate inside GUVs (see Fig. S11, and Movies 2 and 3), which is very different from the I-BAR signals, which clearly run along the tubes (see Fig. S10c). Moreover, for the tubes that were not moving too fast (17 tubes), we found that Gag signals were in every case located exclusively at the tips of the tubes (see new Fig. 6d). The sorting maps shown in Fig. 6c and Fig. S10d also indicate the relative accumulation of Gag at the tips of the tubes. To make it clearer that the Gag signals were located at the tips of the tubes, in the current manuscript we have added the new Fig. S11 and Movies 1, 2 and 3, and included zoom-in images in Fig. 6b, 6c and a new Fig. 6d. We have also included the quantitation results (17 tubes) in the manuscript.

    1. Author Response:

      Reviewer #2:

      Recommendations:

      A) In the section 'Electrophysiological characterization of TcMscS', the authors present compelling evidence that TcMscS gates in response to tension in the membrane. However, it is unclear, both in the text and the caption, if the trace shown in Figure 2 panel C was collected under tension. If it was, please include the applied pressure value in either the text or caption. Additionally, within this section the applied pressure to the patch is frequently unclear. One way to clear this up would be to 1- add the applied pressure to each trace or to 2- add the applied pressure for each patch to the figure caption. -In Panel E: can you comment on the conductance of the channels in the three traces? Why do you see channels that are approximately 1/2 the size of the first trace in the second two traces?

      Figure 2 has been corrected: the pressure value for Panel A is now shown below the panel, and only one scale bar is shown in Panel E. The figure legend was rewritten to assign the pressure ramp to Panels C and D.

      B) In the section 'TcMscS gene targeting by CRISPR-Cas9' the authors utilized CRISPR to KO TcMscS to determine its function; based on the immunofluorescence and qPCR, TcMscS has been successfully knocked out. In lines 251-264, the authors complemented the KO with an overexpression vector in an attempt to confirm the role of TcMscS. In this section, it is very unclear what strains C1 and C2 are and how they are different from one another. Neither of these constructs successfully restores the growth rate. The authors can clarify the differences between the two constructs or they can remove this section from the manuscript, particularly Figure 5 supplement 3. The manuscript is strong and compelling without this panel.

      We agree with the reviewer that, in this case, the complementation did not fully revert the KO phenotype, but instead had a detrimental effect, probably due to the overexpression of functional channels. We decided to include the data to explain why we could not verify the reversion of the phenotype. We believe it is important to share this information with the community, since negative results are often excluded from manuscripts and, in this case, a precedent on the toxic effect of overexpressing channels in T. cruzi could be useful to other groups.

      C1 and C2 correspond to two different clones of the complementation with myc-tagged TcMscS. We show both clones to illustrate the consistency of the phenotype of the complemented strains. The clonal nature of C1 and C2 has been clarified in the manuscript (lines 267-268).

    1. Author Response:

      Reviewer #1 (Public Review):

      The study by Hendley et al takes advantage of duct-specific DBA-lectin expression to purify pancreatic ductal populations that were then subjected to scRNA-seq analysis. The ability to enrich for this relatively low abundant pancreatic cell population resulted in a more robust dataset than had been generated previously from whole pancreas analyses. The manuscript catalogs several different gene clusters that delineate heterogeneous subpopulations of three different pancreatic ductal subpopulations in mice: mouse pancreatic ductal cells, pancreatobiliary cells, and intra pancreatic bile duct cells. Additional comparisons of the resulting data sets with published embryonic and adult datasets is a strength of the study and allows the authors to subclassify the different ductal cell populations and facilitates the identification of potentially novel subpopulations. Pseudotime analysis also identified gene programs that led the authors to speculate the existence of an EMT axis in pancreatic ducts. Overall, the data analysis is strong, but the authors tend to draw conclusions that are not fully supported by the presented data.

      The second half of this study focuses on three candidate proteins that were identified in the transcriptome analysis - Anxa3, SPP1 and Geminin. CRISPR-Cas9 was used to delete each gene in an immortalized human duct cell line (HPDE). Deletion of each gene resulted in increased proliferation; SPP1 mutant cells also displayed abnormal morphology. Additional functional studies of the cell lines or in mouse models suggested a role for SPP1 in maintaining the ductal phenotype and Geminin in protecting ductal cells from DNA damage, respectively. Although the provided phenotypic analysis suggests important functional roles for these proteins, follow up studies will be required to fully understand the role of these genes in homeostatic or cancer conditions.

      Strengths:

      1) Enrichment of pancreatic ductal populations enhanced the robustness of the scRNA-Seq dataset

      2) Quality of the sequencing data and extensive computational analysis is extremely good and more comprehensive than previously published datasets

      3) Comparative analysis with existing mouse and human data sets

      4) Use of human ductal cell lines and mouse models to begin to explore the function of candidate ductal genes.

      Weaknesses:

      1) There are many suppositions based on gene expression changes that are somewhat overstated.

      2) The conclusion that there is an EMT axis in pancreatic ducts is not fully supported by the gene expression and immunofluorescence data

      3) A good rationale for choosing Anxa3, SPP1 and Geminin for additional functional analysis is not provided. In addition, it isn't clear why Anxa3 function isn't pursued further.

      4) Although extensive models (transplanted cells for SPP1 and mouse conditional KOs for Geminin) were generated, the functional analysis for each gene is preliminary; additional longer term studies will be necessary to fully understand the role of these proteins in pancreatic duct development and cancer.

      We would like to thank the Reviewer for their fair and thoughtful review of our manuscript. We agree with the comments and have addressed them as described in detail below. In particular, we have focused on streamlining the presentation and description of our bioinformatic analysis, providing additional rationale for using the particular genes we focused on in the follow-up analyses, and including additional data to support the EMT axis.

      Reviewer #2 (Public Review):

      In this study the authors address the heterogeneity of the mouse ductal cell at the single cell level and conduct functional studies for selected marker genes. They isolated duct cells using the DBA lectin as a molecular surface marker. This is a noteworthy approach as it does not rely on the specificity and expression levels of reporter lines. Isolated cells contained a majority of non-duct cells that were identified by their transcriptomic profile and excluded from further analysis. The transcriptomic profiles of bona fide duct cells were then subjected to standard analyses for differentially expressed genes, activated pathways and lineage relationships. Of particular interest is the comparison of these data with human data from a recently published study that used a different sorting strategy for duct cells. As more studies at the single cell level are conducted, these types of comparisons need to become part of them in order to derive commonalities and identify deficits due to methodological or technological limitations. The study was by necessity descriptive up to this point and the authors addressed this with functional studies on SPP1 and GMNN which suggested that SPP1 is necessary for the maintenance of the ductal differentiated phenotype whereas GMNN protects cells against DNA damage during increased proliferation triggered by chronic pancreatitis.

      It is an interesting study, but there are caveats, particularly concerning the functional studies. The functional analysis of SPP1 needs to be strengthened and some findings on the analysis of GMNN clarified. There is also an overreliance on the outcome of pathways analyses and upstream regulators which are often treated as actual findings rather than possibilities to be explored in this or future studies. The single cell RNA Seq analysis would benefit from reducing speculation and restricting descriptions to the essential features of each cluster. Main figures for this analysis could also be simplified along the same lines.

      We thank the reviewer for appreciating our study as “interesting” and for considering our investigations a “noteworthy approach”. We are glad that the reviewer acknowledges our efforts in delivering a manuscript with the necessary descriptive bioinformatics analysis, followed up with functional studies for select subpopulation markers. At the same time, we took the constructive criticism seriously and added new data to further substantiate our claims.

      Reviewer #3 (Public Review):

      In this study, the authors present a high-resolution single-cell transcriptomic atlas of the pancreatic ductal tree. Using a DBA+ lectin sorting strategy, murine pancreatic duct, intrapancreatic bile duct, and pancreatobiliary cells were isolated and subjected to scRNA-seq. Computational analysis of the datasets unveiled important heterogeneity within the pancreatic ductal tree and identified unique cellular states. Furthermore, the authors compared these clusters to previously reported mouse and human pancreatic duct populations and focused on the functional properties of selected duct genes, including Spp1, Anxa3 and Geminin. Overall, the results presented here suggest distinct functional roles for subpopulations of duct cells in maintenance of duct cell identity and implication in chronic pancreatic inflammation. Finally, such detailed analysis of the pancreatic duct tree is relevant also in the context of cancer biology and might help elucidate the transition from pancreatitis to pancreatic cancer and/or different predisposition to cancer.

      The study is very well done, with careful controls and well-designed experiments.

      We thank the reviewer for appreciating our study as “very well done” as well as envisaging the potential relevance of our findings to cancer biology.

    1. Author Response:

      Reviewer #1 (Public Review):

      We thank Reviewer #1 for their valuable comments. We agree with the Reviewer that our current results are not sufficient to confirm the therapeutic effects. The statement related to therapy has been removed.

      The study by Song and colleagues explores the role of circRNAs in fibrosis of the endometrium. Endometrial cells from patients with and without fibrosis were subjected to expression profiling analysis, and circPTPN12 and miR-21-5p were found to be strongly altered in endometrial fibrosis, with circPTPN12 acting as an inhibitory factor for miR-21-5p. Through the use of various molecular approaches, the authors further show that miR-21-5p inhibition results in upregulation of ΔNp63α, a transcription factor that induces EMT. The role of circPTPN12 was also confirmed in vivo using a mouse model of mechanically induced endometrial fibrosis. The authors concluded that targeting the circPTPN12/miR-21-5p/∆Np63α pathway may be a therapeutic strategy for endometrial fibrosis.

      The authors clearly and convincingly show the involvement of the circPTPN12/miR-21-5p/∆Np63α pathway in EMT and its potential involvement in endometrial fibrosis. Whether or not this can be a therapeutic target is too preliminary to say at this point. First, the in vivo experiments confirm the link between circPTPN12/miR-21-5p/∆Np63α at the RNA level only (p63); it would be more convincing to see protein data as well.

      We did try to detect ΔNp63α protein in mice by immunochemistry and immunofluorescence, using three antibodies (CST, cat# 67825 and 39692; Abcam, ab124762). Unfortunately, we did not obtain positive results. However, ΔNp63α mRNA was significantly changed.

      The involvement of p63 in the process remains a little elusive in this paper.

      We have reported that ΔNp63α is ectopically expressed in endometrial epithelial cells in IUA patients (Cao et al., 2018), and showed that ΔNp63α promotes the expression of SNAI1 via the DUSP4/GSK3B pathway and induces EEC EMT and fibrosis (Zhao et al., 2020). We have added this description of ΔNp63α to the Discussion section (2nd paragraph).

      In addition, if the authors believe this pathway can be a real future target to treat endometrial fibrosis, they could better contextualise such a statement, specifically describe what kinds of therapeutic intervention they think of, like regression or prevention of fibrosis. These should be tested in vitro and in vivo.

      Our results showed that replenishing miR-21-5p can reverse EMT and alleviate endometrial fibrosis in vivo and in vitro. However, therapeutic use of miR-21-5p in the clinic requires more research in other animal models, such as rats, pigs, and non-human primates. We therefore removed the therapeutic statements (page 1, Line 1-2; page 2, Line 37-40; page 4, Line 74-76; page 13, Line 273).

      More evidence of the involvement of circPTPN12/miR-21-5p/∆Np63α and the correlation between the three players using clinical material is also necessary.

      The involvement of ∆Np63α in endometrial fibrosis was demonstrated in our published paper, and those results are cited in this paper (Zhao et al., 2020). The correlation between circPTPN12 and miR-21-5p in clinical material is shown in Figure 2J. In vivo and ex vivo experiments confirmed that overexpression of circPTPN12 downregulates miR-21-5p and upregulates ∆Np63α (Figure 3H/Figure 4J/Figure 5B/Figure 5E). In addition, ex vivo experiments suggested that the decrease of ∆Np63α is secondary to the increase of miR-21-5p (Figure 4C-E).

    1. Author Response:

      Reviewer #1 (Public Review):

      The dark structure of GtACR1 was published almost simultaneously at the end of 2018 and beginning of 2019 by the Deisseroth and Spudich groups, respectively. Neither group managed to solve a structure with an ion bound, and there is very limited information on the open conformation of the channel. Both groups identified a central constriction site as being central to the gating mechanism, but the Spudich group proposes two additional constrictions (C1 and C3). In this work Li et al are able to solve the structure of GtACR1 with a bromide bound near C3, which clearly represents a significant step towards understanding the mechanism of light gated anion channels. The structure reveals that Br binds to the intracellular constriction site (C3), resulting in a small opening of C3. The data support the notion that the partial electropositivity of Pro58 together with two tryptophans play a critical role in anion interaction at C3, which was also confirmed by mutagenesis studies. In addition, there was a noteworthy conformational change in the bromide-bound protein in the extracellular constriction (C1), a 180 degree flip of Arg 94 resulting in a salt bridge to Asp 234 and a slight opening of the C1 constriction.

      While the data and conclusions are sound, the lack of discussion of their data in the context of the work of others is a bit surprising.

      We thank the reviewer for the thorough reading of our submission and the constructive criticism, which helped us to improve the quality of our manuscript. As requested, we added the following paragraph at the end of the Results section (lines 219-233):

      “Studies in 3 different laboratories have concluded that Asp234 is neutral in the dark state from measurements of the D234N mutant of GtACR1 by UV-vis absorption spectroscopy (Kim et al., 2018; Sineshchekov et al., 2016), Resonance Raman spectroscopy (Yi et al., 2016), and FTIR (Kim et al., 2018). Both studies of independently determined crystal structures of GtACR1 attribute the major component of its neutralization to hydrogen-bonding to Tyr207 and Tyr72 (Kim et al., 2018, Li et al., 2019), leaving open partial electronegativity of Asp234 participating in hydrogen-bonding to the protonated Schiff base (PSB). The Asp234 residue is expected to be functionally important given its proximity to the PSB and its nearly universal conservation in microbial rhodopsins. Kim et al (Kim et al., 2018) conducted an extensive analysis of Asp234 and report that the D234N mutation nearly abolished photocurrents. Reduced photocurrents to 20% of wild-type from the D234N mutation were also observed by Sineshchekov et al. (Sineshchekov et al., 2015). Differences in extent of photocurrent reduction are likely attributable to different assay conditions used in these studies. The electrostatic interaction of Arg94 with Asp234 in the pre-activated state may be correlated with the change in the electron conjugation of the retinylidene polyene chain in the dark that we observed by FTIR.”

      Reviewer #2 (Public Review):

      In the manuscript entitled "The Crystal Structure of Bromide-bound GtACR1 Reveals a Pre-activated State in the Transmembrane Anion Tunnel", Li et al. analyzed the effect of bromide binding to GtACR1 by X-ray crystallography and electrophysiology. The authors propose that a bromide ion is bound to the intracellular pocket in the dark, inactivated state and induces a structural transition from an inactivated to a pre-activated state.

      I agree that some of the amino acid residues in the current crystal structure change their conformations compared to the previous one reported in 2019 (Li et al., 2019), and it is very impressive that the authors determined the structure using state-of-the-art crystallography technique, ISIMX. However, unfortunately, most of the conclusions and claims described in the manuscript are not well supported by the authors' data.

      1) The most serious problem is that the evidence of bromide binding is too weak. The authors showed the composite omit map in Supplementary Figure 1A, but they should present an anomalous difference Fourier map to validate the bromide binding. The authors also claim that they replaced the bromide ion to the water, run the PHENIX refinement, and observed a strong positive electron density at the bromide position in the Fo-Fc difference map (Supplementary Figure 1B). However, when I do the same thing using the provided coordinate and map (I really appreciate the honesty and transparency of the authors), I could not reproduce their result; a weak positive electron density is observed between the bromide position and Pro58 in chain A and there is no positive peak at the position in chain B (Fo-Fc, contoured at 3σ). I am wondering the occupancy and B-factor of the water molecule they show in Supplementary Figure 1B.

      We appreciate the reviewer’s effort in analysis of our structure. As described in the Discussion section (lines 238-248), the identification of bromide is supported by multiple lines of evidence: (1) the composite omit map indicates the presence of bromide at the cytoplasmic port (Suppl. Fig. 1A-1B); (2) we exclude the possibility of a water at the bromide position as demonstrated in the Fo-Fc difference map (Suppl. Fig. 1C-1D); (3) the bromide binding site exhibits a chemical conformation similar to that seen in chloride-binding structures (Auffinger et al., 2004); (4) functional analysis of W250F and W246F is consistent with the H-bond interaction in the bromide binding site (Fig. 2B); (5) specific interaction of GtACR1 with bromide in the dark state was further demonstrated by FTIR analysis (Fig. 3). Differences in major bands that reflect the ethylenic (C=C) stretch mode of the retinylidene chromophore show a large bromide-induced alteration in the electron conjugation of the retinylidene polyene chain in the dark, confirming that bromide causes a significant structural change. In sum, these data confirm the bromide binding conformation in the structure.

      We agree with the reviewer that the signal of bromide in chain A is stronger than in chain B. We now address the difference throughout the main text and Suppl. Fig. 1. The datasets were collected at 0.91882 Å wavelength, but we did not detect any strong bromide signals in the anomalous difference Fourier map. This may be due to preferential orientation of the thin-plate GtACR1 crystals in the IMISX plate. The weak Br signals may also be attributed to the weak bromide binding conformation, its partial occupancy, and poor intrinsic order. It is not unusual that anomalous signals are influenced by the location of the scatterer. For example, in our previous structural determination of YfkE (Wu, PNAS 2013), selenomethionine was used to label 12 native Met residues. However, we could identify only 10 Se positions and the other 2 Se were undetectable in the anomalous difference map, despite the dataset collection at the Se absorption peak wavelength. Therefore, the lack of strong anomalous signals does not exclude the presence of bromide in the structure.
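
      As a rough check on why an anomalous bromide signal could in principle be expected at this wavelength, the sketch below converts 0.91882 Å to photon energy and compares it with the bromine K absorption edge. The edge value of about 13.474 keV is a tabulated figure supplied here as an assumption, not a number from the response, and the function name is illustrative.

```python
# Convert an X-ray wavelength to photon energy: E [keV] = h*c / wavelength [Å].
HC_KEV_ANGSTROM = 12.398419843  # h*c expressed in keV*Å

# Assumption: approximate tabulated bromine K absorption edge, in keV.
BR_K_EDGE_KEV = 13.474


def wavelength_to_energy_kev(wavelength_angstrom: float) -> float:
    """Return the photon energy in keV for a wavelength given in Å."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


energy_kev = wavelength_to_energy_kev(0.91882)
above_edge = energy_kev > BR_K_EDGE_KEV
print(f"Photon energy: {energy_kev:.3f} keV; above Br K-edge: {above_edge}")
```

      Under these assumptions the data collection energy lies slightly above the edge, so the absence of a strong anomalous peak is more likely to reflect occupancy, order, or data completeness than the choice of wavelength.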

      Regarding the reviewer’s question, the occupancy of the water is 1 and its B-factor is 71.

      In addition to the insufficient evidence, the current models of bromide ions have significant steric clashes. The PDB validation report shows that the top 5 serious steric clashes observed in the coordinates are the contacts between the bromide ions and surrounding residues (PDB validation report, Page 10). I analyzed them and found that the distances between the bromide ion and the CG and CD atoms of Pro58 in chain A are only 2.43 Å and 2.36 Å, respectively. The authors claim that such a close proline-halide interaction has also been observed in the structure of the chloride-pump rhodopsin ClR, but in that structure (PDB ID: 5G28), the distances between the chloride ion and the CD and CG atoms of Pro45 are much larger (3.43 and 3.91 Å, respectively) and there is no steric clash. Moreover, the authors claim that Pro58 changes its conformation upon bromide binding, but it is very possible that the PHENIX program just displaces Pro58 to alleviate the steric clash between the proline and the bromide ion, so the authors should carefully check this possibility.

      Overall, the authors should analyze the density again, provide more solid evidence for the bromide binding such as anomalous difference Fourier map, and if they could, they should correct the current significant steric clashes in their models.

      We thank the reviewer for pointing out the steric clashes. We have corrected them in the revised structure as demonstrated in the latest validation report. As described in the Results section (lines 107-109), the distances between the bromide ion and the CG and CD atoms of Pro58 in chain A are now 3.6 Å and 3.1 Å (see the updated structure pdb), respectively, and the distances between the bromide ion and the CG and CD atoms of Pro58 in chain B are 4.0 Å and 3.2 Å, respectively, similar to the distances between the chloride ion and the CD and CG atoms of Pro45 in ClR (3.43 and 3.91 Å, respectively). These modifications do not alter the structure beyond the local binding site of the bromide, and do not change our conclusions.

      We do not agree that the Br⁻-induced conformational changes are due to the refinement program. To further confirm the Pro58 position, we have performed a refinement by removing Pro58 and adjacent residues using PHENIX. The resulting electron density map shows a positive electron density at the Pro58 position, confirming the conformational changes induced by bromide binding.

      2) To analyze the functional importance of putative bromide binding, the authors prepared W246E and W250E mutants and analyzed their electrophysiological properties. Because tryptophan and glutamate are so different in terms of volume and charge, they should analyze other mutants as well. The authors claim that bromide is stabilized by a hydrogen bond interaction formed by the indole NH group of W246, so they should at least test the W246F mutant.

      We thank the reviewer for this important suggestion, which helps confirm the bromide binding conformation. The glutamate substitutions were chosen to assess the specific anion selectivity and conductivity of GtACR1 due to the negative charge of its side chain. We now include the data of W246F and W250F in Fig 2B. W250F shows reduction of the current amplitude by 50%, whereas W246F behaves like WT. These results are consistent with the structural observations in which W250, but not W246, stabilizes bromide via H-bond interaction. These results are provided in the Results section (lines 136-142) and in the revised Fig. 2B.

      3) The authors claim that the bromide binding in the intracellular pocket induces the conformational change of R94, but the causal relationship is doubtful. As mentioned in the manuscript, R94 forms a salt-bridge with D234 in chain A. However, the arginine has a completely different conformation and does not have any interaction with D234 in chain B. If the bromide binds in both chain A and chain B and induces the conformational change of R94, why does only R94 in chain A interact with D234? The authors changed the pH in the crystallization condition compared to their 2019 study (Li et al., 2019), so the pH may affect the protonation state of D223 and/or other titratable residues and induce the conformational change of R94. The authors should provide more solid evidence for the causal relationship between the bromide binding and the conformational change of R94.

      We did not change the pH in the crystallization condition compared to our previous crystallization of GtACR1. Both structures were obtained at pH 5.5 as noted in the manuscript. In our structure, the only bromide binding site was identified near C3 and no bromide was found at C1. We address this result in Discussion (lines 276-286) as follows:

      “The conformational change of Arg94 near C1 is not likely to be directly induced allosterically by bromide binding at distant C3 since it is only observed in chain A, not in chain B. Instead, this conformational change may reflect the intrinsic flexibility property of Arg94 in the tunnel in the bromide-bound state. Although both Arg94 of GtACR1 (in chain A) and Arg95 of ClR adopt a similar conformation (Fig. 4B), these two counterpart residues appear to be stabilized by distinct H-bond networks. In GtACR1, inward Arg94 only forms a salt-bridge with Asp234 and an H-bond with a water molecule (Suppl. Fig. 2A). However, in the ClR structure, in addition to the salt bridge, R95 is further stabilized by three polar residues, Asn92, Gln224, and Thr228, via two water molecules from the extracellular side of the protein (Suppl. Fig. 2B). The absence of these polar residues and waters in the vicinity may liberalize Arg94 and facilitate its flip-flopping in the tunnel of GtACR1.”

      4) The authors assume that the conformational change of R94 creates a functional anion binding site with the Schiff base in GtACR1, but it is too speculative. If the anomalous difference Fourier map does not support the idea, they should delete it.

      Our hypothesis (not an assumption) is based on the following facts: (1) both rhodopsin proteins GtACR1 and ClR transport the same halide substrates; (2) the chain A of GtACR1 adopts a nearly identical chemical conformation to that in the chloride-binding site (site 1) of ClR, in which the counterpart residue R95 forms a chloride binding site with the Schiff base (Fig. 4B); and (3) Arg94 is important to anion conductivity of GtACR1 (Li et al. eLife 2019). It is reasonable to hypothesize that Arg94 forms a putative anion binding site with the Schiff base in GtACR1. To make this hypothesis clear, we listed these facts in the text and rephrased our hypothesis as follows (lines 217-219): “Based on the similar chemical conformations (Fig. 4B), it is possible that Arg94 rotates its side chain to form an anion binding site with the Schiff base in GtACR1.”

    1. Author Response:

      Reviewer #1:

      This work is aiming at the characterization of the molecular and kinetic mechanism of how three members of the SLC6 family of transporters, namely for dopamine (DAT), norepinephrine (NET), and serotonin (SERT), transport substrate across the membrane, and how the transport process is affected by cations. The authors use electrophysiology and sophisticated rapid solution exchange methods, in conjunction with fluorescence recordings from single cells, to correlate flux (from fluorescence) with electrical activity (from currents).

      The strength of the methods is based on the application of a kinetic method with high time resolution, allowing the isolation of fast processes in the transport mechanism, and their modeling using a kinetic multistep scheme. Particularly useful is the combination with fluorescence recording from single cells, which allows the authors to measure flux and current in the same cell under voltage clamp conditions. This is an elegant approach to get information on the voltage dependence of substrate flux, which is difficult to obtain with other methods. As to the strength of the results, the data are generally of high quality, showing the kinetic and mechanistic similarities and differences between the three transporters under observation. Another strength is that the results are quantitatively represented by kinetic simulations, which appear to fit the experimental data well.

      The major weakness of the research is related to interpretation of the experimental results. While the authors propose a unified K+ interaction mechanism for the three transporters, DAT, NET and SERT, the proposed K+ association/dissociation mechanism is 1) highly unusual, and 2) not unique in the ability to explain the experimental data. As to point 1), the DAT mechanism (Fig. 7A) proposes a sequence of intracellular K+ association and dissociation steps. Since the intracellular [K+] remains constant, such a sequence requires a change of affinity for K+, which is initially high when K+ associates (33 microM according to the provided rate constants) and then has to be low for K+ dissociation (3.3 mM). Such an affinity change requires input of free energy, to promote K+ dissociation. From the provided rate constants and at room temperature this free energy change can be approximated as 11.4 kJ/mol. This is a large energy amount, in fact larger than what is stored in the physiological concentration gradient for one Na+ ion as a driving force for transport. It appears that the transporter would waste a lot of energy for no apparent benefit, with a futile K+ association/dissociation cycle, that would just generate heat.
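
      The free energy figure quoted above can be reproduced with a short calculation. The sketch below assumes that the energy cost of the affinity change is RT·ln(Kd,low/Kd,high) and that room temperature means 298.15 K; the function name and the temperature are assumptions made for illustration, while the two dissociation constants are those cited by the reviewer.

```python
import math

R_GAS = 8.314          # gas constant, J/(mol*K)
ROOM_TEMP_K = 298.15   # assumption: "room temperature" taken as 25 °C


def affinity_change_energy(kd_high_affinity: float,
                           kd_low_affinity: float,
                           temperature_k: float = ROOM_TEMP_K) -> float:
    """Free energy (J/mol) required to shift a site from a tight to a weak Kd."""
    return R_GAS * temperature_k * math.log(kd_low_affinity / kd_high_affinity)


# Kd values quoted by the reviewer: 33 µM (association) and 3.3 mM (dissociation).
delta_g = affinity_change_energy(33e-6, 3.3e-3)
print(f"{delta_g / 1000:.1f} kJ/mol")
```

      A 100-fold change in Kd therefore corresponds to roughly 11.4 kJ/mol at this temperature, in line with the reviewer's estimate.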

      Therefore, while the authors have achieved their aim of quantitatively assessing transporter function and thorough description by a kinetic mechanism, their final proposed mechanism does not support all of the conclusions because it is far from unique in being able to explain the data (point 2 above). While this may be true for other transport mechanisms proposed in the past, the mechanism proposed here is somewhat odd with respect to energy requirements. Thus, it would require extraordinary experimental proof to propose it in exclusion of other, maybe more plausible mechanisms.

      Despite these shortcomings, the potential impact of the work is high, because a unifying theory of cation interaction and stoichiometry of the monoamine transporter members of the SLC6 family has been missing in the literature. In addition, the elegant method of combining single cell electrophysiology and fluorescence flux measurements is impactful, especially in the whole cell recording method, allowing the control of intracellular ionic composition.

      We thank reviewer 1 for his comments on the kinetic modelling. We do not claim that the mechanism, which we propose, is unique in its ability to explain the data. However, we should like to argue that the proposed mechanism is plausible and parsimonious. We, much like reviewer 1, initially asked the question, whether a mechanism requiring an ion such as potassium to associate and subsequently dissociate from the same side of the transporter was energetically feasible. In fact, one of the main reasons for employing kinetic models was to address this specific issue.

      If detailed balance in a kinetic model is maintained (i.e., the product of the rates in the forward direction of a loop equals the product of the rates in the reverse direction), the model is energetically sound (i.e., such a model does not violate the laws of thermodynamics). It is true that for a spontaneous reaction to occur, the Gibbs free energy has to be negative. In a multistep process, however, this consideration only pertains to the “initial” and the “final” state. As long as the Gibbs free energy between these two states is negative the reaction will proceed, even if the Gibbs free energy between “intermediate” states is positive. This point is illustrated in the schemes below.
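
      The two statements in this paragraph can be illustrated with a toy calculation: checking detailed balance around a loop, and showing that an individual step can be energetically uphill while the loop as a whole remains balanced. The three-state loop and its rate constants below are hypothetical and are not taken from the DAT model; the step energy is assumed to be -RT·ln(k_forward/k_reverse).

```python
import math

R_GAS = 8.314         # gas constant, J/(mol*K)
TEMPERATURE_K = 298.15  # assumption: 25 °C


def satisfies_detailed_balance(forward_rates, reverse_rates, rel_tol=1e-9):
    """True if the product of forward rates equals the product of reverse rates."""
    return math.isclose(math.prod(forward_rates),
                        math.prod(reverse_rates),
                        rel_tol=rel_tol)


def step_free_energies(forward_rates, reverse_rates, temperature_k=TEMPERATURE_K):
    """Per-step free energy change (J/mol): negative means the step is downhill."""
    return [-R_GAS * temperature_k * math.log(kf / kr)
            for kf, kr in zip(forward_rates, reverse_rates)]


# Hypothetical loop (rates in 1/s), chosen only to illustrate the principle.
forward = [100.0, 20.0, 5.0]
reverse = [50.0, 40.0, 5.0]

balanced = satisfies_detailed_balance(forward, reverse)
energies = step_free_energies(forward, reverse)
print(balanced, [round(e) for e in energies])
```

      In this example the second step is uphill, yet the energies around the closed loop sum to zero, which is what detailed balance requires.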

      Scheme (A) maps out the Gibbs free energy of the outer loop of the kinetic model of DAT (i.e., this path describes the conformational trajectory that the transporter takes in the presence of intracellular K+; see the scheme in Fig. 7A of the manuscript). For calculating the Gibbs free energy of this loop, we assumed a pre-equilibrium condition (i.e., an extracellular and intracellular substrate concentration that we arbitrarily set to 10 μM and 100 nM, respectively) and the membrane voltage as 0 mV. As shown in the scheme, the Gibbs free energy between the “initial-left” and the “final-right” state is negative. Accordingly, the multistep reaction can proceed spontaneously.

      In scheme (B), we mapped out the Gibbs free energy for the same path and the same pre-equilibrium condition as shown in scheme (A); the only difference is that the membrane potential was now assumed to be -60 mV. This is to show that voltage is also a determining factor of the extent by which the Gibbs free energy changes.
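
      To give a sense of scale for the voltage contribution, the sketch below computes the electrical work for moving charge across the membrane, ΔG = zFV. The choice of one net positive charge per step is an assumption made purely for illustration; the response does not state how many charges cross the field in each transition, and the function name is hypothetical.

```python
FARADAY = 96485.33  # Faraday constant, C/mol


def electrical_free_energy(net_charge: float, membrane_potential_v: float) -> float:
    """Free energy (J/mol) for moving `net_charge` from outside (0 V) to inside."""
    return net_charge * FARADAY * membrane_potential_v


# Assumption: one net positive charge moved inward at -60 mV.
delta_g_60 = electrical_free_energy(+1, -0.060)
# At 0 mV, as in scheme (A), the electrical term vanishes.
delta_g_0 = electrical_free_energy(+1, 0.0)
print(f"{delta_g_60 / 1000:.2f} kJ/mol at -60 mV, {delta_g_0 / 1000:.2f} kJ/mol at 0 mV")
```

      Under this assumption, each positive charge carried inward at -60 mV lowers the free energy by about 5.8 kJ/mol relative to the 0 mV case.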

      In Scheme (C), we mapped out the Gibbs free energy at equilibrium (the difference in Gibbs free energy between the “initial” and the “final” state is zero). This condition is met when the intracellular substrate concentration is 155 μM. At this intracellular substrate concentration, the energy stored in the substrate gradient notably matches exactly the energy of the Na+ gradient. The model therefore predicts that no energy is dissipated as heat, an observation that is in contrast to the concern raised by reviewer 1. We admit that the model can be criticized on this ground, because arguably, a realistic process is expected to dissipate energy as heat even if it involves a microscopic system (as is the case here). Determination of how much heat is generated in a transport cycle is, however, beyond the scope of the present manuscript and warrants a detailed study. In such a study, one could investigate if any heat loss generated can be compensated by, for instance, the occasional antiport of K+ by DAT, which, as we point out in the discussion, is possible. In this context, we stress that the energetic costs would have been much higher, if we had assumed non-obligatory antiport of K+ through DAT. Such a mechanism predicts that the K+ gradient is constantly dissipated in the absence of the substrate, which would indeed create the futile heat loss reviewer 1 is concerned about.

      An alternative hypothesis for the actions of intracellular K+ on the DAT transport cycle would be the presence of a regulatory K+ binding site. We are reluctant to assume this mechanism for the simple reason that there is little evidence for such sites from the available crystal structures. The view that K+ binds to the Na2 site in DAT, NET and SERT is consistent with our data (see Fig. 5). These observations are supported by a previous study showing that K+ can bind to the Na2 site in DAT, as determined by extensive molecular dynamics simulations (Razavi et al., 2017, cited in the manuscript). By its very nature, the Na2 site cannot serve as a regulatory K+ binding site; for the transporter to proceed in the transport cycle, K+ must at some point dissociate from the Na2 site.

      On further scrutiny of our model for DAT, NET and SERT, we noticed that the extra- and intracellular affinities for Na+ were set too high. We regret this oversight, which arose because we had only simulated experiments in which the intracellular Na+ concentrations had been zero. The selected Na+ affinities would not have allowed the transporter to function properly at a physiological intracellular Na+ concentration (which is ~10 mM). We have now rectified this problem by lowering the inner and outer Na+ affinity by a factor of 10. In Fig. 7 of the main manuscript and supplementary figure 6, we have now replaced all previous simulations of the three transporters with the predictions of the newly amended model. As seen, the model with the changed binding parameters for Na+ could still account for the key findings of this study.

      Reviewer #2:

      Bhat et al. study transport mechanism of three members of the SLC6 family, i.e. DAT, NET and SERT, using a combination of cellular electrophysiology, fluorescence measurements - taking advantage of a fluorescent substrate (APP+) that can be transported by each of these different transporters - and kinetic modelling. They find that DAT, NET and SERT differ in intracellular K+ binding. In DAT and NET, intracellular K+ binding is transient, resulting in voltage-dependent transport. In contrast, SERT transports K+, and the addition of a charged substrate to the transport cycle makes serotonin transport voltage-independent.

      This is an extremely nice and interesting manuscript, based on a series of beautifully designed and executed experiments that are convincingly analyzed via a kinetic model. I have only some suggestions:

      1) Fig. 4: I find the description of Fig. 4 extremely difficult to understand. In clear contrast to the introductory sentence "Previous studies showed that Kin+ was antiported by SERT, but not by NET or DAT (Rudnick & Nelson, 1978; Gu et al., 1996; Erreger et al., 2008)", SERT appears to be able to transport APP+ without K+ in Fig. 4. I was trying to understand this obvious discrepancy for a long time, until I found the authors coming back to this point in the discussion: "However steady-state assessment of transporter mediated substrate uptake is hindered by the fact that all three monoamine transporters can also transport substrate in the absence of Kin+". This is a little late, and the authors should address this point more explicitly in the Results section, close to the description of Fig. 4.

      We agree with reviewer 2’s comments pertaining to the SERT data represented in Fig. 4C. The observations made from this dataset seem confusing in the absence of any relevant context. We have added the following statements to clarify any discrepancy arising from Fig. 4 (lines 266-273): “Owing to the instrumental role of Kin+ in the catalytic cycle of SERT, the observed lack of difference in APP+ uptake profiles by SERT-expressing cells in the presence or absence of Kin+ seems contradictory. This discrepancy can be explained as follows: 1) SERT can alternatively antiport protons to complete its catalytic cycle (Keyes and Rudnick, 1982; Hasenhuetl et al., 2016) and 2) APP+ is a poor SERT substrate (as determined by lack of APP+ induced steady state currents, Fig. 2F and 3F) that may be shuttled into SERT-expressing cells at rates slower than the rate limiting isomerization of SERT from inward open to outward open state.”

      2) Throughout the whole manuscript I am missing statistical details in comparisons.

      Statistical details for comparisons, which were done on some data sets in Fig. 4, Fig.5 and Fig.6, have now been incorporated in the manuscript text.

      3) Since APP+ might also only bind to the transporter or even only bind to the cell membrane, the authors might want to look at how the time course of the cellular APP+ signal depends on the size of the cells or on the ratio of transport currents and capacitance. It is of course possible that the tested cells do not differ sufficiently in size to permit such a comparison. The authors should at least comment on this possibility.

      We are working on monoclonal lines. Thus, the differences in cell size are small (between 25 and 30 pF). In the new supplementary figure 1, we show that our (previously held) conjecture that the fast component represents membrane binding was wrong. In fact, analysis of the APP+ fluorescence in control cells (supplementary figure 1D) suggests that APP+ adherence to the plasma membrane does not contribute a significant fluorescence signal. We apologize for this misinterpretation; please refer to the responses to reviewer 1 for more details.

      4) Another set of results one might look at are the time courses of fluorescence decay after the end of the APP+ perfusion (Fig. 2 and 4). Substrate (APP+) outward transport should have a voltage dependence comparable to that of substrate uptake; moreover, it should depend on the amount of substrate that entered the cell before. Could the authors provide such results and use them to exclude specific/unspecific APP+ binding?

      In supplementary figure 1 (panels A and C) and video files 1 and 2, we show that APP+ adheres to intracellular membranes of organelles. This has also been shown previously by others (Solis Jr. et al., 2012; Karpowicz Jr et al., 2013; Wilson et al., 2014, cited in the manuscript). Because these structures serve as sinks, there is no (or only little) free APP+ available for outward transport.

      Reviewer #3:

      The sodium-coupled biogenic amine transporters DAT, NET and SERT terminate the synaptic actions of dopamine, norepinephrine and serotonin, respectively. They belong to the family of neurotransmitter:sodium symporters. These transporters have very similar sequences and this is reflected at the structural level, as judged by the similarity of the crystal structures of the outward-facing conformations of DAT and SERT. However, earlier functional studies indicated that transport by SERT is electroneutral because the charges of the sodium ions and substrate moving into the cell are compensated by the outward movement of potassium ions (or protons) to complete the transport cycle. On the other hand, DAT and NET are electrogenic. Moreover, potassium ions are not extruded by these transporters, and the Authors set out to investigate whether the electrogenicity is related to a difference in potassium handling between SERT and the two other biogenic amine transporters. This was done by analyzing the role of intracellular cations and voltage on substrate transport by the three biogenic amine transporters. This was achieved by the simultaneous recording of uptake of the fluorescent substrate APP+ and the current induced by this process under voltage-clamp conditions by single HEK293 cells expressing the transporters. The Authors found that even though uptake by NET and DAT did not require internal potassium, these transporters could actually interact with internal potassium as judged by the voltage dependence of the so-called peak current. This voltage dependence was very steep in the absence of both sodium and potassium. However, it became less steep when either of these cations was present in the internal milieu, indicating that not only sodium but also potassium could bind from the inside. The same result was obtained with SERT. However, uptake by SERT was found to be much less dependent on the membrane voltage than that by DAT and NET and was stimulated by internal potassium, consistent with the proposed electroneutrality of the former. The observations indicate that the structural similarity of the three biogenic amine transporters is also reflected in their ability to bind potassium, even though this cation can translocate to the outside only in SERT.

      Strengths:

      Development of a sophisticated technique to interrogate the mechanism of sodium coupled biogenic amine transport in single cells. Rigorous analysis of the data. Conclusions supported by the data. The methodology can be used to obtain novel insights into the mechanism of other transporters.

      Weaknesses:

      The presentation could be made more "user friendly" by explaining in more detail what is happening as we go through the data. For instance, peak and steady state currents are shown already in Figure 1, but a (too brief) explanation is only provided when describing Figure 5. A schematic in the first part of the Results would be useful. Some information on the structural background should be provided as well as a full description of the transport cycle, namely the number of sodium ions translocated per cycle and the argument why chloride remains bound to the transporter throughout the cycle. The control showing that, in contrast to potassium, lithium is inert should be performed not only for DAT, but also for the two other transporters.

      We thank Dr. Kanner for these recommendations. Regarding the role of Na+ and Cl- in the transport cycle of the monoamine transporters, we have briefly mentioned the same in the introduction as follows: “The crystal structure of both hSERT and dDAT show two bound Na+ ions. However, only one Na+ ion is thought to be released on the intracellular side in both transporters (Rudnick & Sandtner, 2019). Cl-, on the other hand, has been shown to play a modulatory role in the transport cycle of SERT and DAT, but Cl- is not essential for the transport stoichiometry (Erreger et al., 2008; Hasenhuetl et al., 2016).”

      As for the control experiments with Li+, we are very grateful to Dr. Kanner for his suggestions. En route to extending the observations, which we obtained with DAT in the presence of high intracellular Li+, to NET and SERT, we stumbled upon some unexpected results: while IV relations of peak currents with high intracellular Li+ or NMDG+ in NET were identical (similar to DAT), SERT gave us exactly the opposite profiles. IV relations of high intracellular Li+ in SERT were as shallow as those in the presence of high intracellular K+ or high intracellular Na+. This is indicative of intracellular Li+ binding to SERT, an observation not previously reported that further highlights the differences between DAT/NET and SERT in cation binding. We believe that our observations with Li+ and SERT could be expanded on in a separate story. We have accordingly changed the manuscript text in the Results and Discussion as follows:

      Results (lines 320-337):

      “Because the absence of Kin+ affected the slope of the IV-relation of the peak current, we surmised that potassium bound from the intracellular side not only to SERT but also possibly to DAT and NET. We explored this conjecture by determining the IV relation of peak currents through all three transporters in the presence of lithium (Liin+ = 163 mM) instead of Kin+. Li+ is believed to be an inert cation, because it does not support substrate translocation by SLC6 transporters. As expected, the IV relation of peak currents through DAT and NET were similar in the presence of 163 mM Liin+ to those recorded in the absence of Kin+ (cf., diamond and triangle symbol in Fig. 5J and 5K). These observations clearly indicate that Kin+ binds to both DAT and NET and rule out an alternative explanation, i.e. that the effect can be accounted for by water and monovalent cations briefly occupying a newly available space in the inner vestibule. SERT, on the other hand, shows shallow IV relations of peak currents with high Liin+ when compared to those acquired in the absence of Kin+ (cf., diamond and triangle symbol in Fig. 5L). This is indicative of Liin+ binding to SERT on the intracellular side. The exact nature of Liin+ binding to SERT has not been reported previously and warrants further investigation. The IV relations of peak currents are similar in the presence of 163 mM Kin+ (Fig. 5A-C) and of 163 mM Nain+ (Fig. 5G-I) in DAT, NET and SERT (cf. circle and square symbols in Fig. 5J-L). This is consistent with the idea that Nain+ and Kin+ bind to overlapping sites in these transporters.”

      Discussion (lines 524-527):

      “Interestingly, differences between DAT/NET and SERT are further substantiated by the ability of SERT to bind to intracellular Li+. The exact nature of this interaction is unknown and necessitates an in-depth investigation that is beyond the scope of this study.”

    1. Author Response:

      Reviewer #1:

      In this manuscript, Mouat et al. investigated the contribution of viral infection to the severity of arthritis in mice. Epstein-Barr virus (EBV) infection is associated with rheumatoid arthritis (RA). By assessing arthritis progression in type II collagen-induced arthritis (CIA) induced mice with or without latent γHV68 (murine gammaherpesvirus 68) infection, the authors showed that latent γHV68 exacerbates progression of CIA. Additionally, the profile of immune cells infiltrating the synovium was altered in γHV68-CIA subjects; these subjects presented with a Th1-skewed immune profile, which is also observed in human RA patients. Assessment of immune cells in the spleen and inguinal lymph nodes also showed that latent γHV68 infection alters the T cell response towards a pathogenic profile during CIA. Lastly, the authors showed that age-associated B cells (ABCs) are required for the effects of latent γHV68 infection on exacerbation of arthritis progression. The findings presented in the manuscript provide important insights and resources for clinical RA research.

      There are some statistical analyses that need to be updated for completeness and appropriateness of use. In addition, the authors will need to highlight that all analyses were conducted in young mice, whereas RA occurs in aged individuals.

      We appreciate the thoughtful feedback from this reviewer. In response to their suggestions, we have updated our statistical analyses throughout the manuscript. In addition, we have added information on the age of primary EBV infection and age of RA onset to clarify that our age of infection and CIA induction model the timing in humans of EBV infection during adolescence and arthritis development typically during adulthood. We thank the reviewer for their feedback which has aided in strengthening this manuscript.

      Reviewer #2:

      In this study, the authors investigate the long-appreciated but little understood link between chronic infection with Epstein-Barr virus and rheumatoid arthritis (RA). Using a collagen-induced (CI)-model of arthritis and a natural murine analog of EBV (gammaherpesvirus 68, HV68), the authors demonstrate that latent infection with HV68 exacerbates clinical progression of CI-arthritis and is associated with changes in the immune cell and cytokine profile in the spleens and joints of HV68 infected mice. The most compelling finding is that an infection can indeed exacerbate the progression of secondary diseases, and the requirement of age-associated B-cells (ABCs) to the severe disease progression. While this study addresses a timely and important question-how chronic infections affect subsequent or secondary disease progression-additional work as well as a clarification of the experimental design is encouraged to understand some of the key conclusions.

      We thank this reviewer for their helpful comments and agree that further understanding the link between chronic infections and subsequent diseases is important.

      Based on their helpful comments, we have clarified experimental approaches throughout the manuscript, such as the timing of disease induction following ACRTA-γHV68 infection, and have further explained why certain parameters were examined; these changes have improved the manuscript. We appreciate the time this reviewer took to provide us with thoughtful and helpful comments.

      Reviewer #3:

      The authors developed an in vivo model of EBV's contribution to RA that recapitulates aspects of human disease. They examined the role of age-associated B cells and find that they are critical mediators of the viral-enhancement of arthritis.

      The manuscript is written in a well-structured form that facilitates the reading and following the incremental experimental setups. The manuscript is appropriate for publication after revisions.

      Some of the statistical comparisons did not reach significance, yet the authors based several statements on them as if there were a difference (using phrases such as increased/fold change). Whether this is strong enough to support their statements is not clear.

      Overall, this report provides important insights regarding the association between latency, age-associated B cells, and the enhancement of RA in a mouse model. Whether these insights translate to RA immunology in humans remains to be investigated.

      We thank the reviewer for their comments and appreciation of our work. We have edited the text to more accurately describe the differences that we observe in support of our conclusions.

    1. Author Response:

      Reviewer #1:

      Weaknesses: The main aim of the study is to identify biomarkers that predict S/MD dengue early in the course of dengue. This requires biomarkers of which the levels change early after symptom onset. However, levels of several of the biomarkers did not change markedly between the two time points (early vs late), suggesting that the levels of these biomarkers had not yet changed on day 1-3, thereby questioning their use as 'early biomarkers'.

      Thank you, we acknowledge that the levels of some of the biomarkers are not markedly different between early and late time points. However, this does not affect the aims of the study. Firstly, the late time point may not represent the patient’s baseline, as it was within 2-3 weeks of the acute illness. Secondly, our focus was on the first 3 days of illness, in order to identify early predictors, noting that this may not represent the peak for many of the biomarkers, which would be in the critical phase. We were still able to achieve our main aim, which was to compare biomarkers on days 1-3 between patients who progressed to more severe outcomes and those who did not.

      The authors selected the biomarkers based on earlier pathophysiology studies. An alternative approach might have been to first measure a larger set of candidate biomarkers in a selection of patients and select only those biomarkers showing a clear change in the early phase.

      Thank you for your suggestion. For this study, due to the limited number of outcomes (moderate-severe events - 281 cases) and the limited volume of blood samples, we selected 10 biomarkers, as the events-per-variable should be greater than 10 and we also wanted to investigate the non-linear effects and interactions of the biomarkers [Heinze et al., Biom J 2018]. We therefore selected the most promising biomarkers systematically based on pilot data and published literature.

      Reference: Heinze G, Wallisch C, Dunkler D. Variable selection - A review and recommendations for the practicing statistician. Biom J 2018; 60(3): 431-49.
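
      The events-per-variable reasoning above can be illustrated with a short arithmetic sketch. It assumes only the figures quoted in the response (281 events, a threshold of more than 10 events per model parameter); the function name is hypothetical and the sketch does not reproduce the authors' actual model specification.

```python
def max_parameters(n_events, min_epv=10):
    """Largest number of model parameters p such that n_events / p > min_epv."""
    p = n_events // min_epv
    # Enforce the strict inequality: exactly min_epv events per parameter is not enough.
    if p > 0 and n_events <= min_epv * p:
        p -= 1
    return p

# 281 moderate-severe events, as quoted in the response above.
budget = max_parameters(281)
print(budget)
```

      Under these assumptions the budget is larger than the 10 selected biomarkers, which leaves room for the non-linear and interaction terms mentioned in the response.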

      The predictive values of many of the biomarkers were only modest or absent. In addition, some of the findings appear a bit counterintuitive. Examples include the trend of the association of IP-10 with S/MD dengue that changed from positive to negative in the global model, and the opposite trends of some of the biomarkers (e.g. IL-8, ferritin) in adults and children. The authors acknowledge the existence of differences in dengue pathology between children and adults, but could discuss the possible biological reasons in more detail. For example, why would specifically IL-8 or ferritin have an opposite effect in children and adults?

      The change in the trend of the association of IP-10 with S/MD from the single to the global model does not diminish the possibility of that biomarker being selected in the best combinations. In this study we do not try to elucidate causal pathways. Another biomarker in our model may be a mediator or confounder of IP-10 in the pathway to the outcome. This could be IL-1RA, as its association with S/MD was similar between the single and global model, and the correlation between IP-10 and IL-1RA was strong (Spearman’s rank correlation coefficient was 0.75). A change in direction after correction for another variable is often referred to as Simpson’s paradox. We have added this point to the discussion of the revised manuscript (page 14, lines 10-16).
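
      The sign change described above can be reproduced with a small synthetic sketch. It is an illustration only: it uses ordinary least squares on made-up numbers rather than the authors' regression models, and the names `marker_a`, `marker_b`, and `outcome` are hypothetical stand-ins for two strongly correlated biomarkers and a continuous outcome.

```python
def cov(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def single_slope(x, y):
    """Slope of y on x alone (the 'single' model)."""
    return cov(x, y) / cov(x, x)

def adjusted_slopes(x1, x2, y):
    """Slopes of y on x1 and x2 fitted together (the 'global' model)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Synthetic data: marker_a tracks marker_b closely, but the outcome rises
# with marker_b and falls with marker_a once marker_b is held fixed.
marker_b = [float(i) for i in range(8)]
wiggle = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
marker_a = [b + w for b, w in zip(marker_b, wiggle)]
outcome = [2.0 * b - 1.0 * a for a, b in zip(marker_a, marker_b)]

single = single_slope(marker_a, outcome)
adj_a, adj_b = adjusted_slopes(marker_a, marker_b, outcome)
print(f"single-model slope for marker_a: {single:+.2f}")
print(f"global-model slope for marker_a: {adj_a:+.2f}")
```

      In this toy setting the single-model slope for `marker_a` is positive because it mostly reflects `marker_b`, while the adjusted slope is negative, which mirrors the kind of reversal the response attributes to a correlated biomarker.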

      The opposing effect in children and adults is likely to be due to the composite endpoint of severe and moderate dengue. As shown in the analysis of severe dengue alone (figure S5, table S6), the effects of IL-8 and ferritin were similar in children and adults, which suggests these biomarkers are still associated with severe disease in all age groups and that the difference is driven by the moderate dengue group. In addition, adults with uncomplicated dengue have higher ferritin levels than children, with increasing age and chronic conditions in adults likely contributing to this. We have added this point to the discussion in the revised manuscript (page 14, lines 21-26 and page 15, lines 1-2).

      The study does not include a validation cohort. The authors conclude that their findings 'assist the development of biomarker panels for clinical use.' Can the authors put into perspective the performance of their current combined biomarker panel to rule out S/MD dengue.

      Thank you for your comment. This is a preliminary case-control study to investigate the potential combination of biomarkers associated with dengue clinical outcomes. We quantify importance by means of AIC and p-value. Another dataset without selection by outcome is needed to validate the findings in relation to predictive value. We have added to the limitations that this was not a prediction study and that the performance of the combined biomarker panel with respect to predictive value was therefore not assessed (page 16, lines 14-17).

      Overall, the authors show convincingly in a unique cohort that biomarkers can be helpful to triage dengue patients already in the first days from symptom onset. Identification of the best biomarkers for this goal, validation in other cohorts, and a better understanding of differences between children and adults are required before such panels can be introduced in daily clinical practice.

      Thank you for your comment.

      Reviewer #2:

      The main weakness is the exclusion of virological markers, such as plasma/serum viral RNA levels or NS1 antigenaemia. Indeed, previous observations have found severe dengue patients to have higher viraemia in the acute phase of illness compared to those with uncomplicated dengue. More recently, several mechanistic studies have suggested that dengue virus NS1 protein could bind endothelial cells to disrupt its integrity, leading to vascular leakage. Indeed, the authors have pointed out these findings in lines 20-25 on page to lines 1-2 on page 6. Despite these reports, it is curious that the authors have not included either viraemia or NS1 antigenaemia as possible biomarkers for severe dengue.

      Thank you, we acknowledge that plasma viremia and NS1 antigenaemia levels are important factors in dengue disease outcomes. In this study, only enrolment viremia levels were available; NS1 antigenaemia levels were not. We have previously investigated the association between viremia levels and clinical outcomes using a pooled dataset of the IDAMS international study and three other studies in Vietnam. We found that higher plasma viremia was associated with increased dengue severity [Vuong et al., Clin Infect Dis 2020]. For this study, the main aim was to investigate host biomarkers which could be combined in a multiplex test panel.

      However, as suggested, we have added information on viremia levels to table S3 (which was previously table 2) of the revised manuscript. We have also performed a sensitivity analysis including viremia levels as a potential biomarker and found that: (1) higher plasma viremia was associated with an increased risk of severe/moderate dengue in both single and global models, and (2) viremia was not selected in children but was selected fourth in adults when performing the best subset procedure. We have added this information to the Statistical analysis (page 10, lines 20-24) and Results sections (page 13, lines 17-20), and the Supplementary file (appendix 8, figure S8, tables S13-S15, pages 30-34).

      Reference: Vuong NL, Quyen NTH, Tien NTH, et al. Higher plasma viremia in the febrile phase is associated with adverse dengue outcomes irrespective of infecting serotype or host immune status: an analysis of 5642 Vietnamese cases. Clin Infect Dis 2020.

      The manuscript in its present form may favour those with a strong statistical background to fully appreciate the nuances. Clearer explanations on the statistical findings would, I think, be helpful to those without such statistical background but who would nonetheless be in positions to translate these findings into clinical practice.

      We have added more explanation in the Statistical analysis, Results and Discussion sections to clarify statistical methods used in this study and the interpretation of the results.

      Most of the cases included in this study had DENV-1 infection. The biomarkers identified in this study may thus be DENV-1 specific and may not be readily applied to triage dengue cases caused by other DENV infection.

      In our study, DENV-1 accounted for 42% of all cases. We have performed a sensitivity analysis taking into account differences between serotypes. The results showed that there was no significant difference between serotypes with respect to the association between the biomarkers and primary endpoint in both the single and global models. This suggests that the study’s results are applicable for all serotypes. This information has been added in the Statistical analysis (page 10, lines 18-20) and Results sections (page 12, lines 18-20), and the Supplementary file (appendix 5, figures S3-S4, tables S4-S5, pages 13-17).

      Reviewer #3:

      1) For general ease of readership, it would greatly help if the authors can explain the choice of the statistical method used in the data analysis and perhaps briefly explain the model and how AIC should be interpreted in the main rather than the supplementary text.

      We have clarified this in more detail in the Statistical analysis section of the revised manuscript.

      2) While this reviewer understands that the authors want to focus on host immune and inflammatory biomarkers, it would be helpful if NS1 and viremia data are also shown (at least in supplementary data) if these have been found not to correlate with disease severity.

      Thank you, please see the response to comment #1 of reviewer #2. Quantitative NS1 results were not available in this study. We have added viremia in a sensitivity analysis and the results showed that higher viremia was associated with increased risk of severe/moderate dengue, similar to our previous study [Vuong et al., Clin Infect Dis 2020]. In the best subset procedure, viremia was not selected in children and was selected fourth in adults.

      Reference: Vuong NL, Quyen NTH, Tien NTH, et al. Higher plasma viremia in the febrile phase is associated with adverse dengue outcomes irrespective of infecting serotype or host immune status: an analysis of 5642 Vietnamese cases. Clin Infect Dis 2020.

      3) It is interesting to note that some biomarkers (particularly the vascular markers) in the severe group do not return to the same baseline as mild cases at convalescence even after >20 days. Are such individuals already in a higher inflammatory state at baseline (pre-infection) as a result of underlying co-morbidities such as obesity or diabetes? Table 1 did not provide such information, but it would be interesting to show if there is any difference in health state between the 2 groups, especially for obesity.

      We have added information on obesity and diabetes to table 1 and the Results section (page 11, lines 13-14). There were 5 patients with diabetes; obesity was balanced between groups (14% in control group and 10% in S/MD group).

      4) It is rather confusing that the 2nd paragraph of discussion stated "Balancing model fit, robustness, and parsimony, we suggest the combination of five biomarkers IL-1RA, Ang-2, IL-8, ferritin, and IP-10 for children, and the combination of three biomarkers SDC-1, IL-8, and ferritin for adults to be used in practice."

      But the concluding paragraph went on to state "The best biomarker combination for children includes IL-1RA, Ang-2, IL-8, ferritin, IP-10, and SDC-1; for adults, SDC-1, IL-8, ferritin, sTREM-1, IL-1RA, IP-10, and sCD163 were selected." This should be clarified further.

      Thank you for pointing this out. The conclusion was based on the best combinations (taking into account AIC only), which consisted of 6 biomarkers for children and 7 biomarkers for adults. In the discussion, we reduced the number of biomarkers, taking into consideration not only the AIC, but also parsimony for clinical translation purposes, while keeping the model fit as good as possible (taking a difference of AIC of less than 5 compared to the best combination). We therefore suggested a combination of 5 biomarkers for children and 3 biomarkers for adults, considering these 3 factors - model fit, robustness and parsimony. We have clarified this point in the Discussion section of the revised manuscript (page 15, lines 20-25).
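
      The selection rule described above (stay within an AIC difference of less than 5 of the best combination, then prefer fewer biomarkers) can be sketched as follows. This is a minimal illustration under stated assumptions: the candidate list, the AIC values, and the marker names `m1` to `m4` are invented, the function name is hypothetical, and the sketch covers only model fit and parsimony, not the robustness assessment the authors also mention.

```python
def most_parsimonious(models, max_delta_aic=5.0):
    """Pick the smallest model whose AIC is within max_delta_aic of the best.

    models: list of (tuple_of_marker_names, aic) pairs.
    Ties in size are broken by the lower AIC.
    """
    best_aic = min(aic for _, aic in models)
    close = [(names, aic) for names, aic in models
             if aic - best_aic < max_delta_aic]
    return min(close, key=lambda m: (len(m[0]), m[1]))

# Hypothetical candidates; these AIC values are NOT from the study.
candidates = [
    (("m1", "m2", "m3", "m4"), 100.0),  # best AIC, largest panel
    (("m1", "m2", "m3"), 101.5),
    (("m1", "m2"), 103.9),              # still within 5 AIC units
    (("m1",), 112.0),                   # too far from the best fit
]

chosen_markers, chosen_aic = most_parsimonious(candidates)
print(chosen_markers, chosen_aic)
```

      With these made-up numbers the two-marker panel is chosen even though the four-marker panel has the lowest AIC, which is the same trade-off that leads from the 6- and 7-biomarker best combinations to the smaller suggested panels.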

    1. Author Response:

      Reviewer #1:

      Summary and Strength:

      Single-cell RNA sequencing is the most appropriate technique to profile unknown cell types and Koiwai et al. made good use of the suitable tool to understand the heterogeneity of shrimp hemocyte populations. The authors profiled single-cell transcriptomes of shrimp hemocytes and revealed nine subtypes of hemocytes. Each cluster recognizes several markers, and the authors found that Hem1 and Hem2 are likely immature hemocytes while Hem5 to Hem9 would play a role in immune responses. Moreover, pseudotime trajectory analysis discovered that hemocytes differentiate from a single subpopulation to four hemocyte populations, indicating active hematopoiesis in the crustacean. The authors explored cell growth- and immune-related genes in each cluster and suggested putative functions of each hemocyte subtype. Lastly, scRNA-seq results were further validated by in vivo analysis and identified biological differences between agranulocytes and granulocytes. Overall, conclusions are well-supported by data and hemocyte classifications were carefully performed. Given the importance of aquaculture in both biology and industry, this study will be an extremely useful reference for crustacean hematopoiesis and immunity. Moreover, it will be a good example and prototype for cell-type analysis in non-model organisms.

      Thank you very much for your kind review. We hope that this paper will lead to a better understanding of the immune system of shrimp and further development of aquaculture.

      Weaknesses:

      The conclusions of this paper are mostly well supported by data, but some aspects of data analysis QC and in vivo lineage validation need to be clarified.

      1) It is not a trivial task to perform genome-wide analyses of gene expression on species without sufficient reference genome/transcriptome maps. With this respect, the authors should have de novo assembled a transcriptome map with a careful curation of the resulting transfrags. One of the weaknesses of this study is the lack of proper evaluation for the assembly results. To reassure the results, the authors would need to first assess their de novo transcripts in detail and additional data QC analysis would help substantiate the validity.

      The genome sequence of the kuruma shrimp M. japonicus has only been registered, and the high-quality data have not been published yet. Therefore, we could not perform validation using the genome sequence. However, by applying the BUSCO tool to the assembled sequences, we verified the quality of the assembled genes. Line 80-82 and 634-636.

      2) The authors applied SCTransform to adjust batch effects and to integrate independent sequencing libraries. SCTransform performs well in general; however, the authors would need to present results on how batch effects were corrected along with before and after analysis. In addition, the authors would need to check if any cluster was primarily originated from a single library, which could be indicative of library-specific bias (or batch effects).

      Thank you for your suggestion. The triplicate distribution after batch correction is shown in Figure 2-figure supplement 1 and Figure 5-figure supplement 1. Line 123 (Figure 2-figure supplement 1), 244 (Figure 5-figure supplement 1) and 686-689.

      3) Hem6 cells lack specific markers and some cells in this cluster are scattered throughout the other clusters (Fig. 1 & 2). Based on the pattern, it is possible that these cells are continuous subsets of other clusters. It would be good if the authors could group these cells with Hem7 or other clusters based on transcriptomic similarities or by changing clustering resolution. Additionally, they may also be a result of doublets, and it is unclear whether doublets were removed. Hem6 cells require additional measures to fully categorize as a unique subset.

      Based on the new UMI counts, we re-did the in silico clustering and pseudotime analysis with new parameters. To handle doublets, we assumed this time that single cells have fewer than 4000 UMIs, because none of the cells had prominently high UMI counts. Line 118 (Figure 2), 237 (Figure 5), 686-689 and 710-712.

      4) The authors took advantage of FACS sorting, qRT-PCR, and microscopic observation to verify in silico analyses and defined R1 and R2 populations. While the experiments are appropriate to delineate differences between the two populations, it is not sufficient to determine agranulocytes as a premature population (Hem1-4) and granulocytes as differentiated subsets (Hem5-9). To better understand the two groups (ideally nine subtypes), additional in vivo experiments would be essential. For example, proliferation markers (BrdU or EdU) could be examined after FACS sorting R1 and R2 cells to show R1 cells (immature hemocytes) are indeed proliferating as indicated in the analyses.

      Since stable culture of shrimp hemocytes is still difficult, it is difficult to implement a BrdU assay now. We believe the advantages of our study are that single-cell analysis can be used in shrimp, that we explored marker candidates, and that we were able to provide guidelines for cell classification in the future. Of course, we are going to adapt the BrdU or EdU assay to hemocytes in the future.

      5) FACS-sorted R1 or R2 population does not look homogeneous based on the morphology and having two subgroups under nine hemocyte subtypes may not be the most appropriate way to validate the data. The better way to prove each subtype is to use in situ hybridization to validate marker gene expressions and match with morphology.

      What we want to show here is that it is very difficult to classify hemocytes morphologically, and even if we could, they are likely to be divided into only two rough groups (FACS result). As in the answer to the question above, we believe the advantage of this project is that we were able to search for marker candidates and provide guidelines for cell classification in the future. Of course, in the future, we hope to look at the function and expression of each gene. Since it is difficult to perform the in situ assay or BrdU assay in shrimp hemocytes immediately, we have removed Figure 7.

      Reviewer #2:

      In this manuscript Koiwai et al. used single cell RNA sequencing of hemocytes from the shrimp Marsupenaeus japonicus. Due to lack of complete genome information for this species, they first did a de novo assembly of transcript data from shrimp hemocytes, and then used this as reference to map the scRNA results. Based on expression of the 3000 most variable genes, and a subsequent cluster analysis, nine different subpopulations of hemocytes were identified, named as Hem1-Hem9. They used the Seurat marker tool to find in total 40 cluster specific marker transcripts for all clusters except for Hem6. Based upon the predicted markers the authors suggested Hem1 and Hem2 to be immature hemocytes. In order to determine differentiation lineages they then used known cell-cycle markers from Drosophila melanogaster and could confirm Hem1 as hemocyte precursors. While genes involved in the cell cycle could be used to identify hemocyte precursors, the authors concluded that immune related genes from the fly could not be used to determine functions or different lineages of hemocytes in the shrimp. This is an important (and known) fact, since it is often taught that the fruit fly can be used as a general model organism for invertebrate immunologists which obviously is not the case. Even among arthropods, animals are different. The authors suggest four lineages based upon a pseudo temporal analysis using the Drosophila cell-cycle genes and other proliferation-related genes. Further, they used growth factor genes and immune related genes and could nicely map these into different clusters, thereby in a way validating the nine subpopulations. This paper will provide a good framework to detect and analyze immune responses in shrimp and other crustaceans in a more detailed way.

      Strengths:

      The determination of nine classes of hemocytes will enable much more detailed studies in the future about immune responses, which so far have been performed using expression analysis in mixed cell populations. This paper will give scientists a tool to understand differential cell response upon an injury or pathogen infection. The subdivision into nine hemocyte populations is carefully done using several sets of markers and the conclusions are on the whole well supported by the data.

      Thank you for taking the time to review our paper. We hope that this paper will serve as a guideline for crustacean hemocyte research.

      Weaknesses:

      One obvious drawback of the paper is first the low number of UMIs. A total number of 2704 cells gave a median UMI as low as 718 which is very low. Especially shrimp no. 2 has an average far below 500 and should perhaps be omitted. Therefore, one question is about cell viability prior to the drop-seq analysis. The fact of this low number of UMIs should be discussed more thoroughly.

      By checking the mitochondrial-derived sequences, we ruled out the suspicion that the samples were contaminated by large numbers of dead cells. We have also succeeded in increasing the number of UMIs by changing the mapping software and adjusting the parameters. The number of UMIs is still lower than that in model organisms, but we think that it will improve once the reference genome is published in the future. We have discussed this in the manuscript. Line 87-89, 118 (Figure 2) and 716-717.

      Details about how quality control (QC) was performed would be needed, for example the cutoff values for number of UMI per cell, and also one important information showing the quality is the proportion of mitochondrial genes.

      As we answered in the above section, we checked and plotted the mitochondrial content. Since there are no set rules here, we set the per-cell parameters based on the initial distribution diagram. Line 87-89, 118 (Figure 2) and 686-689.
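
      As an illustration of the kind of per-cell QC discussed in these responses, the sketch below filters cells on total UMI count and mitochondrial fraction. It is not the authors' pipeline: the only value taken from the responses is the upper UMI bound of 4000 mentioned for doublets, while the lower UMI bound, the mitochondrial cutoff, the field names, and the example cells are all hypothetical.

```python
# Hypothetical per-cell summaries; values are illustrative, not study data.
cells = [
    {"barcode": "cell_a", "total_umi": 950, "mito_umi": 30},
    {"barcode": "cell_b", "total_umi": 5200, "mito_umi": 100},  # above the doublet cutoff
    {"barcode": "cell_c", "total_umi": 800, "mito_umi": 400},   # mostly mitochondrial reads
    {"barcode": "cell_d", "total_umi": 120, "mito_umi": 2},     # too few UMIs
]

def passes_qc(cell, min_umi=200, max_umi=4000, max_mito_fraction=0.10):
    """Keep cells with min_umi <= UMIs < max_umi and a low mitochondrial share.

    max_umi=4000 follows the doublet assumption stated in the responses;
    min_umi and max_mito_fraction are assumed values for illustration.
    """
    total = cell["total_umi"]
    if total == 0:
        return False
    mito_fraction = cell["mito_umi"] / total
    return min_umi <= total < max_umi and mito_fraction <= max_mito_fraction

kept = [c["barcode"] for c in cells if passes_qc(c)]
print(kept)
```

      In practice the thresholds would be read off the observed distributions, as the response describes, rather than fixed in advance.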

      The clustering into nine subpopulations seems solid; however, the determination of lineages based upon the pseudo time analysis with cell-cycle related genes is not that strong. The authors identify four lineages, all starting from Hem1 via Hem2-Hem3-Hem4 and then one to Hem5, another through part of Hem 6 to Hem 7, next through part of Hem 6 to Hem 8 and finally through part of Hem 6 to Hem 9. Referring to Figure 3 - supplement 3, it seems as if Hem6 could be subdivided into two clusters, one visible in B and C, while another part of Hem 6 is added in D.

      Based on the new UMI counts, we re-did the in silico clustering and pseudotime analysis with new parameters. This gave a clearer result. Line 118 (Figure 2), 237 (Figure 5), 686-689 and 710-712.

      Also, the data in figure 3 - supplement 1 showing expression of cell cycle markers do not convincingly show the lineages. Clusters Hem 3 and 4 seem to express far fewer of these markers, and at lower levels, than clusters Hem6 - Hem9.

      As a result of the new clustering and other analyses, we can now see more clearly how growth-related genes vary along the clusters (Figure 7). Line 366 (Figure 7).

      It is also clear (from figure 5 - supplement 1) that there is more than one TGase gene, and the authors would need to discuss that fact in relation to differentiation.

      Thank you for your suggestion. We have discussed the different types of TGase in the revised paper. Line 386-399, 457 (Figure 8-figure supplement 2).

      While the part to determine subpopulations is very strong, the part about FACS analysis and qRT-PCR is weaker than the other sections, and doesn't add so much information. Validation of marker genes and the relationship between clusters and morphology shown in figure 6 is not totally convincing. It seems clear that both R1 and R2 contain a mixture of different cell types even if TGase expression is a bit higher in R1. A better way to confirm the results could be to do in situ hybridization (or antibody staining) and show the cell morphology of some selected marker proteins in a mixed hemocyte population. FACS sorting is very crude and does not really separate the shrimp hemocytes in clear groups based on granularity and size. This may be because the size of hemocytes without granules varies a lot. You need cell surface markers to do a good sorting by FACS.

      We agree with your comment that in situ hybridization and antibody staining are powerful tools that could support our new findings. However, it is difficult to perform an in situ assay or to prepare antibodies for shrimp hemocytes immediately. What we want to show here is that it is very difficult to classify hemocytes morphologically, and even if we could, they are likely to be divided into only two rough groups (FACS result). As in the answer to the question above, we believe the advantage of this project is that we were able to search for marker candidates and provide guidelines for cell classification in the future. Of course, in the future, we hope to look at the function and expression of each gene.

      Another minor issue is the discussion about KPI. There are a huge number of Kazal-type proteinase inhibitors in crustaceans and it is not clear from this data if the authors discuss a specific KPI-gene, and there is a mistake in referring to reference 65 which is about a Kunitz-type inhibitor.

      Thank you for pointing this out. In the case of the kuruma shrimp, BLAST searches of the de novo assembled genes showed low (around 60%) identity to L. vannamei’s Kazal-type proteinase inhibitor, rather than to a kuruma shrimp sequence. Therefore, we could not discuss which type of KPI is involved in this study. We consider it important that further research on KPIs in the kuruma shrimp be conducted in the future. Also, as you pointed out, reference 65 was wrong, so we removed it. Line 474 (Figure 8-figure supplement 5).

      In summary, this paper is a very important contribution to crustacean immunology, and although a bit weak in lineage determination it will be of extremely high value.

      Thank you for the positive feedback. We understand that a weakness of the study is that we could not confirm the evaluation of the genes as markers or the expression of the marker genes in each cell. However, we hope that our research will serve as a basis for future research.

      Reviewer #3:

      This manuscript by Koiwai et al. described the single-cell RNA-seq analysis of shrimp hemocytes and was submitted as a Resource Paper in eLife. In this study, they identified 9 cell types in shrimp hemocytes based on their transcriptional profiles and identified markers for each subpopulation. They predicted different immune roles among these subpopulations from differentially expressed immune-related genes. They also identified cell growth factors that might play important roles in hemocyte differentiation. This study helps to understand the immune system of shrimp and may be useful for improving the control of pathogen infections. The analysis of the data and interpretation is overall good but there are also some concerns:

      Thank you for your careful peer review. We hope that this paper will be useful to other researchers in the future. We have revised the manuscript based on your comments; please review it again.

      1) The number of UMI and genes detected per cell after mapping to the in-house reference genome does not appear to be presented, and the similarities or differences between the three replicated samples are not discussed, as well as the low number of genes detected per cell (~300 in this study).

      By checking the mitochondrial-derived sequences, we ruled out the suspicion that the samples were contaminated by large numbers of dead cells. We have also succeeded in increasing the number of UMIs by changing the mapping software and adjusting the parameters. The number of UMIs is still lower than that in model organisms, but we think that it will improve once the reference genome is published in the future. We have discussed this in the manuscript. Line 87-89, 118 (Figure 2) and 686-689.

      2) The correlation between the morphology and the expression of marker genes demonstrated in Figure 6 is questionable. Cells of the same size could express totally different genes. On the other hand, cells that are different in size can express nearly identical genes. The evidence presented in this manuscript is not enough to support a correlation between cell size and gene expression. Therefore, the author would either need to provide more evidence to support this correlation, or not make such correlation.

      Yes, we agree with your comments. What we want to show here is that it is very difficult to classify hemocytes morphologically, and even if we could, they are likely to be divided into only two rough groups (FACS result). So, it is not surprising that similar cells may or may not express similar genes. However, some genes, such as the TGase or proPO genes, can be used as markers for cells (and may relate to cell size too).

      3) There are many spindle-shaped cells in Figure 6B, but none of them appeared in Figure 6C and D after sorting, and the reason for this is unclear.

      We don't have any idea why the cells were deformed either, and we think this is exactly why it is so difficult to classify hemocytes by morphologically. This reason is unknown as cell culture is also not currently possible.

      4) The hemocyte differentiation model in Figure 7 is not supported by any experimental data.

      We understand your comment. Since we could not conduct any functional studies of the marker genes, we have removed Figure 7.

    1. Author Response:

      Reviewer #1 (Public Review):

      Strengths:

      1) The model structure is appropriate for the scientific question.

      2) The paper addresses a critical feature of SARS-CoV-2 epidemiology which is its much higher prevalence in Hispanic or Latino and Black populations. In this sense, the paper has the potential to serve as a tool to enhance social justice.

      3) Generally speaking, the analysis supports the conclusions.

      Other considerations:

      1) The clean distinction between susceptibility and exposure models described in the paper is conceptually useful but is unlikely to capture reality. Rather, susceptibility to infection is likely to vary more by age whereas exposure is more likely to vary by ethnic group / race. While age cohorts are not explicitly distinguished in the model, the authors would do well to at least vary susceptibility across ethnic groups according to different age cohort structure within these groups. This would allow a more precise estimate of the true effect of variability in exposures. Alternatively, this could be mentioned as a limitation of the current model.

      We agree that this would be an important extension for future work and have indicated this in the Discussion, along with the types of data necessary to fit such models:

      “Fourth, due to data availability, we have only considered variability in exposure due to one demographic characteristic; models should ideally strive to also account for the effects of age on susceptibility and exposure within strata of race and ethnicity and other relevant demographics, such as socioeconomic status and occupation \cite{Mulberry2021-tc}. These models could be fit using representative serological studies with detailed cross-tabulated seropositivity estimates.”

      2) I appreciated that the authors maintained an agnostic stance on the actual value of HIT (across the population & within ethnic groups) based on the results of their model. If there was available data, then it might be possible to arrive at a slightly more precise estimate by fitting the model to serial incidence data (particularly sorted by ethnic group) over time in NYC & Long Island. First, this would give some sense of R_effective. Second, if successive waves were modeled, then the shift in relative incidence & CI among these groups that is predicted in Figure 3 & Sup fig 8 may be observed in the actual data (this fits anecdotally with what I have seen in several states). Third, it may (or may not) be possible to estimate values of critical model parameters such as epsilon. It would be helpful to mention this as possible future work with the model.

      Caveats about the impossibility of truly measuring HIT would still apply (due to new variants, shifting use & effectiveness of NPIs, etc.). However, as is, the estimates of possible values for HIT are so wide as to make the underlying data used to train the model almost irrelevant. This makes the potential to leverage the model for policy decisions more limited.

      We have highlighted this important limitation in the Discussion:

      “Finally, we have estimated model parameters using a single cross-sectional serosurvey. To improve estimates and the ability to distinguish between model structures, future studies should use longitudinal serosurveys or case data stratified by race and ethnicity and corrected for underreporting; the challenge will be ensuring that such data are systematically collected and made publicly available, which has been a persistent barrier to research efforts \cite{Krieger2020-ss}. Addressing these data barriers will also be key for translating these and similar models into actionable policy proposals on vaccine distribution and non-pharmaceutical interventions.”

      3) I think the range of R0 in the figures should be extended to go as low as 1. Much of the pandemic in the US has been defined by local Re that varies between 0.8 & 1.2 (likely based on shifts in the degree of social distancing). I therefore think lower HIT thresholds should be considered and it would be nice to know how the extent of assortative mixing affects estimates at these lower R_e values.

      We agree this would be of interest and have extended the range of R0 values. Figure 1 has been updated accordingly (see below); we also updated the text with new findings: “After fitting the models across a range of $\epsilon$ values, we observed that as $\epsilon$ increases, HITs and epidemic final sizes shifted higher back towards the homogeneous case (Figure \ref{fig:model2}, Figure 1-figure supplement 4); this effect was less pronounced for $R_0$ values close to 1.”

      Figure 1: Incorporating assortativity in variable exposure models results in increased HITs across a range of $R_0$ values. Variable exposure models were fitted to NYC and Long Island serosurvey data.
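      As a point of reference for the homogeneous case mentioned in this response, the classical herd immunity threshold for a well-mixed population is $1 - 1/R_0$, and it is zero when $R_0 \le 1$. The sketch below illustrates only that textbook baseline; the function name is hypothetical, and it does not reproduce the fitted heterogeneous models.

```python
def homogeneous_hit(r0: float) -> float:
    """Classical herd immunity threshold for a homogeneously mixing population."""
    if r0 <= 0:
        raise ValueError("R0 must be positive")
    # At or below R0 = 1 an epidemic cannot grow, so no immunity is required.
    return max(0.0, 1.0 - 1.0 / r0)


if __name__ == "__main__":
    for r0 in (1.0, 1.2, 2.0, 3.0):
        print(f"R0 = {r0:.1f} -> homogeneous HIT = {homogeneous_hit(r0):.3f}")
```

      The threshold shrinks toward zero as $R_0$ approaches 1, which is the region the reviewer asked to have added to the figure.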

      4) line 274: I feel like this point needs to be considered in much more detail, either with a thoughtful discussion or even with some simple additions to the model. How should these results make policy makers consider race and ethnicity when thinking about the key issues in the field right now such as vaccine allocation, masking, and new variants? I think to achieve the maximal impact, the authors should be very specific about how model results could impact policy making, and how we might lower the tragic discrepancies associated with COVID. If the model / data is insufficient for this purpose at this stage, then what type of data could be gathered that would allow more precise and targeted policy interventions?

      We have conducted additional analyses exploring the important suggestion by the reviewers that social distancing could affect these conclusions. The text and figures have been updated accordingly:

      “Finally, we assessed how robust these findings were to the impact of social distancing and other non-pharmaceutical interventions (NPIs). We modeled these mitigation measures by scaling the transmission rate by a factor $\alpha$ beginning when 5\% cumulative incidence in the population was reached. Setting the duration of distancing to be 50 days and allowing $\alpha$ to be either 0.3 or 0.6 (i.e. a 70\% or 40\% reduction in transmission rates, respectively), we assessed how the $R_0$ versus HIT and final epidemic size relationships changed. We found that the $R_0$ versus HIT relationship was similar to that in the unmitigated epidemic (Figure 1-figure supplement 5). In contrast, final epidemic sizes depended on the intensity of mitigation measures, though qualitative trends across models (e.g. increased assortativity leads to greater final sizes) remained true (Figure 1-figure supplement 6). To explore this further, we systematically varied $\alpha$ and the duration of NPIs while holding $R_0$ constant at 3. We found again that the HIT was consistent, whereas final epidemic sizes were substantially affected by the choice of mitigation parameters (Figure 1-figure supplement 7); the distribution of cumulative incidence at the point of HIT was also comparable with and without mitigation measures (Figure 2-figure supplement 8). The most stringent NPI intensities did not necessarily lead to the smallest epidemic final sizes, an idea which has been explored in studies analyzing optimal control measures \cite{Neuwirth2020-nb,Handel2007-ee}. Longitudinal changes in incidence rate ratios also were affected by NPIs, but qualitative trends in the ordering of racial and ethnic groups over time remained consistent (Figure 3-figure supplement 3).”
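      The mitigation scheme described in this passage can be sketched in a few lines. The example below is a deliberately simplified, single-group SIR model integrated with a forward Euler step, not the race- and ethnicity-structured model used in the manuscript. Only $\alpha$, the 5\% trigger, the 50-day duration, and $R_0 = 3$ come from the text; the recovery rate, initial seed, step size, and all function and parameter names are assumptions made for illustration.

```python
def final_size_with_npi(r0=3.0, alpha=1.0, npi_days=50.0, trigger=0.05,
                        gamma=0.2, i0=1e-4, dt=0.05, t_max=500.0):
    """Final epidemic size of a one-group SIR model with a temporary NPI.

    The transmission rate is multiplied by `alpha` for `npi_days` days,
    starting when cumulative incidence first reaches `trigger`.
    `gamma`, `i0`, `dt` and `t_max` are illustrative assumptions.
    """
    beta = r0 * gamma            # unmitigated transmission rate
    s, i = 1.0 - i0, i0          # susceptible and infectious fractions
    t, npi_start = 0.0, None
    while t < t_max:
        cumulative_incidence = 1.0 - s
        if npi_start is None and cumulative_incidence >= trigger:
            npi_start = t        # NPI begins at the incidence trigger
        npi_active = npi_start is not None and t < npi_start + npi_days
        rate = beta * alpha if npi_active else beta
        new_infections = rate * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        t += dt
    return 1.0 - s


if __name__ == "__main__":
    for alpha in (1.0, 0.6, 0.3):
        size = final_size_with_npi(alpha=alpha)
        print(f"alpha = {alpha:.1f} -> final size = {size:.3f}")
```

      Under these assumed parameters the toy model gives a smaller final size for $\alpha = 0.6$ than for the more stringent $\alpha = 0.3$, because the stricter intervention ends with more susceptible individuals remaining and the epidemic rebounds. This is only a qualitative illustration of the point made in the quoted text.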

      Figure 1-figure supplement 6: Final epidemic sizes versus $R_0$ in variable exposure models with mitigation measures for $\alpha = 0.3$ (top) and $\alpha = 0.6$ (bottom). NPIs were initiated when cumulative incidence reached 5\% in all models and continued for 50 days. Models were fitted to NYC and Long Island serosurvey data.

      Figure 1-figure supplement 7: Sensitivity analysis on the impact of intensity and duration of NPIs on final epidemic sizes. HIT values for the same mitigation parameters were 46.4 $\pm$ 0.5\% (range). The smallest final size, corresponding to $\alpha = 0.6$ and duration = 100, was 51\%. Census-informed assortativity models were fit to Long Island seroprevalence data. NPIs were initiated when cumulative incidence reached 5\% in all models.

      See points 1 and 2 above for examples of additional data required.

      Minor issues:

      -This is subjective but I found the words "active" and "high activity" to describe increases in contacts per day to be confusing. I would just say more contacts per day. It might help to change "contacts" to "exposure contacts" to emphasize that not all contacts are high risk.

      To clarify this, we have replaced instances of “activity level” (and similar) with “total contact rate”, indicating the total number of contacts per unit time per individual; e.g. “The estimated total contact rate ratios indicate higher contacts for minority groups such as Hispanics or Latinos and non-Hispanic Black people, which is in line with studies using cell phone mobility data \cite{Chang2020-in}; however, the magnitudes of the ratios are substantially higher than we expected given the findings from those studies.”

      We have also clarified our definition of contacts: “We define contacts to be interactions between individuals that allow for transmission of SARS-CoV-2 with some non-zero probability.”

      -The abstract has too much jargon for a generalist journal. I would avoid words like "proportionate mixing" & "assortative" which are very unique to modeling of infectious diseases unless they are first defined in very basic language.

      We have revised the abstract to convey these same concepts in a more accessible manner: “A simple model where interactions occur proportionally to contact rates reduced the HIT, but more realistic models of preferential mixing within groups increased the threshold toward the value observed in homogeneous populations.”

      -I would cite some of the STD models which have used similar matrices to capture assortative mixing.

      We have added a reference in the assortative mixing section to a review of heterogeneous STD models: “Finally, under the \textit{assortative mixing} assumption, we extended this model by partitioning a fraction $\epsilon$ of contacts to be exclusively within-group and distributed the rest of the contacts according to proportionate mixing (with $\delta_{i,j}$ being an indicator variable that is 1 when $i=j$ and 0 otherwise) \cite{Hethcote1996-bf}:”
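      The assortative mixing construction described in this quotation can be sketched as follows. The code assumes the standard preferred-mixing form, in which the probability that a contact of group $i$ is with group $j$ is $\epsilon \delta_{i,j} + (1 - \epsilon) a_j N_j / \sum_k a_k N_k$. The symbols $a_j$ (per-capita contact rate) and $N_j$ (group size), as well as the function name, are assumptions for illustration, since the manuscript's equation is not reproduced in this response.

```python
def assortative_mixing_matrix(contact_rates, group_sizes, epsilon):
    """Row-stochastic matrix of contact probabilities between groups.

    A fraction `epsilon` of each group's contacts is reserved for its own
    group; the remainder is spread by proportionate mixing, i.e. in
    proportion to each group's total contacts (rate times size).
    """
    if len(contact_rates) != len(group_sizes):
        raise ValueError("contact_rates and group_sizes must have equal length")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    weights = [a * n for a, n in zip(contact_rates, group_sizes)]
    total = sum(weights)
    proportionate = [w / total for w in weights]
    k = len(weights)
    return [
        [epsilon * (1.0 if i == j else 0.0) + (1.0 - epsilon) * proportionate[j]
         for j in range(k)]
        for i in range(k)
    ]


if __name__ == "__main__":
    # Two hypothetical groups of equal size, the second twice as active.
    for row in assortative_mixing_matrix([1.0, 2.0], [100, 100], epsilon=0.5):
        print([round(p, 3) for p in row])
```

      Setting `epsilon=0` recovers pure proportionate mixing (all rows identical), and `epsilon=1` gives fully within-group contacts (the identity matrix).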

      -Lines 164-5: very good point but I would add that members of ethnic / racial groups are more likely to be essential workers and also to live in multigenerational houses

      We have added these helpful examples into the text: “Variable susceptibility to infection across racial and ethnic groups has been less well characterized, and observed disparities in infection rates can already be largely explained by differences in mobility and exposure \cite{Chang2020-in,Zelner2020-mb,Kissler2020-nh}, likely attributable to social factors such as structural racism that have put racial and ethnic minorities in disadvantaged positions (e.g., employment as frontline workers and residence in overcrowded, multigenerational homes) \cite{Henry_Akintobi2020-ld,Thakur2020-tw,Tai2020-ok,Khazanchi2020-xu}.”

      -Line 193: "Higher than expected" -> expected by who?

      We have clarified this phrase: “The estimated total contact rate ratios indicate higher exposure contacts for minority groups such as Hispanics or Latinos and non-Hispanic Black people, which is in line with studies using cell phone mobility data \cite{Chang2020-in}; however, the magnitudes of the ratios are substantially higher than we expected given the findings from those studies.”

      -A limitation that needs further mention is the fact that race & ethnic group, while important, could be sub classified into strata that inform risk even more (such as SES, job type, etc.)

      We agree and have added this to the Discussion: “Fourth, due to data availability, we have only considered variability in exposure due to one demographic characteristic; models should ideally strive to also account for the effects of age on susceptibility and exposure within strata of race and ethnicity and other relevant demographics, such as socioeconomic status and occupation \cite{Mulberry2021-tc}. These models could be fit using representative serological studies with detailed cross-tabulated seropositivity estimates.”

      Reviewer #2 (Public Review):

      Overall I think this is a solid and interesting piece that is an important contribution to the literature on COVID-19 disparities, even if it does have some limitations. To this point, most models of SARS-CoV-2 have not included the impact of residential and occupational segregation on differential group-specific covid outcomes. So, the authors are to be commended on their rigorous and useful contribution on this valuable topic. I have a few specific questions and concerns, outlined below:

      We thank the reviewer for the supportive comments.

      1) Does the reliance on serosurvey data collected in public places imply a potential issue with left-censoring, i.e. by not capturing individuals who had died? Can the authors address how survival bias might impact their results? I imagine this could bring the seroprevalence among older people down in a way that could bias their transmission rate estimates.

      We have included this important point in the limitations section on potential serosurvey biases: “First, biases in the serosurvey sampling process can substantially affect downstream results; any conclusions drawn depend heavily on the degree to which serosurvey design and post-survey adjustments yield representative samples \cite{Clapham2020-rt}. For instance, because the serosurvey we relied on primarily sampled people at grocery stores, there is both survival bias (cumulative incidence estimates do not account for people who have died) and ascertainment bias (undersampling of at-risk populations that are more likely to self-isolate, such as the elderly) \cite{Rosenberg2020-qw,Accorsi2021-hx}. These biases could affect model estimates if, for instance, the capacity to self-isolate varies by race or ethnicity -- as suggested by associations of neighborhood-level mobility versus demographics \cite{Kishore2020-sy,Kissler2020-nh} -- leading to an overestimate of cumulative incidence and contact rates in whites.”

      2) It might be helpful to think in terms of disparities in HITs as well as disparities in contact rates, since the HIT of whites is necessarily dependent on that of Blacks. I'm not really disagreeing with the thrust of what their analysis suggests or even the factual interpretation of it. But I do think it is important to phrase some of the conclusions of the model in ways that are more directly relevant to health equity, i.e. how much infection/vaccination coverage does each group need for members of that group to benefit from indirect protection?

      We agree with this important point and indeed this was the goal, in part, of the analyses in Figure 2. We have added additional text to the Discussion highlighting this: “Projecting the epidemic forward indicated that the overall HIT was reached after cumulative incidence had increased disproportionately in minority groups, highlighting the fundamentally inequitable outcome of achieving herd immunity through infection. All of these factors underscore the fact that incorporating heterogeneity in models in a mechanism-free manner can conceal the disparities that underlie changes in epidemic final sizes and HITs. In particular, overall lower HIT and final sizes occur because certain groups suffer not only more infection than average, but more infection than under a homogeneous mixing model; incorporating heterogeneity lowers the HIT but increases it for the highest-risk groups (Figure \ref{fig:hitcomp}).”

      For vaccination, see our response to Reviewer #1 point 4.

      3) The authors rely on a modified interaction index parameterized directly from their data. It would be helpful if they could explain why they did not rely on any sources of mobility data. Are these just not broken down along the type of race/ethnicity categories that would be necessary to complete this analysis? Integrating some sort of external information on mobility would definitely strengthen the analysis.

      This is a great suggestion, but this type of data has generally not been available due to privacy concerns from disaggregating mobility data by race and ethnicity (Kishore et al., 2020). Instead, we modeled NPIs as mentioned in Reviewer #1 point 4, with the caveat that reduction in mobility was assumed to be identical across groups. We added this into the text explicitly as a limitation: “Third, we have assumed the impact of non-pharmaceutical interventions such as stay-at-home policies, closures, and the like to equally affect racial and ethnic groups. Empirical evidence suggests that during periods of lockdown, certain neighborhoods that are disproportionately wealthy and white tend to show greater declines in mobility than others \cite{Kishore2020-sy,Kissler2020-nh}. These simplifying assumptions were made to aid in illustrating the key findings of this model, but for more detailed predictive models, the extent to which activity level differences change could be evaluated using longitudinal contact survey data \cite{Feehan2020-ta}, since granular mobility data are typically not stratified by race and ethnicity due to privacy concerns \cite{Kishore2020-mg}.”

      Reviewer #3 (Public Review):

      Ma et al investigate the effect of racial and ethnic differences in SARS-CoV-2 infection risk on the herd immunity threshold of each group. Using New York City and Long Island as model settings, they construct a race/ethnicity-structured SEIR model. Differential risk between racial and ethnic groups was parameterized by fitting each model to local seroprevalence data stratified demographically. The authors find that when herd immunity is reached, cumulative incidence varies by more than two fold between ethnic groups, at approximately 75% of Hispanics or Latinos and only 30% of non-Hispanic Whites.

      This result was robust to changing assumptions about the source of racial and ethnic disparities. The authors considered differences in disease susceptibility, exposure levels, as well as a census-driven model of assortative mixing. These results show the fundamentally inequitable outcome of achieving herd immunity in an unmitigated epidemic.

      The authors have only considered an unmitigated epidemic, without any social distancing, quarantine, masking, or vaccination. If herd immunity is achieved via one of these methods, particularly vaccination, the disparities may be mitigated somewhat but still exist. This will be an important question for epidemiologists and public health officials to consider throughout the vaccine rollout.

      We thank the reviewer for the detailed and helpful summary and suggestions.

    1. Author Response:

      Reviewer #1 (Public Review):

      The gist of this work is that the simple concept of a solubility product determines a threshold for phase separation, thereby enabling buffering even in systems where phase separation is driven by heterotypic interactions. The solubility product or SP is determined by the number of complementary interaction sites and the coordination number i.e., the number of bonds one can make per site.

      The work appears to be motivated by two questions: Are concentrations buffered in systems where heterotypic interactions drive phase separation thereby negating the presence of a rigorously definable saturation concentration? This question was motivated by work from Klosin et al., showing how phase separation can enable buffering of noise in transcription. They relied on the concept of a saturation concentration. In a paper that followed a few months after, Riback et al., showed that the concept of a saturation concentration ceases to exist, as defined for systems where phase separation is driven purely by homotypic interactions. This was taken to imply that the formation of multicomponent condensates via a blend of homotypic and heterotypic interactions causes a loss of buffering capacity afforded by phase separation. The second question motivating the current work is the apparent absence of a theoretical framework for "varying threshold concentrations" in systems governed by heterotypic interactions.

      Using two flavors of simulations, the authors propose that the SP sets an upper limit on the convolution of concentrations that determine phase separation. They show this via simulations where they follow the formation of clusters formed by linear multivalent macromolecules and monitor the emergence of a bimodal distribution of clusters. In 1:1 mixtures of multivalent macromolecules they find that SP sets a threshold beyond which a bimodal distribution of clusters emerges. The authors further find that SP sets an upper limit even in systems that deviate from the 1:1 stoichiometry.

      The authors proceed to show that the SP is influenced by the valence of multivalent macromolecules. They also demonstrate that short rigid linkers can cause an arrest of phase separation through a so-called "dimer trap" reminiscent of the "magic number" postulate put forth by Wingreen and colleagues.

      Is the work significant, novel, and timely? Effectively the authors propose that the driving forces for phase separation can be distilled down to the concept of a solubility product. Given prior knowledge of the valence, coordination number, and affinities can one predict concentration thresholds for phase separation? The authors suggest that this can be gleaned from either network based simulations, which are very inexpensive, or through more elaborate simulations. They further propose that it is the solubility product that sets the threshold.

      It is worth noting that the authors are quantifying what is known in the physical literature as a percolation threshold. The seminal work of Flory and Stockmayer dating back to the 1940s showed how one can calculate a percolation threshold by taking in prior knowledge of valence, coordination numbers, and affinities whilst ignoring cooperativity. These ideas have been refined and advanced in several theoretical contributions by various labs. While none of the papers in the physical literature use the concept of a solubility product, they rely on the concept of a percolation threshold because the transition to large, system-spanning clusters is a continuous one and it is debatable if this is a bona fide phase transition. Rather it is a topological transition.

      Yes, we agree that the novelty and importance of our work rests in the application of the simple and accessible concept of solubility product, which has not been previously considered in relation to LLPS. The relationship of our analysis to the physics underlying phase diagrams is discussed in a new paragraph within the Discussion.

      As for novelty, unfortunately the authors disregard prior work that showed how linker length impacts local vs. global cooperativity in phase transitions that combine phase separation and percolation. Ref. 23 is the work in question and it is mentioned in passing, even though the contributions here are entirely a redux.

      We have eliminated the results on how molecular structural features control LLPS to fully focus our paper on the Ksp concept, as suggested by the Editor. However, in our original manuscript, we described results not just related to linker length, but also steric effects.

      The concept of a solubility product, introduced here to model / understand phase behavior of multivalent macromolecules, is an interesting and potentially appealing simple description. It might make the understanding of phase transitions more accessible, but it has problems: (a) it does not define phase separation; rather it defines percolation transitions; (b) without prior knowledge of the relevant quantities, the solubility product cannot be readily inferred, even from simulations, although one can scan parameter space to arrive at predictions regarding the apparent valence and coordination numbers. (c) the solubility product does not tell us much about properties of condensates, interfaces, or the driving forces for phase transitions that are influenced by the collective effects of interaction domains / motifs and spacers.

      Recent papers have drawn attention to the potential importance of buffering as a biological function of biomolecular condensation, and also the failure of buffering in heterotypic LLPS. We felt that the Ksp would help “rescue” the idea of buffering, as Reviewer 1 has so aptly put it below. We have refocused the paper to emphasize this. Of course, we describe this for a series of ideal systems with known valency and affinities. However, theoretical systems are always “ideal” and the deviations from ideality are what make experiments so vital. We have added a paragraph in the Discussion that relates our work to the physics of phase transitions, providing 2 citations, [13, 21], to support taking the percolation threshold as a proxy for the phase boundary. We also point out, at the end of the Discussion, how the Ksp concept might be validated experimentally and might be useful in categorizing the effective valency of molecules comprising a cellular condensate.

      Finally, as for the absence of a theoretical explanation for the apparent loss of buffering in systems with heterotypic interactions, the authors would do well to see the work of Choi et al., published in PLoS Comput. Biol. in 2019. Figure 12 in that work clearly establishes that the concentrations of A and B species in the coexisting dilute phase are set by the slopes of tie lines - the lines of constant chemical potential. These slopes are set by the relative strengths of homotypic vs. heterotypic interactions, and to zeroth order, that is the physical explanation.

      We apologize for missing this very relevant work and have now cited it several times in the paper. However, as Reviewer 1 states, Figure 12 treats the potential competition between homotypic and heterotypic interactions within a system. We did not address this in our paper. Rather, for our purposes, homotypic interactions are a special case that still fits within the solubility product framework. We do now address the relationship of tie-lines in phase diagrams to the Ksp in the Discussion paragraph mentioned above.

      Reviewer #2 (Public Review):

      This paper asks whether systems composed of more than one component (heterotypic) that undergo liquid-liquid phase separation will follow the same rules as ionic solutions. The question is motivated by (i) the behavior of homotypic solutions, where after phase separation, monomer concentrations remain fixed despite addition of new components, which is not true for heterotypic systems and (ii) the known behavior of multivalent ionic salts. This idea has not previously been tested. They show quite clearly through simulations that the solubility product, Ksp, can be used as a quantitative metric to delineate phase transition behavior in heterotypic systems. This is a valuable contribution to the understanding of phase separation in these systems, and could be impactful in analyzing experimental observables, at least in vitro, to determine the valency of interacting systems. It provides a relatively straightforward conceptual basis for observed partitioning of components into dilute and dense phases. The result seems robust and likely to be reproducible experimentally and through alternative simulation studies, particularly given its established history in quantifying the related phenomena in ionic salts.
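      To make the ionic-salt analogy in this summary concrete, the sketch below computes a solubility product as the product of free monomer concentrations, each raised to a stoichiometric exponent, as for a salt $A_mB_n$ with $K_{sp} = [A]^m[B]^n$. The exponents, species names, and function name are assumptions for illustration; the manuscript's exact definition for multivalent macromolecules is not reproduced in this response.

```python
import math


def solubility_product(free_conc, stoichiometry):
    """Product of free monomer concentrations raised to stoichiometric powers.

    free_conc:     mapping species -> free (dilute-phase) concentration
    stoichiometry: mapping species -> exponent (assumed, by analogy with salts)
    """
    if set(free_conc) != set(stoichiometry):
        raise ValueError("both mappings must cover the same species")
    product = 1.0
    for species, conc in free_conc.items():
        if conc < 0:
            raise ValueError("concentrations must be non-negative")
        product *= conc ** stoichiometry[species]
    return product


if __name__ == "__main__":
    one_to_one = {"A": 1, "B": 1}
    # Two hypothetical dilute-phase compositions with different monomer
    # concentrations but the same product.
    print(solubility_product({"A": 4.0, "B": 4.0}, one_to_one))
    print(solubility_product({"A": 2.0, "B": 8.0}, one_to_one))
    print(math.isclose(solubility_product({"A": 4.0, "B": 4.0}, one_to_one),
                       solubility_product({"A": 2.0, "B": 8.0}, one_to_one)))
```

      The two example compositions differ in each monomer concentration yet share one product, which is the sense in which a product can stay fixed while individual concentrations are not buffered.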

      A weakness is the rather qualitative comparison to experiment, which is justified by the authors based on the unknown valency of the experimental system. There is also no quantitative comparison between simulation types (spatial vs non-spatial). However, the simulations do seem sufficiently detailed to test and validate the Ksp concept.

      Strengths:

      • The paper is very focused, and uses multiple simulation 'experiments' to test the role of the Ksp in delineating the phase transition, showing good agreement for multiple systems, with both matched and distinct stoichiometries between the components. They see typical behavior at the phase transition point, where they observe the largest variability or fluctuations in the formation of the dense phase. Thus the results strongly support the conclusion that the Ksp delineates phase transitions in these 2-3 component systems.

      • A comparison is made to a recent experimental result with three components, showing qualitative agreement with an observed lack of buffering, which was unexpected at the time due to the behavior observed for homotypic systems. Here this result is now rationalized via the Ksp, which does plateau despite the monomer concentrations changing.

      • Spatial simulations probe the role of structure and flexibility in impacting phase separation, finding general agreement with previously published experimental and modeling work. These observations about flexibility and matched valency are also relatively intuitive.

      Weaknesses

      • There is no quantitative comparison between the two simulation approaches (spatial and non-spatial), which should be straightforward. By using the same composition and KD in both types of simulations and directly comparing outcomes, it would help explain when and why the spatial simulations differ from the non-spatial ones; see subsequent comments below:

      • A related methodological point: On Line 97 it states that NFSim does not allow intramolecular bonds to form, but this is not true. On one hand, they can be written out explicitly. E.g. A(a1!1, a2).B(b1!1, b2)->A(a1!1, a2!2).B(b1!1, b2!2), would form a second bond between an AB complex that already had one bond. While quite tedious, these could be enumerated, allowing for the zippering effect they see spatially, although the rates would not be bimolecular. This would still leave out intra-complex bonds between proteins without a direct link. However, based on the NFsim website, by default it does in fact allow these types of intra-complex bonds to be formed (http://michaelsneddon.net/nfsim/pages/support/support.html) see "Reactant Connectivity Enforcement". So it is not clear to me which option was used in this paper. According to what is written in the methods, no intra-complex bonds are formed, but this is not the default in NFsim and is indeed allowable.

      The reviewer misinterpreted this admittedly unclear statement: “The binding rules only allow inter-molecular binding; internal bond formation within the molecular clusters is not permitted, as NFsim cannot account for proximity of binding sites within clusters.” We did not intend this to imply that NFsim does not support intramolecular binding; rather we meant that our choice was to only allow intermolecular bond formation. We made this choice because, being non-spatial, NFsim cannot account for spatial proximity or steric effects. We have clarified this in the revised manuscript as follows: “We chose binding rules to only allow inter-molecular binding; we felt this was appropriate because NFsim cannot account for spatial proximity of binding sites or steric crowding within clusters.”

      • The spatial simulations do not show the bimodal distribution under the fixed concentrations (Fig S9). This is a significant difference from the non-spatial result. They attribute this to a 'dimer trap', but given they see the dense phase in the clamped monomer simulations, this cannot be the only explanation. What about kinetic effects, due to the differences in initial concentrations of monomers in the two simulation approaches? The rate constants are not listed anywhere. They only seem to see large clusters at fixed concentrations for the mismatched sizes (Fig S12B), where the Ksp behavior does not hold. Can they increase monomer flexibility more and start to see bimodal at fixed concentration, or change the rates and see a bimodal distribution?

      In general, there is a limited ability of a small number of molecules in the FTC simulations to form a clear bimodal distribution, whether spatial or non-spatial. This is directly demonstrated in Figure 1C, where the non-spatial simulations become increasingly bimodal as the number of molecules increases, keeping concentration constant. Because of the greater computational cost of SpringSaLaD calculations, we kept the FTC simulations in Figure 7 to 200 molecules. However, the histograms that are averaged over 50 runs obscure the clear separation that is apparent when examining molecule size distribution in individual trajectories for the FTC case. We now include these in the supporting figures as Figure 1- figure supplement 3 (NFsim) and Figure 7- figure supplement 2 (SpringSaLaD). Above Ksp, we see a consistent group of small oligomers (which is reinforced in the averaged histograms) and individual large clusters (which are smeared out in the average histograms). As Reviewer 2 noted, we were also able to convincingly demonstrate bimodality at and above Ksp with the CMC simulations, which are allowed to continue until they stochastically nucleate large clusters and take off.

      All the FTC simulations are run to steady state, so only the Kds should matter, not the rate constants, which were actually available in the input files in the Git repository; we have now included the SpringSaLaD rate constants in the manuscript as well.
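
      The steady-state argument above can be illustrated with a toy calculation. The following is a minimal sketch, assuming a single reversible A + B <-> AB reaction in a well-mixed volume; it is not the multivalent NFsim or SpringSaLaD model, and all rate constants, concentrations, and function names are hypothetical, in arbitrary units. Two rate sets with the same Kd = koff/kon approach the same steady state at different speeds.

```python
import math

def simulate_binding(kon, koff, a_total, b_total, dt=1e-3, t_end=20.0):
    """Euler-integrate d[AB]/dt = kon*[A][B] - koff*[AB] and return the final [AB]."""
    ab = 0.0
    for _ in range(int(t_end / dt)):
        a_free = a_total - ab
        b_free = b_total - ab
        ab += dt * (kon * a_free * b_free - koff * ab)
    return ab

def analytic_complex(kd, a_total, b_total):
    """Equilibrium [AB] from Kd = [A][B]/[AB] (physical root of the quadratic)."""
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0

# Hypothetical rate sets: both have Kd = koff / kon = 10, but differ 10-fold in speed.
fast = simulate_binding(kon=1.0, koff=10.0, a_total=10.0, b_total=10.0)
slow = simulate_binding(kon=0.1, koff=1.0, a_total=10.0, b_total=10.0)

print("fast rates:", fast)
print("slow rates:", slow)
print("analytic  :", analytic_complex(10.0, 10.0, 10.0))
```

      In this toy case the rate constants only set how quickly the steady state is reached, which matches the reasoning that only the Kds matter once simulations have run to steady state.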

      • Related-I am surprised that the sterically hindered monomers would not form large clusters at fixed concentration, as it looks like it is impossible for them to 'zipper' up their binding sites and become trapped in dimers. Is the distribution at fixed concentrations bimodal? The data is not shown.

      We have removed the additional spatial simulation Results for structures other than the one in Figure 7 as requested by the Editor. We hope to thoroughly explore the molecular-structural determinants of Ksp and LLPS in a subsequent paper.

      Reviewer #3 (Public Review):

      In this work, Chattaraj and colleagues utilize simulation models to study collective behaviors of molecules with multiple binding sites (multivalency). When the concentrations are low, the molecules do not bind to each other frequently, and they are called free. On the other hand, if the concentrations increase, they start to bind and eventually form a wide network of molecules connected by molecular binding. This transition can be considered as a model for liquid-liquid phase separation. Their major claim is that the solubility product, a simple product of the concentrations of the free molecules, can be used as a proxy to the phase separation threshold (known as the saturation concentration). They observed in various simulation conditions that as the total concentration of molecules increases, the solubility product first increases but eventually converges to a certain value, and the value is consistent over different simulation conditions. The value is the upper limit of the solubility product, after which the molecules start to form a molecular network.

      After establishing the model, they tested systems with different valences. Higher valency leads to reduction of the threshold (and phase separation occurs at lower concentrations). The theory was also valid for systems with non-equal valences (e.g. pentavalent A + trivalent B). They applied their models to a three-component system, and found that the results qualitatively explain the published experimental patterns. Lastly, using off-lattice coarse-grained simulations, they show that the linker flexibility and the spacing of binding sites are important determinants of the threshold, which confirms the findings from other computational and experimental works.

      The authors successfully defend their claim by using different types of simulations, and their methods to crosscheck the physical validity of their models may be useful for other simulation works. For example, the authors checked if increasing the number of molecules and reducing the system size give the same results for equal concentrations. Also, they employed two different methods (so-called FTC and CMC in the manuscript) to determine the threshold concentrations. However, the conclusions are not easily transferable to real biopolymer systems, since it is hard to determine the valences (and binding affinities) of biopolymers such as intrinsically disordered proteins.

      Our work was motivated by recent work highlighting the importance of buffering as a biological function of biomolecular condensation, but also the failure of buffering in heterotypic LLPS. We realized that Ksp offers a more general framework than buffering that encompasses complex multicomponent (heterotypic) systems. But our original manuscript was not sufficiently focused on this primary motivation and has been revised accordingly. Of course, we used simulations on ideal systems to establish this idea. We suggest at the end of the discussion that the Ksp concept may potentially be used to derive effective parameters for experimental systems.

    1. Author Response:

      Reviewer #2 (Public Review):

      The manuscript by Li et al describes the development of styrylpyridines as cell permeant fluorescent sensors of SARM1 activity. This work is significant because SARM1 activity is increased during neuron damage and SARM1 knockout mice are protected from neuronal degeneration caused by a variety of physical and chemical insults. Thus, SARM1 is a key player in neuronal degeneration and a novel therapeutic target. SARM1 is an NAD+ hydrolase that cleaves NAD+ to form nicotinamide and ADP ribose (and to a small extent cyclic ADP ribose) via a reactive oxocarbenium intermediate. Notably, this intermediate can either react with water (hydrolysis), the adenosine ring (cyclization to cADPR), or with a pyridine containing molecule in a 'base-exchange reaction'. The styrylpyridines described by Li et al exploit this base-exchange reaction; the styrylpyridines react with the intermediate to form a fluorescent product. Notably, the best probe (PC6) can be used to monitor SARM1 activity in vitro and in cells. Upon validating the utility of PC6, the authors use this compound to perform a high throughput screen of the Approved Drug Library (L1000) from TargetMol and identify nisoldipine as a hit. Further studies revealed that a minor metabolite, dehydronitrosonisoldipine (dHNN), is the true inhibitor, acting with single digit micromolar potency. The authors provide structural and proteomic data suggesting that dHNN inhibits SARM1 activity via the covalent modification of C311 which stabilizes the enzyme in the autoinhibited state.

      We thank Reviewer #2 for the positive comments and suggestions!

      Key strengths of the manuscript include the probe design and the authors demonstration that they can be used to monitor SARM1 activity in vitro in an HTS format and in cells. The identification of C311 as potential reactive cysteine that could be targeted for drug development is an important and significant insight.

      Key weaknesses include the fact that dHNN is a highly reactive molecule and the authors note that it modifies multiple sites on the protein (they mentioned 8 but MS2 spectra for only 5 are provided). As such, the compound appears to be a non-specific alkylator that will have limited utility as a SARM1 inhibitor. Additionally, no information is provided on the proteome-wide selectivity of the compound.

      Although dHNN may react with cysteines in general, our results indicate that it specifically targets Cys311. Quantification of cysteine-containing peptides of other proteins showed no dHNN modification, so we conclude that dHNN shows significant specificity for Cys311 of SARM1. Some other SH-reactive agents we tested showed little inhibition of SARM1. The evidence for Cys311 being dominant includes quantification of the intensity of the modified peptides, normalized to that of the corresponding total peptides (with or without modification), which shows that the modification is mainly on Cys311 (Figure 5—figure supplement 1). The dominant role of Cys311 is also confirmed by our mutagenesis and structural studies. Our results strongly suggest that C311 is a druggable site for designing allosteric inhibitors against SARM1 activation.

      dHNN is effective in inhibiting SARM1 activation and AxD in the low micromolar range, making it a useful inhibitor. Considering that the neuroprotective effect of NSDP, an approved drug, may well be due to dHNN, labeling it as an inhibitor of SARM1 serves to focus more attention on it.

      Revisions have been made in the Discussion.

      An additional key weakness is the lack of any mechanistic insights into how the adducts are generated. Moreover, it is not clear how the proposed sulphonamide and thiohydroxylamine adducts are formed.

      From the images presented, it is unclear whether there is sufficient 'density' in the cryoEM maps to accurately predict the sites of modification.

      Please refer to Fig. 5F, in which we show a close-up view of dHNN in the ARM domain. dHNN (purple) is linked to residue C311 and forms hydrophobic interactions with the surrounding residues E264, L268, R307, F308, and A315. The extra electron densities near residue C311 fit the shape of dHNN and are shown as grey mesh.

      Finally, the authors do not show whether the conversion of PC6 to PAD6 is stable or if PAD6 can also be hydrolyzed to form ADPR.

      PAD6 is stable and cannot be hydrolyzed by the activated SARM1, as shown in the following figure. The reactions contained 10 μM PAD6, 100 μM NMN, and either 2.65 μg/mL SARM1 or a blank as a control. The PAD6 fluorescence was monitored for one hour and did not change in either group.

    1. Author Response:

      We thank you for the careful review and the opportunity to resubmit this manuscript. We particularly acknowledge the reviewer who helped to clarify the statistical arguments and stimulated our re-analysis of all results. This re-analysis has helped to change the focus of the work to identify significantly variable (higher) familial cancer risks in several race/ethnically described minority groups in the US, which we feel has broadened the message stimulating a word change in the title.

      Reviewer #1 (Public Review):

      This is a very well written and comprehensive paper that is a valuable contribution to the literature of childhood cancers. It shows that some childhood cancers have an inherited component and the risk could be to the mother or to the siblings. Although the relative risks are significant, childhood cancer is fortunately rare and the actual risk to the siblings is small.

      Can we assume this is less than one percent? I think it would be helpful to provide some absolute risk numbers for the siblings so that parents could be reassured that the risk to other children is small.

      Response: We appreciate this comment on absolute risk. It is true that the actual risk is very small given the rarity of childhood cancers. We calculated the overall absolute risk for mothers and siblings of a proband and compared it with the general population. It now reads “Moreover, due to the rarity of childhood cancers, the absolute risk is very small, but still higher among young siblings and mothers in the current study (0.074%) compared to general population (0.023%) of the same age group” in line 316 of the Discussion section.
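
      To put the two quoted percentages on a more intuitive scale, the arithmetic can be sketched as follows. This is only an illustration using the two figures quoted in the response above; the crude ratio computed here is not one of the study's standardized incidence ratios, and the helper name is hypothetical.

```python
# Absolute risks quoted in the response above, as percentages.
RISK_RELATIVES_PCT = 0.074   # young siblings and mothers of a proband
RISK_POPULATION_PCT = 0.023  # general population of the same age group

def per_100k(pct):
    """Convert a percentage into cases per 100,000 people."""
    return pct / 100.0 * 100_000

# Crude ratio of the two absolute risks (not an SIR from the manuscript).
crude_ratio = RISK_RELATIVES_PCT / RISK_POPULATION_PCT

# Additional cases per 100,000 relatives compared with the general population.
excess_per_100k = per_100k(RISK_RELATIVES_PCT) - per_100k(RISK_POPULATION_PCT)

print(f"relatives : {per_100k(RISK_RELATIVES_PCT):.0f} per 100,000")
print(f"population: {per_100k(RISK_POPULATION_PCT):.0f} per 100,000")
print(f"crude ratio: {crude_ratio:.1f}, excess: {excess_per_100k:.0f} per 100,000")
```

      Expressed this way, the risk in relatives is roughly three times the population risk but remains well below one percent, which is the reassurance the reviewer asked for.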

      Do the authors have a suggestion on what genetic tests should be done on children with cancer? Do you have recommendations to make? I assume that the authors do not recommend screening of siblings for cancer except in rare cases. It would be useful to see what the authors recommend.

      Response: In this manuscript we do not provide clinical recommendations as we feel that is out of the scope of this research. Instead, we are making several points:

      1) That conventional US-based birth and cancer registries can be utilized to study familial-based cancer risks.

      2) That different ethnic groups appear to have different familial risks for some cancer subtypes.

      3) Early onset parental cancers can add information about familial-based risks.

      4) Second primary malignancies are enriched in families that exhibit familial risks (line 260 of the Results section).

      These characteristics will provide useful information for genetic counselors who need to advise families on their own decisions about genetic testing and family planning. At present, the genetic counseling discipline is tasked with making specific recommendations to families about screening siblings for cancer and for the presence of cancer predisposition alleles; such advice is prompted by examining the family history of cancer. Our work suggests that Latino families may have a higher risk of familial alleles in solid tumors overall, which may promote more attention to, or scrutiny of, families by ethnicity.

      Are there some sites where the risk to siblings is there but not to parents which might suggest recessive inheritance?

      Response: This is an interesting question, but there are two reasons why our study may not be adequate to assess it. First, our sample size may not be large enough. The risk of cancer in the general population is higher in children than in young adults, so the low number of cancers we see in mothers largely reflects the low risk of cancer in young adults, since we cut off our observational age at 26 (due to the extent of follow-up on our young population). There is a lack of cancer at many of the ICC-03 defined childhood cancer sites among our parents, making it impossible to estimate cancer risk in the adults. Second, childhood cancers are biologically distinct from adult cancers, so predisposition alleles that confer risk of childhood cancer may not always have any effect on young adult cancers. Additionally, the progenitor cells at risk in childhood cancer may have differentiated, leaving no cells “at risk” of transformation after adolescence, so the effect of childhood cancer predisposition alleles on those adult cancers is not a meaningful comparison. Of course, there are exceptions, such as TP53 alleles, which affect the risk of many cancer subtypes at any age.

      If the childhood cancer is rare and fatal one might not see it in the parents because of loss of reproductive fitness. Please comment.

      Response: We greatly appreciate this comment and share the concern that patients whose cancer arises from a strong genetic predisposition may not be able to reproduce (even if the patient survives). We added a comment in the Discussion section, and it now reads “Furthermore, it is likely that the low number of mothers with cancer is a result of bias against some very strong cancer predisposition alleles, so the patients could not survive long enough or be healthy enough to reproduce” on line 408.

      Should we assume that the higher risks for Latino children are purely due to genetic influences? Could there be environmental factors at play as well?

      Response: We appreciate this comment and totally agree that environmental factors also play a role. Not only genetic factors, but also the environmental factors, and the interaction between genetic and environmental factors would contribute to the variation in relative risks. We have addressed this point in lines 341 (“This familial concordance is likely due to both shared genetic and environmental…”) and 419 (“Second, the comparative attributable fraction of familial risk based on environmental risk factors interacting…”) of the discussion section. We believe that this point should stimulate further research, and we are constructing our own future studies to explore environmental factors along with genetics.

      Reviewer #2 (Public Review):

      [...] Although the authors comment that the results from the Chi-Sq test are not consistent with the specific group SIRs and 95%CIs, they do not explain how these results can be so different.

      I am concerned that there is either an error in the calculations or an error in the assumptions. It is not acceptable to have such contradictory results between the two distinct methods.

      For example, for hematological cancers the 95% CI for Latinos is entirely contained within the 95%CI for Non-Latino white, while this gives a p less than 0.05. The authors need to explore why these methods are giving very different answers and be clear that the low p-values are not simply an artifact of poor assumptions.

      Response: We sincerely appreciate the comments from Reviewer 2, and we want to thank Reviewer 2 for pushing on the inconsistency between the confidence intervals and the p-values comparing the SIRs between race/ethnic groups. While overlapping CIs do not necessarily indicate a lack of significance in the effect sizes, the apparent contrast in these statistical measures was too extreme to be believable, and indeed there was an error.

      We reconstructed our data from scratch and recalculated all statistical comparisons with our statistician, Dr. W. J. Gauderman, and found a recurrent mistake in the calculation of the p-value comparing the SIRs between race/ethnic groups. We have corrected this mistake throughout the manuscript. Please refer to the new Figures 1 and 3 and the supplementary materials for the corrected numbers. The p-values are now somewhat attenuated, and significant differences between Latinos and NL whites persist for solid tumors. In addition, Asians have significantly increased familial risk for hematologic cancers, and non-Latino Blacks have significantly increased risk of solid tumors when compared to non-Latino whites. Because of this broader enhanced risk evident in minority groups (with the corrected statistical comparisons), the focus of the manuscript was changed slightly to emphasize higher risks among minority groups in the respective hematologic and solid tumor categories. There were also SIR differences suggested between many individual types of cancer, although these did not reach formal statistical significance.
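
      The relationship between overlapping confidence intervals and a p-value for the difference can be checked with a small sketch. This is an illustration only: it assumes a simple log-scale Wald approximation (standard error of log SIR taken as 1/sqrt(observed)), which is not necessarily the method used in the manuscript, and the observed and expected counts are hypothetical, not study data.

```python
import math

def sir_with_ci(observed, expected, z=1.96):
    """Return (SIR, lower, upper) using a Wald interval on the log scale."""
    sir = observed / expected
    se_log = 1.0 / math.sqrt(observed)
    return sir, sir * math.exp(-z * se_log), sir * math.exp(z * se_log)

def compare_sirs(o1, e1, o2, e2):
    """Two-sided Wald test for the ratio of two independent SIRs."""
    log_ratio = math.log(o1 / e1) - math.log(o2 / e2)
    se = math.sqrt(1.0 / o1 + 1.0 / o2)
    z_stat = log_ratio / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z_stat, p_value

# Hypothetical counts for two groups (not taken from the manuscript).
sir_a, lo_a, hi_a = sir_with_ci(observed=100, expected=50)
sir_b, lo_b, hi_b = sir_with_ci(observed=100, expected=70)
z_stat, p_value = compare_sirs(100, 50, 100, 70)

print(f"group A: SIR={sir_a:.2f} ({lo_a:.2f}-{hi_a:.2f})")
print(f"group B: SIR={sir_b:.2f} ({lo_b:.2f}-{hi_b:.2f})")
print(f"intervals overlap: {lo_a < hi_b and lo_b < hi_a}")
print(f"z={z_stat:.2f}, p={p_value:.4f}")
```

      With these made-up counts the two intervals overlap partially while the test on the ratio is still below 0.05. Under the same approximation, however, an interval lying entirely inside the other cannot coincide with p < 0.05, because the two point estimates then differ by less than 1.96 standard errors of the wider estimate; this is consistent with the reviewer's suspicion that the original contrast reflected an error.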

    1. Author Response:

      Reviewer #1 (Public Review):

      This Research Advance builds on the findings of this group's 2019 eLife paper which showed that conserved acidic and basic helices associate to enable heteropolymer formation by Snf7 and Vps24. This work provides some general structure/sequence relationships among the homologous ESCRT-III proteins that will be of interest to those in the ESCRT field. While there are no new mechanistic principles obtained from this study, the data allow the authors to propose a model of the minimal or core units needed for ESCRT-III membrane remodeling.

      The focus is largely on similarities and differences between the closely related Vps24 and Vps2, where they show that a few key point mutations or chimeric swaps (for Vps4 binding by the C-terminal region of Vps2) can exchange their functions. The last portion of the paper further tests similarities within the subgroups of ESCRT-III proteins to experimentally test functional groupings defined by sequence relationships.

      We thank the reviewer for their generous comments. We’d like to emphasize that one of the main aims of this study is to generate a minimal ESCRT-III system that is functional. We study Vps24 and Vps2 to generate a model ESCRT-III module with their specific properties. We previously engineered Snf7 to replace Vps20 (and other ESCRT components, eLife 2016). In this paper, we also extend some of the analysis to other ESCRT-III components. We agree that the current manuscript combines previously described mechanisms to understand the minimal ESCRT-III system, and it gives us a direction for understanding why in some cases (for example, archaeal systems) there may be only two ESCRT-III subunits. This work, following up on previous work from our lab and others, takes us one additional step in that direction.

      In addition, we’d also like to highlight from our work that in yeast, MVB biogenesis does have strong contributions from Did2 (CHMP1) and Vps60 (CHMP5), but not from Ist1 (IST1) and Chm7 (CHMP7) (Fig. 5). These have previously been under-emphasized in the literature.

      Reviewer #2 (Public Review):

      The manuscript by Emr and colleagues addresses the important question of how core ESCRT-III members Vps2 and Vps24 interact to form functional polymers using protein engineering and genetic selection approaches.

      Major findings are:

      Vps2 overexpression can functionally replace Vps24 in MVB sorting.

      Helix 1 mutations N21K, T28A and E31K in Vps2 were identified to be sufficient for suppression, leading to the conclusion that Vps2, when overexpressed, can replace the function of Vps24 and Vps2.

      Vps24 over expression does not rescue delta Vps2. The authors propose that this is due to the lack of the MIM and helix5 binding sites for Vps4 present in Vps2.

      Vps24 E114K mutation was identified to rescue deltaVps2 upon over expression and even better as a Vps24/Vps2 chimera suggesting that auto-activated Vps24 that can recruit Vps4 can functionally replace Vps2.

      Analyzing the effect of single ESCRT-III deletions on Mup1 sorting confirmed Snf7, Vps20, Vps2 and Vps24 as essential for sorting.

      In summary, the manuscript provides new insight into the assembly of ESCRT-III. It confirms some redundancy of Vps2 and Vps24 and shows how Vps2 can substitute for Vps24 but not vice versa.

      We thank the reviewer for this summary of our work. One point we’d like to emphasize is that while we agree that Snf7, Vps20, Vps2 and Vps24 form a minimal core subunit to form MVBs, there are important functions of other ESCRT-III molecules Did2 and Vps60 (Figure 5 and supplement) for MVB biogenesis.

      Comments:

      The three minimal principles for ESCRT-III assembly stated in the abstract are not novel. Spiral formation of ESCRT-III has been described before for yeast Vps2-Vps24 as well as its mammalian homologues. The requirement for VPS4 recruitment is also well documented and finally, the manuscript does not provide proof for lateral association of the spirals via hetero-polymerization.

      We agree with the first two comments about spiral formation and Vps4 recruitment. We’d like to emphasize that the lateral association through a heteropolymerization mechanism extends from our previous work (eLife 2019) and is supported in this work through mutational analysis of Vps2’s helix-1 motif. In our previous work, we provided evidence of the association of Snf7’s helix-4 region with Vps24’s helix-1 region, and also of lateral association of Snf7 and Vps24/Vps2 with in vitro assays. In the previous work, we didn’t characterize the Vps2-Snf7 interaction, which we examine further in this work. We find that charge-inversion mutations in Vps2 increase its affinity for Snf7, and that this effect is sufficient to replace Vps24. We believe that these analyses strengthen our model and also enhance our knowledge of ESCRT-III polymerization. Therefore, this manuscript is a strong extension of, and advance on, our previous eLife paper, and both papers should be analyzed together.

      The authors show that 8-fold over expression is necessary to rescue Mup1 sorting to an extent of 40%. The authors hypothesize that over expression of Vps2 can rescue Vps24 deletion because Vps2 may have a lower affinity for Snf7 than Vps24. This is in agreement with data on mammalian homologues which showed that indeed CHMP3 binds with 10x higher affinity to CHMP4B than CHMP2A (Effantin et al, 2012). This could have been included in the discussion, since the function of yeast and mammalian core ESCRT-III proteins is most likely not different.

      We apologize for this oversight and have included an appropriate reference to this paper in the revised version.

      The authors designed several chimeric Vps24/Vps2 constructs and show that some of the Vps24 chimera including Vps2 helix 5 and the MIM are fully functional in Mup1 sorting in delta Vps24 cells, but lack the ability to functionally replace Vps2 in Vps2 delta cells. It is unclear whether the chimeras are in the closed conformation in the cytosol. It would be interesting to know whether they are activated more easily and possibly prematurely.

      With our current assays we cannot distinguish the open vs. closed conformations in solution vs. membrane for Vps24. We do not think that these chimeras are activated prematurely because they do remain functional (as highlighted by the reviewer) in vps24∆ strain.

      We’d like to thank the reviewer for pointing us to these mutants, which has encouraged us to study these and related chimeras further. To better understand the role of swapping the Vps2 helix-5 and MIM regions, we have added a couple more experiments.

      We transplanted the helix-5 and MIM regions of Vps2 onto Snf7 to ask whether this construct remains functional, and whether it can replace the function of Vps24-Vps2 (by directly recruiting Vps4).

      In this set of data, we present evidence that when incorporated into Snf7, the helix-5 and MIM motifs of Vps2 make this chimeric Snf7 dysfunctional (Fig. 3 – Supp. 3). These data are consistent with the reviewer’s interpretation that premature recruitment of Vps4 to ESCRT-III filaments is presumably dysfunctional. However, inclusion of these motifs in Vps24 most likely does not prematurely disassemble ESCRT-III filaments, hence the chimeras remain functional. Also, mere substitution of the H5 and MIM motifs into Snf7 (and therefore Vps4 binding) is not sufficient for ESCRT-III function in cells.

      The larger point behind this set of analyses is that there are additional functions of Vps24-Vps2 beyond just recruitment of the AAA+ ATPase Vps4. Since we extensively analyzed the lateral association of Vps24-Vps2 to Snf7 in our previous manuscript (Banjade et al., eLife 2019), we ascribe these additional functions to lateral polymerization of Vps24-Vps2 on the Snf7 filament.

      The authors show that Vps24 E114K can form some kind of polymers in the presence of Vps2 in vitro while no polymerization is observed for wt Vps24 at 1 µM. It would be interesting to know whether wt Vps24 polymerizes at higher concentrations in this assay.

      We don’t observe polymers with 15 µM of Vps24 and 15 µM of Vps2, as the proteins start forming amorphous assemblies. We do refer to previous studies that observed similar linear polymers of Vps24 at higher concentrations (>300 µM) and with longer incubation. We therefore believe that the ESCRT-III proteins Vps24 and Vps2 are able to form copolymers with a similar structure, and that their formation is enhanced when these “activating” mutations are included.

      While the conclusion that E114K shifts the equilibrium to the open state is plausible, there is no evidence provided that this mimics Vps2 as stated. If so, Vps24 E114K should form the same polymers as shown in figure 4 supp 1 in the absence of Vps2 and spiral formation with Snf7 should not require Vps2.

      We agree with this interpretation of the in vitro assays, and have changed the language in the manuscript accordingly. We now describe the effect of the E114K mutation as “enhancing” association with existing Vps2. We hypothesize that this enhanced association with Vps2 occurs due to an “activation” process whereby Vps24 adopts a higher population of an open (or semi-open) conformation, and have changed the language to reflect this interpretation. As an aside, we do note that Snf7 and Vps24 form helices at higher concentrations without Vps2, as we showed in Banjade et al., eLife 2019.

      The speculation in the results section that Vps24 may not extend its helices 2 and 3 in an activated form due to potential helix breaking Asn residues in the linker region is not backed up by data, and it would have been appropriate to indicate this in the manuscript.

      We have now moved this analysis to the discussion and emphasized that this is a hypothesis. We also added the following sentence when describing the data regarding the mutations in the potential helix-breaking Asn residues: “We note that these data are indicative of mutations that control the conformations of the proteins. However, further biophysical analyses will be required for definitive evidence of this conformational flexibility.”

      The proposal that Vps2-Vps24 heteropolymers are formed by interactions along helices 2 and 3 is not supported by data presented in the manuscript. The authors would need to use recombinant proteins to test their mutants in biophysical interaction studies.

      We have now moved this interpretation to the discussion. Further dissection with biochemical and biophysical assays of Vps24-Vps2 would be a future direction in this area.

      Reviewer #3 (Public Review):

      This study sought to identify essential features of ESCRT-III subunits, with a focus on the yeast proteins Vps2 and Vps24, in order to reveal the required features of both subunits. The combined genetic and biochemical studies solidified the model that essential functions of ESCRT-III polymers - spiral formation, lateral association, and binding of Vps4 - are mostly distributed between different subunits (with some redundancy) and can be engineered into a single polypeptide. This study also sheds light on the long-standing and initially surprising finding that ESCRT-dependent budding of HIV does not require CHMP3 (Vps24), presumably because the distribution of distinct functions between different ESCRT-III subunits is not absolute.

      Inspired by earlier studies, the ability of overexpression of one ESCRT-III subunit to compensate for deletion of another subunit was explored using sorting assays. The demonstration of partial rescue inspired a mutagenesis approach that identified three residues that cluster on one face of a helix that enhanced rescue, and therefore confer functionality that in wt is primarily provided in the deleted subunits, which in this case is binding to Snf7. Extension of this analysis by protein engineering further demonstrated that the essential role of recruiting the Vps4 ATPase is normally performed by Vps2 but can be transferred to Vps24 by substitution of residues near the ESCRT-III subunit C-terminus. Similarly, it is shown that sequences that alter the propensity for bending of a helix at a point where open and closed ESCRT-III subunits differ in conformation contributed to the ability of Vps24 to substitute for deletion of Vps2, presumably by conferring the ability to adopt the open, activated conformation as well as the closed conformation.

      I don't have concerns about design or technical aspects of the experimental approach.

      We appreciate the reviewer’s comments and the summary of our work.

    1. Author Response:

      Reviewer #2:

      The manuscript "Adult Stem Cell-derived Complete Lung Organoid Models Emulate Lung Disease in COVID-19" by Das and colleagues introduces a new model system of airway epithelium derived from adult lung organoids (ALO) to be utilised for the study of COVID-19-related processes. In this manuscript two main novelties are claimed: the development of a new model system which represents both proximal as distal airway epithelium and a computationally acquired gene signature that identifies SARS-CoV-2-infected individuals. While interesting data are presented, the novelty claim is questionable and the data is not always convincing.

      Strengths:

      Multiple model systems have been developed for COVID-19. The lack of a complete ex vivo system is still hampering quick development of efficient therapies. The authors in this manuscript describe a new model system which allows for both proximal and distal airway infectious studies. While their claim is not completely novel, the method used can be used in other studies for the discovery of potential new therapies against COVID-19. Moreover, their computational analyses shows the promise of bioinformatics in discovering important features in COVID-19 diseased patients which might elucidate new therapeutic targets.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are that these strengths are not directly demonstrated and their model system is not completely novel. That is, insufficient analyses are performed to fully support the key claims in the manuscript by the data presented. In particular:

      The characterisation of the adult lung organoids and their monolayers is insufficient and sometimes incorrect. Their claims are based on contradicting data which includes cell composition in the culture system. Therefore, the claim of a novel model system seems invalid and rushed. Moreover, the characterisation of a new gene signature is based on this model system which has been infected with SARS-CoV-2. The infection however is hard to interpret and therefore claims are hard to validate.

      First of all, we thank the reviewer for a very thoughtful and in-depth review that inspired us to do additional analyses to address the criticisms that we believe are not just fair and justified, but also constructive. Coming from a thought leader in the field, they also squarely point at essential areas where we needed to make improvements with additional analyses. For that, we are grateful.

      There appear to be three major critiques:

      (i) NOVELTY: The reviewer questions whether the model itself is novel, and asked how this is any different from the previously published manuscript (Lamers et al., in EMBO J (2021)40:e105912) describing lung organoids with mixed cellularity, also claimed as complete with proximal and distal components that was publicly released after ours was submitted to eLife.

      (ii) INCOMPLETE: The reviewer noted that the model system was not fully characterized to have reached that potential and the impact of culture systems on cell composition and such details were incompletely analyzed (hence, rushed and incomplete).

      (iii) CLARITY ON GENE SIGNATURE: The reviewer found the characterization of the infected organoids with the new gene signature hard to interpret.

      Overview of how we tackled these three points head on:

      (i) NOVELTY: As for what is novel in our model (i.e., ALO), and how does it compare to the model described by Lamers et al., in EMBO J, 2021, we provide metrics for which is closer to the human disease when infected with SARS-CoV-2. Figure 6H was added.

      (ii) INCOMPLETE: We agree about the ‘rushed’ aspect. We were working amidst a pandemic to race to the finish line. But during the revision we could add much more characterization data, which we hope mitigates the concerns raised by this reviewer. Three new figures (Figure 2- Figure Supplement 3-4-5) and some IF panels in main Figure 3 were added.

      (iii) GENE SIGNATURE: As for the characterization of the SARS-CoV-2 infected ALO-derived monolayers using a new gene signature, we apologize that we might not have written with sufficient clarity as to what was the source of the signature. In the revised version of the manuscript, we have now explicitly stated in the edited Figure panel 6A that the signature (166-gene viral pandemic signature, a.k.a. ViP) was derived from human clinical samples, after a comprehensive analysis of > 45,000 datasets. This paper has been accepted in eBioMedicine in April 2021, and the preprint is available in BioRxiv 2020-PMID: 32995790.

      Reviewer #3:

      The authors have developed a new culture method to expand adult lung cells in vitro as 3-D organoids. This culture system is different from previous organoid cultures which include either bronchiolar, or alveolar, lineages. Rather, the authors attempted to preserve both lineages over long-term passaging. The 3-D cultured organoids can be dissociated and re-plated as 2D monolayers, which can be either cultured immersed in medium or in air-liquid interface (ALI) conditions, exhibiting a different bias towards alveolar and airway lung cell types respectively. The 2D monolayer cultures can be infected by COVID-19 virus and showed a progressive increase in virus load, which was distinct from iPSC- derived alveolar type 2 (AT2) cell and bronchiolar epithelial cell culture control infections. Through bioinformatics analysis, the authors were able to show that their monolayer cultures acquired similar immune response features to an in vivo COVID infection dataset, indicating that this culture system may be suitable for modeling COVID infection in vitro. It is particularly interesting that the bioinformatics analyses suggested that this adult human lung organoid system, with both airway and alveolar phenotypes, showed greater resemblance to the transcriptional immune response of severely COVID-infected lungs than either cultured cell type alone. This aspect of the manuscript strongly suggests that the authors' approach of developing a mixed lung organoid model is an extremely good one.

      However, the data presented in figures 2 and 3 cast serious doubts over the long-term reproducibility of the organoid system. That individual organoids contain both airway and alveolar lineages has not yet been convincingly demonstrated (Fig 2). In addition, bulk RNAseq experiments illustrate that the overall cell composition of the cultures drifts significantly during long-term passaging (Fig 3). Due to this variability, the organoids' ability to act as a suitable model for viral infections that would be amenable to drug screening approaches is also questionable.

      We thank the reviewer for the generally positive nature of the comments. The reviewer made some key and thoughtful suggestions on how to improve the manuscript; we greatly appreciate the effort and time that went into making them. Besides the encouraging comments and the suggestions, the reviewer also raised some criticisms along the same lines as those raised by Reviewers 1 and 2. We have tried our best to address these criticisms and agree that mitigating them is essential for widespread acceptance of the model by others.

    1. Author Response:

      Reviewer #1:

      The authors sought to assess the relationship between developmental lineage and connectivity.

      This is a tour de force. It relies on detailed EM reconstructions, knowledge of complete neuroblast lineages thus correlating wiring with lineage, and through genetic manipulations of N gene function correlates developmental programs with wiring. The conclusion is important and provides a well described cellular and genetic system for linking the developmental program of a cell to its connection specificity. It provides a framework for considering how to study these questions in other regions of Drosophila and can be extended to the study of more complex mammalian systems where a similar neuroblast-lineage strategy generates different neuron types.

      There are no major weaknesses.

      This is an excellent study and, in my opinion, is ready to publish in its current form.

      We appreciate this comment!

      Reviewer #2:

      The conclusions of this paper are mostly well supported by data, however, there are several points that should be discussed further in the manuscript:

      1) The authors state that overexpression of Notchintra transforms Notch OFF neurons into Notch ON neurons. However, since this decision happens at the level of the GMC, wouldn't it be more correct to say that Notch OFF neurons were not produced and only Notch ON neurons were generated? Moreover, the authors state that the Notchintra overexpression phenotypes are due to hemilineage transformation rather than to death of Notch OFF neurons, by providing the total neuronal number in both experimental conditions using NB5-2 lineage. I think this statement is too much of a generalization when only one NB lineage has been analyzed and should be addressed in more lineages to claim this as a general mechanism. Moreover, the opposite hypothesis could have also been tested to make the argument stronger: Would depletion of Notch in GMCs make all neurons in a lineage target the ventral neuropil domain?

      We agree, and now provide cell counts for WT and Notch-intra in all four lineages (5-2, 7-1, 7-4, and 1-2) in the text. In all cases, the number of neurons in wild type and Notch-intra lineages are not significantly different, supporting the Notch OFF to Notch ON transformation. We don't say that Notch-OFF neurons are missing, because there is no loss of neurons from the lineage, but rather the neurons that would have been Notch-OFF in wild type are now duplicating the Notch-ON neurons. Regarding presenting the opposite transformation, we tried to do it with misexpressing UAS-numb, but were unable to get the expected positive control phenotype in which all five Eve+ U neurons are transformed to Eve-negative siblings (Skeath and Doe, 1998). Thus, we were not able to do lineage-specific Notch inhibition. Unfortunately, we can’t use whole embryo N or N pathway mutants, as has been done before (Skeath and Doe, 1998), because they have massive disruption in the CNS that obscures lineage specific axon phenotypes.

      2) Temporal cohorts described in this work are an approximation to neuronal temporal identity. The authors validate the correlation of early and late temporal cohorts to the expression of the temporal TFs Hb and Cas (Fig 4G). Given the resolution of the TEM dataset and the existence of specific NBs and neuronal drivers for the neurons studied, a correlation between the 4 temporal cohorts presented in this work and the 4 temporal TFs Hb, Kr, Pdm and Cas expressed by these neurons could have been possible and would have presented a more comprehensive view of the relationship between tTF expression and neurite and synapse localization. Does temporal cohort between lineages (cortex neurite length) mean expression of the same temporal TF? For example: would mid-early neurons in different lineages express the same temporal factor?

      Excellent question! We show that radial position is a proxy for temporal identity, but the precise relationship of Hb, Kr, Pdm, and Cas expressing neurons to the four radial “bins” we describe remains unknown. In fact, a graduate student is doing these experiments by generating MCFO single neuron clones in newly hatched larvae (the stage of the TEM volume) and staining with Hb, Kr, and Cas temporal transcription factors (it is impossible to do this with Pdm because neurons lose expression at stage 15). This will be many months of work and probably over a thousand MCFO+ neurons to analyze, and we feel it is beyond the scope of the paper -- although very important and very interesting! Plus, we are still limited in lab time due to University of Oregon covid restrictions.

      Since shared temporal identity between different lineages on its own does not confer shared neuronal projections, but shared temporal cohort hemilineage does: Does this mean that the expression of a given temporal TF and/or neuronal birth order does not play a role in this shared connectivity? Please clarify these ideas in the text.

      We have tried to clarify this in the text. Whereas temporal identity alone has no detectable role in generating common synapse localization or connectivity, it does have some role in the context of hemilineage identity. That is, hemilineage temporal cohorts have more shared synapse localization and connectivity than either temporal or hemilineage identity alone. See Figure 6 for synapse localization, and Figure 7 for connectivity data.

      3) Although the authors claim so, it is not convincing that the role of spatial patterning in neuronal connectivity has been assessed in this work, since the authors do not present an obvious correlation between specific connectivity features (morphology, axon or synapses localization) and the position of a given NB in the VNC. This should be clarified in the text.

      Great point! We agree that spatial patterning was not directly tested in our manuscript, thank you for pointing this out. Our claim that spatial patterning is involved is simply based on the idea that lineages (and thus hemilineages) are more related to one another than neurons from other hemilineages suggesting that the identity of the parent neuroblast plays some role. You make the excellent point that we did not look at the relationship of projections from all NBs in a “row” or “column” within the NB array. That analysis would potentially reveal a role for spatial factors in determining neuron projections. Unfortunately, we have a very limited set of neurons from any one row or column, not enough to make claims about direct relationships between row or column identity and targeting/connectivity.

      Reviewer #3:

      Specific comments:

      1) Figure 1; page 3: The authors refer to the "striking" similarity between EM reconstructions and GFP filled clones and yet there are clear differences in some of the clones in the extent and localization of arborization. This may be in part technical but almost certainly also reflects inter-individual differences in single neuron morphology. Since EM reconstructions presumably come from one animal, the use of GFP clones allows the authors to map the degree of variation between clones and it would be interesting for them to show this.

      That is an interesting point. Elegant work from Tzumin Lee and Jim Truman has shown that clones from larval neuroblasts are very similar, and our qualitative findings support this conclusion. Thus, it would be a quite minor advance for us to quantify clonal similarity in embryonic neuroblasts. Plus, since the number of neurons in a clone varies slightly, we would have to count neuron numbers per clone and only compare those with identical neuron numbers, which is possible but time-consuming. Then there are the covid restrictions which make it difficult to rapidly generate new clones to increase the number with identical neurons. All in all, we decided that the benefit of answering this question was not worth the cost of performing it, and that other experiments were a higher priority in our limited research time. We have toned down the language to remove the word “striking” in the Introduction.

      2) Figures 2 and 4; pages 3-5: Along the same lines as above, the authors make categorical statements about the mapping of arbors to dorsal and ventral regions of the nerve cord and correlate that to hemilineage identity. Again, there is clear mixing in almost all neuroblast lineages, that seems to range from 15-30% as a rough estimate, and perhaps a bit more dorsally than ventrally, which the authors do not comment on (except to say it's "mostly non-overlapping"). This is a pity because they obviously have the tools to do so quantitatively and the information is already there in their data.

      Yes, good point – there is some overlap in most lineages for both axon/dendrite targeting (Figure 2) and synapse targeting (Figure 4). We now quantify the synapse similarity and observe that hemilineage-related neurons have much greater synapse similarity than they have with their sister hemilineage. The non-overlapping relationship between hemilineages is somewhat obscured by the simple posterior view shown in Figures 2 and 4, so we add a new figure (Figure 4 – supplement 2) that shows hemilineage synapse targeting in all three axes: A/P, M/L, and D/V. This makes it possible to see the true relationship.

      3) The analysis of Notch activity in hemilineages is excellent and very interesting, as is the new tool they develop. However, the analysis lacks loss of Notch function data and where and when Notch signaling is required to segregate the connectivity space (i.e. in neurons or in precursors such as Nbs and GMCs). Is this a binary fate specification mechanism or lateral inhibition among competing neurons? What about Notch activity manipulation in single neurons? If the authors wish to draw strong conclusions about the role of Notch in segregating target space and its relation to hemilineage identity, these experiments are essential. Alternatively, drawing subtler conclusions and acknowledging these caveats would be very welcome.

      Great point about the possible role of non-canonical Notch signaling in post-mitotic neurons (PMID: 22608692). We do not have the tools to perform lineage-specific, axon-specific removal of Notch protein. In theory we could do single neuron MARCM experiments, but these are extremely difficult due to the perdurance of the Gal80 protein, which would prevent us from assaying in newly hatched larvae. We add a Discussion section addressing the unresolved issue of post-mitotic neuron Notch function: “Another point to consider is the potential role of Notch in post-mitotic neurons (Crowner et al., 2003), as our experiments generated Notch-intra misexpression in both new-born sibling neurons as well as mature post-mitotic neurons. Future work manipulating Notch levels specifically in mature post-mitotic neurons undergoing process outgrowth will be needed to identify the role of Notch in mature neurons, if any.”

      4) Figure 7; Page 7: The authors state that 75% of hemilineage neurons correlated by temporal identity are separated by 2 synapses or less, suggesting greater connectivity than expected. How are these data normalized? What is the expected connectivity between neurons that are less related along these two developmental axes?

      Thanks for the question, which helped us change the text for clarity. The quantifications in Figure 7 actually do compare connectivity between unrelated neurons. Thus, we have changed “random” to “unrelated” in the text and figure legends. Additionally, the methods for this analysis were obviously not clear enough, so we have updated them with this text below:

      Path Length Analysis:

      We computed the pairwise path length between all hemilineages as well as all sensory and motor neurons in A1 in the undirected connectivity graph. We found that neurons that are unrelated by developmental grouping had an average path length greater than that of neurons related by hemilineage. Additionally, we found that neurons related by hemilineage alone had an average path length greater than that of neurons in hemilineage-temporal cohorts. For this analysis, unrelated neurons were defined as neurons that were in the same D/V axis (i.e. dorsal to dorsal and ventral to ventral) and same hemisegment (left or right), but not in the same hemilineage. Hemilineage comparisons were neurons in the same hemilineage, but not in the same temporal cohort. Significance was determined with a two-sample KS test on the empirical distributions of pairwise path lengths.
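
      The path length comparison described above can be illustrated with a minimal sketch. This is not the authors' code: the toy graph, the neuron names, and the helper functions `build_undirected`, `hops_from`, `path_lengths`, and `ks_statistic` are all hypothetical, and only the KS statistic (the largest gap between the two empirical distributions) is computed, not a p-value.

```python
from bisect import bisect_right
from collections import deque
from itertools import combinations, product


def build_undirected(edges):
    """Adjacency sets for an undirected graph from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hops_from(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_lengths(adj, pairs):
    """Shortest path length for each pair; unreachable pairs are skipped."""
    lengths = []
    for a, b in pairs:
        d = hops_from(adj, a).get(b)
        if d is not None:
            lengths.append(d)
    return lengths


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in set(xs) | set(ys)
    )


def mean(values):
    return sum(values) / len(values)


# Hypothetical toy circuit: a chain a1-a2-a3-b1-b2-b3 with two "hemilineages".
adj = build_undirected([("a1", "a2"), ("a2", "a3"), ("a3", "b1"),
                        ("b1", "b2"), ("b2", "b3")])
hemi_a, hemi_b = ["a1", "a2", "a3"], ["b1", "b2", "b3"]

related = path_lengths(
    adj, list(combinations(hemi_a, 2)) + list(combinations(hemi_b, 2)))
unrelated = path_lengths(adj, product(hemi_a, hemi_b))

print(mean(related), mean(unrelated), ks_statistic(related, unrelated))
```

      In this toy example the related pairs sit closer together in the graph than the unrelated pairs, which is the kind of difference the two-sample test is meant to detect. A real analysis would likely use an established statistics library for the test and its p-value.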

      Independent of path length, we also calculated connectivity similarity between related neurons in Figure 8. Similarity here was defined as the cosine of the angle between the input or output vectors of each neuron. Similarity by this metric was also found to be greater for developmentally related neurons. Finally, we added this line to Figure 7 legend to clarify normalization: “Frequency corresponds to the fraction of pairwise distances observed for each group.”
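
      As a rough illustration of the two quantities mentioned above, the sketch below computes cosine similarity between connectivity vectors and converts a list of pairwise distances into fractions. The vectors and the function names `cosine_similarity` and `frequencies` are hypothetical and are not taken from the manuscript; each vector entry is assumed to be a synapse count onto one candidate partner.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length connectivity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a neuron with no connections shares nothing
    return dot / (norm_u * norm_v)


def frequencies(distances):
    """Fraction of pairwise distances observed at each value."""
    counts = Counter(distances)
    total = len(distances)
    return {d: counts[d] / total for d in sorted(counts)}


# Hypothetical output vectors: synapse counts onto four candidate partners.
neuron_a = [3, 0, 1, 0]
neuron_b = [6, 0, 2, 0]   # same partners as neuron_a, twice the synapses
neuron_c = [0, 5, 0, 2]   # entirely different partners

print(cosine_similarity(neuron_a, neuron_b))
print(cosine_similarity(neuron_a, neuron_c))
print(frequencies([1, 2, 1, 1, 2, 1]))
```

      Because cosine similarity depends only on the direction of the vectors, two neurons that target the same partners in the same proportions score as maximally similar even if one makes more synapses overall.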

      5) Figure 8; page 7 and discussion: The authors conclude that the combination between temporal identity and hemilineage identity predicts connectivity beyond what would be predicted by spatial proximity alone. This conclusion is problematic on at least two levels. First, practically what really matters for proximity is proximity during the time in development when synapses are forming between neuronal pairs, not proximity at the end in the final pattern.

      This is a good point that we need to clarify, although we note that synaptic connectivity is not a "one and done" in the embryo, but rather a continuous process that extends from the late embryo into the third larval instar ("Conserved neural circuit structure across Drosophila larval development revealed by comparative connectomics" by Gerhard, Andrade, Fetter, Cardona, and Schneider-Mizell, eLife 2017).

      Nevertheless, we now add the following additional text to the Results and to the Discussion. To the Results: “Interestingly, even neurons with the highest observed levels of overlap were not always connected (Figure 8A''). Thus, proximity alone can't explain the observed connectivity, consistent with a role for hemilineage-temporal cohorts providing increased synaptic specificity. Of course, our assays are in newly hatched larvae, and it is likely that dendritic arbors are more widely distributed during circuit establishment in the late embryo (Valdes-Aleman et al., 2021), yet only a specific region of the neuropil is targeted by larval hatching, which suggests the initial broad dendrite targeting is not sufficient to establish connectivity to many neurons contacted by these early dendrites, again arguing against a simple proximity mechanism.” To the Discussion: “Our results strongly suggest that hemilineage identity and temporal identity act combinatorially to allow small pools of neurons to target pre- and postsynapses to highly precise regions of the neuropil, thereby restricting synaptic partner choice. Yet precise neuropil targeting is not sufficient to explain connectivity, as many similarly positioned axons and dendrites fail to form connections (Figure 8C), despite active synapse addition throughout larval life (Gerhard et al., 2017).”

      Second, conceptually, opposing spatio-temporal mechanisms with proximity-based bias for connectivity makes no sense because that's exactly what spatio-temporal mechanisms achieve: getting neurons to the same space at the same time so connectivity can happen. At any rate, drawing strong conclusions about where and when neurons meet to form (or not form) synapses requires live imaging, and absent that, the authors should refrain from making such a strong statement about what their excellent correlative dataset means.

      Yes, spatiotemporal mechanisms get axons (or dendrites) to precise neuropil domains, but that does not invariably generate connectivity. What is interesting is that hemilineage-temporal cohorts share more connectivity than predicted by proximity alone. Thus, proximity is necessary but not sufficient for proper connectivity. An additional mechanism is in play, and our data suggests that is due to the neuron's hemilineage-temporal identity. We agree that our data are correlative – shared development correlates with shared connectivity – so we have moved any suggestion of possible mechanism from the Results to the Discussion. We agree this is an important change that will increase manuscript accuracy, and also provide a clear future direction for mechanistic experiments. Thanks for helping us focus the paper better.

    1. Author Response:

      Reviewer #1:

      In this manuscript the authors show that a designer exon containing a Fluorescent Protein insert can be used to edit vertebrate genes using an NHEJ based repair mechanism. The approach utilizes CRISPR to generate DSBs in intronic sequences of a target gene along with excision of a donor fragment from a co-transfected plasmid to initiate insertion of the exon cassette by ligation into the chromosome DSB.

      I like the idea here of inserting FP sequences (and other tags) into introns in this way. Focusing on the N- and C-termini for insertions has always seemed arbitrary to me. In practice these internal sites may even tolerate tag insertions better than the termini. However, this remains to be seen.

      My major reservation with this study is that the concepts here are not particularly novel. The approach is very similar to a concept already well established in gene-therapy circles of using introns as targets for inserting a super-exon preceded by a splice acceptor to correct inborn genetic lesions. The methodology employed is essentially HITI (https://www.nature.com/articles/nature20565).

      What is new is the finding that FP insertions are frequently expressed and at least partly functional as evidenced by their ability to localize to the expected intracellular structures. However, no actual functional data is provided in this study so it remains to be seen how frequently the insertion of FP exons is tolerated. It would help the study substantially to have functional information for a few insertions.

      The value and utility of this study hinges on whether insertions of this type frequently retain function. The authors speculate that "labeling at an internal site of a gene is feasible as long as the insertion does not disrupt the function of the encoded protein. Many introns reside at the junctions of functional domains because introns have evolved in part to facilitate functional domain exchanges (Kaessmann et al., 2002; Patthy, 1999)." Thus an analysis of how often intron tags are tolerated as homozygotes would be helpful for users who will worry that a potentially "quick and dirty" CRISPIE insertion might not accurately report on the function and localization of their protein of interest.

      We thank the reviewer for appreciating our idea. CRISPIE is indeed improved HITI, with the notable difference that the insertion takes place at the intronic region and that a designer intron/exon module is used. This design has a significant benefit in that INDELs in both labeled and unlabeled alleles will be unlikely to cause mutations at the levels of mRNA and proteins. CRISPIE is also different from the super-exon, which is now cited (Bednarski et al, 2016). CRISPIE does not involve the 3’ UTR and the poly A signal. This makes the donor template more standardized and smaller. Transcriptional controls embedded in endogenous introns after the editing sites can be retained in CRISPIE, but not when super-exons are used. We also achieve much higher efficiency in vivo than previous editing methods, which we feel is an important advance.
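
      The claim that junction INDELs are unlikely to alter the mRNA can be illustrated with a toy model. The sketch below is only a cartoon under strong assumptions: the sequences are made up, the functions `splice`, `genomic`, and `insert_module` are hypothetical, only small insertions at the ligation junctions are modeled, and splicing is reduced to dropping every segment labeled as an intron. Real splicing depends on intact splice sites and other signals that this model ignores.

```python
def splice(segments):
    """Toy splicing: keep exon sequences in order, drop all introns."""
    return "".join(seq for kind, seq in segments if kind == "exon")


def genomic(segments):
    """Full genomic sequence, introns included."""
    return "".join(seq for _, seq in segments)


def insert_module(segments, intron_index, cut, module,
                  scar_left="", scar_right=""):
    """Insert an (intron, exon, intron) module at position `cut` of an intron.

    scar_left / scar_right stand in for extra bases added by error-prone
    end joining at the two ligation junctions.
    """
    kind, seq = segments[intron_index]
    if kind != "intron":
        raise ValueError("the target segment must be an intron")
    acceptor_intron, exon, donor_intron = module
    edited = [
        ("intron", seq[:cut] + scar_left + acceptor_intron),
        ("exon", exon),
        ("intron", donor_intron + scar_right + seq[cut:]),
    ]
    return segments[:intron_index] + edited + segments[intron_index + 1:]


# Hypothetical gene: two exons separated by one intron.
gene = [("exon", "ATGGCC"), ("intron", "gtaagtctttcag"), ("exon", "GGATAA")]
# Hypothetical designer module; the 6-nt exon stands in for a tag and is a
# multiple of three so the downstream reading frame is kept.
module = ("tttttcag", "GTGAGC", "gtaagtat")

clean = insert_module(gene, 1, 6, module)
scarred = insert_module(gene, 1, 6, module, scar_left="acg", scar_right="t")

print(splice(clean))
print(splice(scarred))
print(genomic(clean) != genomic(scarred))
```

      In this cartoon the two edited alleles differ at the genomic level but yield the same spliced message, because the junction scars fall inside intronic sequence.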

      We now provide three different experiments to address the function of CRISPIEd β-actin and, in one experiment, the function of CRISPIEd α-tubulin 1B. One of the key functions of the cytoskeleton is to support growth. We now show that neither CRISPIE labeling of β-actin (hACTB), at two different intronic loci, nor CRISPIE labeling of α-tubulin 1B (TUBA1B) affects the growth of U2OS cells (New Experiment #1; Figure 1H, and Figure 1-figure supplement 4), suggesting that labeled β-actin and α-tubulin are functional. In addition, as suggested, we now demonstrate that cells homozygous for CRISPIE insertions are viable and able to divide (New Experiment #2; Figure 4-figure supplement 1). We also show that two important neuronal functional parameters – the mEPSC frequency and amplitude – are not altered by CRISPIE labeling of hACTB in neurons in cultured hippocampal slices (New Experiment #3; Figure 5-figure supplement 2).

      Having shown the above results, we also hope to emphasize that, although CRISPIE provides a way to perform FP tagging of endogenous protein with high efficiency and low error rates, it cannot ensure that FP-tagging itself is benign for all proteins. Numerous studies have overexpressed FP-tagged proteins, which is well documented to have side effects. The CRISPIE method empowers researchers by allowing them to tag endogenous proteins without overexpression. However, if the FP-tagging itself affects protein function, CRISPIE will not be helpful. Each FP-tagging project, whether it is based on CRISPIE or other methods, will require its own systematic characterization. We have now made this clear in the discussion (pg. 17): “… although CRISPIE enables the tagging of endogenous proteins with low error rates, it does not ensure that the tagged protein functions the same as the wild-type protein. Not all tagging is benign, and rigorous characterizations will be needed for each tagging experiment.”

      Other comments:

      1) Were homozygotes identified and were they viable in each instance?

      We now provide data showing that cells homozygous for CRISPIE insertions are viable and able to divide (New Experiment #2; Figure 4-figure supplement 1).

      2) You say: "The CRISPIE method should be broadly applicable for use with different FPs or with other functional domains, different protein targets, and different animal species." I don't know if you optimized your FP to avoid potential reverse strand splice acceptors, but some discussion of this important point should be made so that those trying to apply the approach will make sure that strong acceptors are not included accidentally in reverse oriented inserts.

      Our RT-PCR does not detect reversed inserts at the mRNA level. We now add in the Discussion that donor design needs to eliminate unintended splicing sites in the reverse orientation. We write (pg. 17): “It should also be noted that, when designing the donor template, care should be given to not create unintended splicing acceptor sites in the inverted orientation. Otherwise, inverted insertion events can cause mutations at the mRNA and protein levels.”

      3) Would your mRNA sequencing methodologies detect defective transcripts where the splice acceptor and a portion of the upstream FP exon was inserted causing a frame-shifted and misspliced mRNA? Such mRNAs would be unstable due to NMD and thus not detected readily in a PCR based approach. Thus disruption of the mRNA by partial insertion of your donor (or fragments of the other co-injected DNA) might be much more widespread than is measured here. This could be tested by recovering clones that partially inserted the donor in the forward orientation and carefully monitoring for defects in mRNA splicing of the inserted allele. Were such clones detected and how frequently?

      Our method should detect defective mRNAs, if they are not degraded. However, if defective mRNAs are quickly degraded, they are not measured in our current RT-PCR and NGS experiments, as described in Figure 2. While we cannot address this question directly, we now provide evidence that the cell growth and neuronal function after CRISPIE labeling of β-actin remain normal.

      We also thank the reviewer for suggesting the cloning approach. This proposed experiment, however, may potentially be affected if potential defective mRNAs can result in decreased cell survival/growth. Although this experiment will require time beyond the three-month revision period expected by eLife due to the length of time required to clone cells, we will keep this in mind in our future efforts.

      4) You note that in the case of vinculin the coding sequence of the last exon of hVCL was included in the insertion donor sequence, and a stop codon was introduced at the end of the mEGFP coding sequence. This is essentially the strategy for super-exon insertion into targets for gene therapy, instead of a splice donor on the C-terminus you include a stop codon. You should cite these previous studies. Inclusion of a stop codon in frame would be expected to cause NMD, did you also include transcription termination signals?

      NMD will happen if the stop codon is more than approximately 50 nucleotides upstream of any exon-junction complexes (Lewis et al, PNAS 2009). However, NMD won’t occur if it is within 50 nucleotides. For example, synaptophysin – a highly expressed neuronal protein – has its stop codon at its second to last exon within 50 nucleotides of the exon junction. The stop codon we used for labeling hVCL is also within 50 nucleotides (~20 nt) of the exon junction.
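
      The distance rule stated above can be written as a small check. This is a sketch under stated assumptions, not a predictor of real transcript stability: the exon lengths and stop positions are hypothetical, the function names are made up for illustration, the distance is measured from the end of the stop codon to the last exon-exon junction in the spliced mRNA, and the threshold is treated as a hard cutoff of 50 nucleotides even though the text describes it as approximate.

```python
NMD_THRESHOLD_NT = 50  # approximate rule of thumb, treated here as a hard cutoff


def nt_upstream_of_last_junction(exon_lengths, stop_end):
    """Distance from the end of the stop codon to the last exon-exon junction.

    Positions are counted in the spliced mRNA. A zero or negative result
    means the stop codon lies in the last exon.
    """
    last_junction = sum(exon_lengths[:-1])
    return last_junction - stop_end


def predicts_nmd(exon_lengths, stop_end, threshold=NMD_THRESHOLD_NT):
    """True if the stop codon is more than `threshold` nt upstream of the
    last exon-exon junction."""
    return nt_upstream_of_last_junction(exon_lengths, stop_end) > threshold


# Hypothetical transcript with three exons; the last junction is at 500 nt.
exons = [300, 200, 400]

print(predicts_nmd(exons, stop_end=480))  # stop 20 nt before the junction
print(predicts_nmd(exons, stop_end=350))  # stop 150 nt before the junction
print(predicts_nmd(exons, stop_end=700))  # stop in the last exon
```

      Under this rule a stop codon placed about 20 nt before the final junction, as described for the hVCL design, would not be flagged, whereas one placed well upstream would be.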

      We now cite Bednarski et al, 2016, which describes the use of super-exons in gene therapy. At the same time, we think that our approach is still different from the super-exon concept. After the stop codon, the 3’ UTR is not included. Instead, a splicing donor is included, allowing the exon to be spliced to the subsequent endogenous exon. This allows the insert to remain small for high insertion efficiency and makes it easy to produce the template (some 3’ UTRs can be several kilobase pairs in length), while utilizing the endogenous translational controls built into the native 3’ UTR.

      Reviewer #2:

      In-frame insertion of fluorescent protein tags into endogenous genes allows observation of protein localization at native expression levels, and is therefore an essential approach for quantitative cell biology. Once limited to unicellular model organisms such as yeast, endogenous gene tagging has become well-established in invertebrate model systems such as C. elegans and Drosophila since the advent of CRISPR technology in the last decade. However, a robust and widely accepted endogenous gene tagging strategy for mammalian cells has remained elusive. This is largely due to the fact that homologous recombination, the method used to create knock-ins in invertebrates, is inefficient (or sometimes doesn't work at all) in mammalian cells, especially those that do not divide rapidly.

      Several studies have attempted to bypass the need for homologous recombination by using a different method, non-homologous end joining (NHEJ) to insert GFP tags into vertebrate genomes (e.g. Auer et al. Genome Res 2014; Suzuki et al. Nature 2016; Artegiani et al. Nature Cell Biol. 2020). Such approaches can be orders of magnitude more efficient than homologous recombination, but the generated alleles require careful validation because of the error-prone nature of NHEJ.

      Here, Zhong and colleagues improve upon the existing NHEJ-based gene tagging approaches by designing synthetic exons (comprising a FP coding sequence with 5' and 3' splice sites) that can be inserted into native introns using NHEJ. The beauty of this approach is that any mutations (indels) created by the error-prone NHEJ repair mechanism are spliced out, and therefore do not affect the sequence of the encoded protein. A limitation is that tags must be inserted internally within a protein of interest and cannot be targeted to the extreme N- or C-terminus, but this limitation is clearly stated and discussed by the authors. Overall, this is a novel (to my knowledge) and powerful strategy that is likely to advance the field.

      We thank the reviewer for the very positive comments regarding our CRISPIE method.

    1. Author Response:

      Reviewer #1:

      In this manuscript Lituma et al. provide compelling evidence demonstrating the physiological role of presynaptic NMDA receptors at mossy fiber synapses. The existence of these receptors on the presynaptic site at this synapse was suggested more than 20 years ago based on morphological data, but their functional role was only shown in a single abstract since then (Alle, H., and Geiger, J. R. (2005)). The current manuscript uses a wide variety of complementary technical approaches to show how presynaptic NMDA receptors contribute to shaping neurotransmitter release at this synapse. They show that presynaptic NMDA receptors enhance short-term plasticity and contribute to presynaptic calcium rise in the terminal. The authors use immunocytochemistry, electrophysiology, two-photon calcium imaging, and uncaging to build a very solid case that these receptors play a role in synaptic communication at mossy fiber synapses. The authors' conclusions are supported by the experimental data provided.

      The study is built on a solid and logical experimental plan, the data is of high quality. However, the authors would need to provide stronger evidence to demonstrate the physiological function of these receptors. It is hard to reconcile these experimental conditions with the authors' claim in the abstract: 'Here, we report that presynaptic NMDA receptors (preNMDARs) at hippocampal mossy fiber boutons can be activated by physiologically relevant patterns of activity'. We know that extracellular calcium can have a very significant impact on neurotransmitter release and on how short-term plasticity is shaped. For this reason, it would be important to explore how the activity of these receptors at more physiological calcium concentrations contributes to calcium entry and short-term plasticity at these synapses.

      We thank the reviewer for noting our study is “built on a solid and logical experimental plan, the data is of high quality”. We agree with the reviewer that exploring the role of preNMDARs under more physiological conditions is extremely important. In response, we have performed new experiments at 35 ºC and at more physiological concentrations of 1.2 mM Ca2+ and 1.2 mM Mg2+. Our new results, now included in Figure 4-figure supplement 1, demonstrate that our conclusion that preNMDARs at mossy fiber boutons can be activated by physiologically relevant patterns of activity also holds under more physiological recording conditions.

      Reviewer #2:

      Lituma et al. examined the presence and functions of preNMDARs in dentate gyrus granule cells (GCs) in the hippocampus. The authors found that GluN1+ preNMDARs are indeed present at mossy fiber (mf) terminals with electron microscopy. With pharmacological and genetic approaches, the authors showed that preNMDARs are important in low frequency facilitation (LFF), burst-induced facilitation and information transfer at the mf-CA3 synapse. The authors further demonstrated that this preNMDAR contribution is independent of the somatodendritic compartment of the GCs. With 2-photon calcium imaging, the authors found that preNMDARs contribute to presynaptic Ca2+ transients and can be activated by local glutamate uncaging. Separately, the authors showed that GluN1+ preNMDARs might also contribute to BDNF release at mossy fiber terminals during repetitive stimulation. Lastly, non-postsynaptic NMDARs specifically mediate mf transmission onto mossy cells, similar to mf-CA3 synapses, but not onto interneurons. The authors concluded that preNMDARs mediate synapse-specific transmission originating from the GCs/mf inputs.

      Overall, the study provides compelling evidence from a battery of techniques, ranging from EM, pharmacology, genetic deletion, electrophysiology to 2-photon imaging/uncaging. The data supports a coherent story on the presence of preNMDARs at mf terminals and that preNMDARs play important roles in LFF.

      In conclusion, this study reveals how NMDA receptors can be found in unexpected locations and how they may have unconventional functions, i.e. outside the narrow textbook view that they primarily serve as coincidence detectors in Hebbian learning. This study thus helps to change the way we think about NMDA receptor functioning, so should be of broad interest.

      We appreciate the reviewer’s comments that our study provides compelling evidence for the presence and role of preNMDARs at mossy fiber terminals. We also agree with the reviewer that our study challenges the way we think about NMDA receptor function.

      Reviewer #3:

      In this manuscript Lituma and colleagues investigate a potential role for presynaptic NMDARs at hippocampal mossy fiber (MF) synapses in regulating synaptic transmission. The combined use of electron microscopy, electrophysiology, optogenetics, calcium imaging, and genetic manipulations expertly employed by the authors yields high-quality, compelling evidence that presynaptic NMDARs can participate in activity-dependent short-term facilitation of release onto postsynaptic CA3 pyramidal and mossy cell targets but not onto inhibitory interneurons. Moreover, presynaptic NMDAR activation is demonstrated to be particularly effective in promoting BDNF release from MF boutons. The investigation is well designed with a clear hypothesis, appropriate methodological considerations, and logical flow yielding results that fully support the authors' conclusions. The manuscript fills an important gap in our understanding of MF regulation by unambiguously confirming a functional role for presynaptic NMDARs that were first described anatomically at MF terminals nearly 30 years ago. Combined with a handful of other studies describing presynaptic NMDARs at various central synapses, this study expands the role of NMDARs as critical players in synaptic plasticity on both sides of the cleft.

      We very much appreciate the reviewer’s positive remarks of our study as “well designed with a clear hypothesis, appropriate methodological considerations, and logical flow”. We concur that the manuscript fills an important gap in understanding MF regulation by preNMDARs and expanding the role of NMDARs in synaptic plasticity on both sides of the cleft.

    1. Author Response:

      Reviewer #1 (Public Review):

      The work by Wang et al. examined how task-irrelevant, high-order rhythmic context could rescue the attentional blink effect via reorganizing items into different temporal chunks, as well as the neural correlates. In a series of behavioral experiments with several controls, they demonstrated that the detection performance of T2 was higher when occurring in different chunks from T1, compared to when T1 and T2 were in the same chunk. In EEG recordings, they further revealed that the chunk-related entrainment was significantly correlated with the behavioral effect, and the alpha-band power for T2 and its coupling to the low-frequency oscillation were also related to behavioral effect. They propose that the rhythmic context implements a second-order temporal structure to the first-order regularities posited in dynamic attention theory.

      Overall, I find the results interesting and convincing, particularly the behavioral part. The manuscript is clearly written and the methods are sound. My major concerns are about the neural part, i.e., whether the work provides new scientific insights to our understanding of dynamic attention and its neural underpinnings.

      1) A general concern is whether the observed behavioral related neural index, e.g., alpha-band power, cross-frequency coupling, could be simply explained in terms of ERP response for T2. For example, when the ERP response for T2 is larger for between-chunk condition compared to within-chunk condition, the alpha-power for T2 would be also larger for between-chunk condition. Likewise, this might also explain the cross-frequency coupling results. The authors should do more control analyses to address the possibility, e.g., plotting the ERP response for the two conditions and regressing them out from the oscillatory index.

      Many thanks for the comment. In short, the enhancement in alpha power and cross-frequency coupling results in the between-cycle condition compared with those in the within-cycle condition cannot be accounted for by the ERP responses for T2.

      In general, the rhythmic stimulation in the AB paradigm prevents EEG signals from returning to baseline. Therefore, we cannot observe typical ERP components purely related to individual items, except for the P1 and N1 components related to the stream onset, which reveal no difference between the two conditions and are followed by steady-state responses (SSRs) resonating at the stimulus rate (Fig. R1).

      Fig. R1. ERPs aligned to stream onset. EEG signals were filtered between 1–30 Hz, baseline-corrected (-200 to 0 ms before stream onset) and averaged across the electrodes in left parieto-occipital area where 10-Hz alpha power showed attentional modulation effect.

      To further inspect the potential differences in the target-related ERP signals between the within- and between-cycle conditions, we plotted the target-aligned waveforms for these experimental conditions. As shown in Fig. R2, a drop of ERP amplitude occurred for both conditions around T2 onset, and the difference between these two conditions was not significant (paired t-test estimated on mean amplitude every 20 ms from 0 to 700 ms relative to T1 onset, p > .05, FDR-corrected).

      Fig. R2. ERPs aligned to T1 onset. EEG signals were filtered between 1–30 Hz, and baseline-corrected using signals -100 to 0 ms before T1 onset. The two dashed lines indicate the onset of T1 and T2, respectively.
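
      As a rough illustration of what FDR correction across many time bins involves, the sketch below implements the Benjamini-Hochberg step-up procedure. This is an assumption: the response does not state which FDR procedure was used, and the p-values in the example are invented for illustration only.

```python
import numpy as np


def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean array marking which hypotheses are rejected while
    controlling the false discovery rate at level alpha.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # i/m * alpha for rank i
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank meeting its threshold
        reject[order[: k + 1]] = True              # reject all hypotheses up to that rank
    return reject


# Invented p-values standing in for one test per time bin.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(fdr_bh(example_p))
```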

      Since there is a trend toward an enhanced ERP response for the between-cycle relative to the within-cycle condition during the period of 0 to 100 ms after T2 onset (paired t-test on mean amplitude, p = .065, uncorrected), we then directly examined whether such post-T2 responses contribute to the behavioral attentional modulation effect and the behavior-related neural indices. Crucially, we did not find any significant correlation of such T2-related ERP enhancement with the behavioral modulation index (BMI), or with the reported effects of alpha power and cross-frequency coupling (PAC). Furthermore, after controlling for the T2-related ERP responses, there remains a significant correlation between the delta-alpha PAC and the BMI (partial r = .596, p = .019), which is not surprising given that the PAC is calculated based on an 800-ms time window covering more pre-T2 than post-T2 periods (see the response to point #4 for details) rather than around the T2 onset. Taken together, these results suggest that the T2-related ERP responses cannot explain the attentional modulation effect and the observed behavior-related neural indices.
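
      The logic of "controlling for the T2-related ERP responses" can be sketched as a partial correlation: the covariate is regressed out of both variables and the residuals are correlated. The code below is a minimal sketch on simulated data, not the analysis pipeline used for the manuscript; the variable names (`erp`, `pac`, `bmi`) and all numbers are hypothetical.

```python
import numpy as np


def partial_corr(x, y, z):
    """Pearson correlation between x and y after regressing z out of both."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])   # intercept + covariate
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


# Simulated per-participant values (n = 16), for illustration only.
rng = np.random.default_rng(0)
n = 16
erp = rng.normal(size=n)               # stand-in for the T2-related ERP enhancement
pac = 0.5 * erp + rng.normal(size=n)   # stand-in for delta-alpha coupling
bmi = 0.8 * pac + rng.normal(size=n)   # stand-in for the behavioral modulation index

r_partial = partial_corr(pac, bmi, erp)
print(f"partial r (PAC vs. BMI, controlling for ERP) = {r_partial:.3f}")
```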

      2) The alpha-band increase for T2 is indeed contradictory to the well known inhibitory function of alpha-band in attention. How could a target that is better discriminated elicit stronger inhibitory response? Related to the above point, the observed enhancement in alpha-band power and its coupling to low-frequency oscillation might derive from an enhanced ERP response for T2 target.

      Many thanks for the comment. We have briefly discussed this point in the revised manuscript (page 18, line 477).

      A widely accepted function of alpha activity in attention is that alpha oscillations suppress irrelevant visual information during spatial selection (Kelly et al., 2006; Thut et al., 2006; Worden et al., 2000). However, the interpretation becomes controversial when rhythmic sensory stimulation is delivered in the alpha band, as in the current study, where both the visual stream and the contextual auditory rhythm were presented at 10 Hz. In such a case, alpha-band neural responses at the stimulation frequency can be interpreted as either passively evoked steady-state responses (SSR) or actively synchronized intrinsic brain rhythms. From the former perspective (i.e., the SSR view), an increase in the amplitude or power at the stimulus frequency may indicate an enhanced attentional allocation to the stimulus stream that may result in better target detection (Janson et al., 2014; Keil et al., 2006; Müller & Hübner, 2002). Conversely, the latter view of the inhibitory function of intrinsic alpha oscillations would produce the opposite prediction. In a previous AB study, Janson and colleagues (2014) investigated this issue by separating the stimulus-evoked activity at 12 Hz (using the same power analysis method as ours) from the endogenous alpha oscillations ranging from 10.35 to 11.25 Hz (as indexed by individual alpha frequency, IAF). Interestingly, they found a dissociation between these two alpha-band neural responses, showing that the RSVP frequency power was higher in non-AB trials (T2 detected) than in AB trials (T2 undetected) while the IAF power exhibited the opposite pattern. According to these findings, the currently observed increase in alpha power for the between-cycle condition may mainly reflect stimulus-driven processes related to attentional enhancement. However, we do not rule out a contribution of intrinsic alpha oscillations in our study, as the current design is not sufficient to distinguish between these two processes.

      We also acknowledge that “alpha power” may not be the most precise term for our stimulus-related findings. We have therefore specified it as “neural responses to first-order rhythms at 10 Hz” and “10-Hz alpha power” in the revised manuscript (see page 12 in the Results section and page 18 in the Discussion section).

      As for the contribution of T2-related ERP response to the observed effect of 10 Hz power and cross-frequency coupling, please refer to our response to point #1.

      References:

      Janson, J., De Vos, M., Thorne, J. D., & Kranczioch, C. (2014). Endogenous and Rapid Serial Visual Presentation-induced Alpha Band Oscillations in the Attentional Blink. Journal of Cognitive Neuroscience, 26(7), 1454–1468. https://doi.org/10.1162/jocn_a_00551

      Keil, A., Ihssen, N., & Heim, S. (2006). Early cortical facilitation for emotionally arousing targets during the attentional blink. BMC Biology, 4(1), 23. https://doi.org/10.1186/1741-7007-4-23

      Kelly, S. P., Lalor, E. C., Reilly, R. B., & Foxe, J. J. (2006). Increases in Alpha Oscillatory Power Reflect an Active Retinotopic Mechanism for Distracter Suppression During Sustained Visuospatial Attention. Journal of Neurophysiology, 95(6), 3844–3851. https://doi.org/10.1152/jn.01234.2005

      Müller, M. M., & Hübner, R. (2002). Can the Spotlight of Attention Be Shaped Like a Doughnut? Evidence From Steady-State Visual Evoked Potentials. Psychological Science, 13(2), 119–124. https://doi.org/10.1111/1467-9280.00422

      Thut, G., Nietzel, A., Brandt, S., & Pascual-Leone, A. (2006). Alpha-band electroencephalographic activity over occipital cortex indexes visuospatial attention bias and predicts visual target detection. The Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 26(37), 9494–9502. https://doi.org/10.1523/JNEUROSCI.0875-06.2006

      Worden, M. S., Foxe, J. J., Wang, N., & Simpson, G. V. (2000). Anticipatory Biasing of Visuospatial Attention Indexed by Retinotopically Specific α-Band Electroencephalography Increases over Occipital Cortex. Journal of Neuroscience, 20(6), RC63–RC63. https://doi.org/10.1523/JNEUROSCI.20-06-j0002.2000

      3) To support that it is the context-induced entrainment that leads to the modulation in AB effect, the authors could examine pre-T2 response, e.g., alpha-power, and cross-frequency coupling, as well as its relationship to behavioral performance. I think the pre-stimulus response might be more convincing to support the authors' claim.

      Many thanks for the insightful suggestion. We have conducted additional analyses.

      Following this suggestion, we have examined the 10-Hz alpha power within the time window of -100–0 ms before T2 onset and found stronger activity for the between-cycle condition than for the within-cycle condition. This pre-T2 response is similar to the post-T2 response except that it is more restricted to the left parieto-occipital cluster (CP3, CP5, P3, P5, PO3, PO5, POZ, O1, OZ; t(15) = 2.774, p = .007), which partially overlaps with the cluster that exhibits a delta-alpha coupling effect significantly correlated with the BMI. We have incorporated these findings into the main text (page 12, line 315) and Fig. 5A of the revised manuscript.

      As for the coupling results reported in our manuscript, the coupling index (PAC) was calculated based on the activity during the second and third cycles (i.e., 400 to 1200 ms from stream onset) of the contextual rhythm, most of which covers the pre-T2 period as T2 always appeared in the third cycle for both conditions. Together, these results on pre-T2 10-Hz alpha power and cross-frequency coupling, as well as its relationship to behavioral performance, jointly suggest that the observed modulation effect is caused by the context-induced entrainment rather than being a by-product of post-T2 processing.

      4) About the entrainment to rhythmic context and its relation to behavioral modulation index. Previous studies (e.g., Ding et al) have demonstrated the hierarchical temporal structure in speech signals, e.g., emergence of word-level entrainment introduced by language experience. Therefore, it is well expected that imposing a second-order structure on a visual stream would elicit the corresponding steady-state response. I understand that the new part and main focus here are the AB effects. The authors should add more texts explaining how their findings contribute new understandings to the neural mechanism for the intriguing phenomena.

      Many thanks for the suggestion. We have provided more discussion in the revised manuscript (page 17, line 447).

      In brief, our study demonstrates how cortical tracking of feature-based hierarchical structure reframes the deployment of attentional resources over visual streams. This effect, distinct from the hierarchical entrainment to speech signals (Ding et al., 2016; Gross et al., 2013), does not rely on previously acquired knowledge about the structured information and can be established automatically even when the higher-order structure comes from a task-irrelevant and cross-modal contextual rhythm. In addition, our finding sheds fresh light on the adaptive value of the structure-based entrainment effect by expanding its role from the perception of rhythmic information (e.g., speech) to the deployment of temporal attention. To our knowledge, few studies have tackled this issue in visual or speech processing.

      References:

      Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164. https://doi.org/10.1038/nn.4186

      Gross, J., Hoogenboom, N., Thut, G., Schyns, P., Panzeri, S., Belin, P., & Garrod, S. (2013). Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain. PLoS Biol, 11(12). https://doi.org/10.1371/journal.pbio.1001752

      Reviewer #2 (Public Review):

      In cognitive neuroscience, a large number of studies proposed that neural entrainment, i.e., synchronization of neural activity and low-frequency external rhythms, is a key mechanism for temporal attention. In psychology and especially in vision, attentional blink is the most established paradigm to study temporal attention. Nevertheless, as far as I know, few studies try to link neural entrainment in the cognitive neuroscience literature with attentional blink in the psychology literature. The current study, however, bridges this gap.

      The study provides new evidence for the dynamic attending theory using the attentional blink paradigm. Furthermore, it is shown that neural entrainment to the sensory rhythm, measured by EEG, is related to the attentional blink effect. The authors also show that event/chunk boundaries are not enough to modulate the attentional blink effect, and suggest that strict rhythmicity is required to modulate attention in time.

      In general, I enjoyed reading the manuscript and only have a few relatively minor concerns.

      1) Details about EEG analysis.

      First, each epoch is from 600 ms before the stimulus onset to 1600 ms after the stimulus onset. Therefore, the epoch is 2200 ms in duration. However, zero-padding is needed to make the epoch duration 2000 ms (for 0.5-Hz resolution). This is confusing. Furthermore, for a more conservative analysis, I recommend also analyzing the response between 400 ms and 1600 ms, to avoid the onset response, and showing the results in a supplementary figure. The short duration reduces the frequency resolution but still allows seeing a 2.5-Hz response.

      Thanks for the comments. Each epoch was indeed segmented from -600 to 1600 ms relative to the stimulus onset, but in the spectrum analysis, we only used EEG signals from stream onset (i.e., time point 0) to 1600 ms (see the Materials and Methods section) to investigate the oscillatory characteristics of the neural responses purely elicited by rhythmic stimuli. The 1.6-s signals were zero-padded into a 2-s duration to achieve a frequency resolution of 0.5 Hz.
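
      The arithmetic behind the zero-padding step can be sketched as follows: a 1.6-s segment alone gives a bin spacing of 1/1.6 = 0.625 Hz, whereas padding it to 2 s gives 1/2 = 0.5 Hz, so that 2.5, 5, and 10 Hz all fall exactly on frequency bins. The sampling rate (1000 Hz) and the synthetic two-component signal below are assumptions for illustration; they are not taken from the manuscript.

```python
import numpy as np

fs = 1000                                 # sampling rate in Hz (assumed)
n_signal = int(round(1.6 * fs))           # 1.6 s of data after stream onset
n_padded = int(round(2.0 * fs))           # zero-padded length (2 s)

t = np.arange(n_signal) / fs
# Synthetic stand-in for an EEG trace: a 2.5-Hz and a weaker 10-Hz component.
signal = np.sin(2 * np.pi * 2.5 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)
signal = signal - signal.mean()           # demean before the transform

# np.fft.rfft pads with trailing zeros when n exceeds the signal length.
spectrum = np.fft.rfft(signal, n=n_padded)
freqs = np.fft.rfftfreq(n_padded, d=1 / fs)
power = np.abs(spectrum) ** 2

resolution_unpadded = fs / n_signal       # 0.625 Hz
resolution_padded = freqs[1] - freqs[0]   # 0.5 Hz
peak_freq = freqs[np.argmax(power)]

print(f"bin spacing without padding: {resolution_unpadded} Hz")
print(f"bin spacing with padding:    {resolution_padded} Hz")
print(f"largest spectral peak at:    {peak_freq} Hz")
```

      Note that zero-padding only interpolates the spectrum onto a finer grid; it does not add information beyond what the 1.6 s of data contain.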

      According to the reviewer’s suggestion, we analyzed the EEG signals from 400 ms to 1600 ms relative to stream onset to avoid potential influence of the onset response, and showed the results in Figure 4. Basically, we can still observe spectral peaks at the stimulus frequencies of 2.5, 5 (the harmonic of 2.5 Hz), and 10 Hz for both power and ITPC spectrum. However, the peak magnitudes were much weaker than those of 1.6-s signals especially for 2.5 Hz, and the 2.5-Hz power did not survive the multiple comparisons correction across frequencies (FDR threshold of p < .05), which might be due to the relatively low signal-to-noise ratio for the analysis based on the 1.2-s epochs (only three cycles to estimate the activity at 2.5 Hz). Importantly, we did identify a significant cluster for 2.5 Hz ITPC in the left parieto-occipital region showing a positive correlation with the individuals’ BMI (Fig. R3; CP5, TP7, P5, P7, PO5, PO7, O1; r = .538, p = .016), which is consistent with the findings based on the longer epochs.

      Fig. R3. Neural entrainment to contextual rhythms during the period of 400–1600 ms from stream onset. (A) The spectrum for inter-trial phase coherence (ITPC) of EEG signals from 400 to 1600 ms after the stimulus onset. Shaded areas indicate standard errors of the mean. (B) The 2.5-Hz ITPC was significantly correlated with the behavioral modulation index (BMI) in a parieto-occipital cluster, as indicated by orange stars in the scalp topographic map.

      Second, "The preprocessed EEG signals were first corrected by subtracting the average activity of the entire stream for each epoch, and then averaged across trials for each condition, each participant, and each electrode." I have several concerns about this procedure.

      (A) What is the entire stream? It's the average over time?

      Yes, as for the power spectrum analysis, EEG signals were first demeaned by subtracting the average signals of the entire stream over time from onset to offset (i.e., from 0 to 1600 ms) before further analysis. We performed this procedure following previous studies on the entrainment to visual rhythms (Spaak et al., 2014). We have clarified this point in the “Power analysis” part of the Materials and Methods section (page 25, line 677).

      References:

      Spaak, E., Lange, F. P. de, & Jensen, O. (2014). Local Entrainment of Alpha Oscillations by Visual Stimuli Causes Cyclic Modulation of Perception. The Journal of Neuroscience, 34(10), 3536–3544. https://doi.org/10.1523/JNEUROSCI.4385-13.2014

      (B) I suggest doing the Fourier transform first and averaging the spectrum over participants and electrodes. Averaging the EEG waveforms requires the assumption that all electrodes/participants have the same response phase, which is not necessarily true.

      Thanks for the suggestion. In an AB paradigm, the evoked neural responses are sufficiently time-locked to the periodic stimulation, so it is reasonable to estimate power with spectral decomposition performed on trial-averaged EEG signals (i.e., evoked power). Moreover, our results of inter-trial phase coherence (ITPC), which estimated the phase-locking value across trials based on single-trial decomposed phase values, also provided supporting evidence that the EEG waveforms were temporally locked across trials to the 2.5-Hz temporal structure in the context session.
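
      For readers unfamiliar with ITPC, the sketch below shows one common way to compute it: take the phase of each single-trial spectrum, represent it as a unit-length complex number, and measure the length of the across-trial average (1 = identical phase on every trial, near 0 = random phase). This is an FFT-based sketch on simulated trials; the sampling rate, trial count, and noise level are assumptions, and the manuscript's own decomposition method may differ.

```python
import numpy as np


def itpc(trials):
    """Inter-trial phase coherence per rFFT frequency bin.

    trials: array of shape (n_trials, n_samples).
    """
    spectra = np.fft.rfft(trials, axis=1)
    magnitudes = np.abs(spectra)
    # Normalise each coefficient to unit length so only phase is kept.
    unit_phasors = spectra / np.maximum(magnitudes, 1e-12)
    return np.abs(unit_phasors.mean(axis=0))


fs = 250                      # sampling rate in Hz (assumed)
n_samples = 2 * fs            # 2-s trials, giving 0.5-Hz bins
n_trials = 40                 # assumed trial count
t = np.arange(n_samples) / fs

rng = np.random.default_rng(1)
locked = np.sin(2 * np.pi * 2.5 * t)   # same phase on every trial
trials = locked + rng.normal(scale=1.0, size=(n_trials, n_samples))

freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
values = itpc(trials)
idx = int(np.argmin(np.abs(freqs - 2.5)))
print(f"ITPC at {freqs[idx]} Hz: {values[idx]:.2f}")
print(f"median ITPC across bins: {np.median(values):.2f}")
```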

      Nevertheless, we also took the reviewer’s suggestion seriously and analyzed the power spectrum on the average of single-trial spectral transforms, i.e., the induced power, which puts emphasis on the intrinsic non-phase-locked activities. In line with the results of evoked power and ITPC, the induced power spectrum in the context session also peaked at 2.5 Hz and was significantly stronger than that in the baseline session at 2.5 Hz (t(15) = 4.186, p < .001, FDR-corrected with a p value threshold < .001). Importantly, Pearson correlation analysis also revealed a positive cluster in the left parieto-occipital region, indicating that the induced power at 2.5 Hz was also strongly related to the attentional modulation effect (P7, PO7, PO5, PO3; r = .606, p = .006). We have added these findings to the revised manuscript (page 11, line 288; see also Figure 4-figure supplement 1).
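
      The difference between the two power estimates discussed here comes down to the order of averaging and transforming. The sketch below contrasts them on simulated trials containing a phase-locked 2.5-Hz component and a 10-Hz component whose phase varies randomly across trials. All parameters are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np


def evoked_power(trials):
    """Power spectrum of the trial-averaged waveform (average first, then FFT)."""
    return np.abs(np.fft.rfft(trials.mean(axis=0))) ** 2


def mean_single_trial_power(trials):
    """Average of single-trial power spectra (FFT first, then average)."""
    return (np.abs(np.fft.rfft(trials, axis=1)) ** 2).mean(axis=0)


fs = 250                      # sampling rate in Hz (assumed)
n_samples = 2 * fs
n_trials = 40
t = np.arange(n_samples) / fs

rng = np.random.default_rng(2)
phases = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
locked = np.sin(2 * np.pi * 2.5 * t)               # identical phase on every trial
jittered = np.sin(2 * np.pi * 10 * t + phases)     # random phase on each trial
trials = locked + jittered + rng.normal(scale=0.5, size=(n_trials, n_samples))

freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
evoked = evoked_power(trials)
single = mean_single_trial_power(trials)
i_2p5 = int(np.argmin(np.abs(freqs - 2.5)))
i_10 = int(np.argmin(np.abs(freqs - 10.0)))

print(f"2.5 Hz: evoked/single-trial ratio = {evoked[i_2p5] / single[i_2p5]:.2f}")
print(f"10 Hz:  evoked/single-trial ratio = {evoked[i_10] / single[i_10]:.2f}")
```

      In this simulation the phase-locked component survives trial averaging, whereas the phase-jittered component largely cancels in the evoked estimate but remains in the single-trial average. The single-trial average contains both kinds of activity, so a peak in it does not by itself separate the two.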

      2) The sequences are short, only containing 16 items and 4 cycles. Furthermore, the targets are presented in the 2nd or 3rd cycle. I suspect that a stronger effect may be observed if the sequences are longer, since attention may not entrain well to the external stimulus until after a few cycles. In the first trial of the experiment, the participant may not have a chance to realize that the task-irrelevant auditory/visual stimulus has a cyclic nature and it is not likely that their attention will entrain to such cycles. As the experiment proceeds, they learn that the stimulus is cyclic and may allocate their attention rhythmically. Therefore, I feel that the participants do not just rely on the rhythmic information within a trial but also rely on the stimulus history. Please discuss why short sequences are used and whether it is possible to see buildup of the effect over trials or over cycles within a trial.

      Thanks for the comments. Typically, to induce a classic pattern of AB effect, the RSVP stream should contain 3–7 distractors before the first target (T1), with a varying number of distractors (0–7) between the two targets and at least 2 items after the second target (T2). In our study, we created the RSVP streams following these rules, which allowed us to observe the typical AB effect, in which T2 performance was worse at Lag 2 than at Lag 8. Nevertheless, we agree with the reviewer that longer streams would be better for building up the attentional entrainment effect, as we did observe that the attentional modulation effect ramped up as the stream proceeded over cycles, consistent with the reviewer’s speculation. In Experiments 1a (using auditory context) and 2a (using color-defined visual context), we adopted two sets of target positions: an early one where T2 appeared at the 6th or 8th position (in the 2nd cycle) of the visual stream, and a late one where T2 appeared at the 10th or 12th position (in the 3rd cycle) of the visual stream. In the manuscript, we reported T2 performance with all the target positions combined, as no significant interaction was found between the target positions and the experimental conditions (ps > .1). However, additional analysis demonstrated a trend toward an increase of the attentional modulation effect over cycles, from the early to the late positions. As shown in Fig. R4, the modulation effect grew stronger and reached significance for the late positions (for Experiment 1a, t(15) = 2.83, p = .013, Cohen’s d = 0.707; for Experiment 2a, t(15) = 3.656, p = .002, Cohen’s d = 0.914) but showed only a weaker, non-significant trend for the early positions (for Experiment 1a, t(15) = 1.049, p = .311, Cohen’s d = 0.262; for Experiment 2a, t(15) = .606, p = .553, Cohen’s d = 0.152).

      Fig. R4. Attentional modulation effect built up over cycles in Experiments 1a & 2a. Error bars represent 1 SEM; * p < 0.05, ** p < 0.01.
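
      The effect sizes reported for these comparisons appear consistent with the within-subject form of Cohen's d, d = t / √n, with n = df + 1 = 16. This is an inference from the reported numbers rather than something stated in the response, and the helper name below is hypothetical. The sketch simply reproduces the arithmetic.

```python
import math


def cohens_d_from_paired_t(t_value: float, df: int) -> float:
    """Within-subject Cohen's d from a paired t statistic: d = t / sqrt(n), n = df + 1."""
    n = df + 1
    return t_value / math.sqrt(n)


# t values reported for the early and late target positions (df = 15 throughout).
reported = {
    "Exp. 1a, late": 2.83,
    "Exp. 2a, late": 3.656,
    "Exp. 1a, early": 1.049,
    "Exp. 2a, early": 0.606,
}
for label, t_value in reported.items():
    print(f"{label}: d = {cohens_d_from_paired_t(t_value, 15):.3f}")
```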

      However, we did not observe an obvious buildup effect across trials in our study. The modulation effect of contextual rhythms seems to emerge quickly: it is evident in the first quarter of trials in Experiment 1a (t(15) = 2.703, p = .016, Cohen’s d = 0.676) and in the second quarter of trials in Experiment 2a (t(15) = 2.478, p = .026, Cohen’s d = 0.620).

      3) The term "cycle" is used without definition in Results. Please define and mention that it's an abstract term and does not require the stimulus to have "cycles".

      Thanks for the suggestion. By definition, the term “cycle” refers to “an interval of time during which a sequence of a recurring succession of events or phenomena is completed” or “a course or series of events or operations that recur regularly and usually lead back to the starting point” (Merriam-Webster dictionary). In the current study, we kept to the recurrent and regular nature of “cycle” in general while defining the specific meaning of “cycle” by the feature-based periodic changes of the contextual stimuli in each experiment (page 5, line 101; also refer to Procedures in the Materials and Methods section for details). For example, in Experiment 1a, the background tone sequence changed its pitch value from high to low or vice versa isochronously at a rate of 2.5 Hz, thus forming a rhythmic context with structure-based cycles of 400 ms. Note that we did not use the more general term “chunk”, because arbitrary chunks without the regularity of cycles are insufficient to trigger the attentional modulation effect in the current study. Indeed, the effect was eliminated when we replaced the rhythmic cycles with irregular chunks (Experiments 1d & 1e).

      4) Entrainment of attention is not necessarily related to neural entrainment to sensory stimulus, and there is considerable debate about whether neural entrainment to sensory stimulus should be called entrainment. Too much emphasis on terminology is of course counterproductive but a short discussion on these issues is probably necessary.

      Thanks for the comments. As commonly accepted, entrainment is defined as the alignment of intrinsic neuronal activity to the temporal structure of external rhythmic inputs (Lakatos et al., 2019; Obleser & Kayser, 2019). Here, we are interested in the functional roles of cortical entrainment to the higher-order temporal structure imposed on first-order sensory stimulation, and used the term entrainment to describe the phase-locking neural responses to such hierarchical structure, following the literature on auditory and visual perception (Brookshire et al., 2017; Doelling & Poeppel, 2015). In our study, the consistent results of power and ITPC have provided strong evidence that neural entrainment at the structure level (2.5 Hz) is significantly correlated with the observed attentional modulation effect. However, this does not mean that the entrainment of attention is necessarily associated with neural entrainment to sensory stimulus in a broader context, as attention may also be guided by predictions based on non-isochronous temporal regularity without requiring stimulus-based oscillatory entrainment (Breska & Deouell, 2017; Morillon et al., 2016).

      On the other hand, there has been a debate about whether the neural alignment to rhythmic stimulation reflects active entrainment of endogenous oscillatory processes (i.e., induced activity) or a series of passively evoked steady-state responses (Keitel et al., 2019; Notbohm et al., 2016; Zoefel et al., 2018). The latter process is also referred to as “entrainment in a broad sense” by Obleser & Kayser (2019). Given that a presented rhythm always evokes event-related potentials, a better question might be whether the observed alignment reflects the entrainment of endogenous oscillations in addition to evoked steady-state responses. Here we attempted to tackle this issue by measuring the induced power, which emphasizes the intrinsic non-phase-locked activity, in addition to the phase-locked evoked power. Specifically, we quantified these two kinds of activities with the average of single-trial EEG power spectra and the power spectra of trial-averaged EEG signals, respectively, according to Keitel et al. (2019). In addition to the observation of evoked responses to the contextual structure, we also demonstrated an attention-related neural tracking of the higher-order temporal structure based on the induced power at 2.5 Hz (see Figure 4—figure supplement 1), suggesting that the observed attentional modulation effect is at least partially derived from the entrainment of intrinsic oscillatory brain activity. We have briefly discussed this point in the revised manuscript (page 17, line 460).
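
      The two power measures described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the synthetic data, and the sampling rate are assumptions. The first measure averages the power spectra of single trials, which the response uses to index induced activity. The second takes the spectrum of the trial-averaged signal, which retains only phase-locked (evoked) activity.

```python
import numpy as np

def power_spectrum(signal, sfreq):
    """Return (freqs, power) of a real-valued signal along its last axis."""
    n_samples = signal.shape[-1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sfreq)
    power = np.abs(np.fft.rfft(signal, axis=-1)) ** 2 / n_samples
    return freqs, power

def single_trial_and_evoked_power(trials, sfreq):
    """trials: array of shape (n_trials, n_samples) for one channel."""
    freqs, per_trial = power_spectrum(trials, sfreq)
    # Average of single-trial spectra: keeps non-phase-locked activity
    single_trial_power = per_trial.mean(axis=0)
    # Spectrum of the trial average: keeps only phase-locked activity
    _, evoked_power = power_spectrum(trials.mean(axis=0), sfreq)
    return freqs, single_trial_power, evoked_power

# Synthetic data (hypothetical): a 2.5 Hz rhythm whose phase is random on every trial
rng = np.random.default_rng(0)
sfreq, n_trials, n_samples = 250.0, 50, 1000
t = np.arange(n_samples) / sfreq
phases = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
trials = np.sin(2 * np.pi * 2.5 * t + phases) + 0.5 * rng.standard_normal((n_trials, n_samples))

freqs, single_trial_power, evoked_power = single_trial_and_evoked_power(trials, sfreq)
idx = np.argmin(np.abs(freqs - 2.5))
print(single_trial_power[idx], evoked_power[idx])
```

      Because the rhythm in this toy example is not phase-locked across trials, it largely cancels in the trial average, so the power at 2.5 Hz is much larger in the single-trial measure than in the evoked measure.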

      References:

      Breska, A., & Deouell, L. Y. (2017). Neural mechanisms of rhythm-based temporal prediction: Delta phase-locking reflects temporal predictability but not rhythmic entrainment. PLOS Biology, 15(2), e2001665. https://doi.org/10.1371/journal.pbio.2001665

      Brookshire, G., Lu, J., Nusbaum, H. C., Goldin-Meadow, S., & Casasanto, D. (2017). Visual cortex entrains to sign language. Proceedings of the National Academy of Sciences, 114(24), 6352–6357. https://doi.org/10.1073/pnas.1620350114

      Doelling, K. B., & Poeppel, D. (2015). Cortical entrainment to music and its modulation by expertise. Proceedings of the National Academy of Sciences, 112(45), E6233–E6242. https://doi.org/10.1073/pnas.1508431112

      Henry, M. J., Herrmann, B., & Obleser, J. (2014). Entrained neural oscillations in multiple frequency bands comodulate behavior. Proceedings of the National Academy of Sciences, 111(41), 14935–14940. https://doi.org/10.1073/pnas.1408741111

      Keitel, C., Keitel, A., Benwell, C. S. Y., Daube, C., Thut, G., & Gross, J. (2019). Stimulus-Driven Brain Rhythms within the Alpha Band: The Attentional-Modulation Conundrum. The Journal of Neuroscience, 39(16), 3119–3129. https://doi.org/10.1523/JNEUROSCI.1633-18.2019

      Lakatos, P., Gross, J., & Thut, G. (2019). A New Unifying Account of the Roles of Neuronal Entrainment. Current Biology, 29(18), R890–R905. https://doi.org/10.1016/j.cub.2019.07.075

      Morillon, B., Schroeder, C. E., Wyart, V., & Arnal, L. H. (2016). Temporal Prediction in lieu of Periodic Stimulation. Journal of Neuroscience, 36(8), 2342–2347. https://doi.org/10.1523/JNEUROSCI.0836-15.2016

      Notbohm, A., Kurths, J., & Herrmann, C. S. (2016). Modification of Brain Oscillations via Rhythmic Light Stimulation Provides Evidence for Entrainment but Not for Superposition of Event-Related Responses. Frontiers in Human Neuroscience, 10. https://doi.org/10.3389/fnhum.2016.00010

      Obleser, J., & Kayser, C. (2019). Neural Entrainment and Attentional Selection in the Listening Brain. Trends in Cognitive Sciences, 23(11), 913–926. https://doi.org/10.1016/j.tics.2019.08.004

      Zoefel, B., ten Oever, S., & Sack, A. T. (2018). The Involvement of Endogenous Neural Oscillations in the Processing of Rhythmic Input: More Than a Regular Repetition of Evoked Neural Responses. Frontiers in Neuroscience, 12. https://doi.org/10.3389/fnins.2018.00095

      Reviewer #3 (Public Review):

      The current experiment tests whether the attentional blink is affected by higher-order regularity based on rhythmic organization of contextual features (pitch, color, or motion). The results show that this is indeed the case: the AB effect is smaller when two targets appeared in two adjacent cycles (between-cycle condition) than within the same cycle defined by the background sounds. Experiment 2 shows that this also holds for temporal regularities in the visual domain and Experiment 3 for motion. Additional EEG analysis indicated that the findings obtained can be explained by cortical entrainment to the higher-order contextual structure. Critically feature-based structure of contextual rhythms at 2.5 Hz was correlated with the strength of the attentional modulation effect.

      This is an intriguing and exciting finding. It is a clever and innovative approach to reduce the attentional blink by presenting a rhythmic higher-order regularity. It is convincing that this pulling out of the AB is driven by cortical entrainment. Overall, the paper is clear, well written and provides adequate control conditions. There is a lot to like about this paper. Yet, there are particular concerns that need to be addressed. Below I outline these concerns:

      1) The most pressing concern is the behavioral data. We have to ensure that we are dealing here with an attentional blink. The way the data is presented is not the typical way this is done. Typically in AB designs one sees the T2 performance when T1 is ignored relative to when T1 has to be detected. This data is not provided. I am not sure whether this data was collected, but if so the reader should see it.

      Many thanks for the suggestion. We appreciate the reviewer for his/her thoughtful comments. To demonstrate the AB effect, we did include two T2 lag conditions in our study (Experiments 1a, 1b, 2a, and 2b)—a short-SOA condition where T2 was located at the second lag of T1 (i.e., SOA = 200 ms), and a long-SOA condition where T2 appeared at the 8th lag of T1 (i.e., SOA = 800 ms). In a typical AB effect, T2 performance at short lags is remarkably impaired compared with that at long lags. In our study, we consistently replicated this effect across the experiments, as reported in the Results section of Experiment 1 (page 5, line 106). Overall, the T2 detection accuracy conditioned on correct T1 response was significantly impaired in the short-SOA condition relative to that in the long-SOA condition (mean accuracy > 0.9 for all experiments), during both the context session and the baseline session. More crucially, when looking into the magnitude of the AB effect as measured by (ACClong-SOA - ACCshort-SOA)/ACClong-SOA, we still obtained a significant attentional modulation effect (for Experiment 1a, t(15) = -2.729, p = .016, Cohen’s d = 0.682; for Experiment 2a, t(15) = -4.143, p <.001, Cohen’s d = 1.036) similar to that reflected by the short-SOA condition alone, further confirming that cortical entrainment effectively influences the AB effect.
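
      The AB magnitude index defined above, (ACC at long SOA minus ACC at short SOA) divided by ACC at long SOA, can be written as a small helper. This is a sketch only: the function name and the accuracy values in the usage example are hypothetical and are not data from the study.

```python
def ab_magnitude(acc_long_soa: float, acc_short_soa: float) -> float:
    """Relative drop in T2 accuracy at the short SOA, normalized by the long-SOA accuracy."""
    if acc_long_soa <= 0:
        raise ValueError("long-SOA accuracy must be positive")
    return (acc_long_soa - acc_short_soa) / acc_long_soa

# Hypothetical accuracies for a single participant (not values from the study)
within_cycle = ab_magnitude(acc_long_soa=0.92, acc_short_soa=0.60)
between_cycle = ab_magnitude(acc_long_soa=0.92, acc_short_soa=0.69)
print(within_cycle, between_cycle)
```

      A smaller index in one condition than in the other indicates a weaker AB effect in that condition. Normalizing by the long-SOA accuracy makes the index comparable across participants whose overall performance differs.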

      Although we included both the long- and short-SOA conditions in the current study, we focused on T2 performance in the short-SOA condition rather than along the whole AB curve for the following reasons. Firstly, for the long-SOA conditions, the T2 performance is at ceiling level, making it an inappropriate baseline to probe the attentional modulation effect. We focused on Lag 2 because previous research has identified a robust AB effect around the second lag (Raymond et al., 1992), which provides a reasonable and sensitive baseline to probe the potential modulation effect of the contextual auditory and visual rhythms. Note that instead of using multiple lags, we varied the length of the rhythmic cycles (i.e., a cycle of 300 ms, 400 ms, and 500 ms corresponding to a rhythm frequency of 3.3 Hz, 2.5 Hz, and 2 Hz, respectively, all within the delta band), and showed that the attentional modulation effect could be generalized to these different delta-band rhythmic contexts, regardless of the absolute positions of the targets within the rhythmic cycles.

      As to the T1 performance, the overall accuracy was very high, ranging from 0.907 to 0.972, in all of our experiments. The corresponding results have been added to the Results section of the revised manuscript (page 5, line 103). Notably, we did not find T1-T2 trade-offs in most of our experiments, except in Experiment 2a where T1 performance showed a moderate decrease in the between-cycle condition relative to that in the within-cycle condition (mean ± SE: 0.888 ± 0.026 vs. 0.933 ± 0.016, respectively; t(15) = -2.217, p = .043). However, by examining the relationship between the modulation effects (i.e., the difference between the two experimental conditions) on T1 and T2, we did not find any significant correlation (p = .403), suggesting that the better performance for T2 was not simply due to the worse performance in detecting T1.

      Finally, previous studies have shown that ignoring T1 would lead to ceiling-level T2 performance (Raymond et al., 1992). Therefore, we did not include such manipulation in the current study, as in that case, it would be almost impossible for us to detect any contextual modulation effect.

      References:

      Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18(3), 849–860. https://doi.org/10.1037/0096-1523.18.3.849

      2) Also, there is only one lag tested. To ensure that we are dealing here with a true AB, I would like to see that more than one lag is tested. In the ideal situation a full AB curve should be presented that includes several lags. This should be done for at least one of the experiments. It would be informative as we can see how cortical entrainment affects the whole AB curve.

      Many thanks for the suggestion. Please refer to our response to the point #1 for “Reviewer #3 (Public Review)”. In short, we did include two T2 lag conditions in our study (Experiments 1a, 1b, 2a and 2b), and the results replicated the typical AB effect. We have clarified this point in the revised manuscript (page 5, line 106).

      3) Also, there is no data regarding T1 performance. It is important to show that the better performance for T2 is not due to worse performance in detecting T1. So please also provide this data.

      Many thanks for the suggestion. Please refer to our response to point #1 for “Reviewer #3 (Public Review)”. We have reported the T1 performance in the revised manuscript (page 5, line 103), and the results did not show obvious T1-T2 trade-offs.

      4) The authors identify the oscillatory characteristics of EEG signals in response to stimulus rhythms by examining the FFT spectral peaks, subtracting the mean power of the two nearest neighboring frequencies from the power at the stimulus frequency. I am not familiar with this procedure and would like to see some justification for using this technique.

      According to previous studies (Nozaradan et al., 2011; Lenc et al., 2018), subtracting the average amplitude of neighboring frequency bins can remove unrelated background noise, such as muscle activity or eye movement. If there were no EEG oscillatory responses characteristic of stimulus rhythms, the amplitude at a given frequency bin should be similar to the average of its neighbors, and thus no significant peaks could be observed in the subtracted spectrum.
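
      The neighbor-subtraction procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, and one neighboring bin is used on each side, matching the reviewer's description of the two nearest neighboring frequencies.

```python
import numpy as np

def subtract_neighbors(spectrum, n_neighbors=1):
    """Subtract from each bin the mean of its n_neighbors bins on each side.

    Edge bins without a full set of neighbors are returned as NaN.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    corrected = np.full_like(spectrum, np.nan)
    for i in range(n_neighbors, len(spectrum) - n_neighbors):
        left = spectrum[i - n_neighbors:i]
        right = spectrum[i + 1:i + 1 + n_neighbors]
        corrected[i] = spectrum[i] - np.mean(np.concatenate([left, right]))
    return corrected

# Toy spectrum: a flat noise floor of 1.0 with a peak of 5.0 at bin 3
toy_spectrum = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
print(subtract_neighbors(toy_spectrum))
```

      In the toy spectrum, bins on the flat noise floor come out at zero while the peak bin stays clearly positive, which is the behavior the justification above relies on. Bins immediately next to the peak become negative, so only the bin at the stimulus frequency should be interpreted.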

      References:

      Lenc, T., Keller, P. E., Varlet, M., & Nozaradan, S. (2018). Neural tracking of the musical beat is enhanced by low-frequency sounds. Proceedings of the National Academy of Sciences, 115(32), 8221–8226. https://doi.org/10.1073/pnas.1801421115

      Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the Neuronal Entrainment to Beat and Meter. The Journal of Neuroscience, 31(28), 10234–10240. https://doi.org/10.1523/JNEUROSCI.0411-11.2011

    1. Author Response:

      Reviewer #1:

      In the manuscript by Kymre, Liu and colleagues, the authors investigate how pheromone signals are interpreted by the projection neurons of the male moth brain. While the olfactory neurons and glomerular targets of pheromone signaling are known, the signaling of the projection neurons (output neurons) that carry pheromone signaling to higher regions of the brain remained unknown. The authors utilized a series of technically challenging experiments to identify the anatomy and functional responses of projection neurons responding to pheromone mixtures, primary pheromone, secondary pheromone, and behavioral antagonist odors. By calcium imaging of MGC mALT neurons, the authors identify that odor responses in PNs are broader than those of their olfactory neuron counterparts (i.e., the behavioral antagonist activates OSNs innervating the dma glomerulus, whereas in PNs the antagonist activates the dma and dmp glomeruli). The authors then perform a series of elegant experiments in which the odor responses of different mALT PNs are recorded by electrophysiology, and the anatomy of the recorded neurons is identified by dye fill and computer reconstruction. This allowed the temporal response properties of the neurons to be correlated with their axonal processes in different brain regions. The data suggest that attractive pheromone signals activate the SIP and SLP regions, while aversive signals primarily activate regions in the LH. Finally, the authors present a model of pheromone signaling based on these findings.

      The work presents the first glimpse at the signaling from mALT PNs. The technical challenges in performing these experiments did limit the number of neurons that could be recorded and imaged. As such, the comprehensiveness of the study was not clear, or if additional experiments might alter the findings. The connection of protocerebrum anatomy with functional signaling (as summarized in Figure 6) could have been more clearly articulated.

      The manuscript could benefit by revisions to the text and figure presentations that would make it more accessible to a broader audience.

      We thank the reviewer for the comments and suggestions. We understand that the issue regarding completeness of data aroused concern. A neuron collection obtained via intracellular recording always represents a compromise between a collection that covers absolutely “all” neurons and one that includes the majority of neurons, reflecting the activity of the whole neuron population. We considered our neuron collection representative for two main reasons: (1) The neurons included in this study were randomly collected from all three MGC units and were not targeted at one specific unit. The proportions of identified neurons originating from each MGC unit are highly consistent with the volume of the relevant unit. (2) Up to now, our collection of MGC PNs comprises every previously reported neuron type not only in H. armigera but in all heliothine moths studied. Evidently, our anatomical data provide a solid foundation, making it unlikely that a considerable number of new MGC PN types would be discovered in future studies. However, the principal objection raised by the reviewer is very timely – since we were not able to confirm that our collection included every MGC PN, the possibility of additional neuronal types remains open.

      Therefore, we decided to examine the content validity of our framework based on the features of the current neuron collection - that is, whether the presented outline would be fundamentally altered if additional PNs were included. A computational experiment was conducted including the mean firing traces of four neuron groups, each innervating the same protocerebral region. Here, the firing traces of individual PNs were shuffled based on formation of new neuron assemblies by randomly recruiting two-thirds of the PNs in the group. The data shuffling was repeated 5 times, and each time a different assembly of neurons was included. Cross correlations between the mean firing traces of each assembly showed that neuronal response profiles were unchanged in the neuropils associated with distinct behavioral valences (Fig. 7F). This high association contrasted with low correlations between the firing traces of every two PNs (Fig. 7G), indicating the representativeness of the presented data on the 42 MGC-PNs identified here. The issue concerning the completeness of the findings is included in a special paragraph in the discussion and in Fig. 7D-G.
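
      The resampling check described above can be sketched as follows. This is a rough illustration rather than the authors' analysis: the function names and the synthetic firing traces are hypothetical, and a zero-lag Pearson correlation stands in for the cross-correlation mentioned in the response.

```python
import numpy as np

def assembly_mean_traces(traces, n_repeats=5, fraction=2 / 3, seed=0):
    """Mean firing trace of random sub-assemblies drawn from one neuron group.

    traces: array of shape (n_neurons, n_timepoints).
    Each repeat recruits `fraction` of the neurons without replacement.
    """
    rng = np.random.default_rng(seed)
    n_neurons = traces.shape[0]
    k = max(1, round(n_neurons * fraction))
    means = []
    for _ in range(n_repeats):
        chosen = rng.choice(n_neurons, size=k, replace=False)
        means.append(traces[chosen].mean(axis=0))
    return np.vstack(means)

def pairwise_correlations(rows):
    """Zero-lag Pearson correlations between all distinct pairs of rows."""
    corr = np.corrcoef(rows)
    upper = np.triu_indices_from(corr, k=1)
    return corr[upper]

# Synthetic group (hypothetical): 9 neurons sharing one response shape plus private noise
rng = np.random.default_rng(1)
time = np.linspace(0, 2, 200)
shared_response = np.sin(2 * np.pi * time)
traces = shared_response + 1.5 * rng.standard_normal((9, time.size))

assemblies = assembly_mean_traces(traces)
print(pairwise_correlations(assemblies).mean(), pairwise_correlations(traces).mean())
```

      In this toy setting, the mean traces of the resampled assemblies correlate much more strongly with each other than individual neurons do, which mirrors the contrast the response draws between Fig. 7F and Fig. 7G.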

      We also thank the reviewer for pointing out the importance of an expedient data presentation, including text and figure material that clearly communicate the major findings. In line with the editor’s recommendations, we have performed a comprehensive revision of all main parts of the manuscript. We have, for example, included an introductory figure (Fig. 1) providing essential background information. In the Results section, we thoroughly reorganized the data presentation by highlighting the major findings both in the text and figure material. As suggested by the editor, a new figure, Fig. 3 (replacing the original Fig. 2), has been made, visualizing the main neuron types in separate panels as well as in joint plots (confocal data and 3D-models), and presenting descriptive/predictive frameworks reflecting the stimulus-evoked neuronal activity within the relevant output regions of the PNs. The discussion has also been reshaped, for instance, by including the issue of parallel olfactory processing in the current species as well as across different species. Altogether, we believe the revision has made the article more relevant to a broad audience. We hope our study dealing with one of the severe insect pest species that inhabit our planet will be of interest.

      Reviewer #2:

      Using calcium imaging of mALT PNs in the AL as well as intracellular recordings and subsequent stainings of individual PNs, the authors evaluate the response properties of different PNs to the three pheromone components, including the primary pheromone Z11-16:AL, the secondary component Z9-16:AL and a minor component Z9-14:AL which functions as an antagonist at higher concentrations. The authors conclude from their data that PNs have widespread arborizations in higher brain centers that are organized according to behavioral significance, i.e. with regard to attraction versus repulsion. Although the authors characterize morphologically and functionally a considerable number of neurons, the data are highly descriptive and exhibit a rather large level of variability which impedes, in my opinion, a generalization of response properties for different neuron types. The conclusion that the projection patterns in the higher brain centers, such as the LH, VLP and SIP, reflect behavioral significance proves rather difficult to draw from the data presented in this study. Additional data, such as calcium imaging of pheromone responses in the higher brain areas, would support the notion of a valence-based map in these regions.

      The intracellular recordings are certainly elaborate, but do not allow drawing a general picture about how coding of pheromones in the individual MGC compartments of the AL is transformed into a representation in higher brain centers. In my opinion the authors could not sufficiently address their major goal which is to understand how the neuronal circuitry underlying pheromone processing is encoding the individual pheromone components that induce opposite valences. The study would highly benefit if the authors would reconstruct their individual PN staining and register them into a standard moth brain (as done in other insect species, such as honeybees and flies) to allow a categorization and matching of morphological properties. Then the different PNs could be compared based on morphological parameters and subsequently be assigned to specific neuron classes, while response properties could be assessed for the different types.

      First, we would like to thank the reviewer for the suggestions. The reviewer points out that additional experiments, “such as calcium imaging of pheromone responses in the higher brain areas”, might support the notion of valence-based maps in these regions. Unfortunately, these kinds of experiments are currently not feasible for the neuron groups we are interested in. Fura-labeled calcium imaging has its restrictions, since this method can only be used to examine a brain region based on retrograde labeling of the neurons of interest, such as applying dye into the calyx for examining the responses of medial-tract PN dendrites in the antennal lobe (see Fig. A1 below). Notably, the calcium-imaging measurements from the LH in honeybee, obtained from retrogradely labeled lateral tract PNs, could be performed because of the accessibility of this PN population type for such an experiment (see Fig. B below; Roussel et al., 2014, Current Biology 24, 561-567). The PNs of interest here, confined to the mALT and mlALT, end up in the lateral protocerebrum. Therefore, measuring calcium imaging responses in the lateral protocerebrum from retrogradely labeled neurons confined to these tracts appears to be unfeasible (Fig. A2 below). So far, no study has managed to perform retrograde labeling of the axon terminals of mALT/mlALT PNs in the higher brain centers of moths. As for the bath application technique using a membrane-permeable calcium indicator, this method gives access to calcium signals only in the most superficial brain areas. The neuropil regions innervated by the mALT PNs are located too deep (the only accessible output region would be the calyces). Finally, the moth species used here lacks proper genetic tools that might allow investigation of a specific strain expressing a calcium indicator.

      Figure A1-A2: Fura retrograde labeling of PNs confined to the medial tract (mALT) from two different brain sites in moth. Figure B: Fura retrograde labeling of lateral-tract (lALT) PNs in honeybee brain. Calcium imaging measurements are feasible in the areas marked in green, including the antennal lobe (AL in A-B) and a part of the lateral protocerebrum region (B). The areas marked in red (shown in A1-A2), by contrast, are not ideal for imaging experiments, as the neuronal signals (black arrows) will be physically blocked by the damaged axons.

      In addition, the reviewer has the following objection: “Although the authors characterize morphologically and functionally a considerable number of neurons, the data are highly descriptive and exhibit a rather large level of variability which impedes, in my opinion, a generalization of response properties for different neuron types.” We assume the reviewer refers to the individual neuron data when he/she points out the relatively high variability. Indeed, the high-resolution information obtained by the intracellular recording/staining technique includes descriptive data with a certain extent of variability – particularly regarding the spiking data representing every single action potential at the time scale of a few milliseconds. The main reason for performing both in vivo calcium imaging and intracellular recording experiments is that these two approaches form an optimal combination for illustrating the neuronal activity at different granularities. During calcium imaging, we recorded pheromone responses in distinct groups of MGC PNs, i.e., at a higher population scale. One main restriction of calcium imaging is the low temporal resolution (the sampling interval in this study was 100 ms). For comparison, the intracellular recordings had a sampling interval of less than 1 ms. Altogether, by combining the two techniques we could collect data from the relevant MGC-PNs both at the neuron population level (low temporal resolution) and single neuron level (high spatial and temporal resolution). Comparison of the data obtained from the two experimental approaches demonstrated a high degree of correspondence. We believe that the high-resolution intracellular recording data reflect the peculiar features that precisely characterize individual neurons.
      Otherwise, in case the reviewer has objections to the detailed descriptions in the Results part, we have revised the original manuscript (including text and figure material), emphasizing the main findings and minimizing the description of details.

      The reviewer also suggests registering the neurons into a standard brain framework to “allow drawing a general picture about how coding of pheromones in the individual MGC compartments of the AL is transformed into a representation in higher brain centers”. To register individual PNs into a standard brain is no doubt an ideal method to compare the neurons’ architecture within the same species as well as across different models – especially if we want to compare the neurons’ projection patterns. Unlike the honeybee and the fruit fly already having an averaged standard brain available (reconstructed and standardized based on morphological data from different individuals), H. armigera has a representative brain (reconstructed from morphological data of one individual), published by Chu et al., (2020a). As we have experienced, errors due to local distortions often occur when registering neurons into a representative brain. The same is to some degree also the case for registration of neurons into an averaged brain framework. How informative the results are, will always depend both on the resolution of the standard and the resolution of the neuron data. Thus, the accuracy and the quality of the registration is based on the richness of details in the raw image data, i.e. how dense the registration grid is. If only a few neuropils are used, the precision of registration will obviously be limited. An ideal reconstruction for registration would include a dense grid of landmarks - or, as in the fruit fly, the actual image data.

      Generally, the terminal projections of medial- and mediolateral tract MGC PNs in the moth cover several widespread areas in the protocerebrum and the most important objective of the current study was to map the neuropils innervated by each of the 32 physiologically identified neurons presented here. In line with the suggestion from the reviewer, we have added AMIRA reconstructions in the revised manuscript, including not only the skeleton of individual PNs but also 3D reconstructions of the neuropil regions innervated by each neuron. These data, confirming the neurons’ morphological properties, are presented in the figure supplement. In addition, for visualization purposes, we plotted each traced skeleton onto the representative brain, based on the reconstructed data obtained by using the ‘transform editor’ function in AMIRA (Fig. 3). In the revised version of the manuscript, we have also submitted all morphological data (confocal stacks and 3D-AMIRA reconstructions) of the main MGC-PN types to the newly established Insect brain database (InsectbrainDB, 2021) – a unified and open access platform for archiving and sharing functional data obtained not only from H. armigera but from other insect species as well.

      In addition to registering different PNs into a common frame, another reliable source of evidence for such a comparison is raw confocal data including identifiable neurons simultaneously stained in the same brain. In Fig. 3C, we demonstrate overlapping terminal projections in the LH of two uniglomerular MGC-PNs originating from each of the two smaller MGC-units, the dma and dmp. In Fig. 4, we show the terminal projections of MGC-PNs confined to each of the three main tracts, demonstrating overlapping terminal arbors for medial- and mediolateral-tract neurons, whereas the lateral-tract neuron projects to a separate area.

      Reviewer #3:

      Summary of goals:

      In the moth Helicoverpa armigera the authors examined whether projection neurons from different antennal lobe tracts encoding sex-pheromone components with different valence occupy distinct projection areas in the protocerebrum of the midbrain.

      Strengths and weaknesses of methods and results:

      Methods chosen are adequate and state of the art. In vivo calcium imaging allowed for easier imaging of a population of neurons, in search of statistically significant responses to pheromone components of different concentrations, quality, and valence. The main, general drawbacks of calcium imaging are the lower temporal resolution, which does not allow for detection of single action potentials at the scale of a few ms, and the inability to finely resolve the projection patterns of single neurons. This was compensated for by excellent intracellular recordings of single antennal lobe projection neurons, stainings of single cells, and embedding in the 3D standardized H. armigera brain. The data are very carefully analyzed with adequate analysis software and adequate statistical analysis, and the most relevant results are shown in very good Figures. I also very much appreciate all of the supplementary figures. I do not see any relevant weakness in the methods and the respective results. However, as outlined in detail in the reply to the authors, the wording of the manuscript can be improved, to make it clearer and understandable without the need to read previous publications.

      Everybody working with odors knows about the difficulty to precisely control and measure the exact molar concentration of odorants applied. But since the authors showed in previous publications that they take great care to control odor stimuli they should include also in the Material and Methods of this publication more details about concentration of the respective odor stimuli or mixtures employed.

      Did they achieve their aims? Do data support conclusions?

      Yes, the data support their conclusions as clearly shown in their excellent recordings, their excellent combination of physiological and morphological analysis, as well as their thorough statistical analysis.

      Discussion of the likely impact of the work on the field, utility of methods:

      This is an excellent, synergistic collaboration of different international experts in insect olfaction. It is still under-estimated how important the combination of single cell analysis in intracellular recordings with neural network analysis via calcium imaging is. Schemes of frequency encoding versus temporal encoding can only be deciphered with a clever combination of these techniques. This manuscript adds important insights into information processing of olfactory stimuli of antagonistic valence. It starts to become clear that in different sensory systems valence of aversive versus attractive sensory stimuli is processed in parallel pathways. Most likely, antagonistic pathways are connected to different neuronal units in premotor areas of the midbrain, which connect to parallel de- and ascending pathways of central pattern generators in the thorax. In addition, the current work provides relevant new information about processing of pheromone information in the different antennal lobe tracts in another important species. Thus, we may be one step closer to the future manipulation of sexual reproduction of specific insect pests.

      Context for others for interpretations:

      Sympatric heliothine moths use the same sex-pheromone components but at different concentration ratios, allowing for distinction of species that do not inter-mate. Thus, understanding how pheromone components at defined concentrations with opposite valence are processed in the brain to guide aversive or attractive behavioral interactions is relevant not only for determining principles of higher-order olfactory processing, but also to understand evolution of new species.

      We thank the reviewer for the comments and suggestions. To improve the part of the manuscript covering background information, we have included a new figure in the introduction section, Fig. 1, providing an overview of the olfactory pathway in male moths. Here, the schematic drawing (A) contains an overview of the uniglomerular medial-tract PNs confined to the plant-odor and pheromone sub-system, respectively, and their distinct paths from the periphery to higher olfactory centers. In the schematic drawing (B), we provide an overview of the three main ALTs in the moth. A detailed description of the system is included in the relevant figure legend. In addition, we have included a section in the discussion that compares morphological and physiological properties of MGC-PNs confined to each of the three parallel tracts. Finally, a consideration implying the distinct roles of the parallel ALTs is added.

      As suggested by the reviewer, we have added more precise information about the relevant odor stimuli in the revised version of the manuscript. We have clarified all details regarding pheromone concentrations as well as ratios in the Materials and Methods section. In addition, we included relevant background knowledge on species-specific pheromone blends of sympatric moth species.

  5. Apr 2021
    1. Author Response:

      Reviewer #1:

      Guo et al. describe interesting experiments recording from various sites along a cortico-cerebellar loop involved in limb control. Using neuropixels recordings in motor cortex, pontine nuclei, cerebellar cortex and nuclei, the authors amass a large physiological dataset during a cued reach-to-grasp task in mice. In addition to these data, the authors 'ping' the system with optogenetic activation of pontocerebellar neurons, asking how activity introduced at this node of the loop propagates through the cerebellum to cortex and influences reaching. From these experiments they conclude the following: the cerebellum transforms activity originating in the pontine nuclei, this activity is not sufficient to initiate reaches, and supports the long-standing view that the cerebellum 'fine tunes' movement, since reaches are dysmetric in response to pontine stimulation. Overall these data are novel, of high quality, and will be of interest to a variety of neuroscientists. As detailed below however, I think these data could provide much more insight than they currently do. Thus below I provide some suggestions on improving the manuscript.

      1) Since the loop is the focus of this study, it would be nice if the authors better characterized latencies of responsivity to pontine stimulation through the loop, to address how cortically derived information routed to the cerebellum may loop back to influence cortical function. In the data provided, we know that pontine stimulation modulates Purkinje and deep nuclear firing (but latencies of these responses are not transparently provided in the main text, if anywhere), while motor cortical responses peak at 120 ms (after stimulus onset?, unclear), and that this responsivity is preferentially observed in neurons engaged early in the reaching movement. Is the idea, then, that cortical activity early in the reach is further modulated by cerebellar processing to re-influence that same cortical population? Does this interpretation align with the duration of reaches, the duration of early responsive activity during reach, and the latency of responsivity; or is the idea that independent information from other modalities entering the pontine nuclei modulates early cells? Latency to respond at the different nodes might aid in thinking through what these data mean for the function of the loop.

      We thank the reviewer for this important suggestion, and we have now added measurements of the latency from the onset of sinusoidal PN stimulation to neural responses in Purkinje cells, DCN neurons, and motor cortex (Supplemental Fig. 7), and observe a progressive recruitment of laser-evoked spiking along this pathway. There is a tradeoff between temporal resolution (which increases with decreasing bin width) and statistical power (which decreases with decreasing bin width), and we have opted to use 10 ms bins in a sliding window, which provides a reasonable compromise between these criteria. Although we potentially detect fewer tagged neurons at shorter latencies than we would with larger bins, this approach enables us to detect the timing of the earliest responses (defined as the earliest time point at which 5% of the neurons eventually recruited are responsive). Note that the sinusoidal stimulation used in these experiments is not ideal for latency measurements, as it takes 6.25 ms for the laser to reach peak power. We have also added a similar analysis for the response latency of PN neurons to pulse train stimulation of motor cortex (Supplemental Fig. 1). Based on these analyses, our estimate of the delay for signals to propagate across the entire loop is 26 ms: PN to motor cortex (21 ms) + motor cortex to PN (5 ms). Given that the movement duration (lift-to-grab) is approximately 110 ms on average, this would allow ~4 full feedback cycles throughout the reach. Thus, these delays are consistent with the possibility that cortical activity during planning or early in the reach is further modulated by cerebellar processing to influence that same cortical population later in the reach. Regarding the earliest motor cortical responses that we observe in PN-tagged units, it's possible that they may result from ponto-cerebellar input driven by other cortical regions. Alternatively, the responses of motor cortical neurons early in the movement may be driven more directly by other cortical areas or the basal ganglia, but these early-responding neurons may also receive strong ponto-cerebellar input due to plasticity during development or learning.
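
      For illustration only (this is not the analysis code used for the manuscript), the arithmetic behind the loop-delay estimate and the 5% recruitment criterion can be sketched as follows. The function names and the example latency list are hypothetical; only the 21 ms, 5 ms, and 110 ms figures come from the response above.

```python
import math


def earliest_population_latency(first_response_ms, fraction=0.05):
    """Time at which `fraction` of the eventually recruited neurons are responsive.

    `first_response_ms` holds, for each recruited neuron, the time of its first
    significant bin after stimulation onset (hypothetical input format).
    """
    if not first_response_ms:
        raise ValueError("need at least one recruited neuron")
    ordered = sorted(first_response_ms)
    # Number of neurons that must be responsive to reach the criterion.
    k = max(1, math.ceil(fraction * len(ordered)))
    return ordered[k - 1]


def feedback_cycles(movement_ms, loop_delay_ms):
    """How many full traversals of the loop fit into one movement."""
    return movement_ms / loop_delay_ms


# Values quoted in the response: PN -> motor cortex and motor cortex -> PN.
loop_delay_ms = 21 + 5
cycles = feedback_cycles(110, loop_delay_ms)
print(f"loop delay = {loop_delay_ms} ms, cycles per reach = {cycles:.1f}")
```

      With these numbers, the ratio of movement duration to loop delay is slightly above four, which is where the "~4 full feedback cycles" estimate comes from.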

      2) Many of the figures need work to aid interpretation. Axis labels are often missing (eg 2F); color keys are often unlabeled (2F); color gradients often used but significance thresholds are hard to evaluate (using same colors for z scores and control / laser is confusing 6, 8); and within-figure keys would be useful (5D-h). These issues occur throughout the manuscript.

      We have added the axis and color labels in Fig. 2F, and have added additional annotation throughout the main and supplemental figures. For firing rate z-score heatmaps, we have kept the gray color scale for control and laser to facilitate direct comparison between the panels, but have added orange and blue boxes around the heatmaps in Fig. 6, 7, S8, and S9 to emphasize that they reflect different experimental conditions.

      3) Relatedly, but also conceptually, Figure 3B has particular issues, such as identifying where the neuropixel multiunit activity is coming from. I assume that in the gray boxes illustrating the spatio-temporal profile of spiking band activity that the lower part of the box is the ventral direction, upper, dorsal. This is not spelled out. From the two examples it would seem that the spiking band is in different places in the cerebellum, undermining, I think, the objective of the figure. It would be sensible to revisit this entire figure to identify the key takeaways and design figures around those ideas. As it stands, these examples appear anecdotal. Consider moving this to a supplement. Powerband density strength is missing an axis. More importantly, it would be nice to corroborate the interpretation of the MUA with the single unit recordings, since the idea is that many neurons are entraining to the PN activity. Yet, the examples don't seem particularly entrained. Is the activity being picked up on just axonal firing of the PN axons? Fourier analysis of spiking of isolated neurons in cerebellum should be used to corroborate the idea that cerebellar neurons are entraining, rather than the neuropixel picking up entrained PN axons.

      To examine spike entrainment to the 40 Hz PN stimulation for Purkinje cells and DCN neurons, we computed the phase of sinusoidal stimulation coinciding with each individual spike. If a neuron is entrained to the stimulation, the phase distribution for its spikes will differ from the uniform distribution on the circle; this can be assessed for each cell using a Rayleigh test. Furthermore, we can calculate the strength of entrainment and preferred phase by calculating the magnitude and angle of the mean resultant for each cell. If a neuron’s spikes are completely unrelated to the stimulation phase, the mean resultant length will tend to 0 as the number of spikes observed goes to infinity. If, on the other hand, a neuron is completely entrained (with every spike occurring at exactly the same phase), the mean resultant length will be 1. This approach is illustrated schematically in Supplemental Fig. 6A.
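
      A minimal sketch of this entrainment measure, assuming spike times and stimulation onset are given in seconds and that phase is measured from stimulation onset. The function names are hypothetical, and the p-value uses a standard closed-form approximation to the Rayleigh test rather than whatever implementation was used for the manuscript.

```python
import math


def spike_phases(spike_times_s, stim_onset_s, freq_hz=40.0):
    """Phase (radians, in [0, 2*pi)) of the sinusoidal stimulus at each spike."""
    two_pi = 2.0 * math.pi
    return [(two_pi * freq_hz * (t - stim_onset_s)) % two_pi for t in spike_times_s]


def mean_resultant(phases):
    """Return (length, angle) of the mean resultant vector.

    Length is 1 for perfect phase locking and tends to 0 for uniform phases;
    the angle is the preferred phase.
    """
    n = len(phases)
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    return math.hypot(c, s), math.atan2(s, c)


def rayleigh_p(phases):
    """Approximate p-value of the Rayleigh test for non-uniformity on the circle."""
    n = len(phases)
    length, _ = mean_resultant(phases)
    big_r = n * length  # unnormalized resultant length
    return math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - big_r * big_r)) - (1 + 2 * n))


# Two toy neurons: one fires at a fixed phase of every 40 Hz cycle,
# the other spreads its spikes evenly over the cycle.
locked = [k / 40.0 + 0.005 for k in range(100)]
spread = [k / 4000.0 for k in range(100)]
for name, spikes in (("locked", locked), ("spread", spread)):
    ph = spike_phases(spikes, stim_onset_s=0.0)
    length, angle = mean_resultant(ph)
    print(name, round(length, 3), round(rayleigh_p(ph), 3))
```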

      This new analysis revealed two key features of the data we had not previously appreciated. First, it revealed PN-stimulation-induced changes in neural activity that were not apparent from the mean firing rate profiles: most Purkinje cells and DCN neurons were significantly entrained to the 40 Hz stimulation. Second, the entrainment strength was higher in the DCN than Purkinje cells (Supplemental Fig. 6B-D), suggesting the corticonuclear pathway amplifies the rhythmic input. This result is strikingly similar to published observations obtained from slice electrophysiology and anesthetized mice (Person & Raman, 2012), which we now discuss in the text. It is also possible that direct excitation from PN collaterals contributes to the DCN entrainment.

      We agree that the original analysis of multiunit activity is difficult to interpret, for two reasons: (1) the signal likely reflects the combined contribution of multiple cell types, including pontine mossy fiber terminals, and (2) the depth profile will differ for different electrode penetrations, due to the geometry of the cerebellar cortex. Furthermore, this analysis is largely redundant, since we have recorded from individual Purkinje cells and added new analyses demonstrating their entrainment to the 40 Hz stimulation (Supplemental Fig. 6). We have now moved this figure to the supplement and added labels to all axes (Supplemental Fig. 3).

      4) The use of the GLM is puzzling. In addressing the question of how cerebellum and motor cortex interact (from the Abstract, "how and why" do these regions interact) it is unclear why these regions are treated separately. I would have expected some kind of joint GLM where DCN activity is used to predict M1 variance (5 co-recordings are reported but nothing to analyze?); or where DCN + M1 activity is used to decode kinematics to see if it is better than one or the other alone. As it stands, we learn that there is more kinematic information in the motor cortex than in DCN. This is not necessarily surprising given previous literature on cerebellar contributions to reaching movements. In principle the idea that 'PN stimulation might perturb reaching kinematics through descending projections to the spinal cord, or by altering activity in motor cortex' is treated as mutually exclusive outcomes, though it is highly unlikely to be so. Analyzing M1+DCN together could address whether DCN activity adds nothing to decoding kinematics that isn't there in M1 or adds something that M1 does not have access to. The main point here is that the physiological datasets could be better leveraged with these fits to derive insight into the interactions of the loop. R2 should be provided in the GLMs (Fig 8) to assess statistically how well they perform relative to one another, not just correlations between the two.

      We have added two additional analyses to address these questions. First, in addition to motor cortex-based and DCN-based decoders for all sessions (Fig.8 and Supp. Fig.12A-D, G-H; all the R2 values are reported in Supp. Fig. 12C-D, G-H) we now also train a decoder using both motor cortical and DCN multiunit activity in sessions with simultaneous recordings (Supp. Fig.12E-F, I-J). When we train only on control trials, the decoder performs about equally well with or without the DCN multi-units for control trials (Supplemental Fig. 12E), but performs slightly worse on laser trials in comparison to using only cortical data (Supplemental Fig. 12F). When we train on both control and laser trials, adding DCN multi-units slightly degrades decoding performance on both control and laser trials in 3 out of 5 sessions (Supplemental Fig. 12I-J). Based on this comparison, it does not appear that DCN contributes kinematic information that is not already present in cortex. However, there are several cautionary notes to consider in interpreting these results. (1) This dataset consists of only 5 sessions, in all of which the recording yield in DCN was not as high as in cortex, so it is possible that dimensions of activity unique to DCN may not have been sampled enough in these experiments. (2) Our task involves only a single reaching target (in comparison to, e.g., center-out reaching tasks with eight targets which are possible in primates) so we cannot assess whether DCN contains direction-specific kinematic information not present in cortex. Thus, in light of these factors, it is difficult to draw strong conclusions from our experiments about differences in kinematic information between motor cortex and DCN. A more rigorous comparison requires carefully controlled experiments with many reaching targets, as in Fortier, Smith, & Kalaska (1993).

      Second, we have added an additional analysis to determine how predictive cortical activity is of DCN activity at the single-trial level, and vice versa. We considered several possible statistical approaches to this issue. Computing pairwise correlations of neurons in the cortex and DCN would be one possible method, but the outcome of this analysis would be difficult to interpret, as the sign and timing of firing rate peaks will vary across neurons. Another approach would be to regress principal component scores in one region - or their derivatives, as in Sauerbrei et al., 2020 - on the scores in another region. However, because cortex and DCN are bidirectionally connected, the choice of which region’s scores should be considered as the dependent variables is ambiguous, and this approach will merely “align” activity in one region (as a projection onto regression coefficients) with activity in the other. Ideally, we would like to find simultaneous linear transformations of both cortical and DCN activity that would maximally “align” them with one another, and to compute the correlations of the aligned neural trajectories. This is precisely what canonical correlation analysis (CCA) does, and CCA has been used increasingly in recent years to align population activity from different brain regions or samples - e.g., Lara et al., Nat. Comm. (2018), Perich et al., Neuron (2020), and Gallego et al., Nat. Neuro (2020). We took this approach with our simultaneous recordings of multiunit activity in the motor cortex and DCN, and found that:

      (a) In each of the 5 sessions, CCA found two pairs of canonical variates that were strongly correlated (Supplemental Fig. 11A, first two columns; Supplemental Fig. 11B, correlations in the range 0.58-0.88 for the first two canonical variates), and two pairs of canonical variates weakly correlated (Supplemental Fig. 11B, correlations <0.27 for the last two canonical variates)

      (b) The first two canonical variates accounted for half or more of the variance in each region (49%-64% in cortex, 51%-70% in DCN; Supplemental Fig. 11C, left column)

      (c) Between a quarter and a half of the variance in each region was accounted for by canonical variates in the other region (25%-50% of variance in DCN explained by cortex, 26%-47% in cortex explained by DCN; Supplemental Fig. 11C, right column)

      From these results we conclude that, within the constraints of our behavioral task, some but not all of the dominant dimensions of cortical and cerebellar activity are strongly correlated. We also performed additional CCA analyses using only laser trials or only control trials, to assess whether PN perturbation strongly affected the similarity in population activity between the two regions, but found limited differences between the results of the two analyses (Supplemental Fig. 11D).
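
      As a rough illustration of what CCA computes (not the pipeline used for the manuscript), the sketch below obtains canonical correlations between two simulated populations with NumPy, using the standard QR-plus-SVD formulation. The data are synthetic: two latent signals are shared by both "regions" and two are private to each, so two canonical correlations should be high and the rest low. All names and dimensions are assumptions.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between X (samples x p) and Y (samples x q).

    Each matrix is centered, an orthonormal basis of its column space is
    obtained by QR, and the singular values of Qx.T @ Qy are the canonical
    correlations (sorted from largest to smallest).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Synthetic example: 2 shared latent signals + 2 private ones per region.
rng = np.random.default_rng(0)
n_samples = 2000
shared = rng.standard_normal((n_samples, 2))
mix_a = np.eye(4) + 0.5  # invertible mixing of latents into "cortex" channels
mix_b = np.eye(4) - 0.2  # invertible mixing of latents into "DCN" channels
cortex = np.hstack([shared, rng.standard_normal((n_samples, 2))]) @ mix_a
dcn = np.hstack([shared, rng.standard_normal((n_samples, 2))]) @ mix_b

corrs = canonical_correlations(cortex, dcn)
print(np.round(corrs, 2))
```

      In this toy case the first two canonical variates recover the shared signals and the remaining pairs are only weakly correlated, which mirrors the qualitative pattern described in (a) above.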

      Reviewer #2:

      Guo et al examine the cortico-cerebellar loop during skilled forelimb movements in mice. The authors use optogenetic stimulation of the pontine nuclei (PN) and recordings in PN, cerebellar cortex, cerebellar nuclei (DCN), and motor cortex to show that PN output is transformed into a variety of activity patterns at different stages of the cortico-cerebellar loop. Stimulation only slightly alters movement-related activity in these structures and degrades movement accuracy. The authors conclude that the cortico-cerebellar loop fine tunes dexterous movement. The study is technically impressive, employing recordings in 4 brain regions, and recordings during optogenetic manipulations and behavior. The experiments are well done and the analyses are appropriate. The comparison across brain regions is comprehensive. The results that PN perturbation alters skilled movement and the perturbed activity could predict perturbed movement are important. The study adds to a long line of work supporting the view that the cortico-cerebellar pathway is required for fine motor control. I have a few comments on the interpretation and analysis which I believe could be addressed with changes to the text and additional analysis.

      1) The authors conclude that the cortico-cerebellar loop "does not drive movement" but "fine tunes" the movement. While I generally agree with this interpretation, I wonder if the authors could flesh out the concepts of "driving movement execution" vs. "fine-tuning movement" more clearly. Do authors consider them separate processes? How can they be disentangled? I also feel the data on its own has some limitations that should be considered or discussed. First, the data shows that PN stimulation degrades movement accuracy. However, this does not yet reveal the function of the cerebellar loop in fine motor control. Certain places in the text make stronger assertions (for example, "cortico-cerebellar loop fine-tunes movement parameters") that I feel the data does not support. It is not clear from the data how the loop tunes movement parameters. Second, Fig. 5F shows that stimulating PN blocked movement initiation in some sessions (this is also mentioned in the Methods). Could the authors consider the possibility that stimulating PN at a higher intensity might block movement? This is related to the distinction between "driving" vs. "fine-tuning" movement. At the very least, the authors should discuss these limitations and possibilities.

      In our view, the claim that a brain area drives reaching means that it is necessary for generating the large changes in muscle activity that set the limb in motion towards the target. The claim that a brain area fine-tunes reaching means that it is necessary for generating smaller changes in muscle activity that subtly adjust the limb trajectory and enable precise and accurate behavior. Previous work has demonstrated that motor cortex drives reaching: if it is transiently silenced, the initiation of reaching is robustly blocked (see Guo et al. 2015, Sauerbrei et al. 2020, and Galinanes et al. 2018). In the present manuscript, we show that perturbation of the PN has a very different effect: mice are usually able to initiate reaching, but they are less skillful (the success rate drops), slower (movement duration increases), and less precise (endpoint standard deviation increases). Our interpretation of these results is that while the total output of cortex drives movement (likely through corticospinal and cortico-reticulospinal routes), the cortico-cerebellar loop makes more subtle adjustments to the ongoing movement; that is, it fine-tunes. We have updated the text (in particular, the Abstract, Introduction par. 1, and Discussion par. 1-2) to clarify the distinction between driving and fine-tuning.

      We agree that several interpretive statements in the previous version (especially concluding sentences at the end of some Results paragraphs) were not clearly connected with the data, and we have removed or modified these statements. We now lay out our interpretation of the data as evidence for a cortico-cerebellar contribution to fine-tuning, rather than driving, in the first two paragraphs of the Discussion, but emphasize that this is an interpretation, rather than a direct description of the data. We have also changed the title to more directly state our experimental observations.

      We now mention the possibility that stronger stimulation or inactivation of PN neurons might have robustly blocked movement, and also mention several experimental variables which might have contributed to animal-to-animal variability in behavioral effects: “It is possible that the variability of behavioral effects ...” (Discussion).

      2) Related to point 1, in Fig. 5F, for stimulation trials in which mice failed to initiate movement, did mice fail to move altogether, or did they move in an abnormal fashion?

      We have added a new video documenting the behavior of the animal with the largest blocking effect from PN stimulation (supplemental video 2). This animal does not struggle through a partial reach, but fails to initiate movement. Small movements of the arm occurred (this also occurred in control trials), but these were not tightly synchronized with the onset of the laser across trials.

      3) In the abstract, the authors state that PN stimulation is "reduced to transient excitation in motor cortex". Also in the results (page 5) and discussion (page 8), "pontine stimulation only led to increases in cortical firing rates". These statements are based on the comparison between Fig 3D, 3F, and 4B. But I think the current presentation is somewhat misleading. First, Fig 3D, 3F, and 4B use different neuron selections that make direct comparison difficult. Fig 3 shows all neurons from Purkinje cell and DCN recordings. Fig 4B shows only PN-tagged motor cortex neurons. Furthermore, based on the methods description, it appears that PN-tagged neurons were defined using a one-sided sign-rank test. Since the test is one-tailed, does that mean neurons shown in Fig 4B are, by definition, neurons significantly excited by photostimulation? Looking at Fig 4B and 4C closely, there appear to be neurons suppressed by PN stimulation. Could the authors organize the rows in Fig 4 in the same way as Fig 3, where neurons that show suppression are grouped together?

      We now display the PN stimulation-aligned firing rates in the same format for Purkinje cells (Fig. 3B), DCN neurons (Fig. 3D), and motor cortical cells (Fig. 4A, lower), with all neurons in a single panel, sorted by response magnitude, for each area. The dominant response pattern in the cortical population is a transient firing rate increase, and this is more readily apparent with the new panel in Fig. 4A (lower). We also use a two-tailed test (which has slightly less statistical power, but allows us to test for both firing rate increases and decreases) for the identification of PN-tagged cortical neurons, and display neurons with stimulation-locked increases (n = 94) and decreases (n = 13) separately (Fig. 4B). In Fig. 4B-C, we still sort the neurons by their reach-related responses, as this reveals a difference in lift-aligned patterns between tagged and non-tagged neurons, which would be masked if we ordered according to stimulation-aligned responses. In Fig. 4D-E, we pool neurons with PN-stimulation-aligned increases and decreases into a “PN-tagged” group, as the small number of stimulation-aligned decreasing neurons (n = 13) does not allow adequate statistical power for a 3x3 contingency table test or for within-group averaging of lift-aligned firing rates.

      4) Fig 7 shows that PN stimulation has only subtle effects on movement-related activity in motor cortex. However, only a small portion (1/8) of the motor cortex neurons show modulation to PN stimulation. Fig 7 shows all neurons. Would the results look similar for PN-tagged neurons?

      We have added a new analysis to address this question, shown in Supplemental Fig. 10. The laser-minus-control difference in lift-aligned activity is indeed larger for PN-tagged neurons; however, the largest peak in this difference occurs before lift, when the laser has been turned on, but the animal hasn’t started to move (Supplemental Fig. 10C).

      5) Page 3 "Our observation that the activity of some motor cortex-recipient PN neurons is aligned both to the cue and movement suggests that these neurons might integrate signals of multiple modalities." Presumably, motor cortex neurons also have cue and movement-related activity and PN simply inherits this activity from the motor cortex.

      As described in our response to the first reviewer’s seventh comment, we cannot conclude that the cue-related responses in the PN are inherited entirely from motor cortex. Briefly, (1) it has been difficult for us to reliably disassociate cue and movement responses for individual motor cortical cells (for instance, the GLM approach we took with PN neurons resulted in very poor model fits when applied to cortical cells), though our previous work has suggested that at the population level, the dominant signal in motor cortex is aligned to movement onset. To reliably disentangle cue and movement responses in cortex, we would need to train mice to wait for a relatively long and variable delay period before reaching. (2) The PN receive convergent input from many cortical areas, and there is likely a convergence of multiple inputs onto the motor-cortex-tagged PN units (c.f. the convergence of inputs from visual and somatosensory cortex onto individual PN neurons in rats reported in Potter, Ruegg, & Wiesendanger, 1978). Hence it is possible (if not likely) that the multi-modal activity we observe in PN neurons results from the integration of inputs from different cortical areas, rather than being entirely inherited from motor cortex.

      6) Do Purkinje cells follow the 40 Hz PN stimulation as in the multi-unit recordings? The PSTHs in Fig 3 are too smoothed out to see this.

      As described in the response to reviewer 1.3 above, we have added a new analysis to the manuscript to address this question (Supplemental Fig. 6). Most Purkinje cells and DCN neurons are entrained to the 40 Hz stimulation, and the entrainment is much stronger in the DCN, consistent with previous work (Person & Raman, 2012).

      7) For the correlation analysis in Fig 6C top and 7C top, is the correlation computed from z-scored firing rates rather than on raw firing rates? This is not clear from the text. If computed on raw firing rates, one would expect the correlation to be above 0 even before photostimulation, since different neurons exhibit different baseline firing rates that presumably will be the same across control and stim trials.

      The correlations were indeed computed on z-scores, rather than raw firing rates, for this reason. We have clarified this in the Methods section. This analysis was designed to capture correlations in movement-related modulation between control and laser trials, and we z-scored the firing rates to avoid the confound that would have been introduced by baseline differences.
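
      The confound can be illustrated with a small simulation (synthetic data and hypothetical variable names, not the manuscript's analysis). Each simulated neuron has its own baseline rate but no shared movement-related modulation between conditions; the across-neuron correlation of raw rates is nevertheless close to 1, while the correlation of z-scored rates is close to 0. The z-scoring here uses a pre-stimulation baseline window, which is an assumption about the procedure.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
raw_control, raw_laser, z_control, z_laser = [], [], [], []
for _ in range(400):  # simulated neurons
    baseline_hz = rng.uniform(1.0, 50.0)  # neuron-specific baseline rate
    # Baseline window used to estimate each neuron's mean and s.d.
    pre = [baseline_hz + rng.gauss(0.0, 1.0) for _ in range(50)]
    mu, sd = statistics.fmean(pre), statistics.stdev(pre)
    # Rates at one time point; fluctuations are independent across conditions.
    control = baseline_hz + rng.gauss(0.0, 1.0)
    laser = baseline_hz + rng.gauss(0.0, 1.0)
    raw_control.append(control)
    raw_laser.append(laser)
    z_control.append((control - mu) / sd)
    z_laser.append((laser - mu) / sd)

raw_r = pearson(raw_control, raw_laser)  # dominated by shared baselines
z_r = pearson(z_control, z_laser)        # baseline contribution removed
print(f"raw r = {raw_r:.2f}, z-scored r = {z_r:.2f}")
```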

      Reviewer #3:

      It is generally thought that the cerebellum is primarily involved in the short-timescale control of movements, while motor cortex is involved in motor planning. The present paper follows classic studies in primates and a recent study in mouse that investigated the role of cortico-cerebellar loops in motor control. To date, studies in both species applied perturbations to the cerebellum to then study changes in cortical activity. For example, it has been long known that cooling deep cerebellar nucleus produces changes in the responses of motor cortex neurons in primate (e.g., Meyer-Lohmann et al., 1975). Further, Gao and colleagues' recent paper (Nature 2018) used optogenetics to perturb responses in the deep cerebellar nucleus before licking movements. The authors of this 2018 Nature paper conclude that persistent neural dynamics are maintained during voluntary movements by connectivity within this cortico-cerebellar loop.

      The experiments are well performed, and the results are logically organized and presented. However, a main concern is that the authors have not well justified that these experiments prove a conceptual advance. The conclusions appear to be largely consistent with those of prior work, both regarding changes in the responses of motor cortex neurons, and resultant (subtle) changes in behavior (i.e., altered arm kinematics). The impact of the paper would be improved if the authors adopted a more precise style of reporting the novelty of their results throughout.

      Major concerns:

      1) The experiments are well performed, and the results are logically organized and presented. However, a main concern is that the authors have not well justified that these experiments prove a conceptual advance. As noted above, prior studies have probed the role of cortico-cerebellar loops by applying perturbations to cerebellar activity (cerebellar cortex and/or deep cerebellar nuclei) and quantifying changes in cortical activity prior to and during movement. The main novelty of the present study is that the authors perturbed the loop at a different locus, namely in the pontine nuclei (PN) which send projections to the cerebellum rather than directly to the cerebellum. The rationale for why this specific perturbation provides a conceptual advance to the field was not adequately motivated.

      The authors do clearly review prior literature showing that perturbation of cortico-cerebellar projections impacts the rest of the loop and behavior, and they also explain well the application of their exciting new tool to specifically target PN neurons with their optogenetic stimulation. Yet, the authors do not motivate why it is important to specifically perturb the pontine nuclei (PN) to gain new insights into the role of "cortico-cerebellar loops" nor do they provide any reason to expect a difference in changes in loop dynamics for perturbations applied to the PN versus to the DCN. Indeed, the conclusions appear to be largely consistent with those of prior work, both regarding changes in the responses of motor cortex neurons, and resultant (subtle) changes in behavior (i.e., altered arm kinematics). Generally, these results are similar to those previously reported in primate DCN cooling experiments characterizing changes in hand movement in a voluntary tracking task (e.g., Brooks et al., 1973; Conrad and Brooks 1974).

      We agree that the rationale and conceptual advance require clarification. Previous work has established that silencing motor cortex blocks reaching (Guo et al. 2015, Sauerbrei et al. 2020, Galinanes et al. 2018), but the perturbations used in these studies were not selective to specific output channels (e.g., corticospinal, corticoreticulospinal, or corticocerebellar), and simultaneously influenced many projection targets of motor cortex. Other work from the Brooks, Prut, Person, and Svoboda groups has shown that altering cerebellar output impairs movement planning or execution, but their methodology did not test the effects of disrupting specific cerebellar inputs (e.g., from cortex). Thus, we would argue that previous studies have not provided direct evidence of the behavioral and neural effects of disrupting cortico-cerebellar signals. The central goal of the present manuscript is to test how selective impairment of cortico-cerebellar communication - not the simultaneous impairment of corticospinal, corticoreticulospinal, and cortico-cerebellar communication, and not a nonselective disruption of cerebellar output - disrupts behavior and neural dynamics across the cortico-cerebellar loop. Our conceptual advance, then, is to show that impairment of cortico-cerebellar communication does not typically block movement execution (as simultaneous perturbation of all motor cortical outputs does), but disrupts the fine kinematic details, similar to a direct manipulation downstream in the cerebellum. We have updated the text, particularly the Abstract, Introduction par. 1, and Discussion par. 1-2, to clarify this rationale and conclusion.

      2) The description of the connectivity of the loop illustrated in Figure 1 is straightforward. Motor cortex recipient PN neurons project to PN neurons, which then project directly to the cerebellar cortex and deep cerebellar nuclei, etc. Thus, the effect of any perturbation to PN neurons should be realized rapidly within neurons in the cerebellar cortex and deep cerebellar nuclei if they are part of this direct loop. However, onset latencies for the effect of the perturbations are not documented for these experiments (Figs 3&6 in the test/reaching conditions, and associated text). Similarly, latencies are not reported for the onset of changes in motor cortex neuron responses to PN perturbations in either condition (Figs 4&7 in the test/reaching conditions, and associated text). The only reference I could find to latencies specified the time required to reach the peak firing rate, not the latency of the change. Specifically: "these were stereotypical, mostly consisting of transient excitation (Fig. 4B, left; median time of firing rate peak 120 ms)" - 120 ms seems very long for the loop in Fig 1. It would be useful to know the latency between optogenetic stimulation in PN and changes in PN firing rate. And then the question is at what latency are the neurons in subsequent nodes altered? Quantification of latencies of the effects that are observed in the different nodes of the cortico-cerebellar loops would strengthen the authors' conclusion that they are actually studying the direct loop in Figure 1, which would then make the study's conclusions more compelling.

      We agree that it is important to characterize the latencies of neural responses to PN stimulation, and now provide these numbers for Purkinje cells, DCN neurons, and motor cortical neurons in the text and Supplemental Fig. 7. On stimulation of the PN, activity propagates first to Purkinje cells, then the DCN, and finally to motor cortex. We also quantify the latency of PN responses to motor cortical stimulation in Supplemental Fig. 1. (For a discussion of the rationale and limitations of our method, see also our response above to reviewer 1’s first comment.) Unfortunately, we have not been able to measure the delay from stimulation onset to the earliest spikes induced by ChR2 currents in PN neurons, as this would require simultaneous insertion of a stimulation fiber and recording probe to a deep target in the PN. Furthermore, we note that the earliest measurable response in Purkinje cells occurs 10 ms after stimulation onset, and this is likely an overestimate of the minimum latency, as it takes 6.25 ms for the laser to reach peak power under sinusoidal stimulation.
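
      As an illustration of how an onset latency (as opposed to a time-to-peak) could be read off a peri-stimulus time histogram, here is a minimal sketch. It assumes a simple threshold-crossing rule (baseline mean plus a multiple of the baseline SD, sustained over consecutive bins); the function name, parameters, and default values are hypothetical and are not taken from the manuscript, whose actual method may differ.

```python
import numpy as np

def onset_latency_ms(psth, bin_ms, n_baseline_bins, k_sd=3.0, n_consecutive=2):
    """Estimate response onset from a peri-stimulus time histogram.

    psth            : firing rate per bin; the first n_baseline_bins bins
                      precede stimulation onset (assumed layout).
    bin_ms          : bin width in milliseconds.
    k_sd            : threshold = baseline mean + k_sd * baseline SD (assumed rule).
    n_consecutive   : number of consecutive supra-threshold bins required.

    Returns the left edge (ms after stimulation onset) of the first
    qualifying bin, or None if the threshold is never crossed.
    """
    psth = np.asarray(psth, dtype=float)
    baseline = psth[:n_baseline_bins]
    threshold = baseline.mean() + k_sd * baseline.std()

    post = psth[n_baseline_bins:]
    above = post > threshold
    for i in range(len(above) - n_consecutive + 1):
        if above[i:i + n_consecutive].all():
            return i * bin_ms
    return None
```

      Because the returned value is the left edge of a bin, the estimate is only as fine as the bin width, and a slowly ramping stimulus (such as a sinusoid) will tend to inflate it, in line with the caveat above.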

      3) Overall, there was often a sharp incongruity between the complexity of many of the findings described in results and accompanying figures and the short summary conclusion provided for the Results. Here is one of many examples (bottom of page 5), where the authors conclude "These results demonstrate that the cortico-cerebellar loop does not drive reaching, but fine-tunes the behavior to enable precise and accurate movement." Yet, what the results above describe is considerable heterogeneity and variability across animals and cases. These conclusions should be more aligned with/justified by the authors' description of their actual results.

      Throughout the Results section, we have now tied the interpretations more closely to the data. For example, in the instance the reviewer mentions, we now state: “These results demonstrate that PN stimulation impairs reaching performance, typically by disrupting precision, accuracy, duration or success rate of the movement.” In the first two paragraphs of the Discussion, we lay out our interpretation of the data as evidence that the cortico-cerebellar loop contributes to fine-tuning the movement, rather than driving it, but emphasize that this is an interpretation rather than a description of experimental results. Furthermore, we now address possible factors that could underlie the diversity of behavioral effects in the fourth paragraph of the Discussion (“It is possible that the variability of behavioral effects ...”).

      4) A related issue is the disconnection between description and summary in the description of Figures 6-8. The emphasis is on correlation, yet the authors' main point here seems to be that changes in the activity in cortex and DCN induced by the PN stimulation during movement explain the changes in hand trajectory. For example, Figure 6D and its implications are not effectively described in the text.

      The main conclusion of figures 6 and 7 is that PN stimulation during movement alters movement-aligned cortical and DCN activity, but this modulation is typically subtle; that is, activity on control and laser trials is highly correlated for most neurons and time points. This is in contrast with more dramatic effects observed for perturbations delivered to other nodes in the loop; for instance, thalamic perturbations can robustly prevent the generation of the cortical pattern that drives movement (Sauerbrei et al. 2020). Supplemental Fig. 8D-E and Supplemental Fig. 9D-E suggest that these subtle stimulation-induced changes during movement are largely consistent with the changes that would be expected based on neural responses to laser alone, outside engagement with the task. Finally, the decoding analysis in Fig. 8 allows us to interpret these subtle neural changes: they do not appear to be random, but are consistent with the effects of stimulation on the hand. That is, the difference in hand velocity between laser and control trials decoded from neural activity is correlated with the observed hand velocity difference. We have added a video (supplemental video 3) to better visualize this result in all three spatial dimensions simultaneously, and have edited the text in the Results section to clarify these findings.
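
      The logic of the decoding comparison described above can be sketched as follows. This is an illustrative reconstruction only: it assumes a least-squares linear decoder fit on control data and applied to both conditions, and all function names are hypothetical rather than taken from the manuscript.

```python
import numpy as np

def fit_linear_decoder(rates, velocity):
    """Least-squares map from firing rates (T x N) to hand velocity (T x 3)."""
    X = np.column_stack([rates, np.ones(len(rates))])  # append intercept column
    W, *_ = np.linalg.lstsq(X, velocity, rcond=None)
    return W

def decode(rates, W):
    """Apply a fitted decoder to firing rates (T x N)."""
    X = np.column_stack([rates, np.ones(len(rates))])
    return X @ W

def decoded_vs_observed_difference(r_ctrl, v_ctrl, r_laser, v_laser):
    """Correlate the decoded laser-minus-control velocity difference
    with the observed laser-minus-control velocity difference."""
    W = fit_linear_decoder(r_ctrl, v_ctrl)  # assumption: fit on control data only
    decoded_diff = decode(r_laser, W) - decode(r_ctrl, W)
    observed_diff = v_laser - v_ctrl
    return np.corrcoef(decoded_diff.ravel(), observed_diff.ravel())[0, 1]
```

      A correlation near zero would indicate that the stimulation-induced neural changes are unrelated to the kinematic changes under this decoder, whereas a clearly positive value is what "consistent with the effects of stimulation on the hand" would look like in this simplified setting.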

      5) Finally, the authors conclude that changes in the activity in cortex and DCN induced by the PN stimulation during movement explain the subtle deviations in hand trajectory and conclude that the cortico-cerebellar loop is responsible for fine-tuning movement parameters (bottom of page 5 and top of page 8). However, i) the statement that this pathway fine-tunes motion is not justified by the analysis, and ii) the novelty is not made clear relative to prior work that has investigated the cortico-cerebellar loop (beyond the experimental difference in perturbation site).

      Regarding (i), we agree that the fine-tuning is an interpretation rather than a direct reflection of the data presented in the paragraph, and have altered the statement accordingly: “Overall, these results show that the subtle changes in the activity in cortex and DCN induced by the PN stimulation during movement are consistent with the changes in hand trajectory for individual mice.” We now explain our interpretation of the data as supporting a fine-tuning role in the Discussion, rather than the Results. Regarding (ii), we have now clarified in the Abstract, Introduction, and Discussion that perturbation of the PN enables us to test the effects of a selective disruption of cortico-cerebellar communication, in contrast with direct manipulations of motor cortex or cerebellum (see also our response to comment 3.1 above).

      Overall, the text that follows in the discussion presented the findings in a far clearer and more compelling way than much of the text in the Abstract, Introduction and Results: "perturbing cortico-cerebellar communication did not block movement execution: animals were typically able to generate the basic motor pattern during optogenetic stimulation of the PN, and neural activity in cortex and cerebellum largely recapitulated the firing patterns observed during normal movement. Instead, PN perturbation altered arm kinematics, decreasing the precision and accuracy of the reach, and perturbation-induced shifts in neural activity explained these behavioral effects." The paper would be improved if the authors adopted this more precise style of reporting throughout.

      We have edited the main text throughout to improve clarity and precision.

    1. Author Response:

      Reviewer #2:

      Non-canonical pathways for regulating protein synthesis serve important roles for controlling gene expression in critical developmental pathways. Homeobox (Hox) genes encode many mRNAs regulated at the level of translation. A general feature for many of these mRNAs has been the proposal that they are regulated by Internal Ribosome Entry Sites (IRESs) and possess sequences in the 5'-untranslated regions (5'-UTR) of the mRNA that prevent canonical cap-dependent translation, termed "translation inhibitory elements" or TIEs. However, the mechanisms by which these Hox mRNAs are regulated remain unclear. Here, the authors focus on two Hox mRNAs, Hox a3 and Hox a11, and find they use entirely different means to achieve the same end of repressing cap-dependent translation. Hox a3 uses the non-canonical translation initiation factor eIF2D and an upstream open reading frame (uORF), whereas a11 uses a "start-stop" uORF followed by a thermodynamically stable stem-loop to inhibit translation. Overall, the experiments support the major conclusions drawn by the authors, and nail down mechanisms that have been left unresolved since the Hox mRNAs were first discovered to be regulated at the level of translation. These results will be of wide interest to the translation and developmental biology fields.

      Some issues the authors should consider:

      1) The mapping of the TIE boundaries is in general well-supported by the luciferase reporter experiments. However, there seems to be a disconnect in the luciferase values in Fig. 1B compared to the western blots in Supplementary Fig. 1D. For example, in the a3 case the 106 and 113 bands don't seem to correspond to levels consistent with the luciferase activity. For a11, the 153 band is not consistent with the luciferase activity. Also, the gels at the bottom are confusing. Should 74 in the left gel be 77? It would help to have a clearer explanation in the figure legend.

      The reviewer is right: supplementary figure 1D is misleading. We have clarified the data with a new supplementary figure 1D. The gels presented in this figure are not western blots; they are SDS-PAGE analyses of the translated product (i.e. Renilla luciferase protein) in the presence of 35S-methionine. Since the function of TIE elements was measured in comparison with reporters that do not contain any TIE element, we loaded on each gel a reference (lanes w/o TIE) for quantification purposes. Since the exposure time of distinct gels was variable, one should not compare the intensities between gels. We added the quantification of the gel intensity relative to the reference construct (w/o TIE). We agree with the reviewer that the two gels at the bottom are not informative, and we removed them from the new supplemental figure 1D.

      2) The results in the various sucrose gradients are not entirely convincing as presented. In all these cases, the experiment would benefit from the use of high-salt conditions (See Lodish and Rose, 1977, JBC 252, 1181-ff) in the gradient to remove background 80S not engaged with mRNAs. For the +cycloheximide sample in Fig. 8, this looks more like a "half-mer" between a monosome and disome, rather than a standard polysome.

      We do not agree with the point raised by the reviewer on sucrose gradients. Obviously this is due to a misunderstanding of the conducted experiments. We would like to point out that the plots shown in the manuscript represent the percentage of mRNA transcripts labelled with a radioactive cap that were introduced in cell-free translation extracts. Therefore, since we monitor only radioactivity, only the radioactive mRNA transcripts tested in these experiments are observed; consequently, there is no background from 80S ribosomes that are not engaged with mRNAs. Such background 80S ribosomes are visible on the OD profile now shown in a new supplementary figure S6. However, non-engaged 80S ribosomes are not radioactive, and mRNAs that are not engaged in the 80S are found in the RNP fraction. The absence of radioactive background 80S is further corroborated by the use of edeine, which prevents the codon-anticodon interaction (see data below).

      When we set up our experimental strategy, we first used edeine to validate our protocol. In this case no radioactive 80S is observed, confirming that no background 80S is present in our assays. In conclusion, peaks at the level of 80S can only be radioactive mRNA engaged in an 80S. We have extended the figure legend to clarify the conducted experiments.

      Concerning Fig 8, we agree that this experiment is not conclusive and propose to remove it as mentioned in response to a comment from reviewer #1.

      3) In Fig. 7, it would be helpful to see the absolute level of translation from the reporters, as it is not clear what the baseline level of translation is in the knockdown cell lines. It's hard to judge the eIF4E knockdown case in particular without this information. Also in panel B, the GGCCC147 cell line is missing.

      As previously mentioned, we agree that Fig 7 is misleading and we have completely remodelled the figure in the revised manuscript. See also point 5 from reviewer #1. Because the GGCCC147 mutation had no effect in RRL, we decided not to test it in HEK cells and focused on the GGCC107 that has a significant effect both in RRL and in HEK cells.

      4) From the MS experiments in Fig. 6 and Supplementary Fig. 6, the authors focus on eIF2D, which makes sense. But they don't comment on two other highly suggestive hits in the a3 vs. beta-globin and a3 vs. a11 comparisons. These are eIF5B and HBS1L. Both are highly suggestive of what might be going on with the eIF2D-dependent translation mechanism. They don't show up in the GMP-PNP samples in Supplementary Fig. 6, which is interesting and would deserve a comment.

      We are grateful for this very interesting comment. As suggested, we have inserted a comment related to HBS1L and eIF5B in the discussion of the manuscript.

    1. Author Response:

      Reviewer #1:

      Samineni et al. seek to identify and characterize the brain mechanisms responsible for itch-related behaviors. Previous work by this group and others showed that mouse CeA contains itch-responsive neurons. Here the authors set out to determine the molecular and circuit identity of these neurons, their necessity and sufficiency in controlling scratching behavior and itch-related affective components. Using photometry in Vgat-IRES-Cre animals, they show that GABAergic neurons in CeA are active during scratching behavior. In subsequent experiments, scratch-responsive neurons are TRAPed (with scratching behavior elicited by pruritogenic chloroquine injections) and later manipulated using optogenetics and DREADD to test their necessity and sufficiency in scratching behavior and other known CeA-dependent behaviors. Scratching bouts are optogenetically driven with or without chloroquine, suggesting that the neurons are sufficient to elicit this behavior. Optogenetic stimulation is also used in a closed-loop real time assay and zero plus maze to show that chloroquine-TRAPed CeA neurons encode aversive affect and anxiety-related behaviors. Inhibitory DREADD is used to show that TRAPed neurons are required for chloroquine-mediated itch behaviors and aversive affect elicited by chloroquine. Appetitive studies show that manipulation of chloroquine-TRAPed neurons does not affect free feeding or food seeking. Viral tracing studies show a connection between the CeA and vPAG, and optogenetic manipulations of axon terminals in this circuit reproduce findings with TRAPed CeA neuronal manipulations. Finally, TRAPed neurons are isolated and sequenced in an effort to identify their unique molecular profiles. These results strongly suggest that a subtype(s) of CeA neurons are activated by chloroquine and are important for both scratching behavior and affective aspects of the behavior, while not being involved in appetitive behaviors. However, the use of terms like 'active avoidance' is misleading based on the assays used, and interpretation of some of the findings is muddied somewhat by missing or inadequately described control data.

      We thank the reviewer for the thoughtful comments on our work. We do understand how our use of “active avoidance” can lead to confusion. Itch is an aversive sensory experience. In mice, pruritic stimuli (chloroquine and histamine) can produce robust place aversion (Mu and Sun, 2017 and Samineni et al, 2019). We interpreted this learned avoidance to pruritic stimuli as 'active avoidance'. As you pointed out, this can lead to confusion in interpreting our results. To mitigate any confusion, we have now removed any reference to active avoidance in the manuscript. We have also addressed other minor issues raised as requested.

      Reviewer #2:

      The neurological pathways that give rise to the distinct response to irritation of the skin are largely unknown. This study investigates the neurons in a region of the brain well known to be, in part, responsible for assignment of positive and negative valence to sensory information, the amygdala. The data in this study clearly establish an important role of the central area of the amygdala in initiating itch. It provides several lines of evidence for this conclusion using different molecular genetic approaches. The weaknesses of the study are minor.

      We have addressed the minor issues identified by the reviewer, including clarifying why we chose Vgat neurons for our fiber photometry experiments and the nature of additional projection fields from the itch-activated neurons.

      Reviewer #3:

      Samineni et al. provide a beautiful insight into the mouse circuitries of itching in the Central Amygdala, a region of the brain that has, apart from its role in pain, received ample attention for its role in feeding and freezing/escape to threat behavior. The manuscript provides an impressive amount of experimental evidence, combining activity dependent gene expression with expression of genetically encoded calcium indicators, fluorescent proteins, optogenetic and chemogenetic tools, fiber photometry and behavioral readouts. With these they identify a subpopulation of GABAergic neurons in the central amygdala that are activated by neck-applied chloroquine-induced itch (as witnessed by the presence of specific scratching in the neck). They show how their specific optogenetic reactivation (in the absence of chloroquine) induces 1) scratching (non-directed, all over the body) and 2) real-time place aversion and reduced time spent in the open arm of the elevated zero maze. And they show how specific chemogenetic inhibition in the presence of chloroquine reduces scratching and real-time place aversion. They then go further to show by fluorescence axonal projections of these neurons in the vPAG and how optogenetic activation of these projections in the vPAG also induces (non-directed) scratching behavior. Finally they identify the genetic blueprint of these neurons with FACS. The experiments are all well performed and provide convincing evidence for the implication of neurons in the CeA in sensitivity to itch and activity of scratching. It stands out for a rich combination of diverse state of the art technical approaches that are appropriately applied to answer the questions at hand.

      In its completeness, the manuscript raises an important number of open questions in the field, and I would like to encourage the authors to identify these more clearly in their discussion, as they could set out a pathway along which this field may develop further.

      We appreciate the overall positive assessment, and the suggestion to expand the discussion of a number of issues. We have made significant changes to the discussion to address the findings regarding spontaneous, non-directed scratching induced by reactivation of the chloroquine-trapped neurons, the relationship of genes found to be enriched in itch-activated CeA neurons and those identified as important markers of cells involved in other behaviors (such as pain, fear), and the various projection fields observed in itch-activated neurons with a discussion of their relationship to various aspects of the itch/scratch cycle.

    1. Author Response:

      Reviewer #1:

      This manuscript presents a generalizable tool for the comparison of single-cell atlases across species. The work addresses an important problem given the proliferation of such cataloguing efforts across a rapidly increasing diversity of organisms, and the opportunities this presents for comparative and evolutionary biology. The algorithms developed extend the use of self-assembling manifolds to this critical problem by addressing key challenges in the assignment of homologous genes and cell types. The method will be extremely useful for comparative studies to understand the evolutionary relationship of different cell types, and to quickly assign the cell type identity to new single-cell atlases by taking advantage of existing datasets. The authors demonstrate the robustness of the method by comparing cell atlases from diverse metazoans. In the process, the authors arrive at three provocative evolutionary conclusions that will require further investigation to fully support: widespread paralog substitutions, the multifunctionality of ancestral contractile cells, and the existence of a deeply conserved gene module associated with multipotency.

      Strengths:

      A key advantage of the approach presented is the relaxation of one-to-one mapping of orthologous genes, instead considering all possible homologous sequences in the alignment of the transcriptomes. Similarly, alignment of cell types is achieved by taking into account the general neighborhood of cell types and not just the closest match. The authors show that the algorithm outperforms existing methods, which were not really developed for the alignment of distantly related cell types. I expect this method will therefore be of general interest to anyone working with diverse organisms.

      Cell types inferred from the use of the algorithm could be validated in the poorly studied parasite Schistosoma mansoni. These experiments provide a glimpse into the broad utility of the analysis presented, which can be used as a resource in itself.

      We thank the reviewer for these positive comments.

      Weaknesses:

      The observation of widespread paralog substitution may be complicated by the use of relaxed gene orthology assignments in the initial alignment of cell types. It will be important to see whether similar levels of paralog substitution are observed when the paralogs in question are excluded during manifold assembly. This would ensure that the apparent paralog substitution is not a consequence of the necessary relaxation of ortholog assignments.

      We have performed the suggested analysis, with results summarized in the reply to the editor’s comments 2.1. and copied below.

      SAMap yields a similar combined manifold when using only one-to-one orthologs (Figure 2E), suggesting that at least for the zebrafish-frog comparison the paralogs are not driving the manifold mapping. To rule out the possibility that these paralogs were linked spuriously during the homology refinement steps of SAMap, we repeated the paralog substitution analysis on the combined manifold constructed using only one-to-one orthologs. This identified a largely similar set of paralog substitution events, although weaker manifold alignment when restricting the mapping to one-to-one orthologs led to the loss of some substitution paralogs that showed lower correlations. These new results are now reported in Figure 3 – figure supplement 1 and discussed in the text (lines 242-251).

      Further study of this phenomenon could reveal whether paralogs are more likely to be substituted in cases where they arose more recently, and whether the substitutions are stable within clades-perhaps elucidating different paths of specialization following the ancestral gene duplication event.

      To determine whether paralog substitution depends on how recently they arose, we used the orthology groups provided by eggNOG to infer when paralogs duplicated during evolution. We found that more recent paralogs substitute at higher rates than more ancestral paralogs, which is in line with the expectation that less diverged genes are likely more capable of functionally substituting each other (Figure 3C). We also used the paralog substitution score to quantify the rate of paralog substitution in each cell type and observed that substituting paralogs are expressed in a wide variety of cell types, with some (e.g., dorsal organizer) exhibiting higher rates than others (Figure 3B), indicating uneven diversification rates of paralogs across cell types. Unfortunately, assessing the stability of paralog substitutions within a clade requires more cell atlases than are available at the moment. This analysis needs to densely sample species within clades and at key branching points along the tree of life. We now discuss these new results and possible future directions in the text (lines 229-231, lines 237-242, and lines 448-455).
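
      To make the idea of a paralog substitution call concrete, the following is a minimal sketch, not SAMap's actual scoring. It assumes that a substitution is flagged when a paralog's expression across mapped cell types correlates more strongly with the query gene than the annotated ortholog's does, by some margin; the function name, the margin value, and the input layout are all hypothetical.

```python
import numpy as np

def paralog_substitution_candidates(expr_a, ortholog_b, paralogs_b, margin=0.3):
    """Flag paralogs in species B that track a species-A gene better than its ortholog.

    expr_a      : expression of the gene in species A across mapped cell types.
    ortholog_b  : expression of the annotated ortholog in species B,
                  over the same ordered cell types.
    paralogs_b  : dict mapping paralog name -> expression vector in species B.
    margin      : required excess correlation over the ortholog (assumed value).

    Returns (name, correlation difference) pairs, largest difference first.
    """
    r_orth = np.corrcoef(expr_a, ortholog_b)[0, 1]
    hits = []
    for name, expr in paralogs_b.items():
        r_par = np.corrcoef(expr_a, expr)[0, 1]
        if r_par - r_orth > margin:
            hits.append((name, r_par - r_orth))
    return sorted(hits, key=lambda t: -t[1])
```

      Binning such calls by the inferred age of the duplication would then give the kind of comparison between recent and ancestral paralogs described above.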

      The claim that ancestral contractile cells were multifunctional demands closer exploration of the gene module common to this cell type across species. Cellular contractility is a complex process in any cell and the distribution of the gene module across categories of signaling, actin regulation, and cell adhesion does not in itself imply multifunctionality.

      This comment has been addressed in the reply to editor’s comments 2.3., which is copied below.

      We apologize for this confusing statement. We have modified the text (lines 356-359) to clarify that ancestral contractile cells may already possess the broad assemblage of gene modules associated with different functional aspects of modern muscle cell types, including the adhesion complex that connects cells, actomyosin networks that drive contractility, and signaling pathways that stimulate contraction.

      The authors also point to a second enriched module within multipotent cells (stem cells) which could be investigated further. Cursory analysis suggests that the gene signature might simply be the consequence of actively dividing cells lacking specialized cell identity markers, as opposed to a more fundamental program of multipotency.

      Thanks for noting this potential point of confusion. We now provide three lines of evidence to show that stem cells are mapped through similarities beyond common features of dividing cells. First, though we did observe conservation of genes involved in cell cycle and DNA replication, they are not the most enriched gene categories (Figure 6C). Second, we have now performed new analysis to compare multipotent stem cells (MSCs), lineage-restricted stem cells, and differentiated cells for all four invertebrates analyzed in this study. We found that the conserved genes in MSCs consistently have lower expression in lineage-restricted stem cells, which also divide actively. This suggests that the gene expression program associated with MSCs is not shared by all dividing cells. Finally, this new analysis also identified several transcriptional regulators enriched in MSCs compared with other stem cells (Figure 6D). These genes include members of transcription factor families that are known to be essential in mammalian pluripotency (e.g., sox and klf) and chromatin modifiers that are not directly associated with the cell cycle but have reported functions in stem cell maintenance (e.g., kat7 and sub1). These new results are now discussed in lines 380-399.

      Reviewer #2:

      The authors sought to build upon their previously developed methods (self-assembling manifolds) to utilize these data representations to compare single cell atlases between organisms and compare cell types.

      Major strengths of the paper include:

      1) Benchmarking against state of the art integration methods

      2) Clever framework to relax the constraints on sequence orthology

      3) Many comparisons across diverse organisms

      The authors achieve their proposed aims and these tools may provide useful insight for the field going forward; however, it would be useful for the authors to highlight any potential limitations to the approach, places where comparisons did not work out well, etc.

      We thank the reviewer for this great suggestion. As detailed in the reply to editor’s comments 1.1-1.2, we have now performed new analysis and discussed potential limitations. These include the scalability to large datasets, the applicability to datasets collected across different pipelines, and the robustness to overfitting.

      Reviewer #3:

      The manuscript by Tarashansky et al., builds on this group's recently developed self-assembling manifold algorithm to develop methods for aligning cells of the same type across distantly related species using single cell gene expression data. The new method, SAMap, considers homologous genes in a novel way that takes into account paralog substitutions through gene expression correlations and the method further considers cell neighborhood relationships within and between species. Together, and through iterative analysis, these innovations maximally utilize the single cell data compared with only considering 1:1 orthologous genes and direct transcriptional correlations of cell types. Importantly (based on assumptions about cell type evolution), this method can identify homologous cell types based on shared neighbors, even if gene expression has diverged. The authors first apply SAMap to identify homologous cell types between developing zebrafish and xenopus at the whole organism level. SAMap captures nearly all homologous cell types, even with 1:1 orthologs using the mutual nearest neighbors approach whereas other top-in-field methods do poorly at this large evolutionary distance. SAMap also identifies 565 examples of candidate paralog substitution based on closer expression correlation of paralogs than orthologs. The authors further extend these comparisons to flatworms and trematodes, and then to further include sponge, Hydra, and mouse. One fascinating result is that Spongilla choanocytes and apopylar cells show homology to the neuronal family, supporting recent predictions.

      Overall, I find this approach extremely powerful and likely to be widely used in the study of cell type evolution and separately in the study of gene neofunctionalization. The validation among known homologs in distant vertebrates and benchmarking is convincing. My only major comment is that the authors could try a "leave one cluster out" analysis in the zebrafish xenopus comparison to ensure that the method does not overfit when a homologous cell type is absent.

      Thanks for this great suggestion. We have performed the analysis and the results are summarized in the reply to editor’s comments 1.2. and copied below.

      To evaluate if SAMap overfits in cases where some cell types are missing, we performed dropout experiments in which we systematically removed each cell type that has an annotated homolog in the comparison of zebrafish and frog atlases. Cell types whose homologous partners were removed weakly mapped to closely related cell types, and most of these links were already present in the original analysis (Supplementary File 3). For example, optic cells from both species are also connected to eye primordium, frog skeletal muscles to zebrafish presomitic mesoderm, and frog hindbrain to zebrafish forebrain/midbrain. While we observed several mappings that were not present in the original analysis, their alignment scores were all barely above the detection threshold of SAMap. Moreover, most of these edges were mapped between cell types with similar developmental origins, with the only exception being the zebrafish neural crest mapped to the frog otic placode in the absence of frog neural crest cells. Examining the genes that support this mapping revealed that both cell types express sox9 and sox10, two TFs previously implicated to form a conserved gene regulatory circuit common to otic/neural crest cells (Betancur et al., 2011). These results are now discussed in the text (lines 194-210).
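
      The dropout procedure described above can be sketched in a few lines. This is an illustrative outline only: `map_fn` stands in for a full cross-species mapping run and is a hypothetical callable (it is not a SAMap API), and the threshold is whatever detection cutoff the mapping method uses.

```python
def leave_one_out_new_edges(types_b, map_fn, threshold):
    """Drop each species-B cell type in turn and report newly appearing mappings.

    types_b   : list of cell type names in species B.
    map_fn    : hypothetical callable; takes the set of retained species-B
                cell types and returns {(type_a, type_b): alignment_score}.
    threshold : minimum alignment score for a mapping to count.

    Returns {dropped_type: {edge: score}} containing only edges that pass the
    threshold after the dropout but were absent from the full analysis.
    """
    baseline = {e for e, s in map_fn(set(types_b)).items() if s >= threshold}
    new_edges = {}
    for dropped in types_b:
        kept = set(types_b) - {dropped}
        edges = {e: s for e, s in map_fn(kept).items() if s >= threshold}
        new_edges[dropped] = {e: s for e, s in edges.items() if e not in baseline}
    return new_edges
```

      Under this framing, overfitting would show up as many new, high-scoring edges after a dropout, whereas the pattern reported above (few new edges, all close to the threshold) argues against it.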

      Minor comments:

      I am confused about how the homologous zebrafish and xenopus secretory cells with different developmental origins fit into the evolutionary cell type model. Could the foxa1 grhl cells that differ in their germ layer cells represent homology via horizontal transmission of a shared secretory gene network and convergent function rather than identity by descent and hierarchical diversification of a shared developmental gene regulatory network?

      We thank the reviewer for raising this important point. We now provide a deeper discussion about key transcription factors that are conserved between the secretory cell types (lines 166-175), as well as additional discussion regarding cell type homology and evolutionary convergence (lines 427-436). Specifically, we point out that the shared TFs are known to play important roles in specifying secretory cell types. For example, we have now identified a shared TF (klf17) between zebrafish and frog hatching glands, which arise from different germ layers. klf17 homologs have been shown to be crucial for the specification of the hatching glands in both zebrafish and frog (Kurauchi et al., 2010; Suzuki et al., 2019). The fact that these cell types share a number of TFs implicated in secretory cell type specification suggests they are evolutionary homologs and did not evolve their functions convergently. This secretory cell type regulatory network has likely been redeployed (or co-opted) into different developmental lineages. Developmentally, this resembles convergence because different developmental lineages converge on similar identities. However, this is distinct from evolutionary convergence, because the secretory cell type regulatory network – composed of cell type-specific TFs and their downstream effector targets – evolved only once. Under evolutionary convergence, we would expect to observe different TFs driving secretory effector gene expression, reflecting the different cell type specification networks that converged on similar functions. However, fully resolving this evolutionary history will require further characterization of these networks in fish, frogs, and a broader array of vertebrates, which is outside the scope of this study. We hope our observations and discussion on this topic will stimulate research in this direction, and again thank the reviewer for raising this point.

      Are there any differences in the properties of genes that are deeply conserved in metazoan cell types (e.g., Fox, Csrp families in contractile cells) vs. genes that are more lineage restricted (e.g., mef2) - for example are the more conserved genes more central in regulatory networks within a species and thus more constrained?

      We agree with the reviewer that this is an important question. Genes that are deeply conserved throughout metazoan cell type families may be more central to the regulatory network compared to lineage-restricted genes. We now mention in the text (lines 371-373) that this is an important question to address in future studies.

      Why did heart, germline, and olfactory placode cells not cluster in the xenopus atlas - these seem like conserved populations, or was this due to sampling / staging?

      In the original analysis of the frog atlas, some cell clusters were isolated and subjected to a second round of sub-clustering. The final clustering assignments can therefore reflect very subtle differences that are not apparent when considering the entire dataset. As a result, the germline cells are scattered across the reconstructed manifold and do not concentrate in a distinct cluster. The heart cells and olfactory placode cells are inextricably mixed with larger populations of intermediate mesoderm and placodal cells, respectively. We have now clarified this potential point of confusion in the methods section (lines 635-642).

    1. Author Response:

      Evaluation Summary:

      This well-done study establishes a work flow for the analysis of the peptidome of wound fluids. By doing so it enables the identification of peptide patterns associated with wounds that are healing versus non-healing. The method may therefore help to define candidate biomarkers for wound healing. Overall enthusiasm was somewhat dampened by findings previously reported by the same group and also by others.

      We appreciate the positive evaluation of the work and its applicability and have now added clarifications and information on the relationship between this and the subsequent Frontiers in Immunology paper published by our group. We want to stress that the current eLife MS was originally uploaded to the preprint server medRxiv on the 3rd of November 2020. The Frontiers paper, which is a follow-up study of the current manuscript, was published in February 2021. Importantly, the latter is based on, and refers to, the original methodology and peptidome data described in the medRxiv article (which was then later transferred to eLife). In the current revised version of the manuscript, we now clearly describe the originality of the methodology described here and its precedence, and the overall separate and independent character of the current manuscript. In particular, we now thoroughly discuss the findings of this present study in relation to the Frontiers article, and we therefore believe that the uniqueness of the present paper is now made very clear. In our opinion, the Frontiers article does not diminish the novelty and strength of the current manuscript, as it has an overall separate and independent character. Instead, it increases its strength, as we showed that the method described here and the qualitative results obtained can be used successfully in quantitative bioinformatics analysis as well.

      Finally, as stated in our MS, peptidomics investigations have been conducted for a number of different biological samples, including plasma, cerebrospinal fluid, saliva, tears, and brain tissues. To the best of our knowledge, there are no other reports of peptidomic analysis of wound fluids. Previously published mass spectrometry based analyses of wound fluids have used classical proteomics and N-terminomics, thus investigating very different subsets of wound fluid components. These previous studies therefore do not diminish the novelty of the current study.

      Reviewer #1:

      This paper focuses on using liquid chromatography and mass spectrometry (LC-MS) to compare the peptidome of human wound fluid. In this study, uninfected healing wound fluid and infected wound fluid were evaluated for potential differences that can predict wound status and infection risk. The authors report differences between plasma and wound fluid, as well as differences between non-inflamed/non-infected wound fluids, in terms of the LC-MS peptidome signature and peptide alignment maps.

      Through their analysis they found many traditional biomarkers associated with wounds, such as the cytokines IL-1β, IL-6, IL-8 and TNF-α; the major novel findings come from the vast number of new peptide sequences they described, which could be used as wound biomarkers or drug targets in the future. The major counterargument for their otherwise novel findings is the same group's recent publication on wound biomarkers in Frontiers in Immunology, "Bioinformatic Analysis of the Wound Peptidome Reveals Potential Biomarkers and Antimicrobial Peptides".

      Regarding our recently published paper and the relation to this manuscript, please see our comments on the public evaluation summary.

      Reviewer #2:

      The authors used mass-spectrometry to analyze the peptides that are present in wounds as a result of proteolysis. The authors thoroughly investigated multiple aspects of the methods for peptidomics. The best sample preparation was determined and robustness was shown by comparing multiple injections or multiple sample preparations. Subsequently, different types of samples were tested, i.e. normal plasma, sterile acute wound fluid and infected wound fluid, in order to be able to distinguish e.g. common proteins. Wound fluids were shown to contain more and smaller peptides than plasma. Further analysis showed clear differences in peptide profiles between wound fluids and plasma. In high inflammatory samples, which contain high levels of cytokines, the protein degradation correlated with enzymatic activity (zymograms). Many proteins were identified that were found exclusively in the low or in the high inflammation group. This will help elucidating the pathways during wound healing and/or infection but also for diagnosis or biomarker discovery.

      The conclusions of this paper are well supported by data. Although interesting differences were found between low and high inflammation, only a limited number of patient samples have been analyzed.

      Reviewer #3:

      Van der Plas et al established a mass-spectrometry based work flow for the analysis of peptidome in wound fluids. They found that wound fluids contained a higher degree of peptides as compared to plasma which is expected because of proteolytic events in wound fluids. Authors identified unique peptide patterns in healing and non-healing (infected) wounds and nicely discuss many of the identified peptides/peptide patterns and their likely roles in innate immunity, healing etc. The established methodology seems to be robust and yields interesting insights into proteolytically generated peptides in wound fluids. Authors speculate that assessing the peptidome of wounds would result in the identification of potential biomarkers for wound healing and infection.

      The manuscript is the first to determine the peptidome in wound fluids using an unbiased technology. However, the results gained are largely confirmative or "as expected" because others have previously reported an increase in peptide numbers in wound fluids due to proteolytic activity. Also, the same group recently published a related paper without discussing it. The main novelty of the manuscript is thus more of technological interest, as long as the translational perspective (diagnostic approach) has not been demonstrated.

      Indeed, increased peptide numbers in a proteolytic environment, such as wound fluids, are to be expected and have been reported earlier. However, that does not make our results largely confirmative, as the aim of this study was not to investigate quantitative differences in peptide numbers, but to study qualitative differences in peptide patterns. Previously published mass spectrometry based analyses of wound fluids have used classical proteomics or N-terminomics, thus investigating very different subsets of wound fluid components. Therefore, these previous studies do not affect the novelty of the current study.

      Regarding our recently published paper and the relation to this manuscript, please see our comments on the public evaluation summary.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript Rao et al. describe an interesting relationship between KSR1 and the translation regulation of EPSTI1 (a regulator of EMT). They identified this relationship by polysome RNAseq of CRC cells in the context of KSR1 knockdown (KD) which they confirm by polysome QPCR. They then go on to show that KSR KD and add back influences EPSTI1 expression at the protein but not mRNA level and impacts cell viability, anchorage-independent growth, and possibly cell migration. They focus on the cell migration phenotype and show that it is associated with changes in EMT-related genes including E-cad and N-cad. Interestingly, add back of EPSTI1 can reverse the phenotype elicited by KSR1 deletion. Overall, this story is interesting and translation regulation by KSR1 has not been described previously. However, Rao et al. do not provide a mechanism for how KSR1 regulates the translation of EPSTI1, and it is unclear whether this occurs through eIF4E, as the authors suggest.

      We agree completely that our observation of KSR1-dependent ERK regulation of EPSTI1 to promote an EMT-like phenotype raises new questions regarding how the translation of EPSTI1 mRNA is regulated. An additional intriguing question that arises from our work is how this relatively nondescript protein enhances the E- to N-cadherin switch in the colon cancer cells. Multiple possibilities (e.g., altered RNA processing or ribosome heterogeneity) may mediate ERK-dependent regulation of EPSTI1 translation and induction of the cadherin switch. RNA-binding proteins affect discrete cell behaviors, including motility and invasion, by selectively regulating pre-mRNA splicing, mRNA stability, and localization. However, it is hard to imagine a general mechanism involving ERK-mediated regulation of 4E-BP1 and eIF4E, which would affect global mRNA translation, as responsible for a selective effect on EPSTI1 mRNA translation and discrete components of EMT-like behavior. Indeed, while KSR1 disruption and ERK inhibition potently suppressed EPSTI1 translation, robust inhibition of mTOR signaling had little effect on EPSTI1. Further development of the detailed cellular mechanisms and critical regulators mediating translation-dependent EMT-like behavior should now be possible.

      Reviewer #2 (Public Review):

      KSR1 functions as a critical rheostat to fine-tune MAPK signalling, and identifying modes by which its over-expression promotes tumor progression is clinically important and potentially druggable. Ras is highly mutated in CRC and unfortunately inhibitors of Ras have been challenging to develop. However, small molecules which stabilize an inactive form of the KSR are actively being developed in an attempt to repress RAS signaling. Thus, this study, which seeks to identify how KSR1 promotes oncogenic mRNA translation, is potentially highly clinically relevant, as it may identify novel druggable targets.

      In this manuscript the authors performed polysome profiling in colorectal cancer (CRC) cells and proposed that KSR1 and ERK regulate the translation of EPSTI1 mRNA. They go on to characterize the phenotypes associated with knock-down or knock-out of KSR1 in CRC, and show that their defects in invasion, anchorage-independent growth and switch to a less EMT-like phenotype are all EPSTI1-dependent.

      The authors succeeded in providing ample in vitro data that KSR1 and EPSTI1 are potential therapeutic targets in CRC. However, the data demonstrating that KSR1 and ERK regulate EPSTI1 mRNA translation is tenuous. Although the authors state that "EPSTI1 is necessary and sufficient for EMT in CRC cells", the data presented are consistent with a more restrained conclusion of a partial-EMT and not EMT per se. Finally, without an in vivo model it is difficult to glean novel insight into the mechanism by which KSR1 and/or EPSTI1 control the invasive and metastatic behaviour of cells.

      We greatly appreciate your comments and are excited about the implications of KSR1-EPSTI1 signaling in promoting the EMT-like phenotype in colon cancer cell lines. We have corrected the use of the term ‘EMT’ to ‘EMT-like phenotype’ within the text of the manuscript. We recognize the limitations of using only in vitro data to demonstrate the role of KSR1 and EPSTI1 in promoting motility and invasion in colon cancer cells. In vivo studies will be invaluable to our future efforts to determine the extent to which EPSTI1 promotes metastatic behavior in colon tumors.

      Reviewer #3 (Public Review):

      It is established that Kinase suppressor of Ras 1 (KSR1) contributes to the oncogenic actions of Ras by promoting ERK activation. However, the downstream actions of this pathway are poorly understood. Here Rao et al. demonstrate that this KSR1-dependent pathway increases translation of Epithelial-Stromal Interaction-1 (EPSTI1) mRNA and expression of EPSTI1 protein. This is significant because EPSTI1 drives aspects of EMT, including expression of ZEB1, SLUG, and N-Cadherin. The analysis is thorough and includes both loss-of-function and gain-of-function studies. Overall, the conclusions of this study are convincing and advance our understanding of cancer development.

      We appreciate the positive feedback, and we are excited about the implications of our findings on KSR1-dependent regulation of EPSTI1 translation.

    1. Author Response:

      Evaluation Summary:

      Since DBS of the habenula is a new treatment, these are the first data of its kind and potentially of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap. This paper is of interest to neuroscientists studying emotions and clinicians treating psychiatric disorders. Specifically the paper shows that the habenula is involved in processing of negative emotions and that it is synchronized to the prefrontal cortex in the theta band. These are important insights into the electrophysiology of emotion processing in the human brain.

      The authors are very grateful for the reviewers’ positive comments on our study. We also thank all the reviewers for the comments, which have helped to improve the manuscript.

      Reviewer #1 (Public Review):

      The study by Huang et al. reports on direct recordings (using DBS electrodes) from the human habenula in conjunction with MEG recordings in 9 patients. Participants were shown emotional pictures. The key finding was a transient increase in theta/alpha activity with negative compared to positive stimuli. Furthermore, there was a later increase in oscillatory coupling in the same band. These are important data, as there are few reports of direct recordings from the habenula together with MEG in humans performing cognitive tasks. The findings do provide novel insight into the network dynamics associated with the processing of emotional stimuli and, in particular, the role of the habenula.

      Recommendations:

      How can we be sure that the recordings from the habenula are not contaminated by volume conduction; i.e. signals from neighbouring regions? I do understand that bipolar signals were considered for the DBS electrode leads. However, high-frequency power (gamma band and up) is often associated with spiking/MUA and considered less prone to volume conduction. I propose to also investigate the high-frequency gamma band activity recorded from the bipolar DBS electrodes and relate it to the emotional faces. This will provide more certainty that the measured activity indeed stems from the habenula.

      We thank the reviewer for the comment. As the reviewer pointed out, bipolar macroelectrode recordings can detect locally generated potentials, as demonstrated for recordings from the subthalamic nucleus, especially when the macroelectrodes are inside the subthalamic nucleus (Marmor et al., 2017). However, considering the size of the habenula and the size of the DBS electrode contacts, we have to acknowledge that we cannot completely exclude the possibility that the recordings are contaminated by volume conduction of activities from neighbouring areas, as shown in Bertone-Cueto et al. 2019. We have now added extra information about the size of the habenula and acknowledged the potential contamination by activities from neighbouring areas through volume conduction in the ‘Limitation’:

      "Another caveat we would like to acknowledge that the human habenula is a small region. Existing data from structural MRI scans reported combined habenula (the sum of the left and right hemispheres) volumes of ~ 30–36 mm3 (Savitz et al., 2011a; Savitz et al., 2011b) which means each habenula has the size of 2~3 mm in each dimension, which may be even smaller than the standard functional MRI voxel size (Lawson et al., 2013). The size of the habenula is also small relative to the standard DBS electrodes (as shown in Fig. 2A). The electrodes used in this study (Medtronic 3389) have electrode diameter of 1.27 mm with each contact length of 1.5 mm, and contact spacing of 0.5 mm. We have tried different ways to confirm the location of the electrode and to select the contacts that is within or closest to the habenula: 1.) the MRI was co-registered with a CT image (General Electric, Waukesha, WI, USA) with the Leksell stereotactic frame to obtain the coordinate values of the tip of the electrode; 2.) Post-operative CT was co-registered to pre-operative T1 MRI using a two-stage linear registration using Lead-DBS software. We used bipolar signals constructed from neighbouring macroelectrode recordings, which have been shown to detect locally generated potentials from subthalamic nucleus and especially when the macroelectrodes are inside the subthalamic nucleus (Marmor et al., 2017). Considering that not all contacts for bipolar LFP construction are in the habenula in this study, as shown in Fig. 2, we cannot exclude the possibility that the activities we measured are contaminated by activities from neighbouring areas through volume conduction. In particular, the human habenula is surrounded by thalamus and adjacent to the posterior end of the medial dorsal thalamus, so we may have captured activities from the medial dorsal thalamus. 
However, we also showed that those bipolar LFPs from contacts in the habenula tend to have a peak in the theta/alpha band in the power spectra density (PSD); whereas recordings from contacts outside the habenula tend to have extra peak in beta frequency band in the PSD. This supports the habenula origin of the emotional valence related changes in the theta/alpha activities reported here."
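
      The size comparison in the quoted passage can be checked with a short calculation. This is only a back-of-the-envelope sketch: it assumes each habenula is a cube (a simplification introduced here, not a claim from the manuscript) and that the 0.5 mm contact spacing is the edge-to-edge gap between adjacent contacts. Variable names are illustrative.

```python
# Combined (left + right) habenula volume range quoted from Savitz et al.
combined_volume_mm3 = (30.0, 36.0)

# Volume of a single habenula, assuming both sides are equal in size.
per_side_mm3 = tuple(v / 2 for v in combined_volume_mm3)

# Edge length if a single habenula were a cube (simplifying assumption).
edge_mm = tuple(v ** (1 / 3) for v in per_side_mm3)

# Electrode geometry quoted in the text (Medtronic 3389).
contact_length_mm = 1.5
contact_spacing_mm = 0.5

# Extent covered by one bipolar pair: two contacts plus the gap between them.
bipolar_span_mm = 2 * contact_length_mm + contact_spacing_mm

print(f"per-side volume: {per_side_mm3[0]:.0f}-{per_side_mm3[1]:.0f} mm^3")
print(f"equivalent cube edge: {edge_mm[0]:.2f}-{edge_mm[1]:.2f} mm")
print(f"bipolar pair span: {bipolar_span_mm:.1f} mm")
```

      Under these assumptions the equivalent edge length is roughly 2.5 to 2.6 mm, consistent with the 2–3 mm quoted above, while a single bipolar pair spans 3.5 mm. This is why at least part of a bipolar pair is likely to sit outside the habenula.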

      We have also looked at gamma band oscillations and high-frequency activities in the recordings. However, we did not observe any peak in the high-frequency band in the average power spectral density, or any consistent difference in the high-frequency activities induced by the emotional stimuli (Fig. S1). We suspect that high-frequency activities related to MUA/spiking are very local and have very small amplitude, so they are not picked up by bipolar LFPs measured from contacts whose contact area and between-contact spacing are both quite large relative to the size of the habenula.

      Figure S1. (A) Power spectral density of habenula LFPs across all time period when emotional stimuli were presented. The bold blue line and shadowed region indicates the mean ± SEM across all recorded hemispheres and the thin grey lines show measurements from individual hemispheres. (B) Time-frequency representations of the power response relative to pre-stimulus baseline for different conditions showing habenula gamma and high frequency activity are not modulated by emotional

      References:

      Savitz JB, Bonne O, Nugent AC, Vythilingam M, Bogers W, Charney DS, et al. Habenula volume in post-traumatic stress disorder measured with high-resolution MRI. Biology of Mood & Anxiety Disorders 2011a; 1(1): 7.

      Savitz JB, Nugent AC, Bogers W, Roiser JP, Bain EE, Neumeister A, et al. Habenula volume in bipolar disorder and major depressive disorder: a high-resolution magnetic resonance imaging study. Biological Psychiatry 2011b; 69(4): 336-43.

      Lawson RP, Drevets WC, Roiser JP. Defining the habenula in human neuroimaging studies. NeuroImage 2013; 64: 722-7.

      Marmor O, Valsky D, Joshua M, Bick AS, Arkadir D, Tamir I, et al. Local vs. volume conductance activity of field potentials in the human subthalamic nucleus. Journal of Neurophysiology 2017; 117(6): 2140-51.

      Bertone-Cueto NI, Makarova J, Mosqueira A, García-Violini D, Sánchez-Peña R, Herreras O, et al. Volume-Conducted Origin of the Field Potential at the Lateral Habenula. Frontiers in Systems Neuroscience 2019; 13:78.

      Figure 3: the alpha/theta band activity is very transient and not band-limited. Why refer to this as oscillatory? Can you exclude that the TFRs of power reflect the spectral power of ERPs rather than modulations of oscillations? I propose to also calculate the ERPs and perform the TFR of power on those. This might result in a re-interpretation of the early effects in theta/alpha band.

      We agree with the reviewer that the activity increase in the first time window, with short latency after stimulus onset, is very transient and not band-limited. This raises the question of whether it is oscillatory or a transient evoked activity. We have now examined this initial transient activity in two ways: 1.) We quantified the ERP in LFPs locked to stimulus onset for each emotional valence condition and for each habenula. We investigated whether there was a difference in the amplitude or latency of the ERP between emotional valence conditions. As shown in the following figure, there is an ERP at stimulus onset with a positive peak at 402 ± 27 ms (neutral stimuli), 407 ± 35 ms (positive stimuli), and 399 ± 30 ms (negative stimuli). The following figure (Fig. 3–figure supplement 1) will be submitted as a figure supplement related to Fig. 3. However, there was no significant difference in ERP latency or amplitude between emotional valence conditions. 2.) We quantified the pure non-phase-locked (induced only) power spectra by calculating the time-frequency power spectrogram after subtracting the ERP (the time-domain trial average) from the time-domain neural signal on each trial (Kalcher and Pfurtscheller, 1995; Cohen and Donner, 2013). This shows very similar results to those reported in the main manuscript, as shown in Fig. 3–figure supplement 2. These further analyses show that, even though there were event-related potential changes time-locked to stimulus onset, this ERP did not contribute to the initial broad-band activity increase in the early time window shown in plots A-C of Figure 3. The figures for the new analyses and the following text have now been added to the main text:

      "In addition, we tested whether stimuli-related habenula LFP modulations primarily reflect a modulation of oscillations, which is not phase-locked to stimulus onset, or, alternatively, if they are attributed to evoked event-related potential (ERP). We quantified the ERP for each emotional valence condition for each habenula. There was no significant difference in ERP latency or amplitude caused by different emotional valence stimuli (Fig. 3–figure supplement 1). In addition, when only considering the non phase-locked activity by removing the ERP from the time series before frequency-time decomposition, the emotional valence effect (presented in Fig. 3–figure supplement 2) is very similar to those shown in Fig.3. These additional analyses demonstrated that the emotional valence effect in the LFP signal is more likely to be driven by non-phase-locked (induced only) activity."

      Fig. 3–figure supplement 1. Event-related potential (ERP) in habenula LFP signals in different emotional valence (neutral, positive and negative) conditions. (A) Averaged ERP waveforms across patients for different conditions. (B) Peak latency and amplitude (Mean ± SEM) of the ERP components for different conditions.

      Fig. 3–figure supplement 2. Non-phase-locked activity in different emotional valence (neutral, positive and negative) conditions (N = 18). (A) Time-frequency representation of the power changes relative to pre-stimulus baseline for three conditions. Significant clusters (p < 0.05, non-parametric permutation test) are encircled with a solid black line. (B) Time-frequency representation of the power response difference between negative and positive valence stimuli, showing significant increased activity the theta/alpha band (5-10 Hz) at short latency (100-500 ms) and another increased theta activity (4-7 Hz) at long latencies (2700-3300 ms) with negative stimuli (p < 0.05, non-parametric permutation test). (C) Normalized power of the activities at theta/alpha (5-10 Hz) and theta (4-7 Hz) band over time. Significant difference between the negative and positive valence stimuli is marked by a shadowed bar (p < 0.05, corrected for multiple comparison).

      References:

      Kalcher J, Pfurtscheller G. Discrimination between phase-locked and non-phase-locked event-related EEG activity. Electroencephalography and Clinical Neurophysiology 1995; 94(5): 381-4.

      Cohen MX, Donner TH. Midfrontal conflict-related theta-band power reflects neural oscillations that predict behavior. Journal of Neurophysiology 2013; 110(12): 2752-63.

      Figure 4D: can you exclude that the frontal activity is due to saccade artifacts? Only eye blink artifacts were reduced by the ICA approach. Trials with saccades should be identified in the MEG traces and rejected prior to further analysis.

      We understand and appreciate the reviewer’s concern about the source of the activity modulations shown in Fig. 4D. We tried to minimise eye movements and saccades during the recording by presenting all figures at the centre of the screen, scaling all presented figures to a similar size, and presenting a white cross at the centre of the screen to prepare the participants for stimulus onset. Despite this, participants may still have made eye movements and saccades during the recording. We used ICA to exclude the low-frequency, large-amplitude artefacts that can be related to either eye blinks or other large eye movements. However, this may not exclude artefacts related to miniature saccades. As shown in Fig. 4D, at the sensor level, the sensors with a significant difference between the negative and positive emotional valence conditions clustered around the frontal cortex, close to the eye area. However, we think this is not dominated by saccades, for the following two reasons:

      1.) The power spectrum of the saccadic spike artefact in MEG is characterized by a broadband peak in the gamma band from roughly 30 to 120 Hz (Yuval-Greenberg et al., 2008; Keren et al., 2010). In this study, the activity modulation we observed in the frontal sensors is limited to the theta/alpha frequency band, so it differs from the power spectrum of the saccadic spike artefact.

      2.) The source of the saccadic spike artefacts in MEG measurement tends to be localized to the region of the extraocular muscles of both eyes (Carl et al., 2012). We used beamforming source localisation to identify the source of the activity modulation reported in Fig. 4D. This beamforming analysis identified the source to be in Brodmann areas 9 and 10 (shown in Fig. 5). This excludes the possibility that the activity modulation at the sensor level reported in Fig. 4D is due to saccades. In addition, Brodmann areas 9 and 10 have previously been associated with emotional stimulus processing (Bermpohl et al., 2006), and Brodmann area 9 in the left hemisphere has also been used as the target for repetitive transcranial magnetic stimulation (rTMS) as a treatment for drug-resistant depression (Cash et al., 2020). The source localisation results, together with previous literature on the function of the identified source area, suggest that the activity modulation we observed in the frontal cortex is very likely to be related to emotional stimuli processing.

      References:

      Yuval-Greenberg S, Tomer O, Keren AS, Nelken I, Deouell LY. Transient induced gamma-band response in EEG as a manifestation of miniature saccades. Neuron 2008; 58(3): 429-41.

      Keren AS, Yuval-Greenberg S, Deouell LY. Saccadic spike potentials in gamma-band EEG: characterization, detection and suppression. NeuroImage 2010; 49(3): 2248-63.

      Carl C, Acik A, Konig P, Engel AK, Hipp JF. The saccadic spike artifact in MEG. NeuroImage 2012; 59(2): 1657-67.

      Bermpohl F, Pascual-Leone A, Amedi A, Merabet LB, Fregni F, Gaab N, et al. Attentional modulation of emotional stimulus processing: an fMRI study using emotional expectancy. Human Brain Mapping 2006; 27(8): 662-77.

      Cash RFH, Weigand A, Zalesky A, Siddiqi SH, Downar J, Fitzgerald PB, et al. Using Brain Imaging to Improve Spatial Targeting of Transcranial Magnetic Stimulation for Depression. Biological Psychiatry 2020.

      The coherence modulations in Fig 5 occur quite late in time compared to the power modulations in Fig 3 and 4. When discussing the results (in e.g. the abstract) it reads as if these findings are reflecting the same process. How can the two effects reflect the same process if the timing is so different?

      As the reviewer correctly pointed out, the time window in which we observed the coherence modulations is quite late compared to the initial power modulations in the frontal cortex and the habenula (Fig. 4). There was another increase in theta band activity in the habenula even later, at around 3 seconds after stimulus onset, when the emotional figure had already disappeared. An emotional response is composed of a number of factors, two of which are the initial reactivity to an emotional stimulus and the subsequent recovery once the stimulus terminates or ceases to be relevant (Schuyler et al., 2014). We think the neural effects we observed in the three different time windows may reflect different underlying processes. We have discussed this in the ‘Discussion’:

      "These activity changes at different time windows may reflect the different neuropsychological processes underlying emotion perception including identification and appraisal of emotional material, production of affective states, and autonomic response regulation and recovery (Phillips et al., 2003a). The later effects of increased theta activities in the habenula when the stimuli disappeared were also supported by other literature showing that, there can be prolonged effects of negative stimuli in the neural structure involved in emotional processing (Haas et al., 2008; Puccetti et al., 2021). In particular, greater sustained patterns of brain activity in the medial prefrontal cortex when responding to blocks of negative facial expressions was associated with higher scores of neuroticism across participants (Haas et al., 2008). Slower amygdala recovery from negative images also predicts greater trait neuroticism, lower levels of likability of a set of social stimuli (neutral faces), and declined day-to-day psychological wellbeing (Schuyler et al., 2014; Puccetti et al., 2021)."

      References:

      Schuyler BS, Kral TR, Jacquart J, Burghy CA, Weng HY, Perlman DM, et al. Temporal dynamics of emotional responding: amygdala recovery predicts emotional traits. Social Cognitive and Affective Neuroscience 2014; 9(2): 176-81.

      Phillips ML, Drevets WC, Rauch SL, Lane R. Neurobiology of emotion perception I: The neural basis of normal emotion perception. Biological Psychiatry 2003a; 54(5): 504-14.

      Haas BW, Constable RT, Canli T. Stop the sadness: Neuroticism is associated with sustained medial prefrontal cortex response to emotional facial expressions. NeuroImage 2008; 42(1): 385-92.

      Puccetti NA, Schaefer SM, van Reekum CM, Ong AD, Almeida DM, Ryff CD, et al. Linking Amygdala Persistence to Real-World Emotional Experience and Psychological Well-Being. Journal of Neuroscience 2021: JN-RM-1637-20.

      Be explicit on the degrees of freedom in the statistical tests given that one subject was excluded from some of the tests.

      We thank the reviewers for the comment. The number of samples used for each statistical analysis is stated in the figure titles. We have now also added the degrees of freedom in the main text wherever parametric statistical tests such as t-tests or ANOVAs have been used. Where permutation tests (which do not have any degrees of freedom associated with them) are used, we have now added the number of samples for the permutation test.

      Reviewer #2 (Public Review):

      In this study, Huang and colleagues recorded local field potentials from the lateral habenula in patients with psychiatric disorders who recently underwent surgery for deep brain stimulation (DBS). The authors combined these invasive measurements with non-invasive whole-head MEG recordings to study functional connectivity between the habenula and cortical areas. Since the lateral habenula is believed to be involved in the processing of emotions, and negative emotions in particular, the authors investigated whether brain activity in this region is related to emotional valence. They presented pictures inducing negative and positive emotions to the patients and found that theta and alpha activity in the habenula and frontal cortex increases when patients experience negative emotions. Functional connectivity between the habenula and the cortex was likewise increased in this band. The authors conclude that theta/alpha oscillations in the habenula-cortex network are involved in the processing of negative emotions in humans.

      Because DBS of the habenula is a new treatment tested in this cohort in the framework of a clinical trial, these are the first data of its kind. Accordingly, they are of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap.

      In terms of community impact, I see the strengths of this paper in basic science rather than the clinical field. The authors demonstrate the involvement of theta oscillations in the habenula-prefrontal cortex network in emotion processing in the human brain. The potential of theta oscillations to serve as a marker in closed-loop DBS, as put forward by the authors, appears less relevant to me at this stage, given that the clinical effects and side-effects of habenula DBS are not known yet.

      We thank the reviewers for the favourable comments about the implication of our study in basic science and about the value of our study in closing a knowledge gap. We agree that further studies would be required to make conclusions about the clinical effects and side-effects of habenula DBS.

      Detailed comments:

      The group-average MEG power spectrum (Fig. 4B) suggests that negative emotions lead to a sustained theta power increase and a similar effect, though possibly masked by a visual ERP, can be seen in the habenula (Fig. 3C). Yet the statistics identify brief elevations of habenula theta power at around 3 s (which is very late), a brief elevation of prefrontal power at time 0 or even before (Fig. 4C) and a brief elevation of habenula-MEG theta coherence around 1 s. It seems possible that this lack of consistency arises from a low signal-to-noise ratio. The data contain only 27 trials per condition on average and are contaminated by artifacts caused by the extension wires.

      With regard to the nature of the short-latency activity modulation after stimulus onset, that is, whether it is an ERP or an oscillation: we have now investigated this. In summary, by analysing the ERP and removing its influence from the total power spectra, we did not observe modulation of the ERP related to stimulus emotional valence, and the valence-related modulation in the pure induced (non-phase-locked) power spectra was similar to what we observed in the total power shown in Fig. 3. Therefore, we argue that the theta/alpha increase with negative emotional stimuli that we observed in both the habenula and the prefrontal cortex 0-500 ms after stimulus onset is not dominated by visual or other ERPs.
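
      One common way to separate induced (non-phase-locked) from evoked activity is to subtract the trial-averaged ERP from every trial before the time-frequency decomposition. The sketch below illustrates that idea only; it may not match the exact procedure used for this analysis, and the function name and array layout are assumptions made for illustration.

```python
import numpy as np

def remove_evoked(trials):
    """Subtract the across-trial average (the ERP) from each trial.

    trials: array of shape (n_trials, n_samples) for one channel
            (hypothetical layout, assumed for this sketch).
    Returns an array of the same shape that keeps only the
    non-phase-locked part of each trial.
    """
    trials = np.asarray(trials, dtype=float)
    erp = trials.mean(axis=0, keepdims=True)  # phase-locked component
    return trials - erp
```

      Power computed on the output of such a step reflects induced activity only, so a valence effect that survives it cannot be attributed to the ERP alone.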

      With regard to the signal-to-noise ratio from only 27 trials per condition on average per participant: we cleaned the data by removing trials with obvious artefacts, characterised by time-domain values exceeding 5 times the standard deviation and by increased activity across all frequency bands in the frequency domain. After removing the trials with artefacts, we have 27 trials per condition per subject on average. We agree that 27 trials per condition on average is not a high number, and that increasing the number of trials would further increase the signal-to-noise ratio. However, our studies with EEG recordings and LFP recordings from externalised patients have shown that 30 trials were enough to identify a reduction in the amplitude of post-movement beta oscillations at the beginning of visuomotor adaptation in the motor cortex and STN (Tan et al., 2014a; Tan et al., 2014b). These results on motor-error-related modulation of post-movement beta have been replicated by studies from other groups. In Tan et al. (2014b), with simultaneous EEG and STN LFP measurements and a similar number of trials (around 30), we also quantified the time-course of STN-motor cortex coherence during voluntary movements. This pattern has also been replicated in a separate study from another group with around 50 trials per participant (Talakoub et al., 2016). In addition, a similar behavioural paradigm (passive figure viewing) has been used in two previous studies with LFP recordings from the STN in different patient groups (Brucke et al., 2007; Huebl et al., 2014). Both studies used a similar number of trials per condition (around 27) and identified meaningful modulation of STN activity by emotional stimuli. Therefore, we think the number of trials per condition was sufficient to identify differences in the LFPs induced by emotional valence in this paradigm.
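
      The time-domain part of the rejection criterion described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the mean and standard deviation are assumed to be pooled over all trials of a channel, and the function name is hypothetical.

```python
import numpy as np

def keep_mask_by_sd(trials, n_sd=5.0):
    """Return a boolean mask that is True for trials to keep.

    trials: array of shape (n_trials, n_samples) for one channel.
    A trial is rejected when any sample deviates from the pooled mean
    by more than n_sd pooled standard deviations (assumed criterion).
    """
    trials = np.asarray(trials, dtype=float)
    pooled_mean = trials.mean()
    pooled_sd = trials.std()
    peak_deviation = np.abs(trials - pooled_mean).max(axis=1)
    return peak_deviation <= n_sd * pooled_sd
```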

      We agree that the measurement of coherence can be more susceptible to noise and can suffer from the reduced signal-to-noise ratio of MEG recordings. Hirschmann et al. (2013) used 5 minutes of resting recording and 5 minutes of movement recording from 10 PD patients to quantify movement-related changes in STN-cortical coherence and how these were modulated by levodopa. Litvak et al. (2012) identified movement-related changes in the coherence between STN LFP and motor cortex using simultaneous STN LFP and MEG recordings from 17 PD patients, with 20 trials on average per participant per condition. With similar methods, van Wijk et al. (2017) used recordings from 9 patients, with on average around 29 trials per hand per condition, and found that low-beta cortico-pallidal coherence decreases during movement. The number of trials per condition per participant used in this study is therefore comparable to previous studies.

      The DBS extension wires do reduce the signal-to-noise ratio of the MEG recording. Therefore, the spatiotemporal Signal Space Separation (tSSS) method (Taulu and Simola, 2006) implemented in the MaxFilter software (Elekta Oy, Helsinki, Finland) was applied in this study to suppress the strong magnetic artifacts caused by the extension wires. This method has proved to work well in removing magnetic artifacts and movement artifacts from MEG data in our previous studies (Cao et al., 2019; Cao et al., 2020). In addition, the beamforming method proposed by several studies (Litvak et al., 2010; Hirschmann et al., 2011; Litvak et al., 2011) has been used in this study. Litvak et al. (2010) described the artifacts caused by DBS extension wires in detail and demonstrated that beamforming effectively suppresses these artifacts and thereby enables localization of cortical sources coherent with the deep brain nucleus. We have now added more details and these references about the data cleaning and the beamforming method in the main text. With the beamforming method, we did observe the standard movement-related modulation in the beta frequency band in the motor cortex with 9 trials of button-press movements, shown in the following figure for one patient as an example (Figure 5–figure supplement 1). This suggests that the beamforming method worked well to suppress the artefacts and helped to localise the source even with a low number of trials. The figure showing movement-related modulation in the motor cortex in the MEG signals has now been added as a supplementary figure to demonstrate the effect of the beamforming.

      Figure 5–figure supplement 1. (A) Time-frequency maps of MEG activity for right hand button press at sensor level from one participant (Case 8). (B) DICS beamforming source reconstruction of the areas with movement-related oscillation changes in the range of 12-30 Hz. The peak power was located in the left M1 area, MNI coordinate [-37, -12, 43].

      References:

      Tan H, Jenkinson N, Brown P. Dynamic neural correlates of motor error monitoring and adaptation during trial-to-trial learning. Journal of Neuroscience 2014a; 34(16): 5678-88.

      Tan H, Zavala B, Pogosyan A, Ashkan K, Zrinzo L, Foltynie T, et al. Human subthalamic nucleus in movement error detection and its evaluation during visuomotor adaptation. Journal of Neuroscience 2014b; 34(50): 16744-54.

      Talakoub O, Neagu B, Udupa K, Tsang E, Chen R, Popovic MR, et al. Time-course of coherence in the human basal ganglia during voluntary movements. Scientific Reports 2016; 6: 34930.

      Brucke C, Kupsch A, Schneider GH, Hariz MI, Nuttin B, Kopp U, et al. The subthalamic region is activated during valence-related emotional processing in patients with Parkinson's disease. European Journal of Neuroscience 2007; 26(3): 767-74.

      Huebl J, Spitzer B, Brucke C, Schonecker T, Kupsch A, Alesch F, et al. Oscillatory subthalamic nucleus activity is modulated by dopamine during emotional processing in Parkinson's disease. Cortex 2014; 60: 69-81.

      Hirschmann J, Ozkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Differential modulation of STN-cortical and cortico-muscular coherence by movement and levodopa in Parkinson's disease. NeuroImage 2013; 68: 203-13.

      Litvak V, Eusebio A, Jha A, Oostenveld R, Barnes G, Foltynie T, et al. Movement-related changes in local and long-range synchronization in Parkinson's disease revealed by simultaneous magnetoencephalography and intracranial recordings. Journal of Neuroscience 2012; 32(31): 10541-53.

      van Wijk BCM, Neumann WJ, Schneider GH, Sander TH, Litvak V, Kuhn AA. Low-beta cortico-pallidal coherence decreases during movement and correlates with overall reaction time. NeuroImage 2017; 159: 1-8.

      Taulu S, Simola J. Spatiotemporal signal space separation method for rejecting nearby interference in MEG measurements. Physics in Medicine and Biology 2006; 51(7): 1759-68.

      Cao C, Huang P, Wang T, Zhan S, Liu W, Pan Y, et al. Cortico-subthalamic Coherence in a Patient With Dystonia Induced by Chorea-Acanthocytosis: A Case Report. Frontiers in Human Neuroscience 2019; 13: 163.

      Cao C, Li D, Zhan S, Zhang C, Sun B, Litvak V. L-dopa treatment increases oscillatory power in the motor cortex of Parkinson's disease patients. NeuroImage Clinical 2020; 26: 102255.

      Litvak V, Eusebio A, Jha A, Oostenveld R, Barnes GR, Penny WD, et al. Optimized beamforming for simultaneous MEG and intracranial local field potential recordings in deep brain stimulation patients. NeuroImage 2010; 50(4): 1578-88.

      Litvak V, Jha A, Eusebio A, Oostenveld R, Foltynie T, Limousin P, et al. Resting oscillatory cortico-subthalamic connectivity in patients with Parkinson's disease. Brain 2011; 134(Pt 2): 359-74.

      Hirschmann J, Ozkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Distinct oscillatory STN-cortical loops revealed by simultaneous MEG and local field potential recordings in patients with Parkinson's disease. NeuroImage 2011; 55(3): 1159-68.

      I doubt that the correlation between habenula power and habenula-MEG coherence (Fig. 6C) is informative of emotion processing. First, power and coherence in close-by time windows are likely to be correlated irrespective of the task/stimuli. Second, if meaningful, one would expect the strongest correlation for the negative condition, as this is the only condition with an increase of theta coherence and a subsequent increase of theta power in the habenula. This, however, does not appear to be the case.

      The authors included the factors valence and arousal in their linear model and found that only valence correlated with electrophysiological effects. I suspect that arousal and valence scores are highly correlated. When fed with informative yet highly correlated variables, the significance of individual input variables becomes difficult to assess in many statistical models. Hence, I am not convinced that valence matters but arousal not.

      For the correlation shown in Fig. 6C, we used linear mixed-effect modelling (‘fitlme’ in Matlab), with the recorded subjects as random effects, to investigate the correlation between the habenula power and the habenula-MEG coherence in an earlier window, while considering all trials together. The values reported in the main text and in the figure (k = 0.2434 ± 0.1031, p = 0.0226, R² = 0.104) therefore reflect the within-subject correlation that is consistent across all measured subjects. The correlation is likely to be mediated by the emotional valence condition, as negative emotional stimuli tend to be associated with both high habenula-MEG coherence and high habenula theta power in the later time window.

      The arousal scores are significantly different for the three valence conditions, as shown in Fig. 1B. However, the arousal scores and the valence scores are not monotonically correlated, as shown in the following figure (Fig. S2). The emotionally neutral figures have the lowest arousal value, but their valence value sits between those of the negative figures and the positive figures. We have now added the following sentence in the main text:

      "This nonlinear and non-monotonic relationship between arousal scores and the emotional valence scores allowed us to differentiate the effect of the valence from arousal."

      Table 2 in the main text shows the results of the linear mixed-effect modelling with the neural signal as the dependent variable and the valence and arousal scores as independent variables. Because of the non-linear and non-monotonic relationship between the valence and arousal scores, we think the significance of individual input variables is valid in this statistical model. We have now added a new figure (shown below, Fig. 7) with scatter plots showing the relationship between the electrophysiological signal and the arousal and emotional valence scores separately, using Spearman’s partial correlation analysis. In each scatter plot, each dot indicates the average measurement from one participant in one emotional valence condition. As shown in the following figure, the electrophysiological measurements correlated linearly with the valence scores, but not with the arousal scores. However, the statistics reported in this figure consider all the dots together, whereas the linear mixed-effect modelling takes into account the interdependency of measurements from the same participant. The results reported in the main text using linear mixed-effect modelling are therefore statistically more valid, and the figure below serves to illustrate the relationship.

      Figure S2. (A) Averaged valence and arousal ratings (mean ± SD) for figures of the three emotional conditions. (B) Scatter plots showing the relationship between arousal and valence scores for each emotional condition for each participant.

      Figure 7. Scatter plots showing how the early theta/alpha band power increase in the frontal cortex (A), the theta/alpha band frontal cortex-habenula coherence (B) and the theta band power increase in the habenula (C) changed with emotional valence (left column) and arousal (right column). Each dot shows the average of one participant in each categorical valence condition; these averages are also the source data of the multilevel modelling results presented in Table 2. The R and p values in the figure are the results of partial correlation considering all data points together.

      Page 8: "The time-varying coherence was calculated for each trial". This is confusing because coherence quantifies the stability of a phase difference over time, i.e. it is a temporal average, not defined for individual trials. It has also been used to describe the phase difference stability over trials rather than time, and I assume this is the method applied here. Typically, the greatest coherence values coincide with event-related power increases, which is why I am surprised to see maximum coherence at 1s rather than immediately post-stimulus.

      We thank the reviewer for pointing out this incorrect description. As the reviewer correctly pointed out, the method we used describes the phase difference stability over trials rather than over time. We have now clarified how coherence was calculated and added more details in the methods:

      "The time-varying cross trial coherence between each MEG sensor and the habenula LFP was first calculated for each emotional valence condition. For this, time-frequency auto- and cross-spectral densities in the theta/alpha frequency band (5-10 Hz) between the habenula LFP and each MEG channel at sensor level were calculated using the wavelet transform-based approach from -2000 to 4000 ms for each trial with 1 Hz steps using the Morlet wavelet and cycle number of 6. Cross-trial coherence spectra for each LFP-MEG channel combination was calculated for each emotional valence condition for each habenula using the function ‘ft_connectivityanalysis’ in Fieldtrip (version 20170628). Stimulus-related changes in coherence were assessed by expressing the time-resolved coherence spectra as a percentage change compared to the average value in the -2000 to -200 ms (pre-stimulus) time window for each frequency."
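
      The quantity described in the quoted methods can be sketched in a few lines. This is a generic NumPy illustration of cross-trial coherence and baseline normalisation, not FieldTrip's implementation; the function names, the array layout and the assumption that complex wavelet coefficients are already available for one frequency are ours.

```python
import numpy as np

def cross_trial_coherence(x, y):
    """Coherence across trials at each time point.

    x, y: complex arrays of shape (n_trials, n_times) holding wavelet
          coefficients of two signals at one frequency (assumed input).
    """
    sxy = np.mean(x * np.conj(y), axis=0)      # cross-spectral density
    sxx = np.mean(np.abs(x) ** 2, axis=0)      # auto-spectrum of x
    syy = np.mean(np.abs(y) ** 2, axis=0)      # auto-spectrum of y
    return np.abs(sxy) / np.sqrt(sxx * syy)

def percent_change(coh, times, baseline=(-2.0, -0.2)):
    """Express coherence as percentage change from the baseline mean."""
    mask = (times >= baseline[0]) & (times <= baseline[1])
    base = coh[mask].mean()
    return 100.0 * (coh - base) / base
```

      Because the averages run over trials, the result is high only when the phase difference between the two signals is stable from trial to trial at that time point.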

      In the Morlet wavelet analysis we used here, the cycle number (C) determines the temporal resolution and frequency resolution for each frequency (F). The spectral bandwidth at a given frequency F is equal to 2F/C, while the wavelet duration is equal to C/F/pi. We used a cycle number of 6. For theta band activity around 5 Hz, this gives a spectral bandwidth of 2 × 5/6 ≈ 1.7 Hz and a wavelet duration of 6/5/pi ≈ 0.38 s = 380 ms.
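
      The two formulas above are easy to evaluate for other frequencies in the 5-10 Hz band. A minimal sketch, with a hypothetical helper name:

```python
import math

def morlet_resolution(freq_hz, n_cycles):
    """Return (spectral bandwidth in Hz, wavelet duration in s).

    Uses the relations stated in the text: bandwidth = 2F/C and
    duration = C/F/pi.
    """
    bandwidth_hz = 2.0 * freq_hz / n_cycles
    duration_s = n_cycles / freq_hz / math.pi
    return bandwidth_hz, duration_s

# Evaluate the resolution at the edges of the theta/alpha band (5-10 Hz)
for f in (5.0, 10.0):
    bw, dur = morlet_resolution(f, 6)
    print(f"{f:.0f} Hz: bandwidth {bw:.2f} Hz, duration {dur * 1000:.0f} ms")
```

      With a fixed cycle number, the wavelet becomes shorter and the bandwidth wider as frequency increases, so temporal smearing is greatest at the lower edge of the band.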

      As the reviewer noticed, we observed increased activities across a wide frequency band in both habenula and the prefrontal cortex within 500 ms after stimuli onset. But the increase of cross-trial coherence starts at around 300 ms. The increase of coherence in a time window without increase of power in either of the two structures indicates a phase difference stability across trials in the oscillatory activities from the two regions, and this phase difference stability across trials was not secondary to power increase.

      Reviewer #3 (Public Review):

      This paper describes the oscillatory activity of the habenula using local field potentials, both within the region and, through the use of MEG, in connection to the prefrontal cortex. The characteristics of this activity were found to vary with the emotional valence but not with arousal. Shedding light on this is relevant, because the habenula is a promising target for deep brain stimulation.

      In general, because I am not much on top of the literature on the habenula, I find it difficult to judge the novelty and the impact of this study. What I can say is that I do find the paper well-written and very clear; and the methods, although quite basic (which is not bad), are sound and rigorous.

      We thank the reviewer for the positive comments about the potential implication of our study and on the methods we used.

      On the less positive side, even though I am aware that in this type of study it is difficult to have high N, the very low N in this case makes me worry about the robustness and replicability of the results. I'm sure I have missed it and it's specified somewhere, but why is N different for the different figures? Is it because only 8 people had MEG? The number of trials also seems somewhat low. Therefore, I feel the authors perhaps need to make an effort to make up for the small number of subjects in order to add confidence to the results. I would strongly recommend bootstrapping the statistical analysis and extracting non-parametric confidence intervals instead of showing parametric standard errors wherever appropriate. When doing that, it must be taken into account that the two habenulae of each patient belong to the same person; i.e. one bootstraps the subjects, not the habenulae.

      We do understand and appreciate the concern of the reviewer on the low sample numbers due to the strict recruitment criteria for this very early stage clinical trial: 9 patients for bilateral habenula LFPs, and 8 patients with good quality MEGs. Some information to justify the number of trials per condition for each participant has been provided in the reply to the Detailed Comments 1 from Reviewer 2. The sample number used in each analysis was included in the figures and in the main text.

      We have used a non-parametric cluster-based permutation approach (Maris and Oostenveld, 2007) for all the main results, as shown in Fig. 3-5. Once the clusters (time window and frequency band) with significant differences between emotional valence conditions had been identified, a parametric statistical test was applied to the average values of the clusters to show the direction of the difference. These parametric statistics are secondary to the main non-parametric permutation test.

      In addition, the DICS beamforming method was applied to localize cortical sources exhibiting stimulus-related power changes and cortical sources coherent with deep brain LFPs, for each subject and separately for the positive and negative emotional valence conditions. After source analysis, source statistics over subjects were performed. Non-parametric permutation testing with or without cluster-based correction for multiple comparisons was applied to statistically quantify the differences in cortical power source or coherence source between negative and positive emotional stimuli.
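
      The permutation logic underlying these tests can be illustrated on a single summary value per subject (for example, a cluster average per condition). The sketch below is a plain paired sign-flip permutation test with exhaustive enumeration; it is not the cluster-based procedure of Maris and Oostenveld (2007), which additionally forms clusters over time and frequency, and the function name is hypothetical.

```python
import itertools
import numpy as np

def paired_sign_flip_pvalue(cond_a, cond_b):
    """Two-sided exact permutation p-value for paired samples.

    cond_a, cond_b: one value per subject for each condition.
    Under the null hypothesis the sign of each within-subject
    difference is exchangeable, so all 2**n sign patterns are
    enumerated (feasible for the small n typical of such cohorts).
    """
    diff = np.asarray(cond_a, dtype=float) - np.asarray(cond_b, dtype=float)
    observed = abs(diff.mean())
    n_extreme = 0
    n_total = 0
    for signs in itertools.product((1.0, -1.0), repeat=diff.size):
        n_total += 1
        permuted = abs((diff * np.array(signs)).mean())
        if permuted >= observed - 1e-12:  # tolerance for float ties
            n_extreme += 1
    return n_extreme / n_total
```

      With 8 or 9 subjects the smallest attainable two-sided p-value is 2/2^n, which is worth keeping in mind when interpreting results from small cohorts.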

      References:

      Maris E, Oostenveld R. Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods 2007; 164(1): 177-90.

      Related to this point, the results in Figure 6 seem quite noisy, because interactions (i.e. coherence) are harder to estimate and N is low. For example, I have to make an effort of optimism to believe that Fig 6A is not just noise, and the result in Fig 6C is also a bit weak and perhaps driven by the blue point at the bottom. My read is that the authors didn't do permutation testing here, and just a parametric linear-mixed effect testing. I believe the authors should embed this into permutation testing to make sure that the extremes are not driving the current p-value.

      We have now quantified the coherence between frontal cortex-habenula and occipital cortex-habenula separately (please see more details in the reply to Reviewer 2, Recommendations for the authors 6). The new analysis showed that the increase in the theta/alpha band coherence around 1 s after the negative stimuli was only observed between prefrontal cortex-habenula and not between occipital cortex-habenula. This supports the argument that Fig. 6A is not just noise.

    1. Author Response:

      Reviewer #1:

      The authors demonstrate in this study that it is possible to train mice to perform a challenging tactile discrimination task, in a highly controlled manner, in a fully automated setup in which the animals learn to head-fix voluntarily. A number of well described tricks are used to prolong the self-fixation time and thereby obtain enough training time to reach good performance when the perceptual decision is difficult. In addition, the study establishes that this experimental design allows targeted silencing of relatively deep brain areas through a clear skull preparation.

      It has already been demonstrated that mice can perform voluntary head-fixation and can do behavioral tasks in this context. However, this is the first time this methodology has been applied, first, to a tactile task and, second, to a task that mice learn in thousands of trials. Another advantage of the present technique is that it is fully automated and allows training with virtually no human intervention.

      The demonstration that optogenetic silencing can be performed in this context is nice but not very surprising as already done in other contexts. Nevertheless it is an interesting application of self head-fixation. The authors should make sure that a maximum of information is available relative to the efficiency of the silencing (fraction of cells silenced) and about its impact on the behavior (does it result or not in a complete impairment?).

      We have improved presentation in various places of the paper to provide more information about the optogenetic manipulation. We added new analysis of the fraction of neurons affected by photostimulation (Figure 8E). We also analyzed the impact on behavioral performance relative to chance performance (Figure S4A and S6). We compared the effect size to prior studies (Figure S4) and we discuss the interpretation of effect size (Discussion, page 22).

      In the power range tested in this study, photostimulation did not reduce performance to chance level (Figure S6). One limitation of the optogenetic workflow is the interpretation of the effect size of behavioral deficits. We examined this issue in ALM, the brain region for which we have the most extensive data. In previous studies, we have shown that bilateral photoinhibition of ALM results in chance level performance (Li et al 2016, Fig 2b; Gao et al, 2018, Extended Data Fig 6b). Here, mouse performance was above chance during photoinhibition of ALM (Figure S4). This difference in effect size likely resulted from incomplete silencing of ALM. The photostimulus intensity used here was much lower than that used in previous studies (0.3 vs. 11.9 mW/mm²). In addition, a single virus injection was not sufficient to cover the entire ALM. Thus a partial behavioral effect could be due to incomplete silencing of a brain region, or to partial involvement of the brain region in the task. Given this limitation, we caution that the function of a brain region can only be fully deduced through more detailed analysis together with neurophysiology. The workflow presented here can be used as a discovery platform to quickly identify regions of interest for more detailed neurophysiology analysis. We now better highlight these points in the Discussion.

      Reviewer #2:

      Hao and colleagues developed an automatic system for high-throughput behavioral and optogenetic experiments for mice in home cage settings. The system includes a voluntary head-fixation apparatus and integrated fiber-free optogenetic capabilities. The authors describe in detail the design of the system and the stages for successful automatic training. They perform proof-of-concept experiments to validate their system. The experiments are technically solid and I am convinced that their system will be of interest to some laboratories that perform similar experiments. Despite the large variety of similar automated systems out there, this one may prove to become a popular design.

      The weak side of the work is that it is not particularly novel scientifically. The system is complex but it is not an innovative technology. The body of the study has too many technical details, as if it were the Methods section of a regular manuscript. There are bits of interesting information scattered around the paper (like the insights about the strategy mice use, which stem from the regression analysis), but these are not developed into any coherent direction that answers outstanding questions. The potential advantages of this system compared to other systems are marginal. In my eyes, the fact that manual training is so similar to the automatic one is not only a positive point. Rather, it signifies that the differences are mainly quantitative (e.g. # of mice a lab can train per day, etc). Thus, even as a methods paper, the lack of qualitative difference between this and other methods weakens it as a potential substrate for novel findings.

      The automated workflow presented here significantly boosts the yield and duration of training to rival and slightly surpass that of manual training for the first time (new Supplemental Table 1). We think this degree of automation is an important technical advance. We show that the workflow can significantly scale up the throughput of optogenetic experiments probing behaviors that require thousands of trials to learn. This enables efficient and systematic mapping of large subcortical structures, which was previously difficult to achieve. We better highlight comparisons to previous methods in several key areas in Supplemental Table 1. We have also strengthened the Discussion (page 20).

      We highlight one line of inquiry enabled by our workflow, a systematic mapping of the cortico-basal-ganglia loops during perceptual decision-making. The striatum is topographically organized. Previous studies examined different subregions of the striatum in different perceptual decision behaviors, making comparisons across studies difficult. The striatum in the mouse brain is ~21.5 mm3 in size (Allen reference brain; Wang et al, Cell 2020). Optogenetic experiments using optical fibers manipulate activity near the fiber tip (approximately 1 mm3). A systematic survey of different striatal domains’ involvement in specific behaviors is currently difficult. In our workflow, individual striatal subregions (~1 mm3, Figure 8) could be rapidly screened through parallel testing. At moderate throughput (15 mice / 2 months), a screen that tiles the entire striatum could be completed in under 12 months with little human effort. To illustrate its feasibility, we tested 3 subregions in the striatum previously implicated in different types of perceptual decision behaviors (Yartsev et al, eLife 2018; Sippy et al, Neuron 2015; Znamenskiy & Zador, Nature 2013), including an additional region in the posterior striatum that does not receive ALM and S1 inputs. The results revealed a hotspot in the dorsolateral striatum that biased tactile-guided decision-making (Figure 8). Our approach thus opens the door to rapid screening of the striatal domains during complex operant behaviors.
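
      The throughput estimate above can be checked with simple arithmetic. The following is a rough sketch, not the authors' calculation: the number of mice tested per striatal site (mice_per_site) is a hypothetical parameter that the response does not state, and the function name is illustrative only.

```python
import math

def screen_duration_months(structure_volume_mm3, site_volume_mm3, mice_per_site,
                           mice_per_batch=15, months_per_batch=2):
    """Estimate the duration of a screen that tiles a structure site by site.

    mice_per_site is an assumed value; the source gives only the striatal
    volume (~21.5 mm3), the site volume (~1 mm3), and the throughput
    (15 mice / 2 months).
    """
    n_sites = math.ceil(structure_volume_mm3 / site_volume_mm3)
    n_mice = n_sites * mice_per_site
    n_batches = math.ceil(n_mice / mice_per_batch)
    return n_sites, n_mice, n_batches * months_per_batch

# Assumption: 3 mice per site.
sites, mice, months = screen_duration_months(21.5, 1.0, mice_per_site=3)
print(sites, mice, months)  # 22 66 10
```

      Under these assumptions, 22 sites at 3 mice each would take about 10 months; at 4 mice per site the same calculation gives 12 months, so the "under 12 months" figure depends on the replication per site.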

      Moreover, by eliminating human intervention, automated training allows quantitative assaying of task learning (Figure 4). Home-cage testing also exposes behavioral signatures of motivation in self-initiated behavior (Figure 6). These observations suggest additional opportunities for inquiries into goal-directed behaviors in the context of home-cage testing.

      Reviewer #3:

      In this study, Hao et al. developed an automated operant box to perform decision-making tasks and optogenetic perturbations without requiring the experimenter's manipulation. For this aim, mice learn to head-fix and to perform a task by themselves. The optogenetic experiment using red-shifted opsins allows manipulation of circuits without the need for an implanted optical fiber. The automation of behavioral tasks in home cages (isolated rodents or in groups) is an intense area of research in neuroscience. The possibility of coupling home cage behavioral analysis with optogenetic manipulation and with complex tasks that require precise positioning of the animal for controlled stimulations (vibrating stimulation, visual, etc.) is thus of great interest and I commend the authors for their comprehensive dissection of the automated behavioral training setup. Some clarification, reporting of additional behavioral measures and refinement of analyses could improve the impact of this work.

      1) The first part of the paper nicely describes the experimental procedure to automate such a complex task. The procedure is very well described, the important points (e.g. the possibility for the animal to disengage…) are properly highlighted, and the online site allows readers to download the plans and 3D descriptions of the tools and the procedures. The authors compare task learning in automated versus manual training and show that there are overall very few differences. Whisker trimming reduces performance, indicating that the animals used this sensory information to make the choice. This part of the work is already impressive. Apart from that, the authors do not consider in their description what could be an essential aspect of experiments in a home-cage, i.e., the control of the motivation to perform the task. Mice perform the task (here, engage in the head fixation to obtain reward) when they wish and thus, compared with manual training, there is no explicit control of the animal's motivation. This could have consequences for i) the inter-fixation intervals, which become an element of the decision, and ii) the question of whether the commitment to the task is always motivated by drinking, or whether there is also a commitment to explore, or to check… This could impact the success in the task (e.g. if the animal is not motivated by water, it can explore…). Adding data analyses (information about the daily water consumption, are the inter-fixation intervals correlated with the success or failure in the last trial …) and even a short discussion or introduction of these aspects (see for example Timberlake et al, JEAB 1987 or Rowland et al 2008, Physiol behavior for the distinction between closed and open economy paradigms) could strengthen the behavioral description.

      We thank the reviewer for these suggestions. We performed additional analyses to examine these issues, which led us to include a new section of Results in the revised manuscript (pages 13-14 and Figure 6).

      We have added a new Figure 6 showing water consumption and body weight information in home-cage testing. At steady state, a mouse typically consumed ~1 mL of water daily (~400 rewarded trials) while maintaining stable body weight. This amount of water consumption was similar to mice engaged in daily manual experiments (Guo et al, PLoS ONE 2014). The number of head-fixations per day was correlated with body weight (Figure 6). Since body weight reflects prior water consumption, this indicates different levels of motivation due to thirst, which drives engagement in the task.

      We also examined the inter-fixation-interval. Interestingly, the inter-fixation-interval after an error (which led to no reward) was significantly longer than following a correct trial (Figure 6E). This is inconsistent with error from exploration. Rather it likely reflects a loss of motivation after an error, perhaps due to the loss of an expected reward. We suspect that error trials violated the mice’s expectation of reward, and therefore discouraged the mice, leading to a loss in motivation. Consistent with this interpretation, we also found a significant increase in inter-fixation-intervals shortly after a sensorimotor contingency reversal (Figure 6F), coinciding with an increase in error rate due to the rule change.

      Despite these changes in motivation to engage in the task, the choice behavior in the task was similar. In highly trained mice, task performance was stable despite the body weight change (Figure 6D). Logistic regression analysis of the choice behavior shows that mice maintained the same strategy in their choice behavior (Figure 6G).

      2) In the second part of the work, the authors focus on the description of choice behavior. To characterize it, the authors used a logistic model to predict choices. They suggest that at the beginning of the task the animals biased their current choice by their last choice (parameter A1) and that once the task is learned they alternate according to the current stimulation (parameter S0). The model was a logistic function of the weighted sum of several behavioral and task variables and has 19 parameters (the β parameters). If the animal only used these two pieces of information, can a model that only takes into account A1 and S0 reproduce the data? If not, this certainly indicates that other information (even distributed) is necessary; and also indicates individual strategies. Finally, analyses are made by considering trials as a discrete chain (trial n, n+1…). However, the self-head-fixed methodology causes the trials to be organized with more or less time between successive trials depending on motivation (see above). Again, do the authors note differences in performance according to the timing between trials? Could it be a variable in the model?

      We thank the reviewer for these great suggestions. We tested a model that included only choice history A1, tactile stimulus S0, and a constant bias term (β0). This 3-parameter model performed as well as the full model in predicting choice. This indicates that other factors do not contribute significantly to the choice behavior. We have included this result in the revised Figure 4C.
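
      The reduced model described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the analysis code used in the study: the -1/+1 coding of S0 and A1, the generating weights, the learning rate, and all function names are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_trials(n_trials, weights, seed=0):
    """Synthetic session. Each row is [1, S0, A1]; S0 and A1 are coded -1/+1 (assumed)."""
    rng = random.Random(seed)
    b0, b_s0, b_a1 = weights
    X, y = [], []
    prev_choice = rng.choice([-1, 1])
    for _ in range(n_trials):
        s0 = rng.choice([-1, 1])
        p_right = sigmoid(b0 + b_s0 * s0 + b_a1 * prev_choice)
        choice = 1 if rng.random() < p_right else 0
        X.append([1.0, float(s0), float(prev_choice)])
        y.append(choice)
        prev_choice = 1 if choice == 1 else -1
    return X, y

def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Fit [beta0, beta_S0, beta_A1] by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y):
    """Fraction of trials where the predicted choice (p >= 0.5) matches the actual choice."""
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        hits += int((p >= 0.5) == (yi == 1))
    return hits / len(y)

# Hypothetical generating weights: small bias, strong stimulus, weak choice history.
X, y = simulate_trials(600, weights=(0.2, 2.0, 0.5))
w = fit_logistic(X[:400], y[:400])          # fit on the first 400 trials
held_out = accuracy(w, X[400:], y[400:])    # predict the remaining 200
print(w, held_out)
```

      Comparing the held-out accuracy of such a 3-parameter fit against a fit with more regressors is one way to ask whether the extra regressors add predictive power.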

      We next examined whether inter-fixation-interval (i.e. the time elapsed between head-fixations and presumably the motivation to engage in the task) could impact mice’s choice behavior. There are multiple ways inter-fixation-interval could be incorporated into the logistic regression model. For example, it could be modeled as an explicit variable that biases left/right choice, or as a modulation of existing regressors (i.e. a gain variable that modulates the contribution of specific regressors). Each approach requires assumptions about how motivation affects the behavioral strategy of the mice. Instead, as a first order analysis, we examined whether the logistic regression model could predict choice equally well in trials following short vs. long inter-fixation-intervals. Our logic is that if mice adopted different strategies in different motivational states (reflected in short vs. long inter-fixation-intervals), the predictive power of the model would differ between these conditions. We fit the logistic regression model using trials in their natural sequential order (regardless of the inter-fixation-intervals). The model was then used to predict choice on independent trials. Trials were then sorted by the preceding inter-fixation-intervals. Prediction performance was calculated separately for trials following short vs. long inter-fixation-intervals. We did not find a significant difference in the model prediction performance. The result was similar in early and late stages of task learning (Figure 6G), even though mice used distinct strategies during these periods (Figure 4). These results suggest consistent strategies in the choice behavior. We have included this analysis in the new Figure 6.
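
      The sorting step described above can be sketched as follows. This is an illustration only: the response does not state how short and long intervals were separated, so the median split, the function name, and the toy numbers are all assumptions.

```python
from statistics import median

def accuracy_by_interval(prediction_correct, preceding_interval_s):
    """Split held-out trials at the median preceding interval (an assumed
    criterion) and return the model's accuracy for short and long intervals."""
    cutoff = median(preceding_interval_s)
    pairs = list(zip(prediction_correct, preceding_interval_s))
    short = [c for c, t in pairs if t <= cutoff]
    long_ = [c for c, t in pairs if t > cutoff]
    return sum(short) / len(short), sum(long_) / len(long_)

# Toy data: 1 = model predicted the choice correctly, 0 = it did not.
correct = [1, 1, 0, 1, 1, 0, 1, 1]
intervals_s = [5, 40, 8, 120, 3, 60, 10, 300]
print(accuracy_by_interval(correct, intervals_s))  # (0.75, 0.75)
```

      Equal accuracy in the two groups, as in this toy example, is the pattern expected if the same strategy is used regardless of the time since the last head-fixation.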

      3) The third part described optogenetic manipulations. It is clear that group sizes are small. Nevertheless, if the objective was to show that the method works, the results are convincing. Some experimental details and in particular the choice of the statistical procedure need clarification.

      We have improved the presentation and clarified experimental details of the task, hypotheses for targeting specific brain regions, and statistical procedures.

    1. Author Response:

      Reviewer #1:

      In this ms, Voroslakos et al. describe a customizable and versatile microdrive and head cap system for silicon probe recordings in freely moving rodents (mice and rats). While there are similar designs elsewhere, the added value here is: a) a carefully designed solution to facilitate probe recovery, thus reducing experimental costs and favoring reproducibility; b) flexibility to accommodate several microdrives and additional instrumentation; c) open access design and documentation to favor customization and dissemination. The authors provide a detailed description to facilitate building the system.

      Personally, I found this resource very useful to democratize multi-site recordings, not only for standard silicon probes, but also for more novel integrated optoelectrodes and Neuropixels. While there are other solutions, this design is quite simple and versatile. A potential caveat is whether it could be perceived as just an upgrade, given some similarities with previous designs (e.g. Chung et al., Sci Rep 2017 doi: 10.1038/s41598-017-03340-5) and concepts (Headley et al., JNP doi: 10.1152/jn.00955.2014). However, the system presented in this paper provides added value and knowledge-based solutions to make silicon probe recordings more accessible.

      We thank the reviewer for carefully reading our manuscript and providing useful and constructive comments.

      Reviewer #2:

      This manuscript provides an updated guide on the procedures for performing chronic recordings with silicon probes in mice and rats in the lab of the senior author, who is one of the leaders in the use of this experimental method. The new set of procedures relies on metal and plastic 3D printed parts, and represents a major improvement over the older methodology (i.e. Vandecasteele et al. 2012).

      The manuscript is clearly written and the technical instructions (in the Methods section) seem rather detailed. The main concerns I had are as follows.

      We thank the reviewer for carefully reading our manuscript and providing useful and constructive comments.

      1) The present design is an improvement over Chung et al. (the most similar previously published explantable microdrive design, as far as I am aware) in terms of the footprint and travel distance. However, a main disadvantage of the system in its present form is that (apparently) it does not support Neuropixels probes. While such probes might not be suitable for some uses (e.g. to record from large populations in dorsal hippocampus), Neuropixels probes are of considerable interest to many labs.

      Our microdrive and head cap system can also support Neuropixels probes. Since our initial submission, we have implanted a Neuropixels probe in the intermediate hippocampus of a rat using our recoverable, plastic microdrive. At the end of the experiment, the Neuropixels probe was successfully recovered, cleaned, and implanted again in a new rat. In addition, we designed a new arm for our metal microdrive which can support Neuropixels probes (Figure 2) and implanted another rat (Figures 3 and 4). We have also created a video showing how to attach a Neuropixels probe to a metal microdrive (Suppl. Video 3).

      Figure 2. Metal microdrive adapter for Neuropixels probe. A Arm design for 64-channel silicon probes. 45-degree, front, side and top views are shown (from left to right). All dimensions are in mm. B Changing the overall length (from 7.35 mm to 10 mm) and width (from 4 mm to 5.4 mm) of the 64-channel arm makes our metal microdrive compatible with Neuropixels probe. Note that only three dimensions of the 64-channel arm were modified (red numbers). 45-degree, front, side and top views are shown (from left to right). All dimensions are in mm. C Photograph of the different arm designs of the metal, recoverable microdrive (top shows an arm designed for a 64-ch silicon probe, bottom shows an arm designed for Neuropixels probe).

      Figure 3. Recording of unit firing with Neuropixels probe attached to a metal microdrive in a freely moving rat. A Metal microdrive for Neuropixels probe (a – stereotax attachment, b – drive holder, c – metal microdrive, d – Neuropixels probe and e – Neuropixels headstage). B Photo of Neuropixels probe attached to a metal microdrive (a-e same as in A). C Location of probe implantation (Bregma −4.8 mm, mediolateral +4.6 mm, 11-degree angle). D High pass filtered traces (1 s) from a freely moving rat implanted with Neuropixels probe. Note the single unit activity in the cellular layer of cortex (top) and hippocampus (bottom).

      Figure 4. Implantation of Neuropixels probe in a rat using metal microdrive and rat cap system. A The base of the rat cap is attached to the skull. Reference (ref) and ground (gnd) screws are placed over the cerebellum. Neuropixels probe is mounted on a metal microdrive. The microdrive is held by the drive holder and attached to a stereotax arm using the stereotax attachment. For more details, see video: Neuropixels_attachment.mp4. B Once the probe is inserted to its final depth (left), the base of the microdrive is cemented to the skull (zoomed in photograph on the right). C The surface of the brain is kept wet using saline during probe insertion and during cementing the base of the microdrive. D After the base is cemented, the craniotomy is sealed with bone wax. E Releasing the drive from the drive holder. Once it is released the stereotax arm is moved upwards. F Neuropixels headstage is removed from the male header of the stereotax attachment (soldering joint) and placed on the animal's back. G The walls of the cap system are attached to the base. Ground and reference wires are soldered to the probe (not shown). H The male header of the headstage is secured to the walls. The headstage and its cable are oriented to allow easy access to the screw head of the microdrive. Note that there is enough room for custom connectors inside the rat cap.

      2) The total weight of the mouse implant seems quite high (together with the headstage, I estimate it is >= 4 g). Could the authors provide the exact value, and describe whether this has any impact on the way the animal moves? Also, the authors should describe how the animals are housed (e.g. do they carry the headstage even when not being recorded). The authors say that a mouse can be implanted with more than one microdrive. The authors should clarify whether they actually have experience with such implants, or is this just a suggestion based on their educated estimate?

      The total weight of the metal microdrive, including the base, body and arm, is 0.87 g. Additional weight comes from the Metabond and dental acrylic cement. The amount of cement that is used during surgery can vary between researchers and the type of surgery. The overall weight of the assembly also depends on the silicon probe with Omnetics connector(s) that is used for the surgery, e.g., a 32-channel micro-LED probe is 1.11 g (NeuroLight Technologies LTD.), a 64-channel 4-shank probe is 0.96 g (ASSY E-1, Cambridge NeuroTech), a 64-channel 5-shank probe is 1.05 g (A5x12-16-Buz-Lin-5mm, NeuroNexus Ltd.) and a 128-channel 4-shank probe with integrated Intan chips is 0.94 g (P128-5, Diagnostic Biochips). In addition, the overall weight of the entire assembly can change if optic fibers are used in optogenetic studies or if any custom connectors are implanted (e.g., connector and wires for brain stimulation). That is the reason why we reported the overall weight of each system (metal microdrive, mouse cap and rat cap) individually.
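
      For a quick comparison, the drive and probe weights listed above can be summed per configuration. This is a partial budget only: cement, the cap, optic fibers, and custom connectors are deliberately left out because their weights are not given here, and the dictionary keys are shorthand labels, not official product names.

```python
DRIVE_G = 0.87  # metal microdrive: base, body and arm

# Probe weights (with Omnetics connectors) as listed in the response.
PROBES_G = {
    "32-ch micro-LED": 1.11,
    "64-ch 4-shank": 0.96,
    "64-ch 5-shank": 1.05,
    "128-ch 4-shank": 0.94,
}

def drive_plus_probe_g(probe_g, drive_g=DRIVE_G):
    """Weight of drive plus probe in grams, excluding cement, cap and headstage."""
    return round(drive_g + probe_g, 2)

for name, grams in PROBES_G.items():
    print(name, drive_plus_probe_g(grams))
```

      With these numbers every drive-plus-probe combination falls between 1.81 g and 1.98 g before cement and cap are added.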

      The implanted mice are single housed, and they do not carry the headstage while in the vivarium. During recordings, the headstage is attached and a counterbalanced pulley system ensures that the animal is not carrying the extra weight of the headstage. We have quantitatively compared running speed with traditional and the new head caps in both rats and mice (Fig. 6).

      The small footprint of the metal microdrive enables researchers to perform more than one silicon probe implantation in freely moving mice. For this purpose, larger mice (>35 g) are selected (Figure 5).

      Figure 5. Metal microdrive enables double silicon probe recordings in freely moving mice. A Intraoperative photograph of double silicon probe implantation. Note that the metal microdrive on the left had been secured to the skull and the second drive is being implanted using the stereotaxic attachment and drive holder. The probe PCBs are placed on the copper mesh. B Photograph focused on the metal microdrives.

      3) There is no information in the results section on the number of implants performed, the duration the animals were implanted, the quality of the recordings obtained, or the number of successes or failures. The figures merely provide examples of one successful recording in a mouse and in a rat. All these details should be provided, along with details of how many probes were reused and how many times (a brief mention of one case, lines 252-253 and 359-360, is not sufficient).

      We have added a Supplementary Table explaining all the details of our implants. We would like to refer the Reviewer to response #1 to Reviewer 1.

      Adopting new technology is challenging. To date, we have extensive experience with the rat cap system only (n = 3 users in the lab, n = 25 rats implanted). Two lab members have started to adopt our mouse cap and implanted 3 mice since our submission. We included their maze running behavioral data for comparison between the copper mesh and cap system.

      Prior to the development of the metal microdrive, we conducted an internal lab survey comparing the hand-made microdrive (Vandecasteele et al., 2012) and our recoverable, plastic microdrive. Six lab members who had extensive experience with both types participated (Figure 6). Our questions were:

      1) On a scale of 1-10, how would you compare the plastic, recoverable drive to the Vandecasteele et al. 2012 one in terms of: a) ease of building a drive, b) size and c) ease of recovery.

      Figure 6. Internal lab survey using recoverable, plastic microdrives. A User feedback based on four criteria: ease of building, ease of implantation, size, ease of recovery. The 3D printed microdrive surpasses the manually built drive (Vandecasteele et al., 2012) on every parameter except the size. B 24 silicon probes were used with the recoverable plastic microdrive. On average each probe was recovered two times. Out of these 48 recovery attempts, only 5 failed. There were 2 total losses during recovery, and in 3 cases a varying number of shanks broke during the recovery process, making the recovery only partially successful. One major limitation of reusability is the sudden increase in impedance over time (we have to discard 30% of the successfully recovered probes for this reason). Researchers in our lab spend on average 30 minutes to recover a silicon probe.

      Overall, the success rate of recovery is much higher using a recoverable microdrive system, but the size of the plastic, recoverable microdrive limits certain experiments. This was one of the main motivations to develop the metal, recoverable microdrive.
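
      The recovery counts reported in Figure 6B translate into rates as follows. This is a back-of-the-envelope sketch: the function name is illustrative, and applying the 30% impedance-related discard to the fully successful recoveries is one reading of the caption, not a figure stated by the authors.

```python
def recovery_summary(attempts, total_losses, partial_recoveries, discard_fraction):
    """Convert recovery counts into per-attempt rates.

    discard_fraction is applied to fully successful recoveries only,
    which is an interpretation of the caption.
    """
    full = attempts - total_losses - partial_recoveries
    return {
        "full_success_rate": full / attempts,
        "total_loss_rate": total_losses / attempts,
        "partial_rate": partial_recoveries / attempts,
        "reusable_rate": full * (1.0 - discard_fraction) / attempts,
    }

summary = recovery_summary(attempts=48, total_losses=2,
                           partial_recoveries=3, discard_fraction=0.30)
for key, value in summary.items():
    print(f"{key}: {value:.1%}")
```

      With the reported counts, about 90% of attempts were fully successful, and under the stated reading roughly 63% of attempts yielded a probe that could be implanted again.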

      4) In fig. 2, spike waveforms are classified as pyramidal, wide or narrow interneurons. I did not find any description of how this classification was performed.

      We have removed the single cell putative cell types from the manuscript as this issue is not relevant to the current manuscript. Figure 2 has been simplified and a new figure 5 is dedicated to the single cell quantification.

      5) Also in fig. 2, refractory period violations are reported in percent (permille in fact). First, it is not clear how the refractory period was defined. Second, such quantification is incorrect in principle: we use refractory period violations to infer the rate of false positives. Yet the relationship between the fraction of ISI violations and the false positive rate depends on the firing rate of the neuron. For example, 0.1% of ISI violations is quite good for a unit spiking at 10 spikes/s, is so-so for a unit spiking at 1 spike/s, and is very bad if the firing rate is 0.1 spike/s (see Hill et al. JNeurosci. 2011 for derivation). Alternatively, the authors can follow an approach described in an old paper by the same lab (Harris et al., J Neurophysiol. 2000), quantifying the violations in the spike autocorrelogram relative to its asymptotic height.

      We have removed this panel from Fig. 2 and dedicated a new figure (Fig. 5) to the single cell quantification. Refractory violations can be used as an alarm for poor cluster quality. Absence of refractory violations alone does not guarantee good separation for the reasons the Reviewer mentioned.
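
      The firing-rate dependence raised in this comment can be made concrete. The sketch below assumes the relation commonly attributed to Hill et al. (2011), in which the fraction of spikes violating the refractory period is v = 2 * tau * rate * f * (1 - f), with f the false positive fraction and tau the effective refractory window. The 2 ms window and the function name are assumptions, and this is not the quantification used in the manuscript.

```python
import math

def false_positive_fraction(isi_violation_fraction, firing_rate_hz,
                            refractory_window_s=0.002):
    """Solve v = 2 * tau * rate * f * (1 - f) for the smaller root f.

    Returns None when no real root exists, i.e. the violations exceed
    what this simple contamination model can account for.
    """
    k = isi_violation_fraction / (2.0 * refractory_window_s * firing_rate_hz)
    disc = 1.0 - 4.0 * k
    if disc < -1e-9:
        return None
    disc = max(disc, 0.0)  # absorb floating-point noise at the boundary
    return (1.0 - math.sqrt(disc)) / 2.0

# The same 0.1% violation fraction at three firing rates.
for rate_hz in (10.0, 1.0, 0.1):
    print(rate_hz, false_positive_fraction(0.001, rate_hz))
```

      Under these assumptions, 0.1% violations corresponds to roughly 2.6% contamination at 10 spikes/s, about 50% at 1 spike/s, and has no valid solution at 0.1 spike/s, which is why a violation percentage alone is hard to interpret.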

      6) Line 477: the authors write that the probes were mounted on a plastic microdrive. This seems to contradict the key claim of the manuscript (namely that the microdrives were from stainless steel).

      We apologize if this description was not clear in the original manuscript. In the revised version, we have added a table (Suppl. Table 1) explaining all details of each animal subject (species, strain, weight, cap type), type of silicon probe and microdrive used. As we explained in Response 3, our main goal was to test each system individually and once all components have been verified, we combined everything into one surgery.

      The plastic and metal microdrives are based on the same principles. The implantation/recovery tools are also identical in design concepts. Based on our own experience, users do not notice any changes in terms of ease of use, ease of implantation and ease of recovery when changing from plastic recoverable microdrives to metal ones. The advantages of metal drives are size reduction, multiple reusability and stability.

      7) I believe that the work of Luo & Bondy et al. (eLife 2020) should be referenced and compared to.

      We reference Luo et al. (2020) in our revised manuscript. One of the main advantages of using a microdrive system is the ability to move the recording probe inside the brain tissue and sample new sets of neurons. This is not the case in Luo & Bondy et al. (eLife 2020).

    1. Author Response:

      Reviewer #1:

      The authors note how previous studies on myocardial infarction have usually studied individual tissues and not examined the cross talk between tissues and their dysregulation. To address this challenge they have therefore performed, in a mouse model of MI, an integrated analysis of heart, liver, skeletal muscle and adipose tissue responses at 6 and 24 hours. They have then validated their findings at 24 hours in two independent mouse model data sets.

      A major strength is their comprehensive approach. They have used high throughput RNA seq and applied integrative network analysis. They show for multiple genes whether they are up regulated or down regulated in these four tissues at the 6 and 24 hour time points and whether the regulation directions are concordant or opposite and note in particular that for the liver both concordant and opposite effects occur. They identify key tissue specific clusters in each tissue and identify the key genes in each cluster. Finally they use whole body modelling to identify cross talk between tissues.

      A further strength of this paper is the integration of transcriptomic data (differential expression, functional analysis and reporter metabolite analysis). The final strength is the very clear presentation of the findings and their implications such that the reader gets a very clear message and at the same time can go in to more detail if this is their area of research interest.

      There are no major weaknesses. The authors have achieved their aims and the data supports their conclusions.

      This work represents a major advance in both methodology and understanding of a multi-tissue approach to the study of the metabolic impact of MI and the underlying up and down regulation of relevant genes.

      The relevance of these findings in human MI will need to be tested and may ultimately have therapeutic implications.

      First and foremost, we would like to thank the reviewer for the positive and encouraging comments. We agree that further research, especially rigorous validation of the findings from this work in humans, is needed, and we hope it can be translated into clinical settings. Moreover, we would like to thank the reviewer for highlighting our comprehensive approach, which we hope can be a framework for future multi-tissue research in disease settings.

      Reviewer #2:

      The authors collected post-myocardial infarction (MI) transcriptome data from a mouse model as well as sham-operated control mice to identify systemic molecular changes in multiple tissues at pathway level. The data were collected at two time points (6 hours and 24 hours post-MI), and several computational systems biology tools were applied to the dataset to identify altered molecular processes. The applied tools vary from very standard tools (eg. enrichment analysis) to advanced methods based on mapping data on biological networks. A specific focus was put on the altered signaling pathways as well as metabolic pathways and metabolites. Identified up-/down-regulated pathways were in agreement with the literature.

      Strengths:

      • One unique aspect of the work is the fact that the transcriptomic data were collected not only from heart, the source tissue for MI, but also from three more tissues (liver, skeletal muscle, adipose). Therefore, molecular alterations in the related tissues could also be monitored and discussed comparatively. The introduced transcriptomic dataset has a high re-use potential by other researchers in the field since coverage of responses by four tissues at two different time points makes it unique.

      • Correlation-based coexpression networks were created for all four tissues, and some of the clusters in these networks were shown to be tissue-specific clusters, which nicely validates both the experimental and computational approach in the paper.

      • The results were validated by using independent transcriptomic datasets available in the literature. The authors showed that there is a high overlap between their dataset and the literature datasets in terms of identified differentially expressed genes and enriched pathways. This additional validation strengthens the results reported in the manuscript.

      • Use of a variety of computational approaches and showing that they point to similar or complementary molecular mechanisms increase the impact of the paper. The employed computational tools include not only information-extraction methods such as enrichment, coexpression networks, reporter metabolites, but also predictive methods based on modelling. The authors construct a multi-tissue genome-scale metabolic network covering all four tissues of interest in the study, and they show that this model can correctly predict some major post-MI changes in the metabolism. It is interesting to see that two completely different computational approaches (constraint-based metabolic modeling versus information-extraction based approaches) point to same/similar molecular mechanisms.

      We would like to thank the reviewer for providing positive comments and a comprehensive summary of our work. We also really appreciate the constructive comments from the reviewer to improve our work.

      Weaknesses:

      • Regarding predictions made by multi-tissue metabolic network modeling, the control case fluxes were predicted by maximizing the rate of lipid droplet accumulation in the adipose tissue. Although there is an agreement between the model predictions and the results obtained by other bioinformatics tools used in the study as well as literature information, it seems rather an oversimplification to assume that all other three tissues are programmed to serve maximum fat production in adipose tissue. This should be further elaborated by the authors.

      We would like to thank the reviewer for the comment and we agree that there is a simplification of the situation in the modeling. However, we would also like to emphasize that the model has been carefully constrained with the dietary composition as well as the tissue specific resting energy expenditure. In our opinion, these constraints have already included a great part of the metabolic activity and satisfied the basic metabolic needs of the mice. The rest of the energy in the diet could be either used for physical activity (energy production in muscle) or stored as fat (lipid droplet accumulation in adipose tissue), and in our analysis, we assumed the latter as we think it is more realistic in this study, since mice in the cage might have very little physical activity. We added a clarification for this in the revised manuscript as follows,

      ‘To simulate the metabolic flux distribution in the sham-operated mice, we set the lipid droplet accumulation reaction in adipose tissue (m3_Adipose_LD_pool) as the objective function as we assume the energy additional to the resting energy expenditure will be mostly stored as fat rather than used by the muscle for physical activities because mice raised in the cages might have very little exercise. Then, we used parsimonious FBA to calculate the flux distribution.’
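
      The two-step calculation described in this passage (maximise an objective flux, then pick the most parsimonious flux distribution that still achieves it) can be illustrated with a minimal sketch. The network below is a hypothetical six-reaction toy, not the authors' multi-tissue model: the reaction names, the uptake cap of 10 and the maintenance flux of 4 are assumptions chosen only to mimic a dietary constraint and a resting energy expenditure. It uses `scipy.optimize.linprog` as a generic LP solver rather than any dedicated metabolic modelling toolbox.

```python
from scipy.optimize import linprog

# Hypothetical toy network (NOT the authors' multi-tissue model).
REACTIONS = ["uptake", "maintenance", "direct", "detour_1", "detour_2", "ld_accumulation"]

# Stoichiometric matrix: rows are metabolites A, B, LD; columns follow REACTIONS.
# Steady state requires S @ v = 0.
S = [
    [1, -1, -1, -1, 0, 0],  # A: made by uptake, used by maintenance, direct, detour_1
    [0, 0, 0, 1, -1, 0],    # B: intermediate of the longer detour route
    [0, 0, 1, 0, 1, -1],    # LD: made by direct and detour_2, drained by accumulation
]
STEADY_STATE = [0, 0, 0]

# Assumed constraints: diet caps uptake at 10, resting expenditure fixes maintenance at 4.
BOUNDS = [(0, 10), (4, 4), (0, None), (0, None), (0, None), (0, None)]
OBJECTIVE = REACTIONS.index("ld_accumulation")


def parsimonious_fba():
    # Step 1 (FBA): maximise the objective flux. linprog minimises, so negate it.
    c = [0.0] * len(REACTIONS)
    c[OBJECTIVE] = -1.0
    fba = linprog(c, A_eq=S, b_eq=STEADY_STATE, bounds=BOUNDS, method="highs")
    optimum = -fba.fun

    # Step 2 (parsimony): keep the objective at its optimum (small numerical
    # tolerance) and minimise total flux. Every reaction here is irreversible,
    # so total flux is simply the sum of the flux values.
    bounds = list(BOUNDS)
    bounds[OBJECTIVE] = (optimum - 1e-9, None)
    pfba = linprog([1.0] * len(REACTIONS), A_eq=S, b_eq=STEADY_STATE,
                   bounds=bounds, method="highs")
    return optimum, dict(zip(REACTIONS, pfba.x))


optimum, fluxes = parsimonious_fba()
print(optimum, fluxes)
```

      In this toy case the energy left after maintenance is routed entirely to lipid droplet accumulation, and the parsimony step selects the shorter direct route over the two-step detour.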

      Reviewer #3:

      In the manuscript, "Integrative transcriptomic analysis of tissue-specific metabolic crosstalk after myocardial infarction" by Arif et al., the authors describe analyses of transcriptomes of +/- myocardial infarction (MI) mice. The study is useful and reports interesting results. These results could be of interest to further develop cellular insight in effects and treatments for MI. However, I do not find any methodological advances here. The manuscript appears to be a repository of transcriptomics analyses. All the techniques used have been tried and applied to other scientific problems. The authors have presented differential expression analysis, followed by GSEA, and then they perform different network analyses - co-expression networks, reporter analyses, multi-tissue model, etc.

      My main issues are that the authors do too many different analyses, but none of them gets sufficient attention in the paper. Also, no other independent quantitative evidence is shown in support of the results of their analyses. Further, validation was done the same way the pipeline was built. This makes their results come across as circular. For example, when validating metabolic models of cells built using transcriptomic data, CRISPR-Cas9 essentiality screens are used. Here, it appears they basically repeated the same analyses on transcriptome data from a different experiment.

      First of all, we would like to thank the reviewer for the positive summary of our research. We agree that this study is worth exploring further, especially by validating it in humans. We would also like to thank the reviewer for the constructive comments. We agree that we presented multiple transcriptomics analyses that have been used before. Apart from understanding the metabolic effect of MI in multiple tissues (which is unique as of now), our secondary goal is to propose a novel integrative framework for analyzing multi-tissue transcriptomics data based on the available techniques. We would like to emphasize that, even though the individual analyses were not novel, the integrative analysis in a multi-tissue and disease setting, at both the transcriptomic and the metabolic crosstalk level, is a strong novelty of this study. This required employing not only state-of-the-art network analyses but also reconstruction of multi-tissue models through new methods that enable joint modeling of the metabolic interactions within and between tissues.

      As this study is unique (as of now), we tried our best to validate it with other data from similar settings (we found only transcriptomics data, from one tissue) and ran our pipeline on them to validate and strengthen our findings. Moreover, we also recognize the limitation that all the results presented in this study are based purely on transcriptomics data (as stated in the “Discussion” section of the manuscript). More experiments, such as with metabolomics and proteomics data, are in our pipeline to complement the results from the current study. In summary, we recognize the reviewer’s concerns and would like to address them in our future studies.

    1. Author Response:

      Evaluation Summary:

      The paper describes an algorithm that combines epidemiological and sequence data to provide a rapid assessment of the probability of healthcare-associated infections among hospital onset SARS-CoV-2 infections, that also may be associated with outbreak events. There is an urgent need for tools that can synthesise multiple data streams to provide real time information to healthcare professionals. It is questionable to what extent the tool presented is generalisable to medical facilities outside of the specific data rich settings considered here, or if the tool is useful for prospective analyses. This study would be of interest to specialists working in hospital infection prevention, with more limited further interest.

      We thank eLife for the commentary on our work. We agree that there is a need for robust prospective evaluation of routine viral sequencing of SARS-CoV-2 for Infection Prevention and Control and of this tool specifically. Our research group is conducting such work within a multi-centre prospective study that is currently ongoing https://clinicaltrials.gov/ct2/show/NCT04405934, https://doi.org/10.1101/2021.04.13.21255342.

      Reviewer #1 (Public Review):

      -In the present paper the authors have attempted to develop a novel statistical method and sequence reporting tool that combines epidemiological and sequence data to provide a rapid assessment of the probability of HCAI among HOCI cases (defined as first positive test >48 hours following admission) and to identify infections that could plausibly constitute outbreak events.

      -As healthcare-associated infections in hospitals present a significant health risk to both vulnerable patients and healthcare workers, significant improvements to provide a rapid assessment of the probability of HCAI among HOCI cases is of utmost importance in a pandemic setting.

      -The strength of the paper is that they have successfully used a large number of virus sequence data from two UK cities with selected hospitals and developed a statistical method to bring these together with classical epidemiological data, which has resulted in a sequence reporting tool (SRT) that was evaluated in relation to:

      -The IPC classification system recommended by PHE,

      -The PHE definition of healthcare-associated COVID-19 outbreaks (using a 2 SNP threshold).

      -They show the added value of combining the two systems. Obviously, this can only work prospectively in a setting like the UK, where a system like the COVID-19 Genomics (COG) UK initiative is effectively in place. They conclude that, through retrospective application to clinical datasets, they have demonstrated that the methodology is able to provide confirmatory evidence for most PHE-defined definite and probable HCAIs and provide further information regarding indeterminate HCAIs. Therefore, the SRT may allow IPC teams to optimise their use of resources on areas with likely nosocomial acquisition events.

      -The acquisition of the extensive prospective datasets necessary to use the system requires a non-negligible investment that is possible in a setting in which sequencing routine and phylogenetic analyses can be carried out in real time. The added value of the methodology should eventually justify the investment.

      We thank the reviewer for their summary and commentary on our work. We agree that full evaluation of the use of viral sequencing for clinical practice requires health economic analysis of the associated costs relative to potential gains, and this is planned within our ongoing research program on this topic.

      Reviewer #2 (Public Review):

      Since early 2020, the SARS-CoV-2 pandemic has presented numerous challenges to healthcare facilities around the world. Given the highly transmissible nature of SARS-CoV-2 virus, and the confined nature of most hospital settings, hospital acquired infections with SARS-CoV-2 are a frequent occurrence and pose major challenges for hospital infection prevention teams. The increasing use of genomic epidemiology, facilitated by cheaper/faster genetic sequencing tools and user-friendly algorithms for data analysis, creates new opportunities for using virus sequencing to track virus spread in healthcare facilities. While opportunities are increasing, there remain two important bottlenecks to meaningful and widespread use of genomic epidemiology in well-resourced healthcare settings - 1. the turnaround time from sample collection to delivery of sequenced and analysed result; 2. a lack of training among many infection prevention personnel in interpreting genomic epidemiology output.

      The study by Stirrup et al tries to alleviate these issues through the development of an algorithm that synthesises inferences from virus genetic sequences and hospital epidemiological data to provide easy to interpret information about whether or not there is likely to be ongoing virus transmission within a medical facility. In general, these kinds of approaches are highly worthwhile and can have important translational value as they facilitate the use of powerful new technologies without necessarily requiring extensive professional training to interpret the results. Indeed, there is an urgent need for tools that can synthesise multiple data streams to provide real time information to healthcare professionals.

      In this study, the authors describe their new algorithm and apply it in two retrospective cases to evaluate its potential value to provide valuable information to infection control teams. While it seems clear that the algorithm reliably detects nosocomial transmission in situations where there are obvious hospital outbreaks, it is much less clear that it performs meaningfully in situations where nosocomial transmission is more questionable. To this end, it is not clear if the algorithm provides useful or meaningful information that would help to reduce the burden of hospital acquired SARS-CoV-2 infections. Towards the end of the discussion section, the authors mention that analyses on the utility of the algorithm in prospective use cases were ongoing from late 2020 to early 2021. These analyses will provide essential information on the value of this tool.

      While the development of these sorts of tools is important, it is unclear from this study if the tool has value in prospective use or if it would be useful in settings where virus genetic sequencing is less frequent and/or slower than the retrospective use cases considered here. Additionally, in many infection prevention scenarios the existence of an outbreak is clear but tracing the routes of transmission is the primary object of investigation. Because the algorithm does not include phylogenetic information, tracing potential transmission routes is not possible.

      We thank the Reviewer for their commentary on our work. Our ongoing prospective study on implementation of the reporting tool includes intervention phases both with a ‘rapid’ target turnaround of 48 hours from sampling and with a ‘slow’ target turnaround of 5-10 days, and this will generate data on the relative utility of viral sequencing within these timeframes. We acknowledge that the reporting tool developed does not evaluate evidence of direct transmission between case pairs, although it should also be noted that phylogenetic investigation alone cannot be used to confidently infer direct transmission linkage for SARS-CoV-2. We feel that the algorithm and report format can flag potential transmission routes to IPC teams, through the identification of close sequence matches within the hospital as a whole and highlighting of any matching previous ward locations (although the latter is not used in the probability calculations).
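
      The idea of flagging close sequence matches under a SNP threshold can be sketched as follows. This is an illustration only, not the sequence reporting tool's probability calculation: the sample identifiers and sequences are invented, the sequences are assumed to be pre-aligned to equal length, and treating "within 2 SNPs" as a pairwise distance of at most 2 (ignoring ambiguous bases) is an assumption about how such a threshold might be applied.

```python
from itertools import combinations

# Positions holding these symbols are skipped rather than counted as differences.
AMBIGUOUS = {"N", "-"}


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ, ignoring N and gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in AMBIGUOUS and b not in AMBIGUOUS
    )


def close_matches(sequences, max_snps=2):
    """Return (id_a, id_b, distance) for every pair within max_snps of each other."""
    pairs = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(sorted(sequences.items()), 2):
        distance = snp_distance(seq_a, seq_b)
        if distance <= max_snps:
            pairs.append((id_a, id_b, distance))
    return pairs


# Hypothetical, very short "genomes" purely for illustration.
example_sequences = {
    "ward_a_1": "ACGTACGTAC",
    "ward_a_2": "ACGTACGTAT",
    "ward_b_1": "ACGAACGTNT",
    "community_1": "TCGAATGTGC",
}
for match in close_matches(example_sequences):
    print(match)
```

      As the authors note, a close match of this kind can point an IPC team towards a possible link but does not by itself establish direct transmission between two cases.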

    1. Author Response:

      Reviewer #3 (Public Review):

      About 30 million years ago the ancestors of Old World primates lost the ability to produce the glycan a-gal due to the fixation of several loss-of-function mutations in the GGTA1 gene. The evolutionary advantage of such loss remains elusive. The current study builds upon previous work by the authors showing (i) that the presence of a-gal expressing bacteria in ggta1 deficient mice led to production of antibodies capable of clearance of malaria-causing plasmodia carrying a-gal (Yilmaz et al., 2014), and (ii) that ggta1 deficiency is associated with increased resistance to sepsis via the enhancement of IgG effector function (Singh et al., 2021). Here they expand on these findings to show that ggta1 deletion in mice is associated with altered composition of the gut microbiome due to the action of IgA targeting of a-Gal expressing bacteria. In addition, they show that the absence of a-gal results in a microbiome that is less pathogenic (i.e., less likely to induce sepsis in their experimental model). Although some aspects of the work are not very novel (e.g., the fact that ggta1 is associated with a remodeled microbiome had already been shown in their previous publications) the work does provide additional insights into the pleiotropic role of ggta1 in immune function, susceptibility to sepsis, and eventual fitness advantage. The work is extremely well done and all conclusions are supported by solid data. Indeed, I felt that the authors were reading my mind every step of the way. Each time I questioned one of the conclusions the next paragraph would address that exact concern. There are, however, a few points that I think would deserve additional clarification.

      1 - I was a little surprised that they found no difference in the microbiome of F2 mice between a-gal deficient and wild-type mice. Although I understand that this might be due to antibodies received from the mother, the fact that the divergence is only seen in F3 to F5 would also be compatible with drift and not necessarily a genotype-driven phenotype. Are the microbiome differences detected in F3-F5 overlapping with those observed at F0? If the original differences were controlled by host genetics - the hypothesis being tested - we would expect to see some convergence (at least at the level of specific taxa).

      We agree essentially with the comment: “… would also be compatible with drift and not necessarily a genotype-driven phenotype” and have addressed this issue by adding the following statement in the Discussion section:

      “On the basis of this observation alone (Figure 1), one cannot exclude the observed divergence in the microbiota bacterial population frequencies of wild type vs. Ggta1-deleted mice (Figure 1) from being a stochastic event. However, the observation that these changes occur via an Ig-dependent mechanism that differs in wild type vs. Ggta1-deleted mice (Figure 3) does support that loss of αGal contributes critically to shape the microbiota composition of Ggta1-deficient mice.”

      We have previously shown that homogenization of the microbiota occurs between the littermates in the F2 generation (Singh et al., 2021). Having confirmed this finding in this manuscript (Figure 1C, Figure 3-figure supplement 7A-B), we find that the effect of the genotype and Ig is seen only from the F3 generation onwards (Figure 1D-F, Figure 3). Presumably, the inability of F1 Ggta1+/- mothers to produce anti-αGal antibodies accounts for the absence of overt shaping of the F2 microbiota. In these experiments, anti-αGal antibodies can only be generated from αGal-deficient F2 Ggta1-/- mice, being vertically transferred and shaping the microbiota from F3 Ggta1-/- mice onwards. We propose that the differences in the microbiota composition of the two F3 genotypes onwards are driven by a cumulative effect of maternal anti-αGal antibodies over the offspring microbiota composition.

      2 - I was really surprised that ggta1 deficient mice lacking a functional adaptive immune system (Figure S8) were equally resistant to systemic infection with the cecal inoculum isolated from ggta1 deficient mice. In the previous work they show that the increased resistance to sepsis comes from increased effector function of IgG. If that is the case, how come mice not having an adaptive system (hence no IgG) are equally protected? Is the pathogenicity of the microbiome of ggta1 deficient mice that reduced? It seems unlikely. More generally, I would like to have seen a better discussion about how these new findings connect to their past work. In the context of increased resistance to sepsis what seems to be more important - the remodeling of the microbiome by IgA or the increased effector function of IgG?

      The data reported in our manuscript does indeed support the conclusion that shaping of the microbiota composition of Ggta1-deficient mice is associated with an overall reduction of the microbiome pathogenicity. This finding is in keeping with host-microbe commensal interactions not being hard-wired but instead oscillating from pathogenic to symbiotic (Ayres, 2016; Vonaesch et al., 2018). Our findings suggest that the loss of Ggta1 function can modify the nature of host-microbiota interactions, through a mechanism whereby the absence of host αGal and the emergence of antibodies targeting this glycan in microbes, shapes and reduces the microbiome pathogenicity.

      We have shown that loss of αGal can enhance resistance to bacterial sepsis via a mechanism that increases IgG effector function (Singh et al., 2021). This was demonstrated by systemically infecting Ggta1-deficient mice with a “non-shaped” microbiota inoculum, isolated from Ggta1-deficient mice lacking adaptive immunity (Rag2-/-Ggta1-/- mice). As discussed in the manuscript “the gut microbiota of Rag2-/-Ggta1-/- mice, lacking adaptive immunity, is highly enriched in pathobionts such as Proteobacteria, including Helicobacter (Singh et al., 2021)”. Under these experimental conditions, resistance to infection is IgG dependent, explaining why modulation of IgG effector function by αGal impacts on the outcome of sepsis.

      In the current manuscript we describe another survival advantage against bacterial sepsis associated with Ggta1 deletion in mice. Namely, antibodies generated by Ggta1-deficient mice can shape and reduce the microbiota pathogenicity. This was demonstrated by infecting systemically Ggta1-deficient mice lacking adaptive immunity (Rag2-/-Ggta1-/- mice) with a “shaped-microbiota” inoculum isolated from Ggta1-deficient mice. While the mechanism underlying microbiota shaping is antibody-dependent, the effector mechanism conferring resistance against the shaped microbiota acts irrespectively of adaptive immunity, including IgG. This conclusion is supported by the observation that systemic infection by the shaped microbiota (isolated from Ggta1-deficient mice) failed to induce sepsis in Rag2-/-Ggta1-/- mice, which was not the case upon systemic infection with a non-shaped microbiota (isolated from Rag2-/-Ggta1-/- mice). We conclude that Ggta1 deletion in mice increases resistance to bacterial sepsis via two interrelated antibody-dependent mechanisms: i) Increased IgG effector function (Singh et al., 2021) and ii) Antibody shaping and reduction of microbiota pathogenicity (current manuscript). To what extent these two traits are related remains to be established.

      It is possible that, similarly to what was demonstrated for IgG (Singh et al., 2021), the absence of αGal from glycan structures in other Ig isotypes, including IgA, might modify their effector function. We do not yet know if this is the case, as in our manuscript, what we find is an altered antibody response targeting immunogenic bacteria in the microbiota of Ggta1-deficient mice. This is associated with modulation of the microbiota bacterial composition, i.e. antibody shaping of the microbiota, and with a reduction of the microbiome pathogenicity. The latter explains why the Ggta1-deficient mice do not rely on circulating antibodies to prevent the development of sepsis upon systemic infection by bacteria emanating from their own “shaped” microbiota.

    1. Author Response:

      Reviewer #1 (Public Review):

      Facial muscles control the execution of essential tasks like eating, drinking, breathing and (in most mammals) tactile exploration. The activity of motor neurons targeting different muscles is coordinated by premotor regions distributed throughout the brainstem. The precise identity of these cells and regions in adults is presently unclear, largely due to technical challenges. In the current work, Takatoh and colleagues develop an elegant strategy to label premotor neurons that target select muscles and register these cells on a common digital atlas. Their work confirms and also extends previous studies in neonates and provides a useful resource for the field.

      We thank Reviewer 1 for the positive evaluation.

      Reviewer #2 (Public Review):

      The authors describe a variant of retrograde monosynaptic rabies tracing from skeletal muscle. They make use of AAV2-retro-Cre to infect brainstem motoneurons projecting to muscles involved in regulation of orofacial movements (whisking, genioglossus, masseter motoneurons). The strategy that worked most efficiently and with specificity was to inject AAV2-retro-Cre intramuscularly at P17, followed 3 weeks thereafter by central injection of Cre-dependent AAVs expressing TVA and oG, and 2 weeks thereafter followed by central injection of EnvA(M21)-ΔG-RV-GFP. Five days after this final injection, experiments were terminated to analyse the distribution of premotor neurons. This allowed the authors to reconstruct and compare the distribution of premotor neurons to the whisking (lateral 7N), tongue-protruding genioglossus (12N), and jaw-closing masseter (5N) motoneurons. To do so, they used the Allen Brain Atlas as a reference for 3D reconstruction, into which they integrated all data. Notably, the authors found that for all three injection types, the highest density of neurons was found in the IRt and PCRt, but the precise peak of highest density was consistently distinct for the three different injection types. The peak for whisker premotor neurons was most caudal-ventral, for masseter premotor neurons most rostro-dorsal, and for tongue-protruding genioglossal premotor neurons in between these. The authors also make use of the strong expression of fluorescent proteins through rabies virus to analyse collateralization to other motor nuclei. Interestingly, they found cross-talk to other motor nuclei in selective patterns, supporting a model whereby some premotor neurons to one brainstem motor pool also interact with other output circuits, perhaps to coordinate orofacial behaviors. Using a split-Cre retrograde approach from motor nuclei, dual-projecting premotor neurons were identified to be located in dorsal IRt and SupV.

      This is a high-quality study making use of several methods not previously brought together in one study. Particularly interesting is the 3-way virus strategy in wild-type mice allowing visualization of premotor neurons in the adult. Second, alignment in a common reference brain is also very useful. And finally, the beginning of understanding dynamics of premotor circuit distribution between development and adult is also a value of this paper. Overall, the study is very interesting for the field.

      We thank Reviewer 2 for the positive evaluation.

      Reviewer #3 (Public Review):

      Orofacial actions show exquisite coordination among many muscles, yet the pools of motor neurons exciting each of these muscles is specific to that muscle. The coordination of activity across muscles therefore relies on circuits of premotor neurons that excite the motor neurons. Work by the authors and others has produced major progress in delineating these complex premotor circuits. Recent work using transsynaptic viral tracing has overcome limitations associated with traditional retrograde tracing methods, such as a lack of adequate specificity. However, these transsynaptic viral methods have been unsuccessful in animals older than approximately postnatal day 8 (P8). This is a problem because circuits continue to develop far beyond P8 in mice. Here, the authors overcome this limitation by introducing a novel viral transsynaptic tracing method that can be applied in adult mice.

      The authors apply their method to trace premotor circuits for whisking, licking, and jaw movements. They align their anatomical data to the Allen Mouse Brain Common Coordinate Framework and make it available with the manuscript, greatly facilitating its quantitative use by other laboratories. The authors find premotor circuits in adult mice that are almost entirely consistent with results from younger mice, with some important exceptions that they highlight and discuss. The authors quantify overlap of premotor circuits for whisking, licking and jaw movements and discuss the implications of interactions among these circuits.

      The experiments and analysis are carefully performed, and the results put into proper context. Overall, this is a straightforward and valuable contribution to our knowledge of the premotor circuits that coordinate orofacial behaviors. It will be of wide interest to neuroscientists.

      Suggestions:

      -The methods applied in neonatal mice (Takatoh et al. 2013; Stanek et al. 2014), while obviously different, are similar enough that it may be worth including discussion of any possible ways that differences between the neonatal and adult results could be due to methods, rather than age. I defer to the authors about whether such discussion is worthwhile, but readers may benefit from knowing what was considered.

      We have now added the technical considerations that may cause differences in the tracing patterns (lines 505-517).

      -Spatial correlation in Figure 5C. To interpret this properly it's important to know the degree of smoothing. I could not find this in the relevant methods section describing the kernel density estimation or elsewhere.

      Same as the above: The cells detected in each mouse were first registered into the standard three-dimensional brain model. The (x, y, z) coordinates of each cell were then extracted, and multivariate kernel smoothing density estimation was applied (bandwidth = 1). The resulting kernel density estimate was then vectorized, and the cosine similarity between each pair of mice was calculated to form the correlogram.
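
      The pipeline described in this response (coordinates, kernel density estimate, vectorisation, pairwise cosine similarity) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it uses an isotropic Gaussian kernel with bandwidth 1, a small hypothetical evaluation grid, and invented cell coordinates and mouse labels. The response does not state the kernel, the grid or the software used.

```python
import math
from itertools import product


def kde_vector(cells, grid, bandwidth=1.0):
    """Evaluate an isotropic 3D Gaussian kernel density estimate at each grid point.

    The returned list is already the "vectorised" density: one value per grid point.
    """
    norm = len(cells) * (bandwidth * math.sqrt(2 * math.pi)) ** 3
    return [
        sum(math.exp(-math.dist(point, cell) ** 2 / (2 * bandwidth ** 2))
            for cell in cells) / norm
        for point in grid
    ]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def correlogram(mice, grid, bandwidth=1.0):
    """Pairwise cosine similarity between the density vectors of all mice."""
    vectors = {name: kde_vector(cells, grid, bandwidth) for name, cells in mice.items()}
    names = sorted(vectors)
    return {(a, b): cosine_similarity(vectors[a], vectors[b]) for a in names for b in names}


# Hypothetical registered cell coordinates (x, y, z) for three mice.
example_mice = {
    "whisker_1": [(1.0, 1.0, 1.0), (1.5, 1.0, 1.0), (1.0, 2.0, 1.0)],
    "whisker_2": [(1.0, 1.0, 1.5), (2.0, 1.0, 1.0), (1.0, 1.5, 1.0)],
    "masseter_1": [(4.0, 4.0, 4.0), (4.0, 4.5, 4.0), (3.5, 4.0, 4.0)],
}
example_grid = list(product(range(6), repeat=3))  # shared evaluation grid
similarities = correlogram(example_mice, example_grid)
print(similarities[("whisker_1", "whisker_2")], similarities[("masseter_1", "whisker_1")])
```

      The bandwidth sets the degree of smoothing the reviewer asks about: a larger value blurs each cell over more of the grid and pushes all pairwise similarities towards 1, so correlation values are only comparable at a fixed bandwidth.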

    1. Author Response:

      Reviewer #3 (Public Review):

      1) The authors seem to assume a somewhat random sample throughout Washington state. They state that given a low sampling proportion they do not expect to have captured infection pairs, which seems reasonable. However, they then go on to assume that their sample is primarily comprised of samples from long, successful transmission chains. This is a reasonable assumption if there is no major difference in accessibility of samples from long transmission chains and shorter ones (for example, decreased access to healthcare). Could this impact the assumption of sampling primarily from long transmission chains? It seems from the data collected in this outbreak that this was not the case for mumps in Washington but addressing this assumption clearly (and potential ways to interrogate it) could make their methodology more applicable to other pathogen studies.

      2) There are many examples of phylogenetic analyses that have led to conclusions about pathogen sources and sinks that were later shown to be wrong because of oversampling or other sampling biases. The authors address unequal sampling between clades, but additional contextualization of the problem and how this approach is different may help strengthen the methodology presented in the paper.

      We thank the reviewer for these important points. We have attempted to address these by including an additional paragraph about different types of sampling and their impacts on phylodynamic studies.

      We agree that this is a helpful addition, and have added a new paragraph devoted to a discussion of sampling bias to the discussion on lines 458-484. This paragraph reads:

      “Sampling bias presents a persistent problem for phylodynamic studies that can complicate inference of source-sink dynamics (De Maio et al., 2015; Dudas et al., 2018; Frost et al., 2015; Kühnert et al., 2011; Lemey et al., 2020; Stack et al., 2010). Sampling bias can arise from unequal case detection or from curating a dataset that poorly represents the underlying outbreak. Washington State uses a passive surveillance system for mumps detection and case acquisition, which is known to result in underreporting. Because the WA Department of Health did not perform active mumps surveillance, it is difficult to assess whether different epidemiologic groups have different likelihoods of being sampled. Marshallese individuals are less likely to seek healthcare (Towne et al., 2020), which may have resulted in particularly high rates of underreporting in this group. If the number of cases within the Marshallese community were in fact higher than reported, this would increase the magnitude of the patterns we describe, making our estimates conservative. Given a distribution of cases, composing a dataset for analysis also requires sampling decisions. Uniform sampling regimes in which sampling probability is equal across groups have been shown to perform well for source-sink inferences (Hall et al., 2016). By selecting sequences that matched the overall attributes of the outbreak, including a near 50:50 split between Marshallese and non-Marshallese cases, we adhere to this recommendation. We then specifically employed structured coalescent approaches which have been shown to be robust to sampling differences (Dudas et al., 2018; Müller et al., 2018; Vaughan et al., 2014), rather than using other common approaches that treat sampling intensity as informative of population size (Lemey et al., 2009). 
Within this framework, we further explore the possibility that unequal sampling within Washington clades could skew internal node reconstruction by forcing the sampling within each Washington clade to be equal between Marshallese and non-Marshallese tips. In doing so, differences within each clade must necessarily be driven by differences in transmission dynamics, rather than sampling. By combining careful sample selection with overlapping approaches to evaluate sampling bias, we were able to mitigate concerns that our source-sink reconstructions are driven by sampling artifacts.”
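
      The equalised-sampling check described in the quoted paragraph can be pictured with a short sketch. It is illustrative only and makes several assumptions not stated in the response: tips are given as (tip id, clade, group) tuples with invented identifiers, each clade is randomly downsampled to the size of its smaller group, and clades containing only one group are dropped entirely. How the authors handled such clades, and how they chose which tips to keep, is not described here.

```python
import random
from collections import Counter, defaultdict

GROUPS = ("Marshallese", "non-Marshallese")


def equalise_within_clades(tips, seed=0):
    """Downsample each clade so both groups contribute the same number of tips.

    tips: iterable of (tip_id, clade, group) tuples.
    Clades missing one of the groups contribute no tips (an assumption).
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_clade = defaultdict(lambda: defaultdict(list))
    for tip_id, clade, group in tips:
        by_clade[clade][group].append(tip_id)

    kept = []
    for clade in sorted(by_clade):
        groups = by_clade[clade]
        n = min(len(groups.get(group, [])) for group in GROUPS)
        for group in GROUPS:
            chosen = rng.sample(groups.get(group, []), n)
            kept.extend((tip_id, clade, group) for tip_id in chosen)
    return kept


# Hypothetical tips: clade_A is Marshallese-heavy, clade_B the reverse,
# and clade_C contains Marshallese tips only.
example_tips = (
    [(f"A_m{i}", "clade_A", "Marshallese") for i in range(5)]
    + [(f"A_n{i}", "clade_A", "non-Marshallese") for i in range(2)]
    + [("B_m0", "clade_B", "Marshallese")]
    + [(f"B_n{i}", "clade_B", "non-Marshallese") for i in range(3)]
    + [(f"C_m{i}", "clade_C", "Marshallese") for i in range(2)]
)
subsample = equalise_within_clades(example_tips)
print(Counter((clade, group) for _, clade, group in subsample))
```

      With this toy input, clade_A keeps two tips per group, clade_B keeps one per group, and clade_C is dropped. Repeating the downstream reconstruction over several seeds would show how sensitive the result is to which tips are retained.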

      3) The authors present compelling evidence that the mumps outbreak in Washington state was sustained by the Marshallese community, and state that mumps did not transmit efficiently among the general Washington populace. That said, there were several other mumps outbreaks in the United States in the same 2016-2017 time period. Was there something different about Washington state that prevented mumps transmission outside of the Marshallese community? Were there no other close-knit communities (universities, prisons, other cultural communities, etc.) affected? It just seems surprising that the Marshallese community was the only community sustaining transmission at a time where many different types of communities were affected across the United States.

      We thank the reviewers and editor for this comment, and agree that further contextualization would be helpful. We did not make it clear in the initial submission that in 2016/2017, the vast majority of mumps outbreaks in the US were associated with either universities or ethnic communities. We have re-organized a few paragraphs in the discussion section and added information about other 2016/2017 outbreaks. This new paragraph is on lines 499-519, and reads:

      “Our finding that most introductions sparked short transmission chains suggests that mumps did not transmit efficiently among the general Washington populace. We suspect that more diffuse contact patterns may help explain this. Mumps has historically caused outbreaks in communities with strong, interconnected contact patterns (Barskey et al., 2012; Fields et al., 2019; Nelson et al., 2013), and in dense housing environments (Snijders et al., 2012), highlighted most recently by outbreaks in US detention centers (Lo et al., 2021). In 2016, most outbreaks in the US were associated with university settings (Albertson et al., 2016; Bonwitt et al., 2017; Donahue et al., 2017; Golwalkar et al., 2018; Shah et al., 2018; Wohl et al., 2020), including a separate, smaller outbreak in Washington State associated with Greek housing (Bonwitt et al., 2017). Outside of university settings, other outbreaks in 2016 were reported within close-knit ethnic communities (Fields et al., 2019; Marx et al., 2018). We speculate that while waning immunity may promote outbreaks by increasing susceptibility among young adults, outbreaks in younger age groups may be possible in sufficiently high-contact settings. Provision of an outbreak dose of mumps-containing vaccine to high-risk groups may therefore be especially effective for limiting mumps transmission in future outbreaks. Others have reported success in using outbreak dose mumps vaccinations to reduce mumps transmission on college campuses (Cardemil et al., 2017; Shah et al., 2018) and in the US army (Arday et al., 1989; Eick et al., 2008; Green, 2006; Kelley et al., 1991), and the CDC currently recommends providing outbreak vaccine doses to individuals with increased risk due to an outbreak (Marlow et al., 2020). Future work to quantify the interplay between contact rates and vaccine-induced immunity among different age and risk groups should be used to guide updated vaccine recommendations.”

      We also amended lines 42-46 in the introduction to highlight that most other US outbreaks in 2016/2017 were university-associated:

      “Like with other recent mumps outbreaks, most Washington cases in 2016/17 were vaccinated. Unusually though, while most US outbreaks in 2016/2017 were associated with university settings (Albertson et al., 2016; Bonwitt et al., 2017; Donahue et al., 2017; Golwalkar et al., 2018; Shah et al., 2018; Wohl et al., 2020), incidence in Washington was highest among children aged 10-18 years, younger than expected given waning immunity.”

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript by Schrieber et al. explores whether inbreeding affects floral attractiveness to pollinators with additional factors of sex and origin in play, in male and female plants of Silene latifolia. The authors use a combination of spatial sampling, floral volatiles, flower color, and floral rewards coupled with the response of a specialized pollinator to these traits. Their results show that females are more affected by inbreeding and in general inbreeding negatively impacts the "composite nature" of floral traits. The manuscript is well written, the experiments are detailed and quite elaborate. For example, the methodology for flower color estimation is the most detailed effort in this area that I can remember. All the experiments in the manuscript show meticulous planning, with extensive data collection addressing minute details, including the statistics used. However, I do have some concerns that need to be addressed.

      Core strengths: Detailed experimental design, elaborate data collection methods, well-defined methodology that is easy to follow. There is a logical flow for the experiments, and no details are missing in most of the experiments.

      Weaknesses: A recent study has addressed some of the questions detailed in the manuscript, so the introduction needs to be tweaked to reflect this.

      Thank you very much for bringing this excellent article to our attention! We adjusted the writing in the introduction and the discussion accordingly. Please note that this article was first published on 15 January 2021, while our manuscript was submitted on 9 January 2021. Hence, we were not able to account for this study in the first submission. Introduction pp 4-5, ll 48-54: “Although in a few cases inbreeding has been shown to alter single components of flower attractiveness (Ivey and Carr, 2005; Ferrari et al., 2006; Haber et al., 2019), insight into syndrome-wide effects is restricted to a single study. Kariyat et al. (2021) demonstrated that inbred Solanum carolinense L. display reduced flower size, pollen and scent production and receive fewer visits from diurnal generalists. It is necessary to broaden such integrated methodological approaches to other plant-pollinator systems (e.g., nocturnal specialist pollinators) and further floral traits (i.e., flower colour).” Discussion p 19, ll 535-542: “In summary, our research on S. latifolia suggests that in addition to inbreeding disrupting interactions with herbivores by changing plant leaf chemistry (Schrieber et al., 2018) it affects plant interactions with pollinators by altering flower chemistry. Our observations are in line with studies on other plant species (Ivey and Carr, 2005; Kariyat et al., 2012, 2021) and highlight that inbreeding has the potential to reset the equilibrium of species interactions by altering functional traits that have developed in a long history of co-evolution. These threats to antagonistic and symbiotic plant-insect interactions may mutually magnify in reducing plant individual fitness and altering the dynamics of natural plant populations under global change.”

      Some details and controls are missing in the floral scent estimation. Flower age and the pesticide treatment of plants, which could affect chemistry, need to be better specified.

      We clarified this issue in several places in the methods section. Previous studies (and our study) on S. latifolia have shown no clear differences in the quality of floral scent between sexes. However, one study found higher total emission of VOC in males, while others found no differences. Hence, females produce no specific VOC that are used as oviposition cues but may be differentiated from males by the total amount of emitted VOC and pronounced differences in spatial flower traits. We highlight this at p 6, ll 111-116: “Silene latifolia exhibits various sexual dimorphisms with male plants producing more and smaller flowers that excrete lower volumes of nectar with higher sugar concentrations as compared to females (Gehring et al., 2004; Delph et al., 2010). The quality of floral scent exhibits no clear sex-specific patterns, while male plants have been shown to emit higher or equal total amounts of VOC as compared to females in different studies (Dötterl & Jürgens 2005, Waelti et al. 2009)”.

      Both male and female moths show pronounced behavioural responses to lilac aldehyde isomers and other VOC in the floral scent of S. latifolia (Dötterl et al., 2006). We therefore treated these VOC as typical floral scent compounds. We clarified this at p 7, ll 125-126: “A substantial fraction of floral VOC produced by S. latifolia triggers antennal and behavioural responses in male and female H. bicruris moths (Dötterl et al., 2006).” and p 9, ll 210-218: “For targeted statistical analyses, we focused on those VOC that evidently mediate communication with H. bicruris according to Dötterl et al. (2006). We analysed the Shannon diversity per plant (calculated with R-package: vegan v.2.5-5, Oksanen et al. 2019) for 20 floral VOC in our data set that were shown to elicit electrophysiological responses in the antennae of H. bicruris (Supplementary File 1). Moreover, we analysed the intensities of three lilac aldehyde isomers, which trigger oriented flight and landing behaviour in both male and female H. bicruris most efficiently when compared to other VOC in the floral scent of S. latifolia. Furthermore, H. bicruris is able to detect the slightest differences in the concentration of these three compounds at very low dosages (Dötterl et al. 2006).”
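
      The per-plant Shannon diversity mentioned above can be illustrated with a minimal sketch. The authors report using the R package vegan; the Python function below is only a stand-in that assumes the usual natural-log definition H = -sum(p_i ln p_i), and the function name and peak intensities are hypothetical values chosen for illustration, not data from the study.

```python
import math


def shannon_diversity(intensities):
    """Shannon index H = -sum(p_i * ln(p_i)) over compounds with non-zero intensity."""
    total = sum(intensities)
    if total <= 0:
        return 0.0  # no detectable compounds, so no diversity
    h = 0.0
    for x in intensities:
        if x > 0:  # compounds that were not detected contribute nothing
            p = x / total
            h -= p * math.log(p)
    return h


# Hypothetical peak intensities of four VOC in two plants (illustrative only)
even_plant = [10.0, 10.0, 10.0, 10.0]   # all compounds emitted equally
skewed_plant = [37.0, 1.0, 1.0, 1.0]    # one compound dominates the bouquet

print(shannon_diversity(even_plant))
print(shannon_diversity(skewed_plant))
```

      In this sketch, a plant whose scent is dominated by one compound receives a lower index than a plant emitting the same compounds in equal proportions, which is why the index summarises the composition of a bouquet rather than its total emission.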

      We used biological pest control agents in a preventive manner because S. latifolia is often infested by thrips and aphids under greenhouse conditions. The writing in the previous manuscript version was not clear in this regard and we changed the text at p 8, ll 157-161: “Plants received water and fertilisation (UniversolGelb 12-30-12, Everris-Headquarters, NL) when necessary for the entire experimental period and were prophylactically treated with biological pest control agents under greenhouse conditions to prevent thrips (agent Amblyseius barkeri and Amblyseius cucumeris) and aphid (agent Chrysoperla carnea) infestation (Katz Biotech GmbH, GE).”

      Indeed, flower size and scent emission can be correlated. Although the question whether differences in scent emission were based on a difference in flower size is an interesting one, it seemed less relevant to us because it is unlikely that our pollinators correct their perception of a scent for the size of a flower (see also p 19, ll 520-526). We were rather interested in whether scent emission differs between the plant treatments and thus pollinators may chemically perceive such differences. Moreover, we found it problematic to correct our models for flower size by including it as a covariate, which is the reason why we have not assessed this trait during scent collection. In this case, we would have corrected our scent responses for the effects of inbreeding, sex and population origin (i.e., the predictors we are interested in) because all of them determine the size of a flower (Figure 2 c,d). Hence, the inbreeding, sex and origin effects on flower scent would likely vanish. However, it is highly unlikely that the set of genes contributing to sex-, breeding treatment- and origin-based variation in flower size is exactly the same one that determines variation in scent emission per flower, which is basically the assumption underlying the model that includes flower size as a covariate. We critically mentioned the trade-off relationships and our reasoning to not correct for flower size at p 9, ll 208-210: “The intensities of VOC were not corrected for flower size because we wanted to capture all variation in scent emission that is relevant for the receiver i.e., the pollinator.”

      While the study is laser-focused on floral traits, as the authors are aware inbreeding affects the total phenotype of the plants including fitness and defense traits. For example, there are quite a few studies that have shown how inbreeding affects the plant defense phenotype. This could be addressed in the introduction and discussion.

      We agree that this aspect is important and therefore addressed it in further detail in the introduction at p 4 ll 34-38: “While it is well established that inbreeding can increase a plant’s susceptibility to herbivores by diminishing morphological and chemical defences (Campbell et al., 2013; Kariyat et al., 2012; Kalske et al., 2014), its effects on plant-pollinator interactions are less well understood. Inbreeding may reduce a plant’s attractiveness to pollinating insects by compromising the complex set of floral traits involved in interspecific communication.” Since other referees suggested toning down rather than expanding the discussion of the floral scent results, we kept to the general feedback relationship between herbivory and pollination, rather than relating it specifically to volatiles, in the discussion at p 19, ll 535-544: “In summary, our research on S. latifolia suggests that in addition to inbreeding disrupting interactions with herbivores by changing plant leaf chemistry (Schrieber et al., 2018) it affects plant interactions with pollinators by altering flower chemistry. Our observations are in line with studies on other plant species (Ivey and Carr, 2005; Kariyat et al., 2012, 2021) and highlight that inbreeding has the potential to reset the equilibrium of species interactions by altering functional traits that have developed in a long history of co-evolution. These threats to antagonistic and symbiotic plant-insect interactions may mutually magnify in reducing plant individual fitness and altering the dynamics of natural plant populations under global change. As such, our study adds to a growing body of literature supporting the need to maintain or restore sufficient genetic diversity in plant populations during conservation programs.”

      Reviewer #2 (Public Review):

      A summary of what the authors were trying to achieve. This interesting and data-rich paper reports the results of several detailed experiments on the pollination biology of the dioecious plant Silene latifolia. The authors use multiple accessions from several European (native range) and North American (introduced range) populations of S. latifolia to generate an experimental common garden. After one generation of within-population crosses, in which each cross included either two (half-)siblings or two unrelated individuals, they compared the effects of one generation of inbreeding on multiple plant traits (height, floral size, floral scent, floral color), controlling for population origin. Thereby, they set out to test the hypothesis that inbreeding reduces plant attractiveness. Furthermore, they ask if the effect is more pronounced in female than male plants, which may be predicted from sexual selection and sex-chromosome-specific expression, and if the effect of inbreeding is larger in native European populations than in North American populations, which may have already undergone genetic purging during a bottleneck. Finally, the authors evaluate to what extent the inbreeding-related trait changes affect floral attractiveness (measured as visitation rates) in field-based bioassays.

      An account of the major strengths and weaknesses of the methods and results. The major strength of this paper is the ambitious and meticulous experimental setup and implementation that allows comparisons of the effect of multiple predictors (i.e. inbreeding treatment, plant origin, plant sex) on the intraspecific variation of floral traits. Previous work has shown direct effects of plant inbreeding on floral traits, but no previous study has taken this wholesale approach in a system where the pollination ecology is well known. In particular, very few studies, if any, have tested the effects of inbreeding on floral scent or color traits. Moreover, I particularly appreciate that the authors go the extra mile and evaluate the biological importance of the inbreeding-induced trait variation in a field bioassay. I also very much appreciate that the authors have taken into account the biological context by using a relevant vision model in the color analyses and by focusing on EAD-active compounds in the floral scent analyses.

      The results are very interesting and show that the effects of inbreeding on trait variation are both origin- and sex-dependent, but that the strongest effects were not always consistent with the hypothesis that North American plants would have undergone genetic purging during a bottleneck that would make these plants less susceptible to inbreeding effects. The authors made a large collection effort, securing seeds from eight populations from each continent, but then only used population origin and seed family origin as random factors in the models, when testing the overall effect of inbreeding on floral traits. It would have been very interesting to see an analysis that partitions the variance both in the actual traits under study and in the response to inbreeding to determine to what extent there is variation among populations within continents. Not least because it is increasingly clear that the ecological outcome of species interactions (mutualistic/antagonistic) in nursery pollination systems often varies among populations (cf. Thompson 2005, The geographic mosaic of coevolution), and some results suggest that this is the case also in Hadena-Silene interactions (e.g. Kephart et al. 2006, New Phytologist). Furthermore, some plants involved in nursery pollination systems show evidence of distinct canalization across populations of floral traits of importance for the interaction (e.g. Svensson et al. 2005), whereas others show unexpected and fine-grained variation in floral traits among populations (e.g. Suinyuy et al. 2015, Proceedings B, Thompson et al. 2017 Am. Nat., Friberg et al. 2019, PNAS). Hence, it is possible that the local population history and local variation in the interactions between the plants and their pollinators may be more important predictors for explaining variation in floral trait responses to inbreeding than the larger-scale continental analyses. Not least because North American S. latifolia probably has multiple origins, with subsequent opportunity for admixture in secondary contact.

      Yes, it is necessary to put populations from the same continent into one category, since native and invasive plant populations differ significantly in their evolutionary history (p 5, ll 74-81, http://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2012.05751.x). Origin explained sufficient amounts of variation in several traits including flower number, corolla expansion, VOC diversity, lilac aldehyde A intensity, and pollinator visitation rates (see Figures 2-3; and Table 2) and some variation in the magnitude of inbreeding effects (Figure 2e, f; Figure 3). Even if we were not interested in differences among native and invasive populations, we would have to include origin as a fixed effect in our models because:

      i) populations within a distribution range are not independent samples,

      ii) origin explains sufficient variation in many responses,

      iii) origin cannot be fitted as a random factor, since it has only two levels (the minimum number of levels for a random effect is 4).

      We agree that it would be very interesting to specifically assess differences in the magnitude of breeding and sex effects among populations within origins. We now discuss this as an important future research direction at p 18, ll 500-507: “As such, the precise mechanisms underlying variation in inbreeding effects on different scent traits across population origins of S. latifolia can only be explored based on comprehensive genomic resources, which are currently not available. Future studies should also incorporate field-data on the abundance of specialist pollinators and extend the focus from variation in the magnitude of inbreeding effects among geographic origins to variation among populations within geographic origins and individuals within populations. This would allow a detailed quantification of geographic variation in inbreeding effects and elaborating on the causes and ecological consequences of such variation (Thompson, 2005; Schrieber and Lachmuth, 2017; Thompson et al., 2017)”.

      To empirically address within-origin variation of inbreeding effects with our data, we would have to i) fit correlated random intercepts and slopes for the interaction breeding-sex on the population random factor (models consume min. 22 DF); or ii) include population as a fixed effect in our models (models consume min. 67 DF). We tried both of these approaches when preparing the revision, but unfortunately it turned out that our study is not designed to address this question. The models for both variants only partially converge (see R-script ll. 1568-1580), and even if they do, this does not imply that one can draw solid inference from them. Approach i often results in multiple singular convergence warning messages, implying that no variance is explained by population-specific reaction norms to the fixed effects specified in the random effects structure. Approach ii results in odd rank-deficient models (I was seriously worried about type I errors). We simply have too few replicates (5) per population-breeding treatment-sex combination for both approaches. For solid inference we would need 10 (approach i) to 40 (approach ii) replicates, i.e. 640-2600 individuals. However, our experimental design is sufficient to address the hypothesis we have raised in the introduction as well as general differences in response variables among populations. We now provide information on variance partitioning for all models that include population as a random effect in S9. As you will see, population explains less variation in our responses than the fixed effects in 9 out of 12 models. The random effects maternal and paternal genotype (mother&father) explain more variation than the random effect population in 6 of 12 cases. Thus, these data do not make a strong case for an extensive discussion of population-based differences in floral traits and this was also not a question or hypothesis we wanted to address with our study.
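
      The replicate estimate in the paragraph above can be retraced with a short sketch. It assumes a fully balanced design with 16 populations (eight per continent, as stated in the review), two breeding treatments and two sexes; the variable and function names are hypothetical. Under these assumptions the upper bound comes out at 2560, which the response appears to round to 2600.

```python
# Design dimensions, assuming a fully balanced experiment (an assumption of this sketch)
populations = 8 * 2        # eight populations from each of two continents
breeding_treatments = 2    # inbred vs. outbred
sexes = 2                  # female vs. male

combinations = populations * breeding_treatments * sexes


def individuals_needed(replicates_per_combination):
    """Total plants required for a given number of replicates per combination."""
    return combinations * replicates_per_combination


print(combinations)            # population x breeding treatment x sex cells
print(individuals_needed(5))   # replicates available in the study
print(individuals_needed(10))  # lower bound quoted for approach i
print(individuals_needed(40))  # upper bound quoted for approach ii
```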

      I see no major weaknesses in the study, but in my detailed response, I have raised a few questions and suggestions about the floral scent analyses. In short, the authors have used a technique that is not the standard method used for making quantitative floral scent analyses, and I am curious about how it was made sure that the results obtained from the static headspace sampling using PDMS adsorbents could be used as a quantitative measure. I would suggest that the authors validate the use of this method more thoroughly in the manuscript, and have detailed this comment in my response to the authors.

      Also, and this may seem like a nit-picky comment, I am not convinced that the best way to describe the traits under study is "plant attractiveness", because in the experimental bioassays, most of the traits under study that are affected by the inbreeding treatment, did not result in a reduced pollinator visitation. Most (or all) of these traits may also be involved in other plant functions and important for other interactions, so I suggest potentially using a term like "floral traits" or "(putative) signalling traits".

      We now avoid the term floral attractiveness throughout the manuscript and instead refer to “floral traits”.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions: By and large, the authors achieved the aims of this study, and drew conclusions based on these results. One interesting aspect of this work that I think could be discussed a bit deeper is the lack of congruence between the effects of inbreeding on floral traits and the variation in visitation pattern in the bioassay. In fact, the only large effect of inbreeding on a floral trait that may play a role as an explanatory factor is the reduction of emission of lilac aldehyde A in inbred female S. latifolia from North America, which corresponds to a reduced visitation rate in this group in the pollinator visitation bioassay. I have made some specific suggestions in my comments to the authors.

      We agree that this aspect required deeper discussion and revised the section at p 19, ll 520-526 accordingly. We believe that the limited spatial vision of H. bicruris in combination with our experimental setup for pollinator observations increased the relative importance of floral scent for pollinator visitation rates (suggested by referee #3).

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community: I think that one important aspect of this work that may broaden the impact of this study further is the link between these experiments and our expectations from the evolution of selfing. Selfing plant species most often conform to the selfing syndrome, presenting smaller, less scented flowers than outcrossing relatives. Traditionally, the selfing syndrome is explained by natural selection against individuals that invest energy into floral signalling, when attracting pollinators is no longer crucial for reproduction. Some studies (for example Andersson, 2012, Am. J. Bot), however, have shown that only one, or a few, generations of inbreeding may reduce floral size as much as quite strong selection for reduced signalling. Here, at least for some populations and sexes, similar results are obtained in this paper regarding several traits (including floral scent), and one way to put this paper in context is by discussing the results in the light of these previous papers.

      We now address this issue at p 16, ll 417-420: “However, our findings highlight that even weak degrees of biparental inbreeding (i.e., one generation sib-mating) can result in a severe reduction of spatial flower trait and scent trait values that is detectable against the background of natural variation among multiple plant populations from a broad geographic region. This observation indirectly supports that the selfing syndrome (i.e., smaller, less scented flowers observed in selfing relative to outcrossing populations of hermaphroditic plant species) may not merely be a result of natural selection against resource investment into floral traits, but also a direct negative consequence of inbreeding (Andersson, 2012).”

      Reviewer #3 (Public Review):

      Schrieber et al. studied the effects of biparental inbreeding in the dioecious plant Silene latifolia, focusing specifically on traits important for floral attractiveness and pollinator attraction. These traits are especially important for dioecious species with separate sexes as they are obligate outcrossers. The authors find that inbreeding mostly decreases floral attractiveness, but that this effect tended to be stronger in the female flowers, which the authors suspect to result from the trade-off with larger investment in the sexual functions in the female plants. The authors then go on to couple the changes in visual and olfactory floral traits to pollinator attraction which allows them to conclude or at least speculate that differences in pollinator behavior are mostly driven by the changes in olfactory traits. The study is robust in its broad and well-balanced sampling of populations, rigorous and in large part meticulously documented experimental designs and linking of the effects on mechanisms to ecological function. The hypotheses are clearly stated and the study is able to address them mostly convincingly. However, some aspects of the decisions the authors made and possible caveats need to be addressed and elaborated on.

      A major caveat, in my opinion, is that while the authors find stronger effects of inbreeding on pollinator visitation rates in the plants from the North American (Na) origin, these plants were tested in an environment that was foreign to them, which could have important consequences for the results of this study. This is specifically because the main pollinator Hadena bicruris moth is completely absent from the populations in Na, and yet, was the main pollinator observed in the pollinator attraction experiment. As this pollinator is also a seed predator, the Na populations are released from the selection pressure to avoid attracting the females of this species and thus risking the loss of seeds and fitness. In fact, some of the results suggest that the release from the specialist pollinator and seed predator in Na has led to an increase in the attractiveness of the female flowers based on the higher number of flowers visited in the outcrossed females compared to outcrossed males in the plants from the Na origin and the similar, though not statistically significant, pattern in the olfactory cue. While ideally this pollinator attraction experiment should be repeated within the local range of the Na plants, this is of course not feasible. Instead I suggest the problem should be addressed in the discussion explicitly and its consequences for the interpretation of the results should be considered.

      Indeed, North American populations are tested in their “away”-habitat only and the observed plant performance and pollinator visitation rates can thus provide no direct implications for their “home”-habitat. We state this now more clearly at pp 11-12, ll 283-285. However, our design is appropriate for investigating inbreeding effects on plant-pollinator interactions in multiple plant populations in a common environment. Given the close taxonomic relationship of H. bicruris (main pollinator in Europe) and H. ectypa (main pollinator in North America), the behavioural responses of the former species to variation in the quality of its host plant were considered to overlap sufficiently with responses of the latter species as outlined at pp 11-12, ll 285-291.

      The hypothesis that North American (NA) S. latifolia evolved higher attractiveness to female Hadena moths because H. ectypa is not able to oviposit on female plants in contrast to H. bicruris is indeed a highly interesting one. However, as you have outlined correctly, our study is not designed to elaborate on questions related to adaptive evolutionary differentiation among North American and European plants. Instead of addressing this hypothesis based on our data, we thus take reference to previous studies in the discussion p 17, ll 482-487: “As discussed in detail in previous studies, higher flower numbers in North American S. latifolia plants (Figure 1b) may result from changes in the selective regimes for numerous abiotic factors (Keller et al., 2009) or from the release of seed predation. As opposed to H. bicruris, H. ectypa pollinates North American S. latifolia without incurring costs for seed predation, which may result in the evolution of higher flower numbers, specifically in female plants (Elzinga and Bernasconi, 2009).”

      The incorporation of the VOC data in the actual manuscript was quite limited and I found the reasoning for picking only the three lilac aldehydes (in addition to the Shannon diversity index) for the univariate statistical tests insufficient. How much more efficient was the effect of the lilac aldehydes compared to the other 17 compounds deemed important in the previous study? While the data on this one aldehyde matches the pollinator attraction results, having one compound out of 70 (or out of 20 if only considering the ones identified important for the main pollinator) seems, perhaps, fortuitous lest there is a good reason for focusing on these particular compounds.

      We adapted the text to increase clarity but stuck to our previous choice for the analyses of VOC data.

      i) We now explain our choice of analysing lilac aldehydes in more detail at p 9, ll 210-218: “For targeted statistical analyses, we focused on those VOC that evidently mediate communication with H. bicruris according to Dötterl et al. (2006). We analysed the Shannon diversity per plant (calculated with R-package: vegan v.2.5-5, Oksanen et al. 2019) for 20 floral VOC in our data set that were shown to elicit electrophysiological responses in the antennae of H. bicruris (Supplementary File 1). Moreover, we analysed the intensities of three lilac aldehyde isomers, which trigger oriented flight and landing behaviour in both male and female H. bicruris most efficiently when compared to other VOC in the floral scent of S. latifolia. Furthermore, H. bicruris is able to detect the slightest differences in the concentration of these three compounds at very low dosages (Dötterl et al. 2006).”

      ii) If one analyses 20 compounds with zero-inflation models (actually two models in one) + 8 floral trait models + 2 pollinator visitation models (zi-models with two component models), one ends up with 52 models investigating complex fixed and random effect structures. To keep type-1 errors as low as possible (see also comment 2.12.b from Referee#2), we approached the more comprehensive VOC data sets with multivariate analyses or Shannon diversity.

      iii) We tested the effect of sex × origin × breeding treatment on the Shannon diversity of 20 active VOC as well as in the random forest analyses with the 20 VOC and 70 VOC dataset and transparently reported the results from all of these analyses in the manuscript. Hence, the incorporation of VOC data was not limited. However, we agree that we referred too little to these results and have now changed the text accordingly. Results section p 13 ll 351-354: “Multivariate statistical analyses of 20 H. bicruris active VOC and all 70 VOC detected in S. latifolia revealed no clear separation of floral headspace VOC patterns for any of the treatments (Figure 2-figure supplement 2). In summary, the combined effects of breeding treatment, sex and range on floral scent were rather weak.”
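
      The concern about type I errors in point ii) can be made concrete with a small sketch. The model count follows the arithmetic given in the response (a zero-inflation model counted as two component models). The error-rate formula additionally assumes independent tests at a significance level of 0.05, which is a simplifying assumption of this illustration and not something stated by the authors; the function name is hypothetical.

```python
# Model count as described in the response: each zero-inflation model has two components
voc_models = 20 * 2          # 20 compounds, zero-inflation models
floral_trait_models = 8      # floral trait models
visitation_models = 2 * 2    # 2 pollinator visitation models, zero-inflated
total_models = voc_models + floral_trait_models + visitation_models


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


print(total_models)
print(round(familywise_error_rate(total_models), 3))
```

      Under these assumptions the chance of at least one spurious result across all models exceeds 0.9, which illustrates why summarising the 20 compounds with multivariate analyses or a single diversity index keeps the number of tests manageable.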

      Sampling time of VOCs is reported ambiguously. Was it from 21:00 to 17:00 the next day or in fact from 9 pm to 5 am (instead of 5 pm as reported)? Please be more specific in the text as this is quite important. If sampling tubes were left in place during the daytime, some of the compounds could have evaporated due to heating of the tubes in the summer. It would also be important to mention whether all of the headspace VOCs were sampled on the same day and whether there could be variation in, e.g., temperature.

      Thank you very much for identifying this typo! It is from 9 pm to 5 am (p 9, l 186).

      Considering the experimental setup for the pollinator attraction observations and the pooling of the data at the block level (which I think is the right choice) it seems possible the authors were more likely to get a result where pollinator behavior matches the long-distance cue, the VOCs. Short-distance cues such a subtle difference in flower size would perhaps not be distinguished with the current setup. I would be interested to know if the authors agree, and if so, mention this in the discussion.

      Thank you very much for this excellent suggestion! We agree and discuss this aspect in detail at p 19, ll 520-526. Indeed, one would need two different experimental setups to assess the contributions of long- and short-distance cues. Our setup (large distances among plots) is optimal for long-distance cues, while a setup for short-distance cues should have all plants in close spatial proximity. However, the latter approach does not then allow one to address long-distance cues or to exclude competition/facilitation for pollinators among plants from different treatment groups.

    1. Author Response:

      Evaluation Summary:

      This manuscript will be of interest to a broad audience of immunologists especially those studying host-pathogen interactions, mucosal immunology, innate immunity and interferons. The study reveals a novel role for neutrophils in the regulation of pathological inflammation during viral infection of the genital mucosa. The main conclusions are well supported by a combination of precise technical approaches including neutrophil-specific gene targeting and antibody-mediated inhibition of selected pathways.

      We would like to thank the reviewers for taking the time to review our manuscript, and would also like to thank the editors for handling our manuscript. We are grateful for the positive response to our work and the thoughtful suggestions.

      Reviewer #1 (Public Review):

      Overall this is a well-done study, but some additional controls and experiments are required, as discussed below. The authors have done a considerable amount of work, resulting in quite a lot of negative data, and so should be commended for persistence to eventually identify the link between neutrophils and IL-18, through type I IFN signaling.

      Thank you! We appreciate the feedback and suggestions for strengthening the study.

      Major Comments:

      -A major conclusion of this manuscript is prolonged type I IFN production following vaginal HSV-2 infection, but the data presented herein did not actually demonstrate this. At 2 days post infection, IFN beta was higher (although not significantly) in HSV-2 infection, but much higher in HSV-1 infection compared to uninfected controls. At 5 days post infection the authors show mRNA data, but not protein data. If the authors are relying on prolonged type I IFN production, then they should demonstrate increased IFN beta during HSV-2 infection at multiple days after infection including 5dpi and 7dpi.

      We apologize for not including the IFN protein data and have now provided this information in new Figure 3 and Figure 3 - Supplement 3. This new addition shows measurement of secreted IFNb in vaginal lavages at 4, 5 and 7 d.p.i., as well as total IFNb levels in vaginal tissue at 7 d.p.i.

      -Does the CNS viral load or kinetics of viral entry into the CNS differ in mice depleted of neutrophils, IFNAR cKO mice, or mice treated with anti- IL-18? Do neutrophils and/or IL-18 participate at all in neuronal protection from infection?

      To maintain the focus of our study on the host factors that contribute specifically to genital disease, we have not included discussion on viral dissemination into the PNS or CNS, especially as viral invasion of the CNS seems to be an infrequent occurrence during genital herpes in humans. However, we have performed some preliminary exploration of this interesting question, and find that viral invasion of the nervous system is unaltered in the absence of neutrophils. This is in accordance with the lack of antiviral neutrophil activity we have described in the vagina after HSV-2 infection. These preliminary data are provided below as a Reviewer Figure 1. We have not yet begun to investigate whether IL-18 modulates neuroprotection, but agree this is an important question to address in future studies.

      RFigure 1. Viral burden in the nervous system is similar in the presence or absence of neutrophils. Graphs show viral genomes measured by qPCR from the DRG, lower half of the spinal cord and the brainstem at the indicated days post-infection.

      -In Figure 3 the authors show that neutrophil "infection" clusters 2 and 5 express high levels of ISGs. Only 4 of these ISGs are shown in the accompanying figures. Please list which ISGs were increased in neutrophils after both HSV-2 and HSV-1 infection, perhaps in a table. Were there any ISGs specifically higher after HSV-2 infection alone, any after HSV-1 infection alone?

      Tables listing differentially expressed neutrophil ISGs during HSV-1 and HSV-2 infection have now been provided in new Figure 3 - Supplement 1, with complete lists of DEGs provided as Source Files for the same figure.

      -The authors claim that HSV-1 infection recruits non-pathogenic neutrophils compared to the pathogenic neutrophils recruited during HSV-2 infection. Can the authors please discuss if these differences in inflammation or transcriptional differences between the neutrophils in these two different infections could be due to differences in host response to these two viruses rather than differences in inflammation? Please elaborate on why HSV-1 was used as opposed to a less inflammatory strain of HSV-2. Furthermore, does HSV-1 infection induce vaginal IL-18 production in a neutrophil-dependent fashion as well?

      These are excellent questions, and we have emphasized that differences in host responses against HSV-1 and HSV-2 likely lead to distinct inflammatory milieus that differentially affect neutrophil responses in lines 374-375 and 409-419. We completely agree that differences in neutrophil responses are likely due to distinct host responses against HSV-1 and HSV-2 and apologize for not making that clear. We have previously described some of the other differences in the immunological response against these two viruses (Lee et al, JCI Insight 2020). We would suggest that differences in the host response against these two viruses would naturally result in differences in the local inflammatory milieu, which then modulates neutrophil responses. Whether the transcriptomes of neutrophils beyond the immediate site of infection (outside the vagina) are different between HSV-1 and HSV-2 is currently an open question.

      As for why we used HSV-1 instead of a less inflammatory strain of HSV-2, we had originally been interested in trying to model the distinct disease outcomes that have previously been described during HSV-1 vs HSV-2 genital herpes in humans and thought this would be a relevant comparison. We have not yet examined infection with less inflammatory HSV-2 strains, but agree that this is a great idea. We have also not yet examined neutrophil-dependent IL-18 production in the context of HSV-1.

      Reviewer #2 (Public Review):

      This manuscript will be of interest to a broad audience of immunologists especially those studying host-pathogen interactions, mucosal immunology, innate immunity and interferons. The study reveals a novel role for neutrophils in the regulation of pathological inflammation during viral infection of the genital mucosa. The main conclusions are well supported by a combination of precise technical approaches including neutrophil-specific gene targeting and antibody-mediated inhibition of selected pathways.

      In this study by Lebratti, et al the authors examined the impact of neutrophil depletion on disease progression, inflammation and viral control during a genital infection with HSV-2. They find that removal of neutrophils prior to HSV-2 infection resulted in ameliorated disease as assessed by inflammatory score measurements. Importantly, they show that neutrophil depletion had no significant impact on viral burden nor did it affect the recruitment of other immune cells thus suggesting that the observed improvement on inflammation was a direct effect of neutrophils. The role of neutrophils in promoting inflammation appears to be specific to HSV-2 since the authors show that HSV-1 infection resulted in comparable numbers of neutrophils being recruited to the vagina yet HSV-1 infection was less inflammatory. This observation thus suggests that there might be functional differences in neutrophils in the context of HSV-2 versus HSV-1 infection that could underlie the distinct inflammatory outcomes observed in each infection. In order to uncover potential mechanisms by which neutrophils affect inflammation, the authors examined the contributions of classical neutrophil effector functions such as NETosis (by studying neutrophil-specific PAD4 deficient mice), reactive oxygen species (using mice with a global defect in NADH oxidase function) and cytokine/phagocytosis (by studying neutrophil-specific STIM-1/STIM-2 deficient mice). The data shown convincingly ruled out a contribution by the neutrophil factors examined. The authors thus performed an unbiased single cell transcriptomic analysis of vaginal tissue during HSV-1 and HSV-2 infection in search of potentially novel factors that differentially regulate inflammation in these two infections.

      tSNE analysis of the data revealed the presence of three distinct clusters of neutrophils in vaginal tissue in mock infected mice; the same three clusters remained after HSV-1 infection, but in response to HSV-2 only two of the clusters remained and showed a sustained interferon signature primarily driven by type I interferons (IFNs). In order to directly interrogate the impact of type I IFN on the regulation of inflammation the authors blocked type I IFN signaling (using anti IFNAR antibodies) at early or late times after infection and showed that late (day 4) IFN signaling was promoting inflammation while early (before infection) IFN was required for antiviral defense as expected. Importantly, the authors examined the impact of neutrophil-intrinsic IFN signaling on HSV-2 infection using neutrophil-specific IFNAR1 knockout mice (IFNAR1 CKO). The genetic ablation of IFNAR1 on neutrophils resulted in reduced inflammation in response to HSV-2 infection but no impact on viral titers; findings that are consistent with observations shown for neutrophil-depleted mice. The use of IFNAR1 CKO mice strongly supports the importance of type I IFN signaling on neutrophils as a direct regulator of neutrophil inflammatory activity in this model. Since type I IFNs induce the expression of multiple genes that could affect neutrophils and inflammation in various ways the authors set out to identify specific downstream effectors responsible for the observed inflammatory phenotype. This search led them to IL-18 as a possible mediator. They showed that IL-18 levels in the vagina during HSV-2 infection were reduced in neutrophil-depleted mice, in mice with "late" IFNAR blockade and in IFNAR1 CKO mice. Furthermore, they showed that antibody-mediated neutralization of IL-18 ameliorated the inflammatory response of HSV-2 infected mice albeit to a lesser extent than what was seen in IFNAR1 CKO.

      Altogether, the study presents intriguing data to support a new role for neutrophils as regulators of inflammation during viral infection via an IFN-IL-18 axis.

      In aggregate, the data shown support the authors' main conclusions, but some of the technical approaches need clarification and in some cases further validation that they are working as intended.

      Thank you! We appreciate the enthusiasm for our work as well as the suggestions for improving our study.

      1) The use of anti-Ly6G antibodies (clone 1A8) to target neutrophil depletion in mice has been shown to be more specific than anti-Gr1 antibodies (which target both monocytes and neutrophils); thus anti-Ly6G antibodies are a good technical choice for the study. Neutrophils are notoriously difficult to deplete efficiently in vivo due at least in part to their rapid regeneration in the bone marrow. In order to sustain depletion, previous reports indicate the need for daily injection of antibodies. In the current study the authors report the use of only one intra-peritoneal injection (500 mg) of 1A8 antibodies and that this single treatment resulted in diminished neutrophil numbers in the vagina at day 5 after viral infection (Fig 1A). Data shown in figure 2B suggest that there are neutrophils present in the vagina of uninfected mice, that there is a significant increase in their numbers at day 2 and that their numbers remain fairly steady from days 2 to 5 after infection. In order to better understand the impact of antibody-mediated depletion in this model the authors should have examined the kinetics of depletion from day 0 through 5 in the vaginal tissue after 1A8 injection as compared to the effect of antibodies in the periphery. These additional data sets would allow for a deeper understanding of neutrophil responses in the vagina as compared to what has been published in other models of infection at other mucosal sites.

      We agree and apologize for not providing this information in the original submission. Neutrophil depletion kinetics from the vagina have been shown in new Figure 1A, while depletion from the blood is shown in new Figure 1 - Supplement 1.

      2) The authors used antibody-mediated blockade as a means to interrogate the impact of type I IFNs and IL-18 in their model. The kinetics of IFNAR blockade were nicely explained and supported by data shown in supplementary figure 4. IFNAR blockade was done by intra-peritoneal delivery of antibodies at one day before infection or at day 4 after infection. When testing the role of IL-18 the authors delivered the blocking antibody intra-vaginally at 3 days post infection. The authors do not provide a rationale for changing delivery method and timing of antibody administration to target IL-18 relative to IFNAR signaling. Since the model presented argues for an upstream role for IFNAR as inducer of IL-18 it is unclear why the time point used to target IL-18 is before the time used for IFNAR.

      We thank Reviewer #2 for raising this point and apologize for not providing an explanation for the differences in antibody treatment regimens for modulating IFNAR and IL-18. As the anti-IL-18 mAb is a cytokine neutralizing antibody, we hypothesized that administering the antibody vaginally would help to concentrate the antibody at the relevant site of cytokine production and increase the potency of neutralization. This is in contrast to systemic administration of the anti-IFNAR1 mAb that acts to block signaling in the 'receiving' cell. We expect the anti-IFNAR1 mAb (given in much higher doses) to bind both circulating cells that are recruited to the site of infection as well as cells that are already at the site of infection. Similarly, we started the anti-IL-18 antibody treatment one day earlier to allow a presumably sufficient amount of antibody to accumulate in the vagina. Our rationale has been included in the revised manuscript (lines 351-353). We are pleased to report, however, that we have conducted preliminary studies in which mice were treated beginning at 4 d.p.i. rather than 3 d.p.i., and observed similar trends. These data are provided below as Reviewer Figure 3.

      RFigure 3. Mice treated with anti-IL-18 mAb starting at 4 d.p.i. exhibit reduced disease severity. Mice were infected with HSV-2 and treated ivag with 100 µg of anti-IL-18 on 4, 5 and 6 d.p.i. Mice were monitored for disease until 7 d.p.i. Data were analyzed by repeated measures two-way ANOVA with Geisser-Greenhouse correction and Bonferroni's multiple comparisons test.

      3) An open question that remains is the potential mechanism by which IL-18 is acting as an effector cytokine of epithelial damage. As acknowledged by the authors, the rescue seen in IFNAR1 CKO mice (Fig 5C) is more dramatic than that seen when targeting IL-18 (Fig 6D). It is thus very likely that IFNAR signaling on neutrophils is affecting other pathways. It would have been greatly insightful to perform a single cell RNA seq experiment with IFNAR CKO mice as done for WT mice in Fig 3. Such an analysis might have provided a more thorough understanding of neutrophil-mediated inflammatory pathways that operate outside of classical neutrophil functions.

      We agree that the proposed scRNA-seq experiment comparing vaginal cells from IFNAR CKO and WT mice would be very interesting and insightful. Although a bit beyond the scope of the current manuscript, we are currently planning on performing these types of studies to better understand IFN-mediated regulation of inflammatory neutrophil functions.

      4) The inflammatory score scale used is nicely described in the methods and it took into consideration external signs of vaginal inflammation by visual observation. It would have been helpful to mention whether the inflammation scoring was done by individuals blinded to the experimental groups.

      This is an important point and we apologize for not making this clear. We have now provided this information in the methods section of the revised manuscript (line 778).

      5) The presence of distinct clusters of neutrophils in the scRNA-seq data analysis is a fascinating observation that might suggest more diversity in neutrophils than what is currently appreciated. In this study, the authors do not provide a list of the genes expressed in each cluster within the data shown in the paper. Although the entire data set is deposited and publicly available, having the gene lists within the paper would have been helpful to provide a deeper understanding of the current study.

      The heterogeneity of the vaginal neutrophil population after HSV infection is indeed an unexpected finding. To provide a deeper understanding of these transcriptionally distinct clusters, we have now included complete lists of DEGs between the different clusters as Source Files for Figure 3.

      Reviewer #3 (Public Review):

      This paper examines the role of neutrophils, inflammatory immune cells, in disease caused by genital herpes virus infection. The experiments describe a role for type I interferon stimulation of neutrophils later in the infection that drives inflammation. Blockade of interferon, and to a lesser degree, IL-18 ameliorated disease. This study should be of interest to immunologists and virologists.

      This study sought to examine the role of neutrophils in pathology during mucosal HSV-2 infection in a mouse model. The data presented in this manuscript suggest that late or sustained IFN-I signals act on neutrophils to drive inflammation and pathology in genital herpes infection. The authors show that while depletion of neutrophils from mice does not impact viral clearance or recruitment of other immune cells to the infected tissue, it did reduce inflammation in the mucosa and genital skin. Single cell sequencing of immune cells from the infected mucosa revealed increased expression of interferon stimulated genes (ISGs) in neutrophils and myeloid cells in HSV-2 infected mice. Treatment with anti-IFNAR antibodies or the use of neutrophil-specific IFNAR1 conditional knockout mice decreased disease and IL-18 levels. Blocking IL-18 also reduced disease, although these data show that other signals are likely to also be involved. It is interesting that viral titers and anti-viral immune responses were unaffected by IFNAR or IL-18 blockade when this treatment was started 3-4 days after infection, because data shown here (for IFN-I) and by others in published studies (for IFN-I or IL-18) have shown that loss of IFN-I or IL-18 prior to infection is detrimental.

      These data are interesting and show pathways (namely IFN-I and IL-18) that could be blocked to limit disease. While this suggests that IL-18 blockade might be an effective treatment for genital inflammation caused by HSV-2 infection, the utility of IL-18 blockade is still unclear, because the magnitude of the effect in this mouse model was less than IFNAR blockade. Additionally, further experiments, such as conditional loss of IL-18 in neutrophils, would be required to better define the role and source(s) of IL-18 that drive disease in this model.

      We thank the reviewer for the positive response and agree that additional studies would likely be necessary to fully understand the role of IL-18 during HSV-2 infection.

    1. Author Response:

      Reviewer #1 (Public Review):

      The study by Diboun et al. aims to investigate methylation profiles in Paget's disease of bone (PDB) patients. Many of the genes identified near areas of differentially methylated sites were known to be involved in osteoclast differentiation, viral infection and mechanical loading. These gene pathways are known to play a role in the pathogenesis of PDB. The strength of this study is that it is the first study to look at changes in methylation profiles in Paget's disease of bone patients. Additionally, the genes identified as having differentially methylated sites suggest that environmental factors such as host immune responses may be altered and play a role in the pathogenesis of PDB. The main weakness of this study is that the cells that were analyzed for changes in methylation sites were not osteoclasts, the cells of interest in PDB. While many of the genes identified have been shown to play a role in regulation of the skeletal system, results should be interpreted with caution until they are validated in bone tissue.

      We thank the reviewers and the editors for this thoughtful comment. Ebrahimi et al (EPIGENETICS; 2021, 16(1): 92–105) investigated correlation in methylation profiles between blood and bone tissue in 12 subjects using Illumina MethylationEPIC BeadChip array. Bone samples were taken from the exposed proximal femur after removal of the femoral head from osteoarthritis patients. After quality control, Ebrahimi et al focused the correlation analysis on 64,349 probes that fit their analysis criteria (to define the most highly correlated positions), of which 30,607 sites showed significant (FDR < 0.05) high correlation (r2 > 0.74) between bone and blood.

      An additional filter was applied to these sites to include those with at least 80% similar methylation profile between bone and blood (n = 28,549), which were reported as a supplementary table in their paper. We assessed if CpG sites annotated to genes identified from our DMS and DMR analyses (Table 2 and 3) showed high correlation between bone and blood as reported by Ebrahimi et al. Results showed that CpGs annotated to 8 out of the 14 genes from our DMS analysis were among the highly correlated sites between blood and bone (r2 > 0.74; FDR < 0.05; Supplementary File 6). For DMRs, out of the 10 genes reported in our study (Table 3), 6 had at least one CpG with high correlation between blood and bone (Supplementary File 6). It is important to note that, in the study by Ebrahimi et al, only 64,349 CpG sites were tested for correlation, owing to the stringent criteria adopted by the authors to identify the list of highly concordant sites. Therefore, our DMS/DMR sites that did not feature in the list are not necessarily uncorrelated. Unfortunately, these sites cannot be investigated further since Ebrahimi et al did not make their entire dataset available in the public domain. To address this point, a table has been added to the manuscript (Supplementary File 6) listing the sites with high correlation, and the text has been modified to include and discuss these results.
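      The lookup described above can be sketched as a simple filter. This is an illustration only, not the analysis code used for Supplementary File 6. The record layout, the function name, and every CpG and gene identifier below are hypothetical placeholders. Only the two thresholds (r2 > 0.74 and FDR < 0.05) come from the text.

```python
from collections import defaultdict

R2_MIN = 0.74    # blood-bone correlation threshold quoted in the response
FDR_MAX = 0.05   # significance threshold quoted in the response

def genes_with_concordant_cpg(cpg_records, genes_of_interest):
    """Map each gene of interest to its CpGs that pass both thresholds."""
    hits = defaultdict(list)
    for rec in cpg_records:
        passes = rec["r2"] > R2_MIN and rec["fdr"] < FDR_MAX
        if passes and rec["gene"] in genes_of_interest:
            hits[rec["gene"]].append(rec["cpg"])
    return dict(hits)

# Hypothetical records, invented purely to exercise the function.
example_records = [
    {"cpg": "cg_hyp_1", "gene": "GENE_A", "r2": 0.81, "fdr": 0.01},
    {"cpg": "cg_hyp_2", "gene": "GENE_A", "r2": 0.60, "fdr": 0.01},
    {"cpg": "cg_hyp_3", "gene": "GENE_B", "r2": 0.90, "fdr": 0.20},
    {"cpg": "cg_hyp_4", "gene": "GENE_C", "r2": 0.77, "fdr": 0.03},
]
concordant = genes_with_concordant_cpg(example_records, {"GENE_A", "GENE_B"})
```

      A gene missing from the result is not shown to be uncorrelated between tissues. As noted above, a site may simply not have been among those tested.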

      Reviewer #2 (Public Review):

      This unique study has shown that epigenetic (therefore, potentially environment-driven) factors contribute to the pathogenesis of Paget's Disease of Bone (PDB). Although PDB is not a very rare condition, its early diagnosis is problematic. The bone tissue is not easily accessible, thus many cases are not diagnosed till later in life. Thus, having diagnostic markers measured in blood, normalized to cell type count, might be of use for possible diagnostic applications.

      The PRISM trial's sample, comprising 232 cases and 260 controls from the UK, was divided in two - discovery and replication sets - based on power calculations for EWAS. Meta-analysis of data from the discovery and replication sets revealed significant differences in DNA methylation. Among the gene-body regions/loci, many were associated with functions related to osteoclast differentiation, mechanical loading, immune function, etc.; two loci were suggested as functional through expression quantitative trait methylation (eQTM) analysis. Further, there was some value in assessing the risk of developing PDB. The AUC of 82.5%, based on the 95 discriminatory sites from the "best subset" analysis, is promising for clinical applicability. If confirmed in independent samples and further studies, chromosomal loci found in this study may offer diagnostic markers for prediction of the disease.

      We would like to draw the reviewer’s attention to the fact that the original cohort comprised 232 PDB cases and 260 controls (that is, 116 cases and 130 controls in each of the discovery and cross-validation sets). The abstract has been slightly modified to make the text clearer.

      Reviewer #3 (Public Review):

      Diboun et al used a case-control study design to identify DNA methylation sites and regions that differ between individuals with Paget's Disease of Bone (PDB) and controls. Cases were identified from an ongoing PDB clinical trial. Spouses of cases were used as controls. Candidate methylation sites were identified in a discovery set and then tested in a validation set to confirm association with PDB. Meta-analysis was used to combine effects from the discovery and validation sets. A machine learning approach was then used to prioritize candidates and build a prediction model capable of differentiating PDB cases from controls. The model was associated with high level of accuracy (AUC >0.90) in the discovery and validation sets.

      A major strength of the study is the collection of a large population of individuals with a rare bone disease. Epigenetic features are appealing for building prediction models as they may represent interplay between genetics and environment. Using this approach, the authors built a prediction model with a high level of accuracy. The results advance our understanding of the etiology of PDB.

      Overall, the primary conclusions are generally well supported. However, there are several aspects of the paper that will require additional clarification.

      I commend the authors for using a split sample cross validation approach to maximize experimental rigor. However, this approach is distinct from a true external replication. Given that the 'training' and the 'test' sets come from the same overall population, we expect the 'replication' results to be optimistic relative to results from a true, external replication population. Given the absence of a suitable external replication population due to the unique nature of the disease, this limitation is acceptable. However, I expect the authors to discuss the potential limitations of this approach in their discussion section and I encourage the authors to refer to the 'replication' set as a 'cross-validation' set to more appropriately convey their experimental approach to the broader scientific community.

      We have referred to the replication set as “cross-validation” as suggested by the reviewer. However, the study subjects were recruited from over 27 medical centres across the United Kingdom (UK) representing most major cities. We have also added text to discuss this point.

      The authors look for functional validation using the BIOS qTL database. This reference provides valuable information about functional role of methylation in gene expression in whole blood (eQTM). We know that eQTMs are tissue specific. Do the authors have any evidence whether the methylation plays a similar role in bone tissue?

      We agree that eQTMs tend to be tissue specific and although we were able to gather some confidence about concordance in methylation levels between blood and bone tissue samples using the Ebrahimi study, it is rather difficult to speculate about the concordance in the effect on gene expression. We therefore raise this issue in the study limitation section of the paper.

      The authors report the markers from their 'best set' for prediction have potential functional relevance. The potential clinical relevance, however, requires additional context. The data were obtained after onset of PDB. The potential for reverse causation cannot be overlooked. Do the authors have any evidence that the methylation markers precede clinical diagnosis? Appropriate temporality is an essential requisite for an effective clinical prediction model.

      We agree with the reviewers that this is an issue with most EWAS studies. The observed methylation changes reported in the study may exist as a consequence of the disease. We therefore updated our discussion of study limitations to reflect the potential issue of reverse causation (page 11). We also discussed the design of future experiments in which the predictive value of our best subset could be properly validated with appropriate temporality. Specifically, individuals with a genetic predisposition and/or family history of PDB could be measured routinely for changes in the methylation patterns of the best subset identified in this study, in an attempt to draw possible associations with future disease onset.

    1. Author Response:

      Reviewer #1:

      In this study, Lee et al. reanalyzed a previous fMRI dataset (Aly et al., 2018) in which participants watched the same 90s movie segment six times. Using event-segmentation methods similar to Baldassano et al. (2017), they show that event boundaries shifted for the average of the last 5 viewings as compared to the first viewing, in some regions by as much as 12 seconds. Results provide evidence for anticipatory neural activity, with apparent differences across brain regions in the timescale of this anticipation, in line with previous reports of a hierarchy of temporal integration windows.

      – One of the key findings of the paper – long-timescale anticipatory event reinstatement – overlaps with the findings of Baldassano et al., 2017. However, the previous study could not address the multiple time scales/hierarchy of predictions. Considering that this is the novel contribution of the current study, more statistical evidence for this hierarchy should be provided.

      We agree that more statistical evidence for the hierarchy is critical. As noted above and described in more detail below, we did this in two ways. First, we related anticipation amounts to the position of brain regions along the anterior/posterior axis, and indeed found that anticipation significantly increases as one moves more anteriorly in the brain. Second, we explored whether brain regions with faster vs slower activity dynamics (i.e., more vs fewer events during the movie) showed differences in anticipation amounts. We found that regions that integrate information over more of the past (show fewer, longer events) show significantly more anticipation into the future.

      – The current hierarchy of anticipation is closely linked to (and motivated by) previous studies showing evidence of a hierarchy of temporal integration windows. Indeed, the question of the study was "whether this hierarchy also exists in a prospective direction". This question is currently addressed somewhat indirectly, by displaying above-threshold brain regions, but without directly relating this hierarchy to previous findings of temporal integration windows, and without directly testing the claimed "posterior (less anticipation) to anterior (more anticipation) fashion" (from abstract).

      Thank you for this important suggestion. We tested whether there is a hierarchy in the posterior (less anticipation) to anterior (more anticipation) direction by calculating the Spearman correlation between the Y-coordinate of each significant voxel (indexing how posterior vs anterior that voxel is) and the amount of anticipation in that voxel. We compared this correlation value to correlations between the Y-coordinate and the amount of anticipation in null maps produced by randomly permuting the order of the viewings. We observed a Spearman rho of 0.58 for the anterior/posterior axis (p = 0.0030). This relationship persisted when the analysis was done on the unthresholded anticipation map (Spearman’s rho = 0.42, p = 0.0028; Supplementary Figure 1). Furthermore, there were no significant relationships between anticipation and the left-to-right (X) axis or the inferior-to-superior (Z) axis. We now describe this as follows in the paper:

      In Methods:

      “To determine if anticipation systematically varied across the cortex in the hypothesized posterior-to-anterior direction, we calculated the Spearman correlation between the Y-coordinate of each significant (q < 0.05) voxel (indexing the position of that voxel along the anterior/posterior axis) and the mean amount of anticipation in that voxel. To obtain a p-value, the observed correlation was compared to a null distribution in which the Spearman correlation was computed with the null anticipation values from the permutation analysis described above, in which the order of the viewings was randomly scrambled for each participant. For comparison, the correlation was also computed for the X (left-right) and Z (inferior-superior) axes. This analysis was repeated on unthresholded anticipation maps, to examine if this hierarchy remained even when including regions whose anticipation amounts did not reach statistical significance.” (p.13)

      In the Results:

      “The magnitude of this shift varied along a posterior to anterior temporal hierarchy (Spearman’s rho = 0.58, p = 0.0030), with the most anterior regions in the temporal pole and prefrontal cortex showing shifts of up to 15 seconds on subsequent viewings compared to the first viewing. This hierarchy persisted even when computed on the unthresholded anticipation map including voxels that did not meet the threshold for statistical significance (Spearman’s rho = 0.42, p = 0.0028; see Supplementary Figure 1). There were no significant correlations with the left-to-right axis (rho = 0.06, p = 0.41 for thresholded map; rho = 0.12, p = 0.29 for unthresholded map) or the inferior-to-superior axis (rho = 0.07, p = 0.28 for thresholded map; rho = -0.11, p = 0.73 for unthresholded map). We obtained a similar map when comparing the first viewing to just the sixth viewing alone (see Supplementary Figure 2).” (p.4)
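
      The logic of this test can be sketched in a few lines. The following is a minimal illustration, not the code used for the paper: the function names, the toy data, and the p-value convention (share of null correlations at least as large as the observed one, with +1 smoothing) are all assumptions. In the toy example the null maps are made by shuffling voxel values as a stand-in; in the analysis described above they come from re-running the pipeline with the viewing order scrambled.

```python
import random
from statistics import mean

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def hierarchy_test(y_coords, anticipation, null_maps):
    """Observed rho plus a one-sided permutation p-value.

    null_maps is a hypothetical input: one list of per-voxel anticipation
    values for each permutation of the viewing order.
    """
    rho_obs = spearman(y_coords, anticipation)
    rho_null = [spearman(y_coords, m) for m in null_maps]
    # Assumed convention: fraction of null rhos >= observed, with +1 smoothing.
    p = (sum(r >= rho_obs for r in rho_null) + 1) / (len(rho_null) + 1)
    return rho_obs, p

# Toy data: anticipation grows with the Y-coordinate, plus noise.
rng = random.Random(0)
y = [float(i) for i in range(40)]
obs = [0.2 * v + rng.gauss(0, 1) for v in y]
nulls = [rng.sample(obs, len(obs)) for _ in range(199)]  # stand-in null maps
rho, p = hierarchy_test(y, obs, nulls)
```

      The same function can be called with X- or Z-coordinates in place of `y` to run the control comparisons on the other two axes.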

      We also complemented this approach by looking at whether anticipation amounts vary systematically as a function of the optimal event timescale for a brain region. We first found the optimal number of HMM events for a given brain region based on the first viewing of the movie clip. Regions with fewer events show slower timescales of processing than those with more events, and based on prior studies are known to integrate information over more of the past (Hasson et al., 2008; Hasson et al., 2015; Lerner et al., 2011). We then looked at anticipation within each timescale bin. This was compared to a null distribution where timescale values were correlated with permuted anticipation maps, for which repetition order was scrambled. We found that anticipation is further reaching for regions with longer timescales, as we hypothesized (Spearman rho = 0.319, p = 0.00031; Supplementary Figure 3).

      These new analyses have been incorporated into the Methods and Results as follows:

      “To relate the timescales of anticipation to the intrinsic timescales of brain regions during the first viewing, we fit the HMM on the first viewing alone, varying the number of events from 2 to 10. The HMM was trained on the average response from half of the participants (fitting the sequence of activity patterns for the events and the event variance) and the log-likelihood of the model was then measured on the average response in the other half of the participants. The training and testing sets were then swapped, and the log-likelihoods from both directions were averaged together. Hyperalignment was not used during this fitting process, to ensure that the training and testing sets remained independent. The number of events that yielded the largest log-likelihood was identified as the optimal number of events for that searchlight. The optimal number of events was then compared to the anticipation timescale in that region (from the main analysis), using Spearman correlation.” (p.14)

      “We also compared how this hierarchy of anticipation timescales related to the intrinsic processing timescales in each region during the initial viewing of the movie clip. Identifying the optimal number of HMM events for each searchlight, we observed a timescale hierarchy similar to that described in previous work, with faster timescales in sensory regions and slower timescales in more anterior regions (Supplementary Figure 3a). Regions with longer intrinsic timescales also showed a greater degree of anticipation with repeated viewing (Supplementary Figure 3b).” (p.4)
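
      The two-fold model selection described in the Methods passage above can be sketched as follows. This is only an outline under stated assumptions: `held_out_loglik` is a hypothetical callable standing in for "fit a k-event HMM on one half of the participants and return its log-likelihood on the other half", and `toy_loglik` is a placeholder scorer used purely so the example runs.

```python
def select_n_events(half_a, half_b, held_out_loglik, candidates=range(2, 11)):
    """Pick the event count with the highest two-fold held-out log-likelihood.

    held_out_loglik(train, test, k) is a hypothetical callable that fits a
    k-event model on `train` and returns its log-likelihood on `test`.
    """
    scores = {}
    for k in candidates:
        # Train on one half, test on the other, then swap and average.
        scores[k] = 0.5 * (held_out_loglik(half_a, half_b, k)
                           + held_out_loglik(half_b, half_a, k))
    best = max(scores, key=scores.get)
    return best, scores

# Placeholder scorer for demonstration only: it ignores the data and
# peaks at k = 5. A real scorer would fit the HMM to the training half.
def toy_loglik(train, test, k):
    return -(k - 5) ** 2

best_k, cv_scores = select_n_events([0.0], [0.0], toy_loglik)
```

      Running this once per searchlight would give the per-region optimal event count that is then correlated with the anticipation amount.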

      – The analysis is based on averaging the data of the 5 repeated viewings and comparing this average with the data of the first viewing. This means that the repeated viewing condition had much more reliable data than the initial viewing condition. This could potentially affect the results (e.g. better fit to HMM). To avoid this bias, the 5 repeated viewings could be entered separately into the analysis (e.g., each separately compared to the first viewing) and results averaged at the end. Alternatively, only the 6th viewing could be compared to the first viewing (as in Aly et al., 2018).

      Thank you for this suggestion, which we have implemented. Rather than averaging the timecourses from the repeated viewings, we fit the HMM jointly to data from all six viewings. This joint fit constrained the event patterns to be the same across viewings, but allowed the timing of these patterns to vary freely across viewings. We then averaged the anticipation results (from the time by events plots) across viewings 2-6, as suggested. The same pattern of results was observed, and this is now the main analysis in the paper (Figure 2). We also compared the first viewing to the last viewing, as suggested. As shown in Supplementary Figure 2, this analysis also showed a similar pattern of results.

      – Correlation analysis (Fig 6). "we tested whether these correlations were significantly positive for initial viewing and/or repeated viewing, and whether there was a significant shift in correlation between these conditions". It was not clear to me how we should interpret the correlation results in Figure 6. Might a lower correlation for repeated viewing not also reflect general suppression (e.g. participants no longer paying attention to the movie)? Perhaps comparing the correlations at the optimal lag (for each cluster) might help to reduce this concern; that is, the correlation difference would only exist at lag-0.

      We agree that a lower correlation for repeated vs. initial viewing could reflect cognitive processes unrelated to anticipation. Thus, the drop in correlation at lag 0 is not as important or meaningful as a shift in the peak correlation with multiple viewings. In particular, the peak correlation value might be the same for first vs repeated viewings, but a shift in the timing of that peak correlation would support our hypothesis of anticipation.

      We addressed this issue above, under Essential Revision #3, but also include our response below for convenience. We conducted a new analysis in which we measured the timing of the peak cross-correlation between HMM-derived event transitions in the brain and the human-annotated event boundaries, separately for each of the six movie viewings. In other words, we found the amount of shift in the brain’s event transitions that led to the maximum correlation with the timing of the human-annotated event boundaries. We then compared the timing of the correlation peak for the first movie viewing to the timing of the mean peak across viewings 2-6, and found regions of the brain where the peak shifted to be earlier with subsequent movie viewings. This was done as a whole-brain analysis with FDR correction. We include a figure (Figure 5) showing the data for the three searchlights that corresponded to clusters that met the q < .05 FDR criterion.

      The preceding analysis looked for regions for which the timing of the peak cross-correlation between the brain’s events and human-annotated events shifted earlier over movie repetitions, but did not test for the absolute location of that peak correlation (relative to zero lag between the HMM events and annotated events). Do the brain’s event transitions occur before annotated event transitions, after, or are they aligned? And how does this change over movie repetitions? We examined this question in the three clusters that emerged from the analysis in the preceding paragraph. We found that for the initial viewing, the brain’s event transitions lagged behind human-annotated event boundaries for two of the three clusters, whereas for the last cluster, the brain’s transitions and subjective event boundaries were aligned. For repeated viewings, the timing of the peak correlations shifted such that the brain’s representations of an event transition reliably preceded the occurrence of the human-annotated event boundary, for all three clusters (Figure 5).

      – Correlation analysis (Figure 6). "For both of these regions the initial viewing data exhibits transitions near the annotated boundaries, while transitions in repeated viewing data occur earlier than the annotated transitions" How was this temporal shift statistically assessed?

      The reviewer rightly noted that we did not statistically assess this shift in the first submission; that assessment was based on visual inspection. We now statistically assess whether the relationship between human-annotated event boundaries and the brain’s event transitions shifts with movie repetitions (see response above and Figure 5). We also test whether the brain’s event boundaries reliably occur before, after, or aligned with the human-annotated event boundaries. To that end, we first found the timing of the peak cross-correlation between the brain’s event boundaries and human-annotated event boundaries for each of the three clusters that emerged from the preceding analysis, separately for initial vs repeated viewings. We then obtained confidence intervals for the timing of those peaks by bootstrapping across participants who did the event annotations. In particular, we obtained the timing of the peak cross-correlation between the brain’s event transitions and each of the bootstrapped human-annotated event transitions, and used the bootstrapped timing distribution to find the upper and lower bounds of a 95% confidence interval (measured in seconds).

      We found that, for the initial movie viewing, two of three clusters had event transitions that occurred after subjective event boundaries (95% CIs for Fusiform Gyrus = [0.14, 1.99]; Superior Temporal Sulcus = [1.48, 8.53]). The last cluster had a peak correlation with event boundaries in the movie that was not different from a lag of 0, i.e., the brain’s event transitions and the human-annotated event boundaries were aligned (95% CI for Middle Temporal Gyrus = [-0.27, 2.86]). For the repeated movie viewings, this relationship shifted such that, for all three clusters, the brain’s event transitions reliably preceded event boundaries in the movie (95% CIs for Fusiform Gyrus = [-1.56, -0.26], Superior Temporal Sulcus = [-3.06, -1.69], Middle Temporal Gyrus = [-4.06, -1.83]). These shifts are largely consistent with mean anticipation amounts in each of these clusters (Fusiform Gyrus = 2.32s; Superior Temporal Sulcus = 2.18s; Middle Temporal Gyrus = 1.18s).
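
      A percentile bootstrap over raters, of the kind used for these intervals, can be sketched as below. This is an illustrative outline rather than the analysis code: `bootstrap_ci`, its parameters, and the percentile indexing are assumptions. In the toy call each rater is reduced to a single number and the statistic is a plain mean; in the analysis described above the statistic would instead recompute the peak cross-correlation lag from the resampled raters' annotations.

```python
import random
from statistics import mean

def bootstrap_ci(rater_data, statistic, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval, resampling raters.

    rater_data: one entry per rater (any object the statistic understands)
    statistic:  callable mapping a resampled list of raters to a number
    """
    rng = random.Random(seed)
    n = len(rater_data)
    stats = sorted(
        # Resample raters with replacement, then recompute the statistic.
        statistic([rater_data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)   # e.g. index 2 of 100 sorted values
    hi_idx = n_boot - 1 - lo_idx         # e.g. index 97 of 100 sorted values
    return stats[lo_idx], stats[hi_idx]

# Toy example: hypothetical per-rater values, with the mean as a placeholder
# for "optimal lag computed from these raters' annotations".
toy_raters = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
lo, hi = bootstrap_ci(toy_raters, mean)
```

      An interval that excludes zero would then be read as the brain's transitions reliably leading (negative lags) or trailing (positive lags) the annotated boundaries.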

      This updated analysis is described in Methods and Results as follows:

      “We compared the event boundaries identified by the HMM within each searchlight to the event boundaries annotated by human observers. To obtain an event boundary timecourse from the annotations, we convolved the number of annotations (across all raters) at each second with the hemodynamic response function (HRF) (Figure 4). Separately, we generated a continuous measure of HMM "boundary-ness" at each timepoint by taking the derivative of the expected value of the event assignment for each timepoint, as illustrated in Figure 1d. Moments with high boundary strength indicate moments in which the brain pattern was rapidly switching between event patterns. We cross-correlated the HMM boundary strength timecourse for each viewing with the annotated event boundary timecourse, shifting the annotated timecourse forward and backward to determine the optimal temporal offset (with the highest correlation). We measured the timing of the peak correlation by identifying the local maximum in correlation closest to zero lag, then fitting a quadratic function to the maximum-correlation lag and its two neighboring lags and recording the location of the peak of this quadratic fit. This produced a continuous estimate of the optimal lag for each viewing. We measured the amount of shift between the optimal lag for the first viewing and the average of the optimal lags for repeated viewings, and obtained a p value by comparing to the null distribution over maps with permuted viewing orders (as in the main analysis), then performed an FDR correction.

      We identified three gray-matter clusters significant at q < 0.05. To statistically assess whether the optimal lags differed from zero in the three searchlights maximally overlapping these three clusters, we repeated the cross-correlation analysis in 100 bootstrap samples, in which we resampled from the raters who generated the annotated event boundaries. We obtained 95% bootstrap confidence intervals for maximally-correlated lag on the first viewing and for the average of the maximally-correlated lags on repeated viewings.” (p.14)
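
      The lag estimate described in the Methods passage above (local maximum closest to zero lag, refined with a three-point quadratic fit) can be sketched as follows. This is a minimal illustration under assumptions: the function names, the sign convention (positive lag means the brain-derived boundaries come after the annotated ones), and the toy signals are not taken from the paper, and lags are expressed in samples rather than seconds.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def lagged_corr(brain, annot, lag):
    """Correlate brain[t + lag] with annot[t] over the overlapping samples."""
    if lag >= 0:
        b, a = brain[lag:], annot[:len(annot) - lag]
    else:
        b, a = brain[:len(brain) + lag], annot[-lag:]
    return pearson(b, a)

def optimal_lag(brain, annot, max_lag):
    """Continuous lag of the correlation peak nearest to zero lag."""
    lags = list(range(-max_lag, max_lag + 1))
    r = [lagged_corr(brain, annot, lag) for lag in lags]
    # Interior local maxima of the correlation-by-lag curve.
    peaks = [i for i in range(1, len(r) - 1)
             if r[i] >= r[i - 1] and r[i] >= r[i + 1]]
    if not peaks:
        raise ValueError("no local maximum within the tested lags")
    i = min(peaks, key=lambda idx: abs(lags[idx]))
    # Vertex of the parabola through the peak and its two neighbours.
    denom = r[i - 1] - 2 * r[i] + r[i + 1]
    offset = 0.0 if denom == 0 else 0.5 * (r[i - 1] - r[i + 1]) / denom
    return lags[i] + offset

# Toy signals: three smooth "boundaries"; the brain copy is delayed by 3 samples.
def bumps(t):
    return sum(math.exp(-((t - c) / 4.0) ** 2) for c in (30, 70, 110))

annot = [bumps(t) for t in range(150)]
brain = [bumps(t - 3) for t in range(150)]
lag = optimal_lag(brain, annot, max_lag=10)
```

      Multiplying the returned lag by the sampling interval converts it to seconds; the difference between the first-viewing lag and the mean repeated-viewing lag is the quantity that was tested against the permutation null.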

      “We asked human raters to identify event transitions in the stimulus, labeling each “meaningful segment” of activity (Figure 3). To generate a hypothesis about the strength and timing of event shifts in the fMRI data, we convolved the distribution of boundary annotations with a Hemodynamic Response Function (HRF) as shown in Figure 4. We then explored alignment between these human-annotated event boundaries and the event boundaries extracted from the brain response to each viewing, as shown in Figure 1d. In each searchlight, we cross-correlated the brain-derived boundary timecourse with the event annotation timecourse to find the temporal offset that maximized this correlation. We found three clusters in the Middle Temporal Gyrus (MTG), Fusiform Gyrus (FG), and Superior Temporal Sulcus (STS) in which the optimal lag for the repeated viewings was significantly earlier than for the initial viewing, indicating that the relationship between the brain-derived HMM event boundaries and the human-annotated boundaries was changing with repeated viewings (Figure 5). The HMM boundaries on the first viewing were significantly later than the annotated boundaries in FG and STS, while the optimal lag did not significantly differ from zero in MTG (95% confidence intervals for the optimal lag, in seconds: MTG = [-0.27, 2.86]; FG = [0.14, 1.99]; STS = [1.48, 8.53]). The HMM boundaries on repeated viewings were significantly earlier than the annotated boundaries in all three regions (95% confidence intervals for the average optimal lag, in seconds: MTG = [-4.06, -1.83]; FG = [-1.56, -0.26]; STS = [-3.06, -1.69]).” (p.6-7)

      – Not all clusters in Figure 2/6 look like contiguous and meaningful clusters. For example, cluster 9 appears to include insula as well as (primary?) sensorimotor cortex, and cluster 4 includes both ventral temporal cortex and inferior parietal cortex/TPJ. It is thus not clear what we can conclude from this analysis about specific brain regions. For example, the strongest r-diff is in cluster 4, but this cluster includes a very diverse set of regions.

      We agree with this assessment. Because dividing up large clusters would have to be done somewhat arbitrarily, we opted to remove the table that implied the existence of functionally homogeneous clusters. Instead, we will publicly share the unthresholded anticipation map on NeuroVault at https://identifiers.org/neurovault.collection:9584 in case others are interested in it for meta-analyses or comparison to their own work.

      Furthermore, our new analyses systematically compare anticipation across the cortical hierarchy and across regions with different event timescales. Those analyses allow us to say that anticipation varies from posterior to anterior regions of the brain, and that regions with longer event timescales also show further-reaching anticipation. We therefore believe that the current work offers important conclusions about how anticipation varies across the brain, rather than conclusions about any specific brain region’s role in anticipation or why that role arises.

      – In previous related work, the authors correlated time courses within and across participants, providing evidence for temporal integration windows. For example, in Aly et al., 2018 (same dataset), the authors correlated time courses across repeated viewings of the movie. Here, one could similarly correlate time courses across repeated viewings, shifting this time course in multiple steps and testing for the optimal lag. This would seem a more direct (and possibly more powerful) test of anticipation and would link the results more closely to the results of the previous study. If this analysis is not possible to reveal the anticipation revealed here, please motivate why the event segmentation is crucial for revealing the current findings.

      Thank you for bringing this up! This is indeed a simpler way to look for anticipation, but it is arguably less sensitive than the HMM approach. This is because the shifting analysis assumes that anticipation is relatively constant throughout the movie clip. For example, if one shifts the timecourses by 2s relative to one another, the assumption is that brain activity dynamics for repeated viewings will precede those for the initial viewing by 2s throughout the entire clip. Furthermore, one needs to systematically test multiple possible shifts in each brain region to identify the best-fitting amount of anticipation. We therefore opted to use the HMM approach because it does not assume constant anticipation amounts throughout the duration of the clip, but instead allows the amount of anticipation to vary dynamically. Furthermore, the HMM can naturally uncover different timescales of anticipation in different brain regions, without needing a priori hypotheses about what the extent of the shift is.

      Nevertheless, we took the reviewer’s advice and used this cross-correlation approach to examine our data. We ran a searchlight approach to find the peak in the cross-correlation between activity dynamics within each searchlight for the first vs repeated viewings of the movie clip. We found a number of regions where the activity dynamics shifted earlier across movie repetitions, with the amount of this shift varying across regions. Interestingly, the regions that passed statistical significance were primarily in frontal and parietal regions and were far less extensive than those revealed by the HMM analyses. Thus, the HMM approach did seem to be more sensitive and better able to detect anticipation, especially in more posterior parts of the brain with more subtle anticipatory shifts.

      We now include this analysis as Supplementary Figure 4, and briefly discuss it as follows:

      In the Methods:

      “For comparison, we also ran a searchlight looking for anticipatory effects using a non-HMM cross-correlation approach. Within each searchlight, we obtained an average timecourse across all voxels, and correlated the response to the first viewing with the average response to repeated viewings at differing lags. Using the same quadratic-fit approach for identifying the optimal lag described below, we tested whether the repeated-viewing timecourse was significantly ahead of the initial-viewing timecourse (relative to a null distribution in which the viewing order was shuffled within each subject). The p values obtained were then corrected for False Discovery Rate.” (p.14)

      In the Results:

      "We also compared these results to those obtained by using a simple cross-correlation approach, testing for a fixed temporal offset between the responses to initial and repeated viewing. This approach did detect significant anticipation in some anterior regions, but was much less sensitive than the more flexible HMM fits, especially in posterior regions (Supplementary Figure 4)." (p.4)

      Reviewer #2:

      Aly et al. investigated anticipatory signals in the cortex by analysing data in which participants repeatedly watched the same movie clip. The authors identified events using an HMM-based data-driven event segmentation method and examined how the timing of events shifted between the initial and repeated presentation of the same video clip. A number of brain regions were identified in which event timings were shifted earlier in time due to repeated viewing. The main finding is that more anterior brain regions showed more anticipation than posterior brain regions. The reported findings are very interesting, the approach the authors used is innovative and the main conclusions are supported by the results and analyses. However, many cortical regions did not show any anticipatory effects and it is not clear why that is. In part, this may be due to a number of suboptimal aspects in the analysis approach. In addition, the analyses of behavioural annotations are open to multiple interpretations.

      Methods and Results:

      1) The paper shows that across multiple regions in the cortex, there is significant evidence for anticipation of events with repeated viewing. However, there are also many areas that do not show evidence for anticipation. It is not clear whether this is due to a lack of anticipation in those areas, or due to noise in the data or low power in the analyses. There are two factors that may be causing this issue. First, the data that were used are not optimal, given the short movie clip and relatively low number of participants. Second, there are a number of important issues with the analyses that may have introduced noise in the observed neural event boundaries (see points 2-4 below).

      We agree that our previous analyses were suboptimal in several ways. We discuss the changes we have made to address this concern in response to points 2-4 below. We also now share unthresholded anticipation maps (Supplementary Figure 1), and show that an anticipation hierarchy is present even in those data. Thus, even if our approach failed to find statistically significant anticipation in some regions, the main claim of the paper holds when anticipation across the entire brain is considered.

      2) Across all searchlights, the number of estimated events was fixed to be the same as the number of annotated events. However, in previous work, Baldassano and colleagues (2017) showed that there are marked differences between regions in the timescales of event segmentation across the cortex. Therefore, it may be that in regions such as visual cortex, which tends to have very short events, the current approach identifies a mixture of neural activity patterns as one 'event'. This will add a lot of noise to the analysis and decrease the ability of the method to identify anticipatory event timings, particularly for regions lower in the cortical hierarchy that show many more events than tend to be observed in behavioural annotations.

      Thank you for raising this point. The reason we chose the same number of events for each region is to avoid confounding event numbers with anticipation amounts. Our concern was that, if we systematically vary the number of events used in the HMM along the anterior-posterior axis (based on the optimal event timescale), then any resulting differences in anticipation could potentially be driven by the fact that different HMM models were used in different regions. That is, one might see differences in ‘anticipation’ that are entirely driven by differences in the number of events used in the model. To avoid this confound, we fit identical models across the cortex (with a fixed number of events) during the anticipation analysis. We chose to keep this approach in our revision, but supplemented it with analyses that we hope address the relationships between optimal event numbers and amount of anticipation. We mentioned our new analysis in response to comments from Reviewer #1, but we include our response here as well for convenience.

      Our new analysis investigates anticipation amounts as a function of the optimal event timescale for a given brain region. We first found the optimal number of HMM events for a given brain region based on the first viewing of the movie clip. Regions with fewer events show slower timescales of processing than those with more events, and based on prior studies are known to integrate information over more of the past (Hasson et al., 2008; Hasson et al., 2015; Lerner et al., 2011). We then looked at anticipation within each timescale bin, while keeping the number of events fixed at seven. This analysis was compared to a null distribution where timescale values were correlated with permuted anticipation maps, for which repetition order was scrambled. We found that anticipation is further reaching for regions with longer timescales, as we hypothesized (Spearman rho = 0.319, p = 0.00031; Supplementary Figure 3). We believe this analysis nicely links our work to studies of hierarchical information integration.

      These new analyses have been incorporated into the Methods and Results as follows:

      “To relate the timescales of anticipation to the intrinsic timescales of brain regions during the first viewing, we fit the HMM on the first viewing alone, varying the number of events from 2 to 10. The HMM was trained on the average response from half of the participants (fitting the sequence of activity patterns for the events and the event variance) and the log-likelihood of the model was then measured on the average response in the other half of the participants. The training and testing sets were then swapped, and the log-likelihoods from both directions were averaged together. Hyperalignment was not used during this fitting process, to ensure that the training and testing sets remained independent. The number of events that yielded the largest log-likelihood was identified as the optimal number of events for that searchlight. The optimal number of events was then compared to the anticipation timescale in that region (from the main analysis), using Spearman correlation.” (p.14)

      “We compared how this hierarchy of anticipation timescales related to the intrinsic processing timescales in each region during the initial viewing of the movie clip. Identifying the optimal number of HMM events for each searchlight, we observed a timescale hierarchy similar to that described in previous work, with faster timescales in sensory regions and slower timescales in more anterior regions (Supplementary Figure 3a). Regions with longer intrinsic timescales also showed a greater degree of anticipation with repeated viewing (Supplementary Figure 3b).” (p.4)

      3) If I understand correctly, the HMM event segmentation model was applied to data from voxels within a searchlight that were averaged across participants. Regular normalization methods typically do not lead to good alignment at the level of single-voxels (Feilong et al., 2018, Neuroimage). Therefore, averaging the data without first hyperaligning them may lead to noise due to functional alignment issues within searchlights.

      Thank you for this suggestion! We re-ran all analyses after hyperalignment (using the Shared Response Model approach; Chen et al., 2015), and anticipatory signals are generally more robust and widespread when conducted in the hyperaligned space. We have therefore replaced all the main analyses in the paper with this new approach.

      4) In the analyses the five repeated viewings of the clips were averaged into a single dataset. However, it is likely that participants' ability to predict the upcoming information still increased after the first viewing. That is especially true for perceptual details that may not have been memorised after watching the clip once, but will be memorised after watching it five times. It is not clear why the authors choose to average viewings 2-6 rather than analyse only viewing 6, or perhaps even more interesting, look at how predictive signals varied with the number of viewings. I would expect that especially for early sensory regions, predictive signals increase with repeated viewing.

      Thank you for this suggestion, which we have implemented. Rather than averaging the timecourses from the repeated viewings, we fit the HMM jointly to data from all six viewings. This joint fit constrained the event patterns to be the same across viewings, but allowed the timing of these patterns to vary freely across viewings. We then averaged the anticipation results (from the time by events plots) across viewings 2-6, as suggested. The same pattern of results was observed, and this is now the main analysis in the paper (Figure 2). We also compared the first viewing to the last viewing, as suggested. As shown in Supplementary Figure 2, this analysis also showed a similar pattern of results.

      5) In the analyses of the alignment between the behavioural and neural event boundaries, the authors show the difference in correlation between the initial and repeated viewing without taking the estimated amount of anticipation into account. I wonder why the authors decided on this approach, rather than estimating the delay between the neural and behavioural event boundaries. The finding that is currently reported, i.e. a lower correlation between neural and annotated events in the repeated viewing condition, does not necessarily indicate anticipation. It could also suggest that with repeated viewing, participants' neural events are less reflective of the annotated events. Indeed the results in figure 5 suggest that the correlations are earlier but also lower for the repeated viewing condition.

      Thank you for raising this point; we agree that the previous analysis was suboptimal. We mentioned our new analysis in response to Essential Revision #3 and comments from Reviewer #1, but we include our response here as well for convenience. We agree that the most important test for this analysis is whether there is a systematic shift, across movie repetitions, in the timing of the peak cross-correlation between the brain’s event transitions and human-annotated event boundaries. To test this, we conducted a new analysis in which we measured the timing of the peak cross-correlation between HMM-derived event transitions in the brain and the human-annotated event boundaries, separately for each of the six movie viewings. In other words, we found the amount of shift in the brain’s event transitions that led to the maximum correlation with the timing of the human-annotated event boundaries. We then compared the timing of the correlation peak for the first movie viewing to the timing of the mean peak across viewings 2-6, and found regions of the brain where the peak shifted to be earlier with subsequent movie viewings. This was done as a whole-brain analysis with FDR correction. We include a figure (Figure 5) showing the data for the three searchlights that corresponded to clusters that met the q < .05 FDR criterion.

      The preceding analysis looked for regions for which the timing of the peak cross-correlation between the brain’s events and human-annotated events shifted earlier over movie repetitions, but did not test for the absolute location of that peak correlation (relative to zero lag between the HMM events and annotated events). Do the brain’s event transitions occur before annotated event transitions, after, or are they aligned? And how does this change over movie repetitions? We examined this question in the three clusters that emerged from the analysis in the preceding paragraph. We found that for the initial viewing, the brain’s event transitions lagged behind human-annotated event boundaries for two of the three clusters, whereas for the last cluster, the brain’s transitions and subjective event boundaries were aligned. For repeated viewings, the timing of the peak correlations shifted such that the brain’s representations of an event transition reliably preceded the occurrence of the human-annotated event boundary, for all three clusters (Figure 5).

      Together, these results confirm that, in some regions, the best alignment between the brain’s event transitions and human-annotated event boundaries shifts over movie repetitions such that the brain’s event transitions start to occur earlier over repetitions. In particular, the brain’s events shift to precede subjective event boundaries.
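
      As an illustration of the peak cross-correlation logic described in this response, the following is a minimal sketch and not the authors' pipeline. It assumes that the brain's event transitions and the annotated boundaries are each available as a timecourse with one value per timepoint; the function names (pearson, peak_lag, impulses), the toy boundary positions, and the sign convention (positive lag means the brain leads) are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def peak_lag(brain, annotated, max_lag):
    """Find the shift that best aligns brain[t] with annotated[t + lag].

    A positive lag means the brain's transitions precede the annotated
    boundaries; a negative lag means they trail behind them.
    """
    n = len(brain)
    best_lag, best_r = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = brain[:n - lag], annotated[lag:]
        else:
            x, y = brain[-lag:], annotated[:n + lag]
        r = pearson(x, y)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


def impulses(n, positions):
    """Toy boundary timecourse: 1.0 at each boundary, 0.0 elsewhere."""
    return [1.0 if t in positions else 0.0 for t in range(n)]


annotated = impulses(60, {10, 25, 40})         # hypothetical annotated boundaries
first_viewing = impulses(60, {11, 26, 41})     # brain trails by one timepoint
repeated_viewing = impulses(60, {7, 22, 37})   # brain leads by three timepoints

lag_first, _ = peak_lag(first_viewing, annotated, max_lag=5)
lag_repeat, _ = peak_lag(repeated_viewing, annotated, max_lag=5)
print(lag_first, lag_repeat)
```

      On this toy input the peak sits at a lag of -1 for the first viewing and +3 for the repeated viewing, which is the kind of shift the analysis tests for. Real transition timecourses are graded rather than impulse-like, so the peak would be broader.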

      6) To do the comparison between neural and annotated event boundaries, the authors refit the HMM model to clusters of significant voxels in the main analysis. I wonder why this was done rather than using the original searchlights. By grouping larger clusters of voxels, which cover many searchlights with potentially distinct boundary locations, the authors may be introducing noise into the analyses.

      Thank you for this suggestion. Our new analyses comparing neural and annotated event boundaries were conducted on the same searchlights used for the main results, i.e., we do not refit the HMM to significant clusters of voxels.

      Discussion:

      7) To motivate their use of the HMM model, the authors state that: "This model assumes that the neural response to a structured narrative stimulus consists of a sequence of distinct, stable activity patterns that correspond to event structure in the narrative." If neural events are indeed reflective of the narrative event structure, what does it mean if these neural events shift in time? How does this affect the interpretation of the association between neural events and narrative events?

      Thank you for raising this issue, which we need to clarify. The HMM produces a probability distribution across states (events) at each timepoint. This probability distribution can reflect a combination of current and upcoming event representations. With more repetitions of the movie, these probability distributions start to shift, such that the expected value of the event assignments at any given time point moves toward upcoming events. Thus, it is not that the brain’s events no longer represent event structure in the narrative; they continue to represent current events while also starting to represent progressively more of upcoming events as well.

      To make that more concrete: during initial viewing, the HMM may be 100% confident that the brain’s representations reflect event #1 at the first time point in the movie. But during subsequent viewings, the same time point may be classified as 90% event #1 and 10% event #2. Thus, shifts in the expected value of the event indicate that the brain is anticipating components of upcoming events, even while continuing to represent the current event. The brain’s events are therefore still related to the narrative, but shift to incorporate upcoming events as well.

      We now clarify this as follows in the caption to Figure 1:

      “By fitting a Hidden Markov Model (HMM) jointly to all viewings, we can identify this shared sequence of event patterns, as well as a probabilistic estimate of event transitions. Regions with anticipatory representations are those in which event transitions occur earlier in time for repeated viewings of a stimulus compared to the initial viewing, indicated by an upward shift on the plot of the expected value of the event at each timepoint.” (p.3)

      And in the caption to Figure 2:

      “Because the HMM produces a probability distribution across states at each timepoint, which can reflect a combination of current and upcoming event representations, we plot the expected value of the event assignments at each timepoint.” (p.5)

      And in the Methods:

      “After fitting the HMM, we obtain an event by time-point matrix for each viewing, giving the probability that each timepoint belongs to each event. Note that because this assignment of timepoints to events is probabilistic, it is possible for the HMM to detect that the pattern of voxel activity at a timepoint reflects a mixture of multiple event patterns, allowing us to track subtle changes in the timecourse of how the brain is transitioning between events.” (p.13)
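
      The expected-value computation described in this response can be written out as a short worked example. This is a sketch under stated assumptions: the probability matrices are toy numbers (the first row reproduces the 100% versus 90%/10% example above), rows are timepoints and columns are events, and the function name expected_event is hypothetical. It does not reproduce how the authors convert the shift into seconds of anticipation.

```python
def expected_event(probs):
    """Expected event number at one timepoint; probs[k] is P(event k + 1)."""
    return sum((k + 1) * p for k, p in enumerate(probs))


# Rows are timepoints, columns are events (toy numbers, not real data).
initial_viewing = [[1.0, 0.0], [0.7, 0.3], [0.1, 0.9]]
repeated_viewing = [[0.9, 0.1], [0.4, 0.6], [0.0, 1.0]]

initial_curve = [expected_event(row) for row in initial_viewing]
repeated_curve = [expected_event(row) for row in repeated_viewing]

# Mean upward shift of the repeated-viewing curve, in units of events.
mean_shift = sum(r - i for r, i in zip(repeated_curve, initial_curve)) / len(initial_curve)
print(initial_curve, repeated_curve, mean_shift)
```

      A positive mean_shift corresponds to the upward shift of the expected-value curve that the Figure 1 caption describes for regions with anticipatory representations.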

      Reviewer #3:

      Lee et al. report results from an fMRI experiment with repeated viewings of a single movie clip, finding that different brain regions come to anticipate events to different degrees. The findings are brief but a potentially very interesting contribution to the literature on prediction in the brain, as they use rich movie stimuli. This literature has been limited as it has typically focused on fixed short timescales of possible anticipation, with many repetitions of static visual stimuli, leading to only one possible time scale of anticipation. In contrast, the current video design allows the authors to look in theory for multiple timescales of anticipation spanning simple sensory prediction across seconds to complex social dynamics across tens of seconds.

      The authors applied a Hidden Markov Model to multivoxel fMRI data acquired across six viewings of a 90 second movie. They fit a small set of components with the goal of capturing the different sequentially-experienced events that make up the clip. The authors report clusters of regions across the brain that shift in their HMM-identified events from the first viewing of the movie through the (average of the) remaining 5 viewings. In particular, more posterior regions show a shift (or 'anticipation') on the order of a few seconds, while more anterior regions show a shift on the order of ~10 seconds. These identified regions are then investigated in a second way, to see how the HMM-identified events correspond to subjective event segmentation given by a separate set of human participants. These data are a re-analysis of previously published data, presenting a new set of results and highlighting how open sharing of imaging data can have great benefits. There are a few important statistical issues that the authors should address in a revision in order to fully support their arguments.

      1) The authors report different timescales of anticipation across what may be a hierarchy of brain regions. However, do these timescales change significantly across regions? The paper rests in part on these differences, but the analyses do not yet actually test for any change. For this, there are multiple methods the authors could employ, but it would be necessary to do more than fit a linear model to the already-reported list of (non-independently-sorted) regions.

      Thank you for this important suggestion. We tested whether there is a hierarchy in the posterior (less anticipation) to anterior (more anticipation) direction by calculating the Spearman correlation between the Y-coordinate of each significant voxel (indexing how posterior vs anterior that voxel is) and the amount of anticipation in that voxel. We compared this correlation value to correlations between the Y-coordinate and the amount of anticipation in null maps produced by randomly permuting the order of the viewings. We observed a Spearman rho of 0.58 for the anterior/posterior axis (p = 0.0030). This relationship persisted when the analysis was done on the unthresholded anticipation map (Spearman’s rho = 0.42, p = 0.0028; Supplementary Figure 1). Furthermore, there were no significant relationships between anticipation and the left-to-right (X) axis or the inferior-to-superior (Z) axis. We now describe this as follows in the paper:

      In Methods:

      “To determine if anticipation systematically varied across the cortex in the hypothesized posterior-to-anterior direction, we calculated the Spearman correlation between the Y-coordinate of each significant (q < 0.05) voxel (indexing the position of that voxel along the anterior/posterior axis) and the mean amount of anticipation in that voxel. To obtain a p-value, the observed correlation was compared to a null distribution in which the Spearman correlation was computed with the null anticipation values from the permutation analysis described above, in which the order of the viewings was randomly scrambled for each participant. For comparison, the correlation was also computed for the X (left-right) and Z (inferior-superior) axes. This analysis was repeated on unthresholded anticipation maps, to examine if this hierarchy remained even when including regions whose anticipation amounts did not reach statistical significance.” (p.13)

      In the Results:

      “The magnitude of this shift varied along a posterior to anterior temporal hierarchy (Spearman’s rho = 0.58, p = 0.0030), with the most anterior regions in the temporal pole and prefrontal cortex showing shifts of up to 15 seconds on subsequent viewings compared to the first viewing. This hierarchy persisted even when computed on the unthresholded anticipation map including voxels that did not meet the threshold for statistical significance (Spearman’s rho = 0.42, p = 0.0028; see Supplementary Figure 1). There were no significant correlations with the left-to-right axis (rho = 0.06, p = 0.41 for thresholded map; rho = 0.12, p = 0.29 for unthresholded map) or the inferior-to-superior axis (rho = 0.07, p = 0.28 for thresholded map; rho = -0.11, p = 0.73 for unthresholded map). We obtained a similar map when comparing the first viewing to just the sixth viewing alone (see Supplementary Figure 2).” (p.4)
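
      The hierarchy test can be sketched as follows. This is an illustration only: the coordinates, anticipation values, and null maps are toy numbers, the function names are hypothetical, and the way the p-value is derived from the null correlations (an empirical count with a +1 correction) is an assumption, since this excerpt does not specify that step.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            out[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return out


def spearman(x, y):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den


def rho_and_p(y_coords, anticipation, null_maps):
    """Observed rho plus a one-tailed p against rho values from the null maps."""
    rho = spearman(y_coords, anticipation)
    null_rhos = [spearman(y_coords, null_map) for null_map in null_maps]
    # Assumption: empirical p, counting the observed value as one draw.
    p = (1 + sum(r >= rho for r in null_rhos)) / (1 + len(null_rhos))
    return rho, p


y_coords = [-90, -60, -30, 0, 30, 60]           # posterior (negative) to anterior
anticipation = [1.0, 2.0, 2.5, 4.0, 6.0, 9.0]   # toy anticipation per voxel
null_maps = [                                    # toy maps from permuted viewing order
    [4.0, 1.0, 9.0, 2.0, 6.0, 2.5],
    [9.0, 6.0, 4.0, 2.5, 2.0, 1.0],
    [2.0, 4.0, 1.0, 9.0, 2.5, 6.0],
]
rho, p = rho_and_p(y_coords, anticipation, null_maps)
print(rho, p)
```

      The same function can be called with X or Z coordinates in place of y_coords to obtain the comparison axes mentioned in the Methods excerpt.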

      2) The description of the statistical methods is unclear at critical points, which leads to questions about the strength of the results. The authors applied the HMM to group-averaged fMRI data to find the neural events. Then they run statistical tests on the difference in the area-under-the-curve (AUC) results from first to other viewings. It seems like they employ bootstrap testing using the group data? Perhaps it got lost, but the methods described here about resampling participants do not seem to make sense if all participants contributed to the results. Following this, they note that they used a q < 0.05 threshold after applying FDR for the resulting searchlight clusters, but based on their initial statement about the AUC tests, this is actually one-tailed? Is the actual threshold for all these clusters q < 0.10? That would be quite a lenient threshold and it would be hard to support using it. The authors should clarify how these statistics are computed.

      We agree that we did not clearly describe the methods. In the previous draft, we used a standard bootstrapping approach in which the individuals contributing to a group analysis are sampled with replacement. Specifically, for each bootstrap iteration, we constructed a bootstrap group average timecourse by resampling participants with replacement, and then ran our full analysis pipeline on this group average. In response to reviewer suggestions to use alternative approaches to obtain a measure of false positive rates, this analysis is now no longer included in the manuscript.

      Instead, in the current draft, we have replaced this bootstrap approach with a permutation-based approach, in which (for each permutation iteration) we randomly shuffle each participant’s six responses to the six presentations of the clip, ensuring that there can be no consistent relationship between viewing order and anticipation. We ran our analysis pipeline on each of these permuted datasets, then fit a Normal null distribution to the resulting anticipation values obtained in each voxel. We obtained a one-tailed p value as the fraction of this distribution that exceeded the real result in this voxel. The p values across all voxels were then corrected for False Discovery Rate, and the resulting map was thresholded at q < 0.05.

      This approach is described in the Methods:

      “To assess statistical significance, we utilized a permutation-based null hypothesis testing approach. We constructed null datasets by randomly shuffling each participant’s six responses to the six presentations of the movie clip. The full analysis pipeline (including hyperalignment) was run 100 times, once on the real (unpermuted) dataset and 99 times on null (permuted) datasets, with each analysis producing a map of anticipation across all voxels. A one-tailed p value was obtained in each voxel by fitting a Normal distribution to the null anticipation values, and then finding the fraction of this distribution that exceeded the real result in this voxel (i.e., showed more anticipation than in our unpermuted dataset). Voxels were determined significant (q < 0.05) after applying the Benjamini-Hochberg FDR (False Discovery Rate) correction, as implemented in AFNI (Cox, 1996).” (p.13)
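
      The per-voxel statistics in this Methods excerpt can be sketched with the Python standard library. This is an illustration under assumptions, not the authors' code: the toy values are invented, the Normal fit uses the sample standard deviation, and the correction is a plain Benjamini-Hochberg step-up procedure, whereas the authors used the AFNI implementation, whose details may differ.

```python
from statistics import NormalDist, mean, stdev


def one_tailed_p(real_value, null_values):
    """Fraction of a Normal fitted to the null values that exceeds the real value."""
    fitted = NormalDist(mean(null_values), stdev(null_values))
    return 1.0 - fitted.cdf(real_value)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted q values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Toy example: one real anticipation value and one list of null values per voxel.
real = [3.0, 0.5, 2.0]
nulls = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]]
p_values = [one_tailed_p(r, n) for r, n in zip(real, nulls)]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
print(p_values, q_values, significant)
```

      Because the p value comes from a fitted Normal rather than from counting permutations, it is not bounded below by 1/100 even though only 99 null datasets were run.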

      3) Regarding the relationship to annotated transitions, the reported difference in correlations at zero lag doesn't tell the story that the authors wish it to tell, and as such it does not appear to support the paper. While it is interesting to see that the correlation at zero lag in the initial viewing is often positive in the independently identified clusters, the fact that there is a drop in correlation on repeated viewings doesn't, in itself, mean that there has been a shift in the temporal relationship between the neural and annotated events. A drop in correlation could also occur if there was just no longer any correlation between the neural and annotated events at any lag due to noisy measurements, or even if, for example, the comparison wasn't to repeated viewings but to a totally different clip. The authors want to say something about the shift in the waveform/peak, but they need to apply a different method to be able to make this argument.

      We addressed this issue above, under Essential Revision #3, but also include our response below for convenience.

      We conducted a new analysis in which we measured the timing of the peak cross-correlation between HMM-derived event transitions in the brain and the human-annotated event boundaries, separately for each of the six movie viewings. In other words, we found the amount of shift in the brain’s event transitions that led to the maximum correlation with the timing of the human-annotated event boundaries. We then compared the timing of the correlation peak for the first movie viewing to the timing of the mean peak across viewings 2-6, and found regions of the brain where the peak shifted to be earlier with subsequent movie viewings. This was done as a whole-brain analysis with FDR correction. We include a figure (Figure 5) showing the data for the three searchlights that corresponded to clusters that met the q < .05 FDR criterion.

      The preceding analysis looked for regions for which the timing of the peak cross-correlation between the brain’s events and human-annotated events shifted earlier over movie repetitions, but did not test for the absolute location of that peak correlation (relative to zero lag between the HMM events and annotated events). Do the brain’s event transitions occur before annotated event transitions, after, or are they aligned? And how does this change over movie repetitions? We examined this question in the three clusters that emerged from the analysis in the preceding paragraph. We found that for the initial viewing, the brain’s event transitions lagged behind human-annotated event boundaries for two of the three clusters, whereas for the last cluster, the brain’s transitions and subjective event boundaries were aligned. For repeated viewings, the timing of the peak correlations shifted such that the brain’s representations of an event transition reliably preceded the occurrence of the human-annotated event boundary, for all three clusters (Figure 5).

      4) Imaging methods with faster temporal resolution could reveal even earlier reactivation, or replay, of the movies, that would be relatively invisible with fMRI, and the authors do not discuss relevant recent work. E.g. Michelmann et al. 2019 (Nat Hum Beh) and Wimmer et al. 2020 (Nat Neuro) are quite relevant citations from MEG. Michelmann et al. utilize similar methods and report results very similar to the current findings, while Wimmer et al. use a similar 'story' structure with only one viewing (followed by cued retrieval) and find a very high degree of temporal compression. The authors vaguely mention faster timescale methods in the discussion, but it would be important to discuss these existing results, and the relative benefits of these methods versus the benefits and limitations of fMRI. It would be interesting and puzzling if there were multiple neural timescales revealed by different imaging methods.

      Thank you for raising this point and those key studies. We have added a section to the Discussion to consider that research and its relation to the current work:

      “The current fMRI study is complementary to investigations of memory replay and anticipation that use MEG and intracranial EEG (iEEG). In an MEG study, Michelmann et al. (2019) found fast, compressed replay of encoded events during recall, with the speed of replay varying across the event. Furthermore, an iEEG investigation found anticipatory signals in auditory cortex when individuals listened to the same story twice (Michelmann et al., 2020). In another MEG study, Wimmer et al. (2020) found compressed replay of previously encoded information. Replay was forward when participants were remembering what came after an event, and backward when participants were remembering what came before an event. The forward replay observed in the Wimmer et al. study may be similar to the anticipatory signals observed in the current study, although there was no explicit demand on memory retrieval in our paradigm. Thus, one possibility is that the anticipatory signals observed in MEG or iEEG are the same as those we observe in fMRI, except that they are necessarily sluggish and smoothed in time when measured via a hemodynamic response. This possibility is supported by fMRI work showing evidence for compressed anticipatory signals, albeit at a slower timescale relative to MEG (Ekman, Kok, & de Lange, 2017).

      An alternative possibility is that the anticipatory signals measured in our study are fundamentally different from those captured via MEG or iEEG. That could explain why we failed to find widespread anticipatory signals in primary visual or primary auditory cortex: the anticipatory signals in those regions might have been too fast to be captured with fMRI, particularly when competing with incoming, dynamic perceptual input. Future studies that obtain fMRI and MEG or iEEG in participants watching the same movie would be informative in that regard. It is possible that fMRI may be particularly well-suited for capturing relatively slow anticipation of stable events, as opposed to faster anticipatory signals relating to fast sub-events. Nevertheless, advances in fMRI analyses may allow the detection of very fast replay or anticipation, closing the gap between these methods and allowing more direct comparisons (Wittkuhn & Schuck, 2021).” (p.10)

      5) The original fMRI experiment contained three conditions, while the current results only examine one of these conditions. Why weren't the results from the two scrambled clip conditions in the original experiment reported? Presumably there were no effects observed, but given that the original report focused on a change in response over time in a scrambled video where the scrambled order was preserved across repetitions, and the current report also focuses on changes across viewings, it would be important to describe reasons for not expecting similar results to these new ones in the scrambled condition.

      We agree it would be very interesting to systematically compare anticipation in those different conditions! Unfortunately, the Scrambled datasets are not well suited to answering this question for a couple of reasons. First, there is an issue of sample size. All 30 participants in Aly et al. (2018) viewed the same Intact movie clip. However, the Scrambled-Fixed and Scrambled-Random conditions had two clip-to-condition assignments, such that only 15 participants within each condition viewed the same clip. Thus, less data is available to look at anticipation within the Scrambled conditions. Another limitation is that our HMM analyses depend on group-averaged fMRI data; to the extent that different individuals show similar brain activity dynamics, the analysis will be more robust. While the Intact condition does have high inter-subject correlations in activity dynamics across many parts of the brain, the Scrambled conditions have much lower inter-subject correlations. We found that this makes hyperalignment (which, given reviewer recommendations, we now do prior to all analyses) work relatively poorly for the Scrambled-Fixed condition, and also makes data in that condition much noisier than that in the Intact condition. Applying hyperalignment to the Intact and Scrambled-Fixed conditions simultaneously also produced poor fits. Thus, because of these limitations, we do not believe that our group-level approach in this study is appropriate for studying anticipation in the Scrambled-Fixed condition. All that said, this question is of key interest to us, and we are actively running studies to determine how anticipation varies as a function of the stimuli used.

    1. Author Response:

      Reviewer #2:

      The current work makes the case that local neural measurements of selectivity to stimulus features and categories can, under certain circumstances, be misleading. The authors illustrate this point first through simulations within an artificial, deep, neural network model that is trained to map high-level visual representations of animals, plants, and objects to verbal labels, as well as to map the verbal labels back to their corresponding visual representations. As activity cycles forward and backward through the model, activity in the intermediate hidden layer (referred to as the "Hub") behaves in an interesting and non-linear fashion, with some units appearing first to respond more to animals than objects (or vice-versa) and then reversing category preference later in processing. This occurs despite the network progressively settling to a stable state (often referred to as a "point attractor"). Nevertheless, when the units are viewed at the population level, they are able to distinguish animals and objects (using logistic regression classifiers with L1-norm regularization) across the time points when the individual unit preferences appear to change. During the evolution of the network's states, classifiers trained at one time point do not apply well to data from earlier or later periods of time, with a gradual expansion of generalization to later time points as the network states become more stable. The authors then ask whether these same data properties (constant decodability, local temporal generalization, widening generalization window, change in code direction) are also present in electrophysiological recordings (ECoG) of anterior ventral temporal cortex during picture naming in 8 human epilepsy patients. Indeed, they find support for all four data properties, with more stable animal/object classification direction in posterior aspects of the fusiform gyrus and more dynamic changes in classification in the anterior fusiform gyrus (calculated in the average classifier weights across all patients).
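
      The temporal generalization idea summarized above (train a classifier at one timepoint, test it at every timepoint) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two units and their tuning values are invented, and a nearest-centroid classifier stands in for the L1-regularized logistic regression used in the manuscript, so this is not the authors' analysis.

```python
# Signed tuning of two toy units at four timepoints (hypothetical values).
# Unit 0 reverses its category preference over time; unit 1 does not.
W0 = [1.0, 0.5, -0.5, -1.0]
W1 = [0.2, 1.0, 1.0, 1.0]
LABELS = {"animal": 1.0, "object": -1.0}


def make_trials(jitters):
    """One trial per (label, jitter); course[t] is [unit0, unit1] at time t."""
    trials = []
    for label, sign in LABELS.items():
        for j in jitters:
            course = [[sign * W0[t] + j, sign * W1[t] - j] for t in range(len(W0))]
            trials.append((label, course))
    return trials


def centroids_at(trials, t):
    """Mean activity pattern of each category at timepoint t."""
    cents = {}
    for label in LABELS:
        pts = [course[t] for lab, course in trials if lab == label]
        cents[label] = [sum(p[u] for p in pts) / len(pts) for u in range(2)]
    return cents


def classify(pattern, cents):
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(pattern, cents[lab])))


def generalization_matrix(train, test):
    """acc[i][j]: accuracy when training at timepoint i and testing at timepoint j."""
    n_t = len(W0)
    acc = []
    for t_train in range(n_t):
        cents = centroids_at(train, t_train)
        acc.append([
            sum(classify(course[t_test], cents) == lab for lab, course in test) / len(test)
            for t_test in range(n_t)
        ])
    return acc


acc = generalization_matrix(make_trials([-0.1, 0.1]), make_trials([-0.05, 0.05]))
for row in acc:
    print(row)
```

      In this toy case every timepoint is decodable when training and testing at the same time (the diagonal), yet a classifier trained at the first timepoint fails at the last two, because the unit that dominates the early code later reverses its preference.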

      Strengths:

      Rogers et al. clearly expose the potential drawbacks to massive univariate analyses of stimulus feature and task selectivity in neuroimaging and physiological methods of all types -- which is a really important point given that this is the predominant approach to such analyses in cognitive neuroscience. fMRI, while having high spatial resolution, will almost certainly average over the kinds of temporal changes seen in this study. Even methods with high temporal and moderate spatial resolution (e.g. MEG, EEG) will often fail to find selectivity that is detectable only through multivariate methods. While some readers may be skeptical about the relevance of artificial neural networks to real human brain function, I found the simulations to be extremely useful. For me, what the simulations show is that a relatively typical multi-layer, recurrent backpropagation network (similar to ones used in numerous previous papers) does not require anything unusual to produce these kinds of counterintuitive effects. They simply need to exhibit strong attractor dynamics, which are naturally present in deep networks with multiple hidden layers, especially if the recurrent network interactions aid the model during training. This kind of recurrent processing should not be thought of as a stretch for the real brain. If anything, it should be the default expectation given our current knowledge of neuroanatomy. The authors also do a good job relating properties detected in their simulations to the ECoG data measured in human patients.

      We thank the reviewer for these positive comments.

      Weaknesses:

      While the ECoG data generally show the properties articulated by the authors, I found myself wanting to know more about the individual patients. Averaging across patients with different electrode locations -- and potentially different latencies of classification on different electrodes -- might be misleading. For example, how do we know that the shifts from negative to positive classification weights seen in the anterior temporal electrode sites are not really reflecting different dynamics of classification in separate patients? The authors partially examine this issue in the Supplementary Information (SI-3 and Figure SI-4) by analyzing classification shifts on individual patient electrodes. However, we don't know the locations of these electrodes (anterior versus posterior fusiform gyrus locations). The use of raw-ish LFPs averaged across the four repetitions of each stimulus (making an ERP) was also not an obvious choice, particularly if one desires to maximize the spatial precision of ECoG measures (compare unfiltered LFPs, which contain prominent low frequency fluctuations that can be shared across a larger spatial extent, to high frequency broadband power, 80-200 Hz).

      In the new statistical tests described above, we compute each metric separately for each patient, then conduct cross-subject statistical tests against a null hypothesis to assess whether the global pattern observed in the mean data is reliable across patients. We hope this addresses the reviewer's general concern that the mean pattern obscures heterogeneity across patients. With regard to the question of greater variability in anterior electrodes, the new analysis showing a remarkably strong correlation between variability of coefficient change and electrode location along the anterior-posterior axis provides a formal statistical test of this observation. We view variability of decoder coefficients as more informative than the independent correlations between electrode activity and category label shown in the supplementary materials, because the coefficients indicate the influence of electrode activity on classification when all other electrode states are taken into account (akin in some ways to a partial correlation coefficient). This distinction is noted in SI-3, p 48.

      The authors are well-known for arguing that conceptual processing is critically mediated by a single hub region located in the anterior temporal lobe, through which all sensory and motor modalities interact. I think that it's worth pointing out that the current data, while compatible with this theory, are also compatible with a conceptual system with multiple hubs. Deep recurrent dynamics from high-level visual processing, for which visual properties may be separated for animals and objects in the posterior aspects of the fusiform gyrus, through to phonological processing of object names may operate exactly as the authors suggest. However, other aspects of conceptual processing relating to object function (such as tool use) may not pass through the anterior fusiform gyrus, but instead through more posterior ventral stream (and dorsal stream) regions for which the high-level visual features are more segregated for animals versus tools. Social processing may similarly have its own distinct networks that tie in to visual<->verbal networks at a distinct point. So while the authors are persuasive with regard to the need for deep, recurrent interactions, the status of one versus multiple conceptual hubs, and the exact locations of those hubs, remains open for debate.

      We agree that the current data do not speak to hypotheses about other components of the cortical semantic network outside the field-of-view of our dataset. We have added an explicit statement of this in the General Discussion (page 22).

      The concepts that the authors introduce are important, and they should lead researchers to examine the potential utility of multivariate classification methods for their own work. To the extent that fMRI is blind to the dynamics highlighted here, supplementing fMRI with other approaches with high temporal resolution will be required (e.g. MEG and simultaneous fMRI-EEG). For those interested in applying deep neural networks to neuroscientific data, the current demonstration should also be a cautionary tale for the use of feed-forward-only networks. Finally, the authors make an important contribution to our thinking about conceptual processing, providing novel arguments and evidence in support of point-attractor models.

      Thanks to the reviewer for highlighting these points, which we take to be central contributions of this work!

      Reviewer #3:

      The authors compared how semantic information is encoded as a function of time between a recurrent neural network trained to link visual and verbal representations of objects and in the ventral anterior temporal lobe of humans (ECOG recordings). The strategy is to decode between 'living' and 'nonliving' objects and test/train at different timepoints to examine how dynamic the underlying code is. The observation is that coding is dynamic in both the neural network as well as the neural data as shown by decoders not generalizing to all other timepoints and by some units contributing with different sign to decoders trained at different timepoints. These findings are well in line with extensive evidence for a dynamic neural code as seen in numerous experiments (Stokes et al. 2013, King & Dehaene 2014).

      Strengths of this paper include a direct model to data comparison with the same analysis strategy, a model capable of generating a dynamic code, and the usage of rare intracranial recordings from humans. Weaknesses: While the model-driven examination of recordings is a major strength, the data analysis provides only limited support for the major claim of a 'distributed and dynamic semantic code' - it isn't clear that the code is semantic and the claims of dynamics and anatomical distribution are not quantitative.

      Major issues:

      1) Claims re a 'semantic code'. The ECOG analysis shows that decoding 'living' from 'nonliving' during viewing of images exhibits a dynamic code, with some electrodes contributing to early decodability and some to later, and with some contributing with different signs. It is a stretch to conclude from this that this shows evidence for a 'dynamic semantic code'. No work is done to show that this representation is semantic; in fact, this kind of single categorical distinction could probably also be made based on purely visual signals (such as in higher levels of a network such as VGG or higher visual cortex recordings). In contrast, the model has rich structure across numerous semantic distinctions.

      We have added a new analysis showing that the animate/inanimate distinction cannot be decoded for these stimuli from purely visual information as captured by a well-known unsupervised method for computing visual similarity structure amongst bitmap line drawings (Chamfer matching). We did not consider deep layers of the VGG-19 model as that model is explicitly trained to assign photographs to human-labeled semantic categories, so the representations do not reflect purely visual structure. The new analysis appears as part of the description of the stimulus set on page 31.
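      As an illustration of the kind of visual-similarity measure referred to above, the following is a minimal sketch of a symmetric Chamfer distance between two binary line drawings. It is not the authors' implementation: the function names, the boolean-mask input format, and the choice to average the two directed distances are assumptions made for this example.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def directed_chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Mean distance from each ink pixel of `a` to the nearest ink pixel of `b`.

    `a` and `b` are same-shaped arrays; nonzero entries mark ink pixels
    (hypothetical input format chosen for this sketch).
    """
    a = a.astype(bool)
    b = b.astype(bool)
    if not a.any() or not b.any():
        raise ValueError("both drawings need at least one ink pixel")
    # distance_transform_edt returns the distance to the nearest zero entry,
    # so inverting `b` turns its ink pixels into the zeros being searched for.
    dist_to_b = distance_transform_edt(~b)
    return float(dist_to_b[a].mean())


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance: average of the two directed distances."""
    return 0.5 * (directed_chamfer(a, b) + directed_chamfer(b, a))
```

      A pairwise matrix of such distances over all stimuli could then be passed to a classifier to ask whether the animate/inanimate labels are recoverable from visual structure alone.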

      The proposal that ventral anterior temporal cortex encodes semantic information is not new to this paper but is based on an extensive prior literature that includes studies of semantic impairments in patients with pathology in this area (e.g. refs. 7, 13, 29-32), studies of semantic disruption by TMS applied to this region (refs. 37-38), functional brain imaging of semantic processing with PET (ref. 33), distortion-corrected MRI (refs. 34-36), MEG (e.g. Mollos et al., 2017, PLOS ONE), and ECOG (ref. 46), and neurally-constrained computational models of developing, mature, and disordered semantic processing (refs. 7, 31, 40, 53). A great deal of this literature uses the same animate/inanimate distinction employed here as a paradigmatic example of a semantic distinction. It is especially useful in the current case because the animate/inanimate distinction is unrelated to the response elicited by the stimuli (the basic-level name).

      2) Missing quantification of model-data comparison. These conclusions aren't supported by quantitative analysis. This includes, importantly, statements regarding anatomical location (Fig 4E), resemblances in dynamic coding patterns ('overlapping waves', Fig 4C-D), and the presence of electrodes that 'switch sign'. These key conclusions seem to be derived purely by graphical inspection, which is not appropriate.

      We have added new statistical analyses of each core claim as explained above.

      3) ECOG recordings analysis. Raw LFP voltage was used as the feature (if I interpreted the methods correctly, see below). This does not seem like an appropriate way to decode from ECOG signals given the claims that are made due to sensitivity to large deflections (evoked potentials). Analysis of different frequency bands, power, phase etc would be necessary to substantiate these claims. As it stands, a simpler interpretation of the findings is that the early onset evoked activity (ERPs) gives rise to clusters 1-4, and more sustained deflections to the other clusters. This could also give rise to sign changes as ERPs change sign.

      The reviewer's comment suggests that information about the category should be reflected in spectral properties of the time-varying signals but not the direction/magnitude of the LFP itself. While we recognize that this is a common hypothesis in the literature, an alternative hypothesis more consistent with neural-network models of cognition suggests that such information can be encoded in magnitude and direction of the LFP itself—the closest brain analog to unit activity in a neural network model. The fact that semantic information can be accurately decoded from the LFPs, following a pattern closely resembling that arising in the model, is consistent with this hypothesis. We agree that, in future, it would be interesting to look at decoding of spectral properties of the signal. We note these points on revised manuscript page 22.

      With regard to this comment:

      a simpler interpretation of the findings is that the early onset evoked activity (ERPs) gives rise to clusters 1-4, and more sustained deflections to the other clusters. This could also give rise to sign changes as ERPs change sign

      We are not sure how this constitutes a simpler or even a different explanation of our data. ERPs at an intracranial electrode reflect local neural responses to the stimulus, which change over the course of stimulus processing. The data show that semantic information about the stimulus can be decoded from these signals at the initial evoked response and all subsequent timepoints, but the relationship between the neural response and the semantic category (i.e., how the semantic information is encoded in the measured response) changes as the stimulus is processed. The changing sign of an ERP reflects changing activity of nearby neural populations. "More sustained deflections" indicates that changes to the code are slowing over time. These are essentially the conclusions that we draw about the dynamic code from our data.

      Perhaps the reviewer is concerned that the results are an artifact of the temporal structure of the LFPs themselves: that these change rapidly with stimulus onset and then slow down, so that the “expanding window” pattern arises from, for instance, temporal autocorrelation in the raw data. Testing this possibility was the goal of the analysis in SI-5, where we show that the autocorrelation of the raw LFP signal does not grow broader over time, so the widening-window pattern observed in the generalization of classifiers is not attributable to the temporal autocorrelation structure of the raw data.
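      The train-at-one-timepoint, test-at-another logic discussed throughout this response can be sketched as follows. This is only an illustration: a nearest-centroid classifier stands in for whatever decoder was actually used, and the array layout (trials × channels × timepoints) and all function names are assumptions.

```python
import numpy as np


def fit_centroids(X, y):
    """Class means for labels 0 and 1 (a stand-in nearest-centroid decoder)."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)


def predict(centroids, X):
    """Assign each trial to the class whose centroid is closer."""
    c0, c1 = centroids
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)


def temporal_generalization(X_train, y_train, X_test, y_test):
    """Accuracy matrix with rows = training timepoint, columns = testing timepoint.

    X_train and X_test have shape (trials, channels, timepoints).
    """
    n_times = X_train.shape[2]
    acc = np.zeros((n_times, n_times))
    for t_train in range(n_times):
        centroids = fit_centroids(X_train[:, :, t_train], y_train)
        for t_test in range(n_times):
            pred = predict(centroids, X_test[:, :, t_test])
            acc[t_train, t_test] = np.mean(pred == y_test)
    return acc
```

      In such a matrix, a static code yields high accuracy everywhere, whereas a dynamic code yields high accuracy near the diagonal only; off-diagonal accuracy below chance indicates that the sign of the code has reversed between the two timepoints.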

    1. Author Response:

      Reviewer #1:

      The largest concern with the manuscript is its use of resting-state recordings in Parkinson's Disease patients on and off levodopa, which the authors interpret as indicative of changes in dopamine levels in the brain but not indicative of altered movement and other neural functions. For example, when patients are off medication, their UPDRS scores are elevated, indicating they likely have spontaneous movements or motor abnormalities that will likely produce changed activations in MEG and LFP during "rest". Authors must address whether it is possible to study a true "resting state" in unmedicated patients with severe PD. At minimum this concern must be discussed in the manuscript.

      We agree that Parkinson’s disease can lead to unwanted movements such as tremor as well as hyperkinesias. This would of course be a deviation from a resting state in healthy subjects. However, such movements are part of the disease and occur unwillingly. The main tremor in Parkinson’s disease is a rest tremor and - as the name already suggests – it occurs while not doing anything. Therefore, such movements can arguably be considered part of the resting state of Parkinson’s disease. Resting state activity with and without medication is therefore still representative for changes in brain activity in Parkinson’s patients and indicative of alterations due to medication.

      To further investigate the effect of movement in our patients, we subdivided the UPDRS part 3 score into tremor and non-tremor subscores. For the tremor subscore we took the mean of items 15 and 17 of the UPDRS, whereas for the non-tremor subscore items 1, 2, 3, 9, 10, 12, 13, and 14 were averaged. Following Spiegel et al., 2007, we classified patients as akinetic-rigid (non-tremor score at least twice the tremor score), tremor-dominant (tremor score at least twice as large as the non-tremor score), and mixed type (for the remaining scores). Of the 17 patients, 1 was tremor-dominant and 1 was classified as mixed type (his/her non-tremor score was greater than the tremor score). None of our patients exhibited hyperkinesias during the recording. To exclude the possibility that our results are driven by tremor-related movement, we re-ran the HMM without the tremor-dominant and the mixed-type patient (see Figure R1 of the response letter).
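      The subtype rule described above can be written out as a short sketch. The item numbers and thresholds are taken from the text; the dictionary input format and the function names are hypothetical, and the handling of the case where both subscores are zero is an assumption.

```python
from statistics import mean

# UPDRS item numbers as listed in the response.
TREMOR_ITEMS = (15, 17)
NON_TREMOR_ITEMS = (1, 2, 3, 9, 10, 12, 13, 14)


def subscores(items):
    """Return (tremor, non_tremor) mean subscores.

    `items` maps UPDRS item number -> score (hypothetical input format).
    """
    tremor = mean(items[i] for i in TREMOR_ITEMS)
    non_tremor = mean(items[i] for i in NON_TREMOR_ITEMS)
    return tremor, non_tremor


def classify_subtype(items):
    """Apply the two-fold ratio rule to the tremor and non-tremor subscores."""
    tremor, non_tremor = subscores(items)
    # Assumption: if both subscores are zero, this branch is taken.
    if non_tremor >= 2 * tremor:
        return "akinetic-rigid"
    if tremor >= 2 * non_tremor:
        return "tremor-dominant"
    return "mixed"
```

      Note that a patient whose non-tremor subscore exceeds the tremor subscore by less than a factor of two is classified as mixed, which matches the mixed-type patient described above.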

      ON medication results for all HMM states remained the same. OFF medication results for the Ctx-Ctx and STN-STN states remained the same as well. The Ctx-STN state OFF medication was split into two states: sensorimotor-STN connectivity was captured in one state and all other types of Ctx-STN connections were captured in another state (see Figure R1 of the response letter). The important point is that the biological conclusions stand across these solutions. Regardless, both with and without the two subjects, a stable covariance matrix entailing sensorimotor-STN connectivity was determined, which is the main finding for the Ctx-STN state OFF medication.

      We therefore discuss this issue now within the limitation section (page 20):

      “Both motor impairment and motor improvement can cause movement during the resting state in PD. While such movement is a deviation from a resting state in healthy subjects, such movements are part of the disease and occur unwillingly. Therefore, such movements can arguably be considered part of the resting state of Parkinson’s disease. None of the patients in our cohort experienced hyperkinesia during the recording. All patients except for two were of the akinetic-rigid subtype. We verified that tremor movement is not driving our results. Recalculating the HMM states without these 2 subjects, even though it slightly changed some particular aspects of the HMM solution, did not materially affect the conclusions.”

      Figure R1: States obtained after removing one tremor dominant and one mixed type patient from analysis. Panel C shows the split OFF medication cortico-STN state. Most of the cortico-STN connectivity is captured by the state shown in the top row (Figure 1 C OFF). Only the motor-STN connectivity in the alpha and beta band (along with a medial frontal-STN connection in the alpha band) is captured separately by the states labeled “OFF Split” (Figure 1 C OFF SPLIT).

      This reviewer was unclear on why increased "communication" in the medial OFC in delta and theta was interpreted as a pathological state indicating deteriorated frontal executive function. Given that the authors provide no evidence of poor executive function in the patients studied, the authors must at least provide evidence from other studies linking this feature with impaired executive function.

      If we understand the comment correctly, it refers to the following statement in the abstract: “Dopaminergic medication led to communication within the medial and orbitofrontal cortex in the delta/theta frequency range. This is in line with deteriorated frontal executive functioning as a side effect of dopamine treatment in Parkinson’s disease.”

      This statement is based on the dopamine overdose hypothesis reported in the Parkinson’s disease (PD) literature (Cools 2001; Kelly et al. 2009; MacDonald and Monchi 2011; Vaillancourt et al. 2013). We have elaborated upon the dopamine overdose hypothesis in the discussion on page 16. In short, dopaminergic neurons are primarily lost from the substantia nigra in PD, which causes a higher dopamine depletion in the dorsal striatal circuitry than within the ventral striatal circuits (Kelly et al. 2009; MacDonald and Monchi 2011). Thus, dopaminergic medication to treat the PD motor symptoms leads to increased dopamine levels in the ventral striatal circuits including frontal cortical activity, which can potentially explain the cognitive deficits observed in PD (Shohamy et al. 2005; George et al. 2013). We adjusted the abstract to read:

      “Dopaminergic medication led to coherence within the medial and orbitofrontal cortex in the delta/theta frequency range. This is in line with known side effects of dopamine treatment such as deteriorated executive functions in Parkinson’s disease.”

      In this article, authors repeatedly state their method allows them to delineate between pathological and physiological connectivity, but they don't explain how dynamical systems and discrete-state stochasticity support that goal.

      To recapitulate, the HMM divides a continuous time series into discrete states. Each state is a time-delay embedded covariance matrix reflecting the underlying connectivity between brain regions as well as the specific temporal dynamics in the data when such state is active. See Packard et al., (1980) for details about how a time-delay embedding characterises a linear dynamical system.
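      To make the notion of a time-delay embedded covariance matrix concrete, the following minimal sketch builds a delay embedding of a multichannel signal and computes its covariance. It is not the authors' pipeline, which may differ in lag range, normalization, or dimensionality reduction; the function names and the choice of non-negative lags only are assumptions.

```python
import numpy as np


def delay_embed(x: np.ndarray, n_lags: int) -> np.ndarray:
    """Stack lagged copies of a signal side by side.

    x has shape (n_samples, n_channels). The result has shape
    (n_samples - n_lags + 1, n_channels * n_lags); column block k holds
    the signal delayed by k samples.
    """
    n_samples, _ = x.shape
    n_rows = n_samples - n_lags + 1
    if n_rows <= 0:
        raise ValueError("n_lags exceeds the number of samples")
    blocks = [x[n_lags - 1 - k : n_lags - 1 - k + n_rows] for k in range(n_lags)]
    return np.hstack(blocks)


def embedded_covariance(x: np.ndarray, n_lags: int) -> np.ndarray:
    """Covariance of the delay-embedded signal (columns are variables)."""
    return np.cov(delay_embed(x, n_lags), rowvar=False)
```

      The off-diagonal blocks of this matrix contain auto- and cross-covariances at different lags, which is why a single covariance matrix per state can carry both connectivity and spectral information.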

      Please note that the HMM was used as a data-driven, descriptive approach without explicitly assuming any a-priori relationship with pathological or physiological states. The relation between biology and the HMM states, thus, purely emerged from the data; i.e. is empirical. What we claim in this work is simply that the features captured by the HMM hold some relation with the physiology even though the estimation of the HMM was completely unsupervised (i.e. blind to the studied conditions). We have added this point also to the limitations of the study on page 19 and the following to the introduction to guide the reader more intuitively (page 4):

      “To allow the system to dynamically evolve, we use time delay embedding. Theoretically, delay embedding can reveal the state space of the underlying dynamical system (Packard et al., 1980). Thus, by delay-embedding PD time series OFF and ON medication we uncover the differential effects of a neurotransmitter such as dopamine on underlying whole brain connectivity.”

      Reviewer #2:

      Sharma et al. investigated the effect of dopaminergic medication on brain networks in patients with Parkinson's disease, combining local field potential recordings from the subthalamic nucleus and magnetoencephalography during rest. They aim to characterize both physiological and pathological spectral connectivity.

      They identified three networks, or brain states, that are differentially affected by medication. Under medication, the first state (termed hyperdopaminergic state) is characterized by increased connectivity of frontal areas, supposedly responsible for deteriorated frontal executive function as a side effect of medical treatment. In the second state (communication state), dopaminergic treatment largely disrupts cortico-STN connectivity, leaving only selected pathways communicating. This is in line with current models that propose that alleviation of motor symptoms relates to the disruption of pathological pathways. The local state, characterized by STN-STN oscillatory activities, is less affected by dopaminergic treatment.

      The authors utilize sophisticated methods with the potential to uncover the dynamics of activity within different brain networks, which opens an avenue to investigate how the brain switches between different states, and how these states are characterized in terms of spectral, local, and temporal properties. The conclusions of this paper are mostly well supported by data, but some concerns, mainly about the presentation of the results, remain:

      We would like to thank the reviewer for his succinct and clear understanding of our work.

      1) The presentation of the results is suboptimal and needs improvement to increase readers' comprehension. At some points this section seems rather unstructured, some results are presented multiple times, and some passages already include points rather suitable for the discussion, which adds too much information for the results section.

      We have removed repetitions in the results sections and removed the rather lengthy introductory parts of each subsection. Moreover, we have now moved all parts, which were already an interpretation of our findings to the discussion.

      2) It is intriguing that the hyperdopaminergic state is identified not only under medication but also in the off-state, especially given the results on the temporal properties of states showing that the time spent in the hyperdopaminergic state is unaffected by medication. When such a state can be identified even in the absence of levodopa, is it really optimal to call it "hyperdopaminergic"? Do the results not rather suggest that the identified network is active both off and on medication, while during the latter state its activities are modulated in a way that could relate to side effects?

      The reviewer’s interpretations of the results pertaining to the hyper-dopaminergic state are correct. The states had been named post-hoc as explained in the results section. The hyper-dopaminergic state’s name derived from it showing the overdosing effects of dopamine. Of course, these results are only visible on medication. But off medication, this state also exists without exhibiting the effects of excess dopamine. To avoid confusion or misinterpretation of the findings and also following the relevant comment by reviewer 1, we renamed all states to be more descriptive:

      Hyperdopaminergic > Cortico-cortical state

      Communication > Cortico-STN state

      Local > STN-STN state.

      3) Some conclusions need to be improved or elaborated further. For example, the coherence of bilateral STN-STN did not change between the medication off and on states. Yet it is argued that a) "Since synchrony limits information transfer (Cruz et al. 2009; Cagnan, Duff, and Brown 2015; Holt et al. 2019), local oscillations are a potential mechanism to prevent excessive communication with the cortex" (line 436) and b) "Another possibility is that a loss of cortical afferents causes local basal ganglia oscillations to become more pronounced" (line 438). Can these conclusions really be drawn if the local oscillations did not change in the first place?

      We apologize for the unclear description. Our conclusion was based on the following results:

      a) We state that STN-STN connectivity as measured by the magnitude of STN-STN coherence does not change OFF vs ON medication in the Cortico-STN state. This result is obtained using inter-medication analysis.

      b) But ON medication, STN-STN coherence in the Cortico-STN state was significantly different from mean coherence within the ON condition. These results are obtained using intra-medication analysis.

      Based on this, we conclude that in the Cortico-STN state, although OFF vs ON medication the magnitude of STN-STN coherence was unchanged, the STN-STN coherence was significantly different from mean coherence in the ON medication condition. The emergence of synchronous STN-STN activity may limit information exchange between STN and cortex ON medication.

      An alternative explanation for these findings might be a mechanism preventing connectivity between cortex and the STN ON medication. This missing interaction between STN and cortex might cause STN-STN oscillations to increase compared to the mean coherence within the ON state. Unfortunately, we cannot test such causal influences with our analysis.

      We have added the following discussion to the manuscript on page 17 in order to improve the exposition:

      “Bilateral STN–STN coherence in the alpha and beta band did not change in the cortico-STN state ON versus OFF medication (InterMed analysis). However, STN-STN coherence was significantly higher than the mean level ON medication (IntraMed analysis). Since synchrony limits information transfer (Cruz et al. 2009; Cagnan, Duff, and Brown 2015; Holt et al. 2019), the high coherence within the STN ON medication could prevent communication with the cortex. A different explanation would be that a loss of cortical afferents leads to increased local STN coherence. The causal nature of the cortico-basal ganglia interaction is an endeavour for future research.”

      Reviewer #3:

      In PD, pathological neuronal activity along the cortico-basal ganglia network notably consists in the emergence of abnormal synchronized oscillatory activity. Nevertheless, synchronous oscillatory activity is not necessarily pathological and also serves crucial cognitive functions in the brain. Moreover, the effects of dopaminergic medication on oscillatory network connectivity occurring in PD are still poorly understood. To clarify these issues, Sharma and colleagues simultaneously recorded MEG and STN LFP signals in PD patients and characterized the effect of dopamine (ON and OFF dopaminergic medication) on oscillatory whole-brain networks (including the STN) in a time-resolved manner. Here, they identified three physiologically interpretable spectral connectivity patterns and found that cortico-cortical, cortico-STN, and STN-STN networks were differentially modulated by dopaminergic medication.

      Strengths:

      1) Both the methodological and experimental approaches used are thoughtful and rigorous.

      a) The use of an innovative data-driven machine learning approach (by employing a hidden Markov model), rather than hand-crafted analyses, to identify physiologically interpretable spectral connectivity patterns (i.e., distinct networks/states) is undeniably an added value. In doing so, the results are not biased by human expertise and subjectivity, which makes them even more solid.

      b) So far, the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD have been assessed only for specific cortico-STN spectral connectivity. Conversely, whole-brain MEG studies in PD patients did not account for cortico-STN and STN-STN connectivity. Here, the authors studied, for the first time, whole-brain connectivity including the STN (whole brain-STN approach) and therefore provide new evidence on the brain connectivity reported in PD, as well as new information regarding the effect of dopaminergic medication on the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN.

      2) Studying the temporal properties of the recurrent oscillatory patterns of transient network connectivity both ON and OFF medication is extremely important and provides interesting and crucial information for delineating pathological versus physiologically relevant spectral brain connectivity in PD.

      We would like to thank the reviewer for their valuable feedback and correct interpretation of our manuscript.

      Weaknesses:

      1) In this study, the authors implied that the ON dopaminergic medication state corresponds to a physiological state. However, as correctly mentioned in the limitations of the study, they did not have (for obvious reasons) a control/healthy group. Moreover, no one can exclude the emergence of compensatory and/or plasticity mechanisms in the brain of the PD patients related to the duration of the disease and/or the history of the chronic dopamine-replacement therapy (DRT). Duration of the disease and DRT history should therefore be considered when characterizing the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD, as well as when examining the effect of the dopaminergic medication on the functioning of these specific networks.

      We would like to thank the reviewer for pointing this out. We regressed duration of disease (year of measurement – year of onset) on the temporal properties of the HMM states. We found no relationship between any of the temporal properties and disease duration. Similarly, we regressed levodopa equivalent dosage for each subject on the temporal properties and found no relationship. We now discuss this point in the manuscript (page 20):
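      A check of this kind can be sketched as a simple linear regression of one temporal property on one clinical covariate. The response does not state the exact model used, so the use of ordinary least squares via `scipy.stats.linregress`, the function name, and the returned fields are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import linregress


def relate_to_covariate(covariate, temporal_property):
    """Fit temporal_property = slope * covariate + intercept across subjects.

    Both arguments are 1-D sequences with one value per subject
    (hypothetical input format). Returns the slope, Pearson r, and the
    two-sided p-value for a zero slope.
    """
    res = linregress(np.asarray(covariate, dtype=float),
                     np.asarray(temporal_property, dtype=float))
    return {"slope": res.slope, "r": res.rvalue, "p": res.pvalue}
```

      In practice this would be repeated for each state and each temporal property, for both disease duration and levodopa equivalent dosage, with an appropriate correction for multiple comparisons.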

      “A further potential influencing factor might be the disease duration and the amount of dopamine patients are receiving. Both factors were not significantly related to the temporal properties of the states.”

      2) Here, the authors recorded LFP activity in the STN. LFP represents sub-threshold (e.g., synaptic input) activity at best (Buzsaki et al., 2012; Logothetis, 2003). Recent studies demonstrated that mono-polar, but also bi-polar, BG LFPs are largely contaminated by volume conductance of cortical electroencephalogram (EEG) activity even when re-referenced (Lalla et al., 2017; Marmor et al., 2017). Therefore, it is likely that STN LFPs do not accurately reflect local cellular activity. In this study, the authors examined and measured coherence between cortical areas and STN. However, they cannot guarantee that STN signals were not contaminated by volume-conducted signals from the cortex.

      We appreciate this concern and thank the reviewer for bringing it up. Marmor et al. (2017) investigated this on humans and is therefore most closely related to our research. They find that re-referenced STN recordings are not contaminated by cortical signals. Furthermore, the data in Lalla et al. (2017) is based on recordings in rats, making a direct transfer to human STN recordings problematic due to the different brain sizes. Since we re-referenced our LFP signals as recommended in the Marmor paper, we think that contamination due to cortical signals is relatively minor; see Litvak et al. (2011), Hirschmann et al. (2013), and Neumann et al. (2016) for additional references supporting this. That being said, we now discuss this potential issue in the paper on page 20.

      “Lastly, we recorded LFPs from within the STN, an established recording procedure during the implantation of DBS electrodes in various neurological and psychiatric diseases. Although for Parkinson patients results on beta and tremor activity within the STN have been reproduced by different groups (Reck et al. 2010, Litvak et al. 2011, Florin et al. 2013, Hirschmann et al. 2013, Neumann et al. 2016), it is still not fully clear whether these LFP signals are contaminated by volume-conducted cortical activity. However, while volume conduction seems to be a larger problem in rodents even after re-referencing the LFP signal (Lalla et al. 2017), the same was not found in humans (Marmor et al. 2017).”

      3) The methods and data processing are rigorous but also very sophisticated, which makes it difficult to interpret the results in terms of oscillatory activity and neural synchronization.

      To aid intuition on how to interpret the result in light of the methods used, one can compare the analysis pipeline to a windowing approach. In a more standard approach, windows of different time length can be defined for different epochs within the time series and for each window coherence and connectivity can be determined. The difference in our approach is that we used an unsupervised learning algorithm to select windows of varying length based on recurring patterns of whole brain network activity. Within those defined windows we then determine the oscillatory properties via coherence and power – which is the same as one would do in a classical analysis. We have added an explanation of the concept of “oscillatory activity” within our framework to the introduction (page 2 footnote):

      “For the purpose of our paper, we refer to oscillatory activity or oscillations as recurrent, but transient frequency–specific patterns of network activity, even though the underlying patterns can be composed of either sustained rhythmic activity, neural bursting, or both (Quinn et al. 2019).”

      Moreover, we provide a more intuitive explanation of the analysis within the first section of the results (page 4):

      “Using an HMM, we identified recurrent patterns of transient network connectivity between the cortex and the STN, which we henceforth refer to as an ‘HMM state’. In comparison to classic sliding-window analysis, an HMM solution can be thought of as a data-driven estimation of time windows of variable length (within which a particular HMM state was active): once we know the time windows when a particular state is active, we compute coherence between different pairs of regions for each of these recurrent states.”
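      The idea of an HMM solution as a set of data-driven time windows can be illustrated with a short sketch that turns a decoded state sequence into windows and summarizes them. The input (one integer state label per sample), the function names, and the particular summary measures are assumptions for this example rather than the authors' code.

```python
import numpy as np


def state_windows(path, state):
    """Return (start, stop) sample indices, stop exclusive, of each
    contiguous run in which `path` equals `state`."""
    active = np.asarray(path) == state
    padded = np.concatenate(([False], active, [False]))
    # Indices where activity switches on or off; they alternate start, stop.
    change = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(change[::2], change[1::2])]


def temporal_properties(path, state):
    """Summaries of how often and for how long a state is active (in samples)."""
    windows = state_windows(path, state)
    lengths = [e - s for s, e in windows]
    return {
        "fractional_occupancy": sum(lengths) / len(path),
        "mean_lifetime": float(np.mean(lengths)) if lengths else 0.0,
        "n_visits": len(windows),
    }
```

      Coherence and power for a given state would then be estimated from the data falling inside that state's windows, in the same way as for fixed sliding windows.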

      4) Previous studies have shown that abnormal oscillations within the STN of PD patients are limited to its dorsolateral/motor region, thus dividing the STN into a dorsolateral oscillatory/motor region and ventromedial non-oscillatory/non-motor region (Kuhn et al. 2005; Moran et al. 2008; Zaidel et al. 2009, 2010; Seifreid et al. 2012; Lourens et al. 2013, Deffains et al., 2014). However, the authors do not provide clear information about the location of the LFP recordings within the STN.

      We selected the electrode contacts based on intraoperative microelectrode recordings (for details, see page 23). The first directional recording height after the entry into the STN was selected to obtain the three directional LFP recordings from the respective hemisphere. This practice has been shown to improve target location (Kochanski et al., 2019; Krauss et al., 2021). The common target area for DBS surgery is the dorsolateral STN. To confirm that the electrodes were actually located within this part of the STN, we reconstructed the DBS location with Lead-DBS (Horn et al. 2019). All electrodes, except for one, were located within the dorsolateral STN (see figure 7 of the manuscript). To exclude the possibility that our results were driven by this outlier, we reanalysed our data without this patient. No change in the overall connectivity pattern was observed (see figure R3 of the response letter).

      Figure R2: Lead DBS reconstruction of the location of electrodes in the STN for different subjects. The red electrodes have not been placed properly in the STN. The contacts marked in red represent the directional contacts from which the data was used for analysis.

      Figure R3: HMM states obtained after running the analysis without the subject with the electrode outside the STN.

      References:

      Buzsáki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents-EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012; 13: 407–20.

      Cagnan H, Duff EP, Brown P. The relative phases of basal ganglia activities dynamically shape effective connectivity in Parkinson’s disease. Brain 2015; 138: 1667–78.

      Cools R. Enhanced or impaired cognitive function in Parkinson’s disease as a function of dopaminergic medication and task demands. Cereb Cortex 2001; 11: 1136–43.

      Cruz AV, Mallet N, Magill PJ, Brown P, Averbeck BB. Effects of dopamine depletion on network entropy in the external globus pallidus. J Neurophysiol 2009; 102: 1092–102.

      Florin E, Erasmi R, Reck C, Maarouf M, Schnitzler A, Fink GR, et al. Does increased gamma activity in patients suffering from Parkinson’s disease counteract the movement inhibiting beta activity? Neuroscience 2013; 237: 42–50.

      George JS, Strunk J, Mak-McCully R, Houser M, Poizner H, Aron AR. Dopaminergic therapy in Parkinson’s disease decreases cortical beta band coherence in the resting state and increases cortical beta band power during executive control. NeuroImage Clin 2013; 3: 261–70.

      Hirschmann J, Özkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Differential modulation of STN-cortical and cortico-muscular coherence by movement and levodopa in Parkinson’s disease. Neuroimage 2013; 68: 203–13.

      Holt AB, Kormann E, Gulberti A, Pötter-Nerger M, McNamara CG, Cagnan H, et al. Phase-dependent suppression of beta oscillations in Parkinson’s disease patients. J Neurosci 2019; 39: 1119–34.

      Horn A, Li N, Dembek TA, Kappel A, Boulay C, Ewert S, et al. Lead-DBS v2: Towards a comprehensive pipeline for deep brain stimulation imaging. Neuroimage 2019; 184: 293–316.

      Kelly C, De Zubicaray G, Di Martino A, Copland DA, Reiss PT, Klein DF, et al. L-dopa modulates functional connectivity in striatal cognitive and motor networks: A double-blind placebo-controlled study. J Neurosci 2009; 29: 7364–78.

      Kochanski RB, Bus S, Brahimaj B, Borghei A, Kraimer KL, Keppetipola KM, et al. The impact of microelectrode recording on lead location in deep brain stimulation for the treatment of movement disorders. World Neurosurg 2019; 132: e487–95.

      Krauss P, Oertel MF, Baumann-Vogel H, Imbach L, Baumann CR, Sarnthein J, et al. Intraoperative neurophysiologic assessment in deep brain stimulation surgery and its impact on lead placement. J Neurol Surgery, Part A Cent Eur Neurosurg 2021; 82: 18–26.

      Lalla L, Rueda Orozco PE, Jurado-Parras MT, Brovelli A, Robbe D. Local or not local: Investigating the nature of striatal theta oscillations in behaving rats. eNeuro 2017; 4: 128–45.

      Litvak V, Jha A, Eusebio A, Oostenveld R, Foltynie T, Limousin P, et al. Resting oscillatory cortico-subthalamic connectivity in patients with Parkinson’s disease. Brain 2011; 134: 359–74.

      MacDonald PA, MacDonald AA, Seergobin KN, Tamjeedi R, Ganjavi H, Provost JS, et al. The effect of dopamine therapy on ventral and dorsal striatum-mediated cognition in Parkinson’s disease: Support from functional MRI. Brain 2011; 134: 1447–63.

      MacDonald PA, Monchi O. Differential effects of dopaminergic therapies on dorsal and ventral striatum in Parkinson’s disease: Implications for cognitive function. Parkinsons Dis 2011; 2011: 1–18.

      Marmor O, Valsky D, Joshua M, Bick AS, Arkadir D, Tamir I, et al. Local vs. volume conductance activity of field potentials in the human subthalamic nucleus. J Neurophysiol 2017; 117: 2140–51.

      Neumann WJ, Degen K, Schneider GH, Brücke C, Huebl J, Brown P, et al. Subthalamic synchronized oscillatory activity correlates with motor impairment in patients with Parkinson’s disease. Mov Disord 2016; 31: 1748–51.

      Packard NH, Crutchfield JP, Farmer JD, Shaw RS. Geometry from a time series. Phys Rev Lett 1980; 45: 712–6.

      Quinn AJ, van Ede F, Brookes MJ, Heideman SG, Nowak M, Seedat ZA, et al. Unpacking Transient Event Dynamics in Electrophysiological Power Spectra. Brain Topogr 2019; 32: 1020–34.

      Reck C, Himmel M, Florin E, Maarouf M, Sturm V, Wojtecki L, et al. Coherence analysis of local field potentials in the subthalamic nucleus: Differences in parkinsonian rest and postural tremor. Eur J Neurosci 2010; 32: 1202–14.

      Shohamy D, Myers CE, Grossman S, Sage J, Gluck MA. The role of dopamine in cognitive sequence learning: Evidence from Parkinson’s disease. Behav Brain Res 2005; 156: 191–9.

      Spiegel J, Hellwig D, Samnick S, Jost W, Möllers MO, Fassbender K, et al. Striatal FP-CIT uptake differs in the subtypes of early Parkinson’s disease. J Neural Transm 2007; 114: 331–5.

      Vaillancourt DE, Schonfeld D, Kwak Y, Bohnen NI, Seidler R. Dopamine overdose hypothesis: Evidence and clinical implications. Mov Disord 2013; 28: 1920–9.

    1. Author Response:

      We would like to thank the reviewers for their thoughtful and thorough critique of our manuscript. In our revised preprint, we added important additional data and restructured our manuscript to reflect as many of the recommendations as possible. Additionally, we have added experiments to define the cellular mechanisms underlying observed damage following mechanical injury. The most significant additions of new data include:

      • Further experiments demonstrating block of glutamate clearance exacerbates stimulus-induced hair-cell synapse loss.
      • Analysis of neuromast disruption in lhfpl5b mutant null larvae showing mechanical displacement. Lhfpl5b mediates mechanosensitivity in lateral-line hair cells, allowing us to determine whether mechanotransduction is required for mechanical disruption of neuromasts.
      • Testing the vibratory stimulus at various frequencies to confirm the optimal frequency to induce acute, generally sub-lethal damage to lateral-line hair cells is 60 Hz.
      • Assessment of neuromast supporting cell and hair cell proliferation following mechanical overstimulation.
      • Quantitative analysis of kinocilia SEM and confocal images of hair bundles in control and stimulus-exposed fish.

      Individual comments are addressed as outlined below.

      Reviewer #1:

      1) The authors use a vertically-oriented Brüel+Kjær LDS Vibrator to deliver a 60 Hz vibratory stimulus to damage lateral line hair cells. It is not made clear on why this frequency was selected. Did the authors choose this frequency because they screened a number of frequencies and this is the one that did the most damage to hair cells or was it chosen for another reason? Or, do all frequencies do the same amount of damage? The authors should screen a number of frequencies and choose the stimulus that does the most damage to hair cells. This would set the field in the best direction, should members of the community attempt this new technique. It is not necessary to repeat all of the experiments, but the authors should show which frequencies are best for inducing damage.

      The frequency selected for mechanical overexposure of lateral-line organs was based on previous studies showing 60 Hz to be within the optimal upper frequency range of mechanical sensitivity of superficial posterior lateral-line neuromasts, with maximal response between 10-60 Hz, but a suboptimal frequency for hair cells of the anterior macula in the ear (Weeg and Bass 2002, Trapani et al, 2009, Levi et al, 2015). To confirm that 60 Hz was the optimal frequency to induce damage, we tested 45, 60, and 75 Hz at comparable intensities. At 75 Hz we observed no apparent damage to lateral-line neuromasts, while 45 Hz at a comparable intensity proved lethal to the fish. We have updated the Results and Method Details to include our rationale for choosing 60 Hz.

      2) The SEM images of the hair bundle are beautiful and do show damage to the hair bundle, but historically speaking older studies in mammals have shown that the actin core of the stereocilia is damaged. It would be critical to know if this was the case. Showing damage to the kinocilium and stereocilia splaying is a start, but readers would need to know if the actin cores are damaged. So, TEM should be used to find damage to the actin cores of stereocilia.

      Our main goal of this initial manuscript was to survey morphological and functional changes in mechanically injured lateral line organs with an emphasis on inflammation and synapse loss. We agree TEM studies showing damage to the actin core of the stereocilia will be important to determine whether mechanical damage to neuromast hair bundles fully mimics mammalian stereocilia damage, but these experiments will require significant time to perform and optimize. We have expanded our analysis of hair-bundle morphology in this study and intend to pursue deeper analysis of hair bundle damage, i.e. examination of the stereocilia actin core, in future follow-up studies.

      3) I think the use of "Noise-exposed lateral line" as a term for mechanically overstimulated lateral line hair cells is not correct and could be misleading. The lateral line senses water motion not sound as the word noise would imply. Calling the stimulus "noise" should be removed throughout.

      We have removed the term “noise” throughout the manuscript and replaced it with either “strong water current stimulus” or “mechanical overstimulation” where appropriate.

      4) Decreases in mechanotransduction are shown by dye entry. These results should be strengthened using microphonic potentials to determine the extent of damage. This experiment is not necessary but would improve the quality of the document.

      While we agree that microphonic recordings would provide further support for reduced mechanotransduction, quantitative FM1-43 uptake in zebrafish lateral line hair cells is a well-established proxy for microphonic measurements. In a previous study using the same protocol utilized in our manuscript, FM1-43 labeling intensity was shown to directly correspond with microphonic amplitude (Toro et al, 2015). Moreover, the fixable analogue of FM1-43 (FM1-43FX) gave us comparable relative measurements of uptake as live FM1-43 and provided the additional advantage of high temporal resolution and the ability to simultaneously assay entire cohorts of control and overstimulated fish (which is not possible with microphonic measurements or live FM1-43 imaging), as we could expose groups of fish briefly to the dye at determined time intervals following overstimulation, then immediately place in fixative.

      5) In figure 2, PSD labeling is not clear.

      We assume the reviewer meant PSD labeling in Figure 4 and we agree it is difficult to discern. We have changed the hair-cell label from gray to blue in the images so that the green PSD labeling is clear.

      Reviewer #2:

      1) While the findings are carefully measured and described, the effects of insult on hair cells are relatively minor, with a change in hair cell number, extent of innervation or synapses per hair cell (Figs 3 and 4) in the range of 10% reduction compared to control. One potential value of the model would be to use it to discover underlying pathways of damage or screen for potential therapeutics. However with these modest changes it is not clear that there will be enough power to determine effects of potential interventions.

      One advantage of the zebrafish model is the ability to overstimulate large cohorts of larvae, thereby providing enough power to uncover modest but significant changes resulting from moderate damage to hair cells. While not as well suited for unbiased large-scale screens of therapeutics, our overexposure protocol provides the opportunity to determine the role of specific cellular pathways (e.g. metabolic stress, inflammation, and glutamate excitotoxicity) in hair-cell damage and synapse loss following mechanically-induced damage via genetic or pharmacological manipulation of these pathways. Additionally, as the hair cell synapses fully repair following stimulus-induced loss, the zebrafish model has the potential for identifying novel pathways for repair through transcriptomic profiling (for an example, see Mattern et al, Front. Cell Dev. Biol., 2018). Cumulatively, these future experimental directions will provide important mechanistic information that could be used toward the development of targeted therapeutic interventions.

      2) The most dramatic phenotype after shaking is a physical displacement of hair cells, described as disrupted morphology. However it is not clear what the underlying cause of this change. Are only posterior neuromasts damaged in this way? Is it a wounding response as animals are exposed to an air interface during shaking? It is also not clear to what extent this displacement reveals more general principles of the effects of noise on hair cells. Additional discussion of underlying causes would be welcome.

      We agree that the underlying causes of the physical displacement of posterior lateral-line neuromasts warranted further investigation, and we have expanded the appropriate sections of the Results. To determine whether excessive hair-cell activity plays a role in the displacement of neuromasts, we exposed lhfpl5b mutants (fish that have intact hair-cell function in the ear but no mechanotransduction in hair cells of the lateral line) to mechanical overstimulation. We observed comparable disruption of neuromasts lacking mechanotransduction, supporting the conclusion that displacement of lateral-line hair cells is due to mechanical damage and does not require intact mechanotransduction. Further, when examining the adjacent supporting cells in disrupted neuromasts, we observed that they are similarly displaced and elongated. We conclude that the observed disruption of hair cells is a consequence of mechanical displacement of the entire neuromast organ. We have added discussion of this phenomenon to the Results and Discussion sections of the manuscript.

      3) Because afferent neurons innervate more than one neuromast and more than one hair cell per neuromast, measurements of innervation of neuromasts (Figure 3) or synapses per hair cell (Fig 4) cannot be assumed to be independent events. That is, changes in a single postsynaptic neuron may be reflected across multiple synapses, hair cells, and even neuromasts. This needs to be accounted for in experimental design for statistical analysis.

      We agree that changes in single postsynaptic neurons, which innervate groups of hair cells of the same polarity within a neuromast, could be reflected across multiple synapses. Additionally, it is plausible that excitotoxic events at the postsynapse, while not contributing to apparent neurite retraction, could be contributing to synapse loss across multiple innervated hair cells. We have updated the manuscript to reflect the potential contribution of postsynaptic signaling to synapse loss and added experiments pharmacologically blocking glutamate uptake.

      4) The SEM analysis provides compelling snapshots of apical damage, but could be supplemented by quantitative analysis with antibody staining or transgenic lines where kinocilia are labeled. The amount of reduced FM1-43 labeling is one of the more dramatic effects of the shaking insult, suggesting widespread disruption to mechanotransduction that could be related to this apical damage. Further examination of the recovery of mechanotransduction would be interesting.

      To supplement the SEM snapshots of severe apical damage, we have expanded the SEM image analysis with quantitative data on kinocilia morphology. We have also added confocal images of hair bundles using antibody labeling of acetylated tubulin in a transgenic line expressing β-actin-GFP in hair cells. We agree that correlative studies of mechanotransduction recovery relative to hair-bundle morphology would be interesting, and we intend to examine this question in a future follow-up study.

      5) A previous publication by Uribe et al.2018 describes a somewhat similar shaking protocol with somewhat different results - more long-lasting changes in hair cell number, presynaptic changes in synapses, etc. It would be worth discussing potential differences across the two studies.

      We agree we did not adequately address the considerable differences between our mechanical damage protocol for the zebrafish lateral line and the damage protocol described by Uribe et al, 2018. We have provided a more direct comparison in the Results section and addressed the differences in our protocols in-depth in the Discussion section.

      Our damage protocol uses a stimulus within the known frequency range of lateral-line hair cells (60 Hz) that is applied to free-swimming larvae and evokes a behaviorally relevant response (fast start response). The damage is observable immediately following noise exposure, is specific to posterior lateral-line neuromasts, and appears to be rapidly repaired. Some features of the damage we observe—reduced mechanotransduction and hair-cell synapse loss—may correspond to mechanically induced damage of hair cell organs in other species. Notably, hair cell synapse loss in seemingly intact neuromasts is exacerbated by pharmacologically blocking synaptic glutamate clearance, supporting that the 60 Hz frequency stimulus is overstimulating neuromast hair cells directly and suggesting that the mechanism of synapse loss may be similar to inner hair cell synapse loss reported in mice following moderate noise exposures.

      By contrast, the damage protocol published by Uribe et al used ultrasonic transducers (40 kHz) to generate small, localized shock waves rather than directly stimulate neuromast hair cells. The damage they reported (delayed hair-cell death and modest synapse loss with no effect on hair-cell mechanotransduction) was not apparent until 48 hours following exposure and was not specific to the lateral-line organ. Some of the features of the damage they observed (delayed-onset apoptosis and hair-cell death) may correspond to damage reported in mice following blast injuries.

      Reviewer #3:

      1) As the authors point out, zebrafish hair cells can be regenerated. With that in mind, and to make the relevance for mammalian hair cell repair clear, a clear distinction between mechanisms mediated by "repair" or "regeneration" needs to be made. The authors discuss that proliferative hair cell generation can be excluded based on the short time period, but suggest that transdifferentiation might be involved. Recovery of NM hair cell number occurs within the same 2 hour period in which NM morphology and hair cell function improved, making it difficult to determine the extent to which "regeneration" contributed to the recovery. The amount of transdifferentiation has to be shown experimentally (lineage tracing?).

      We agree that the distinction between "repair" and "regeneration" needs to be made when discussing this model of mechanical damage to zebrafish hair cell organs. We have tried to clarify that most of what we observe regarding recovery (restoration of neuromast shape, mechanotransduction, afferent contacts, and synapse number) reflects mechanisms of repair following mechanical damage (and, in the case of synapse loss, overstimulation) rather than regeneration. However, one feature of damage that may reflect rapid regeneration is restoration of hair cell number following mechanical injury. To experimentally determine whether proliferation contributed to hair cell generation, we assessed the incorporation of the thymidine analog EdU during a 4 hour recovery following mechanical overexposure in a transgenic line expressing GFP in neuromast supporting cells. We observed a modest but not statistically significant increase in the number of proliferating supporting cells in neuromasts exposed to strong current stimulus, suggesting that recovery of lost hair cells is not primarily due to renewed proliferation.

      The number of hair cells that are lost and recover within several hours is low, i.e., typically ~1 hair cell/neuromast. We observed this consistently in all of our experiments, but the mechanisms responsible are not clear. Based on previous studies of hair cell regeneration in the lateral line, the recovery time appears too rapid to be caused by renewed proliferation, a notion that is further supported by our EdU studies. On the other hand, it is possible that a few supporting cells may undergo the initial phases of phenotypic change into hair cells during this short time period, and we speculate that such transdifferentiation may be responsible for the observed recovery. We should emphasize that this is a new observation and, at present, we do not fully understand the underlying mechanism. However, the focus of the present study is on mechanical damage, synaptic loss, and subsequent repair. We believe that it is important to report our consistent findings of low-level hair cell loss and recovery, but a detailed characterization of the mechanism would require considerable effort and would best be the topic of a future study.

      2) The classification of "normal" vs "disrupted" is vague and not quantitative. The examples shown in the paper seem to be quite clear-cut, but this reviewer doubts that was the case throughout all analyzed samples. Formulate clear benchmarks and criteria for the disrupted phenotype (even when blind analysis is performed).

      We have defined measurable criteria for "normal" vs "disrupted" neuromasts that we have added to the Method Details section: “We defined exposed neuromast morphology as “normal” when hair cells appeared radially organized with a relatively uniform shape and size, with ≤7 μm difference observed when comparing the lengths from apex to base of an opposing pair of anterior/posterior hair cells. Length was measured from a fixed point at the center of the hair bundle to the basolateral end of each opposing hair cell. We defined neuromasts as “disrupted” when hair cells appeared elongated and displaced to one side, with >7 μm difference observed when comparing the lengths of an opposing pair of anterior/posterior hair cells. Generally, the apical ends of the hair cells were displaced posteriorly, with the basolateral ends oriented anteriorly.”
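      As an illustration only, the length-difference criterion quoted above can be written as a short function. This is a minimal sketch, not the authors' analysis code: the function name, argument names, and constant name are hypothetical, and the sketch covers only the 7 μm length comparison, not the qualitative features (radial organization, uniform shape, direction of displacement) that the definition also mentions.

```python
# Threshold taken from the criteria quoted above: a difference of <=7 um
# between an opposing anterior/posterior hair-cell pair counts as "normal".
DISRUPTED_THRESHOLD_UM = 7.0


def classify_neuromast(anterior_length_um: float, posterior_length_um: float) -> str:
    """Classify a neuromast from one opposing pair of hair-cell lengths.

    Lengths are assumed to be measured from the center of the hair bundle
    to the basolateral end of each cell, in micrometers. The names used
    here are hypothetical and exist only for this example.
    """
    difference = abs(anterior_length_um - posterior_length_um)
    return "disrupted" if difference > DISRUPTED_THRESHOLD_UM else "normal"


if __name__ == "__main__":
    # A 5 um difference falls within the "normal" range; 10 um does not.
    print(classify_neuromast(20.0, 15.0))
    print(classify_neuromast(25.0, 15.0))
```

      A pair differing by exactly 7 μm is classified as "normal" here, matching the "≤7 μm" wording of the definition.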

      3) Sustained and periodic exposure: These two exposure protocols not only differ with respect to sustained vs periodic, they also differ in total exposure time (Fig 2B). This complicates the interpretation, especially considering the authors own finding that a pre-exposure is protective.

      To clarify—pre-exposure was not protective to hair-cell survival. Rather, in preliminary experiments, pre-exposure appeared to reduce larval mortality, and we have clarified that observation in the text of the Results and the Methods Details sections. We agree with the reviewer that comparing the two protocols based on differences in time distribution is complicated in that they also differ in total exposure time. For the purpose of clarity, we now focus on the sustained exposure in the main figures and created supplemental figures for the reduced damage still observed using periodic exposure, specifying that reduced damage may be the result of periodic time distribution of stimulus and/or less cumulative time exposed to the stimulus.

      4) The data on the mitochondrial ROS aspect seems not well integrated into the overall story.

      We agree that the ROS story was incomplete and not well integrated. We have removed the data describing mpv17-/- mutants and mitochondrial dysfunction from this manuscript. A more comprehensive report of mpv17-/- mutant mitochondrial function and morphological analysis of neuromasts following noise exposure is now described in a follow-up manuscript (“Influence of Mpv17 on hair-cell mitochondrial homeostasis, synapse integrity, and vulnerability to damage in the zebrafish lateral line”).

      5) It is surprising that the hair bundle morphology was not assessed after recovery. This is crucial. Overall, it would be good to see some quantification of the SEM data, e.g. kinocilia length and number of splayed bundles.

      We have expanded the SEM image analysis to quantitatively assess kinocilia morphology following exposure. We agree that assessment of recovery using live imaging of hair bundles paired with subsequent SEM analysis will be informative, and we intend to perform those experiments in a future study.

      6) Behavioral recovery (measured as number of "fast start" responses) was also not assessed. This is essential for determining the functional relevance of the recovery.

      We attempted to measure behavioral recovery of lateral-line function by measuring “fast-start” responses immediately and several hours after recovery, and discovered that i) strong water current provided stimulation that was too intense to reveal subtle behavioral changes following lateral-line damage and recovery, and ii) when testing larvae immediately following sustained strong current exposures, it was difficult to discern whether fewer “fast-start” responses were due to lateral-line organ damage or larval fatigue. We agree that behavioral recovery is important to assay but acknowledge that assessing lateral-line mediated behavior following mechanical damage will require a more sensitive testing paradigm that stimulates the lateral-line sensory organ with a relatively gentle, calibrated water flow stimulus. We are currently performing a follow-up study to this paper using a testing paradigm developed by a postdoctoral associate in our lab that analyzes subtle changes in larval orientation to water flow (rheotaxis) mediated by the lateral-line organ. Using this behavioral paradigm, we will directly correlate morphological and functional recovery over time.

      7) This reviewer is not yet convinced that this damage model displays enough commonalities to mammalian noise damage to justify the ubiquitous use of the term "noise" throughout the manuscript. It would be more prudent to use a more careful term along the lines of "mechanical overstimulation-induced damage".

      We have removed the term “noise” throughout the manuscript and replaced it with either “strong water current stimulus” or “mechanical overstimulation” where appropriate.

      8) Overall, there was a lack of experimental and analysis detail in the results section. For example, how was afferent innervation quantified? Just counting GFP labeled contacts to hair cells?

      Innervation of neuromast hair cells was quantified during blinded analysis by scrolling through confocal z-stacks of each neuromast (step size 0.3 μm) containing hair cell and afferent labeling and identifying hair cells that were not directly contacted by an afferent neuron, where direct contact means no discernible space between the hair cell and the neurite. Hair cells that were identified as no longer innervated showed measurable neurite retraction; there was generally >0.5 μm distance between a retracted neurite and hair cell. We have added this information to the Method Details section.

      There was also inconsistency in the use of two variations of the mechanical damage protocol, the time points at which repair was assessed, and whether the damage was quantified in all neuromasts or in normal vs. disrupted neuromasts separately, making the data difficult to interpret.

      We have revised our figure legends to clearly indicate when we are assessing damage in all exposed neuromasts (pooled) to control vs. comparative analysis of normal vs. disrupted neuromasts relative to control. In addition, we now focus on the sustained exposure in the main figures, which was the exposure protocol used for the time points in which repair and recovery were assessed.

    1. Author Response:

      Reviewer #1:

      In this manuscript, Ma, Hung and colleagues rewind the tape to explore the genetic landscape that precedes carbapenem resistance of Klebsiella pneumoniae strains. The importance of this work is underscored by the paucity of new drugs to treat CPO (carbapenemase producing organisms). 'Given the need for greater antibiotic stewardship, these findings argue that in addition to considering the current efficacy of an antibiotic for a clinical isolate in antibiotic selection, considerations of future efficacy are also important.' And so I would say the major weakness of the paper is the aspirational nature of how this work could be used by clinicians in antibiotic selection or treatment of the patient.

      We consider this study as a first step towards recognizing the need to develop more comprehensive diagnostics and more sophisticated antibiotic stewardship programs. This study suggests that factors besides MICs could inform clinical antibiotic selection, including that specific lineages have higher propensity to develop resistance (i.e., ST258), stepping-stone mutations that facilitate the evolution of resistance (i.e., mutations in rseA and ompK36), and antibiotics that have high level resistance barriers (i.e., meropenem). We have now added language to both the introduction and discussion to note that next steps are needed to extend these findings into the clinic, including more extensive whole genome sequencing of isolates and tracking of these strains in the clinic, associated patient outcome and strain evolution data, to understand the full impact of these mutational events in CREs.

      The strains selected for these experiments and the evolutionary in vitro models are both well considered. One idea that has stuck with me from the figures of a review article by Kishony (https://pubmed.ncbi.nlm.nih.gov/23419278/, figure 4) is the concept of constraining the evolutionary pathways or fitness landscape for antibiotic resistance. Are there any peaks that a microbial strain reaches that optimize resistance to one AbX but basically leave it inherently unable to evolve resistance to another AbX? This could have application for dual drug therapy or pulsed therapy.

      This is a good evolutionary question that might be suggested by Kishony’s work. In our particular study however, because the majority of isolates used that are carbapenem susceptible are already resistant to many other antibiotics, we cannot measure their resistance frequencies to other clinically relevant antibiotics. It does suggest that such a strategy would have to be implemented early enough before strains have already acquired significant resistance and cannot be used to manage currently existing resistance.

      When you sequence the isolates that have increased their MIC, do you find 'unrelated' mutations in genes that would control protein synthesis or other functions that might be compensatory mutations? Developing a clearer understanding of the rewiring of the bacterium's basic processes might also elucidate both integrated functions and potential weaknesses. You mention mutations in wzc, ompA, resA, bamD.

      Yes. We found some strains had acquired multiple mutations in multiple genes. Please refer to supplementary file 12. In some cases, we found additional mutations of unclear significance; for example, we identified two mutations in Mut86. We tested these two mutations separately and found that only the mutation in ompA affects the susceptibility of the mutant. However, this does not exclude the possibility that the other mutation might have other compensatory functions versus just being a random passenger mutation; this will require further investigation.

      On the other hand, in some cases, we indeed found mutations that affect the fitness of the isolates when cultured in LB medium or M9, e.g., mutations in rseA. Some mutations affect fitness only in LB medium but not M9, e.g., mutations in ompK36. Some mutations do not significantly affect the fitness in either LB or M9, e.g., duplication of blaSHV-12. We are performing RNA sequencing on these mutants to further understand the “rewiring of the bacterium's basic processes.”

      Point of discussion. Classic ST258 carries blaKPC on pKpQIL plasmid. Your ST258 strain (UCI38) carries blaSHV-12 on pESBL. Am I to assume that pESBL is in lieu of pKpQIL?

      Indeed, pESBL encodes an ESBL in UCI38 and may obviate the need for another classical KPC-carrying plasmid such as pKpQIL. However, pESBL and pKpQIL are not incompatible and so it is not clear that anything is precluding UCI38 from picking up pKpQIL.

      Transformation of CPO have many variables and in vitro data does not always mirror what is observed in vivo. So the findings of Fig 2f might need to be considered under different laboratory conditions (substrate, temperature) [https://pubmed.ncbi.nlm.nih.gov/27270289/].

      We revised the statement in the revision and pointed out that the results in Fig. 2F were limited to our assay condition.

      Reviewer #2:

      In this manuscript, Ma et al. sought to investigate the breadth of genetic mechanisms available across various lineages of clinical isolates of Klebsiella pneumoniae, with a specific focus on carbapenem resistance evolution. The authors systematically evaluated how different carbapenems and genetic backgrounds affect the rate of evolution by measuring mutation frequencies. The authors report three major observations: First, that a higher mutational frequency is dependent on genetic background and high-level transposon activity affecting porins associated with carbapenem resistance. Importantly, transposon activity was not only higher than SNP acquisition rates in distinct backgrounds, but was also reversible, thus emphasizing that resistance evolution via this mechanism might impart less of a cost than the accumulation of mutations in other genetic backgrounds. Second, that CRISPR-cas systems have the potential to restrict the horizontal acquisition of resistance elements. Importantly, determining the presence or absence of such systems alone is not enough to determine whether a strain is "resistant" to certain foreign elements, but specific sequences within the different spacers can be more informative of the exact range of plasmids or genetic elements to which the system is restrictive. Third, pre-selection with ertapenem increases the likelihood of resistance evolution against other carbapenems both via de novo mutation and HGT.

      Altogether, these results emphasize the importance of additional factors, other than MIC values, such as genetic background, plasmid/transposon activity, and drug identity and choice in determining the rate at which resistance can evolve in K. pneumoniae. I consider that the data generally supports the authors conclusions and provides relevant observations to the field. I do not have any major concern and think the authors have done a very complete and systematic evaluation of the data necessary to answer their questions.

      My only minor concern is regarding the authors' emphasis in their introduction and discussion on how this kind of data is relevant for clinical decision making. It remains unclear to me exactly how. While I completely agree that genomic information and drug choice play a major role in the evolution of antibiotic resistance, it is unclear to me how to efficiently and promptly translate all of this information to the bedside. Genome sequencing, however economical it has become in recent years, is still not affordable enough to be implemented at the scales needed for diagnosis at the clinic. Perhaps the authors could expand on how they envision this could be implemented?

      We consider this study as a first step towards the development of more comprehensive diagnostics and more sophisticated antibiotic stewardship. Indeed, as current diagnostics exist, it would be difficult to implement. However, we hope that as studies such as these grow, it will usher in a new era of diagnostics that can indeed take such factors into account. We have now added such a discussion to the introduction and discussion in the revised manuscript.

    1. Author Response:

      Evaluation Summary:

      This study, based on an elaborated animal sample collection, reconstructs a comprehensive tree of Eulipotyphla, especially concentrating on Talpidae (moles), and infers the transitions of their lifestyles. It also models myoglobin structure and calculates electrophoretic mobility, demonstrating that semiaquatic eulipotyphlans have a higher net surface charge than fossorial, semifossorial, and terrestrial relatives. This variable myoglobin property indicates convergent shifts to a semi-aquatic lifestyle in multiple independent lineages including that of the Russian desman, the smallest endothermic diver.

      We note that the final sentence contains a misconception as the Russian desman – which weighs 180-220 g – is in fact the largest semi-aquatic member of the eulipotyphlan clade; a relative size comparison of this species relative to three other eulipotyphlans is presented in figure 1. The title of ‘the world’s smallest endothermic diver’ is instead held by the 12-18 g American water shrew.

      We have reworked the text to avoid any future ambiguity here, and additionally recommend re-writing this sentence as:

      “This variable myoglobin property indicates convergent shifts to a semi-aquatic lifestyle in multiple independent lineages including three separate times by ‘water shrews’, the smallest endothermic divers.”

      Reviewer #1:

      The authors of this study investigated the evolutionary process of the mammalian group of species including moles, shrews, hedgehogs, and solenodons with molecular approaches, with a reference to their diverse lifestyles. They first unveiled the among-species relationships and the chronological pattern of diversification by comparing molecular sequences of commonly shared genes. The highlight of the study is the inference of net surface charge and three-dimensional structure of the oxygen-storing muscle protein myoglobin, which reflected the varied lifestyles, with the Russian desman, the smallest endothermic diver, exhibiting a prominently altered disposition of myoglobin, possibly resulting from the adaptation to a semi-aquatic lifestyle.

      As with the Evaluation Summary above, the final sentence contains a misconception regarding the Russian desman. We thus recommend re-writing this sentence as:

      “The highlight of the study is the inference of net surface charge and three-dimensional structure of the oxygen-storing muscle protein myoglobin, which reflected the varied lifestyles, with three separate lineages of ‘water shrews’, the smallest endothermic diving species, exhibiting a prominently altered disposition of myoglobin, possibly resulting from the adaptation to a semi-aquatic lifestyle.”

    1. Author Response:

      Reviewer #1:

      This MS combines two-photon glutamate sensing (using the iGluSnFR fluorescent probe), two-photon glutamate uncaging, two-photon calcium imaging and electrophysiology to investigate whether synaptically released glutamate activates receptors outside the synapse of release, and at neighboring synapses. The data themselves are very impressive. The authors arrive at the revolutionary conclusion that synaptically released glutamate is able to activate both NMDA and even AMPA receptors at neighboring synapses, remarkably strongly. I say revolutionary, because previous modelling has yielded diametrically opposite conclusions. The reflex would be to prefer experiment over theory, yet the modelling was based upon quite strongly constrained physical parameters that would be quite incompatible with the interpretations reported here. However, I believe the authors have failed to take into account significant technical limitations inherent in the technologies they apply. These include spatial averaging of fluorescence, possible saturation of iGluSnFR and diffusive exchange of (caged) glutamate during uncaging. As a result, the conclusion is wholly unproven. Indeed, I believe it highly probable that all of the data in favor of distal activation will prove to be consistent with synapse specificity and the presence of technical artifacts related to spatial averaging of fluorescence signals and diffusive exchange of (caged) glutamate during uncaging.

      We agree that there are technical limitations and that the interpretation of signals recorded from near synapses is difficult. This concerns the length constants we describe and name SARGe. Our usage of those terms in the results may have suggested that we propose that the value of lambda itself describes the action range of glutamate well. This is not the case, as the reviewer states, and we note this limitation at the beginning of the discussion section.

      However, our interpretation that glutamate may regularly activate AMPA-R in neighboring synapses is not based on lambda values (see discussion).

      It is based on the facts that a) ~5% iGluSnFR responses are observed at more than 1.5 µm remote to a synapse and b) uncaging at 500 nm produces a current response of ~38% of the quantal synaptic amplitude. Here, the remarks of the reviewer are incorrect: a) is not affected by volume averaging or saturation of iGluSnFR, and previous models predict an activation of up to 1-2% only. We have shown this by simulation in an appeal letter, which unfortunately was not forwarded to the reviewer. b) is not increased by “diffusive exchange of glutamate during uncaging”. In fact, releasing the same amount of glutamate over a longer period reduces distant receptor activation, and current models predict a 2-4-fold lower activation of AMPA-R than we observe here. This was also shown by simulation in the appeal letter, but a further exchange with the reviewer on this was not permitted by the editors.

      Reviewer #2:

      Matthews, Sun, McMahon et al. addresses the extent of the spread of the neurotransmitter glutamate into the extracellular space. The authors use a combination of imaging techniques, 2-photon glutamate uncaging and electrophysiology to conclude that vesicular glutamate release reaches nearby, adjacent synapses. Although this is an interesting question, and one that has been addressed many times previously, I have several technical concerns about the strength of the conclusions that reduces my enthusiasm.

      Unfortunately, only this general part of comments of reviewer 2 is published so that we cannot meaningfully rule out/comment on the reviewer’s concerns.

      Reviewer #3:

      This is an interesting paper combining several impressive techniques to argue that synaptically released glutamate is allowed to diffuse to and activate receptors at much greater distance than previously thought. iGluSnFR recordings show that glutamate released from single vesicles activates the indicator with a spatial spread (length constant) of 1.2 um, substantially farther than previous estimates based on the time course of glutamate clearance by glial transporters (PMC6725141). Similar parameters are observed with spontaneous and evoked events, large or small, or when glutamate is released via 2P uncaging. Further uncaging experiments show that both AMPARs and especially NMDARs are activated a substantial distance. AMPARs, previously thought to be recruited only within active synapses, are activated with a spatial length constant that compares quite closely with the average distance between synapses in the hippocampus. More heroic experiments and some geometric calculations show that this behavior enables neighboring synapses to interact supralinearly. The results suggest that "crosstalk" between neighboring synapses may be substantially more common than previously thought.

      The experiments in this paper appear carefully performed and are analyzed thoroughly. Despite all of the quantitative rigor and careful thought, however, the authors fail to reconcile convincingly their results with what we know about neuropil structure and the laws of diffusion. There are very good data in the literature regarding the extracellular volume fraction and geometric tortuosity of the neuropil, the diffusion characteristics of glutamate and the time course of glutamate uptake. These data more or less demand that synaptically released glutamate is diluted over a much smaller spatial range than that suggested here. In the Discussion, the authors suggest that this discrepancy might reflect a simplified view of the neuropil as an isotropic diffusion medium (PMC6763864, PMC6792642, PMC6725141), whereas a more realistic network of sheets and tunnels (PMC3540825) might prolong the extracellular lifetime of neurotransmitter. I like this idea in principle, but there is no quantitative support in the paper for the claim - in fact, it seems at odds with the authors' very nice demonstration that diffusion appears to be similar in all directions (Figure 3B). I don't necessarily think a solution is within the scope of this single paper, but I would suggest that the authors acknowledge the present lack of a compelling explanation.

      Our results are not predicted by the modelling studies cited; that is correct, and this is what makes them important in our eyes. But it is important to note that those modelling/simulation studies use a strong simplification and view the extracellular space/the neuropil as a porous medium. This is a powerful approach, but it is only a valid description when considering diffusion distances of several micrometers; it is not applicable on the sub-micron scale of neighboring synapses (PMID: 15345540 p1608; PMID: 7338810 p227, and DOI: 10.1088/0034-4885/64/7/202). This drawback of the simulations has been overlooked, and the reviewer seems not to be aware of it; we point this out at the end of the discussion section. We do not suggest anisotropy near a synapse, nor a particular perisynaptic geometry such that there would be specific channels from one synapse to the next. We also assume that the neuropil is random (as shown by PMID 9547224); instead, everywhere in the neuropil the initial, submicron diffusion will not follow the “porous medium approach”.

      It is true that we do not offer a quantitative description of how this violation of the porous medium approach would lead to an underestimation of synaptic cross-talk; we provide experimental data. However, in our appeal letter we explicitly describe this discrepancy in detail to make the reviewer aware of it, but regrettably this information never reached the reviewer.

    1. Author Response:

      We thank the editors and the reviewers for their positive assessment of our work.

      Reviewer #1 (Public Review):

      [...] One major concern is that the levels of protein expression and folding are not verified. This is concerning for the Gln118 mutation because lack of fitness could result trivially from misfolding or accelerated degradation that might result from increased flexibility and conformational stability. Moreover, the authors' finding that it was not possible to purify Gln118 mutant proteins for biochemical studies is consistent with this sort of trivial explanation for apparent lack of biological function.

      As described in the manuscript, the sidechain of Gln 118 makes hydrogen bonds with the backbone segment leading into an adjacent helix. We had omitted to point out in the original manuscript that Gln 118 is completely buried in the structure (we now do so in the revised manuscript, on page 29). As shown by Worth and Blundell, buried polar sidechains that form backbone hydrogen bonds (as Gln 118 does) are highly conserved in proteins, and these polar sidechains are important for the stabilization of the protein architecture (Worth and Blundell BMC Evolutionary Biology 2010, 10:161). Thus, we do expect the mutation of Gln118 to destabilize the clamp loader structure. However, we do not find the identification of the importance of Gln118 to be a trivial finding, because the role of polar residues in maintaining structure is quite commonly linked to their functional role, making it difficult to separate the two effects. For example, the proximal histidine that links the F helix in hemoglobin to the iron atom is perhaps the most important residue for allosteric communication in hemoglobin. Mutation of the proximal histidine severely destabilizes hemoglobin, due to loss of heme binding and conversion to a molten globule state (see, for example, Brennan and Matthews, Hemoglobin, 21:393-403, 1997).

      It was an oversight for us to have not analyzed the effects of Q118 mutations on stability and function, and we have now rectified this. We now include the results of the following four experiments, in which we compare the expression of the mutant and wild-type forms of the clamp loader, their behavior on gel filtration analysis, and their activities in ATPase assays and DNA replication assays. These experiments demonstrate that the mutation most likely destabilizes the protein, and affects the nature of the assembled complex. These results further emphasize the crucial nature of the hydrogen-bonding interactions made by the Gln 118 sidechain.

      1) We created a clamp loader variant in which the ATPase subunit is C-terminally tagged with the fluorescent protein mCherry, allowing the expression levels of the proteins to be monitored by flow cytometry of E. coli cells. This experiment shows that introduction of the Q118N mutation leads to a very substantial reduction in protein expression (Figure 6 supplement 2 in the revised manuscript). An important point is that the proteins are expressed using a strong promoter (T7 RNA polymerase promoter), which was done so as to purify proteins for biochemical experimentation and also enable ready detection of the mCherry fluorescence. The natural T4 promoter that is used in the phage assay results in very low levels of protein expression (no detectable fluorescence signal when mCherry is fused to the ATPase subunit), and we do not know whether the expression defect that we see is also manifested under conditions where the protein expression is low. Nevertheless, the data do indicate that the Q118N mutation destabilizes the clamp loader complex.

      2) We purified mCherry tagged variants of the wild-type clamp-loader complex, the Q118N mutant complex, and the Q118N/I141L double mutant that has partial recovery of fitness in the phage propagation assay. SDS-PAGE analysis (not shown) confirms that all complexes have the ATPase and clasp subunits of the clamp loader in the proper 4:1 ratio. Gel filtration analysis shows that the wild-type complex corresponds to a single peak eluting at ~70 ml, which we assume corresponds to correctly assembled clamp loader (see Figure 9 supplement 1 in the revised manuscript). For both mutants, there is a peak at ~70 ml, corresponding to the properly assembled clamp loader, but also an additional peak that is close to the void volume of the column (~45 ml). For the Q118N mutant, the fraction of the protein corresponding to the properly assembled clamp loader is small. This fraction is substantially larger for the double mutant that has increased fitness (Q118N/I141L), indicating that one effect of the second mutation is to recover the ability of the clamp loader to assemble properly.

      3) We measured the rates of DNA-stimulated ATP hydrolysis for purified and mCherry-tagged wild-type clamp loader and the Q118N mutant, as we had described in the original manuscript for several other mutants (Figure 9 supplement 2 in the revised manuscript). Addition of the mCherry tag to the wild-type clamp loader results in a slight reduction of the ATPase activity. The Q118N mutation has a very low rate of DNA-stimulated ATPase activity (less than 10% of activity of the wild-type mCherry-tagged clamp loader). These data indicate that even in the fraction of Q118N mutant that can be purified as part of an intact clamp loader complex, the mutation compromises the ability to hydrolyze ATP. This is likely to be due to the failure to assemble into a competent conformation.

      4) We measured the extent of plasmid DNA replication by the T4 replisome, using wild-type and mutant clamp loaders, as described in the original manuscript (Figure 9 supplement 3 in the revised manuscript). As for the ATPase assay, addition of the mCherry tag to the wild-type clamp loader results in a slight reduction of replication efficiency. Introduction of the Q118N mutation leads to a near-total loss of replication efficiency, to a level comparable to that seen in the absence of the clamp loader.

      The main text of the manuscript now includes a description of these new results, and the new data are included as supplementary figures.

      Reviewer #2 (Public Review):

      [...] One potential weakness is that among all the questions posed at the beginning of the study not all received a definitive answer. In particular, the question "To what extent does the mutational sensitivity of the system in a particular organism, carrying out the essential function of DNA replication, reflect the sequence diversity seen across the spread of life?" is only partially addressed.

      We agree that we have not fully probed the issue of evolutionary sequence diversity versus mutational sensitivity. We find it exciting that the two sets of data show clear divergence in certain positions, pointing to epistasis. This will be an exciting direction for future exploration.

      The second question, "The clamp loader subunits respond cooperatively to the clamp, ATP and DNA. How do the mechanisms underlying this cooperativity impose constraints on the sequence?", has not been answered and goes beyond the scope of this study.

      Answering this question requires the analysis of second-order mutations. We have demonstrated the feasibility of using the phage propagation assay for such analysis, but we agree with the reviewer that this is beyond the scope of the present study.

    1. Author Response:

      For the reader, we specifically want to highlight the following new data (Figures 7, 8 and accompanying supplements) that were added:

      1) To directly compare physiological (Wntoff/Wnton) and oncogenic (i.e. constitutively active) signaling, we generated a second cell line using CRISPR/Cas9 genome editing, harboring an oncogenic point mutant form of CTNNB1 (SGFP2-CTNNB1S45F).

      2) To further quantify the levels, complex state and multimerization status when WNT/CTNNB1 signaling is hyperactivated, we performed additional FCS and N&B experiments in the new mutant cell line and upon GSK3B inhibition by CHIR99021 treatment (as requested by reviewers 1 and 2).

      3) We use these same perturbations to strengthen the link between our experimental data and the computational model (as suggested by reviewer 2) and provide access to the model in the form of an interactive app (available at https://wntlab.shinyapps.io/WNT_minimal_model/).

      While we are in the process of further revising our manuscript, we do want to take this opportunity to briefly reply to two of the points made by Reviewer #1:

      The authors have concluded with FCS that the diffusion coefficient of free β-catenin to be 14.9 um2/s (line 259) and the complexed β-catenin to be 0.17 um2/s (line 327). Similar to the authors' argument in the manuscript, this difference means about a 100-fold change of the complex length scale. If the complex is linear, this means a 100-fold change in molecule size, but if the complex is spherical, this means a one-million-fold increase of the molecule size.

      To clarify: We indeed measure a 14.9/0.17 = 87-fold change in speed. If we assume the Einstein-Stokes relation, this would be indicative of an 87^3-fold change in molecular size. However, the Einstein-Stokes equation is only valid when specific conditions are met (including the assumption that we are dealing with perfectly spherical particles in a homogeneous environment). Therefore, we noted the following in the material and methods section: “It must be noted that, especially for larger protein complexes, the linearity between the radius of the protein and the speed is not ensured, if the shape is not globular, and due to other factors such as molecular crowding in the cell and hindrance from the cytoskeletal network. We therefore did not estimate the exact size of the measured CTNNB1 complexes, but rather compared them to measurements from other FCS studies.” Put differently: The most important take-home message is not an absolute size estimate of the CTNNB1 complex (which is why we were careful not to make that point explicitly, although it is unlikely that this complex only contains one copy of the ‘standard’ destruction complex components APC, AXIN, GSK3 and CK1), but the fact that this complex is still present after WNT stimulation, although it does undergo a substantial reduction in size (a 3.5-fold change in speed upon WNT stimulation), and thus changes its identity. We will take care to ensure that a future revision leaves no room for further confusion on this point.
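
      As a rough illustration of the scaling argument in this reply, the following minimal sketch computes the fold changes implied by the two diffusion coefficients quoted by the reviewer. It assumes ideal Einstein-Stokes behavior (spherical particles in a homogeneous medium), which, as noted above, is unlikely to hold in a living cell, so the numbers are an idealized bound and not a size estimate. The function name and structure are hypothetical and are not taken from the manuscript.

```python
def fold_change_in_size(d_fast, d_slow):
    """Fold change in radius and in volume implied by two diffusion
    coefficients, assuming ideal Einstein-Stokes behavior (D ~ 1/r).

    This is an idealization: it ignores particle shape, molecular
    crowding and cytoskeletal hindrance.
    """
    radius_ratio = d_fast / d_slow       # D is inversely proportional to radius
    volume_ratio = radius_ratio ** 3     # volume of a sphere scales with r^3
    return radius_ratio, volume_ratio


# Diffusion coefficients quoted in the review, in um^2/s
D_FREE = 14.9       # free CTNNB1
D_COMPLEX = 0.17    # complexed CTNNB1

radius_ratio, volume_ratio = fold_change_in_size(D_FREE, D_COMPLEX)
print(f"fold change in speed (and radius): {radius_ratio:.1f}")
print(f"fold change in volume if spherical: {volume_ratio:.2e}")
```

      Under the same idealization, the 3.5-fold change in speed upon WNT stimulation would correspond to a 3.5^3, or roughly 43-fold, change in volume.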

      From the biology point of view, APC is the backbone of the destruction complex, which has several β-catenin binding sites by itself. Additionally, APC also contains several Axin1 binding sites where each Axin1 can also recruit one β-catenin. It is unlikely that one APC complex contains only one β-catenin, not mentioning the potential oligomerization of APC.

      Here we can only agree with the reviewer: We were equally surprised by the findings from our N&B analysis, which is why we extensively discuss possible explanations in our manuscript. Future follow-up by ourselves and others will reveal to what extent our interpretation of these measurements stands the test of time.

    1. Author Response:

      Reviewer #1 (Public Review):

      Redmond et al. use single-cell and single-nucleus RNA-sequencing to reveal the molecular heterogeneity that underlies regional differences in neural stem cells in the adult mouse V-SVZ. The authors generated two datasets: one which was whole cell RNA-seq of whole V-SVZ and one which consisted of nuclear RNA-seq of V-SVZ microdissected into anterior-posterior and dorsal-ventral quadrants. The authors first identified distinct subtypes of B cells and showed that these B cell subtypes correspond to dorsal and ventral identities. Then, they identified distinct subtypes of A cells and classified them into dorsal and ventral identities. Finally, the authors identified a handful of genes that they conclude constitute a conserved molecular signature for dorsal or ventral lineages. The text of the manuscript is well written and clear, and the figures are organized and polished. The datasets generated in this manuscript will be a great resource for the field of adult neurogenesis. However, the arguments and supporting data used to assign dorsal/ventral identities to B cells and A cells could be strengthened, and more rigorous data analysis could result in new biological insights into stem and progenitor cell heterogeneity in the V-SVZ.

      We thank the Reviewer for their feedback on our manuscript. As suggested by Reviewer #1, we are performing additional analyses in the following areas:

      1) Performing additional analyses to further strengthen the dorsal/ventral scRNA-Seq B cell marker analysis and its relationship to our sNucRNA-Seq B data.

      2) Performing additional analyses to identify potential novel biological insights into stem & progenitor cell heterogeneity and text edits to discuss how differentially-expressed sets of genes among B cells and A cells are related to biological processes and/or signaling pathways.

      Reviewer #2 (Public Review):

      The paper is well written, and the data are well analyzed and presented. My concerns centre on terminology and alternative explanations of some of the data, which the authors might deal with in the introduction or discussion.

      We thank Reviewer #2 for their positive reception of our manuscript and the data, and for the constructive suggestions, which we have addressed by changes to the manuscript and in our responses below:

      1) I am slightly confused about some of the data shown in Figure 1. If B cells are defined as GFAP expressing cells, then why do only 25% of the B cells in the plot in Figure 1C express GFAP? I may be missing something here, as other readers may as well. Similarly in the same panel, only 25% of astrocytes seem to be expressing GFAP or GFP driven by a GFAP promoter.

      Importantly, among all cells captured in our scRNA-Seq, only B cells (51.86%), a subpopulation of parenchymal astrocytes (25%) and a small subpopulation of ependymal cells (E cells) had GFAP expression. This is consistent with immunocytochemical staining (Ponti et al. 2013) and other studies of scRNA-Seq expression (Xie et al. 2020). Similarly, Gfp (under the control of hGFAP promoter) is not expected to be expressed in all B cells (here 31.08% of B cells are Gfp+).

      Note that previous work has shown that B cells express different levels of GFAP protein, and some B1 cells were negative (Ponti et al. 2013). This supports the notion that this intermediate filament is a good marker of the V-SVZ primary progenitors, but also present in a subpopulation of parenchymal astrocytes and ependymal cells. However, a negative signal for GFAP does not imply that a cell is not a B cell. This highlights the importance of our clustering analysis revealing additional genes associated with B cells. Our analysis suggests that a combination of Gfap, Thrombospondin 4, Slc1a3 (GLAST) and S100a6 provide a better marker combination to identify B cells.

      The reason for the variability among B cells in the expression of GFAP remains unknown. It could be associated with the normal regulation of intermediate filaments as B cells transit the cell cycle or different stages of their activation or quiescence. It could also be linked to technical aspects of scRNA-Seq analysis, e.g., gene dropout, detection limits, or sequencing saturation. Since our dot plot shows the actual proportions only graphically, we have added the specific percentages and the following sentences to the text to clarify this issue:

      “A fraction of both populations expressed GFAP: 51.85% of B cells (clusters 5,13,14 & 22), 24.37% of parenchymal astrocytes (clusters 21, 26 & 29). This is consistent with previous reports (Chai et al. 2017; Xie et al. 2020; Ponti et al. 2013). Note that across all cells captured in our scRNAseq analysis, only B cells, parenchymal astrocytes or ependymal cells expressed GFAP. Among these three cell types, B cells had the highest average expression of GFAP (4.41 for B cells, 1.00 for astrocytes, 0.29767 for Ependymal cells, values in Pearson residuals). Other markers, like S100a6 (Kjell et al. 2020) (88.9% of B cells; 54% of parenchymal astrocytes and 80% of ependymal cells) and Thbs4 (Zywitza et al. 2018) (45% of B cells; 28.77% in parenchymal astrocytes, 2.88 % in ependymal cells) are also expressed preferentially in B cells and parenchymal astrocytes, but they alone do not distinguish these two cell populations.”

      2) The authors term the germinal zone of the adult mouse brain - the ventricular-subventricular zone. They should discuss the evidence that the adult germinal zone is made up of cells from both the ventricular zone and the sub ventricular zone in the late embryo, where those zones are described clearly on the basis of morphology. Many of the early embryonic neural stem cells are present in the ventricular zone before the sub ventricular zone has developed and continue to be present into the adult. If there is not clear mouse evidence that the progeny of embryonic sub ventricular cells are present in the adult germinal zone independent of embryonic ventricular zone progeny, then the authors might consider calling the zone - the adult ventricular zone, or alternatively terming the neurogenic area around the lateral ventricle the adult germinal zone or by a more straightforward descriptive term - the adult subependymal zone or the adult periventricular zone. Also, I think the first word in line 6 on page 3 should be neural rather than neuronal.

      We agree that the terminology in the field is confusing and multiple names have been used to describe the same region. In order to clarify that we are referring to the same adult periventricular germinal region, we have added a short sentence in the introduction to indicate that the V-SVZ is also referred to by other authors as the SVZ, the subependyma or the subependymal zone: “This neurogenic region has also been referred to as the SVZ or the subependymal zone (Kazanis et al. 2017; Morshead et al. 1994)”.

      This reviewer argues that the adult V-SVZ should only be called V-SVZ if a lineage relationship could be established with the embryonic SVZ. To our knowledge there is no need to link the adult SVZ to the embryo, as this structure, like the embryonic SVZ, anatomically sits beneath the VZ (the area next to the ventricle). However, a lineage relationship does exist between the adult V-SVZ and the embryonic VZ, established in previous studies showing that PreB1 cells around E15.5 become quiescent and give rise to adult B cells in the V-SVZ (Fuentealba et al., 2015; Furutachi et al., 2015). In addition, developmental studies show a continuum in the gradual transformation of the embryonic periventricular germinal layers, including the SVZ. Importantly, B1 cells are derived from VZ radial glia (RG), maintain RG markers and retain RG-like interkinetic behavior, establishing that functionally and anatomically a VZ is retained in the adult (Merkle et al., 2004; Mirzadeh et al., 2008). Therefore, the adult periventricular epithelium is not made of a pure layer of ependymal cells with progenitor cells underneath, as previously thought. Moreover, recent work indicates that, just like in the embryo, the more basal adult SVZ progenitors (B2 cells) can be derived from adult VZ progenitors (B1 cells) (Obernier et al. 2018). This transformation of apical to basal cells begins to occur in embryonic stages, further suggesting equivalences between the adult and the embryonic progenitor cells. For all the above reasons we prefer to use the term V-SVZ.

      In line 6, page 3, we have changed “neuronal cell types” to “neural cell types”, as suggested.

      3) The authors refer to their molecularly described B cells as stem cells. Certainly, their lab and others have shown that adult olfactory bulb neurons are the progeny of those B cells; however, the classic definition of stem cells (in the blood or intestine, for example) requires demonstration that single stem cells can make all of the differentiated cells in that tissue. Is there evidence that a single adult B1 cell can make astrocytes, neurons and oligodendrocytes? Indeed, what percentage of the single adult B cells characterized here on the basis of RNA expression can be shown to be multipotent for both macroglial and neuron lineages in vivo or in vitro? Perhaps progenitor or precursor cells might be a better term for a B cell that appears to give rise to neurons primarily.

      This is also an issue of definitions. We modified the text to refer to the primary progenitors in the V-SVZ as adult neural stem cells, or progenitor cells (“NSPCs”). We agree that this needs to be clarified, and in the introduction we modified one paragraph to indicate:

      “From the initial interpretation that adult NSPCs are multipotent and able to generate a wide range of neural cell types (Reynolds and Weiss 1992; van der Kooy and Weiss 2000; Morshead et al. 1994), more recent work suggests that the adult NSPCs in vivo are heterogeneous and specialized, depending on their location, for the generation of specific types of neurons, and possibly glia (Merkle et al. 2014; Fiorelli et al. 2015; Chaker, Codega, and Doetsch 2016; Merkle, Mirzadeh, and Alvarez-Buylla 2007; Tsai et al. 2012; Delgado et al. 2020).”

      Under normal in vivo conditions, a primitive state for NSCs capable of generating all neuronal and glial cell types of the CNS may only exist at very early stages of development, and even their regional specification seems to occur very early (as early as E10.5; Fuentealba et al. 2015). Note that recent work in the hematopoietic system suggests that stem cells there also become restricted embryonically (Carrelha et al., 2018), and in adults their potential to generate lymphoid or myeloid lineages changes dramatically with age, yet at all these ages they are referred to as HSCs. We are well aware of the work from the van der Kooy lab suggesting the existence in the V-SVZ of rare “primitive” Oct4+/GFAP- cells that may be pluripotent and earlier in the lineage from B cells (Reeve et al., 2017). However, as indicated above, lineage tracing from the embryo indicates that adult NSPCs are specified in the embryo and are already in place and regionally specified between E11.5 and E15. We have investigated whether we could detect Oct4+/Gfap- cells in our datasets. However, we did not detect Oct4 expression in B cells or other cell types. We now indicate in the discussion:

      “It has been suggested that in the adult V-SVZ a more primitive population of Oct4+/GFAP- NSCs may be present and that these cells may be earlier in the lineage from the “definitive” GFAP+ B cells (Reeve et al. 2017). However, regionally specified NSPCs can be lineage traced to the embryo (Fuentealba et al. 2015; Furutachi et al. 2015), and we could not detect a population of Oct4+ cells in our datasets. We, however, cannot exclude that rare primitive OCT4+ NSPCs were not captured in our analysis for technical reasons.” ……. “This underscores the early embryonic regional specification of adult V-SVZ NSPCs and how these primary progenitors maintain a memory of their regions of origin.”

      4) This may be more than a semantic issue, as the rare clonal neurosphere-forming neural stem cells that do make all three neural cell types in vitro also maintain their AP and DV positional identity through clonal passaging in vitro (Hitoshi et al, 2002). However, Emx1-expressing cortical neural stem cells can be lineage traced as they migrate from the embryonic cortical germinal zone to the striatal germinal zone in the perinatal period (Willaime-Morawek et al, 2006). Surprisingly, in their new striatal home the Emx1 lineage cortical neural stem cells will turn down Emx1 expression and turn up Dlx2 striatal germinal zone expression. The switch in positional identities of clonal neural stem cells can also be seen in vitro when the stem cells are co-cultured with an excess of cells from a different region and then regrown as clonal neural stem cells. This may suggest that Emx1-expressing neural stem cells (the clonal neurosphere-forming cells) may switch their positional identities in vivo as they migrate into the striatal germinal zone, but the downstream neuron-producing precursor B cells studied in this paper may maintain their Emx1 expression into the adult germinal zone. This raises an interesting issue concerning which cells in the neural stem cell lineage can be regionally re-specified.

      The interesting question about plasticity and respecification is not addressed by our current manuscript, which focuses on the gene expression profile of unmanipulated cells from adult samples. However, regional re-specification is controversial. While work from the van der Kooy lab suggests that striatal Emx1+ NSPCs originate in the pallium and migrate into the striatum in the perinatal brain (Willaime-Morawek et al., 2006), other studies suggest that rare Emx1 cells are already present in the developing LGE from embryonic stages as early as E12.5 (Gorski et al. 2002). In addition, we have labeled neonatal radial glial cells in the pallium, when this migration has been suggested to occur, and do not see migration of cells ventrally into the striatal wall. We have also transplanted dorsal NSPCs into ventral locations -- and vice versa -- and do not observe evidence of regional re-specification (Merkle, Mirzadeh, and Alvarez-Buylla 2007; Delgado et al. 2020).

      5) The authors nicely show that dorsal versus ventral germinal zone lineages are marked by some of the same positional genes from B cells to A cells, suggesting complete dorsal versus ventral neurogenic lineages giving rise to different types of olfactory bulb neurons. Indeed, they nicely test this idea with dissection of the dorsal versus ventral germinal zones, followed by nuclear RNA sequencing. However, they don't discuss the broader issues concerning the embryological origins of the dorsal versus ventral germinal zones. Emx1 is one of the genes the authors use to mark dorsal lineages. The authors reference papers (Young et al, 2007; Willaime-Morawek et al, 2006;2008) that use Emx1 lineage tracing to show that certain classes of olfactory bulb neurons originate from embryonic cortical neural stem cells that migrate perinatally from the cortical germinal zone into the dorsal subcortical germinal zone. Could cortical versus subcortical embryonic origins of the dorsal versus ventral adult germinal zone explain the origin of different sets of adult olfactory bulb neurons? Further, the authors report that one of the GO terms for their dorsal lineages is cortical regionalization.

      This is a very interesting question that unfortunately we cannot answer. The dorsal domain includes both pallial and subpallial components, but the specific origin of B cells in this dorsal domain and the contribution of the pallium and subpallium remains unresolved.

      We went back to our data to try to find evidence of pallial vs. subpallial components in the dorsal B clusters (5 & 22). Indeed, there are some hints that cluster 22 may be more pallial and 5 more dorsal subpallial. However, when we try to confirm differential distribution of markers associated with these two dorsal subdomains anatomically, it is not possible to determine segregation, likely due to the intermixing of cells as the wedge is formed. We also looked for Dbx1, a relatively specific marker of the border region between pallium and subpallium that has been termed ventral pallium, but unfortunately our scRNA-Seq dataset did not capture this marker. Further, targeted lineage tracing of this region is required to determine the subdivisions of the dorsal V-SVZ. We have added as requested a short discussion on this issue:

      “The dorsal V-SVZ domain is likely further subdivided into multiple subdomains. In the current analysis we pooled together clusters B(5) and B(22) as dorsal. However, largely pallial marker Emx1 and dorsal lateral ganglionic eminence marker Gsx2 were differentially enriched in clusters B(22) and B(5), respectively, suggesting that these two clusters may also represent different sets of regionally specified B cells with distinct embryonic origins. These regions become blurred by cells intermixing in the formation of the wedge region in the postnatal V-SVZ making it difficult to confirm their origin based on expression patterns. In addition to pallial and dorsal subpallial markers, this wedge region likely also includes what has been termed the ventral pallium (Puelles et al. 2016), which is characterized in the embryo by the expression of Dbx1. Unfortunately, our scRNA-Seq analysis did not detect this marker. Further lineage tracing experiments will help determine the precise embryonic origin and nature of the dorsal V-SVZ, including the wedge region.”

      6) The percentages of dividing cells based on gene expression are given for some clusters of cells but not others. It might be useful to have a chart showing the percentages of cells in cycle (ki67 expression) for each cluster. This might be especially useful in characterizing some of the differences between various subclusters of B, A and C cells. On page 9 it is suggested that the heterogeneity amongst C cell clusters was driven by cell cycle genes. However, it is possible to remove the cell cycle genes from the data analysis to see if this then produces clearer dorsal versus ventral positional identities. This may be an important issue, as the dorsal versus ventral positional identity genes appear to be expressed more in the less dividing A and B cells than in the more dividing C cells. This leads to a potentially alternative conclusion: that dorsal/ventral regional identity genes are primarily expressed in the non-dividing post-mitotic cells in their resident dorsal or ventral region, and not in precursor cells in the lineage. This could be easily tested by removing the cell cycle genes from the analysis of highly dividing clusters to see if they then break down into dorsal versus ventral clusters.

      We now provide a table indicating the fraction of proliferating cells (defined as in S phase or G2-M phase) for all scRNA-Seq clusters.
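
      The per-cluster table described above can be illustrated with a minimal sketch. It assumes each cell already carries a cluster label and an inferred cell-cycle phase; the function name, the phase labels ("G1", "S", "G2M"), the cluster label "C(hypothetical)" and the toy data are illustrative assumptions, not the authors' pipeline or data.

```python
from collections import Counter

# Phases counted as "proliferating", following the definition in the response
# (S phase or G2-M phase). The label strings themselves are an assumption.
PROLIFERATING_PHASES = {"S", "G2M"}


def proliferating_fraction(cells):
    """Return {cluster: fraction of cells in S or G2-M phase}.

    `cells` is an iterable of (cluster_label, phase) pairs.
    """
    totals = Counter()
    cycling = Counter()
    for cluster, phase in cells:
        totals[cluster] += 1
        if phase in PROLIFERATING_PHASES:
            cycling[cluster] += 1
    return {cluster: cycling[cluster] / totals[cluster] for cluster in totals}


# Toy input (hypothetical values, for illustration only).
toy_cells = [
    ("B(5)", "G1"), ("B(5)", "G1"), ("B(5)", "S"), ("B(5)", "G1"),
    ("C(hypothetical)", "S"), ("C(hypothetical)", "G2M"),
    ("C(hypothetical)", "G1"), ("C(hypothetical)", "S"),
]
fractions = proliferating_fraction(toy_cells)
for cluster, fraction in sorted(fractions.items()):
    print(f"{cluster}: {fraction:.2f}")
```

      Reporting the fraction rather than the raw count keeps clusters of different sizes comparable.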

      Concerning whether dorsal and ventral identities are maintained during the period of proliferation we have analyzed our data looking at dorsal and ventral signature levels over pseudotime (Figure 6-Supplement 1F). Here we do not observe a reduction in either dorsal or ventral score at the proliferative cell stages (pseudotime ~0.75, Figure 2L). This is in contrast to gene signatures that show clear up- or down-regulation over pseudotime, such as Gfap, Egfr & Dcx (Figure 2M). To understand how cell clustering is affected in the absence of proliferative gene influence, and whether clearer dorsal/ventral signatures are observed in proliferating cells, we are performing additional analyses using our scRNA-Seq dataset that is clustered after cell-cycle gene regression.

      References Cited:

      Chaker, Zayna, Paolo Codega, and Fiona Doetsch. 2016. “A Mosaic World: Puzzles Revealed by Adult Neural Stem Cell Heterogeneity.” Wiley Interdisciplinary Reviews. Developmental Biology 5 (6): 640–58.

      Delgado, Ryan N., Benjamin Mansky, Sajad Hamid Ahanger, Changqing Lu, Rebecca E. Andersen, Yali Dou, Arturo Alvarez-Buylla, and Daniel A. Lim. 2020. “Maintenance of Neural Stem Cell Positional Identity by.” Science 368 (6486): 48–53.

      Fiorelli, Roberto, Kasum Azim, Bruno Fischer, and Olivier Raineteau. 2015. “Adding a Spatial Dimension to Postnatal Ventricular-Subventricular Zone Neurogenesis.” Development 142 (12): 2109–20.

      Fuentealba, Luis C., Santiago B. Rompani, Jose I. Parraguez, Kirsten Obernier, Ricardo Romero, Constance L. Cepko, and Arturo Alvarez-Buylla. 2015. “Embryonic Origin of Postnatal Neural Stem Cells.” Cell 161 (7): 1644–55.

      Furutachi, Shohei, Hiroaki Miya, Tomoyuki Watanabe, Hiroki Kawai, Norihiko Yamasaki, Yujin Harada, Itaru Imayoshi, et al. 2015. “Slowly Dividing Neural Progenitors Are an Embryonic Origin of Adult Neural Stem Cells.” Nature Neuroscience 18 (5): 657–65.

      Gorski, Jessica A., Tiffany Talley, Mengsheng Qiu, Luis Puelles, John L. R. Rubenstein, and Kevin R. Jones. 2002. “Cortical Excitatory Neurons and Glia, but Not GABAergic Neurons, Are Produced in the Emx1-Expressing Lineage.” The Journal of Neuroscience: The Official Journal of the Society for Neuroscience 22 (15): 6309–14.

      Kazanis, Ilias, Kimberley A. Evans, Evangelia Andreopoulou, Christina Dimitriou, Christos Koutsakis, Ragnhildur Thora Karadottir, and Robin J. M. Franklin. 2017. “Subependymal Zone-Derived Oligodendroblasts Respond to Focal Demyelination but Fail to Generate Myelin in Young and Aged Mice.” Stem Cell Reports 8 (3): 685–700.

      Kooy, D. van der, and S. Weiss. 2000. “Why Stem Cells?” Science 287 (5457): 1439–41.

      Merkle, Florian T., Luis C. Fuentealba, Timothy A. Sanders, Lorenza Magno, Nicoletta Kessaris, and Arturo Alvarez-Buylla. 2014. “Adult Neural Stem Cells in Distinct Microdomains Generate Previously Unknown Interneuron Types.” Nature Neuroscience 17 (2): 207–14.

      Merkle, Florian T., Zaman Mirzadeh, and Arturo Alvarez-Buylla. 2007. “Mosaic Organization of Neural Stem Cells in the Adult Brain.” Science 317 (5836): 381–84.

      Morshead, C. M., B. A. Reynolds, C. G. Craig, M. W. McBurney, W. A. Staines, D. Morassutti, S. Weiss, and D. van der Kooy. 1994. “Neural Stem Cells in the Adult Mammalian Forebrain: A Relatively Quiescent Subpopulation of Subependymal Cells.” Neuron 13 (5): 1071–82.

      Ponti, Giovanna, Kirsten Obernier, Cristina Guinto, Lingu Jose, Luca Bonfanti, and Arturo Alvarez-Buylla. 2013. “Cell Cycle and Lineage Progression of Neural Progenitors in the Ventricular-Subventricular Zones of Adult Mice.” Proceedings of the National Academy of Sciences of the United States of America 110 (11): E1045–54.

      Puelles, Luis, Loreta Medina, Ugo Borello, Isabel Legaz, Anne Teissier, Alessandra Pierani, and John L. R. Rubenstein. 2016. “Radial Derivatives of the Mouse Ventral Pallium Traced with Dbx1-LacZ Reporters.” Journal of Chemical Neuroanatomy 75 (Pt A): 2–19.

      Reeve, Rachel L., Samantha Z. Yammine, Cindi M. Morshead, and Derek van der Kooy. 2017. “Quiescent Oct4 Neural Stem Cells (NSCs) Repopulate Ablated Glial Fibrillary Acidic Protein NSCs in the Adult Mouse Brain.” Stem Cells 35 (9): 2071–82.

      Reynolds, B. A., and S. Weiss. 1992. “Generation of Neurons and Astrocytes from Isolated Cells of the Adult Mammalian Central Nervous System.” Science 255 (5052): 1707–10.

      Tsai, Hui-Hsin, Huiliang Li, Luis C. Fuentealba, Anna V. Molofsky, Raquel Taveira-Marques, Helin Zhuang, April Tenney, et al. 2012. “Regional Astrocyte Allocation Regulates CNS Synaptogenesis and Repair.” Science 337 (6092): 358–62.

      Xie, Xuanhua P., Dan R. Laks, Daochun Sun, Asaf Poran, Ashley M. Laughney, Zilai Wang, Jessica Sam, et al. 2020. “High Resolution Mouse Subventricular Zone Stem Cell Niche Transcriptome Reveals Features of Lineage, Anatomy, and Aging.” Cold Spring Harbor Laboratory. https://doi.org/10.1101/2020.07.27.223602.

    1. Author Response:

      Reviewer #1 (Public Review):

      The energy released upon zippering of SNARE complexes from the N-terminus to the membrane-proximal C-terminus is widely believed to provide the driving force for membrane fusion, and the cis-SNARE complexes resulting after fusion are disassembled by formation of a 20S complex with Sec17 and Sec18, followed by ATP hydrolysis by Sec18. This paper now shows that membrane fusion still occurs when the hydrophobic interactions that drive C-terminal zippering of the yeast vacuolar SNARE complex are completely prevented by C-terminal truncation of two of the SNAREs and replacement of the hydrophobic residues at the C-terminus of the SNARE domain of a third SNARE with polar residues, and that such fusion requires Sec17, Sec18 and non-hydrolyzable ATP homologues, in addition to the HOPS tethering complex, which mediates SNARE complex assembly. The results also show that Sec17 plays a key role in fusion through hydrophobic residues in an N-terminal loop that are known to interact with membranes. These results suggest that the core membrane fusion machinery is formed by the SNAREs, Sec17 and Sec18 rather than by the SNAREs alone, and that fusion is driven by a combination of SNARE C-terminal zippering and perturbation of the lipid bilayers by the hydrophobic loops of Sec17. These conclusions are strongly supported by a variety of membrane fusion experiments. FRET assays of SNARE complex assembly also support the conclusions but are less convincing.

      Thank you. We feel that the FRET assays of HOPS-dependent, Sec17-driven zippering are important, as they indicate one of the 3 vital Sec17 functions: (1) promoting SNARE zippering, which, when it can occur, makes an important contribution to fusion (this is the first demonstration of Sec17-induced zippering in the context of HOPS); (2) promoting fusion directly even when zippering contributes no energy; and (3) supporting Sec18 in disassembly of SNARE complexes.

      Reviewer #2 (Public Review):

      This manuscript shows that the Sec17/18 machine can do more than we might have expected, and places new constraints on models for how this works. As the field expects from the Wickner lab, the work is creative and beautifully executed. I do still have some reservations, however, about whether the manuscript ultimately forwards our mechanistic understanding enough to merit publication in eLife. Some of the outstanding mechanistic questions articulated by the authors include:

      1) Why is HOPS required for Sec17/18/ATPγS activity? The authors suggest that HOPS and Sec17 bind to one another, but the assay (Figure 4) is rather non-physiological and the result does not really answer the question.

      We have not established why HOPS can work with Sec17/Sec18 while other nonspecific tethers cannot, but future explorations of this question will be founded on a knowledge of which components bind directly to the others, including HOPS:Sec17. Each demonstration of binding between two purified proteins is of course non-physiological, yet contributes to our understanding. Still, we "hear you", and we've moved this to a supplemental figure.

      2) What is the mechanistic role of Sec18? An intricate inhibitor experiment (Figure 9) suggests that Sec18 acts later than Vps33. This is consistent with current thinking on the early role of SM proteins, but does not further delineate the mechanistic role of Sec18.

      Exactly. This is a fascinating question which is reinforced, but not answered, by our current work. We note that Sec17 is now seen to perform 3 functions (working with Sec18 for ATP-driven SNARE complex disassembly, driving completion of SNARE zippering, and supporting fusion per se) and Sec18 performs 2 functions (SNARE complex disassembly and supporting Sec17 for fusion per se).

      3) Does "entropic confinement" explain the role of Sec17? This very interesting question was not, so far as I could tell, directly addressed. My understanding is that the concept of entropic confinement comes from studies of chaperonins such as GroEL/ES, which entirely enclose their substrates in what Paul Sigler memorably described as "a temple for protein folding". Here, it's much less clear that Sec17 could sufficiently constrain the presumably-unfolded juxtamembrane regions of the truncated and/or mutant SNAREs to drive membrane fusion. Indeed, Schwartz et al. (2017) noted "open portals" between adjacent Sec17 molecules that would "allow SNARE residues spanning the partially-zipped helical bundle and the transmembrane anchors to pass cleanly between pairs of adjacent Sec17 subunits".

      We have removed the term "entropic confinement", and in deference to Schwartz et al. (which we cite) we refer to the Sec17 open cage and the folding environment it may create for SNARE complex assembly.

      4) What is the mechanistic role of the "hydrophobic loop" at the N-terminus of Sec17? Previous work from the Wickner lab (Song et al., 2017) concluded that its main function under normal circumstances was to promote Sec17 membrane association, but when zippering was incomplete it might act as a wedge to perturb the bilayers. These experiments made use of artificially membrane-anchored Sec17, either wild-type or the "FSMS" hydrophobic loop mutant. This approach was extended here (Figure 8) but did not, so far as I could tell, greatly advance our mechanistic understanding.

      Agreed. Each of your points 2-4 reinforces central questions which our lab, and others, will strive to answer: What does Sec18 do? How does Sec17 oligomerization around the SNAREs relate to those SNAREs? What is the role of the Sec17 apolar loop? We do find, though, that Sec17 and Sec18, however they act, contribute so strongly to driving fusion that they can compensate for only partial zippering when tested to do so.

    1. Author Response:

      Reviewer #2 (Public Review):

      In their study, Lutes et al examine the fate of thymocytes expressing T cell receptors (TCR) with distinct strengths of self-reactivity, tracking them from the pre-selection double positive (DP) stage until they become mature single positive (SP) CD8+ T cells. Their data suggest that self-reactivity is an important variable in the time it takes to complete positive selection, and they propose that it thus accounts for differences in timescales among distinct TCR-bearing thymocytes to reach maturity. They make use of three MHC-I restricted T cell receptor transgenics, TG6, F5, and OT1, and follow their thymic development using in vitro and in vivo approaches, combining measures at the individual cell-level (calcium flux and migratory behaviour) with population-level positive selection outcomes in neonates and adults. By RNA-sequencing of the 3 TCR transgenics during thymic development, Lutes et al make the additional observation that cells with low self-reactivity have greater expression of ion channel genes, which also vary through stages of thymic maturation, raising the possibility that ion channels may play a role in TCR signal strength tuning.

      This is a well-written manuscript that describes a set of elegant experiments. However, in some instances there are concerns with how analyses are done (especially in the summaries of individual cell data in Fig 2 and 3), how the data is interpreted, and the conclusions from the RNA-seq with regard to the ion channel gene patterns are overstated given the absence of any functional data on their role in T cell TCR tuning. As such the abstract is currently not an accurate reflection of the study, and the discussion also focuses disproportionately on the data in the final figure, which forms the most speculative part of this paper.

      (1) As the authors themselves point out (discussion), one of the strengths of this study is the tracking of individual cells, their migratory behaviour and calcium flux frequency and duration over time. However, the single-cell experiments presented (Figure 2 and 3) do not make use of the availability of single-cell read-outs, but focus instead on averaging across populations. For instance, Figure 3a,b provides only 2 sets of examples, but there is no summary of the data providing a comparison between the two transgenics across all events imaged. In Figure 3c, the question that is being asked, which is to test for between-transgenic differences, is ultimately not the question that is being answered: the comparison that is made is between signaling and non-signaling events within transgenics. However, this latter question is less interesting, as it was already shown previously that thymocytes pause in their motion during Ca flux events (as do mature T cells). Moreover, the average speed of tracks is probably not the best measure here for reading out self-reactivity differences between TCR transgenic groups.

      We regret any lack of clarity in how we presented our analyses of the calcium imaging data. In the original submission, we did provide analyses of individual cells (Fig 2b, Fig 3c (Fig 2e in the revised manuscript), Suppl Tables 1 and 2, and supplemental videos S1 and S2). In the revised manuscript, we have added an additional analysis of individual cells (Figure 2—figure supplement 1a). In addition, Fig 3a and b (Fig 2c and d in the revised manuscript) provided information about the average behavior of thymocytes during signaling events by identifying numerous examples of individual signaling cells (23-37 individual cell signaling events per condition), aligning these multiple examples based on the start of their signaling events, and displaying the average changes in calcium and speed over time. Thus these data do take advantage of the single-cell measurements by providing information about the average behavior of signaling events, which could not be inferred from bulk measurements. Regarding Fig. 3c (Fig. 2e in the revised manuscript), we agree that a more direct comparison of pausing between TCR transgenic models was needed. To address this point, we have added a new panel (Fig 2f in the revised manuscript) that uses the difference in speed between the signaling and nonsignaling portions of the same track to define a “pause index” for each cell. The difference in pause index between the transgenic models is highly significant at both 3 and 6 hours into positive selection. In the revised manuscript we have added additional text to detail more precisely how we performed the analyses, and to make it clearer that individual tracks are being analyzed. We have also included a graph of the Calcium Ratio and the Average Speed for the individual cells shown in the supplemental videos.
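
      As a rough illustration of the per-track “pause index” described above, the following minimal sketch takes the difference between the mean speed of the nonsignaling and signaling portions of one track. The sign convention (nonsignaling minus signaling, so that a positive value means the cell slows while signaling), the function name and the toy speeds are assumptions made for illustration; the exact definition used in the revised manuscript may differ.

```python
from statistics import mean


def pause_index(speeds, signaling):
    """Difference in mean speed between the nonsignaling and signaling
    portions of a single track.

    `speeds` holds one speed per time point; `signaling` holds a matching
    boolean flag per time point. Returns None when the track lacks either
    portion, since no within-track comparison is possible.
    Assumed convention: positive values indicate pausing during signaling.
    """
    if len(speeds) != len(signaling):
        raise ValueError("speeds and signaling must have the same length")
    sig = [s for s, flag in zip(speeds, signaling) if flag]
    non = [s for s, flag in zip(speeds, signaling) if not flag]
    if not sig or not non:
        return None
    return mean(non) - mean(sig)


# Toy track (hypothetical speeds): the cell slows during its signaling event.
toy_speeds = [8.0, 6.0, 2.0, 2.0, 6.0, 8.0]
toy_flags = [False, False, True, True, False, False]
toy_index = pause_index(toy_speeds, toy_flags)
print(toy_index)
```

      Because each index is computed within a single track, every cell serves as its own control, and the resulting per-cell values can then be compared between transgenic models.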

      (2) The authors conclude from their data that the self-reactivity of thymocytes correlates with the time to complete positive selection. However, the definition of what this includes is blurry. It could be that an individual cell takes the same amount of time to complete positive selection (i.e., the duration from the upregulation of CD69 until transition to the SP stage is the same), but the initial 'search' phase for sufficient signaling events differs (e.g., because of lower availability of selecting ligands for TG6 than for OT1), in which case at the population level positive selection would appear to take longer. Given that from Fig 2/3 it appears that both the frequency of events and their duration differ along the self-reactivity spectrum, this needs to be clarified. Moreover, whether the positive selection rate and positive selection efficiency can be considered independently is not explained. It appears that the F5 transgenic in particular has very low positive selection efficiency (substantially lower %CD69+ and %CXCR4-CCR7+ cells than the OT1 and TG6), and how this relates to the duration of positive selection, or is a function of ligand availability, is unclear.

      (3) While the question of time to appearance of SP thymocytes of distinct self-reactivities during neonatal development presented (Figure 5) is interesting, it is difficult to understand the stark contrast in time-scales seen here compared with their in vitro thymic slice (Figure 4) and in vivo EdU-labelling data (Figure 6), where differences in positive selection time were estimated to be ~1-2 days between TCR transgenics of high versus low affinity. This would suggest that there may be other important changes in the development of neonates to adults not being considered, such as the availability of the selecting self-antigens.

      Since Reviewer #2’s comments 2 and 3 are related, we will discuss them together. In this study, we have used 3 independent approaches (the thymic slice system, the EdU labeling study, and analyses of neonatal transgenic mice) to estimate the relative time for thymocytes bearing different TCRs to complete positive selection, and all three confirm that OT1 is the most rapid and TG6 the slowest of the 3 transgenic models examined here. However, each approach relies on different start times and different read-outs, so they are not directly comparable to each other. The thymic slice system tracks a cohort of preselection thymocytes over time. However, given the 4 day limit for this system, it is not possible to reach the theoretical maximum number of CD8SP. Thus, our estimates of the delay in positive selection are based on the timing of multiple phenotypic changes (CD69 induction, chemokine receptor switch, and CD8SP appearance) in this system. The EdU study (Fig 5 in the revised manuscript) allows us to track a cohort of thymocytes that have recently completed TCRb selection and follows them over a longer time period (up to 9 days). Because the number of OT1 and F5 CD8 SP thymocytes reached a clear plateau, this allows us to estimate the average time between the burst of cell division after TCRb selection and the downregulation of CD4 (3.5 days for OT1 and 4.5 days for F5). However, at 9 days the number of TG6 thymocytes is still increasing, and thus we have only a lower estimate (>6 days) of the average time after TCRb selection to the appearance of CD8SP thymocytes with this TCR. When we track the appearance of mature CD8SP after birth (Figure 4 in the revised manuscript), we are not tracking a synchronized cohort of positively selecting cells, but rather we are measuring the amount of time it takes for single positive cells to accumulate into a population size similar to what is seen in an adult. Thus, these experiments do not provide a direct measure of the time to complete positive selection, but rather provide an indirect measure of the number of cells that have successfully completed positive selection at the given timepoints post birth. The observation that OT-1 CD8 SP thymocytes reach their adult steady state numbers at one week whereas TG6 CD8 SP thymocytes are well below adult levels at 21 days is likely a reflection of lengthy positive selection of TG6, resulting in a much longer time to fill the adult niche for CD8SP thymocytes. We agree with the reviewer that there could be additional important differences in positive selection between neonates and adults. We explore this topic and relate our data to recently published work in the discussion (line 574) of the manuscript.

      With regard to point (2), our data suggest that the longer time for positive selection is a result of both a longer search phase and a longer progression phase. Specifically, the % of CD69+ cells (Fig 3b and Figure 3—figure supplement 2a) peaks at 24 hours for OT1 and F5, but is delayed until 48 hours for TG6, consistent with a 1-2 day delay in the “search phase” for TG6. However, if this initial search phase was the only factor contributing to delayed TG6 development then we might expect to see a 1-2 day lag in TG6 development compared to OT-1. However, as discussed above, the EdU data indicates a > 3 day lag in the appearance of TG6 CD8SP compared to OT1. Thus, there is evidence that both the search phase and the progression phase of positive selection are longer in thymocytes with low self reactivity.

      (4) The conclusion that "ion channel activity may be an important component of T cell tuning during both early and late stages of T cell development" is not supported by any data provided. The authors have shown an interesting association between levels of expression of ion channels, their self-affinity and the thymus selection stage. However, some functional data on their expression playing a role in either the strength of TCR signaling or progression through the thymus (for instance using thymic slices and the level of CD69 expression over time), would be needed to make this assertion. Moreover, from how the data is presented it is difficult to follow the conclusion that a 'preselection signature' is retained by the low but not the high self-reactivity thymocytes.

      We agree that a role for ion channel activity in T cell tuning is speculative at this point, and we have tempered our conclusions in the revised manuscript. With regard to the evidence that a preselection signature is retained by thymocytes with low self reactivity, this conclusion is based on 2 separate lines of evidence presented in Figure 6 (previously Figure 7 in the original submission) and Figure 6—figure supplement 2. To summarize: (1) We defined a “preselection” gene signature based on preselection (CD69-DP) wild type thymocytes from the ImmGen microarray data, and show that this set of genes also tends to be more highly expressed in thymocytes of low vs high self reactivity (TG6>F5>OT1) at equivalent stages of development (Fig 6d). (2) We identify a set of ion channel genes (cluster 2a from Fig 6c) that are more highly expressed in thymocytes of low vs high self reactivity (TG6>F5>OT1), and are also more highly expressed in earlier stages of positive selection for each TCR. This trend can also be seen in Figure 6—figure supplement 2c when comparing the expression of all cluster 2 ion channel genes across the wild type thymocyte subsets from ImmGen microarray data. Again, expression of this gene set peaks in the DP CD69- (preselection) population compared to other stages, including the preceding (DN4) and following (DP CD69+) stages of thymocyte development. We have edited this part of the results section in the revised manuscript to improve clarity.

  6. Mar 2021
    1. Author Response:

      Reviewer #1 (Public Review):

      In this project, the authors set out to create an easy to use piece of software with the following properties: The software should be capable of creating immersive (closed loop) virtual environments across display hardware and display geometries. The software should permit easy distribution of formal experiment descriptions with minimal changes required to adapt a particular experimental workflow to the hardware present in any given lab while maintaining world-coordinates and physical properties (e.g. luminance levels and refresh rates) of visual stimuli. The software should provide equal or superior performance for generating complex visual cues and/or immersive visual environments in comparison with existing options. The software should be automatically integrated with many other potential data streams produced by 2-photon imaging, electrophysiology, behavioral measurements, markerless pose estimation processing, behavioral sensors, etc.

      To accomplish these goals, the authors created two major software libraries. The first is a package for the Bonsai visual programming language called "Bonsai.Shaders" that brings traditionally low-level, imperative OpenGL programming into Bonsai's reactive framework. This library allows shader programs running on the GPU to seamlessly interact, using drag and drop visual programming, with the multitude of other processing and IO elements already present in numerous Bonsai packages. The creation of this library alone is quite a feat given the complexities of mapping the procedural, imperative, and stateful design of OpenGL libraries to Bonsai's event driven, reactive architecture. However, this library is not mentioned in the manuscript despite its power for tasks far beyond the creation of visual stimuli (e.g. GPU-based coprocessing) and, unlike BonVision itself, is largely undocumented. I don't think that this library should take center stage in this manuscript, but I do think its use in the creation of BonVision as well as some documentation on its operators would be very useful for understanding BonVision itself.

      We have added a reference to the Shaders package at multiple points in the manuscript including lines 58-59 and in Supplementary Details. We will be adding documentation of key Shaders nodes that are important for the creation of BonVision stimuli to the documentation on the BonVision website.

      Following the creation of Bonsai.Shaders, the authors used it to create BonVision which is an abstraction on top of the Shaders library that allows plug and play creation of visual stimuli and immersive visual environments that react to input from the outside world. Impressively, this library was implemented almost entirely using the Bonsai visual programming language itself, showcasing its power as a domain-specific language. However, this fact was not mentioned in the manuscript and I feel it is a worthwhile point to make.

      Thank you - we have now added clarification on this in Supplementary details (section Customised nodes and new stimuli)

      The design of BonVision, combined with the functional nature of Bonsai, enforces hard boundaries between the experimental design of visual stimuli and (1) the behavioral input hardware used to drive them, (2) the dimensionality of the stimuli (i.e. 2D textures via 3D objects), (3) the specific geometry of 3D displays (e.g. dual monitors, versus spherical projection, versus head mounted stereo vision hardware), and (4) automated hardware calibration routines. Because of these boundaries, experiments designed using BonVision become easy to share across labs even if they have very different experimental setups. Since Bonsai has integrated and standardized mechanisms for sharing entire workflows (via copy paste of XML descriptions or upload of workflows to publicly accessible Nuget package servers), this feature is immediately usable by labs in the real world.

      After creating these pieces of software, the authors benchmarked them against other widely used alternatives. BonVision met or exceeded frame rate and rendering latency performance measures when compared to other single purpose libraries. BonVision is able to do this while maintaining its generality by taking advantage of advanced JIT compilation features provided by the .NET runtime and using bindings to low-level graphics libraries that were written with performance in mind. The authors go on to show the real-world utility of BonVision's performance by mapping the visual receptive fields of LFP in mouse superior colliculus and spiking in V1. The fact that they were able to obtain receptive fields indicates that visual stimuli had sufficient temporal precision. However, I do not follow the logic as to why this is, because the receptive fields seem to have been created using post-hoc aligned stimulus-ephys data that was created by measuring the physical onset times of each frame using a photodiode (line 389). Wouldn't this preclude any need for accurate stimulus timing presentation?

      We thank the reviewer for this suggestion. We now include receptive field maps calculated using the BonVision timing log in Figure 5 - figure supplement 1. Using the BonVision timing alone was also effective in identifying receptive fields.

      Finally the authors use BonVision to perform one human psychophysical and several animal VR experiments to prove the functionality of the package in real-world scenarios. This includes an object size discrimination task with humans that relies on non-local cues to determine the efficacy of the cube map projection approach to 3D spaces (Fig 5D). Although the results seem reasonable to me (a non-expert in this domain), I feel it would be useful for the authors to compare this psychophysical discrimination curve to other comparable results. The animal experiments prove the utility of BonVision for common rodent VR tasks.

      The psychometric test we performed on human subjects was primarily to test the ability of BonVision to present VR stimuli on a head-mounted display. We have edited the text to reflect this. The efficacy of the cube map approach for 3D spaces is well-established in computer graphics and gaming and is currently the industry standard, which was the reason for our choice.

      In summary, the professionalism of the code base, the functional nature of Bonsai workflows, the removal of overhead via advanced JIT compilation techniques, the abstraction of shader programming to high-level drag and drop workflows, integration with a multitude of input and output hardware, integrated and standardized calibration routines, and integrated package management and workflow sharing capabilities make Bonsai/BonVision serious competitors to widely-used, closed-source visual programming tools for experiment control such as LabView and Simulink. BonVision showcases the power of the Bonsai language and package management ecosystem while providing superior design to alternatives in terms of ease of integration with data sources and facilitation of sharing standardized experiments. The authors exceeded the apparent aims of the project and I believe BonVision will become a widely used tool that has major benefits for improving experiment reproducibility across laboratories.

      Reviewer #2 (Public Review):

      BonVision is a package to create virtual visual environments, as well as classic visual stimuli. Running on top of Bonsai-RX, it tries and succeeds in removing the complexity of the above mentioned task and creating a framework that gives non-programmers the opportunity to create complex, closed loop experiments, with enough speed to capture receptive fields while recording different brain areas.

      At the time of the review, the paper benchmarks the system using 60Hz stimuli, which is more than sufficient for the species tested, but leaves an open question on whether it could be used for other animal models that have faster visual systems, such as flies, bees etc.

      Thank you for prompting us to do this - we have now added new benchmarks for a faster refresh rate (144 Hz; new Figure 4 - figure supplement 1).

      The authors do show in a nice way how the system works and give examples for interested readers to start their first workflows with it. Moreover, they compare it to other existing software, making sure that readers know exactly what "they are buying" so they can make an informed decision when starting with the package.

      Being written to run on top of Bonsai-RX, BonVision directly benefits from the great community effort that exists in expanding Bonsai, such as its integration with DeepLabCut and Auto-pi-lot. This shows that developing open source tools and fostering a community is a great way to bring research forward in an additive and less competitive way.

      Reviewer #3 (Public Review):

      Major comments:

      While much of the classic literature on visual systems studies has utilized egocentrically defined ("2D") stimuli, it seems logical to project that present and future research will extend to not only 3D objects but also 3D environments where subjects can control their virtual locations and viewing perspectives. A single software package that easily supports both modalities can therefore be of particular interest to neuroscientists who wish to study brain function in 3D viewing conditions while also referencing findings to canonical 2D stimulus responses. Although other software packages exist that are specialized for each of the individual functionalities of BonVision, I think that the unifying nature of the package is appealing for reasons of reducing user training and experimental setup time costs, especially with the semi-automated calibration tools provided as part of the package. The provisions of documentation, demo experiments, and performance benchmarks are all highly welcome and one would hope that with community interest and contributions, this could make BonVision very friendly to entry by new users.

      Given that one function of this manuscript is to describe the software in enough detail for users to judge whether it would be suited to their purposes, I feel that the writing should be fleshed out to be more precise and detailed about what the algorithms and functionalities are. This includes not shying away from stating limitations -- which as I see it, is just the reality of no tool being universal, but because of that is one of the most important pieces of information to be transmitted to potential users. My following comments point out various directions in which I think the manuscript can be improved.

      We thank the reviewer for this suggestion. We have added a major new section, “Supplementary Details”, where we have highlighted known limitations and available workarounds. We also added new rows in the Supplementary Table that make these limitations transparent (eg. web-based deployment).

      The biggest point of confusion for me was whether the 3D environment functionality of BonVision is the same as that provided by virtual spatial environment packages such as ViRMEn and gaming engines such as Unity. In the latter software, the virtual environment is specified by geometrically laying out the shape of the traversable world and locations of objects in it. The subject then essentially controls an avatar in this virtual world that can move and turn, and the software engine computes the effects of this movement (i.e. without any additional user code) then renders what the avatar should see onto a display device. I cannot figure out if this is how BonVision also works. My confusion can probably be cured by some additional description of what exactly the user has to do to specify the placement of 3D objects. From the text on cube mapping (lines 43 and onwards), I guessed that perhaps objects should be specified by their vectorial displacement from the subject, but I have very little confidence in my guess and also cannot locate this information either in the Methods or the software website. For Figure 5F it is mentioned that BonVision can be used to implement running down a virtual corridor for a mouse, so if some description can be provided of what the user has to do to implement this and what is done by the software package, that may address my confusion. If BonVision is indeed not a full 3D spatial engine, it would be important to mention these design/intent differences in the introduction as well as Supplementary Table 1.

      Thank you for prompting us to do this. BonVision does indeed essentially render the view of an avatar in a virtual world (or multiple views, of multiple avatars), without any additional coding required by the user. We have now included in the new “Supplementary Details” specific pathways to the construction and rendering of 3D scenes. We have avoided the term ‘game-engine’ because it has a particular definition that most software packages do not satisfy.

      More generally, it would be useful to provide an overview of what the closed-loop rendering procedure is, perhaps including a Figure (different from Supplementary Figure 2, which seems to be regarding workflow but not the software platform structure). For example, I imagine that after the user-specified texture/object resources have been loaded, then some engine runs a continual loop where it somehow decides the current scene. As a user, I would want to know what this loop is and how I can control it. For example, can I induce changes in the presented stimuli as a function of time, whether this time-dependence has to be prespecified before runtime, or can I add some code that triggers events based on the specific history of what the subject has done in the experiment, and so forth. The ability to log experiment events, including any viewpoint changes in 3D scenes, is also critical, and most experimenters who intend to use it for neurophysiological recordings would want to know how the visual display information can be synchronized with their neurophysiological recording instrumental clocks. In sum, I would like to see a section added to the text to provide a high-level summary of how the package runs an experiment loop, explaining customizable vs. non-customizable (without directly editing the open source code) parts, and guide the user through the available experiment control and data logging options.

      We have now added a brief paragraph regarding the basic structure of a BonVision program, and how to ‘close the loop’ in the new “Supplementary Details”.

      Having some experience myself with the tedium (and human-dependent quality) of having to adjust either the experimental hardware or write custom software to calibrate display devices, I found the semi-automated calibration capabilities of BonVision to be a strong selling point. However I did not manage to really understand what these procedures are from the text and Figure 2C-F. In particular, I'm not sure what I have to do as a user to provide the information required by the calibration software (surely it is not the pieces of paper in Fig. 2C and 2E..?). If for example, the subject is a mouse head-fixed on a ball as in Figure 1E, do I have to somehow take a photo from the vantage of the mouse's head to provide to the system? What about the augmented reality rig where the subject is free to move? How can the calibration tool work with a single 2D snapshot of the rig when e.g. projection surfaces can be arbitrarily curved (e.g. toroidal and not spherical, or conical, or even more distorted for whatever reasons)? Do head-mounted displays require calibration, and if so how is this done? If the authors feel all this to be too technical to include in the main text, then the information can be provided in the Methods. I would however vote for this as being a major and important aspect of the software that should be given air time.

      We have a dedicated webpage going through the step-by-step protocol for the automated screen calibration. We now explicitly point to this page in the new Supplementary Details section.

      As the hardware-limited speed of BonVision is also an important feature, I wonder if the same ~2 frame latency holds also for the augmented reality rendering where the software has to run both pose tracking (DeepLabCut) as well as compute whole-scene changes before the next render. It would be beneficial to provide more information about which directions BonVision can be stressed before frame-dropping, which may perhaps be different for the different types of display options (2D vs. 3D, and the various display device types). Does the software maintain as strictly as possible the user-specified timing of events by dropping frames, or can it run into a situation where lags can accumulate? This type of technical information would seem critical to some experiments where timings of stimuli have to be carefully controlled, and regardless one would usually want to have the actual display times logged as previously mentioned. Some discussion of how a user might keep track of actual lags in their own setups would be appreciated.

      We now provide this as part of the Supplementary Details, specifically animation and timing lags.
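
      As an illustration of how a user might keep track of lags in their own setup, the following is a minimal sketch that works only from a list of logged frame onset times (for example, photodiode onsets). It does not use any BonVision API; the function names, the tolerance value, and the assumption of a fixed nominal refresh rate are all hypothetical choices made for this example.

```python
def find_dropped_frames(timestamps_s, refresh_hz, tolerance=0.5):
    """Flag inter-frame intervals that are longer than one refresh period.

    timestamps_s: frame onset times in seconds (hypothetical log format).
    refresh_hz:   nominal display refresh rate.
    tolerance:    fraction of a period an interval may exceed before it is flagged.
    Returns a list of (frame_index, interval_s, frames_missed).
    """
    period = 1.0 / refresh_hz
    dropped = []
    for i in range(1, len(timestamps_s)):
        interval = timestamps_s[i] - timestamps_s[i - 1]
        # An interval of roughly k periods means about k - 1 frames were skipped.
        missed = round(interval / period) - 1
        if interval > period * (1.0 + tolerance) and missed > 0:
            dropped.append((i, interval, missed))
    return dropped


def cumulative_lag(timestamps_s, refresh_hz):
    """Lag of the last frame relative to an ideal schedule that starts at the first frame."""
    period = 1.0 / refresh_hz
    ideal_last = timestamps_s[0] + (len(timestamps_s) - 1) * period
    return timestamps_s[-1] - ideal_last


# Synthetic 60 Hz log in which one frame (the one due at 3/60 s) was skipped.
onsets = [i / 60.0 for i in (0, 1, 2, 4, 5)]
for index, interval, missed in find_dropped_frames(onsets, 60.0):
    print(f"frame {index}: interval {interval * 1000:.1f} ms, about {missed} frame(s) missed")
print(f"cumulative lag: {cumulative_lag(onsets, 60.0) * 1000:.1f} ms")
```

      A cumulative lag that grows over a session would suggest that delays accumulate, whereas isolated long intervals with a stable lag would suggest that frames are dropped to hold the schedule.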

      On the augmented reality mode, I am a little puzzled by the layout of Figure 3 and the attendant video, and I wonder if this is the best way to showcase this functionality. In particular, I'm not entirely sure what the main scene display is although it looks like some kind of software rendering — perhaps of what things might look like inside an actual rig looking in from the top? One way to make this Figure and Movie easier to grasp is to have the scene display be the different panels that would actually be rendered on each physical panel of the experiment box. The inset image of the rig should then have the projection turned on, so that the reader can judge what an actual experiment looks like. Right now it seems for some reason that the walls of the rig in the inset of the movie remain blank except for some lighting shadows. I don't know if this is intentional.

      Because we have had limited experimental capacity in this period, we only simulated a real-time augmented reality environment off-line, using pre-existing videos of animal behaviour. We think that the comment above reflects a misunderstanding of what the Figure and associated Supplementary Movie represents, and we realise that their legends were not clear enough. We have now made sure that these legends make clear that these are based on simulations (new legends for Figure 3 and Figure 3 - video supplement 1).

    1. Author Response to Public Reviews

      Public comment #1: “The mRNA data are interpreted as evidence for changes in protein expression and Ras signalling activity - there is no formal evidence that this is the case.”

      Response: As requested, while increased mRNA is a strong indication of elevated expression, we completely agree with this comment and added the caveat that protein analysis was not performed.

      Public comment #2: “It is also intriguing how there wasn't a more complete switch to Q61 in the high KRAS tumours when p53 was deleted.”

      Response: As requested, we now bring up this point in the discussion.

      Public comment #3: “Whilst the Ras signalling dosing/oncogenic stress nexus are a reasonable explanation, the model/methods are a snapshot in time and don't have the resolution to fully understand the detail of what is going on here.”

      Response: As requested, while we have a different take on the degree by which this study informs RAS mutation tropism, we appreciate the position of this reviewer, particularly the point that using genetics to modulate oncoprotein levels and the stress response thereof at the endogenous levels in vivo with Kras<sup>ex3op</sup> and Trp53<sup>fl</sup> alleles coupled with measuring the expression of three established downstream RAS target genes is no substitute for following signaling at the protein level at the moment Ras mutations are induced and thereafter throughout tumorigenesis, and hence note this caveat in the discussion.

    1. Author Response:

      Evaluation Summary:

      This paper will be of considerable interest to anybody focusing on highly sensitive T cell antigen recognition. It uses an extended experimental protocol and analytical methods to assess very low T cell receptor binding affinities, and to determine how T cells discriminate between self- and non-self antigens. The main conclusions are well supported by the presented analysis and provide a novel view on a previously considered concept.

      Reviewer #1 (Public Review):

      The presented manuscript takes a comprehensive and elaborated look at how T cell receptors (TCR) discriminate between self and non-self antigens. By extending a previous experimental protocol for measuring T cell receptor binding affinities against peptide MHC complexes (pMHC), they are able to determine very low TCR-pMHC binding affinities and, thereby, show that the discriminatory power of the TCR seems to be imperfect. Instead of a previously considered sharp threshold in discriminating between self and non-self antigen, the TCR can respond to very low binding affinities leading to a more transient affinity threshold. However, the analysis still indicates an improved discrimination ability for TCR compared to other cell surface receptors. These findings could impact the way how T cell mediated autoimmunity is studied.

      The authors follow a comprehensive and elaborated approach, combining in vitro experiments with analytical methods to estimate binding affinities. They also show that the general concept of kinetic proofreading fits their data with providing estimates on the number of proofreading steps and the corresponding rates. The statistical and analytical methods are well explained and outlined in detail within the Supplemental Material. The source of all data, and especially how the data to analyze other cell surface receptor binding affinities was extracted, are given in detail as well. Besides being able to quantify TCR-pMHC interactions for very low binding affinities, their findings will improve the ability to assess how autoimmune reactions are potentially triggered, and how potent anti-tumour T cell therapies can be generated.

      In summary, the study represents an elaborated and concise analysis of TCR-pMHC affinities and the ability of TCR to discriminate between self and non-self antigens. All conclusions are well supported by the presented data and analyses without major caveats.

      Reviewer #2 (Public Review):

      The paper revisits the question of ligand discrimination ability of TCRs of T cells. The authors find that the commonly held notion of very sharp discrimination between strongly and weakly binding peptides does not hold when the affinities of the weak peptides are re-measured more accurately, using their own new method of calibration of SPR measurements. They are able to phenomenologically fit their results with a ~2 step Kinetic Proofreading model.

      It is a very carefully researched and thorough paper. The conclusions seem to be supported by the data and fundamental for our understanding of the T cell immune response with potentially very high impact in many scientific and applied fields. The calibration method could be of potential use in other cases where low affinities are an issue.

      As a non-expert in the details of experimental technique, it is somewhat difficult to understand in detail the Ab calibration of the SPR curve - which is a central piece of the paper. The main question is - what are the grounds (theoretical and/or empirical) to expect that the B_max of the TCR dose response curve will continue to be proportional to the plateau level of the Ab? Figure 1D does suggest that, but it would be hard to predict what shape the proportionality curve will take for lower affinity peptides. Given that essentially all the paper's claims rest on this assumption, this should be explained/reasoned/supported more clearly.

      We have revised the relevant Results and Methods sections to provide additional information. This information should clarify the expected relationship between Bmax and W6/32 binding. We emphasise that we have only interpolated within the curve and therefore, have not relied on any assumptions about the relationship between these two values outside of the empirical curve that we have generated.

      On the theoretical side - I think the scaling α ≃ 2 in Figure 2 is indeed consistent with a two-step KPR amplification. However, there are some questions regarding the fitting of the full model to the P_15 of the CD69 response. As explained in the Supplementary Material the authors use 3 global and 2 local parameters resulting in 37 (or 27) parameters for 32 data points. To a naive reader this might look excessive and prone to overfitting. On the other hand, looking at Figure S8 shows the value ranges of lambda and k_p are quite tight. This is in contrast to gamma and delta that look completely unconstrained.

      We have revised the relevant Results section to explicitly indicate that the number of data points exceeds the number of free parameters, which together with the ABC-SMC results, should provide additional confidence that we are not over-fitting.
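
      For readers less familiar with kinetic proofreading, the link between the number of proofreading steps and the scaling exponent can be sketched numerically. The following is a minimal sketch of the textbook proofreading scheme, not the fitted model from the manuscript. It assumes that α denotes the log-log slope of potency against K_D, that affinity changes only through the off-rate, and that receptor occupancy is low; the function names and the default rate values are hypothetical.

```python
import math


def potency(k_off, k_on=1.0, k_p=1.0, n_steps=2, threshold=0.01):
    """Ligand concentration needed to reach a fixed signal level.

    Assumptions (illustrative only):
      - low occupancy, so bound fraction is approximately ligand / K_D
      - a bound complex survives n_steps proofreading steps with
        probability (k_p / (k_p + k_off)) ** n_steps
    """
    k_d = k_off / k_on
    survive = (k_p / (k_p + k_off)) ** n_steps
    return threshold * k_d / survive


def local_alpha(k_off, k_p=1.0, n_steps=2, rel_step=1e-4):
    """Numerical log-log slope of potency against K_D at a given off-rate.

    k_on is held fixed, so a change in log(K_D) equals the change in log(k_off).
    """
    lo = k_off * (1.0 - rel_step)
    hi = k_off * (1.0 + rel_step)
    d_log_p = math.log(potency(hi, k_p=k_p, n_steps=n_steps)) - math.log(
        potency(lo, k_p=k_p, n_steps=n_steps)
    )
    return d_log_p / (math.log(hi) - math.log(lo))


for k_off in (1e-3, 1.0, 1e3):
    print(f"k_off / k_p = {k_off:g}: local slope = {local_alpha(k_off):.3f}")
```

      Under these assumptions the slope equals 1 + N·k_off/(k_p + k_off), so it lies between 1 (occupancy alone) and N + 1 (every step fully discriminating), and intermediate values arise when the off-rate is comparable to the proofreading rate.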

      Finally, one of the stated advantages of the adaptive proof-reading model is that it is capable of explaining antagonism. It is hard to see how a 'vanilla' KPR model is capable of explaining antagonism.

      We have added a discussion paragraph to discuss antagonism, which cannot be explained by the basic KP model that we found is sufficient to explain our data on antigen discrimination in the presence of self pMHCs on autologous APCs. We describe how the methods we have employed can be used to study antagonism.

      Reviewer #3 (Public Review):

      Pettmann et al. aimed at significantly improving the accuracy of SPR-based measurements of low affinity TCR-pMHC interactions by including a 100% binding control (injection of a conformation-specific HLA-antibody) in the surface plasmon resonance protocol. Interpolating with the information of saturated pMHC binding on the chip, the authors arrive at KDs for low affinity binders that are significantly higher than the previously reported constants. If correct, this has considerable ramifications for the interpretations of the results obtained from functional assays measuring the T cell response towards pMHCs featured in a titrated fashion. Unlike what was put forward by earlier reports, the authors conclude that the discriminatory power of TCRs is far from perfect, as T cells still respond to low affinity pMHC-ligands without a sharp affinity threshold. This is also because they managed to detect T cells responding to even ultra-low affinity ligands if provided in sufficient numbers.

      The body of work convinces in several regards:

      (i) It is exceedingly well thought out and introduces a quality of analytical strength that is absent in most of the literature published thus far on this topic.

      (ii) At the same time theoretical arguments are bolstered by a large body of experimental "wet" work, which combines a synthetic approach with cellular immunology and which appears overall well executed.

      (iii) The data lead to hypotheses in the field of T cell antigen recognition in general and in the theatre of autoimmunity, cancer and infectious diseases.

      There are a few aspects that may limit the impact of the study. I have listed them below:

      (i) The study does not provide kinetic data for the low affinity ligand-TCR binding but rather argues from the position of affinities as determined via Bmax. This limits somewhat the robustness of the statements made with regard to kinetic proofreading.

      We agree with this statement and are hoping to directly measure off-rates in the future. We note that in the published literature, including our own work, point mutations to the peptide generally modify the off-rate with only minor impact on the on-rate. An example of this can be found in Lever et al (2016) PNAS where point mutations led to 100,000-fold change in the off-rate but only a 10-fold change in the on-rate. This likely explains why antigen potency is often well-correlated with affinity when using point mutations to the peptide.

      (ii) Thresholds for readouts were arbitrarily chosen (e.g. 15% activation). It appears such choices were based on system behavior (with the largest differences observed among the groups) but may have implications for the drawn conclusions.

      We have chosen 15% in order to capture the ultra-low affinity pMHCs in our potency plots and have now added a sentence for why we have chosen this particular threshold. We did explore different thresholds but found that they produced similar values of α. The precise threshold could change the estimate of α if the shape of dose-response curves was dependent on antigen affinity but we did not find any evidence for this within our data.
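
      The reasoning that the threshold only matters if curve shape depends on affinity can be illustrated with a minimal sketch. It assumes, purely for illustration, that each dose-response curve follows a Hill function with a shared maximum and Hill coefficient; the function name, parameter names, and numerical values are hypothetical and are not taken from the manuscript.

```python
def potency_at(threshold, ec50, hill=1.0, e_max=1.0):
    """Concentration at which a Hill curve reaches a given activation level.

    Solves e_max * c**hill / (ec50**hill + c**hill) = threshold for c.
    The threshold must lie strictly between 0 and e_max.
    """
    if not 0.0 < threshold < e_max:
        raise ValueError("threshold must lie strictly between 0 and e_max")
    return ec50 * (threshold / (e_max - threshold)) ** (1.0 / hill)


# Two hypothetical antigens whose curves differ only in EC50.
strong_ec50, weak_ec50 = 1.0, 1000.0
for level in (0.15, 0.50):
    ratio = potency_at(level, weak_ec50) / potency_at(level, strong_ec50)
    print(f"threshold {level:.0%}: potency ratio weak/strong = {ratio:.1f}")
```

      With shared curve shape, changing the threshold multiplies every potency by the same factor, so the slope of log potency against log affinity is unchanged. If the Hill coefficient or maximum varied with affinity, the ratio would depend on the threshold, which is the case the response rules out for these data.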

      In summary, the work presented contributes to demystifying the link between TCR-engagement and (membrane proximal) signaling. It also provides a fresh perspective on the potential of TCR cross-reactivity.

    1. Author Response to Public Reviews

      Reviewer #1 (Public Review):

      [...] The deficiencies of this study are:

      1) This is a very specific cohort, largely urban, with - presumably - relatively higher levels of education. It is hard to see how this might translate into a general statement about the population

      We agree with the reviewer that this is a very specific cohort, largely urban, and with higher levels of education than average. We further agree that the utility of this cohort is not in making general statements about the population, but rather in deriving specific insights for which the cohort is best suited. We enumerate some of them that are present in this manuscript:

      a) It is as important to understand the relative degree of spread between Indian cities, where a combination of denser population and indoor lives has led to the greatest spread of disease. Since pandemics are typically self-limiting, regions with greater spread are further along the course and can expect declines faster. This provides useful insight for public health strategy. While our cohort does not necessarily represent the average population, it is similar between cities, something that is not true for any other survey. The ICMR national serosurvey is a random selection of districts and is heavily rural biased [1]. While that is important, rural districts are not where fast growing outbreaks are likely, given their largely outdoor lifestyle and lower population density. Other city-wise serosurveys are variable in target population as well as methodology and cannot be easily compared [2-5]. Thus our data is the first that permits comparison between many important urban regions of India, showing which regions were more advanced along the course and where future outbreaks were still likely. We note here that some of the regions identified by this survey as high risk, such as Kerala and interior Maharashtra, amongst others, are where the outbreaks continued until much later.

      b) Because the CSIR cohort has the added advantage of greater baseline data and repeated access, we are able to determine antibody stability, as shown, and possible correlates.

      c) The cohort is well suited to understanding clinical associations of SARS-CoV-2 infections, such as symptom rate and severity amongst its participants, as well as associations of infection risks (using seropositivity as an imperfect surrogate).

      2) The presentation of Figure 1 was quite confusing, especially the colour coding

      Figure 1 was made to represent the cities with CSIR labs where the sero-survey was carried out, colour-coded to give a quick understanding of prevalence. Cities with sero-prevalence greater than 10 percent were coded green, while cities with sero-prevalence between 5 and 10 percent were coded yellow. Cities with less than 5 percent sero-positivity were depicted in red, since these may turn into hotspots, or the rise in cases may be higher in these cities later, when sero-positivity is used as an indirect surrogate of infection. Although these cities may not truly represent the state population, the states were coded in a blue gradient, with a darker shade reflecting higher city sero-positivity. We realize that the colour coding of states may have created confusion and have removed it in the revised manuscript.

      3) It is surprising that the state of Maharashtra shows only intermediate to low levels of seropositivity, given that the impact of the pandemic was largest there and especially in the city of Pune. There have been alternative serosurveys for Pune which found much higher levels of seropositivity from about the same period.

      The Pune city sero-surveillance which has been pointed out by the reviewer was a survey of Pune’s five most affected sub-wards and not the Pune population in general. 6 Despite all the limitations, which we accept in the prior comment, our overall crude positivity rate of 10% is very similar to that of the ICMR national serosurvey, and in general the patterns we see are along the lines of what is known about severity of outbreaks. Thus, there is no real evidence to the contrary that would establish inaccuracy of the trends seen by us, and we respectfully note that surprising findings may be the most valuable ones. In fact, seeing current trends of rising cases in Maharashtra, including in Pune, when compared to other cities, our survey values may have been more correct.

      4) The statement "Seropositivity of 10% or more was associated with reductions in TPR which may mean declining transmission": For a disease with R of about 2, this would actually be somewhat early in the epidemic, so you wouldn't expect to see this in an indicator such as TPR. TPR is also strongly correlated with amounts of testing which isn't accounted for.

      We agree that for R of about 2, one would not expect a decline at sero-positivity of about 10%. However, it is worth noting that general seropositivity during the declining phase of the outbreak has been in this range not just for India, but also for major western European cities and New York, amongst others. There are three possible explanations. First, the highly exposed community containing the high-contact spreaders gets infected first, with higher seropositivity, thus effectively shortening or blocking transmission chains. We too note a much higher seropositivity amongst public transport users, who may better represent this sub-population. Second, R0 of 2-3 is the potential of this virus. R-effective after measures are put in place may be much lower. Better compliance with masking in India may have been important. Last, the fraction of the population immune at baseline is unknown but has been variably estimated at 20-30% from T cell reactivity studies as well as from outbreaks in closed settings such as ships. This is a speculative area but may help in understanding the results.

      We agree that we do not directly account for testing rate, which is difficult to adjust for and can affect TPR despite the fact that TPR is already one way of adjusting for different levels of testing. Since our data is a trend across different geographies, but for the 30 days bracketing the sample collection, differing testing rates would not in themselves explain the very strong inverse association of seropositivity with TPR. Given that high seropositivity areas are likely more advanced in the course of the pandemic, we favour that as the explanation. This is after noting the issues with overall seropositivity as a surrogate of population immunity, as above.
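      The argument about R and R-effective can be made concrete with the textbook homogeneous-mixing approximation, in which the effective reproduction number is R multiplied by the susceptible fraction. The sketch below is illustrative only and is not the authors' analysis; the function names and the value R = 1.1 under control measures are assumptions chosen for the example.

```python
def effective_r(r_basic: float, immune_fraction: float) -> float:
    """Effective reproduction number under homogeneous mixing."""
    return r_basic * (1.0 - immune_fraction)


def herd_immunity_threshold(r_basic: float) -> float:
    """Immune fraction at which the effective reproduction number reaches 1."""
    return 1.0 - 1.0 / r_basic


seropositivity = 0.10  # crude positivity rate quoted in the response

# With R of about 2 (the reviewer's figure), 10% immunity leaves R-effective above 1.
print(herd_immunity_threshold(2.0))
print(effective_r(2.0, seropositivity))

# Hypothetical R of 1.1 once control measures are in place (assumed value).
print(effective_r(1.1, seropositivity))
```

      Under these assumptions a decline at 10% seropositivity is not expected for R of 2, but becomes possible if measures have already pushed R close to 1, which is the second explanation offered above.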

      5) The correlation with vegetarianism is unusual - you might have argued that this could potentially protect against disease but that it might protect against infection is hard to credit. Much of South Asia is not particularly vegetarian but has seen significantly less impact

      We agree with the statement that much of South Asia is not particularly vegetarian. When we started analyzing our data, we observed that our cohort had a 70:30 ratio of non-vegetarians to vegetarians, which is in agreement with what nationwide surveys have concluded in the past; hence our cohort was not biased in terms of sampling for this variable. In this work we have used seropositivity as an indirect surrogate of infection; the data were not analyzed by zonal distribution but for the entire cohort, which is where we obtained the said observation. At this stage, we cannot do more than speculate on the role a vegetarian diet may play in decreasing sero-positivity amongst vegetarian individuals, but it could possibly relate to anti-inflammatory effects and the effect of a high-fibre diet in protecting gut mucosa against viral invasion. Existing studies have only speculated on the role diet could play, and there are no affirmative or largely biochemical studies to provide further evidence on this cause-effect relationship. We also did a multi-collinearity analysis to study whether diet was related to any other variable being studied, but we did not find any such association.

      6) On the same point above, it is possible that social stratification associated with diet - direct employees being more likely to be vegetarian than contract workers - might be a confounder here, since outsourced staff seem to be at higher risk.

      When we analyzed the data, we also considered the bias stated above: a person's occupation, which indirectly reflects socio-economic status, can influence diet preferences. We did not find that it explained the association. In our cohort, outsourced staff were indeed more often non-vegetarian than staff. Against a 70:30 ratio of non-vegetarians to vegetarians for the entire cohort, 83 percent of outsourced staff and 66 percent of staff were non-vegetarian. However, sero-positivity was higher amongst non-vegetarians in both groups, at 17.25 and 8.77 percent respectively, against 11.89 and 6.05 percent amongst vegetarians. We also did a logistic regression and a collinearity assessment through VIF scores but did not observe any such association, and hence this was not acting as a confounder. For females, we did not find this association and found only transport and occupation to be significant; hence, to a certain extent, it is the crowded environment and occupational exposure which stand as the major exposure variables when both genders are taken into consideration.
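      The stratified percentages quoted in this response can be compared directly. The short sketch below computes the ratio of non-vegetarian to vegetarian sero-positivity within each employment group, using only the four percentages given above. It is a simple illustration, not the authors' logistic regression, and the variable and function names are assumptions.

```python
# Sero-positivity (%) by diet within each employment group, as quoted in the response.
seropositivity = {
    "outsourced": {"non_vegetarian": 17.25, "vegetarian": 11.89},
    "staff": {"non_vegetarian": 8.77, "vegetarian": 6.05},
}


def prevalence_ratio(group: dict) -> float:
    """Ratio of non-vegetarian to vegetarian sero-positivity within one group."""
    return group["non_vegetarian"] / group["vegetarian"]


ratios = {name: prevalence_ratio(group) for name, group in seropositivity.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

      The ratio is roughly the same in both strata, which is what one would expect if employment category alone did not account for the diet association.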

      7) There may be correlations to places of residence that again act as confounders. If direct employees are provided official accommodation, they may simply have had less exposure, being more protected.

      We agree with this statement and have stated that outsourced personnel and staff who have to travel and specifically utilize public means of transport are exposed to a higher risk. We regret if this was not clear. The subgroup of people who use public transport reflect a more generalizable sub-population from within the cohort, with all associated risks and confounders. While we attempt a regression to separate a few of them, that is not the primary focus of this work.

      8) The correlations with blood group don't seem to match what is known from elsewhere

      We are unsure of what the reviewer is matching our data with, but have tried to explain why we consider our results to be broadly concordant. As advised elsewhere, this has not been detailed in the revised manuscript. Blood group (BG) type and serological status were available for 7496 individuals. The BG distribution amongst the total samples collected was similar to the national reference based on a recent systematic review; hence the sample characteristics of our cohort were similar to the national population reference. The available literature reports that BG type O carries a lower risk of infection, which was also observed in our study: BG type O was associated with a lower sero-positivity rate, with an OR of 0.76 (95% CI 0.64-0.91, p=0.018) vs non-O blood group types, and an overall sero-positivity of 7.09 percent, which was less than the cohort-wide sero-positivity. The available literature also reports that BG types AB and B have higher chances of testing sero-positive, which was corroborated by our findings. In the available literature, BG A has a higher risk of infection; this was contrary to our finding, where we obtained a favourable OR for BG type A, although it was not statistically significant.
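      For readers unfamiliar with how an odds ratio and its confidence interval are obtained from a 2x2 table, a minimal sketch of the standard Wald calculation follows. The counts are hypothetical and are NOT the study's data, and the response does not state which method the authors used; the function name is likewise an assumption.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Wald confidence interval for a 2x2 table.

    a, b: seropositive / seronegative counts in the group of interest
    c, d: seropositive / seronegative counts in the comparison group
    z:    normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not taken from the study).
or_, lo, hi = odds_ratio_wald(a=20, b=80, c=30, d=70)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

      An interval that lies entirely below 1, as reported for BG type O, indicates lower odds of sero-positivity in that group at the chosen confidence level.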

      9) The statement that "declining cases may reflect persisting humeral immunity among sub-communities with higher exposure" is unsupported. What sub-communities?

      We regret the lack of clarity. The wording has now been corrected; it simply refers to subgroups of the population with high levels of exposure.

      Reviewer #2 (Public Review):

      1) The extrapolation of the study results to the country may not be completely acceptable with the basic difference from the country's urban rural divide and a largely agricultural economy. The female gender is underrepresented in the study cohort, and no children have been included.

      We agree with the reviewer that this is a specific cohort, largely urban. We also agree that a cohort of people utilizing public transport would be more representative, and since the cohort design allows follow-up, we are following these individuals to gain further insights. We further agree that the utility of this cohort is not in making general statements about the population, but rather in deriving specific insights for which the cohort is best suited. We enumerate some of them that are present in this manuscript. (See also response to Reviewer #1.)

      2) The observations regarding corelates of sero-positivity such as diet smoking etc would need specifically designed adequately powered studies to confirm the same. The sample size for the three and six months follow up to conclude stability of the humoral immunity, is small and requires further follow-up of the cohort. The role of migration of labour helping the spread of the pandemic simultaneously to all parts of the country though attractive may not explain lower rates in states like UP and Bihar where maximum migrants moved to.

      We agree that the observations in regard to diet and smoking are only hypothesis-generating and need specifically designed studies to confirm the findings. We have also mentioned in the manuscript that associations found between seropositivity and some of the parameters should be confirmed with studies specifically designed for this purpose. We are following up more individuals at three and six months to ascertain the stability of the antibodies. The largest number of migrants in the early phase moved to UP and Bihar, and it would indeed be expected that seeding would be higher there. While known cases were low for these states, the seropositivity data supports that seeding did occur but may have gone undetected. The ICMR Aug-Sept serosurvey data, for example, shows seropositivity in districts of these states to be higher than in those of Gujarat or Rajasthan.

      3) A large chunk of seropositive data set has been removed representing the big cities of Delhi and Bengaluru while correlating Test Positivity Rate citing duration as the reason. However, these cities also had different testing strategies and health infrastructure and hence are important.

      We agree that some data was removed. This was because the sample collection was extended over a longer interval, making point estimates meaningless for some labs, especially CSIR-IGIB, which conducted many mini-surveys. For Delhi, only IGIB has been removed and other labs are still kept in the analysis. The graph directionality and trends remain the same when analyzed with the excluded data. On keeping the Bengaluru data, R-squared does not change to the second decimal place. When adding back the data from CSIR-IGIB, the R-squared is 0.32, maintaining the directionality and trend.
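      The sensitivity check described here, refitting the trend with and without the excluded labs and comparing R-squared and the direction of the slope, can be sketched as follows. All data points are hypothetical placeholders, not the study's values, and the function name is an assumption.

```python
def fit_line(xs, ys):
    """Return (slope, r_squared) for a simple least-squares line with intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For one predictor, R-squared equals the squared Pearson correlation.
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Hypothetical (sero-positivity %, TPR %) pairs per lab; not the study's data.
included = [(4, 12), (6, 10), (9, 9), (12, 6), (15, 5), (18, 3)]
excluded = [(10, 11), (14, 8)]  # stands in for labs with extended collection

slope_a, r2_a = fit_line(*zip(*included))
slope_b, r2_b = fit_line(*zip(*(included + excluded)))
print(f"without excluded labs: slope={slope_a:.2f}, R^2={r2_a:.2f}")
print(f"with excluded labs:    slope={slope_b:.2f}, R^2={r2_b:.2f}")
```

      If the slope keeps its sign in both fits, the directionality of the association does not depend on the exclusion, which is the claim made in the response.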

      4) Test positivity rate depends on testing strategy and type of test used; whether RTPCR or the Rapid Antigen Test and the ratio of the two tests was different in different parts of the country.

      This is a very well taken point, but the data was taken as a surrogate from a third party website and the further breakup of positivity rate was not available. It should of course be done ideally with one type of test only but this was not possible. Our larger point is that for a given part of the country TPR went down when seropositivity went up. This is relevant even if different parts of the country used different ratios of the test.

      Reviewer #3 (Public Review):

      [...] Weaknesses: While it is a pan-India survey, the population is not quite representative of general population of the country. CSIR labs are mostly in cities, and most of the employees use private transport. So the results cannot be generalized to the country as a whole. Restricting to people using public transport would be a better representation, although it still would not be fully representative.

      We agree with the reviewer that this is a specific cohort, largely urban. We also agree that a cohort of people utilizing public transport would be more representative, and since the cohort design allows follow-up, we are following these individuals to gain further insights. We further agree that the utility of this cohort is not in making general statements about the population, but rather in deriving specific insights for which the cohort is best suited. We enumerate some of them that are present in this manuscript. (See also response to Reviewer #1.)

    2. Author Response:

      Reviewer 1:

      The deficiencies of this study are:

      1) This is a very specific cohort, largely urban, with - presumably - relatively higher levels of education. It is hard to see how this might translate into a general statement about the population

      We agree with the reviewer that this is a very specific cohort, largely urban, and with higher levels of education than average. We further agree that the utility of this cohort is not in making general statements about the population, but rather in deriving specific insights for which the cohort is best suited. We enumerate some of them that are present in this manuscript.

      a) It is as important to understand the relative degree of spread between Indian cities, where a combination of denser population and indoor lives has led to the greatest spread of disease. Since pandemics are typically self-limiting, regions with greater spread are further along the course and can expect declines faster. This provides useful insight for public health strategy. While our cohort does not necessarily represent the average population, it is similar between cities, something that is not true for any other survey. The ICMR national serosurvey is a random selection of districts and is heavily rural biased.1 While that is important, that is not where fast growing outbreaks are likely based on a very outdoor life and lower density. Other city-wise serosurveys are variable in target population as well as methodology and cannot be easily compared.2-5 Thus our data is the first that permits comparison between many important urban regions of India, showing which regions were more advanced along the course and where future outbreaks were still likely. We note here that some of the regions identified by this survey as high risk such as Kerala, interior Maharashtra, amongst others, are where the outbreaks continued until much later.

      b) The CSIR cohort has the added advantage of greater baseline data and repeated access, so we are able to determine antibody stability, as shown, and possible correlates.

      c) The cohort is well suited to understanding clinical associations of SARS CoV2 infections such as symptom rate and severity amongst its participants as well as associations of infection risks (using seropositivity as an imperfect surrogate).

      2) The presentation of Figure 1 was quite confusing, especially the colour coding

      Figure 1 was made to represent the cities with CSIR labs where the sero-survey was carried out, colour-coded to give a quick understanding of prevalence. Cities with sero-prevalence greater than 10 percent were coded green, while cities with sero-prevalence between 5 and 10 percent were coded yellow. Cities with less than 5 percent sero-positivity were depicted in red, since these may turn into hotspots, or the rise in cases may be higher in these cities later, when sero-positivity is used as an indirect surrogate of infection. Although these cities may not truly represent the state population, the states were coded in a blue gradient, with a darker shade reflecting higher city sero-positivity.

      3) It is surprising that the state of Maharashtra shows only intermediate to low levels of seropositivity, given that the impact of the pandemic was largest there and especially in the city of Pune. There have been alternative serosurveys for Pune which found much higher levels of seropositivity from about the same period.

      The Pune city sero-surveillance which has been pointed out by the reviewer was a survey of Pune’s five most affected sub-wards and not the Pune population in general. 6 Despite all the limitations, which we accept in the prior comment, our overall crude positivity rate of 10% is very similar to that of the ICMR national serosurvey, and in general the patterns we see are along the lines of what is known about severity of outbreaks. Thus, there is no real evidence to the contrary that would establish inaccuracy of the trends seen by us, and we respectfully note that surprising findings may be the most valuable ones. In fact, seeing current trends of rising cases in Maharashtra, including in Pune, when compared to other cities, our survey values may have been more correct.

      4) The statement "Seropositivity of 10% or more was associated with reductions in TPR which may mean declining transmission": For a disease with R of about 2, this would actually be somewhat early in the epidemic, so you wouldn't expect to see this in an indicator such as TPR. TPR is also strongly correlated with amounts of testing which isn't accounted for.

      We agree that for R of about 2, one would not expect a decline at sero-positivity of about 10%. However, it is worth noting that general seropositivity during the declining phase of the outbreak has been in this range not just for India, but also for major western European cities and New York, amongst others.7 There are three possible explanations. First, the highly exposed community containing the high-contact spreaders gets infected first, with higher seropositivity, thus effectively shortening or blocking transmission chains. We too note a much higher seropositivity amongst public transport users, who may better represent this sub-population. Second, R0 of 2-3 is the potential of this virus. R-effective after measures are put in place may be much lower.8 9 Better compliance with masking in India may have been important. Last, the fraction of the population immune at baseline is unknown but has been variably estimated at 20-30% from T cell reactivity studies as well as from outbreaks in closed settings such as ships. This is a speculative area but may help in understanding the results. We agree that we do not directly account for testing rate, which is difficult to adjust for and can affect TPR despite the fact that TPR is already one way of adjusting for different levels of testing. Since our data is a trend across different geographies, but for the 30 days bracketing the sample collection, differing testing rates would not in themselves explain the very strong inverse association of seropositivity with TPR. Given that high seropositivity areas are likely more advanced in the course of the pandemic, we favour that as the explanation. This is after noting the issues with overall seropositivity as a surrogate of population immunity, as above.

      5) The correlation with vegetarianism is unusual - you might have argued that this could potentially protect against disease but that it might protect against infection is hard to credit. Much of South Asia is not particularly vegetarian but has seen significantly less impact

      We agree with the statement that much of South Asia is not particularly vegetarian. When we started analyzing our data, we observed that our cohort had a 70:30 ratio of non-vegetarians to vegetarians, which is in agreement with what nationwide surveys have concluded in the past; hence our cohort was not biased in terms of sampling for this variable.10 In this work we have used sero-positivity as an indirect surrogate of infection; the data were not analyzed by zonal distribution but for the entire cohort, which is where we obtained the said observation. At this stage, we cannot do more than speculate on the role a vegetarian diet may play in decreasing sero-positivity amongst vegetarian individuals, but it could possibly relate to anti-inflammatory effects and the effect of a high-fibre diet in protecting gut mucosa against viral invasion. Existing studies have only speculated on the role diet could play, and there are no affirmative or largely biochemical studies to provide further evidence on this cause-effect relationship.11 12 We also did a multi-collinearity analysis to study whether diet was related to any other variable being studied, but we did not find any such association.

      6) On the same point above, it is possible that social stratification associated with diet - direct employees being more likely to be vegetarian than contract workers - might be a confounder here, since outsourced staff seem to be at higher risk.

      When we analyzed the data, we also considered the bias stated above: a person's occupation, which indirectly reflects socio-economic status, can influence diet preferences. We did not find that it explained the association. In our cohort, outsourced staff were indeed more often non-vegetarian than staff. Against a 70:30 ratio of non-vegetarians to vegetarians for the entire cohort, 83 percent of outsourced staff and 66 percent of staff were non-vegetarian. However, sero-positivity was higher amongst non-vegetarians in both groups, at 17.25 and 8.77 percent respectively, against 11.89 and 6.05 percent amongst vegetarians. We also did a logistic regression and a collinearity assessment through VIF scores but did not observe any such association, and hence this was not acting as a confounder. For females, we did not find this association and found only transport and occupation to be significant; hence, to a certain extent, it is the crowded environment and occupational exposure which stand as the major exposure variables when both genders are taken into consideration.
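      As a rough illustration of what a VIF-based collinearity check measures, the sketch below computes the variance inflation factor for two binary predictors (outsourced vs staff, non-vegetarian vs vegetarian). With only two predictors, VIF reduces to 1 / (1 - r^2), where r is their Pearson correlation. The non-vegetarian shares of 83 and 66 percent come from the response; the equal group sizes of 100 are an assumption made purely for the example, and this is not the authors' model.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)


# Assumed group sizes: 100 outsourced and 100 staff (hypothetical).
outsourced = [1] * 100 + [0] * 100
# 83% of outsourced and 66% of staff are non-vegetarian (from the response).
non_vegetarian = [1] * 83 + [0] * 17 + [1] * 66 + [0] * 34

vif = vif_two_predictors(outsourced, non_vegetarian)
print(f"VIF = {vif:.2f}")
```

      Under these assumptions the VIF stays close to 1, far below the values of 5 to 10 that are commonly treated as signs of problematic collinearity, which is in line with the statement that diet and employment category were not strongly collinear.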

      7) There may be correlations to places of residence that again act as confounders. If direct employees are provided official accommodation, they may simply have had less exposure, being more protected.

      That was a standing hypothesis for this work, as CSIR provides on-campus accommodation at most of its labs. We could not study it, because no variable was available recording whether a person resides in office-provided accommodation or outside the lab in the city. Since we did not study this specifically, it remains speculative rather than confirmed, but it is in agreement with the hypothesis that outsourced personnel and staff who have to travel, and specifically those who use public transport, are exposed to a higher risk.

      8) The correlations with blood group don't seem to match what is known from elsewhere.

      Blood group (BG) type and serological status were available for 7496 individuals. The BG distribution amongst the total samples collected was similar to the national reference based on a recent systematic review.13 Hence the sample characteristics of our cohort were similar to the national population reference. The available literature reports that BG type O carries a lower risk of infection, which was also observed in our study.14-19 In our study, BG type O was associated with a lower sero-positivity rate, with an OR of 0.76 (95% CI 0.64-0.91, p=0.018) vs non-O blood group types, and an overall sero-positivity of 7.09 percent, which was less than the cohort-wide sero-positivity. The available literature also reports that BG types AB and B have higher chances of testing sero-positive, which was corroborated by our findings.17 In the available literature, BG A has a higher risk of infection; this was contrary to our finding, where we obtained a favourable OR for BG type A, although it was not statistically significant.14-16 18

      9) The statement that "declining cases may reflect persisting humeral immunity among sub-communities with higher exposure" is unsupported. What sub-communities?

      The wording has been corrected; it simply refers to sub-groups of the population with high levels of exposure.

      Reviewer 2:

      Weaknesses:

      1) The extrapolation of the study results to the country may not be completely acceptable with the basic difference from the country's urban rural divide and a largely agricultural economy. The female gender is underrepresented in the study cohort, and no children have been included.

      We agree with the reviewer that this is a specific cohort. We agree that the female gender is underrepresented in the cohort, and hence all variable-based associations were done separately for males and females. Because of the low number of female samples in the cohort, associations with smoking etc. could not be assessed, while the diet variable was not significant on model testing. As the ethical approval did not permit us to collect data on children, we could not provide the same, but this is complemented by the ICMR survey, which has provided data for younger individuals. We further agree that the utility of this cohort is not in making general statements about the population, but rather in deriving specific insights for which the cohort is best suited. We enumerate some of them that are present in this manuscript.

      a) It is as important to understand the relative degree of spread between Indian cities, where a combination of denser population and indoor lives has led to the greatest spread of disease. Since pandemics are typically self-limiting, regions with greater spread are further along the course and can expect declines faster. This provides useful insight for public health strategy. While our cohort does not necessarily represent the average population, it is similar between cities, something that is not true for any other survey. The ICMR national sero-survey is a random selection of districts and is heavily rural biased.1 While that is important, that is not where fast growing outbreaks are likely based on a very outdoor life and lower density. Other city-wise serosurveys are variable in target population as well as methodology and cannot be easily compared.2-5 Thus our data is the first that permits comparison between many important urban regions of India, showing which regions were more advanced along the course and where future outbreaks were still likely. We note here that some of the regions identified by this survey as high risk such as Kerala, interior Maharashtra, amongst others, are where the outbreaks continued until much later.

      b) The CSIR cohort has the added advantage of greater baseline data and repeated access, so we are able to determine antibody stability, as shown, and possible correlates.

      c) The cohort is well suited to understanding clinical associations of SARS CoV2 infections such as symptom rate and severity amongst its participants as well as associations of infection risks (using seropositivity as an imperfect surrogate)

      2) The observations regarding corelates of sero-positivity such as diet smoking etc would need specifically designed adequately powered studies to confirm the same. The sample size for the three and six months follow up to conclude stability of the humoral immunity, is small and requires further follow-up of the cohort. The role of migration of labour helping the spread of the pandemic simultaneously to all parts of the country though attractive may not explain lower rates in states like UP and Bihar where maximum migrants moved to.

      We agree that the observations in regard to diet and smoking are only hypothesis-generating and need specifically designed studies to confirm the findings. We have also mentioned in the manuscript that associations found between seropositivity and some of the parameters should be confirmed with studies specifically designed for this purpose. We are following up more individuals at three and six months to ascertain the stability of the antibodies. The largest number of migrants in the early phase moved to UP and Bihar, and it would indeed be expected that seeding would be higher there. While known cases were low for these states, the seropositivity data supports that seeding did occur but may have gone undetected. The ICMR Aug-Sept serosurvey data, for example, shows seropositivity in districts of these states to be higher than in those of Gujarat or Rajasthan.

      3) A large chunk of seropositive data set has been removed representing the big cities of Delhi and Bengaluru while correlating Test Positivity Rate citing duration as the reason. However, these cities also had different testing strategies and health infrastructure and hence are important.

      We agree that data was removed for these cities, because sample collection was extended in these labs. For Delhi, only IGIB has been removed; data from all other Delhi labs are still in the analysis. The graph directionality and trends remain the same when analyzed with the excluded data. On keeping the Bengaluru data, R-squared does not change to the second decimal place, while on adding the data from CSIR-IGIB, the R-squared is 0.32, maintaining the directionality and trend. We would like to state that for IGIB, the collection spanned a considerable duration, from the time when sero-positivity was low to the time when sero-positivity had reached the mentioned levels in Delhi.

      4) Test positivity rate depends on testing strategy and type of test used; whether RTPCR or the Rapid Antigen Test and the ratio of the two tests was different in different parts of the country.

      This is a very well taken point. The data was taken as a surrogate from a third-party website for calculation purposes only, and the results obtained were as logically expected. Ideally, the analysis should be done with one type of test only, and the mix of tests could influence the outcome and interpretation. However, when we compared the data with the observed real trends, the graph directionality was in agreement; hence the adoption of these data as a surrogate could work in such scenarios, specifically in the context of a large-scale heterogeneous population.

      Reviewer 3:

      Weaknesses: While it is a pan-India survey, the population is not quite representative of general population of the country. CSIR labs are mostly in cities, and most of the employees use private transport. So the results cannot be generalized to the country as a whole. Restricting to people using public transport would be a better representation, although it still would not be fully representative.

      We agree with the reviewer that this is a specific cohort, largely urban. We also agree that a cohort of people using public transport would be more representative, and because the cohort allows follow-up, we are following the individuals to gain further insights. We further agree that the utility of this cohort is not in making general statements about the population, but rather in deriving specific insights for which the cohort is best suited. We enumerate below some of those presented in this manuscript.

      a) It is as important to understand the relative degree of spread between Indian cities, where a combination of denser population and indoor lives has led to the greatest spread of disease. Since pandemics are typically self-limiting, regions with greater spread are further along the course and can expect declines faster. This provides useful insight for public health strategy. While our cohort does not necessarily represent the average population, it is similar between cities, something that is not true for any other survey. The ICMR national sero-survey is a random selection of districts and is heavily rural biased [1]. While that is important, rural areas are not where fast-growing outbreaks are likely, given a largely outdoor life and lower density. Other city-wise serosurveys are variable in target population as well as methodology and cannot be easily compared [2-5]. Thus our data is the first that permits comparison between many important urban regions of India, showing which regions were more advanced along the course and where future outbreaks were still likely. We note here that some of the regions identified by this survey as high risk, such as Kerala and interior Maharashtra, amongst others, are where the outbreaks continued until much later.

      b) The CSIR cohort has the added advantage of greater baseline data and repeated access, so we are able to determine antibody stability, as shown, and possible correlates.

      c) The cohort is well suited to understanding clinical associations of SARS CoV2 infections such as symptom rate and severity amongst its participants as well as associations of infection risks (using seropositivity as an imperfect surrogate).

    1. Author Response:

      Reviewer #1:

      This manuscript reports the results of two timing experiments. The experimental paradigm asks participants to judge the time of target items in an unfilled interval between two landmark stimuli. In experiment 1, there is one item that must be judged. In experiment 2, there are two items to be judged. The basic empirical result is that relative order judgments in experiment 2 are more accurate than one might expect from the absolute timing judgments of experiment 1. A model is presented.

      My overall reaction is that this paper does not present a sufficiently noteworthy empirical result. I can't imagine that there is a cognitive psychologist studying memory who would be surprised by the finding that relative order judgments in the second experiment are more accurate than one might expect from the absolute judgments in experiment 1. On the encoding side, in these really short lists (with no secondary task), there is nothing preventing the participant from noting and encoding the order as the items are presented (not unlike the recursive reminding). On the retrieval side, we've known for a very long time that judgments of serial position use temporal landmarks (see for instance a series of remarkable studies by Hintzman and colleagues circa 1970).

      Methodologically, this paper falls short of the standards one would expect for a cognitive psychology paper. There are basically no statistics or description of the distribution of the effect across participants. Although I'm pretty well-convinced that the basic finding (distributions in experiment 2 are different from experiment 1), I could not begin to guess at an effect size. The model is not seriously evaluated. The bimodal distributions are a large qualitative discrepancy that is not really discussed.

      Although the title of the paper invites us to understand these results as telling us something about episodic memory, the empirical burden of this claim is not carried. Amnesia patients (and animals with hippocampal lesions) show relatively subtle differences in timing tasks. There is no evidence presented here, nor literature review, to convince the reader of this point.

      Reviewer #1

      We regret that the reviewer did not focus on the main results of the paper, and limited their remarks to just one analysis comparing the relative order precision to the one predicted from the naive assumption of independent absolute time judgements for each item. This analysis was done to confirm that relative order is remembered quite precisely for short lists. That is indeed not surprising, but we still did it in order to get a quantitative estimate of ordering mistakes that we needed for our Bayesian experiments. Another purpose was to filter out the participants who do not pay attention to the task (a common problem when performing experiments over the internet).
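      The "naive assumption" baseline mentioned above can be made concrete with a small sketch. It assumes, purely for illustration, that each item's reported time is an independent Gaussian centred on its true presentation time; the Gaussian form, the function name and the parameter names are assumptions of this sketch and are not taken from the manuscript.

```python
from math import sqrt
from statistics import NormalDist


def order_error_probability(delta, sigma_a, sigma_b):
    """Chance that two items are reported in the wrong order.

    delta   : true time gap between the two items (later minus earlier)
    sigma_a : SD of the reported time for the earlier item (hypothetical)
    sigma_b : SD of the reported time for the later item (hypothetical)

    Under independent Gaussian reports, the difference of the two reported
    times is Gaussian with mean `delta` and variance sigma_a^2 + sigma_b^2,
    so an order error occurs when that difference falls below zero.
    """
    spread = sqrt(sigma_a ** 2 + sigma_b ** 2)
    return NormalDist().cdf(-delta / spread)


# Illustrative values only: the order-error rate falls as the gap widens.
for gap in (0.0, 0.5, 1.0, 2.0):
    print(gap, round(order_error_probability(gap, 1.0, 1.0), 3))
```

      If observed ordering mistakes are rarer than this baseline predicts from the absolute-time errors, relative order is being remembered more precisely than independent absolute judgements would allow.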

      Regarding the title of the paper, we are not aware of experiments similar to ours done with amnesic patients. However, we take the reviewer's point that the relation of our experiments to episodic memory as usually understood is not direct, so we removed the word 'episodic' from the title in the revised version. We also added statistical analysis of the results.

      Reviewer #2:

      In this manuscript, the authors set out to measure participants' decisions about when an item occurred in a short list of 3 or 4 items, where the first and last items were always at the beginning and end, respectively. They report two behavioral studies that examine time judgments to items in the intermediate positions. They show that time judgments (when did you see X item using a continuous line scale) are always a little off but, more importantly, they tend to be anchored to other items presented. The results are interesting and add to our knowledge of the representation of time in the brain mainly by introducing a new paradigm with which to study time. Within the broader context of research on timing capacities, it should not be surprising that participants do not have a continuous representation of time that lasts beyond traditional time interval training of a few hundred milliseconds to a few seconds. Furthermore, research has also shown that 'events' that require attentional resources do morph our perception and memory for time. So while the paradigm is worth expanding on, the behavioral results are not surprising given this past literature. I do feel however that this work is an important first step in developing a more firm model of memory for time.

      Reviewer #2

      Indeed, as mentioned above in response to Reviewer #1, we are not surprised that subjects do not remember the absolute presentation time well, especially when several items were involved. Exactly what they remember is the main point of this study, and the model is quite crucial in understanding what we believe is our novel result about how ordinal and absolute time representations interact in memory. The reviewer did not seem to appreciate this; rather they re-formulated our results as time judgments (when did you see X item using a continuous line scale) being 'anchored' to other items presented. We are not sure exactly what this means; probably that, on average, the difference between reported times of different items stayed almost constant for each presentation condition. However, our study not only presented this result but showed how it follows from the Bayesian theory.

    1. Author Response to Public Reviews

      Reviewer #1 (Public Review):

      [...] What is left unclear is what is unique about the fibrotic substrate in ESUS patients in comparison to AFib patients in the future.

      We thank the reviewer for these reasonable and accurate critiques. In the revised version of our manuscript, we offer a more in-depth analysis of potential cohort-scale differences in the spatial distribution of fibrosis between ESUS and AFib patients and how that might affect the overall arrhythmogenicity of fibrotic remodeling between the two populations. We further acknowledge that a comprehensive understanding of the pathophysiological consequences of fibrosis in ESUS will require much more research in the future. Our plans include analysis of how fibrosis might affect LA hemodynamic properties and the likelihood of clot formation. Future work (both clinical and computational) will also be needed to test the hypothesis generated by the present study that ESUS patients lack the triggers needed to initiate AFib. We have added clarifying text to the Discussion section of our manuscript to acknowledge these two points (see lines 286-289, 367-368).

      Reviewer #2 (Public Review):

      [...] 1) As the authors point out, clinical studies have revealed that the fibrotic burden in ESUS patients is similar to those with aFib. The question is why then, do so few ESUS patients exhibit clinically detectable arrhythmias with long-term monitoring. The authors hypothesize and their data support the notion that while the substrate is prime for pro-arrhythmia in ESUS patients, a lack of triggering events may explain the differences between the two groups.

      We thank the reviewer for their kind comment about the level of anatomical and structural variability in our study. We concur that additional analysis of fibrosis spatial pattern properties (local fibrosis density and entropy, as calculated in our previous work) on a region-wise basis between AFib/ESUS and inducible/non-inducible models would add significant value to our work. Accordingly, we have made significant additions to the text including a completely new figure.

      2) I think the authors could go further in describing why this is surprising. Generally, severe fibrosis is thought to potentially serve as a means or mechanism for pro-arrhythmic triggers. This is because damage to cardiac tissue typically results in calcium dysregulation. When calcium overload occurs in isolated fibrotic tissue areas, or depolarization of the resting membrane potential due to localized ischemia allows for ectopic pacemaking, we might expect that the diseased/fibrotic tissue is itself the source of arrhythmia generation. I think the novel finding here is that this notion may be a simplification, and the sources of arrhythmia generation may be more complex and may need to come from outside the areas of fibrosis. I think this is a big deal.

      Patients with stroke were excluded from the AFib cohort because the etiology of stroke in our AFib cohort was not explicitly adjudicated to be cardioembolic, other ischemic such as atherosclerotic, or haemorrhagic and therefore would not allow us to draw reliable conclusions regarding the role of the atrial substrate in stroke in this population. A separate issue is the fact that the cell- and tissue-scale electrophysiology in models reconstructed from ESUS patients was based on the same representation as those used in AFib models. In fact, this was a deliberate design choice to ensure that our modeling results represented a “worst case scenario” for the potential impact of fibrosis in patients with ESUS. Given the fact that our aim is to determine whether there are any differences in the pro-arrhythmic capacity of fibrotic substrate in ESUS and AFib groups, we believe that this is a suitable and justifiable modeling choice – modeling fibrosis differently in the two populations would be difficult to justify due to a lack of good experimental data and would introduce more confounding factors.

      Nevertheless, we agree this is a relevant limitation of our study and we have added an acknowledgement of that fact to our revised manuscript (see lines 361-365).

      3) An acknowledged limitation of the study is the assumption of fixed conduction velocity and action potential duration/effective refractory period. Bifulco et al. base this assumption on previous studies by the group (e.g. L312), which, however, concluded that reentrant driver locations and inducibility are sensitive to changes of action potential and conduction velocity (Deng et al.). For conduction velocity, wider ranges have been reported since the publication of the supporting reference (35) in 1994, e.g. Verma et al.; Roney et al.

      The reviewer’s point is well taken. Accordingly, we have added qualifying language pertaining to RD localization analysis in our Discussion (see lines 323-326). Having said that, we do not think this issue stands to fundamentally change our top-line interpretation of the findings from simulations, as it pertains to the idea that fibrosis in ESUS might plausibly be latent proarrhythmic substrate. The point of the paper by Deng et al. was to analyze sensitivity of reentrant driver localization to altered cell- and tissue-scale electrophysiological properties, not the concept of inducibility per se. It is thus likely that if our entire study were repeated with ±10% CV or APD (both within normal physiological range for average fibrotic atrial tissue) the take-home message would be the same.

      4) The number of pacing sites is rather low for a comprehensive in silico arrhythmia inducibility test but likely a good balance of coverage and computational feasibility considering that the primary goal of this research was to check whether the two groups of models show differences when undergoing the same (but not necessarily exhaustive) protocol.

      We would argue that 15 sites in the LA alone is comparable in coverage to prior studies in biatrial models (e.g., 30 LA/RA sites in Zahid et al. [2016] Cardiovasc Res; 40 LA/RA sites in Boyle et al. [2019] Nat Biomed Eng). We would further stress that our decision to use these specific sites was based on our motivation to simulate triggered activity (i.e., rapid pacing) exclusively from sites identified as common clinically relevant trigger locations documented in AFib patients (see ref. [14] by Santangeli et al. [2016] Heart Rhythm). If we were to instead pace from randomly distributed atrial sites as in prior work, we would jeopardize our ability to draw conclusions on the potential relevance of our simulations to the real-world susceptibility of atrial fibrotic substrate in ESUS patients to ectopic beats originating from realistic locations.

      5) The discussion does a good job in putting the results into context. Two interesting observations that deserve more attention are that i) the Inducibility Score was always higher for AFib vs. ESUS (Figure 6A, no statistical test performed). However, this did not translate to a difference in silico arrhythmia burden (inducibility). ii) Reentrant drivers were about twice as likely to localize to the left pulmonary veins than the right pulmonary veins in the AFib models (Figure 6D).

      Regarding the first point (i), with corrections made to the fiber mapping process, the statement regarding uniformly higher IdS values in AFib models is no longer true. Moreover, with our revised analysis there is no significant difference in the region-wise inducibility rates (P=0.45). The reviewer’s second point (ii) still stands and is even more pronounced with a ~3x higher rate of localization to the LPV vs. RPV areas in AFib models. Notably, our new region-wise analysis of fibrosis spatial pattern (see new Fig. 6 and our response to major points 4 and 5 above) shows that LPV regions in AFib models in this cohort were much more likely to have the combination of high fibrosis density and entropy previously shown to be highly favorable to reentrant driver localization. However, we recognize that a more thorough analysis will be required to draw truly meaningful conclusions on this subject in the context of either AFib or ESUS patients; this has been briefly noted in our Limitations section (see lines 332-335).

      6) The study succeeded in answering the question it posed in the sense that no marked difference was found between the ESUS and AFib models. This leads to the question what the stroke-inducing mechanism is in the ESUS patients. A hypothesis for future work could be that the fibrotic infiltrations in the ESUS patients reduce the hemodynamic efficacy of the left atrium and render clot formation (e.g. in the atrial appendage) more likely in this way.

      The reviewer’s comment is duly noted and entirely consistent with our plans for future work. In fact, we recently published a white paper (Boyle et al. [2021] Heart) outlining a vision to combine electrophysiological models of the left atrium with biomechanics and hemodynamics simulation to comprehensively understand how fibrosis might influence clot formation. Our revised Discussion emphasizes this exciting trajectory for future work (see lines 370-372).

      7) The negative finding in this study (no difference between groups) does not naturally allow us to draw clinical implications for diagnosis or stratification. Additional ways to put the hypothesis proposed by the authors (fewer arrhythmogenic triggers in the ESUS patients) to test could be to consider readouts/surrogate measures of the autonomic nervous system.

      We have noted in our Discussion (see lines 286-289) that future work could test the hypothesis arising from this project via electrocardiographic monitoring in ESUS patients with different levels of fibrosis. Concerning the idea of using direct readouts of autonomic tone, we chose to leave this out since we are unaware of any clinically available systems. The usefulness of surrogate measurements (e.g., heart rate variability) in this context also remains unclear.

      Reviewer #3 (Public Review):

      [...] 1) As the authors point out, clinical studies have revealed that the fibrotic burden in ESUS patients is similar to those with aFib. The question is why then, do so few ESUS patients exhibit clinically detectable arrhythmias with long-term monitoring. The authors hypothesize and their data support the notion that while the substrate is prime for pro-arrhythmia in ESUS patients, a lack of triggering events may explain the differences between the two groups.

      We thank the reviewer for these kind remarks. It is encouraging to have our results interpreted so elegantly and accurately. We are excited to test this new hypothesis (and others prompted by the peer review process for this manuscript) in future studies.

      2) I think the authors could go further in describing why this is surprising. Generally, severe fibrosis is thought to potentially serve as a means or mechanism for pro-arrhythmic triggers. This is because damage to cardiac tissue typically results in calcium dysregulation. When calcium overload occurs in isolated fibrotic tissue areas, or depolarization of the resting membrane potential due to localized ischemia allows for ectopic pacemaking, we might expect that the diseased/fibrotic tissue is itself the source of arrhythmia generation. I think the novel finding here is that this notion may be a simplification, and the sources of arrhythmia generation may be more complex and may need to come from outside the areas of fibrosis. I think this is a big deal.

      This is an excellent point and we strongly concur that the “trigger-centric” interpretation of the pathophysiological consequences of fibrotic remodeling should be reconsidered. To further reinforce this fact, we ran additional simulations to rule out the possibility that there might be exaggerated resting membrane potential depolarization in AFib but not in ESUS, which might provide an alternative explanation for the clinical manifestation of arrhythmia in the former but not the latter. Our new results support the point raised by the reviewer and, in our opinion, increase the overall impact of the work.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper presents the exciting statement that increasing viral loads within a community can be used as an epidemiological early-warning indicator preceding increased positivity. To support this claim, it would be interesting to present both Ct and positivity on the same graph to demonstrate that indeed, declining Ct can be used as an early marker of a COVID-19 epidemic wave. Percentage of positive test data should not only include the ones obtained in the present study but should be compared with "national data" as the present study design includes a bias in patient selection that might not reflect the "true" situation at the time. Only with this comparison could we claim that the present study design could predict COVID-19 epidemic waves. A correlation of Ct with clinical evidence to rank the confidence of positive results is also included and further supports the high specificity of the RT-PCR for detecting SARS-CoV-2 (99.995%).

      In a serological investigation, it was observed that some of these RT-PCR-positive cases do not appear to seroconvert and that possible re-infections might occur despite the presence of anti-spike antibodies. Although reported in few individuals and therefore to be taken with extreme caution, this adds information to the current unknowns about the serological response of COVID-19 patients and would be of utmost importance in the context of the current vaccination campaign.

      We do not agree that this study is biased in terms of patient selection – it invites randomly selected households to join the survey and is in fact the major source of unbiased surveillance data in the 4 nations of the United Kingdom.

    1. Response to Public Reviews

      Response to Reviewer #1 (Public Review):

      We thank the reviewer for their kind comments and were glad to learn that the reviewer felt the manuscript significantly contributed to the body of knowledge on COVID-19.

      Response to Reviewer #2 (Public Review):

      We thank the reviewer for their dedication to a detailed review, and we greatly appreciated the constructive suggestions given that have helped strengthen the overall manuscript.

      We agree with the reviewer that the power for detecting a loss of sensitivity by the ID NOW PCR assay was reduced by the overall low population frequency of COVID-19 disease; this resulted in a low number of positive patients and ultimately led to early study termination. Because of the reviewer’s helpful observation, we have now re-estimated the power of this study without reference to the observed results, but with consideration for the sample size and proportion of RT-PCR positive tests that were observed when the study was terminated. This new re-estimation suggests that the study retained 80% power to find a difference of 15% or more in sensitivity between ID NOW isothermal PCR and conventional RT-PCR; this analysis also demonstrated over 95% power to find a difference in specificity of more than 5%. Indeed, the significant drop in population prevalence that led to a loss of power for detecting loss of sensitivity expectedly resulted in an increase in power for detecting loss of specificity. We have expanded the Methods section of the paper to better expose these issues, and we have expanded our statement of strengths and limitations in the Discussion for the same reason.

      The Methods section of the manuscript now reads as follows:

      "The original study design called for enrolling 2000 symptomatic and 500 asymptomatic subjects, which would have provided, in the symptomatic population, power of 80% for finding a difference (at α = 0.05) of 5% in the sensitivity of ID NOW compared with the RTPCR reference standard; inclusion of at least 1350 negative patients would have provided 95% power (at α = 0.025) for finding a 5% difference in specificity. The study design assumed a population prevalence of 10%, and the study was terminated early when the population prevalence dropped to such a low level as to make the study unaffordable. We have re-estimated the power of this study without reference to the observed results but considering the sample size and proportion of RT-PCR positive tests that were observed when the study was terminated. This re-estimation suggests that the study retained 80% power to find a difference of 15% or more in sensitivity between ID NOW and RTPCR, and well over 95% power to find a difference in specificity of more than 5%. Indeed, the significant drop in population prevalence that led to a loss of power for detecting loss of sensitivity resulted, as expected (Bujang and Adnan, 2016), in an increase in power for detecting loss of specificity."
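      The trade-off described in this passage (lower prevalence means fewer positives for estimating sensitivity but more negatives for estimating specificity) can be illustrated with a rough sketch. It is not the authors' calculation, which follows Bujang and Adnan (2016); it uses a generic normal-approximation power formula for a single proportion, and the reference value of 0.95, the alternative of 0.90 and the 2% prevalence are hypothetical placeholders chosen only to show the direction of the effect.

```python
from math import sqrt
from statistics import NormalDist


def split_by_prevalence(n_total, prevalence):
    """Expected numbers of reference-positive and reference-negative subjects."""
    n_pos = round(n_total * prevalence)
    return n_pos, n_total - n_pos


def approx_power(p0, p1, n, alpha=0.05):
    """Normal-approximation power to detect a true proportion p1 against a
    reference proportion p0 with n subjects (two-sided test, far tail ignored)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p0) * sqrt(n) - z * sqrt(p0 * (1 - p0))
    return NormalDist().cdf(shift / sqrt(p1 * (1 - p1)))


# 2000 symptomatic subjects: the design prevalence (10%) versus a
# hypothetical lower prevalence (2%).
for prevalence in (0.10, 0.02):
    n_pos, n_neg = split_by_prevalence(2000, prevalence)
    sens_power = approx_power(0.95, 0.90, n_pos)  # positives drive sensitivity
    spec_power = approx_power(0.95, 0.90, n_neg)  # negatives drive specificity
    print(prevalence, n_pos, n_neg, round(sens_power, 3), round(spec_power, 3))
```

      With a fixed total enrolment, the positives shrink and the negatives grow as prevalence falls, so power moves in opposite directions for the two quantities.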

      The revised section of the Discussion regarding strengths and weaknesses now reads as follows:

      "Our clinical study also suffered a significant loss of power to assess ID NOW sensitivity as a result of the low number of positive results, and the reduction of sample size caused by the decision to terminate the study as a result. The meta-analysis is also limited by the small number of studies meeting inclusion criteria, and the fact that positive cases are heavily concentrated in only a single study. Strengths of the clinical study include pretrial power analysis with sample size estimation, precise adherence to the ID NOW specimen acquisition protocol, and extremely high power for assessing assay specificity. Taken together with the focus on initial diagnosis of disease in the studies included in the meta-analysis, we believe the combination of trial and meta-analysis provides useful information for clinicians for whom point-of-care testing is helpful."

      We thank the reviewer for noting the unique findings from the current cohort study in comparison to existing literature. The current cohort study was done with meticulous care to identify apparent “false positives” returned by the ID NOW PCR assay. This is also reflected in some of the high-quality studies available in the literature. In the supplementary data, we have provided confusion matrices for all the studies included in the meta-analysis. We have identified four such cases out of 1,942 total ID NOW tests. Cell sizes of 0 (from our study) and 4 are too small to allow use of Chi-squared for assessment of heterogeneity, and unfortunately the total number of tests is too large to allow computation of Fisher’s exact test; however, with such small numbers it is reasonable to treat them as samples drawn from Poisson distributions. The confidence interval around a Poisson estimate of 4 is 1.08987 ≤ μ ≤ 10.24159, and that around a Poisson estimate of 0 is 0.00000 ≤ μ ≤ 3.68888. The overlap is such that the estimates, 0 and 4, are consistent with having sampled the same distribution. While this does not allow us to conclude that no difference exists between our results and those of the other studies, it does not provide any evidence that there is a difference between the current cohort study and those previously published.
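      The two intervals quoted above match the exact 95% Poisson interval obtained by inverting the Poisson tail probabilities. The response does not say how the intervals were computed, so the following is a minimal sketch under that assumption; the function names are illustrative and only the Python standard library is used.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def bisect_root(f, lo, hi, tol=1e-10):
    """Root of a monotone function f on [lo, hi] where f changes sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided interval for a Poisson mean given an observed count k."""
    bracket = 10 * (k + 1) + 50  # generous upper search bound
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= k) equals alpha / 2.
        lower = bisect_root(
            lambda mu: 1 - poisson_cdf(k - 1, mu) - alpha / 2, 0.0, bracket
        )
    # Upper limit: the mean at which P(X <= k) equals alpha / 2.
    upper = bisect_root(lambda mu: poisson_cdf(k, mu) - alpha / 2, 0.0, bracket)
    return lower, upper


for count in (4, 0):
    low, high = exact_poisson_ci(count)
    print(count, round(low, 5), round(high, 5))
```

      The interval for a count of 0 extends up to about 3.69 and the interval for a count of 4 starts at about 1.09, so the two overlap, which is the basis for the statement that the counts are consistent with the same underlying distribution.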

    1. Author Response to Public Reviews

      We thank the reviewers and editors for their detailed and insightful comments. We believe the consequent revisions have greatly increased the overall clarity of the manuscript, and provide important additional context and analysis.

      Reviewer #1 (Public Review):

      We thank the reviewer for the detailed comments.

      [...] Overall, the manuscript lacks substantial statistical support or clear evidence of some of the patterns they are stating and would require a substantial revision to justify their conclusions. The majority of the manuscript relies on 8 infant/mother pairs where they have evidence of pertussis infection and rely on the dense sampling to investigate infection dynamics. However, this is a very small sample size and further, based on the results displayed in Figure 1, it is not obvious that the data have a very clear pattern that warrants their assertions.

      As noted in the introduction, we begin our results with “a descriptive analysis of eight mother/infant pairs where each symptomatic infant had definitive qPCR-based evidence of pertussis infection.” Our goal in this section is to use noteworthy examples to highlight salient epidemiological patterns, which we explore in further detail using data from the full cohort in subsequent sections. We note that the results presented in Fig 3 onwards in no way rely on any arguments and/or specific patterns described in Fig 2. In other words, the original eight pairs revealed several unanticipated findings (particularly the finding of repeated PCR results with high CT values in the mothers of a child with definite pertussis) that were intriguing and potentially relevant in terms of pertussis epidemiology. They are also unique: we have not seen any published time series data using qPCR in this way before. These early observations motivated us to conduct a more detailed and quantitative analysis of the cohort of >1,300 mother/infant pairs.

      The sample size under consideration in the majority of the manuscript (i.e., all except for the above section) is 1,320 mother/infant pairs (2,640 subjects), as shown in Table 1 and 2. In the original submission, sample sizes were also clearly indicated in Figure 2B (assays per week), Fig 3B (subjects per group), Table 2 (subjects per group), Figures S1-2 (study profile), Figure S3 (NP samples per infant), and Table S1.

      We have revised the panel order and axes labels of the current Figure 3 to more clearly illustrate the relationship between panels, and to clarify that the 6 example pairs shown in Fig 3A are unrelated to the 8 pairs shown in Figure 2. We hope this addresses any remaining confusion.

      While there are some instances with a combination of higher/lower IS481 CT values, it does not appear to have a clear pattern. For example, what are possible explanations for time periods between samples with evidence of IS481 and those without (such as pair A, C, D, E, F and H)? There also does not appear to be a clear pattern of symptoms in any of these samples (aside from having fewer symptoms in the mothers than infants).

      The ambiguity of these patterns played a role in guiding our analysis of the entire cohort, where we establish evidence for infection based on a preponderance of evidence from a large number of individuals.

      Further, it is not obvious how these observed patterns (such as a mixture of times of high or low values often preceded or followed by times when IS481 was not detected) are similar to or different from the rest of the cohort (in contrast to those who have a definitive positive NP sample during a symptomatic visit).

      The main results are primarily a descriptive analysis of these 8 mother/infant pairs with little statistical analyses or additional support.

      We strongly disagree with this characterization of our results, where we state that “In this analysis, we focus on the 1,320 pairs with ≥4 NP samples per subject (Figure S3)”. We believe the reviewer’s confusion may stem, in part, from a mis-interpretation of Figure 2 (below), along with our erroneous reference to Figure 3 (we incorrectly stated Fig 2, adding to the confusion). With this in mind, we have revised the previous Figure 2 (now Figure 3) in the interest of clarity, and more carefully described exactly what the points displayed in Figure 3 represent.

      The authors do not provide evidence or detail about what is known about the variability in IS481 CT values, amongst individuals, or over time, or pre/post vaccination. Without this information, it is not clear how informative some of this variability is versus how much variability in these values is expected.

      We agree that this is important information, and we have added figures and results summarizing the observed impact of vaccination on CT values (see essential revision 1, above), and the patterns of transitions of CT values across adjacent samples within individuals throughout the study (see essential revision 2). This latter analysis is now summarized in Figure 6, and shows a clear tendency for step-wise transitions over time. The implication is that the data present structure rather than random noise. This supports our overall contention that full-range CT values can provide meaningful insights into pertussis epidemiology. We also note that Fig. 7A (previously Fig 3A) and Table 3 (previously Table S1) do indeed summarize the distribution of CT values, including variability amongst individuals. As noted above, we have also included an additional analysis summarizing the interdependence of CT value on both symptoms and antibiotics (Fig 8-figure supplement 1).

      I think particularly in Figure 1, how many of the individuals have periods between times when IS481 evidence was observed when it was not observed, is concerning that these data (at this granular a level) are measuring true infection dynamics.

      Adding in additional information about the distribution and patterns of these values for the other cohort members would also provide valuable insight into how Figure 1 should be interpreted in this context.

      We believe our previous comments concerning the relationship between the current Figure 2 (illustrative example) and the remaining figures (cohort analysis) address this comment.

      As it stands, the authors do not provide sufficient interpretation and evidence for having relevant infection arcs.

      We have revised the manuscript to clarify that infection arcs are observed in other studies and expected in infected individuals, rather than directly observed and/or quantified in this study.

      It appears that Figure 2A was created using only 8 data points (from the infant data values). If so, this level of extrapolation from so few data points does not provide enough evidence to support the results in the text (particularly about evidence for fade-in/fade-out population-level dynamics). Also, in Figure 2, it is not clear to me the added value of Figure 2C and the main goal of this figure.

      We believe our previous comments have addressed this point. As noted, we have revised the current Fig 3 for clarity. Figure 3A and 3C are intended to demonstrate the structure of the cohort across the study period. We have revised the caption to clarify this point.

      The authors have created a measure called evidence for infection (EFI), which is a summary measure of their IS481 CT values across the study. However, it is not clear why the authors are only considering an aggregated (sum) value which loses any temporality or relationship with symptoms/antibiotic use. For example, the values may have been high earlier in the study, but symptoms were unrelated to that evidence for infection - or vice versa.

      We believe that temporal patterns of CT values within subjects now described in Figure 6 deserve further detailed attention that is outside the scope of the current work. We believe the high-level empirical summaries presented here are strengthened by their reliance on a preponderance of evidence. In the current revision, we have also included additional analyses that we believe address some (if not all) of the reviewer's concerns.

      This seems to be an important factor - were these possible undiagnosed, asymptomatic, or mild symptomatic pertussis infections? It is not clear why the authors only focus on a sum value for EFI versus other measures (such as multiple values above or below certain thresholds, etc.) to provide additional support and evidence for their results.

      Our approach seeks to use an objective statistical summary (geometric mean RCD proportion) to quantify the “signal” contained in IS481 assays within individuals across the course of the study. We note that, while both false positives and false negatives are likely in this study, the sample characteristics of the cohort mean that repeated false positives within individuals are unlikely based on chance alone. Further, a central aspect to our argument is that dichotomizing a continuous variable at an arbitrary threshold is reductive and unnecessarily introduces misclassification that reduces, rather than improves, statistical power.

      It is not clear why the authors have emphasized the novelty and large proportion of asymptomatic infections observed in these data. For example, there have been household studies of pertussis (see https://academic.oup.com/cid/article-abstract/70/1/152/5525423?redirectedFrom=PDF which performed a systematic review that included this topic) that have also found such evidence.

      We are aware of the paper above, which we had cited in the discussion. A key limitation of the referenced study is reliance on retrospective recall spanning many months. Since pertussis infections may be mild and non-specific, the fact that household contacts of an index case cannot recall a pertussis-like infection is consistent with asymptomatic infection, but far from definitive evidence. Moreover, the use of seroconversion as the measure of exposure is unreliable, since variations in antibody concentrations can be driven by a number of factors other than natural exposure.

      While cross-sectional surveys may be commonly used in practice, it is not clear that there is no other type of study that provides any evidence for asymptomatic infections.

      Our core argument is that it is impossible to know with certainty that a symptom-free patient with a detecting qPCR on Monday would not have become symptomatic if recontacted on Tuesday. By their nature, cross-sectional studies simply cannot parse asymptomatic from pre-symptomatic infections. To do that, one needs a longitudinal design, as reflected in the aforementioned longitudinal household contact studies. A key consideration addressed in the current work is the extent to which low and/or borderline CT values should be reinterpreted within the context of A) repeated sampling of individuals over time and B) epidemiological surveillance versus clinical diagnosis. We do not claim that our approach is the only one possible.

      Further, it is not clear why the authors refer to widespread asymptomatic pertussis when a large proportion of individuals with evidence for pertussis infection had symptoms. Would it not be undiagnosed pertussis if it is associated with clinical symptomatology?

      We have revised the text to highlight the significance of both asymptomatic and minimally symptomatic pertussis. As we describe both here and in Gill et al. 2016, only a handful of individuals meet the consensus criteria for clinical pertussis (Ct<35). In addition, qPCR results were not available to clinic staff in real-time. This, coupled with the relative absence of severe symptoms during study visits (especially in mothers), meant that only one study participant was diagnosed with pertussis at the time of their visit.

      Reviewer #2 (Public Review):

      We thank the reviewer for their supportive comments.

      This study was done in a population with wP vaccine, I wonder if that's part of the reason many of the CT values are high. Can the authors speculate what this study would look like in a population having received aP for a long period? I'd appreciate more discussion around vaccination in general.

      We have added results summarizing the possible interaction between IS481 assays with infant vaccination.

      We also note that aP is widely used in high-resource settings where overall pertussis incidence is lower, while pertussis diagnosis and treatment are more widely available. Our results indicate that mothers in this population experience non-trivial pertussis incidence over time, yielding immunological profiles from repeated infection that we expect differ markedly from those of individuals who lack naturally-derived resistance to infection via, e.g., mucosal antibodies and tissue-resident T-cells. Recognizing that our study does not provide a direct comparison with aP-vaccinated populations, we nonetheless believe that directly comparable populations (urban poor in under-served communities) are both numerous and under-studied.

    1. Author Response to Public Reviews

      Reviewer #2 (Public Review):

      [...] A somewhat puzzling point is that the authors emphasize that their proposed framework explains diminishing-return and increased-costs epistasis. Diminishing return has been described as a "regression to the mean effect" of sorts in Draghi and Plotkin (2013) for the NK model, and it was argued that a similar regression effect applies to a broad category of fitness landscapes in Greene and Crona (2014). Moreover, "increased-costs epistasis" is likely to apply broadly as well with a similar argument also for landscapes that fall outside the category discussed in the manuscript (an example is in the Recommendation section). On the other hand, a major strength of the manuscript is that it provides a superior quantitative precision, and some quantitative understanding for when one can expect diminishing returns and increased costs epistasis (that should be emphasized more in my view).

      We thank the reviewer for bringing the above two references to our attention. We have added the two references and a statement in the Discussion (lines 472-476 in RM) to emphasize the above.

      [...] From a conceptual point of view, the locus specific framework, as well as the historical contingency discussion are valuable contributions. The fact that the authors could construct a model (the CN model) that satisfies their minimal contingency condition is very interesting as well.

      The weakness of the manuscript is the presentation of the work, especially for a general audience. More context and background, explanations of quantitative results and references would help. There are also a few cases of unclear claims and confusing notation (SSWM seems to be assumed without that being stated, the notation for Fourier coefficients is unclear in some cases) and the text has some other minor issues. Fortunately, a limited effort (in terms of time) would resolve the problem, and also improve the prospects for high impact.

      We thank the reviewer for the detailed comments.

    1. Author Response to Public Reviews

      Reviewer #1 (Public Review):

      [...] If the authors wish to opt for highlighting NLR analysis, the following suggestions would help (9-14).

      1) Earth mover distance (EMD) has been applied to identify a locus with alternative polyadenylation. What is the basis of using EMD value of 25 as a cutoff? According to Figure 4 B,D, EMD can range from 0-4000. One would also wonder if the distance unit equals bp. In addition, EMD values of some genes (e.g. FPA and representative NLRs) can be specified in the main dataset so that significance of the cut-off values shall be appreciated.

      We found that for some very highly expressed loci, we were able to detect statistically significant changes in poly(A) site usage with very small effect sizes which were unlikely to represent functionally important changes. An EMD threshold was therefore required for removing these small effect size loci. The EMD is informally described as the minimum amount of “work” required to turn one distribution into another – it represents the percentage of the distribution moved multiplied by the distance moved. For example, an EMD of 25 could describe a situation where 10% of the transcripts have shifted by 250 nt, or 50% of the transcripts have shifted by 50 nt. A threshold of 25 gives a good trade-off between the percentage of proximal/distal site switching, and the distances between sites (since larger changes in distance are more likely to result in functional changes). We have included EMD values for example NLRs in the main text to give an idea of effect sizes of these genes.
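      The arithmetic in the two examples above can be checked with a minimal sketch of a one-dimensional earth mover's distance. This is an illustration only, not the pipeline used in the study: the function name `emd_1d`, the use of fractions (rather than read counts) as weights, and the toy site coordinates are all assumptions made for the example.

```python
def emd_1d(positions, before, after):
    """Earth mover's distance between two distributions on shared 1D positions.

    positions: poly(A) site coordinates in nt (hypothetical toy values)
    before, after: fraction of transcripts ending at each site (each sums to 1)
    """
    # Walk the sites from left to right.
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    total = 0.0
    carried = 0.0  # net fraction that still has to be moved past the current site
    for left, right in zip(order, order[1:]):
        carried += before[left] - after[left]
        # Work = fraction moved across this gap multiplied by the gap length.
        total += abs(carried) * (positions[right] - positions[left])
    return total


# 10% of transcripts shift by 250 nt.
shift_small_far = emd_1d([0, 250], [1.0, 0.0], [0.9, 0.1])
# 50% of transcripts shift by 50 nt.
shift_large_near = emd_1d([0, 50], [1.0, 0.0], [0.5, 0.5])
print(round(shift_small_far, 6), round(shift_large_near, 6))
```

      Both toy cases evaluate to 25 (up to floating-point rounding), which is why a single threshold treats a small shift over a long distance and a large shift over a short distance as equivalent effect sizes.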

      2) Regarding the manual annotation of alternatively polyadenylated NLR genes (L1160-): Genes with alternative polyadenylation were identified and the ending location was supported when there were minimum four DRS reads. It would be relevant to provide the significance of "the four" based on read coverage statistics, for example, with average read number covering an annotated NLR transcript with the specification of an average size.

      We have previously demonstrated that both Helicos and Nanopore DRS reads are able to capture the true 3’ ends of single RNA molecules. However, both techniques have some technical limitations which may result in artefacts – for example, the over-splitting of nanopore signal from a single molecule into multiple reads, or the incorrect alignment of low-quality basecalls at the ends of reads. For this reason, and also to standardise our approach to manually identifying FPA-regulated NLRs, we developed a standard operating procedure. We chose to identify poly(A) sites using a minimum of four nanopore read alignments, as a trade-off between sensitively detecting genuine alternative polyadenylation events, and ignoring events caused by poor alignment of low-quality reads or over-splitting. We also looked for evidence of events seen in nanopore data in other sequencing datasets, particularly the Helicos DRS alignments, to corroborate our findings. We have improved the language of the relevant methods section to clarify this.
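      The minimum-support rule can be illustrated with a short sketch. It is a deliberate simplification: the annotation described above was manual and also drew on Helicos DRS alignments, whereas this example only counts reads whose 3' ends fall on exactly the same coordinate. The function name `supported_sites` and the toy coordinates are hypothetical.

```python
from collections import Counter

MIN_READS = 4  # minimum number of nanopore read alignments stated above


def supported_sites(read_end_positions, min_reads=MIN_READS):
    """Return {3' end position: read count} for positions with enough support."""
    counts = Counter(read_end_positions)
    return {pos: n for pos, n in counts.items() if n >= min_reads}


# Toy 3' end coordinates for reads at one locus (hypothetical values).
ends = [1200, 1200, 1200, 1200, 1200, 3405, 3405, 5150, 5150, 5150, 5150]
sites = supported_sites(ends)
print(sites)
```

      In this toy input the position with only two reads is discarded, while the positions with five and four reads are kept. A real implementation would need to tolerate small positional scatter around each cleavage site.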

      3) Figure 4E shows that Illumina-RNAseq dataset detects the number of loci with a different order of magnitude compared with the other two methods. Reference-agnostic pipeline shall be appreciated, however, the method engaged might have elevated the counting of paralogous reads mapped to different locations than they should be. Along with paralogous read collapsing, this is always a problem with tandemly repeated genes, such as NLRs by and large. For example, NLR paralogs in a complex cluster with conserved TIR/NBS but diversified LRRs would have higher coverage in the first two domains but drop in the diversified parts. The authors need to specify their bioinformatic consideration to avoid such problems.

      Although the tone of the Illumina read section was careful and the main 3'-end processing conclusion was made by nanopore DRS, the authors are also advised to clearly state the limitation of using Illumina-RNAseq to address alternative polyadenylating sites at the beginning of the section, for example what to be maximally taken out from Figure 4 E and 4F. This will give relative weights to each dataset generated by different methods. One advantage of using Illumina data would be that the expression level changes can be associated with changes in processing, it seems.

      The reviewer is correct that multimapping reads are an issue at NLR genes and may lead to uneven coverage of uniquely and multi-mapped reads when some regions of a gene are divergent, and others are not. Although it is the relative change in coverage of exons or expressed regions which is important in DEXSeq analysis (rather than absolute coverage), it is possible that changes in processing that cause relative expression changes at one NLR locus may have impacts on the relative expression of multimapping regions at other paralogous NLR loci. We addressed this issue when quantifying the expression of expressed regions by running featureCounts using the --primary option that only counts primary alignments, but we failed to mention this in the methods. We have updated the methods to clarify this.
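      The effect of counting only primary alignments can be sketched as follows. This illustrates the idea rather than the internals of featureCounts: it relies only on the SAM convention that secondary alignments carry FLAG bit 0x100, and the tuple layout, region names, and function name `count_primary` are assumptions made for the example.

```python
SECONDARY = 0x100  # SAM FLAG bit that marks a secondary alignment


def count_primary(alignments):
    """Count alignments per region, skipping secondary alignments.

    alignments: iterable of (read_name, flag, region) tuples, where the
    region assignment is assumed to have been done already (hypothetical).
    """
    counts = {}
    for _name, flag, region in alignments:
        if flag & SECONDARY:
            continue  # a multimapped copy of a read that is counted elsewhere
        counts[region] = counts.get(region, 0) + 1
    return counts


toy = [
    ("read1", 0, "NLR_A_exon1"),    # primary alignment
    ("read1", 256, "NLR_B_exon1"),  # secondary copy at a paralog, skipped
    ("read2", 16, "NLR_B_exon1"),   # primary, reverse strand
    ("read3", 272, "NLR_A_exon1"),  # secondary and reverse strand, skipped
]
primary_counts = count_primary(toy)
print(primary_counts)
```

      With this filter each read contributes at most one count, so a read that also aligns to a paralogous NLR region is not counted a second time there.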

      4) At the RPP7 locus, At1g58848 is identical in sequences with At1g59218 as is At1g58807 with At1g59214 (two twins in the RPP7 cluster by tandem duplication). It would be good to check whether the TE At1g58889 readthrough indeed occurs in the sister duplicate with a potential TE in the downstream of At1g59218. If not, it can be used as an example of duplication and neofunctionalization through an alternative polyadenylation site choices.

      The tandem duplication of AT1G58848 and AT1G58807 in Col-0 makes the RPP7 locus complex to analyse even with long read sequencing data. We find that even with nanopore DRS data, nearly all reads mapping to AT1G58807 multimap at AT1G59124. There is clear evidence of exonic proximal polyadenylation in these transcripts, but the locus of origin is not determinable. In the case of AT1G58848 and AT1G59218, we find a mixture of uniquely mapping and multimapping reads at both genes, and both genes have uniquely mapped reads indicating exonic proximal polyadenylation in 35S::FPA, and chimeric RNA formation in fpa-8. This suggests that RNA processing of these loci is very similar, and so we opted only to show AT1G58848 as an example. Due to the much shorter length of Helicos DRS reads, we applied much more stringent filtering to remove incorrectly mapping or multimapping reads, meaning that there were not enough uniquely mapped reads at the AT1G58848 and AT1G58807 loci to perform Helicos EMD tests. We have updated the text to explain this more clearly.

      5) HMM search shall be revisited to confirm if they are to detect the TIR domain. Given that a large proportion of NLRs in A. thaliana carry TIR at their N-terminal ends and the specified examples included TIR-NLR, it is surprising to see no TIR domain in Figure 5.

      The absence of the TIR domain from Figure 5C (now Figure 4A in the revised manuscript) is a mistake on our part rather than due to its absence from the Interpro annotation. We have now corrected the figure and all other gene tracks to make sure that all Interpro annotations are shown.

      6) L659-668: how does the new data relate to the previously TAIR annotated At1g58602.1 vs At1g58602.2 (Figure 6, Inset 1)? It would be good to see these clearly stated in the main text as compared to newly identified ones. From the nanopore profiling, At1g58602.2 appears to be the dominant form.

      AT1G58602.2 from the Araport11 annotation contains the most distal annotated isoform of RPP7, whilst AT1G58602.1 contains a slightly more proximal 3’UTR. The reviewer is correct that AT1G58602.2 is the more dominant isoform in our Col-0 data. We have added a sentence that acknowledges this to the section on RPP7 3’UTR isoforms.

      7) One thing to note is that in the overexpressor of which Hiks1 R is suppressed, there was hardly any At1g58602.1 produced in addition to the large reduction of At1g58602.2. Thus, relative functional importance of the two transcripts shall be discussed in line with the Hpa resistance data. Accordingly, L740-741 phrasing shall be revised to include the possibility of absolute or relative "depletion" of functional transcript(s) contributing to the compromise in Hpa resistance.

      While we agree that, in principle, the change in relative expression of the two annotated distal isoforms of RPP7 could have functional consequences, given that both of these isoforms can encode a protein, the functional impact of this relative change is much less likely to be the cause of the loss of Hpa resistance in FPA overexpressing plants, compared to the larger change in exonic proximal polyadenylation, which produces transcripts which are unlikely to express protein. Given that we have not demonstrated conclusively that it is the increase in exonic polyadenylation of RPP7 that causes reduced immunity in 35S::FPA:YFP, we have made the language of our conclusions in the section “FPA modulates RPP7-dependent, race-specific pathogen susceptibility” more careful.

      8) It would be necessary to state in the main text the implication of phosphorylation on the two Ser residues on Pol II at L245. A clear description distinguishing the effect of the two phosphorylation and the specificity of the antibodies is desirable, as the data was interpreted as if the two sites made differences, such that Ser2 was heavily emphasized (e.g. subtitle). Albeit low level, Ser5 data also shows an overlap with FPA ChIP-seq coverage at the 3' end. If there is a statistical significance to be taken account to interpret the coverage, please state it. Given that elongation occurs progressively, I wonder how much should be taken out from the distinction.

      It is well established in the literature that Pol II phosphorylated at Ser5 of the C-terminal domain is a hallmark of initiating and elongating Pol II, whilst Ser2 is a hallmark of terminating Pol II (Phatnani and Greenleaf, 2006). This was first established in yeast, where it was shown that Ser5 phosphorylation is necessary for the recruitment of the mRNA capping machinery (Cho et al., 1997; Ho and Shuman, 1999). The yeast homolog of 5’-to-3’ exonuclease which is required for termination (West et al., 2004), was also shown to interact specifically with Pol II phosphorylated at Ser2 via an accessory protein (Kim et al., 2004). Therefore, comparing FPA occupancy to relative levels of Ser2 and Ser5 phosphorylated Pol II is an important validation of the location of FPA binding. We have added a sentence to the relevant Results section describing why CTD phosphorylation varies through the gene body. Arabidopsis ChIP-seq experiments from the literature which profile all Pol II (not just phosphorylated versions) indicate that in Arabidopsis, the highest occupancy is over the terminator (Yu et al., 2019). This may explain why there is also a peak of Ser5 at the terminator (i.e. if there are low levels of Ser5 in a region of higher occupancy, or if there is cross-reactivity of the antibody with Ser2 or unphosphorylated Pol II).

      9) Figures presentation for RPP4 and RPP7 are great in detailing the FPA-dependent NLR transcript complexity. To make the functional link more evident, the authors may consider bringing up parts of the Figure 5-supplement to a main Figure to detail the revised annotation of NLRs. Given recent advances in NLR structure and function studies, extra domain fusion, fission and truncated versions of NLRs require a great deal of attention. For example, potential functional link to the NMD-mediated autoimmunity and revised annotation of At5g46470 (RPS6) needs a clear visual guidance preferably with a main figure (Figure 5-Supplement 3).

      We thank the reviewer for this comment, and we agree that these figures deserve to be made more visible. This is one of the reasons that we have chosen to submit our manuscript to eLife, since supplementary figures are displayed alongside linked main text figures in an image slider which allows easy access to each gene track. We believe that this will also make it much easier to examine individual gene tracks, without having to compress them to fit them into a single figure panel. However, we do agree that RPS6 is particularly interesting and deserves to be a main figure. We have therefore split the NLR figure into two new figures and incorporated RPS6 gene tracks into the first of these.

      10) The section "FPA controls the processing of NLR transcripts" includes dense information and can be broken down to several categories. To this end, Supplement File 3 (NLR list) shall be revised to deliver the categorical classes and further details and converted to a main table.

      For NLR audience, for example, it would be important to associate the information to raw reads to assess where the premature termination would occur. At least, the ways to retrieve dataset or to curate the termination sites shall be guided.

      On the contrary, there is no need to include other genes in Figure 4 Sup4-8 under this section. They are not NLRs.

      We have created main-text tables for each of the three classes of FPA-regulated NLR genes, as suggested by the reviewer. We have also removed the examples of non-NLR genes regulated by FPA from the paper, to streamline the story. All the datasets analysed in the study are already available on ENA with database identifiers provided in the Data Availability section to guide readers.

      11) Figure 7 and IBM1 section can be spared to the supplement.

      We have followed the reviewer’s suggestion and this figure now appears as Figure 2 supplement 4. We have moved the results section on IBM1 up to join it with the global analysis of FPA function in RNA processing.

      12) The list of "truncated NLR transcripts" in particular, either by premature termination within protein-coding or with intronic polyadenylation, should be made as a main table. The table can be preferably carrying details in which degree the truncation is predicted to be made. With current sup excel files, it is difficult to assess the breadth of the FPA effect on the repertoire of NLRs and their function. This way, functional implication of differential NLRs transcriptome can be better emphasized.

      We have followed the reviewer’s suggestion here and prepared this information into main-text tables 1-3, including predictions of the functional consequences for intronic/exonic poly(A) site choice.

      13) FPA-mediated NLR transcript controls, as to promote transcript diversity, is expected to exert its maximum effect if FPA level or activity is subject to the environmental stresses, such as biotic or abiotic stresses. The discussion on effectors targeting RNA-binding proteins (L909-918) is a great attempt in broadening the impact of this research. In addition, if anything is known to modulate FPA activity, such as biotic or abiotic stresses or environmental conditions, please include in the discussion.

      We are not aware of any literature reporting the modulation of FPA activity by biotic or abiotic stresses. This is certainly an interesting question which we would like to examine. However, the analysis of FPA activity is complicated by a number of factors. RNA-level expression is often used as a proxy for overall activity. The RNA-level expression of FPA is not necessarily indicative of FPA activity, however, since the proximally polyadenylated isoform of FPA does not produce functional FPA protein. To get a clear picture of FPA activity during infection will therefore require high-depth Illumina RNA-Seq, nanopore direct RNA sequencing or proteomics analysis.

      14) NLR transcript diversity as source of cryptic variation contributing to NLR "evolution" is an interesting concept, however, evolutionary changes require processes of genic changes affecting transcript layers or stabilizing transcriptome diversity. In the authors' proposition in looking into accessions, potential evolutionary processes can be further clarified.

      We agree with the reviewer that a species-wide transcriptome analysis would provide an invaluable insight into how transcription can affect evolutionary changes. For example, we find that NLRs with high levels of allelic diversity are more likely to be regulated by proximal polyadenylation in Col-0, and so a species-wide approach will reveal whether this regulation is conserved or tailored to environmental conditions. An integrative analysis of genomic and transcriptomic data will also help to identify whether chimeric RNAs present in some accessions are found as retrotransposed genes in others. We have added these specific example experiments to the relevant discussion section.

      Reviewer #2 (Public Review):

      [...] Overall, it is a potentially important research. The data is rich and could be useful. However, the biological stories described are not thoroughly supported by the data presented, especially when the authors tried to touch on several aspects without some important validations and strong connections among different parts. Some special comments are provided below:

      1) The title of this manuscript is "The expression of Arabidopsis NLR immune response genes is modulated by premature transcription termination and this has implications for understanding NLR evolutionary dynamics". Therefore, the readers will expect some functional connections between the FPA and the novel NLR isoforms due to premature transcription termination. However, the transcript levels of plant NLR genes are under strict regulation (e.g. Mol. Plant Pathol. 19:1267). Since the functions of NLR genes are related to effector-triggered immunity, it is more important to study the function of FPA on premature transcription termination when the plants are challenged with pathogens. In this manuscript, most transcript analyses are based on samples under normal growth conditions. It is therefore a weak link between the genomic studies and the functional aspects. For instance, it is more important to identify unique NLR isoforms produced upon pathogen challenges that are regulated by FPA. The authors will need to provide some of these data to fill this gap.

      To clarify, the title of this manuscript is not as stated here by the reviewer but is “Widespread premature transcription termination of Arabidopsis thaliana NLR genes by the spen protein FPA”. We do indeed describe a functional pathogen test to examine the functional impact of FPA. We show that overexpression of FPA reduces the functional expression of RPP7 transcripts, and that this impacts upon the ability of plants to resist Hpa-Hiks1. We agree with the referee that it will be very interesting to investigate, not just FPA, but changes in 3’ processing during infection by different pathogens. However, key questions on NLRs extend to how they function, how they evolve, how they trigger hyperimmunity and how they are controlled to limit impact on fitness, all of which may be impacted by the control of RNA 3’ processing.

      2) Since the function of FPA is to regulate NLR immune response genes, we should expect a change in plant defense phenotype in FPA loss-of-function mutants. Could the authors provide more information on this? On the contrary, in line 728 of this manuscript, the authors found that at least for some pathogens, "loss of FPA function does not reduce plant resistance". It is not consistent with the hypothesis that FPA is important to regulate NLR immune response genes.

      There is a straightforward misunderstanding here, possibly because our text in the relevant section was not sufficiently clear.

      We tested the impact of different activity levels of Arabidopsis FPA on NLR function by investigating the NLR, RPP7. We chose RPP7 because features of its function and regulation are relatively well characterised. RPP7 provides disease resistance to the oomycete pathogen Hyaloperonospora arabidopsidis (Hpa) strain Hiks1. The reference Arabidopsis accession, Col-0, encodes a functional RPP7 gene and hence is resistant to Hpa-Hiks1 infection. Not all Arabidopsis accessions are resistant to all Hpa strains. For example, the Duc-1 and Ksk-1 accessions have been reported as having susceptibility to Hpa-Hiks1 infection, likely due to the lack of a functional RPP7 gene (Lai et al., 2019). It was for this reason that we incorporated the Ksk-1 accession as an infection-sensitive positive control accession in our pathogen tests.

      The question we were addressing was: Does FPA-dependent premature cleavage and polyadenylation in RPP7 exon 6 compromise RPP7 function? To address this question, we applied Hpa-Hiks1 to our different genetic lines. Neither Col-0 nor the fpa-8 mutant (which is in the Col-0 genetic background) was sensitive to infection. This is consistent with our hypothesis because the poly(A) site used in exon 6 in Col-0 is used significantly less in fpa-8. Hence, there is no compromise in the expression of full-length RPP7 in fpa-8 mutants. As Col-0 is already resistant to Hpa-Hiks1, we would therefore expect fpa-8 to also be resistant and indeed, this is what we found.

      This was also true when we tested an independent allele, fpa-7, that is also in the Col-0 background. However, when we tested the line that was over-expressing FPA, which was introduced into an fpa-8 background (and hence, ultimately Col-0), we found that resistance was lost and Hpa-Hiks1 was able to infect these plants.

      Therefore, the findings from this experiment are completely consistent “with the hypothesis that FPA is important for regulating NLR immune response genes, and the observation that premature exonic termination of RPP7 caused by FPA has a functional consequence for Arabidopsis immunity against Hpa-Hiks1.” We have clarified the text in this section to make our hypothesis and findings clearer.

      3) Furthermore, the authors mentioned in lines 729-731 "Greater variability in pathogen susceptibility was observed in the fpa-8 mutant and was not restored by complementation with pFPA::FPA, possibly indicating background EMS mutations affecting susceptibility." Does it mean that fpa-8 contains other mutations? Will these additional mutations complicate the results of the RNA processing? Could the authors outcross the fpa-8 mutation to a clean background?

      Given that the fpa-8 mutant was generated using EMS treatment, it is probable that it does contain other mutations besides the one that removes FPA function (this is likely to be the case with most mutants, whether they are generated with EMS or T-DNA insertions). These mutations are likely to be the source of the slightly greater variability in susceptibility to Hpa-Hiks1 in fpa-8 compared to the fpa-7 T-DNA mutant. These potential off-target mutations are unlikely to be the cause of the RNA 3’ processing changes seen in the fpa-8 mutant, however, for three reasons: (i) we have previously published Helicos DRS data from fpa-7 mutants which show that they have the same RNA 3’ processing defects as fpa-8 mutants, for example at PIF5 and IBM1 (Duc et al., 2013), indicating that changes in 3’ processing in fpa-8 and fpa-7 are caused by the common loss of FPA function; (ii) our Illumina RNA-Seq data for the FPA complementing line show that an FPA transgene restores 3’ processing effects seen in the fpa-8 mutant, for example at PIF5, but does not restore the variability in susceptibility of fpa-8 to Hpa-Hiks1 (Figure 6C); (iii) many of the genes with altered poly(A) site choice in fpa-8, including RPP7, show reciprocal changes in processing in the FPA overexpressing line. Taken together, these findings strongly indicate that the loss of FPA is what causes altered poly(A) site choice in an fpa-8 mutant.

      4) In line 318, the authors found 285 and 293 APA events in the fpa-8 mutant and the 35S::FPA:YFP construct respectively, but only 59 loci (line 347) exhibited opposite APA events (about one fifth). The low overlapping frequency suggests that some results could be false positive.

      The level of reciprocal alternative polyadenylation cannot be used to determine false positive rate. For a gene to show reciprocal effects, when comparing the results of fpa-8 vs Col-0, and 35S::FPA:YFP vs Col-0, requires at least two poly(A) sites to be used at high levels in Col-0. For example, at RPP7, high levels of proximal exonic polyadenylation are detectable in Col-0, meaning that a shift to distal site usage is detectable in fpa-8, as well as the shift to proximal site selection in 35S::FPA:YFP. However, there are many loci where this is not the case. For example, the abundant chimeric RNAs found at the PIF5 locus in fpa-8 are undetectable in Col-0, meaning that overexpression of FPA has no effect on PIF5 when compared to Col-0. Consequently, PIF5 is not amongst those genes with reciprocal regulation, despite the effect of FPA on PIF5 RNA processing being very clear in multiple datasets.

      5) In line 732-736: "In contrast, 35S::FPA:YFP plants exhibited a similar level of sporulation to the pathogen-sensitive Ksk-1 accession (median 3 sporangiophores per plant). This suggests that the premature exonic termination of RPP7 caused by FPA has a functional consequence for Arabidopsis immunity against Hpa-Hiks1." It is contradictory to the statement in line 728 that "loss of FPA function does not reduce plant resistance". Is it possible that overexpression of FPA:YFP had generated an artificial condition that is not related to the natural function of FPA?

      There is a misunderstanding here that may be due to the wording that we used in this section and we explain this above. Col-0 is resistant to Hpa-Hiks1 because it has a functional RPP7 gene. In fpa-8 mutants, the expression of full-length RPP7 transcripts is not compromised relative to Col-0 and hence it is as resistant to Hpa-Hiks1 as Col-0. In contrast, 35S::FPA:YFP promotes the use of a poly(A) site within exon 6, reducing the amount of full-length RPP7 detected. This poly(A) site is used in the Col-0 wildtype line but is not detectably selected in the loss-of-function fpa-8 mutant line. Together, these findings reveal that this poly(A) site is chosen in the Col-0 reference strain and that this requires FPA. Therefore, the selection of this site is the natural function of FPA and not simply generated by an artificial condition. We have re-worded the text in this section to clarify this misunderstanding.

      6) The fpa-8 mutant has a delayed flower phenotype (Plant Cell 13:1427). Could the 35S::FPA:YFP fusion protein construct reverse this phenotype and the plant defense response phenotype? It is important to interpret the data when the 35S::FPA:YFP construct was used to represent the overexpression of FPA.

      As we report in the Materials & Methods section, a line expressing 35S::FPA:YFP was obtained from Caroline Dean. Published evidence that this line complements the late flowering phenotype of fpa-8 is provided in the corresponding publication (Baurle et al., 2007) as Figure S5. In our growth conditions, these lines flower early like wild-type compared to the very late flowering of fpa-8. The late flowering phenotype of fpa-8 mutants is explained by elevated levels of the floral repressor FLC. The Illumina RNA-Seq, Helicos DRS and nanopore DRS data that we release here all show reduced levels of FLC in the 35S::FPA:YFP line compared to fpa-8 consistent with complementation.

      7) Under the subheading "FPA co-purifies with the mRNA 3' end processing machinery". The results were based on in vivo interaction proteomics-mass spectrometry. MS prompts to false positives and will need proper controls and validations. Have the authors added the control of 35S:YFP instead of just the untransformed Col-0? At least for the putative interacting partners in Figure 1A, could the authors perform validations of some important targets, using techniques such as reverse co-IP, or to show direct protein-protein interaction between FPA to a few of the important targets by in vitro pull-down, BiFC, or FRET, etc.

      FP fusions are widely used in IP experiments, but we are not aware of any study that reports 3’ processing factors to be recurrent contaminants in such experiments. We had anticipated submitting an additional proteomics study at around the same time as this study but aspects of this additional work were disrupted by control measures associated with Covid-19. What we do show here, is that an orthogonal approach (ChIP) with different antibodies (anti-FPA) also localises FPA to the 3’ end of Arabidopsis genes together with Pol II phosphorylated on Ser2 of the CTD. These orthogonal datasets are therefore consistent with our interpretation that FPA co-purifies with Pol II and multiple factors involved in the processing of RNA 3’ ends and are also supported by our transcriptomic analyses of fpa mutants and overexpressors which have altered 3’ processing.

      8) In Fig. 3, the data show that the last exon of the FPA gene is missing in the FPA transcripts generated from the 35S::FPA:YFP construct. Will the missing of this exon affect the function of the transcript and the encoded protein?

      As we state in the Materials & Methods section, this line was obtained from Caroline Dean and the details of its construction were previously described (Baurle et al., 2007). The transgene construct has a different promoter (CaMV 35S) and associated 5’UTR sequence and the sequence downstream of the stop codon is replaced by a transgene-derived 3’UTR. Consequently, these regions of the transgene-derived FPA do not align to the Col-0 reference. We have added new text to the Figure legend to clarify this point. Given that the 35S::FPA:YFP transgene complements the flowering time phenotype of fpa-8 mutants, and causes widespread changes in 3’ processing, there is no evidence that the lack of the canonical 3’UTR has a deleterious impact on the function of the FPA protein.

      9) The function of FPA is still ambiguous. There was a quantitative shift toward the selection of distal poly(A) sites in the loss-of-function fpa-8 mutant and a strong shift to proximal poly(A) site selection when FPA is overexpressed (35S::FPA:YFP) in some cases (Fig. 3, Fig. 5, Fig. 8). But the situation could be kind of reversed in other cases (Fig. 6). What is the mechanism behind it?

      Using different sequencing technologies, we clearly show that the predominant effect of FPA is to promote proximal poly(A) site selection and indeed that these cases are associated with the largest effect sizes. The mechanism involved is not studied here. One possibility is that genes which display an increase in distal polyadenylation when FPA is overexpressed are indirect targets of FPA. This would be unsurprising given that FPA regulates the alternative polyadenylation of a number of other factors involved in 3’ processing. Another possibility is that FPA can associate with different complexes of 3’ processing factors at different locations, resulting in opposing effects on 3’ processing. A future goal for us, in dissecting the mechanism by which FPA mediates NLR transcription termination, will be to relate poly(A) site choice to direct RNA binding site interactions mapped, for example, by iCLIP.

      10) Under the subheading: "The impact of FPA on NLR gene regulation is independent of its role in controlling IBM1 expression". IBM1 is a common target of FPA and IBM2. Indeed, FPA and IBM2 share several common targets (Plant Physiol. 180:392). It may be more meaningful to compare the impact of FPA and IBM2 on NLR gene instead.

      IBM2/ASI1 is an RNA and chromatin binding protein that regulates the expression of IBM1 by promoting elongation through intronic heterochromatic marks, as part of a complex with EDM2 and AIPP1. As a result, edm2, ibm2, and aipp1 mutants fail to produce full length IBM1 transcripts, resulting in phenotypes similar to the ibm1 mutant. Mutations in FPA were recently identified as suppressors of the phenotypes of ibm2 mutants. This is likely because FPA promotes the proximal polyadenylation of IBM1 transcripts.

      Since FPA regulates the proximal polyadenylation of IBM1, we asked if it was possible that some of the targets of FPA overexpression identified by nanopore and Helicos DRS were caused by indirect effects on chromatin state resulting from a decrease in full length IBM1 expression. However, there is no indication that FPA acts to promote alternative polyadenylation of IBM2. We therefore consider it unlikely that proximal polyadenylation of NLRs in the 35S::FPA:YFP line is caused by indirect effects on IBM2.

      11) In lines 423-425, the authors described "Consistent with previous reports, the level of mRNA m6A in the hypomorphic vir-1 allele was reduced to approximately 10% of wild-type levels (Parker et al., 2020b; Ruzicka et al., 2017) (Figure 4 - supplement 3)." This data could not be found.

      We have re-checked the submitted article. These data are indeed there: page 46, line 1510 and correctly labelled as Figure 4 supplement 3. In the revised manuscript these data are included as Figure 2-figure supplement 3, and the raw data is also available as Figure 2 source data 11.

      12) In line 426: "However, we did not detect any differences in the m6A level between genotypes with altered FPA activity." Which data is this statement referring to?

      This statement refers to the data in Figure 2-figure supplement 3 of the revised manuscript.

      Reviewer #3 (Public Review):

      [...] One minor complaint is that the authors don't focus on NLRs starting on line 436, and then they have extensive results on NLRs; by the time I got to the discussion, I'd forgotten about the early focus on the M6A. While the first part of the article is necessary, I would suggest a more concise results section to give the paper more focus on the NLR control (since that is emphasized in the abstract and the title of the manuscript).

      We thank the reviewer for their comments. We agree that the paper is dichotomous due to the initial focus on the function of FPA and subsequent identification of the effect on NLRs. We have reduced the length of the initial results sections, particularly the proteomics results, so as to come to our findings on NLR genes more quickly.

    1. Author Response to Public Reviews

      We thank the reviewers for their careful reading of our work, and their detailed and helpful comments. Their insights have helped us in improving this manuscript. We include their comments and our replies to them below.

      Reviewer #2 (Public Review):

      Line 293, by "comparing the Apo_NE and IB_EQ simulations at equivalent points in time" and perform subtraction "from the corresponding Ca atom from one system to another at 0.05, 0.5, 1, 3, 5ns". It is not clear to me why those time points were chosen? Have authors attempted at validating whether or not the signal from the ligand-binding site has had enough time to propagate across the allosteric signaling pathway? If one considers that the ligand is a spatially localized signal, it requires time to propagate. This is in contrast with the Kubo-Onsager paper cited by authors in which the molecule is responding to a global perturbation such as an external field. However, a local perturbation on one side of the protein will need time to propagate to the other side of the protein (30 angstroms away in this case).

      The time points are chosen to highlight the propagation of signal in the short nonequilibrium simulations. We agree with the reviewer that the signal will take time to propagate; indeed, it evolves over time, as can be seen in the figures and accompanying movies. It is important to emphasise that this is averaged over many trajectories. Some conformational rearrangements will not be fully sampled, as can be seen in Figure 3–Figure supplement 3. We also stress that the short nonequilibrium simulations are used here to measure the immediate structural response to a perturbation. The timescale of this response in the nonequilibrium simulation does not correspond to the physical timescale of conformational change induced by, or associated with, ligand binding. The perturbation here is nonphysical, and the response is rapid. For long simulation times, as the correlation between the equilibrium and nonequilibrium trajectories is lost, the subtraction technique is no longer useful because the noise arising from the natural divergence of the simulations overcomes the structural response of the system to the perturbation. Thus, this method allows for the identification of the initial conformational changes associated with signal propagation. Also, the difference calculated at any given time point should not be seen in isolation. Instead, it should be compared with the other time points, as it is such a comparison that highlights the cascade of events associated with signal propagation. This is clearly illustrated in Figure 3–Figure supplement 3 and in the movies, where the collective signal from the short nonequilibrium simulations progresses in a trend that is comparable with the equilibrium simulations. The time evolution of the signal is striking and thought-provoking.

      A simple and naive example is to map out all the bus stops on one's route. 800 simulations between the first and second stop will not be able to provide the locations of other stops. Since authors have used this "subtraction technique" on several other proteins, it would be nice to clarify how this approach works on mapping out signaling propagation perturbed by local ligand binding/unbinding and how to choose the time points for subtraction.

      Analogies can be helpful in understanding the nonequilibrium simulations, some aspects of which are not immediately obvious. One could perhaps think of these nonequilibrium simulations as analogous to striking a bell to see how it rings. The bus stop analogy suggested by the referee is intriguing, and we develop it here.

      In this case, when ‘getting on the bus’ (beginning the simulation), we do not know where the bus is going or the route it will take to get there: we only know that we are starting at the allosteric site, so the only thing that we know is the place where we board the bus. The bus is not travelling on a straight road, and the destination is unknown. We could wend our way slowly by standard equilibrium MD, but we would only reach the first or second stop on the route in the time available, and we would still not know where the bus was going. We would never find out where the bus is going: it takes too long. The nonequilibrium approach is a magic bus! In this approach, as the bus meanders close to its starting point, we suddenly replace the driver. The new driver puts her or his foot on the accelerator and immediately sets off for a new destination, heading away fast from the starting point. The driver is guided by the roads available. The bus can only drive on the road network, i.e. its progress is defined by its physical environment and the available directions of travel. So, while she/he may drive at an unsafe speed, the bus should stay on the road. It’s possible that it will take a short cut or indeed take a wrong turn or enter a dead-end street. But overall, doing this ‘driver replacement’ hundreds of times, on average the bus should follow the right route and go much faster along it. So, it might be a terrifying journey, but we should get to the destination faster! It might not reach the final destination, depending on how long we let it go on, but we should pass several of the bus stops along the correct route. We can test how likely the route is by averaging over hundreds of crazy new bus drivers. A well-defined route implies a well-designed network. The bus can take any of the roads available to it on the network, and the route taken by the bus may be unpredictable (if it were obvious, we would not need all these crazy drivers!).

      In other words, the response to a perturbation is non-linear. In terms of the final destination, specifically here in TEM-1 and KPC-2, the omega loop, the 3-4 loop and the hinge region are known to be involved in substrate binding and catalysis. We observe the signal reaching these structural elements, so we can say with confidence that the perturbation is communicated to distant, catalytically important parts of the enzyme. So, in terms of the bus analogy, we show that starting in the distant hills, the crazy bus drivers actually end up in the capital city. The simulations identify the capital city as the actual destination. And the fact that the crazy drivers tend to follow the same route allows us to say that we have identified the bus route to the capital, and the important points along the route.

      Another question is whether tracing the dynamics of Calpha alone is enough. As we have seen from the network analysis papers, Calpha sometimes missed some paths or could overemphasize others. The center of mass of each residue has been proposed to be a better indicator of protein allostery. Authors may wish to clarify the particular choice of Calpha in this study.

      This is an interesting question. In our previous analyses of nicotinic acetylcholine receptors and other systems, we found that analysing the C-alphas allows the identification of pathways of signal transduction (Oliveira et al. Structure 1171-1183.e3 (2019)), and we went on to show that these pathways were common across different receptor subtypes (J. Am. Chem. Soc. 141, 51, 19953–19958 (2019)). Obviously, all residues in the protein are represented equally when analysing C-alphas. Thus, analysing the C-alphas allows direct comparison of closely related proteins with different sequences, and identification and analysis of the pathway in the framework of the protein backbone. Here, of course, we are interested in whether these C-alpha pathways identify positions of sequence variation that affect function, and the results indicate that indeed they do. Analysing C-alpha behaviour also has the practical advantage that C-alpha motions are less subject to noise and converge more rapidly than, e.g., sidechain motions. Other features could be chosen to trace signal pathways, such as the centre of mass of residues. However, choosing more flexible parts to track signal propagation would also have an impact on speed of convergence (i.e. number of trajectories required): more simulations would be required to achieve convergence. Therefore, as in previous work on other proteins, we chose C-alpha atoms to study signal propagation here.

      The order of events associated with signal propagation is computed by directly comparing the positions of individual C-alpha atoms at equivalent points in time (namely after 0, 50, 500, 1000, 3000 and 5000 ps of simulation) for every pair of unperturbed equilibrium ligand-bound and perturbed nonequilibrium apo simulation. The C-alpha positional deviation is a simple way to directly identify the conformational changes induced by ligand annihilation and their evolution over the 5 ns of simulation. Due to statistics collected over the large number of simulations, we can be sure of the statistical significance of the structural changes identified. The conformational changes extracted from the nonequilibrium simulations reflect the (statistically significant) structural response of the system to the perturbation. These changes propagate over time from the allosteric site to the active site, demonstrating a direct connection between them. Due to the very short timescale of the nonequilibrium simulations (5 ns), the observed conformational rearrangements do not represent the complete mechanism of conformational change, but rather reflect its first steps.
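
      The pairwise comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not the analysis code used for the manuscript: it assumes that the C-alpha coordinates of each equilibrium/nonequilibrium pair have already been extracted at one common time point and superimposed on a common reference, and the function name `calpha_deviation` and its input layout are hypothetical.

```python
import math

def calpha_deviation(eq_frames, neq_frames):
    """Average per-residue C-alpha displacement over paired simulations.

    eq_frames and neq_frames are lists with one entry per trajectory pair.
    Each entry is a list of (x, y, z) C-alpha coordinates, one per residue,
    taken at the same time point (hypothetical input layout).
    Returns two lists (mean deviation, standard error), one value per residue.
    """
    n_pairs = len(eq_frames)
    n_res = len(eq_frames[0])
    means, sems = [], []
    for r in range(n_res):
        # Distance between the same C-alpha atom in the two paired simulations
        dists = [math.dist(eq_frames[p][r], neq_frames[p][r])
                 for p in range(n_pairs)]
        mean = sum(dists) / n_pairs
        if n_pairs > 1:
            var = sum((d - mean) ** 2 for d in dists) / (n_pairs - 1)
            sem = math.sqrt(var / n_pairs)
        else:
            sem = float("nan")  # no error estimate from a single pair
        means.append(mean)
        sems.append(sem)
    return means, sems
```

      Repeating the calculation for each time point (for example 50, 500, 1000, 3000 and 5000 ps) gives one deviation profile per time point; comparing these profiles is what reveals the order in which regions respond. The standard error is included because the statistical significance of a deviation depends on the number of trajectory pairs averaged.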

      In Figure 5, the authors seem to use Pearson correlation to compute dynamic cross-correlation maps. Mutual information (MI) or linear MI have advantages over Pearson correlations, as has been discussed in the dynamical network analysis literature.

      The reviewer is indeed correct; the DCCMs were calculated based on the Pearson’s correlation. We have tested and validated this approach over the last 15 years, with results reproduced experimentally by a number of our collaborators for over 10 different enzyme systems, including cyclophilin A, dihydrofolate reductase, ribonuclease, APE1 and Rev1 DNA binding enzymes (Biochemistry 43, no. 33 (2004): 10605-10618; Nature 438, no. 7064 (2005): 117-121; Biochemistry 58, no. 37 (2019): 3861-3868; PLoS Biol 9, no. 11 (2011): e1001193; Structure 26, no. 3 (2018): 426-436; Nucleic acids research 48, no. 13 (2020): 7345-7355; Proceedings of the National Academy of Sciences 117, no. 41 (2020): 25494-25504). The reviewer’s suggestion is an interesting one, and we would be happy to investigate it in future studies. Mutual information analyses offer useful features. Based on our experience, we expect the results to be qualitatively similar and not likely to change the conclusions described in this manuscript.
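
      For readers unfamiliar with the quantity under discussion, a Pearson-type dynamic cross-correlation between atoms i and j is commonly defined as C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²> <Δr_j²>), where Δr is the displacement of an atom from its mean position. The sketch below implements that textbook definition in plain Python; it is an illustration under the assumption of pre-aligned frames, not the code used to produce Figure 5, and the function name `dccm` is hypothetical.

```python
import math

def dccm(frames):
    """Dynamic cross-correlation map from aligned coordinates.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
    one per atom (for example C-alpha atoms). Assumes every atom moves
    at least slightly, otherwise the normalisation divides by zero.
    Returns an n_atoms x n_atoms nested list with values in [-1, 1].
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    # Mean position of each atom over the trajectory
    mean = [[sum(f[a][k] for f in frames) / n_frames for k in range(3)]
            for a in range(n_atoms)]
    # Displacement of each atom from its mean position in every frame
    disp = [[[f[a][k] - mean[a][k] for k in range(3)]
             for a in range(n_atoms)] for f in frames]

    def cov(i, j):
        # Time average of the dot product of the two displacement vectors
        return sum(sum(d[i][k] * d[j][k] for k in range(3))
                   for d in disp) / n_frames

    diag = [cov(i, i) for i in range(n_atoms)]
    return [[cov(i, j) / math.sqrt(diag[i] * diag[j])
             for j in range(n_atoms)] for i in range(n_atoms)]
```

      Because this measure is built from dot products, it captures linearly correlated motions along the same direction and is insensitive to correlated motions along perpendicular directions, which is the limitation that mutual-information approaches are intended to address.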

    1. Author Response:

      Reviewer 1:

      In the study by Buus et al., the authors set out to address an important need to understand how oligo-conjugated antibodies should be optimally utilized in droplet-based scRNA-seq studies. These techniques, often referred to as CITE-seq, complement techniques such as flow cytometry and mass cytometry yet also further extend them by the ability to jointly measure intra-cellular RNA-based cell states together with antibody-based measurements. As is the case with flow cytometry, manufacturers provide staining recommendations, yet encourage users to titrate antibodies on their specific samples in order to derive a final staining panel. Based on the ability to stain with hundreds of antibodies jointly, few studies to date have assessed how the antibodies present in these pre-made staining panels respond to a standard titration curve. In order to address this point, this study tests two dilution factors, staining volume, cell count, and tissue of origin to understand the relationships between signal and background for a commercially available antibody panel. They arrive at the general recommendation that these panels could be improved, grouping various antibodies into distinct categories.

      This study is of general interest to the scRNA-seq and CITE-seq communities as it draws attention to this important aspect of CITE-seq panel design. However, it would stand to be substantially improved by not only providing suggestions but also testing at least one, if not more, of their suggestions from Supplementary Table 2, and preferably performing experiments using more technical replicates or biological replicates. As it stands now, the study is largely based on one PBMC and one lung sample, that were stained once with each manipulation as far as can be gathered from the Methods.

      We appreciate the reviewer’s insight into the methodology and enthusiasm for the study.

      We do want to clarify that the study does not use a “pre-made staining panel” from a commercial vendor, but rather a cocktail of individual antibodies available from a commercial vendor (with emphasis on epitopes relevant to immunology and cancer research). We have also clarified this in the text of the manuscript.

      We hope that the added analysis, our point-by-point response to the issues raised by the reviewer, and the inclusion of new CITE-seq data from the panel with adjusted concentrations alleviate the main concerns of the reviewer.

      1) Given the title is improving oligo-conjugated antibody… it would be important to functionally test one of the suggestions. We would suggest a full titration curve of selected antibodies, perhaps one from each of the categories, but if cost is a concern at least two or three antibodies, to identify how titration impacts antibodies, and especially those in categories labeled as in need of improvement. Relatedly, if the idea is that if antibodies (such as gD-TCR) do not have a cognate receptor leading to general background spread, does spiking in a cell that is a known positive in increasing ratios remedy this issue by acting as a target for the antibodies? Does adding extra washes help to remedy these issues of background?

      These are excellent points. Full titration curves have previously been published showing that oligo-conjugated antibodies respond to titration, and in that regard behave similarly to fluorophore-conjugated antibodies assayed by flow cytometry (see Stoeckius et al. 2018. Genome Biology; Fig. 3A-D). Our study does not aim to identify the optimal concentration of individual antibodies in isolation but strives to provide the optimal signal-to-noise ratio for each antibody in a cocktail while taking sequencing requirements into account. This is why we do not focus on full titration curves and saturation kinetics for each antibody/epitope. If we used all antibodies at their highest signal-to-noise ratios, this would drastically increase the sequencing requirements of the library, as highly expressed markers would use the vast majority of the total sequencing reads. As such, we aimed to get “sufficient” signal-to-noise while keeping the sequencing allocated to each marker balanced.

      Furthermore, as our results show, background signal can be largely attributed to free-floating antibodies in the solution. Using high concentrations for all markers in one or more conditions would therefore increase the background in all conditions if these were multiplexed into the same droplet segregation. This phenomenon would likely obscure the positive signals and possibly the titration response at lower concentrations (similar to what we see for category A antibodies). To avoid this and make full titration curves meaningful, each condition would have to be run in its own droplet segregation, making such titration efforts prohibitively costly. We have elaborated on this in the discussion of the revised manuscript.

      We agree that it would greatly improve the study to include results from our panel with adjusted concentrations. In the revised manuscript, we have made efforts to address this by making a comparison between the sample stained with the pre-titration (DF1) concentrations and a sample stained with concentrations that have been adjusted based on their assigned categories (from Table 1). We believe that these new data convincingly demonstrate improvements both in the individual antibody signals and in the overall sequencing balance (see new Fig. 5). While the adjusted concentrations could still benefit from further improvements, we show that at similar sequencing depths, the adjusted concentrations provide a more balanced sequencing output and exhibit a 57 % increase in the median positive signal and a 43 % reduction in the median background signal for the 52 antibodies in our panel. The benefit of the adjusted concentration was particularly remarkable for CD86, which went from having 76.5 % to 12.6 % of UMIs assigned to background signal and thus yielded comparable positive signal while using 4.8-fold fewer UMIs (new Fig. 5G).
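
      The two quantities discussed above, the fraction of an antibody's UMIs that is background and the share of the total sequencing output each antibody consumes, can be illustrated with a short sketch. It assumes, as a simplification, that UMIs found in non-cell-containing droplets are counted as background and UMIs in cell-containing droplets as signal; the function name `summarize_panel`, the input layout, and the counts in the usage example are hypothetical and are not data from the study.

```python
def summarize_panel(panel):
    """Per-antibody background fraction and share of total UMIs.

    panel: dict mapping antibody name -> (umis_in_cells, umis_in_empty),
    where umis_in_empty counts UMIs from non-cell-containing droplets
    (treated here as background; a simplifying assumption).
    """
    grand_total = sum(cells + empty for cells, empty in panel.values())
    if grand_total == 0:
        raise ValueError("panel contains no UMIs")
    summary = {}
    for name, (cells, empty) in panel.items():
        total = cells + empty
        summary[name] = {
            # Fraction of this antibody's UMIs attributed to background
            "background_fraction": empty / total if total else float("nan"),
            # Fraction of all sequenced UMIs consumed by this antibody
            "share_of_umis": total / grand_total,
        }
    return summary

# Illustrative, made-up counts (not data from the study)
example = summarize_panel({"marker_A": (25, 75), "marker_B": (90, 10)})
```

      An antibody with a high background fraction and a large share of UMIs is a natural candidate for a reduced staining concentration, since lowering it frees sequencing reads for the rest of the panel.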

      Spiking in cells that express the cognate antigen is an interesting idea. However, as the spiked in cells would be included in all the downstream processes including sequencing of mRNA and other modalities, it would be quite costly to spike-in cells that are not of biological interest – only to decrease background of one or a few antibodies.

      While the results presented in the manuscript do not address this directly, our data strongly suggest that adding extra washing would help reduce the free-floating antibodies in the solution that are captured in the gel-bead emulsions and are responsible for some of the observed background signal (as can be assayed by the non-cell-containing droplets). For such a test to make sense, the staining conditions should be identical for two samples that are differentially washed (including the exact same cell composition), and the test would require fully separate droplet segregations (i.e. utilization of separate 10x lanes), which would make it a very costly experiment solely to test the washing effect. However, we have done preliminary tests using a short (150 bp) cDNA amplicon spiked into different tubes or plates containing ~750 × 10^3 PBMCs to determine washing efficiency by qPCR. Here we assayed how increasing the washing volume from 200 µl (96-well) to 1.5 mL or 50 mL for two washes reduced the detection of the spiked-in amplicon in the supernatant as compared to an unwashed sample. While short cDNA amplicons may not behave identically to oligo-conjugated antibodies, they simulate background signal stemming from free-floating antibodies and thus can be used to evaluate different washing conditions for a given set-up. As expected, using higher washing volumes does indeed greatly reduce the amount of amplicon (simulating free-floating “background” antibodies) detected in the resulting suspension. (https://raw.githubusercontent.com/Terkild/CITE-seq_optimization/master/figures/review_washing_test.png)

      2) Another way of improving these panels is through reducing the costs spent on both staining but perhaps more importantly the sequencing-based readouts. Several times in the manuscript (at line 77 for example or line 277) it is alluded to that the background signal of antibodies can make up a substantial cost of sequencing these libraries. However, no formal data on cost is presented, which would be important to formalize the author's points. It would be important to provide cost calculations and recommendations on sequencing depth of ADT libraries based on variation of staining concentration. Relatedly, in the methods, sequencing platform and read depth for ADT libraries was not discussed, nor is the RNA-seq quality control metrics provided other than a mention of ~5,000 reads/cell targeted. This is important to report in all transcriptomic studies, and especially a methods development study.

      Thank you for pointing out the very sparse description of choice of sequencing method and RNA-seq quality controls. We have included additional metrics in the materials and methods and included a new Suppl. Fig. S1 showing number of detected genes as well as UMI counts within the mRNA and ADT modalities in the revised manuscript. We agree that reducing sequencing cost (without reducing biological information) is a major reason for optimizing staining with oligo-conjugated antibodies. We have now added a section in which we elaborate on the potential cost saving, and other benefits of titration of antibody panels and provide some examples from our datasets. Actual savings of optimization of these panels will be very dependent on a given setup, starting concentrations and the depth of sequencing that the particular research questions (and budget) warrant.

      Due to the 10-1000 fold higher numbers of proteins as compared to coding mRNA [16], ADT libraries have high library complexity (unique UMI content) and are rarely sequenced near saturation. Thus, either sequencing deeper or squandering fewer reads on a handful of antibodies will result in an increased signal from other antibodies in the panel. We found that by simply reducing the concentration of the five antibodies used at 10 µg/mL, we gained 17 % more reads for the remaining antibodies. Consequently, assuming we are satisfied with the magnitude of signal we got from all other antibodies using the starting concentration, this directly translates to a 17 % reduction in sequencing costs.
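
      The arithmetic behind this kind of read reallocation can be sketched as follows. This is a simplified model, assuming a fixed total ADT sequencing depth that is split among antibodies in proportion to their share of the library; the function names and the example shares are hypothetical and were chosen only to reproduce a gain of about 17 %.

```python
def gain_for_remaining(share_before: float, share_after: float) -> float:
    """Relative gain in reads for the rest of the panel, at fixed total
    depth, when a few antibodies drop from share_before to share_after
    of all ADT reads."""
    return (1.0 - share_after) / (1.0 - share_before) - 1.0


def depth_for_same_signal(gain: float) -> float:
    """Fraction of the original depth that gives the remaining
    antibodies the same number of reads as before, under the same
    proportional assumption."""
    return 1.0 / (1.0 + gain)


# Hypothetical shares of total ADT reads taken by the five antibodies.
gain = gain_for_remaining(share_before=0.30, share_after=0.181)
print(f"gain for remaining antibodies: {gain:.1%}")
print(f"depth needed for unchanged signal: {depth_for_same_signal(gain):.1%}")
```

      Actual shares depend on the panel and sample, so the inputs would have to be taken from the UMI counts of a given dataset.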

      In terms of sequencing depth, we are not comfortable giving very broad recommendations. This is due to the fact that sequencing requirements will be very different depending on the composition of the antibody panel as well as the cell type distribution (epitope abundance) (as has been previously noted in Mair et al. 2020 Cell Rep.). If the antibody panel contains only antibodies targeting epitopes that are largely present on a small subset of cells (such as CD56 or CD8 for PBMCs) it would require fewer reads per marker per total cell count than markers that are broadly expressed (such as HLA-ABC or CD45 for PBMCs). However, in a different sample composition (for instance a tissue with few leukocytes) these same antibodies would require fewer reads per cell whereas other epitopes may be more abundant.

      We also want to stress that, aside from cost savings, an optimized balanced panel with low background will yield improved resolution compared to a non-optimized panel. Fortunately, CITE-seq and related methods are very flexible in this regard as you can start by shallow sequencing and then “top-up” the sequencing depth to an optimal level based on the actual data in subsequent sequencing runs (for instance together with the next batch of samples).

      3) One of the powerful elements of joint multi-modal profiling, as mentioned in the title, is to be able to measure protein and RNA from a single cell. This study does not formally look at correlation of protein and RNA levels, and whether a decrease in concentration of antibody either improves or diminishes this correlation. This would be important to test within this study to ensure that decreasing antibody levels does not then adversely affect the power of correlating protein with RNA, and whether it may even improve it.

      We appreciate the reviewer’s suggestion – this is a great idea. Unfortunately, such correlations are notoriously hard to do for scRNA-seq data due to the sparsity of the RNA measurements (which contain a high frequency of 0 UMI counts). This is, in part, due to low reverse transcriptase efficiency, and also due to the fact that most proteins have 10-1000 fold more copies than the mRNA transcripts that encode them (Marguerat et al. 2012 Cell). This is exacerbated in our study by the fact that we only shallowly sequenced the RNA modality (~4000 reads/cell). Consequently, we see a very high number of cells that, despite clustering together within distinct lineage clusters (based on their full transcriptome) and expressing the expected lineage marker surface proteins, do not have readily detectable transcripts for the same marker(s). For instance, for all cells that are positive for CD8 at the RNA level, there are at least as many that are negative for CD8 RNA while being positive for CD8 ADT. Importantly, these additional CD8+ cells are still located within clusters consistent with a CD8+ phenotype (see below): (https://raw.githubusercontent.com/Terkild/CITE-seq_optimization/master/figures/review_CD8_protein_rna_correlation.png)

      As such, due to the sparsity of RNA counts, if the ADT signal is diluted too much, leading to truly positive cells being called as negative, it may actually increase individual cell correlation between RNA and ADT while producing more “false negative” cells. Direct correlation between RNA and antibody measurements within each individual cell is further complicated by the presence of non-specific/background signal in protein data that is rarely found in RNA data. This can also be seen in the plot above by the fact that positive cells are defined at a cut-off “7” at the ADT level, and not “0” as is the case for RNA. Thus, while having only a few UMI counts for a given transcript is sufficient to call expression, having a few UMIs from an ADT can easily be attributed to background (particularly in an unoptimized panel).

      Due to these technical limitations, we find it more suitable to correlate “positivity” called by either ADT (gated positive as shown in Suppl. Fig. S2) or mRNA expression (i.e. > 0 UMI counts). While this comparison is less quantitative (it does not distinguish “high” from “low” expression), it enables us to show whether reducing antibody concentrations affects the ability of the ADT signal to distinguish positive from negative cells (as compared to GEX), which is at the core of the reviewer’s suggestion. The figure below demonstrates that four-fold titration reduces the fraction of positive cells for some markers (reduction in the blue+red bars by dilution) whereas other markers are largely unaffected, both of which are consistent with the analysis in the manuscript: (https://raw.githubusercontent.com/Terkild/CITE-seq_optimization/master/figures/review_protein_rna_correlations.png)
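
      The positivity comparison described above can be illustrated with a small cross-tabulation. This is a minimal sketch and not the code used for the figure: it assumes per-cell ADT values on whatever scale the gate was drawn, per-cell raw RNA UMI counts for the matching gene, and a single marker-specific ADT gate; the function names, the example values, and the gate are all hypothetical.

```python
from collections import Counter


def positivity_table(adt_values, rna_counts, adt_gate):
    """Count cells in each (ADT positive, RNA positive) category.

    ADT positive: value above the marker-specific gate.
    RNA positive: at least one UMI for the matching transcript.
    """
    table = Counter()
    for adt, rna in zip(adt_values, rna_counts):
        table[(adt > adt_gate, rna > 0)] += 1
    return table


def as_fractions(table):
    """Convert category counts into fractions of all cells."""
    total = sum(table.values())
    return {key: n / total for key, n in table.items()}


# Hypothetical values for six cells and one marker.
adt = [12.0, 9.5, 3.0, 8.1, 2.2, 10.4]
rna = [2, 0, 0, 1, 1, 0]
table = positivity_table(adt, rna, adt_gate=7.0)

for (adt_pos, rna_pos), frac in sorted(as_fractions(table).items()):
    print(f"ADT+={adt_pos!s:5} RNA+={rna_pos!s:5} fraction={frac:.2f}")
```

      Repeating this per marker and per dilution shows how the ADT-positive fractions shift with titration without relying on a quantitative RNA-to-protein correlation.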

      In terms of assuring specificity, we have also modified the “titration plots” to show more detailed cell type distribution at each rank (by the “barcode plot” to the right of the “rank plot”) as well as the distribution of UMIs among cell types (by the bar plot above the “barcode plot”) at each condition. Finally, to make these “titration plots” more accessible, we have now included a guide to the different components of the “titration plots” in Fig. 2 of the revised manuscript.

      4) How was the lack of antibody binding determined for Category E? CD56 is frequently detected on NK cells in peripheral blood, CD117 should be detected on mast cells in the lung, and CD127 should be found on T cells, particularly CD8+ T cells. From inspecting Figure 1E, it appears as if all three of these markers are detected on small but consistent cell subsets. As the clusters are only numbered and no supplementary table is provided to help the reader in their interpretation, it is difficult to determine if these represent rare but specific binding, or have not bound with any specificity.

      Thank you for pointing this out. In light of this comment, it is obvious that we need to annotate the cell types of the clusters. We have annotated all the fine-grained clusters by cell types and re-worked all relevant panels in Figures 1, 2 and 3 (and all their related supplementary figures) to show more detailed and consistent cell type annotation. We have also added Suppl. Fig. 1C, D to show marker genes for each of the annotated cell types, which together with the re-worked Fig 1E, give the reader a clear description of the cluster identity. We do indeed see some signal for Category E antibodies such as CD56, CD117 and CD127 within the expected clusters. This indicates that the antibodies do work to some extent. However, we also find that the signal for these markers is modest, at best, and not present in some populations where we would have expected them (CD127 should be more pronounced in T cells and we are finding an unexpectedly high frequency of CD56-negative NK cells).

      5) References: At 14 references, the paper overall could benefit from a more comprehensive citation of related literature including flow cytometry and/or CyTOF best practices for antibody staining and dealing with background, and joint RNA and protein measurement from single cells.

      We agree that the reference list of the original manuscript was sparse and may have missed important relevant studies. We have done our best to include additional studies relevant for the optimization and titration of mass cytometry panels and flow cytometry staining and added references to a few newly published joint RNA and protein measurement studies. We have strived to reference all studies directly relevant to the present work and do not want to overlook any appropriate publications that should be referenced and so welcome any suggestions of the reviewers.

      Reviewer 2:

      Recombinant antibodies are the most common and powerful reagents in life science research to identify and study proteins. Yet, every single antibody should always be validated and carefully tested for its relevant application, to ensure constructive and reproducible scientific endeavor. I was thus extremely pleased to review the manuscript of Terkild Buus et al, as it provides a careful assessment of oligo-conjugated antibody signal in CITE-seq. The authors tested four variables (antibody concentration, staining volume, cell numbers and tissue origin) and clearly showed that antibody titration is a crucial step to optimize a CITE-seq panel. The authors found that, as a general rule, concentrations in the 0.625 to 2.5 µg/mL range provide the best results while the concentrations recommended by vendors, in the 5 to 10 µg/mL range, increase background signal.

      In my opinion, the study is well-performed and may serve as a guideline to accurately validate antibodies for CITE-seq, as a consequence I have only minor comments.

      We are very happy that you appreciate the necessity of our work and that you found it to be a useful resource for improving CITE-seq experiments.

      As stated by the authors, the starting concentration used for each antibody was based on historical experience and assumptions about the abundance of the epitopes. This approach may not be ideal, and the optimal concentration may have been missed. Do the authors think that a proper titration would be an advantage? Maybe this could be discussed in the text.

      We agree that using starting concentrations based on historical experience etc. may not be ideal for a completely objective assessment of how oligo-conjugated antibodies respond to the four-variables test. However, we firmly believe that using informed starting concentrations greatly increases the potential improvement of a panel while keeping costs to a minimum (which has to be a consideration for these expensive methods). With that said, we agree that this approach may not reach the optimal concentration (a definition that is a bit complex in this setting). As mentioned in our reply to reviewer 1, point 1, a previous study has shown a more formal titration response for three antibodies using a broader range of concentrations (Stoeckius et al. 2018. Genome Biology; Fig. 3A-D) and we believe that titration for CITE-seq is as much about balancing the sequencing needs of the full panel as it is about reaching the optimal signal-to-noise for the individual antibodies. We have elaborated on this in the discussion of the revised manuscript.

      The authors showed by testing four variables (see above) that they could define the optimal conditions to reduce background signal and increase sensitivity of antibodies, and thereby improve the CITE-seq outcome. Nevertheless, the authors rely on the fact that all antibodies used in their panel are specific for their targeted antigens. I am not asking here to test the specificity of every single antibody used in the study as this would be a colossal amount of work. But I feel that this aspect should be discussed in the manuscript, especially when an "uncommon" antibody is intended to be used in the CITE-seq panel; the specificity of this antibody should indeed be tested prior to its use.

      Thank you for this suggestion. This is indeed an aspect of antibody optimization that we have not touched upon. Because we used commercially available oligo-conjugated antibody clones that are broadly used, because many of these clones have been extensively tested by multiple labs within the immunology community (for flow/mass cytometry applications), and based on our personal experience with the majority of the clones in flow cytometry applications, we expected that the antibodies in our panel should be specific for their antigen. This is supported by the labelling matching what we would expect to find in PBMCs and lung leukocytes, as well as the correlation between expression of the gene encoding the targeted epitope and antibody binding (see our response to reviewer 1, point 3). We have added a paragraph to the revised manuscript discussing that, particularly when using antibodies for the first time or using clones that are unfamiliar, it is important to assure specificity.

    1. Author Response:

      Reviewer 2:

      Hesse et al. implemented a murine model of cardiac ischemia to study two populations, epicardial stromal cells (EpiSC) and activated cardiac stromal cells (aCSC). Furthermore, uninjured cardiac stromal cells were used as a control. An isolation method for EpiSC was used by applying a gentle shear force to the cardiac surface. The authors show heterogeneity in the EpiSC populations. Certain markers were confirmed by in-situ hybridization. Furthermore, molecular programs within these subsets were explored. A comparison between EpiSC and aCSCs cells (and EpiSC and uninjured CSCs cells) was performed, which showed differences in expression of multiple genes, namely HOX, HIF1 and cardiogenic factor genes. A WT1 population was marked by tdTomato, confirming the localization of expression to a cell population. There are however specific weaknesses. First, a major concern is regarding clarity of the experimental conditions and sample purity. Data is not robustly presented showing differences across conditions, namely between uninjured CSCs and activated CSCs. Furthermore, the purity of isolating EpiSC was not explored, along with the anticipated overlap of cells between aCSC and EpiSC. Specifically, the in-situ findings do not clarify the subject of purity. For example, EpiSC-3 (Pcsk6) is a large population in the scRNA-seq shown in Fig 1; however, this gene is also expressed in the myocardium. There is an attempt to perform EpiSC and aCSC comparison analysis in Figure 3; however, without clarity on the expected overlap, these data are hard to interpret. Furthermore, cluster-based approaches for comparing population fractions can be problematic due to the inherent stochasticity of sampling. Lastly, there is no actual lineage tracing over time, but rather marking of WT1 cells with tdTomato. The RNA velocity analysis is not particularly robust with the number of expressed genes driving these results, rather than the author's conclusion of developmental potential.

      As to the clarity of experimental conditions and sample purity, a detailed protocol and comprehensive validation data of our cell isolation procedure have been previously published (Owenier et al. 2020, Cardiovasc Res, Apr 1;116(5):1047-1058), including analyses of cell yield, viability, and purity.

      As to the differences between uninjured CSC and activated CSC, there are already elaborate published studies that specifically address the activation of CSC after MI at the single-cell level (Farbehi et al. 2019, eLife, Mar 26;8:e43882; Forte et al. 2020 Cell Rep Mar 3;30(9):3149-3163.e6). Therefore, this comparison was deliberately not included in our manuscript; instead, we focused on the novel scRNAseq data of the post-MI epicardium.

      As to the RNA in situ hybridization experiments, hybridization data certainly do not clarify the purity of the EpiSC preparation. These experiments were aimed at confirming the presence and the location of selected cellular markers in the post-MI epicardium. Specifically, we concluded that cells expressing Pcsk6 are present in the epicardium, which is consistent with the Pcsk6-expressing EpiSC population in the scRNAseq data set; we did not claim Pcsk6 to be a unique marker for the epicardium.

      As to the potential overlap between EpiSC and aCSC, the issue of sample purity and estimation of population sizes in the heart has been extensively discussed previously (Owenier et al. 2020, Cardiovasc Res, Apr 1;116(5):1047-1058). Our study included 13,796 EpiSC and 24,470 aCSC, which is sufficient to reduce data artefacts caused by the inherent stochasticity of sampling.

      As to the tracing of WT1+ cells, the labelling of WT1+ cells was induced 5 and 3 days prior to MI and they were traced 5 days after MI to analyze their contribution to the epicardial stromal cell populations formed during the acute injury response. Since WT1+ lineage tracing, RNA velocity, and the number of expressed genes as a measure of developmental potential may each have limitations on their own, we would like to emphasize that our main conclusions are drawn by combining the evidence from the different sets of analyses. To further clarify data representation and interpretation, we will carefully revise the respective text passages of our manuscript to avoid misunderstandings.

    1. Author Response

      Summary

      This manuscript examines how N-linked glycosylation regulates the binding of polysaccharide hyaluronan (HA) to cell surface receptor CD44, to conclude that multiple sites exist but are controlled by the nature of the glycosylation. The reviewers appreciated many aspects of the work, but they have raised serious concerns about the experimental and simulation design. The reviewers suggested that the proposed alternative binding site may not be biologically relevant, as the relevant CD44-HA interactions are multivalent and cannot be supported by that site. They also suggested that the findings are not well supported by the NMR experiments, which could have been extended to allow comparisons of the glycosylation patterns hypothesised. Moreover, the MD simulations, despite being considerable in size, were limited in sampling different possibilities without bias from the initial HA placement, and there is not enough data to convince the readers of thorough sampling and reproducibility.

      We understand the concerns raised in the review process. However, these concerns can be readily explained and fixed, as we explain below and briefly introduce here.

      • Our data are compatible with the currently accepted multivalent interaction of hyaluronan with several CD44 receptors. The argument that our data go against it stems from an unfortunate figure provided in the first version of the manuscript. This figure suggested that a bound hyaluronan would not be able to span the length of the protein in the upright binding mode. That is not true. We now show another, more relevant snapshot where the bound hyaluronan indeed spans the length of HABD. Hence, we show that multivalent interaction is not precluded by the upright binding mode.

      • We also clarify how our extensive simulation data were designed to avoid any bias. We admit that this was not obvious in the phrasing of our previous version.

      • Many of the raised issues stem from the lack of certain critical simulations. We have now added these simulations into the revision.

      Below we summarize the main issues raised by the reviewers, accompanied by our responses on how we have fixed them in the revised version of the manuscript.

      Reviewer #1

      The authors use MD simulations and NMR to study the cell surface adhesion receptor CD44 with the purpose of understanding the binding of carbohydrate polymer, hyaluronan (HA). In particular, this study focuses on the effects of N-glycosylation of the CD44 glycoprotein on potential HA binding. The authors previously proposed two lower affinity HA binding modes as alternatives to the primary mode seen in the crystal structure of the HA binding domain of CD44, driven by different arginine interactions, but overlapping with glycosylation sites that will affect HA binding. This study suggests that, because the canonical site appears blocked by glycans attached to the surface, HA would instead likely bind to an alternate parallel site with lower affinity, thus changing receptor affinity. The authors do not study HA binding to the glycosylated form directly, but undertake simulations of bound glycans to draw their conclusion. They do, however, place HA near the non-glycosylated CD44 in simulations, although it is not clear that MD sampling has been designed to provide unbiased observations of HA binding, or how the simulations help explain the NMR experiments.

      To better highlight the message, we left out a significant portion of our total simulation data from the initial version of the manuscript. We have now added, for example, simulations of HA binding to the glycosylated form to our revised manuscript. Furthermore, we are confident that our design of the simulation systems allows unbiased sampling of the binding surface. That is, the hyaluronan hexamers were initially placed several nanometres away from the protein surface. After this, they were allowed to spontaneously sample the space and find their respective binding sites during the course of the simulations. They were not placed into the binding sites manually. However, there was one system with two HA hexamers, one of which was placed into the canonical binding groove. This was done to test where the freely floating hexamer would bind when the primary binding site is taken. These points are illustrated more clearly in the new version of the manuscript. Finally, all our simulation data are publicly available (see the DOIs provided in the paper).

      The data rely on libraries of MD simulation, which are substantial, with several replicas of a microsecond each. But what have these simulations really proved with reliability? Figure 2a shows that, while glycans stay roughly where they started, they are dynamic and cover much of the canonical HA binding site, which may be the case. From this the authors imply that the crystallographic site is significantly obstructed, the lower-affinity upright mode remains most accessible, and that the level of occlusion of the main site depends on the degree of glycosylation and size of the oligosaccharides. However, a full simulation of HA binding to this glycosylated surface was not attempted. It would have been good to see the glycans actually block unbiased simulation of canonical binding to the crystallographic site on long timescales (not being dislodged), but allow alternative binding to the parallel site, without initial placement there.

      Commenting on both points 1.1 and 1.2, we cropped a large portion of our simulation data from the initial version of the manuscript in order to better highlight the current message. However, we do have extensive simulation data of hyaluronan binding spontaneously to CD44 with different glycosylation patterns. For example, see Figure A below where HA is bound to glycosylated CD44-HABD. These data have been carefully analysed and incorporated into the revised manuscript.

      Figure A. A representative binding pose between HA oligomer (dark red) and glycosylated (light blue, yellow, green, pink and purple) CD44-HABD (pale surface) extracted from our simulations.

      HA was, however, added to the non-glycosylated CD44-HABD surface in simulations, but no clear data is shown to illustrate the extent of sampling, convergence and reproducibility, beyond some statistical analysis of contacts. It seems a total of 30 microseconds of the non-glycosylated protein with 2 or 3 nearby HA placed was run, leading to contacts. But how well did these 30 simulations sample HA movement and relative binding to sites, if at all? Figure 4 suggests that the HA stay where they have been put. As the MD is the dominant source of data for the paper, the extent of sampling and how the outcomes depend on the initial placement of molecules requires proof. Was any sampling of HA movement, such as between canonical and alternative parallel conformations seen in MD?

      It is important to note that, in the non-glycosylated systems, the hyaluronan hexamers were initially placed several nanometres away from the protein surface. After this, they were allowed to spontaneously sample the space and find their respective binding sites during the course of the simulations. That is, they were not manually placed into the binding sites. We have changed the manuscript to better illustrate this key point.

      We have also made the simulation data publicly available (see the DOIs provided in the paper). After inspection of the simulations, we are confident that the reviewers will agree that the results are reliable and do not suffer from convergence problems that could compromise the message we provide.

      Moreover, we have even more simulation replicas ready with slightly different initial conditions that provide the same qualitative picture, see Figure B below (compare with Figure 4c in the original submission where one of the hyaluronan hexamers was initially placed in the crystallographic binding site). In these simulations, the hexamers have enhanced contacts with the crystallographic and upright mode residues despite being initially placed far from these binding sites. These simulations were already part of the manuscript.

      Figure B. Hyaluronate-perturbed residues in the simulations. The colored surface displays the probability of a given residue to be in contact with HA6 in our additional simulations, where three hyaluronan hexamers were placed in solution far from the binding site.
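
      For readers unfamiliar with this type of analysis, a per-residue contact probability can be computed along the following lines. This is a generic sketch and not necessarily the procedure used for Figure B: it assumes a contact is counted whenever any atom of a residue lies within a distance cutoff of any ligand atom in a given frame, and the cutoff value, data layout, and function name are hypothetical.

```python
import math


def contact_probability(frames, cutoff_nm=0.35):
    """Fraction of frames in which each residue contacts the ligand.

    frames: list of (residue_atoms, ligand_atoms) pairs, where
      residue_atoms maps a residue id to a list of (x, y, z) in nm and
      ligand_atoms is a list of (x, y, z) in nm.
    cutoff_nm: contact distance (assumed value, for illustration).
    """
    counts = {}
    for residue_atoms, ligand_atoms in frames:
        for res_id, atoms in residue_atoms.items():
            in_contact = any(
                math.dist(a, b) <= cutoff_nm
                for a in atoms
                for b in ligand_atoms
            )
            counts[res_id] = counts.get(res_id, 0) + int(in_contact)
    n_frames = len(frames)
    return {res_id: c / n_frames for res_id, c in counts.items()}


# Two toy frames with two residues and a single ligand atom.
protein = {1: [(0.0, 0.0, 0.0)], 2: [(5.0, 0.0, 0.0)]}
frames = [
    (protein, [(0.3, 0.0, 0.0)]),  # ligand next to residue 1
    (protein, [(2.0, 0.0, 0.0)]),  # ligand away from both residues
]
print(contact_probability(frames))
```

      With real trajectories the coordinates would come from a trajectory reader, and the probabilities could then be mapped onto the protein surface as a color scale.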

      The NMR is suggested to show that a short HA hexamer can bind to non-glycosylated CD44-HABD simultaneously in several modes at distinct binding sites, and that MD "correlates" with this. But is this MD biased by initial choices of where and how many HAs are placed, given HA movement is likely not well sampled?

      The hyaluronan hexamers were initially placed several nanometers away from the binding sites. They were not placed into these binding sites manually. During the simulations the hexamers displayed several binding and unbinding events as they were spontaneously sampling the space and finding their respective binding sites during the course of the simulations.

      While we saw multiple binding events to the proposed binding sites, the short size of the hyaluronan fragments was likely not enough for stable binding, as the fragments often dissociated within a few hundred nanoseconds. These points are now more clearly presented in the revised manuscript.

      No MD seems to have been used to examine the blocking or lack thereof by antibody MEM-85 in glycosylated or non-glycosylated CD44.

      This is not feasible using MD simulations, since the structure of the antibody is not available. Fortunately, there is no need for it, as we have direct and reliable experimental evidence using NMR as provided in the manuscript and in our previous work (Skerlova et al. 2015; doi: 10.1016/j.jsb.2015.06.005). We therefore know where the antibody binds in CD44.

      Reviewer #2

      This manuscript is focused on understanding how N-linked glycosylation regulates the binding of the (very large) polysaccharide hyaluronan (HA) to its major cell surface receptor CD44, a question relevant, for example to the role of CD44 in mediating leukocyte migration in inflammation. The paper concludes that multiple binding sites for HA exist and that their occupancy is determined by the nature of the glycosylation, a suggestion first made by Teriete et al. (2004). The work is based on atomistic simulations with different glycan compositions and NMR spectroscopy on a non-glycosylated CD44 HA-binding domain (HABD) expressed in E. coli. While the question being researched is interesting and of biological relevance, there are flaws in the work.

      The relevance also stems from the increasing applicability of HA in many biomedical devices and treatment strategies, such as tissue scaffolds and HA-coated nanoparticles for targeted drug delivery. However, we respectfully disagree with the proposed flaws. We address these suggested issues point-by-point in sections 2.2–2.5.

      The paper describes how the well-established HA-binding site on CD44 (determined by a co-crystal structure; Banerji et al., 2007) is blocked by N-linked glycosylation (principally at N25 with a contribution from glycans at N100 and N110) and how certain glycans favour binding at a completely distinct binding site that lies perpendicular to the canonical 'crystallographic' binding site. This alternative 'upright' binding site, which has been proposed previously by the authors (Vuorio et al., 2017), needs further supporting experimental data.

      Indeed, a characterization of the upright mode can be found in (Vuorio et al., 2017. PLoS CB. 13:7). This characterization is based on microseconds of unbiased MD simulation data as well as extensive free energy calculations. We analysed, for example, the most important interactions, orientations of the sugar rings, and binding affinities. These data indicate that while the upright binding mode is weaker than the canonical binding mode (Banerji et al., 2007), it has good shape complementarity with the protein, with e.g. most of the sugar rings lying flat on the surface of the protein, indicating that it might have biological relevance.

      The supporting experimental data is presented in the current publication. It has been improved and clarified for the revised version of the manuscript.

      Firstly, unlike the 'crystallographic' binding site that forms an open-ended shallow groove on the surface of the protein allowing polymeric HA to bind (and multivalent interactions to take place), the 'upright' binding site is closed at one end and can thus only accommodate the reducing end of the polysaccharide (as apparent from Appendix 1 Figure 1). Its configuration means that it would be impossible for this mode of binding to allow multivalent interactions with polymeric HA. This is a major problem since biologically relevant CD44-HA interactions are multivalent where a single HA polymer interacts with a large number of CD44 molecules (e.g. see Wolny et al., 2010 J. Biol. Chem. 285, 30170-30180). So even if this binding site existed, an interaction between a single CD44 molecule on the cell surface with the reducing terminus of an HA polymer would be exceptionally weak.

      We have data to show that our proposed secondary binding mode does not preclude multivalent CD44-hyaluronan interactions. This multivalent interaction, where a long hyaluronan binds simultaneously to several CD44 moieties, is important, and our secondary mode is compatible with it, see the new Figure C below. We acknowledge that our Figure 1 in the Appendix 1 was not sufficiently clear on this matter. That figure illustrated a structure of one possible CD44-hyaluronan complex obtained from just one of our simulations. However, we have a number of related CD44-hyaluronan complexes from other simulations where the bound ligand spans the full length of the protein, showing that the binding site can accommodate more than just the reducing end of the polysaccharide, and this is highlighted in the attached Figure C. Therefore, multivalent binding is not precluded by the upright binding mode. Unfortunately, the figure depicted in the SI of the original manuscript was misleading. To avoid this issue, it has been replaced in the revised manuscript.

      Figure C. The secondary CD44-hyaluronan binding mode.

      Secondly the NMR experiments performed in this study, purporting to provide evidence for multiple modes of binding, are problematic. Why weren't differentially glycosylated proteins used, i.e. where individual sites were mutated (e.g. +/- N25); this would have allowed comparisons of the glycosylation patterns hypothesised (based on the computer simulations) to favour the 'crystallographic' versus 'upright' modes.

      Indeed, NMR experiments with glycosylated material would be ideal, but obtaining the required quantities of isotopically labelled protein with a homogeneous glycosylation pattern is not possible even with state-of-the-art technology. In addition, the substantially increased molecular weight of the glycosylated protein would fall outside the experimental window accessible to NMR spectroscopy. We strongly believe that the message of the paper is already supported by the combination of our NMR experiments and MD simulations, together with the available literature data, as detailed in Appendix A (see below).

      Being aware of the difficulties of studying glycosylated CD44 by NMR, we designed a way to bypass this issue by combining data from multiple experimental and simulation setups. All the data support the claims and conclusions made in our paper (see Appendix A of this rebuttal). The existence of a weaker binding mode, promoted upon glycosylation because the primary binding site is covered, is compatible with all available experimental and simulation data.

      Furthermore, previous NMR studies have shown that the binding of HA to CD44 causes a considerable number of chemical shift changes due to the induction of a large conformational change in the protein (Teriete et al., 2004; Banerji et al., 2007), making it very difficult to identify amino acids directly involved in HA binding based on the NMR data. Moreover, this conformational change has been fully characterised for mouse CD44 with structures available in the absence and presence of HA (Banerji et al., 2007); this information should have been used to inform the interpretation of the shift mapping. In fact, the way in which the shift mapping data are interpreted is simplistic and doesn't fully take account of the reasons that NMR spectra can exhibit different exchange regimes.

      We interpreted the NMR data very carefully. We are aware of the extent of the conformational changes induced by HA binding in CD44-HABD; in fact, we identified them as the molecular mechanism underlying the mode of action of the MEM-85 antibody (Skerlova et al., 2015; doi: 10.1016/j.jsb.2015.06.005). Therefore, we focused on the differential changes in the NMR signal positions of surface-exposed residues upon titration with HA and MEM-85. We also observed different exchange regimes, which allowed us to discriminate between different HA binding sites. We emphasized these points in the revised manuscript.

      Reviewer #3

      Vuorio and colleagues combine atomic resolution molecular dynamics simulations and NMR experiments to probe how glycosylation can bias binding of hyaluronan to one of several binding sites/modes on the CD44 hyaluronan binding domain. The results are of interest specifically to the field of CD44 biophysics and more generally to the broad field of glycosylation-dependent protein-ligand binding. The manuscript is clearly written, and the combination of data from computational and experimental methodologies is convincing. I especially commend the authors on the thorough molecular dynamics work, wherein they ran multiple simulations at microsecond timescale and tried different force fields to minimize the likelihood of their findings being an artifact of a particular force field.

      The use of multiple force fields was indeed meant to alleviate potential force-field-specific issues. Likewise, the use of multiple simulation repeats with different starting positions and randomized atom velocities was meant to provide comprehensive statistics, minimizing the chances of over-interpreting any isolated phenomena.

      Appendix A: Summary of the logic of the research procedure together with the experimental, simulation and literature results supporting each step.

      1) Non-glycosylated CD44 binds HA (NMR experiments)

      2) Non-glycosylated CD44 also binds HA in the presence of MEM-85 (NMR experiments)

      3) Glycosylated CD44s that bind HA do not bind HA in the presence of MEM-85 (from literature [J. Bajorath, B. Greenfield, S. B. Munro, A. J. Day, A. Aruffo, Journal of Biological Chemistry 273, 338 (1998).]).

      4) We show the MEM-85 binding site in non-glycosylated CD44 to be far from the canonical crystallographic binding region (NMR experiments). This MEM-85 binding site region is mostly inaccessible to typical N-glycans found in CD44 (MD simulation). Therefore, we expect that MEM-85 binds glycosylated CD44 in the same region. (Our working hypothesis)

      5) Taken together, the above points indicate that MEM-85 covers at least partially the relevant HA binding mode in glycosylated CD44, which has zero overlap with the crystallographic mode. This supports the idea of an alternative binding mode to the crystallographic mode which must be readily available for glycosylated CD44. (Our finding)

      6) Furthermore, heavily glycosylated CD44 variants cover a significant fraction of the crystallographic mode binding region (MD simulation), potentially making it unavailable for HA binding. This explains why non-glycosylated CD44 binds HA in the presence of MEM-85 (i.e., crystallographic mode is free), while glycosylated CD44 does not (i.e., crystallographic mode is covered with N-glycans). The upright region, on the other hand, experiences only minor coverage by the N-glycans in the glycosylated CD44 and is thus free to bind the ligand (MD simulations).

      7) Non-glycosylated CD44 binds HA simultaneously with the crystallographic mode and the upright mode when exposed to high concentrations of small hyaluronan hexamers (NMR titration and MD simulations).

      8) Pinpointing the position of the residues that experience the largest chemical shift during the titration experiments using non-glycosylated CD44 clearly shows the fingerprint of the canonical crystallographic mode but also a region compatible with our proposed upright mode (NMR titration experiments). These results are compatible with our simulations of several hyaluronan hexamers (MD simulation).

      9) The upright binding mode is accessible to hyaluronan binding in the glycosylated CD44 (MD simulations shown in this letter, which could be included in the paper if deemed necessary).

      Glycosylation, and glycoscience in general, is one of the most challenging topics to understand in life sciences. We believe that our paper makes a very significant contribution to this area of research in the context of a central research problem and is exceptionally able to provide an atomic-level description of the HA-CD44 interaction under unambiguously known conditions.

    1. Author Response:

      We thank the reviewers for their efforts reviewing the manuscript and greatly appreciate the comments and recommendations. We are pleased that the reviewers were in agreement with the main conclusions of the manuscript based on the experimental evidence presented. We are also grateful for the complimentary comments and are encouraged that the reviewers recognized the potential impact of the findings.

      We are thankful for the opportunity to submit a revised manuscript and appreciate the recommendation to include currently missing controls. We agree with the reviewers; our mouse colonies were affected due to long pandemic-related shutdowns, which prevented measurements in all cohorts in a timely fashion. These experiments are now underway, for planned inclusion in the revised manuscript.

  7. Feb 2021
    1. Author Response:

      Summary:

      While the work addresses an interesting research question, several shortcomings have been raised by three independent reviewers. A first issue is the lack of theoretical clarity and linkage with prior work, as discussed by Reviewer 1 and Reviewer 2. A second critical set of concerns is raised by all reviewers with the need for several additional analyses to nail down the interpretations proposed by the authors. Reviewer 2 specifically raised concerns regarding the interpretability of activation in auditory cortices, while Reviewer 3 provides insights on the MVPA analysis and suggests the possible use of RSA to clarify the main findings.

      While we respect the editor’s decision, we think that all points raised by Reviewer 1 and Reviewer 3 can be easily addressed through editing of the text and additional analyses. As we describe below, these revisions do not undermine the findings reported in our study – instead, they improve the clarity of the manuscript and further demonstrate that our results are genuine and robust. Furthermore, we believe that the points raised by Reviewer 2 are based on a misunderstanding. Differences in auditory properties across sound categories in our experiment cannot explain the pattern of results reported. Thus, the additional analyses in the auditory cortex proposed by Reviewer 2 can neither support nor undermine the claims made in our study. Nevertheless, we performed all the analyses suggested by Reviewer 2.

      We also want to stress that all reviewers find our study timely and interesting for broad readership. Furthermore, Reviewer 1 and Reviewer 3 made a number of positive comments on study methodology. Overall, we believe that there are no doubts regarding the novelty and importance of our study, and that we are able to address all additional methodological concerns raised by the reviewers.

      Reviewer #1:

      Bola and colleagues asked whether the coupling in perception-action systems may be reflected in early representations of the face. The authors used fMRI to assess the responses of the human occipital temporal cortex (FFA in particular) to the presentation of emotional (laughing/crying), non-emotional (yawning/sneezing), speech (Chinese), object and animal sounds of congenitally blind and sighted participants. The authors present a detailed set of independent and direct univariate and multivariate contrasts, which highlight a striking difference of engagement to facial expressions in the OTC of the congenitally blind compared to the sighted participants. The specificity of facial expression sounds in OTC for the congenitally blind is well captured in the final MVPA analysis presented in Fig.5.

      We would like to thank the reviewer for an overall positive assessment of our work.

      -The use of "transparency of mapping" is rather metaphorical and hand-wavy for a non-expert audience. If the issue relates to the notion of compatibility of representational formats, then it should be expressed formally.

      Following the reviewer’s suggestion, we revised the introduction and clarified what we mean by “transparency of mapping”, and how this concept might be related to the compatibility of representations computed in different areas of the brain. As is now extensively explained, we propose that shape features of inanimate objects are directly relevant to our actions. In contrast, a relationship between shape and relevant actions is much less clear in the case of most animate objects. We hypothesized that this inherent difference between the inanimate and the animate domain, combined with evolutionary pressures for quick, accurate, and efficient object-directed responses, resulted in the inanimate vOTC areas being more strongly coupled with the action system, both in terms of manipulability and navigation, than the animate vOTC areas. The stronger coupling is likely to be reflected in the format of vOTC shape representation of inanimate objects being more compatible with the format of representations computed in the action system.

      -The theoretical stance of the authors does not clearly predict why blind individuals should show more precise emotional expressions in FFA as compared to sighted - as the authors start addressing in their Discussion. In the context of the action-perception loop, it is even more surprising considering that the sighted have direct training and visual access to the facial gestures of interlocutors, which they can internalize. Can the authors entertain alternative scenarios such as the need to rely on mental imagery for congenitally blind for instance?

      We agree that our approach does not predict the difference between the blind and the sighted subjects, and we openly discuss this in the discussion: “An unexpected finding in our study is the clear difference in vOTC univariate response to facial expression sounds across the congenitally blind and the sighted group”. We also propose an explanation of this unexpected difference. Specifically, we suggest that the interactions between the action system and the animate areas in the vOTC are relatively weak, even in the case of facial expressions – thus, they can be captured mostly in blind individuals, whose visual areas are known to increase their sensitivity to non-visual stimulation. This explanation can account for this unexpected between-group difference and is consistent with our theoretical proposal.

      The “mental imagery account” can be, in our opinion, divided into two distinct hypotheses. One version of this account would be to assume that the representation of animate entities typically computed in the vOTC (i.e., also in sighted people) can be activated through visual mental imagery (as suggested by several previous studies), and that this would affect our between-group comparisons. In that case, however, we should observe an effect opposite to that obtained in our study – namely, the activation in the vOTC animate areas should be stronger in the sighted subjects, since they, but not the congenitally blind participants, can create visual mental images (as the reviewer pointed out). This is clearly not what we observed. A second version of the mental imagery account would be to assume representational plasticity in the vOTC of blind individuals – that is, to assume that vOTC animate areas in this population switch from representing visual, face-related information to representing motor mental imagery, which presumably they can generate just like sighted individuals. However, such an account does not, on its own, explain why the animate vOTC areas in the congenitally blind participants are more strongly activated than they are in the sighted subjects, who can generate both visual and motor mental imagery. Based on these considerations, we do not think that the mental imagery account provides a sufficient explanation. Nonetheless, it is certainly a factor worth considering, which we add in a revised discussion of the reported results. Similar reasoning can be applied to other accounts which assume that the observed difference between the blind and the sighted group is a result of representational plasticity in this region in the blind group. Such accounts would need to propose a plausible dimension, other than face shape and its relation to the action system, that is captured by the animate vOTC areas in blind individuals. Since the effect we report is independent of auditory, emotional, social or linguistic dimensions present in our stimuli, it is hard to say what this dimension might be.

      We now elaborate on these important points in the Discussion section.

      Reviewer #2:

      The study by Bola and colleagues tested the specific hypothesis that visual shape representations can be reliably activated through different sensory modalities only when they systematically map onto action system computations. To this aim, the authors scanned a group of congenitally blind individuals and a group of sighted controls while subjects listened to multiple sound categories.

      While I find the study of general interest, I think that there are main methodological limitations, which do not allow to support the general claim.

      We would like to thank the reviewer for this assessment. Below, we argue that the results presented in the paper support our claim, and that they cannot be explained by alternative accounts described by the reviewer.

      Main concerns

      1) Auditory stimuli have been equalized to have the same RMS (-20 dB). In my opinion, this is not a sufficient control. As shown in Figure 3 - figure supplement 1, the different sound categories elicited extremely different patterns of response in A1. This is clearly linked to intrinsic sound properties. In my opinion without a precise characterization of sound properties across categories, it is not possible to conclude that the observed effects in face responsive regions (incidentally, as assessed using an atlas and not a localizer) are explained by the different category types. On the stimulus side, authors should at least provide (a) spectrograms and (b) envelope dynamics; in case sound properties would differ across categories all results might have a confound associated to stimuli selection.

      We now present spectrograms and waveforms for sounds used in the study in the Methods section. We did not present this information in the original version of the paper because, in our opinion, it is quite obvious that sounds from different categories will differ in terms of their auditory properties – after all, this is why we can distinguish among human speech, animal sounds or object sounds. Thus, differences in sound properties across conditions are an inherent characteristic of every study comparing sounds from several domains or semantic categories (e.g., human vs. non-human), including our own study. We now clarify this issue in the Methods section of the manuscript.

      Having said that, we believe that differences in acoustic properties across sound categories cannot explain the results in the vOTC reported in our work. We report that, in blind subjects, the vOTC face areas respond more strongly to sounds of emotional facial expressions and non-emotional facial expressions than to speech sounds, animal sounds and object sounds. These brain areas did not show differential responses to the two expression categories or to the three other sound categories. To explain this pattern of results, the “acoustic confound account” would need to assume that there is some special auditory property that sets both types of expression sounds apart from the other categories, but does not differentiate sound categories in any other comparison. Moreover, this account would need to further assume that this is precisely the auditory dimension to which the vOTC face areas are sensitive, while being insensitive to other auditory characteristics that differ across the other sound categories (e.g., across object sounds and animal sounds, or expression sounds and speech sounds - as the reviewer pointed out, all categories are acoustically very different, as indicated by the activation of A1). We find this account extremely unlikely. We now comment on these points in the Methods and the Results section.

      2) More on the same point: the authors use the activation of A1 as a further validation of the results in face selective areas. Page 16 line 304 "We observed activation pattern that was the same for the blind and the sighted subjects, and markedly different from the pattern that was observed in the fusiform gyrus in the blind group (see Fig. 1D). This suggests that the effects detected in this region in the blind subjects were not driven by the differences in acoustic characteristics of sounds, as such characteristics are likely to be captured by activation patterns of the primary auditory cortex." It is the opinion of this reader that this control, despite being important, does not support the claim. A1 is certainly a good region to show how basic sound properties are mapped. However, the same type of analysis should be performed in higher auditory areas, as STS. If result patterns would be similar to the FFA region, I guess that the current interpretation of results would not hold.

      As we discuss above, we believe that the explanation of the results observed in the vOTC in terms of an “acoustic confound” does not hold, even without any empirical analysis in the auditory cortex. The analysis in A1 was planned to clearly illustrate this point and to support the interpretation of a potentially unexpected pattern of results across sound categories (such an unexpected pattern was not observed).

      However, per the reviewer’s request, we also performed an ROI analysis in the STS. Specifically, we chose two ROIs – a broad and bilateral ROI covering the whole STS, and a more constrained ROI covering the right posterior STS (rpSTS), known to be a part of the face processing network and to respond primarily to dynamic aspects of the face shape. As can be seen in Supplementary Materials, the pattern of responses in the broad STS ROI is markedly different from the one observed in the FFA. In particular, the magnitude of the STS activation is clearly different for speech sounds, animal sounds, and object sounds, in both the blind and the sighted group. In the case of the FFA, the activation magnitudes for these three sound categories were indistinguishable. Furthermore, in the blind group, the STS showed stronger activation for emotional facial expression sounds than for non-emotional expression sounds. Again, such a difference was not observed in the FFA (if anything, the FFA showed slightly stronger activation for non-emotional expression sounds in the blind group). The pattern of the rpSTS responses is more similar to the responses observed in the FFA. This is exactly what can be expected based on our hypothesis that the FFA in the blind group is sensitive primarily to dynamic facial reconfigurations, with a transparent link between the motoric and visual shape representations. Overall, we think that the pattern of results observed in the auditory cortex is fully in line with our hypothesis – the auditory regions (A1 and STS, defined broadly) show responses that are different from the responses observed in the FFA (one may hypothesize that responses in the auditory regions are driven by low-level auditory features of stimuli to a larger extent); the rpSTS, which is specialized in the processing of dynamic aspects of the face shape, shows a pattern of responses that is more similar to the pattern observed in the FFA. Importantly, the responses in the rpSTS were not different across subject groups. As we describe below, this is the pattern of results that was also observed in MVPA. We now report all the above-described results in the paper.

      3) Linked to the previous point. Given that the authors implemented a MVPA pipeline at the ROI level, it is important to perform the same analysis in both groups, but especially in the blind, in areas such as STS as well as in a control region, engaged by the task (with signal) to check the specificity of the FFA activation.

      Per the reviewer’s request, we additionally performed the MVPA in three control regions. Firstly, we performed the analysis in the auditory cortex, defined as A1 and the STS combined. We treated this area as a positive control – in particular, given the acoustic differences between sound categories, we expected to successfully decode all sound categories from the activity of this ROI. Secondly, we performed the analysis in the parahippocampal place area (PPA). We treated the PPA as a negative control – given that this area does not seem to contain much information about animate entities, we did not expect to find effects there for most of our comparisons. Furthermore, as the PPA is the vOTC area bordering the FFA, negative results in this area would demonstrate the spatial specificity of our results. Thirdly, we performed the analysis in the rpSTS – here, we expected to observe results similar to the ones observed in the FFA, for the reasons provided above. We now present the results of these analyses as supplementary figures.

      We were able to successfully distinguish all sound categories, in both groups, based on the activation of the auditory cortex (all p = 0.001; the lowest value that can be achieved in our permutation analysis). Furthermore, based on the activation of this area, we were able to classify specific facial expressions, specific speech sounds, and the gender of the actor, in contrast to the result from the FFA, where the decoding of facial expressions was the only positive result.

      As expected, the decoding of animate sound categories was generally not successful in the PPA. However, as one might expect, activation of this area allowed us, to some extent, to distinguish object sounds from animate sounds – especially in the blind group. Furthermore, based on the PPA activation, we were not able to classify specific facial expressions, speech sounds, or the gender of the actor. These results confirm that the results reported for the FFA are specific to only certain parts of the brain and even certain parts of the vOTC.

      As can be expected, the results in the rpSTS were the most similar to the results observed in the FFA – while the activation of this region was diagnostic of all categorical distinctions, the more detailed analysis showed that this region represented differences between specific facial expressions, but not between the speech sounds or the gender of the actors performing the expressions (a similar pattern of results was observed in both groups). This is the same specificity that the FFA in blind people shows.

      Finally, we would like to stress that the difference between the results observed in the FFA and the PPA is yet another argument against interpreting the results in the FFA as being driven by auditory properties of the stimuli – the issue that we discussed in detail above. We see no reason why putative acoustic influences on the vOTC responses in the blind group should be present in the FFA, but not in the PPA.

      4) I find the manuscript rather biased with regard to the literature. This is a topic which has been extensively investigated in the past. For instance, the manuscript does not include relevant references for the present context, such as:

      Plaza, P., Renier, L., De Volder, A., & Rauschecker, J. (2015). Seeing faces with your ears activates the left fusiform face area, especially when you're blind. Journal of vision, 15(12), 197-197.

      Kitada, R., Okamoto, Y., Sasaki, A. T., Kochiyama, T., Miyahara, M., Lederman, S. J., & Sadato, N. (2013). Early visual experience and the recognition of basic facial expressions: involvement of the middle temporal and inferior frontal gyri during haptic identification by the early blind. Frontiers in human neuroscience, 7, 7.

      Pietrini, P., Furey, M. L., Ricciardi, E., Gobbini, M. I., Wu, W. H. C., Cohen, L., ... & Haxby, J. V. (2004). Beyond sensory images: Object-based representation in the human ventral pathway. Proceedings of the National Academy of Sciences, 101(15), 5658-5663.

      The first reference listed by the reviewer is actually a conference abstract. Thus, we feel that it would be premature to give it comparable weight to peer-reviewed papers. Furthermore, based on the abstract, without the published paper, we cannot assess the robustness of the results and their relevance to our study (particularly, it is unclear whether some effects were observed in the right FFA, and whether a statistically significant difference between blind and sighted subjects was detected).

      In the second reference, the authors did not observe effects in the FFA in the visual version of their experiment with sighted subjects, at the threshold of p < 0.05, corrected for multiple comparisons. In our opinion, this makes the null result of the tactile experiment, reported for the FFA, hard to interpret – thus, while the paper is very interesting in certain contexts, it is not particularly informative when it comes to the question addressed here.

      While the third reference reports interesting results, it does not investigate preference for inanimate objects or animate objects in the vOTC, which is the main topic of our paper (only comparisons vs. rest and between- and within-category correlations are reported). Furthermore, based on that study, we cannot conclude whether effects reported for faces are found in the face areas or in other parts of the vOTC (no analyses in specific vOTC areas were reported).

      These were the reasons why we did not refer to these materials in the previous version of the manuscript. Importantly, none of them compel us to revise our claims, and we refer to a number of other papers, directly relevant to the question we are interested in – that is, the difference between vOTC animate and inanimate areas in sensitivity to non-visual stimulation. Nevertheless, we agree that referring to materials suggested by the reviewer might be informative for non-expert readers – thus, we cite them in the revised version of our paper.

      Reviewer #3:

      Bola and colleagues set out to test the hypothesis that vOT domain specific organization is due to the evolutionary pressure to couple visual representations and downstream computations (e.g., action programs). A prediction of such theory is that cross-modal activations (e.g., response in FFA to face-related sounds) should be detected as a function of the transparency of such coupling (e.g., sounds associated with facial expression > speech).

      To this end, the Authors compared brain activity of 20 congenitally blind and 22 sighted subjects undergoing fMRI while performing a semantic judgment task (i.e., is it produced by a human?) on sounds belonging to 5 different categories (emotional and non-emotional facial expressions, speech, object sounds and animal sounds). The results indicate preferential response to sounds associated with facial expressions (vs. speech or animal/objects sounds) in the fusiform gyrus of blind individuals regardless of the emotional content.

      The issue tackled is relevant and timely for the field, and the method chosen (i.e., clinical model + univariate and multivariate fMRI analyses) well suited to address it. The analyses performed are overall sound and the paper clear and exhaustive.

      We thank the reviewer for this positive assessment.

      1) While I overall understand why the Authors would choose a broader ROI for multivariate (vs. univariate) analyses, I believe it would be appropriate to show both analyses on both ROIs. In particular, the fact that the ROI used for the univariate analyses is right-hemisphere only, while the multivariate one is bilateral should be (at least) discussed.

      We briefly discuss this issue in the Methods section: “The reason behind broader and bilateral ROI definition was that the multivariate analysis relies on dispersed and subthreshold activation and deactivation patterns, which might be well represented also by cross-talk between hemispheres (for example, a certain subcategory might be represented by activation of the right FFA and deactivation of the left analog of this area).”

      Constraining the FFA ROI in the multivariate analysis (i.e., using the same ROI as was used in the univariate analysis) makes the results slightly weaker, in both groups. However, the pattern of results is qualitatively comparable. A slight decrease in statistical power can be expected, for the reasons described in the Methods and cited above.

      Similarly, using the broader FFA ROI in the univariate analysis (i.e., using the same ROI as was used in the multivariate analysis) results in qualitatively comparable, but slightly weaker effects in the blind group and no change in sighted subjects (no difference between sound categories). Again, this is to be expected – visual studies show that the functional sensitivity to face-related stimuli is weaker in the left counterpart of the FFA than in the right FFA. This is also the case in our data - using the broader and bilateral ROI essentially averages a stronger effect in the right FFA and a more subtle effect in the left counterpart of the FFA.

      We now clarify this issue in the Methods section.

      2) The significance of the multivariate results is established testing the cross-validated classification accuracy against chance-level with t-tests. Did these tests consider the hypothetical chance level based on class number? A permutation scheme assessing the null distribution would be advisable. In general, more details should be provided with respect to the multivariate analyses performed, for instance the confusion matrix in Figure 5B is never mentioned in the text.

      Yes, the chance level was calculated in a standard way, by dividing 100 % by the number of conditions/classes included in the analysis (note that all stimulus classes were presented an equal number of times). To respond to the reviewer’s comment, we used a permutation approach to recalculate the significance of all MVPA analyses reported in the paper (note that the whole-brain univariate analyses are already performed within the permutation framework). To this aim, we reran each analysis 1000 times with condition labels randomized and compared the actual result of this analysis with the null distribution created in this way (see the Methods section for details). We replicated all results reported in the paper. We now report this new analysis in the manuscript, changing the figure legends and the Methods section accordingly.
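
      The permutation logic described above can be illustrated with a short sketch. This is not the analysis code used in the paper: the toy leave-one-out nearest-class-mean classifier, the function names (`loo_accuracy`, `permutation_test`) and the one-dimensional features are all assumptions made for illustration. Only the overall scheme (rerun the analysis with shuffled condition labels, then compare the observed accuracy with the resulting null distribution) follows the text.

```python
import random


def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-class-mean classifier (1-D features)."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        # Class sums and counts computed without the held-out sample.
        sums, counts = {}, {}
        for j, (xj, yj) in enumerate(zip(features, labels)):
            if j != i:
                sums[yj] = sums.get(yj, 0.0) + xj
                counts[yj] = counts.get(yj, 0) + 1
        # Predict the class whose mean is closest to the held-out value.
        predicted = min(sums, key=lambda c: abs(x - sums[c] / counts[c]))
        correct += predicted == y
    return correct / len(labels)


def permutation_test(features, labels, n_perm=1000, seed=0):
    """Return (observed accuracy, p-value) using a label-permutation null."""
    rng = random.Random(seed)
    observed = loo_accuracy(features, labels)
    shuffled = list(labels)
    n_at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the feature-label relationship
        if loo_accuracy(features, shuffled) >= observed:
            n_at_least += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (n_at_least + 1) / (n_perm + 1)


if __name__ == "__main__":
    features = [0.1, 0.15, 0.2, 0.3, 0.9, 1.0, 1.1, 1.2]
    labels = ["a"] * 4 + ["b"] * 4
    accuracy, p_value = permutation_test(features, labels)
    print(f"accuracy = {accuracy:.2f}, permutation p = {p_value:.3f}")
```

      With balanced classes, the mean of the permuted accuracies should lie close to the theoretical chance level (100 % divided by the number of classes), which is why the two approaches are expected to agree when the design is balanced.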

      The confusion matrix was not mentioned in the text because it is not a separate analysis. As explained in the figure legend, it is just a graphical representation of the classifier's performance (i.e., its choices for specific stimulus classes) during the decoding analysis reported in Fig. 5A. To clarify this, we now briefly mention the graph presented in Fig. 5B in the main text.

      3) I wonder whether a representational similarity approach could be useful in better delineating similarity/differences in blind vs. sighted participants sounds representations in vOT. Such analysis could also help further exploring potential graded effects: i.e., sounds associated with facial expression (face related, with salient link to movement) > speech (face related, with less salient link with movement) > animals sounds (non-human face related) > object sounds (not face related at all). The above-mentioned confusion matrix could be the starting point of such investigation.

      We thank the reviewer for this interesting suggestion. In response to this comment, we performed an additional RSA, aimed at investigating graded similarity in the FFA response patterns across the categories used in the experiment. Based on our hypothesis, we created a simple theoretical model assuming that responses to both types of facial expression sounds are the most similar to each other (animate sounds with high shape-action mapping transparency), somewhat similar to speech sounds (animate sounds with weaker shape-action mapping transparency), and the least similar to animal and object sounds (animate sounds with no clear shape-action mapping transparency and inanimate sounds). We observed a significant correlation between this theoretical model and FFA response patterns in the blind group (pFDR = 0.012), but not in the sighted group (pFDR = 0.223). We believe that the RSA further supports our visual-shape-to-action mapping conjecture, at least when it comes to blind subjects (see the Discussion section for our interpretation of the observed differences between the blind and the sighted subjects). We describe this new analysis in the revised text.
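
      A minimal sketch of how a graded theoretical model can be compared with response patterns in an RSA is given below. It is an illustration under stated assumptions, not the authors' implementation: the category names, the numeric grades in `GRADE` (in particular the values assigned to pairs that do not involve facial expression sounds), the use of 1 minus Pearson correlation as the dissimilarity measure, and the Spearman rank correlation between model and data are all hypothetical choices.

```python
from itertools import combinations

# Hypothetical category labels and grades (assumption, not from the paper).
CATEGORIES = ["expression_1", "expression_2", "speech", "animal", "object"]
GRADE = {"expression_1": 0, "expression_2": 0, "speech": 1, "animal": 2, "object": 2}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    return pearson(average_ranks(x), average_ranks(y))


def model_rdm_vector():
    """Model dissimilarity for each category pair: difference in grade."""
    return [abs(GRADE[a] - GRADE[b]) for a, b in combinations(CATEGORIES, 2)]


def neural_rdm_vector(patterns):
    """Observed dissimilarity for each pair: 1 - correlation of patterns."""
    return [1 - pearson(patterns[a], patterns[b])
            for a, b in combinations(CATEGORIES, 2)]
```

      Given a dictionary `patterns` mapping each category to its response pattern in the ROI, `spearman(model_rdm_vector(), neural_rdm_vector(patterns))` gives a per-subject model fit that can then be tested at the group level.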

    1. Author Response:

      Summary:

      This paper uses numerical simulations to model DNA replication dynamics in an in vitro Xenopus DNA replication system, both in unperturbed conditions and upon intra-S-checkpoint inhibition. The current work extends previous studies by the authors that recapitulated some but not all features of the replication program. The new model is superior as it can model both the frequency and the distribution of observed initiation events. Although the reviewers found the work in principle interesting and well executed, they have identified limitations of the study, both with respect to model validation and the extent to which the findings represent new biological insights into origin regulation and replication dynamics.

      We would like to thank the referees and the editor for reading and commenting on our work. The main message that we take from the three referees' comments is that this work lacks “new biological insights into origin regulation and replication dynamics”.

      To our knowledge, this work is the first one to clearly show that:

      • The origin clustering is not regulated by intra-S checkpoint in Xenopus egg extract as was proposed previously [1].

      • The variability of the rate of DNA synthesis close to replication forks is a necessary ingredient to describe the dynamic of replication origin firing.

      • Firing probabilities are heterogeneous in the embryonic Xenopus system.

      We believe that the referees' common conclusion arises because these important conclusions were not clearly and explicitly stated in our manuscript. Hence, we modified our manuscript to explicitly state these new insights. Please find below our detailed answers to the referees' comments, criticisms and suggestions.

      Reviewer #1:

      The current work by Goldar and colleagues uses numerical simulations to model the spatiotemporal DNA replication program in an in vitro Xenopus DNA replication system. By comparing modeled data and experimental DNA combing data generated during unperturbed S-phase replication and upon intra-S checkpoint inhibition (which the authors published previously), the authors find that DNA replication in Xenopus extracts can be modeled by segmenting the genome in regions of high and low probability of origin activation, with the intra-S-phase checkpoint regulating origins with low but not high firing probability. Recapitulating the kinetics of global and local S-phase replication under different conditions through mathematical simulations represents an important contribution to the field. However, one concern I have pertains to the generality of the model, as the authors did not explore whether the model can accurately simulate replication under other conditions (e.g., checkpoint activation).

      In this work we showed that the same combination of processes can recapitulate several observations on the spatio-temporal pattern of DNA replication (as measured by DNA combing) in unchallenged and checkpoint inhibited conditions. Following the referee’s suggestion to “explore whether the model can accurately simulate replication under other conditions”, we also applied our methodology to a condition where Chk1 is over-expressed. We were able to reproduce the pattern of DNA replication as measured by DNA combing and found, as expected, that the over-expression of Chk1 reduces the rate of origin firing, but only by reducing the number of available limiting factors and not the capacity of potential origins to fire. This analysis was added to our manuscript and discussed.

      Major comments:

      1) In figure 1a and 1c, the authors show data that were previously published by the authors. Yet, the displayed values in 1a and 1c differ from those displayed in Figure 10 of Platel et al, 2015. This discrepancy should be explained.

      The discrepancy results from the thresholding of the optical signal and the smoothing of the experimental data in Platel et al, 2015. In the work presented here, we decided to model raw profiles after the thresholding. While the absolute values of the extracted data are different from those in Platel et al 2015, the trends of I(f) and fork density profiles are similar. We stated this point clearly in the caption of figure 2.

      2) The authors test whether their model can simulate replication when S-phase is perturbed by Chk1 inhibition, but not under opposite conditions of Chk1 activation. This important analysis should be included.

      The experimental means chosen for activating or inhibiting (manipulating) the checkpoint in Platel et al 2015 were, respectively, to overexpress Chk1 protein or to inhibit its activity using the specific inhibitor UCN-01. We further analysed combed fibres from the Chk1 overexpression condition and added this new analysis to our manuscript (see above).

      3) Although the MM4 model developed by the authors is in agreement with previously published experimental DNA combing data measured in the Xenopus system, it is unclear whether it can also accurately predict the replication program in other systems. Comparing simulated data with experimental data from another metazoan system would serve as an important additional validation of the authors' model.

      We agree with the referee that the generality of this model has to be tested by comparing it with experimental data from other metazoans. Unfortunately, to our knowledge, there are no available DNA combing data in other metazoans where the effect of inhibition (and now “activation”) of the intra-S checkpoint has been measured concomitantly with cells under unchallenged growth conditions. If the referee is aware of such data, we will be happy to analyse them. It is possible to compare a simulation of our model with replication timing profiles measured by NGS techniques, by introducing in the model a distribution of lengths for regions where the probability of origin firing is high. This will result in a timing profile where we can define TTR and CTR, as has been done in human cell lines [2]. However, this requires the addition of a supplementary parameter: the length of domains with high probability of origin firing. This would make the model more complex and cannot be justified on statistical grounds based on combing data (see annex 1 of our new manuscript; this model corresponds to MM6).

      Reviewer #2:

      Here the authors expand on their prior modeling of origin activity (Platel 2015) in xenopus extracts. Their prior work, while successful in some estimates, failed to reproduce the tight distribution of interorigin ("eye to eye") distances. Here the authors generate a series of nested models (MM1-MM4) of increasing complexity to describe the distribution and frequency of observed initiation events in an unperturbed S-phase. Not surprisingly, the fit improves with the increasing complexity of each model.

      The improvement of the concordance between the model and the data was assessed by two statistical methods (F test and AIC) in order to avoid overfitting the data. Both tests showed that the increasing complexity of the model was necessary to explain the variability of the measured data. In fact, one could still increase the complexity of the model (for example, one could use our fictitious model to fit the data). In this case, the F test and AIC score show that the better representation of the data by the model is due to the increase in complexity and not to the necessity of the considered processes. We included this discussion in annex 1 of our new manuscript.
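
      For readers unfamiliar with these two criteria, the following sketch shows their textbook least-squares forms for a pair of nested models. It assumes Gaussian residuals and that each fit is summarised by its residual sum of squares; the exact expressions used in annex 1 may differ, and the function and argument names are hypothetical.

```python
import math


def aic_least_squares(rss, n_points, n_params):
    """AIC for a least-squares fit with Gaussian errors (up to a constant)."""
    return n_points * math.log(rss / n_points) + 2 * n_params


def f_statistic(rss_simple, k_simple, rss_complex, k_complex, n_points):
    """F statistic for nested models; the complex model has more parameters."""
    # Improvement in fit per added parameter ...
    numerator = (rss_simple - rss_complex) / (k_complex - k_simple)
    # ... relative to the residual variance of the complex model.
    denominator = rss_complex / (n_points - k_complex)
    return numerator / denominator


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    n = 54
    print("F =", f_statistic(10.0, 2, 5.0, 4, n))
    print("AIC simple  =", aic_least_squares(10.0, n, 2))
    print("AIC complex =", aic_least_squares(5.0, n, 4))
```

      The added complexity is considered justified when the F statistic exceeds the critical value of the F distribution with (k_complex - k_simple, n_points - k_complex) degrees of freedom, and when the AIC of the more complex model is lower despite the penalty of 2 per parameter.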

      The authors then built an even more complex model based on prior published work to generate in silico data for which they tested their MM4 model. I admit to being a little lost at this point as to why the authors were using simulated data to assess their model and identify key parameters.

      The in silico data helps us to verify the quantitative ability of our model and validate the analysis process that we propose.

      Finally, the authors compare prior published experimental data from an unperturbed S-phase and one with an abrogated intra s-phase checkpoint (chk1 inhibition) and three parameters stood out J (rate limiting factor), 𝜃 (fraction of the genome with high origin initiation activity), and Pout (probability of remaining origins to fire) which suggests that Chk1 limits the probability of origin activation outside of the regions of the genome with high origin activation efficiency and modulates the activity of the rate limiting factor (J). These conclusions are consistent with prior observations in other systems. In summary, the authors apply elegant modeling approaches to describe xenopus in vitro replication dynamics and the effects of Chk1 inhibition, but the work fails to reveal new principles of eukaryotic origin regulation and replication dynamics.

      See above

      The most powerful modeling approaches are those that reveal a new or unexpected mode of regulation (or parameter) that can then be experimentally tested.

      We agree with the referee, and thank him for his comment. We re-wrote part of our manuscript to explicitly indicate “the new principles of eukaryotic origin regulation and replication dynamics” that our analysis implies.

      Additional points:

      This was a very specialized manuscript and would be difficult to read for general biologists. The terms/parameters were only defined in a table and many of the figures would not be parsable by a broad audience.

      We re-wrote part of the manuscript to make it more readable, and transferred technical details to annexes. We added a new subfigure, Fig 1a, to better explain combing parameters.

      Figure 1. Sets off the challenge at hand -- that the previous model couldn't account for the distribution of "eye to eye" distances; but this is never assessed in similar format with the newer model. I assume this is captured in the appendix 1 figures, but was unclear if this was eye length or gap length.

      The referee is correct, this is represented in the figures in the annex sections, where we showed that our modelling approach can reproduce in a satisfactory manner the replication fraction of measured fibres, I(f), fork density, eye length distribution, gap length distribution and eye-to-eye distribution in all considered conditions. Following the referee’s suggestion, we added to the main text of our new manuscript a figure comparable to figure 2 for our new model.

      Reviewer #3:

      General assessment:

      The authors arrive at a plausible model of DNA replication kinetics that reasonably fits six types of plots from fiber-combing data on Xenopus cell-free extracts, for normal and challenged cases. However, although the mechanisms postulated and the parameters inferred all seem reasonable, they rely on untested hypotheses and a single type of data (combing).

      All hypotheses used in this model have already been proposed and tested in the existing literature, as stated in the discussion (lines 309-315 in the new manuscript), where all hypotheses used are explained and referenced.

      We use DNA combing data, and compare our conclusions to observations in the literature obtained by other techniques. Indeed, DNA combing (and in general DNA fibre stretching techniques combined with optical detection) has the unique ability to allow working directly on the distributions of parameters such as eye-to-eye distances and eye lengths. Hence the data are not biased by any type of population averaging (as is the case in NGS or other classical biochemical techniques).

      To truly convince, the authors need further experiments to test specific hypothesized mechanisms.

      This is not the purpose of this work and we do not propose any molecular mechanism. We look for essential ingredients necessary to reproduce spatio-temporal dynamic of DNA replication.

      Techniques such as Repli-Seq or perhaps FORK-seq (recently developed by one of the authors here) might give direct information on the variation of initiation efficiency across the genome.

      We analyse data from the Xenopus in vitro system, which has been extensively used to investigate the spatio-temporal pattern of DNA replication. Unfortunately, the reference genome of this organism is not assembled accurately enough to allow techniques such as Repli-Seq or FORK-seq, which require a mapping procedure on a reference genome. Furthermore, these techniques require a cell population containing more than 10^7 individuals [3]; here we are working with 200,000 to 500,000 nuclei. Hence, without changing the model system, these techniques could not be applied.

      Substantive Concerns:

      1) The authors refer to each case (MM1-5) as a unique model, but each has more complexity and defines a class of models.

      MM1-5 all belong to the single class of nucleation-and-growth processes defined as the KJMA model. All models are variants of this model. We do not understand the referee's point. If the referee means that each case can represent the data in a non-unique manner, we agree with him/her, and this is the reason we used a genetic algorithm and not a gradient descent algorithm to minimise the difference between the data and the considered model.

      For example, in fitting MM1, the simplest of all the cases (and with, by far, the worst fit), the fork velocity was fixed at 0.5 kb/min. And yet the real fork velocity is described as having v ~ 0.5 kb/min. Shouldn't this also be a parameter in the fit?

      We chose to keep the velocity as a constant and close to the observed experimental value, as in Xenopus egg extract it is assumed that the fork velocity is constant [4]. But indeed, one could consider fork velocity as a fitting parameter (see the answer to the next point), but this is not in accordance with experimental observations.

      2) Under replication stress, forks can stall, giving an effectively two populations of forks, as proposed by the authors in an earlier work (Ciardo et al., Genes 2019; cf. Fig. 1). Strangely, that paper is not referred to or discussed in this manuscript. Why not?

      Indeed, instead of self-citing a review article we preferred to refer to original experimental works. Furthermore, in order to change the mean of the eye-to-eye distribution by only changing the speed of replication forks, the speed of replication forks would need to be higher than 10 kb/min, which has not been reported in any organism. To be conservative, we ran a model where the speed of replication forks could take several values ranging from 0 to 3 kb/min. The model failed to fit the experimental data (see the new manuscript and annex 1). Hence, we consider that the best model is the one with constant speed.

      3) Continuous vs. discrete potential origins: The density was fixed to be random at 1 potential origin per 2.3 kb (or 1 kb in part of the paper). How robust are findings to these assumed densities?

      If we consider the density as a free parameter, the model converges with a density of 1 origin every 2.3 kb.

      In general, there does not seem to be a huge difference between the two cases, for the type of data explored. Perhaps it is not worth looking at the discrete case here?

      The difference is that in the “discrete” case the distribution of origins is not continuous and hence there naturally exists a distance between two fired origins where origin firing is inhibited. The existence of such an origin firing exclusion zone was shown to be necessary to model replication dynamics as measured by DNA combing [5,6].

      4) The definition of goodness of the fit (GoF) should be made more explicitly. How is the norm calculated? There is an implicit sum - the elements should be defined explicitly. Also, the ensemble average < yexp > is not defined. More broadly, it is not clear why we need a custom GoF statistic when it would seem that standard ones (chi square, or - ln likelihood) could serve equally well.

      The defined GoF is a classical normalised chi squared as defined in annex 1. We modified the text to include explicitly the summation over the data points. By definition <yexp> is the average value of an experimental data series. GoF is not a custom defined criterion but the classical normalised chi square [7].

      Note that those statistics (when proper normalization is used) can also work for global fits where each local fit is to a quantity with different units.

      References:

      1. Ge XQ, Blow JJ. Chk1 inhibits replication factory activation but allows dormant origin firing in existing factories. The Journal of Cell Biology. 2010;191: 1285–1297. doi:10.1083/jcb.201007074

      2. Pope BD, Ryba T, Dileep V, Yue F, Wu W, Denas O, et al. Topologically associating domains are stable units of replication-timing regulation. Nature. 2014;515: 402–405. doi:10.1038/nature13986

      3. Petryk N, Kahli M, d’Aubenton-Carafa Y, Jaszczyszyn Y, Shen Y, Silvain M, et al. Replication landscape of the human genome. Nat Commun. 2016;7: 10208. doi:10.1038/ncomms10208

      4. Marheineke K, Hyrien O. Control of Replication Origin Density and Firing Time in Xenopus Egg Extracts ROLE OF A CAFFEINE-SENSITIVE, ATR-DEPENDENT CHECKPOINT. J Biol Chem. 2004;279: 28071–28081. doi:10.1074/jbc.M401574200

      5. Löb D, Lengert N, Chagin VO, Reinhart M, Casas-Delucchi CS, Cardoso MC, et al. 3D replicon distributions arise from stochastic initiation and domino-like DNA replication progression. Nature Communications. 2016;7: 11207. doi:10.1038/ncomms11207

      6. Jun S, Herrick J, Bensimon A, Bechhoefer J. Persistence length of chromatin determines origin spacing in Xenopus early-embryo DNA replication: quantitative comparisons between theory and experiment. Cell Cycle. 2004;3: 223–229.

      7. Bevington P, Robinson DK. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill Education; 2003.

  8. Jan 2021
    1. Author Response:

      Reviewer #3:

      However, a lot of the data presented in the manuscript are not novel and were previously published. A recent Molecular Cancer Research paper by Llabata and collaborators published in April 2020 (referred to in the text) has already identified the same MGA interactors by Mass Spectrometry and the same binding sites by ChIP-Seq using human lung adenocarcinoma cell lines. Llabata et al. found that MGA interacts with the non-canonical PCGF6-PRC1 complex (named PRC1.6) that includes L3MBTL2 and that the complex also contains MAX and E2F6 but not MYC. They clearly show that MGA binds to and represses genes that are bound and activated by MYC convincingly showing that MYC and MGA have opposite functions. This unfortunately tempers the enthusiasm of the reviewer.

      This reviewer states that "... a lot of the data presented in the manuscript are not novel and were previously published". The reviewer goes on to write that the Llabata et al. 2020 paper (referring to doi: 10.1158/1541-7786.MCR-19-0657 [https://mcr.aacrjournals.org/content/18/4/574]) "has already identified the same MGA interactors by Mass Spectrometry and the same binding sites by ChIP-Seq using human lung adenocarcinoma cell lines. Llabata et al. found that MGA interacts with the non-canonical PCGF6-PRC1 complex (named PRC1.6)..."

      We strongly disagree with the reviewer's statements.

      1) A major focus of our paper is that it provides and validates a mouse model in which we delete MGA and demonstrate its tumor suppressive activity. The experiments in Llabata et al., including the biological assays and the ChIP-Seq, were done by overexpressing MGA in cells which already express endogenous MGA. Therefore, all their data monitor the consequences of overexpression of MGA, a situation without clear biological relevance. In the experiments reported in our paper, we delete MGA. Therefore our molecular data refer to a comparison between MGA null and the same cells expressing endogenous MGA. This is important since MGA is a tumor suppressor and its loss of function is what is crucial biologically, as we show here for the first time in our lung adenocarcinoma model. Furthermore, by deleting MGA we were able to show that its loss corresponds to an increase in a core set of target genes previously associated with PRC1.6. We also show that members of this core group are relevant to the proliferation of tumors that lack MGA.

      2) The PRC1.6 complex has been known to be associated with MGA since at least 2012 as indicated in our references cited. Llabata et al confirmed that result. Our paper reports PRC1.6 subunits are associated with MGA through the DUF4801 domain of MGA. This is the first identification of the interface between PRC1.6 and MGA. It is important and relevant because multiple frame shift mutants in MGA have the consequence of deleting this region in a wide range of tumor types.

  9. Dec 2020
    1. Author Response

      Response to reviews:

      We appreciate the relevant comments sent to us in this review. We have already revised the paper and we addressed those points in our revised manuscript. There is a particular point, which we had not explained in enough detail in our original version of the paper, and which we believe has led the reviewers to not appreciate a central aspect of our study. We wish to clarify this below:

      The reviewers stated that the idea proposed in our study that the "drift rate corresponds to signal-to-noise ratio" is a quite accepted one in DDM research, which typically assumes that the "within-trial noise" magnitude is fixed (and does not vary with condition), while drifts do. From this, it also followed that one of the models we examined (and rejected; our Model 2) appears to be a 'strawman', which one would NOT seriously consider.

      REPLY:

      This statement could be correct with regard to the DDM framework, within the domain of perceptual choice. However, we focused here on the DDM extension to value-based decisions, and we believe that the statement above is no longer accurate.

      1. Noise magnitude and within-trial sampling variability in value-based DDM

      Whereas perceptual choices are usually brief (often between .5 - 1 sec) and the stimuli they present are often static (lines or strings of letters in a lexical decision), value-based decisions take longer (typically around 2-5 sec), and the values for which they accumulate evidence are not “given” but rather need to be generated (sampled) during the decision itself. While all versions of the DDM include accumulation noise, the difference pointed out above has led applications of the DDM to perceptual decisions to assume that the "accumulation noise" term is constant and does not vary with task difficulty (this was also motivated by the attempt to minimize model parameters, so it was thought one could keep this parameter fixed). While this practice has been criticized (Donkin, Brown & Heathcote, 2009), the fact that the tasks involve short and roughly static stimuli (so the accumulation noise may be small compared with noise that appears between trials, when the same stimulus is presented again) has led most researchers to either assume the accumulation noise is fixed or to neglect it altogether (in favor of between-trial noise; LBA model).

      The first application of a sequential sampling model to value-based decisions was the decision-field theory (DFT) model (e.g., Busemeyer & Diederich, 2002; Hotaling & Busemeyer, 2012). In this model, accumulation is driven by attentional switches between dimensions that are relevant to distinguish the stimuli, resulting in an explicitly noisy accumulation. More recent applications of the DDM to value-based decisions (e.g., Krajbich et al, 2010; Tajima et al., 2016) are consistent with this idea. For example, as described by Tajima et al, the values of the alternatives are not "known" by the subject (even if the alternatives are in full view), but rather they are sampled from a distribution whose width corresponds to their previous experience or knowledge of the alternatives. Thus, in this framework, the within-trial accumulation becomes an intrinsically noisy process. Moreover, as mathematically proved in the DFT model, the within-trial accumulation noise is determined by the variance of the sampled values. As long as it was assumed that the distributions of rewards associated with each alternative had equal width, it was possible to assume that the noise term was constant. Since we now know that alternatives vary not only in their attractiveness rating, but also in the certainty about such rating, the most natural assumption is that subjects accumulate value “evidence” by sampling from Gaussian distributions whose means correspond to the options’ value ratings and whose variances correspond to the options’ value uncertainties. This leads directly to our Model 2.
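
      The sampling account that motivates Model 2 can be made concrete with a small simulation. This is a hedged sketch rather than the fitted model from the paper: the decision bound, the discrete time steps, the function names (`simulate_trial`, `choice_rate`) and all parameter values are assumptions. At each step one value sample is drawn per option from a Gaussian whose mean is the option's rating and whose standard deviation is its uncertainty, and the difference is accumulated until a bound is reached.

```python
import random


def simulate_trial(mean_1, sd_1, mean_2, sd_2, bound=10.0,
                   max_steps=10_000, rng=random):
    """Accumulate sampled value differences until a bound is hit.

    Returns (choice, n_steps); choice is 1, 2, or None if no bound is reached.
    """
    evidence = 0.0
    for step in range(1, max_steps + 1):
        # One noisy value sample per option; their difference drives the accumulator.
        evidence += rng.gauss(mean_1, sd_1) - rng.gauss(mean_2, sd_2)
        if evidence >= bound:
            return 1, step
        if evidence <= -bound:
            return 2, step
    return None, max_steps


def choice_rate(n_trials, seed=0, **kwargs):
    """Fraction of trials in which option 1 is chosen."""
    rng = random.Random(seed)
    wins = sum(simulate_trial(rng=rng, **kwargs)[0] == 1 for _ in range(n_trials))
    return wins / n_trials


if __name__ == "__main__":
    certain = choice_rate(2000, mean_1=1.0, sd_1=0.5, mean_2=0.0, sd_2=0.5)
    uncertain = choice_rate(2000, mean_1=1.0, sd_1=5.0, mean_2=0.0, sd_2=5.0)
    print(f"P(choose better option): certain = {certain:.2f}, "
          f"uncertain = {uncertain:.2f}")
```

      In this sketch, uncertainty enters only through the within-trial noise, so the mean drift (the difference in ratings) is unchanged while choices become less consistent. The alternative account supported by the data would instead reduce the drift itself as uncertainty increases.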

      We understand that this model cannot account for the observed data, and in this sense it is not a true contender. However, given the theoretical rationale above, we believe that showing this explicitly should have (at least) a didactic value for the readers of this literature, who want to understand how certainty should be addressed. Obviously, our results support an alternative model in which the drift of the accumulation process (and not the noise) is affected by the certainty of the alternatives. While this is consistent with what the reviewers believe to be expected, in our reading of the value-based decision literature, we did not find any model in which this was explicitly stated or tested. We believe that these results will motivate further investigation into the mechanism that generates this "normalization" (we aim to discuss a few options in our Discussion section).

      2. More detailed DDM explorations for the certainty effect

      We agree that a more detailed investigation of variants of our Model 4 would be informative. Both reviewers have provided very helpful and relevant suggestions, which we have addressed in our revised manuscript.

      For example, we examined a variant of Model 4 in which the drift decrement with uncertainty is non-linear (we introduced an exponent to characterize this). The model fitting results show that, indeed, this model flexibility is beneficial, resulting in better fits (including the flexibility costs). While the average exponent is close to 1 (the average across the group is .85), there is significant variability between subjects resulting in improved data fits. We also carried out a median-split analysis based on the certainty of the options, in which we allowed both the drift and the accumulation noise to vary with certainty. The results were consistent with our previous conclusions, showing that certainty affects the drift but not the accumulation variability. While this may go beyond the scope of the present paper, we will discuss potential mechanisms that might cause these results.

      References:

      Busemeyer, J. R., & Diederich, A. (2002). Survey of decision field theory. Mathematical Social Sciences, 43(3), 345-370.

      Donkin, C., Brown, S. D., & Heathcote, A. (2009). The overconstraint of response time models: Rethinking the scaling problem. Psychonomic Bulletin & Review, 16(6), 1129-1135.

      Hotaling, J. M., & Busemeyer, J. R. (2012). DFT-D: A cognitive-dynamical model of dynamic decision making. Synthese, 189(1), 67-80.

      Krajbich, I., Armel, C., & Rangel, A. (2010). Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience, 13(10), 1292-1298.

      Tajima, S., Drugowitsch, J., & Pouget, A. (2016). Optimal policy for value-based decision-making. Nature Communications, 7(1), 1-12.

    1. Author Response

      Author Response refers to a revised version of the manuscript, Version 2, which was posted on December 17, 2020 (https://doi.org/10.1101/2020.08.28.271643).

      Summary: This is a very interesting study addressing the question of microtubule cytoskeleton reorganization in the immunological synapse. Specifically, the work demonstrates the contribution of KIF21B for the control of the T cell microtubule (MT) network required for T cell polarization during immunological synapse formation. The authors use a variety of microscopy techniques, including expansion microscopy, controlled perturbations of the cell, and computer simulations to generate their results. The authors show that knockout of KIF21B results in longer MTs that result in an inability to polarise the MT network by a mechanism consistent with dynein motor function at the immunological synapse to capture long MTs and center the MT aster at the synapse. They use the Jurkat cell line, which is a classical model for this step in immune synapse function and fully appropriate. They show that KIF21B-GFP can rescue the knockout phenotype and then use this as a way to follow KIF21B dynamics in the Jurkat cells. KIF21B works by inducing pausing and catastrophe, thus, more MTs are shorter when present. They also rescue the defect in the KIF21B KOs with 0.5 nM vinblastine, that directly increases catastrophes, shortens the MTs and restores MT network polarization to the synapse. As a functional surrogate they investigate lysosome positioning at the synapse, which is one of the proposed functions of this cytoskeletal polarization. The use of expansion microscopy in this system is relatively new and clearly very powerful. The modelling component adds to the story and supports the sliding model proposed by Poenie and colleagues in 2006, but cannot say that there is no component of end capture and shrinkage as proposed by Hammer and colleagues more recently. Experiments and modelling are performed to a high standard and the results advance the field.

      We thank the reviewers for their thoughtful and constructive suggestions, and for the positive feedback.

      Reviewer #1:

      This is an excellent study of centrosome polarization in the process of establishing immunological synapse and the effect of kinesin-4 on this process. The authors use a variety of microscopy techniques and controlled perturbations of the cell to obtain beautiful images that clearly suggest that kinesin-4, by increasing frequency of pauses and subsequent MT catastrophes, limits MT length, which assists dynein pulling in polarizing the centrosome. They complement the experiments with modeling based on Cytosim; the model supports the conclusions from the data, and suggests some interesting ideas.

      I am not an expert in experimental techniques, though I understand what's been done, and in my limited opinion, the results are first-rate. The paper is well written and accurate. Modeling, which I know intimately, is done very well. I have just a few minor comments:

      1) I was not quite clear on what the modeling says about the centrosome sometimes being in an apical position, and sometimes half-way between apical and basal positions.

      The model predicts the centrosome to be either in an apical or a basal position, while in the experimental data from the KIF21B knockout cells, it can be polarized halfway. Our results indicate that in the knockout cells, the MT network is under a constant force pointed towards the synapse. This force can lead to major deformations of the nucleus and the centrosome can indent the nucleus. This indentation allows the centrosome to be located at a position half-way between an apical and a basal position. In our simulations, we assume that the nucleus is relatively stiff and cannot change size or shape. Therefore, we only find centrosomes at the apical or basal side. To clarify this point, we added a text to the 6th paragraph of the Discussion:

      “Our simulations suggest that when centrosome translocation is impaired, the MT network is experiencing balanced forces. As a consequence, we predict that in these situations one would observe major deformations of the nucleus because it is trapped in a contracting cage of MTs spanning between the centrosome and the synapse. These deformations could also allow the centrosome to be located half-way between an apical and a basal position of the cell (Figure 4H). In our simulations, we assume a relatively stiff nucleus and therefore we only find the centrosome in an apical or basal position. It could be also possible that nuclear deformations push MTs towards the synapse, where they form dense peripheral MT bundles to accommodate the least curvature (Figure 2A and B).”

      2) I understand that 2d modeling cannot address this issue explicitly, but can the authors speculate about the apparent ring of MTs along the periphery of the synapse in the non-polarized case?

      The MTs in the non-polarized case of some of the panels in Figure 2 and S2B are densely located along the periphery of the synapse. This could indicate that dynein-mediated force generation actively binds these MTs to the synapse plane through multiple motors. Another option could be that these systems are force-balanced, and thus the nucleus is experiencing a downward force. The deformable nucleus would then push all surrounding MTs down into the synapse plane as well, creating this phenomenon of MT alignment along the synapse plane. From our current data, we cannot distinguish the two processes. However, we added a text on the deformability of the nucleus to the 6th paragraph of the Discussion (page 19 of the revised paper):

      “Our simulations suggest that when centrosome translocation is impaired, the MT network is experiencing balanced forces. As a consequence, we predict that in these situations one would observe major deformations of the nucleus because it is trapped in a contracting cage of MTs spanning between the centrosome and the synapse. These deformations could also allow the centrosome to be located half-way between an apical and a basal position of the cell (Figure 4H). In our simulations, we assume a relatively stiff nucleus and therefore we only find the centrosome in an apical or basal position. It could be also possible that nuclear deformations push MTs towards the synapse, where they form dense peripheral MT bundles to accommodate the least curvature (Figure 2A and B)."

      3) My perhaps most significant comment: the model nicely integrates and explains the data, but is it predictive? A detailed model like that clearly can generate some nontrivial prediction that could be experimentally tested.

      As recognized by the reviewer, the main focus of our model was to “integrate and explain the data”. Nonetheless, we can draw at least two nontrivial predictions from the model. A strong prediction with important consequences is the length regulation of MTs by only a small number of KIF21B molecules. This length regulation mechanism could be tested in a reconstituted in vitro system in which the dependence on the number of KIF21B molecules can be systematically changed, or by exact quantification of KIF21B units through fluorescent labeling. This prediction could also potentially be tested in vivo, by the rescue of KIF21B knockout with KIF21B-GFP at different expression levels. However, these experimental validations of the small number of involved KIF21B molecules are very laborious and beyond the scope of this study. The second prediction is related to the KIF21B knockout system. In such a system the centrosome is not repositioned to the synapse. Our simulations suggest that in this case, the MT network is under constant force, but not able to rearrange. Therefore, we predict strong deformations of the nucleus by the MT network. However, we did not directly investigate such deformations in our simulations in which the nucleus is a rather stiff object. To emphasize the predictions from our model, we added the following text in the 4th paragraph of the Discussion (see above).

      4) "Interestingly, in our simulations, a small number of KIF21B motors was sufficient to prevent the overgrowth of the MT network." - this is a bit counter-intuitive: if the motor number is less than MT number, how would this work? Or, by a "small number of KIF21B motors" you mean still greater than ~ 100?

      We agree with the referee that at first sight, it may seem counterintuitive that 10 KIF21B motors can regulate 100 MTs. The key is to realize that length regulation by KIF21B is a very dynamic process. The motor binds to an MT, induces its shrinkage, detaches, and is ready to bind to a different MT. If this happens in about 10 s, 10 motors can induce shrinkage of 100 MTs in about 100 s. A single motor molecule can thus initiate shrinkage of several different MTs within a short time. To clarify this point, we added a text as explained above in the answer to the second major concern raised by the reviewers.
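
      The arithmetic in this answer can be sketched as a simple throughput calculation. The sketch below is illustrative only and is not the Cytosim model: the function name and its parameters are hypothetical, and it assumes that every motor cycles continuously (bind, induce shrinkage, detach) with a fixed cycle time and never idles.

```python
def time_to_act_on_all(n_mts: int, n_motors: int, cycle_s: float) -> float:
    """Seconds needed for the motors to trigger shrinkage once on every MT.

    Assumes (for illustration) that each motor completes one
    bind -> shrink -> detach cycle every `cycle_s` seconds and
    immediately rebinds a different MT.
    """
    if n_mts < 0 or n_motors <= 0 or cycle_s <= 0:
        raise ValueError("need n_mts >= 0, n_motors > 0 and cycle_s > 0")
    events_per_second = n_motors / cycle_s  # shrinkage events across all motors
    return n_mts / events_per_second


# Numbers quoted in the response: 10 motors, ~10 s per cycle, 100 MTs.
print(time_to_act_on_all(n_mts=100, n_motors=10, cycle_s=10.0))  # 100.0
```

      Under these assumptions the time scales inversely with the number of motors, which is why a motor count well below the MT count can still keep the network in check as long as the cycle is short compared with MT growth.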

    1. Author Response:

      This response corresponds to the essential revisions sent to the authors after review.


      1) Further characterization and clarification are needed regarding the sensor properties. This is crucial for the potential users in the field to judge and use the sensor, and for interpretation of the biology results using the sensor.

      We are grateful to the reviewers and editors for raising such important questions regarding the characterization of sensor properties. The feedback helps to clarify important aspects of the sensor.

      i) Clear statement in prominent places about the improvement of the sensor and new potential for its biologic applications separating from the authors' 2015 paper.

      Previous enzyme-based biosensor designs, including the ChOx biosensor described in our 2015 publication (Santos et al, 2015), were based on the differential coating of electrode sites with matrices containing or lacking ChOx. These modifications render the sites Ch-sensitive or insensitive, respectively. The latter have been termed “sentinel” sites, as they are designed to respond to any perturbation except the analyte of interest (Ch in this case). By subtracting the sentinel from the Ch-measuring site, this approach has been useful to decrease the contribution of interferent signals, namely those caused by electrochemical oxidation of electroactive compounds or by voltage fluctuations associated with LFP. However, cross-talk caused by H2O2 diffusion from enzyme-coated to sentinel sites poses important constraints on this design. The inter-site spacing required to avoid diffusional cross-talk leads, for example, to uncontrolled differences in the amplitude and phase of LFP across sites, compromising common-mode rejection.

      In the current study, we have circumvented diffusional cross-talk-related limitations by implementing a novel sensing approach. Rather than changing the coating composition across recording sites, we have differentially modified their electrocatalytic properties towards H2O2, resulting in Ch-sensitive and pseudo-sentinel sites. As Ch responses depended solely on the intrinsic properties of the metal surface, we could dramatically reduce the size and increase the spatial density of recording sites by using a tetrode configuration. Tetrodes, a bundle of four twisted wires glued together, are conventionally used for separating single neuron action potentials based on the spatial structure of their action potentials across wires. Here, the spatial structure of the electrochemical signal is created by electrochemical modification of wires. Importantly, this design allows the unbiased measurement of ChOx activity and O2 in the same brain spot by using a tetrode site to directly measure the latter. This has not been possible to achieve with conventional enzyme-based biosensor designs, including our own previous stereotrode design.

      We acknowledge that the improvements of the TACO sensor over our previous stereotrode design, published in 2015 (as well as other conventional enzyme-based biosensors in general), were not clearly emphasized in the manuscript. We added new paragraphs/sentences in the introduction and results of the revised manuscript (page 4 lines 10-16, page 5 lines 6-15 and page 6 line 8) highlighting the main difference between the two sensors and advantages of the new design for the unbiased measurement of the signals derived from ChOx activity (COA) and O2.

      ii) Regarding the choline responses: characterizing the linearity of choline response is important for users to understand the sensor properties.

      Responses to choline were highly linear within the concentration range tested (up to 30 μM). This information was added to Table 1 and mentioned in the text (page 7, line 18) of the revised manuscript.

      Related, demonstration how to calibrate moving artificial signals in freely-moving rodents will be useful for the future applications.

      Movement can cause electromagnetic or mechanical perturbations (movement artifacts) that are expected to scale with the impedance of individual recording sites. As the same applies for LFP-related currents, it is not trivial to discriminate both confounds. Nevertheless, our common-mode rejection approach, which is optimized by a frequency-domain correction of electrode impedances (please check Methods section, page 40, for detailed explanation), is designed to optimally remove both LFP- and movement-related artifacts.

      In our freely-moving recordings we did not have prominent movement-related perturbations, probably due to the proximity of the head-stage to the sensor and the shielding effect of the grounded copper mesh that covers the implant. Nevertheless, candidate events likely caused by movement consisted of current deflections aligned to locomotion bouts, which were completely removed by common-mode rejection. In the revised manuscript we added the average raw traces triggered on locomotion bouts in Figure 2D, highlighting the usefulness of our method to remove putative movement-related artifacts in addition to LFP and other interferents. We have also added a brief mention of this issue on page 10, lines 32-35 and page 11, lines 1-2.
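
      The idea behind common-mode rejection can be sketched in a few lines. This is a deliberately simplified time-domain stand-in: it uses a single scalar gain to match the two sites, whereas the Methods describe a frequency-domain correction of electrode impedances. The function name, the `gain` parameter, and the traces are hypothetical and synthetic.

```python
from typing import List, Sequence


def common_mode_reject(sensing: Sequence[float],
                       sentinel: Sequence[float],
                       gain: float = 1.0) -> List[float]:
    """Subtract the gain-matched sentinel trace from the sensing trace.

    `gain` is a hypothetical scalar that rescales the sentinel so that
    shared artifacts have the same amplitude on both sites.
    """
    if len(sensing) != len(sentinel):
        raise ValueError("traces must have the same length")
    return [s - gain * p for s, p in zip(sensing, sentinel)]


# Synthetic traces in arbitrary units (not recorded data).
coa = [0.0, 0.5, 1.0, 0.5]              # enzyme-derived signal, sensing site only
artifact = [2.0, -1.0, 4.0, 0.0]        # LFP- or movement-like common signal
sensing = [c + a for c, a in zip(coa, artifact)]
sentinel = [0.5 * a for a in artifact]  # same artifact seen at half amplitude

cleaned = common_mode_reject(sensing, sentinel, gain=2.0)
print(cleaned)  # [0.0, 0.5, 1.0, 0.5]
```

      Any signal that appears on both sites in a fixed ratio cancels, whatever its origin, which is why one subtraction can address LFP- and movement-related currents at once; a mismatch in `gain` would leave a residual artifact in the cleaned trace.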

      Further, since the COA signal is confounded by phasic O2 fluctuations, the authentic changes in COA are potentially interfered by O2-evoked enzymatic responses. The interpretation of the signal interference needs to be clearly discussed, including O2-evoked changes, and other related signaling changes, like DA.

      The main focus of our study was to investigate the effect of physiological O2 fluctuations on the ChOx biosensor signal, which is given by the activity of immobilized ChOx, which we abbreviate as COA across the manuscript. In order to address this issue in an unbiased manner it is essential to clean artifacts that directly generate currents on the electrode surface (please see response to point 1vi for details). Our TACO sensor was designed to optimize the removal of such confounds, resulting in a clean COA signal. As this signal reflects the activity of immobilized enzyme, it is sensitive to changes in O2, not only to choline. Thus, the COA signal is not confounded, but rather modulated by changes in O2. Our main finding was that phasic O2 modulation of COA is a major confound of phasic Ch dynamics measurements using ChOx sensors in vivo in the brain. In this sense, the central tenet of the paper is that COA does not reflect authentic choline concentration dynamics, but rather a nonlinear function of Ch and O2 dynamics, with no feasible analytical approach to separate the two.
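
      Why a single COA trace cannot be inverted to Ch can be illustrated with a toy rate law. The sketch below assumes a generic saturable two-substrate expression with arbitrary constants; it is an assumption chosen for illustration and is not the mathematical model used in the manuscript. All names and parameter values are hypothetical.

```python
import math


def coa_rate(ch: float, o2: float,
             vmax: float = 1.0, km_ch: float = 1.0, km_o2: float = 1.0) -> float:
    """Hypothetical two-substrate saturable rate law (arbitrary units)."""
    if ch <= 0 or o2 <= 0:
        raise ValueError("concentrations must be positive")
    return vmax / (1.0 + km_ch / ch + km_o2 / o2)


baseline = coa_rate(ch=1.0, o2=1.0)
# Raising O2 alone changes the signal even though Ch is constant.
more_o2 = coa_rate(ch=1.0, o2=2.0)
# A different (Ch, O2) pair gives the same rate as the baseline pair,
# so one COA value does not map to a unique Ch concentration.
other_pair = coa_rate(ch=2.0, o2=2.0 / 3.0)

print(more_o2 > baseline)                  # True
print(math.isclose(baseline, other_pair))  # True
```

      In this toy setting the ambiguity could be resolved by measuring O2 independently and knowing the rate constants exactly; the point made in the response is that in vivo, with both substrates varying on multiple time scales, no such separation was feasible.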

      We recognize that, in the Methods section, the description of how the COA signal was computed could lead to confusion between authentic COA and authentic Ch measurement. In the revised manuscript we have changed the terms used in the signal cleaning procedure (page 40-41).

      Regarding neurochemical confounds (e.g., ascorbate or dopamine and other monoamines), we acknowledge that the description of multichannel sensor properties in Table 1 could be confusing to readers. The table was also not conveying the important information on how sensitive our COA measurement is to these artifacts. In the revised manuscript we have removed the information about selectivity ratios for individual sites. Instead, the table section now called “Analytical properties for COA measurement” was expanded and now shows DA and AA sensitivities and selectivity ratios for the COA signal, computed from the difference between Au/Pt/m-PD and Au/m-PD sites.

      Additionally, we added a column in the color plot in Figure 1E describing the relative responses of the COA measurement to the different factors. This addition highlights the high selectivity of the COA signal for Ch, as compared with individual sites.

      Finally, we have detailed the interpretation of the freely-moving signals triggered on SWRs and locomotion bouts. In the Methods section of the revised manuscript (page 41, lines 4-11), we clarify how the differential signals COAnon-mPD and NCC (neurochemical confounds) presented in Figure 2 (revised version) were computed. In the description of these results, we also explain how the response patterns of raw and cleaned signals can be used to infer the contribution of different sorts of artifacts, including movement- and LFP-related and those caused by neurochemicals (page 10 lines 26-35, page 11 lines 1-5).

      iii) The dimensions of the sensor head need to be specified and spelled out clearly. It seems to be around 50 um, but the text seems to suggest 150 um. The individual sensing elements are 17 um in diameter. If this is true, it is very exciting because it exhibits hemispherical diffusion yielding higher response and enhanced sensitivity. This may improve spatial and temporal resolution if this is indeed a much smaller sensor as a disk-shaped one.

      We thank the reviewers for referring to this point. It is an important detail that was not clearly stated in the manuscript. In the Methods section (page 34 of original manuscript), the description of the insertion of the tetrode inside a silica tube might have been misleading. In fact, the tetrode actually protrudes 1-2 cm out of the silica tube. This distance assures that the latter is not in contact with the brain in in vivo recordings. The cutting of the twisted ending of the tetrode results in four disc-shaped sensing elements with 17 μm diameter. The total diameter of the tetrode is approximately 60 μm. In the revised manuscript we have clarified and emphasized these details in the Methods section (page 36 lines 10, 15-16), in the results (page 6, lines 3-5) and with an additional cartoon in Figure 1A.

      iv) The role of the sentinels with differential plating is very interesting, but the function of the sentinels is not clear (p. 4 "canceling LFP-related currents"). They consume oxygen. Why does this not result in overlap of the diffusion layer for the choline sensor and therefore affect choline response? Please explain why differential electroplating was employed.

      We further clarified the role of the pseudo-sentinel sites on the removal of LFP-related currents and neurochemical artifacts and expanded the reasoning behind this approach. Please check the Introduction of the revised manuscript (page 4 lines 4-18, page 5 lines 6-15).

      When polarized at +0.6 V vs. Ag/AgCl, the pseudo-sentinel channels display a residual activity towards electrochemical oxidation of H2O2. This electrochemical reaction generates O2, but the effect on the local O2 concentration is negligible due to the poor sensitivity and very small electrode surface area (17 μm diameter disc). We measured O2 (head-fixed mice and in vitro) by electrochemical reduction at -0.2 V vs. Ag/AgCl at a pseudo-sentinel site (gold-plated without m-PD). In this case O2 is consumed, but to a very limited extent that does not affect the local O2 level in the sensor. In accordance with the expected lack of effect on O2 levels, we have confirmed that switching the applied potential on a gold-plated site between +0.6 V and -0.2 V vs. Ag/AgCl has no effect on the COA signal. In the revised manuscript we added a supplementary figure (Figure S4) describing this observation. Accordingly, we extended the discussion of this topic in the results section (page 13, lines 17-18).

      v) Please explain how time-dependent behavior of the sensor was measured. This process typically leads to the formation of a film on this electrode surface which can affect sensitivity. According to the authors' 2015 paper, the method for measuring the response time seems rather crude, and may overestimate the response time which is related to the mixing of the solution. This needs to be discussed.

      The sensor response times were estimated from the rise of the current in response to analyte additions in a stirred buffer solution, as described in the Methods section (page 40, lines 9-10 of revised manuscript). In the revised manuscript, we added a sentence to further clarify the use of this setup to estimate response times (page 37, line 29). Indeed, this setup is not the most appropriate to precisely determine response times due to the bias introduced by the analyte mixing time after its addition to the buffer. Our previous study (Santos et al, 2015) suggests however that the biggest contribution to the estimated response time is due to diffusion of Ch in the sensor coating. Besides the fact that we cannot precisely determine response times, it is noteworthy that real response times are faster than the values we report. This further highlights the high temporal resolution of the TACO sensor. We added a paragraph discussing this topic in the revised manuscript (page 7, lines 19-21).

      vi) The effect of LFP and other perturbations of sensor responses need to be more clearly explained.

      Two main types of artifacts affect the response of enzyme-based electrochemical biosensors: electromagnetic or electrochemical sources that directly generate currents at the electrode surface and biochemical factors that affect the activity of the immobilized enzyme. The first group can be sub-divided into: a) artifacts that generate faradaic currents, arising from oxidation/reduction of electrochemically active molecules, such as ascorbate or dopamine; b) artifacts that change the charge distributions at the electrode surface, generating capacitive currents, which in the brain are mainly caused by local fluctuations in field potentials (LFP) generated by the transmembrane current sources of the surrounding neural tissue. Effectively, LFP causes potential changes at the electrode surface, whose voltage is clamped by the potentiostat circuit, giving rise to an apparent current, similar to voltage clamp measurement of the intracellular current. The second group, consisting of biochemical artifacts, comprises mainly the effect of oxygen on enzymatic activity (although other factors such as temperature and pH might have a minor effect, as discussed in the manuscript, page 34, lines 16-20).

      Importantly, the strategies devised to reduce artifacts that directly generate electrochemical currents (chemical surface modifications or common-mode rejection) do not control for factors influencing immobilized ChOx activity.

      Since O2 interference was the main focus of the paper and is thoroughly described throughout the manuscript, in the Introduction of revised manuscript we extended the description of the factors directly generating currents on the electrode surface (page 4, lines 4-18).

      2) Re-organization of the manuscript to improve the readability. This manuscript contains the characterization of the TACO sensor and application of this sensor to monitor real-time behavior in freely moving rodents. The design and characterization of the sensor is intermingled with the application of studying the choline biology with the sensor, making the logic flow hard to follow. The arrangement and presentation of the figures need to be improved so readers can appreciate both characterization and applications aspects and how they are tightly linked. This might also involve properly arranging main figures and associated supplementary figures.

      We believe this suggestion stems from the expectation that we may have conveyed to the readers regarding the possibility of measuring authentic Ch dynamics in behaving animals with our TACO sensor. Indeed, the TACO sensor design makes it ideally suited for the unbiased measurement of brain Ch dynamics based on ChOx, while controlling for O2 changes that might modulate immobilized enzyme activity. However, our data show that phasic ChOx activity (COA) is dominated by O2 fluctuations in the brain of behaving animals. The complexity of the nonlinear interplay between COA and O2, which depends on multiple time-scale concentration dynamics of both enzyme substrates, made it impossible to extract authentic Ch from the in vivo COA signal.

      Following the logic of data presentation in our manuscript, the initial description of TACO sensor design and properties towards COA measurement was followed by its in vivo application in freely-moving and head-fixed rodents, which led to the discovery of the possible O2 confound. This, in turn, prompted the next in vivo experiments with causal manipulations to prove the hypothetical confound effect. Next, in vitro experiments were used for more systematic investigation of the details of the confound and its underlying causes guided by the prior in vivo observations. Finally, we used a detailed mathematical model to quantitatively uncover the mechanism of the oxygen confound of the choline-oxidase-based biosensor.

      We think this logic of exposition is guiding the reader through our thought process and progresses consistently from the development of novel methodology to evaluation and identification of the confound, and then to unraveling the mechanism in vivo, in vitro and in the model. Reversing the order of presentation would break this logic and hurt the presentation of the story.

      We would like to ask the editor for her consent not to follow the suggested major reorganization. Instead, we clarified the internal logic at the end of the introduction section (page 5, lines 16-23), as well as throughout exposition of the results. Moreover, throughout the revised manuscript we emphasize the focus of our study on phasic COA dynamics instead of putative Ch by replacing terms alluding to the latter by “COA”. Accordingly, we better articulated the motivation for assessing SWR- and locomotion-related signals in freely-moving animals (Figure 2) and the interpretation of these results to avoid a biased expectation of the reader that COA signals provide authentic Ch readout. The revised manuscript now provides an unbiased perspective on motivation and interpretation of the in vivo experiments (page 10 lines 19-22, page 11 lines 5-12). The bias of COA by O2 and the issues associated with derivation of authentic Ch dynamics from our measurements were also further explained in the discussion (page 34, lines 35-37). Along the same lines, we have trimmed Figure 2 in order to keep the focus of the paper on phasic dynamics of the COA signal. Namely, we moved panels B and C describing tonic COA dynamics in the original manuscript to a supplementary figure in the revised version (Figure S3).

    1. Author Response:

      We would like to thank the reviewers for taking the time to review our manuscript. The comments below have been thought-provoking and will inspire several new analyses that we hope address concerns. In particular, we will carefully reappraise the framing of the results, shifting away from a false dichotomy of “this is perception” and “this is binding”, and towards more restrained terminology that discusses the shift in balance between perception and binding. Moreover, we will expand our analysis of theta-gamma phase-amplitude coupling beyond the hippocampus and to the whole brain.

      We answered each comment in turn, first by providing a general response to the comment and then by providing an outline of the explicit action we will take to address this issue.

      Reviewer #1:

      This MEG study by Griffiths and colleagues used a sequence learning paradigm which separates information encoding and binding in time to investigate the role of two neural indexes - neocortical alpha/beta desynchronization and hippocampal theta/gamma oscillation - in human episodic memory formation. They employed a linear regression approach to examine the behavioral correlates of the two neural indexes in the two phases, respectively and demonstrated an interesting dissociation, i.e., decreased alpha/beta power only during the "sequence perception" epoch and increased hippocampal theta/gamma coupling only during the "mnemonic binding" phase. Based on the results, they propose that the two neural mechanisms separately mediate two processes - information representation and mnemonic binding. Overall, this is an interesting study using a state-of-the-art approach to address an important question. Meanwhile, I have several major concerns that need more analysis and clarifications.

      Major comments:

      1) The lack of theta-gamma coupling during the stimulus encoding period is possibly due to the presentation of the figure stimulus, which would elicit strong sensory responses that mask the hippocampus activity. How could the authors exclude this possibility? In other words, the dissociated results might derive from different sensory inputs during the two phases.

      Response: The reviewer raises a good point; however, we feel this is already addressed by our use of memory-related contrasts. The masking of an effect that arises due to stimulus presentation would be consistent across all memory conditions, and therefore subtracted out in any contrast between these conditions. The analyses in our original submission use this approach to avoid such a confound. Furthermore, previous studies (e.g. Heusser et al., 2016, Nat. Neuro.) have demonstrated that hippocampal theta-gamma coupling can arise during stimulus presentation, suggesting strong sensory responses do not, generally speaking, mask measures of theta-gamma coupling.

      Action: We will explain the potential concern about masking in the main text, and also explain how we have addressed such a concern with the use of contrasts.
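
      The contrast argument in the response above can be made concrete with a toy calculation. The sketch assumes that masking adds the same offset to every memory condition; the function name and all values are hypothetical and in arbitrary units. A multiplicative suppression of the signal would not cancel in the same way.

```python
def contrast(cond_a: float, cond_b: float) -> float:
    """Difference between two condition means (e.g. remembered minus forgotten)."""
    return cond_a - cond_b


# Hypothetical coupling values per condition, arbitrary units.
remembered, forgotten = 0.5, 0.25
masking = 2.0  # stimulus-evoked offset assumed identical in both conditions

unmasked = contrast(remembered, forgotten)
masked = contrast(remembered + masking, forgotten + masking)

print(unmasked == masked)  # True
```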

      2) About the hippocampal theta/gamma phase-power coupling analysis. I understand that this hypothesis derives from previous research (e.g., Heusser et al., 2018) as well as the group itself (Griffiths et al., PNAS, 2019). Meanwhile, MEG recording, especially the gradiometer, is known to be relatively insensitive to deep sources. Therefore, the authors should provide more direct evidence to support this approach. For instance, the theta/gamma analysis relies on the presence of theta-band and gamma-band peaks in each subject. Although the authors have provided two representative examples (Figure 3A), it remains unknown how stable the theta-band and gamma-band peaks are in individual subjects.

      Action: We will plot the data for all participants to demonstrate the stability of the theta/gamma band peaks.

      Additional response: Regarding the concern that MEG gradiometers are relatively insensitive to deep sources, we feel it is worth noting that a recent review (Ruzich et al., 2019, Human Brain Mapping) identified 29 studies that had reported successful hippocampal measurements when only using gradiometers, suggesting our use of gradiometers is neither unprecedented nor unjustified. Furthermore, in their recommendations for optimising hippocampal recordings with MEG, the old wisdom of using magnetometers rather than gradiometers is conspicuous in its absence in the review – perhaps because while magnetometers have a greater theoretical potential to detect deep signal, they also have greater theoretical potential to pick up noise, so the signal-to-noise ratio (which, arguably, is key here) for deep sources may not differ so much between gradiometers and magnetometers.

      3) Related to the above comment, the theta-gamma coupling is a brain-wide phenomenon including both cortical and subcortical areas and not limited to just hippocampus. Although the authors have performed a control analysis to assess the behavioral correlates of the coupling in other regions, the division of brain region is too coarse and I am not convinced that this is a fair comparison, since they differ from hippocampus at least in terms of area size in the source space. The authors could consider plotting the power-phase coupling distribution in the source space and then assessing their behavioral correlates, rather than just showing results from hippocampus. This result would be important to confirm the uniqueness of the hippocampus in this binding process.

      Response: We concur that the plots currently do not demonstrate the specificity of the hippocampus, and whole brain images would better demonstrate the effect.

      Action: As suggested by the reviewer, we will plot theta-gamma coupling across the brain.

      4) About behavioral correlates. The current behavioral index confounds encoding and binding processes. Is there any way to separate the encoding and binding performance from the overall behavioral measurements? It would be more convincing for me to find the two neural indexes at two phases predict the two behavioral indexes, respectively.

      Response: This is a really interesting idea, but one which perhaps requires a different experimental paradigm. For associative memory, we would argue that binding is an essential step for the successful encoding of a memory, so it would quite possibly be impossible to separate the two processes in the paradigm used here. That said, a different paradigm that compared associative memory to, say, item memory, may be able to answer such a question.

      Action: We will discuss this as an avenue of future research within the discussion.

      5) The authors' previous works have elegantly shown the two neural indexes during fMRI and intracranial recording in episodic memory. The current work, although providing an interesting view about their possible dissociated functions, only focuses on the memory formation period (information encoding and binding). Given previous works showing an interesting relationship between encoding and retrieval (Griffiths et al., PNAS, 2019), I would recommend that the authors also analyze the retrieval period and see whether the two indexes show consistent dissociated function as well.

      Response: Yes, we completely agree. We had included this in a previous draft of the manuscript, and found a consistent dissociation here, where alpha/beta power decreases accompanied retrieval (perhaps linked to the representation of retrieved information) and theta-gamma coupling did not (perhaps due to the absence of a need to bind stimuli together in order to complete the retrieval task). We had cut this section to make a more streamlined manuscript, but have no qualms adding this back in.

      Action: We will include the same central analyses, this time conducted at retrieval.

      Reviewer #2:

      In this manuscript, the authors examine the neural correlates of perception and memory in the human brain. One issue that has plagued the field of memory is whether the neural processes that underlie perception can be dissociated from those that underlie memory formation. Here the authors directly test this question by introducing a behavioral paradigm designed to dissociate perception from mnemonic binding. In brief, while recording MEG data, they present subjects with a sequence of visual stimuli. Following the sequence, the subjects are instructed to bind the three stimuli together into a cohesive memory, and then are tested on their memory for which pattern was associated with an object, and which scene. The authors investigate changes in alpha/beta power and theta/gamma phase amplitude coupling during two separate epochs - perceptual processing and mnemonic binding. Overall, this is a well written and clear manuscript, with a clear hypothesis to be tested. Using MEG data enables the authors to draw conclusions about the neurophysiological changes underlying both perception and memory, and establishing this dissociation would be an important contribution to the field. I think the conclusions are justified, but there are several issues that should be addressed to improve the strength and clarity of the work.

      The fundamental premise of the task design is that subjects view a sequence of stimuli, and then separately at a later time actively try to bind those visual stimuli together as a memory. However, it is entirely possible, and even likely, that memories are being formed and even bound together as the subjects are still viewing the sequences of objects. How would the authors account for this possibility? One possible way would be if there were a control task where subjects were just asked to view items and not remember them.

      Response: Indeed, it is impossible to be certain that no binding is occurring during sequence presentation, and the terminology used in the original submission is ill-fitting as a result. However, we would argue that there is a shift in the ratio between perception and binding across the encoding task, with greater perceptual processes arising during the presentation of the sequence relative to the “associate” cue (as this is when the items are presented), and greater associative processes arising during the “associate” cue (as this is when all items are available for binding). To suggest that the two processes can be completely separated would be erroneous, but we feel it is also difficult to argue that there is no shift in balance between the two processes over the course of the encoding task. Importantly, linking a shift in balance between the two processes (binding/perception) with neurophysiological correlates (alpha-beta/theta-gamma) is sufficient for our main conclusion.

      Action: We will carefully rephrase the manuscript in such a way that it no longer implies that there is a perfect separation of perception and binding, but rather a shift in the balance between the two processes.

      Note on a “control” task: In our view, the control task proposed by the reviewer is captured by the “forgotten” condition – participants view the items, but do not subsequently remember them.

      Another possibility would be to examine the trials that the participants failed to remember correctly. Presumably, one would still see the same decreases in alpha power. Yet it seems from the data, and the correlations, that during those trials that were not remembered properly, alpha power changed very little. Of course, it is unclear in these trials if failed memory is due to failed perception, but one concern would be that this would imply that decreases in alpha power are relevant for memory too. It would be helpful to see how changes in alpha power break down as a function of the number of actual items remembered. It would also be helpful to know how strong these correlations actually are.

      Note: We are a little unsure of what the reviewer is suggesting here, as we feel that most of these analyses were included in the main text. The response below recaps the results and how they link to our interpretation of the reviewer’s comment, but if we have misunderstood the point, we would be willing to re-address it in a subsequent revision.

      Response: In the original submission, we had focused solely on the memory-related change in alpha/beta power (that is: the contrast “2 items recalled” > “1 item recalled” > “no items recalled”). Therefore, the inferential statistics allow us to conclude that a relative decrease in alpha/beta power correlates with an increase in number of items recalled. What the analyses in the original submission do not show is whether alpha/beta power changes from baseline (that is, are all items perceived [i.e. as indexed by a power decrease], or just the remembered items?). This is something we’d be happy to address in the revision.

      Action: We will probe the change in alpha/beta power following stimulus presentation, and ask whether alpha/beta power decreases are present for all memory conditions, or only when the items are subsequently remembered.

      A related issue is with respect to hippocampal PAC. The authors investigate this during the mnemonic binding period. Yet they also raise the possibility in discussion that this could also be happening during perception, which goes back to the point above. Did they analyze these data during perception, and are there changes with perception that correlate with memory? This would suggest that binding is actually occurring during this sequence of visual stimuli.

      Response: We did indeed analyse the data during perception in the original submission (see lines 127-128; figure 3d) and found no evidence to suggest that memory-related PAC varied during perception. In an additional analysis, we also examined whether PAC varied as the sequence progressed (that is, does PAC change from the first item to the second, and from the second to the third?), but found no evidence to suggest it does. Together, these results would suggest that putative binding mechanisms are not dominating the sequence perception phase of encoding.

      Action: We will supplement the original analyses of PAC during sequence perception (collapsed over the three epochs) with additional analyses investigating PAC fluctuations over the course of the presentation of the sequence.

      The authors perform a whole brain analysis examining the correlation between alpha power and memory to identify cluster corrected regions of significance. However, the PAC analysis focuses only on the hippocampus, raising the question of whether these results can account for the possible comparisons one could make in the whole brain. They do look at four other brain regions for PAC, which it would be helpful to account for. In addition, are there other measures of mnemonic binding that are significant? For example, theta power, or even gamma power?

      Response: We had focused our PAC analyses on the hippocampus because of our a priori hypotheses but appreciate that only showing data from the hippocampus would obscure the whole picture. Our analyses did not uncover convincing evidence for changes in theta or gamma power, but we will report these in the main text.

      Action: We will present the PAC results across the whole brain. We will add analyses into theta and gamma power.

      The authors note in the discussion that the magnitude of hippocampal gamma synchrony has been shown to be related to the decreases in alpha power. Is this also true in their data?

      Action: We will include an additional analysis probing the correlation between hippocampal theta/gamma activity and neocortical alpha/beta power.

      Reviewer #3:

      The authors report results of an MEG analysis deploying a cognitive paradigm in which participants engage in a source memory task characterized by the appearance of three images in succession and are then tested via a cue (the first of the three images) followed by a choice of responses for a two dimensional pattern and then a choice (out of three images) of a photographic scene.

      The principal finding is that (via MEG sensor level data) there is a widespread 8-15 Hz power decrease that is correlated with the number of recalled items (from 0 to 2) on a given trial. In the hippocampus (via MEG source reconstruction), the magnitude of phase amplitude coupling observed as participants are told to associate the items is correlated with memory performance. The 8-15 Hz power decrease/memory correlation (as estimated by beta coefficients in a model described in Figure 1) is larger (across individuals) during moments when subjects are viewing the stimulus items as opposed to during the "associate" period. The novelty in the result is related to the experimental task that attempts to dissociate memory-related effects related to perception from those related to binding which putatively occurs when subjects are given the "associate" instruction.

      My main conceptual concern is related to the design of the experimental task. I am not sure that the perception/binding framing is appropriate, since there is no reason to think that subjects are not associating/binding items during the periods when the items are being shown on the screen. I suppose this may partly explain the lack of a significant difference in PAC/memory beta coefficients observed in the hippocampus when contrasting these two epochs (Figure 4). But the corollary is that the alpha power-related beta coefficients are observed while binding is likely also occurring within the paradigm (esp since each image is shown for 1.5 seconds it would seem). Is the alpha power effect seen in the hippocampus? The plots in 3a suggest there is an oscillation present in the relevant frequency range, and the time course of alpha power differences seen in Figure 2 suggests that they occur relatively late after onset of the images, which may fit better with some contribution for this pattern to the forming of associations rather than perception.

      Response to comments on task: We agree that the task does not unequivocally separate the two cognitive tasks, and any statement to suggest that it does is erroneous. That said, we would argue that, on a balance of probability, there is likely to be more information processing going on during sequence perception relative to the associate cue. This is because the participant is still being exposed to rich stimuli during sequence presentation, while only being presented with a simple cue during the association phase. Similarly, there is likely to be more binding during the associate cue than during sequence presentation. This is because participants have greater cognitive resources available for binding during the associate cue relative to during sequence perception. Now, neither of these reasons is sufficient to argue that “association” does not occur during sequence perception. However, we feel that these reasons are sufficient to suggest we expect to see a shift in the balance of “association” between the sequence perception and the binding window, where “association” is more easily executed during the binding window. Indeed, we feel it would be difficult to argue that there is no shift in the balance between these processes at any point. Importantly, linking such a shift in balance between the two processes (binding/perception) with neurophysiological correlates (alpha-beta/theta-gamma) is sufficient for our main conclusion. As such, we feel a careful rephrasing can address these concerns, where portions of the text referring to a separation of perception and binding are rephrased as a “shift in the balance in perception and binding” – the latter phrasing allows for the possibility that there is some small mixing of the two tasks.

      Action to comments on task: We will carefully rephrase the manuscript such that the text does not suggest that perception and binding are perfectly separated, but rather that the balance between the two processes shifts during the encoding task.

      Response to comments on hippocampal alpha: We agree that there appears to be an alpha peak in the hippocampus, but as this plot is across all trials, it remains unclear whether this alpha oscillation is linked to memory. This is, of course, something we can investigate in revisions.

      Action: We will investigate whether hippocampal alpha power demonstrates a memory-related effect during perception and/or binding.

      I understand that the paradigm was constructed in an attempt to temporally dissociate memory effects attributable to perception versus those attributable to binding. But given the temporal resolution available using EEG, I would imagine that the authors could differentiate an earlier perception-related effect from a later PAC binding effect in the time series if the associated images were presented in conjunction. Is it correct to frame the alpha results as related to "perception?" The beta coefficients used for analysis reflect a "memory related effect observed when visual stimuli are present on the screen," but not necessarily improved memory predicated on more accurate perception to my interpretation. I would think that a perception/binding distinction requires operationalizing perception as activity that doesn't vary with later associative memory success, and binding as activity that does. The notion of perception used by the authors here seems slightly different. The authors can perhaps comment on this concern.

      Response: This is a very interesting point. A hallmark of visual perception is a reduction in alpha/beta power (e.g. Pfurtscheller et al., 1994, Int. J. Psychophysiology), regardless of whether it is remembered or not. As such, we would expect alpha/beta power to decrease following stimulus onset even if a memory is not formed. This could be directly tested by examining the stimulus-evoked power decrease in all conditions, with the expectation that alpha/beta power drops from baseline in all conditions.

      Action: We will contrast pre-stimulus and post-stimulus power to investigate whether alpha/beta power decreases accompany visual perception regardless of successful memory encoding.

      The authors report PAC results for other regions on page 6, but claiming that PAC is a hippocampal-specific effect would require showing that the PAC-related beta coefficients are significantly greater than the other regions, rather than simply the absence of a significant effect in these regions. The authors should also clarify if they combined locally measured PAC over several ROIs into an average for these other regions? It seems unlikely to detect PAC if a single theta/gamma time series were extracted over such a large area of cortex.

      Response: We agree with the principle that the PAC results should be probed further, though would argue against the use of inter-region contrasts here as they will not provide evidence that PAC is specific to a single region. Take, for example, an effect where there is a significant memory-related increase in PAC in region A, but there is a significantly larger memory-related increase in region B. In a direct contrast, PAC in B will be significantly greater than A, but clearly PAC is not specific to B. Therefore, an inter-region contrast is not a means to irrefutably demonstrate regional specificity. While there has been a call for direct comparisons between experimental contrasts (see Nieuwenhuis et al., 2011), this is specifically for cases where individuals wish to make the claim that “A is significantly greater than B”, which was a claim that we never made here. Rather, we asked whether there is a memory-related difference in PAC within the hippocampus, and then followed this up by confirming that this effect was not a “bleed-in” from PAC in another neighbouring region (i.e. the cortical ROI analyses; where the absence of a significant difference would suggest that memory-related hippocampal PAC is not attributable to memory-related PAC in another region). We will, however, better visualise the PAC results to further rule out the risk of a “bleed-in” effect (see response to Reviewer 1, point 3).

      Action: We will visualise PAC across the cortex.

      Response to ROI-based contrasts: We had originally collapsed PAC measures over the ROI for the sake of simplicity, but the reviewer makes a good point for a more focal analysis.

      Action for ROI-based contrasts: We will run a voxel-wise analysis of PAC to complement the ROI-based approach.

      The interaction effect reported at the end of the results (ANOVA model) is interpreted such that the cortical alpha effect is stronger when the visual items are presented, while the hippocampal PAC effect is stronger when no items appear on the screen, but these recordings are made in different regions (hippocampus versus the entire cortex). If my understanding is correct, a result in line with the model the authors suggest (cortical alpha power decrease/hippocampal PAC) would show a region (hipp v cortex) x task (images on screen vs "associate" command) x metric (PAC vs alpha) interaction. Can the authors clarify if the cortical data entered into this model includes only those regions that showed a significant effect initially, or just all the sensors? The former would seem to introduce bias.

      Response: We had originally collapsed metric and region into a single factor (hippocampal PAC vs. cortical alpha), but the reviewer makes a very good point here – a better way to probe this interaction is via a three-factor ANOVA (using “region”, “epoch” and “metric”).

      Action: We will revise the ANOVA in such a way that we can probe a three-way interaction (region × epoch × metric).
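
      A minimal sketch of how such a three-way interaction could be probed, assuming a fully within-subject 2 × 2 × 2 design. In that case the interaction has one degree of freedom, so it reduces to a single difference-of-differences contrast per participant, and a one-sample t-test on that contrast is equivalent to the interaction F-test (F = t²). The level names ("cortex"/"hippocampus", "perception"/"binding", "alpha"/"pac"), the dictionary layout and the function names are hypothetical, and the memory-related beta coefficients are assumed to have been standardised so that alpha power and PAC share a comparable scale. This is an illustration, not the analysis code used for the manuscript.

```python
import math
import statistics


def three_way_contrast(cell):
    """Three-way interaction contrast for one participant in a 2x2x2 design.

    `cell[(region, epoch, metric)]` holds one standardised memory-related
    beta coefficient (hypothetical layout, assumed for this sketch).
    """
    def epoch_diff(region, metric):
        # Simple effect of epoch (perception minus binding).
        return cell[(region, "perception", metric)] - cell[(region, "binding", metric)]

    def metric_by_epoch(region):
        # Two-way metric x epoch interaction within one region.
        return epoch_diff(region, "alpha") - epoch_diff(region, "pac")

    # Does the metric x epoch interaction differ between regions?
    return metric_by_epoch("cortex") - metric_by_epoch("hippocampus")


def one_sample_t(values):
    """t statistic testing whether the mean contrast differs from zero."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two participants")
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))
```

      The contrast would be computed once per participant and the resulting values passed to `one_sample_t`, with n − 1 degrees of freedom.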

      Similarly, the different visual classes are always presented in the same order, which may give rise to the strong disparity in recall fraction between the pattern and scene images. I understand the linear model incorporates predictor variables for scene/pattern recall, but given that scene recall is driving a significant amount of the overall recall number observed as the main variable of interest, I would wonder if the alpha/beta power effects are related to the relative complexity of the scene images as compared to the patterns. Given the analysis schematic the authors report, I assume the authors have analyzed whether the same effects occur when contrasting scene versus no recollection and pattern vs no recollection. If the same effects are observed regardless of type of image (when compared with no recollection) this may help address this concern.

      Action: We will include supplementary analyses that ask whether alpha/beta power decreases vary as a function of stimulus type.

      Additional note: the scene and pattern stimuli were not always presented in the same order, but rather counterbalanced across blocks to avoid order effects.

      My second conceptual question is related to MEG data. It appears to me that the authors use MEG sensor-level data for the alpha-related effect in the cortex (Figure 2), but MEG beamformer reconstructed data (localized to the hippocampus) for the PAC effect. Is there a reason the authors did not use MEG data localized to specific cortical regions rather than sensor data? This may reflect confusion on my part, but I don't understand why they would use qualitatively different types of data for these two aspects of the analysis that are then combined (in the ANOVA, for example).

      Response to questions on source-reconstructed alpha power: We had not included source-reconstructed analysis of the alpha power effect here because, in an earlier draft, extensive analysis (e.g. the reporting of both sensor-level and source-reconstructed alpha power effects) drew criticism from reviewers for a lack of conciseness. That said, as such analyses have already been conducted, it is relatively easy to add these back in.

      Action: We will include source-reconstructed alpha-band effects.

      The authors should also engage with concerns regarding the validity of localizing MEG signals (especially for an analysis such as PAC) to deep mesial temporal structures such as the hippocampus. I understand that MEG systems with greater than 300 sensors are more reliable for this purpose, but I think a number of readers would still have doubts about MTL localization of signal. Also, my understanding is that such deep source localization requires around 100 trials per class, which I think fits with what the subjects completed, but the authors may include references related to this issue.

      Response: In recent years, there has been a growing list of studies that have reported successful localisation of hippocampal signals using MEG (for review of 37 of these studies, see Ruzich et al., 2019, Human Brain Mapping). Generally speaking, our experimental paradigm and analysis pipeline show large overlap with these previous successes (e.g. use of beamformers, gradiometers, co-registered MRI-to-MEG head position), meaning our results are not completely out of line with what could be expected. Nonetheless, it would be beneficial to explicitly state this in the manuscript.

      Action: We will explicitly address the historic difficulties of localising hippocampal MEG signals, and highlight how our approach fits with a growing consensus on how to successfully localise such signals (e.g. Ruzich et al., 2019, Human Brain Mapping).

      I think the signal processing steps are overall quite reasonable. I would ask the authors to clarify if they limited their analysis of cortical alpha/beta oscillations to those in which a peak exceeded the 1/f background, as they report for the PAC analysis on page 5. Also, it would be helpful to show that the magnitude of the MI values in the hippocampus exceed those observed by chance (using a shuffle procedure) in addition to showing that there is a memory-related association reflected in the beta coefficients.

      Response: We had not limited the analysis to peak alpha/beta oscillations in the original submission, but have no qualms about doing so – indeed, such an analytical approach may better substantiate the claim that we are probing oscillatory activity as opposed to non-oscillatory fluctuations.

      Action: We will restrict alpha/beta power analysis to the peak oscillation. We will add supplementary analysis contrasting measures of hippocampal PAC to a shuffled baseline.
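
      A minimal sketch of what a comparison against a shuffled baseline could look like, under stated assumptions. The response does not say which coupling measure or surrogate scheme will be used; here the modulation index is taken to be the entropy-based (Kullback-Leibler) form, and surrogates are built by circularly shifting the gamma amplitude series relative to the theta phase series. The function names, the 18-bin default and the minimum shift of one tenth of the series are illustrative choices, and the inputs are assumed to be already-extracted phase (in radians) and amplitude samples.

```python
import math
import random


def modulation_index(phase, amplitude, n_bins=18):
    """Entropy-based modulation index: 0 for a flat amplitude-by-phase profile."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for ph, amp in zip(phase, amplitude):
        # Map a phase in radians onto a bin index in [0, n_bins).
        idx = int(((ph + math.pi) % (2 * math.pi)) / (2 * math.pi) * n_bins)
        idx = min(idx, n_bins - 1)
        sums[idx] += amp
        counts[idx] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(means)
    if total == 0:
        return 0.0
    p = [m / total for m in means]
    entropy = -sum(q * math.log(q) for q in p if q > 0)
    return (math.log(n_bins) - entropy) / math.log(n_bins)


def shuffled_baseline(phase, amplitude, n_surrogates=200, seed=0):
    """Compare the observed index with circularly shifted surrogates."""
    amplitude = list(amplitude)
    n = len(amplitude)
    if n != len(phase) or n < 2:
        raise ValueError("phase and amplitude must have equal length >= 2")
    rng = random.Random(seed)
    min_shift = max(1, n // 10)  # avoid near-zero shifts that preserve coupling
    observed = modulation_index(phase, amplitude)
    null = []
    for _ in range(n_surrogates):
        shift = rng.randrange(min_shift, n - min_shift + 1)
        shifted = amplitude[shift:] + amplitude[:shift]
        null.append(modulation_index(phase, shifted))
    # One-sided permutation p-value with the usual +1 correction.
    p_value = (1 + sum(v >= observed for v in null)) / (1 + n_surrogates)
    return observed, null, p_value
```

      Circular shifting keeps the autocorrelation of each series intact while breaking their alignment. It only yields a meaningful null when the signals are not perfectly periodic, which is one reason trial-shuffling is sometimes preferred.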

    1. Author Response

      Reviewer #1:

      The paper has potential. It's not there yet.

      The paper presents a sequencing study describing the evolution of Spiroplasma over various years in lab cultures. Spiroplasma is a fascinating bacteria that induces some unique phenotypes including enhancing insect immunity or "protection" and male-killing. The premise for the study was that sometimes these phenotypes disappear in cultures and thus the bacteria is likely quickly evolving and subject to frequent mutation. The researchers sequence various cultures of Spiroplasma (sHy and sMel), assemble and annotate genomes, compare the genomes, quantify the rates of evolution and compare these rates to some other studies on viruses, human microbiota/pathogens, and wolbachia. They find that Spiroplasma evolve real fast and speculate that the mechanism for this is a lack of various Mut repair enzymes. They look at fast evolving proteins of interest including RIP toxins which kill nematodes and spaid which is an inducer of male killing. So essentially the big result here is that Spiroplasma evolves real fast.

      In my opinion the paper is weak in a few senses. It doesn't reflect hypothesis driven science. It's mostly observational data and the researchers do not test any hypotheses. Now I don't think this is a deal breaker, but I do think it weakens the paper. Also, my comment should not imply that there isn't valuable data herein; and in fact I think the other big weakness is that the researchers do NOT exploit the true value of the data to derive and test novel hypotheses.

      We respectfully disagree with the reviewer’s opinion that hypothesis driven papers are generally ‘stronger’ than observational studies. Arguably, valuable insights can be derived from both types of studies, and this has been discussed in depth elsewhere (e.g., https://doi.org/10.1186/s13059-020-02133-w). However, we did have a hypothesis when we designed this study, and it was based on previous reports that novel phenotypes occur commonly in Spiroplasma in lab culture. We hypothesised that molecular evolution of Spiroplasma is likely also very fast. We further conclude with novel hypotheses on the evolutionary ecology of Spiroplasma poulsonii.

      For example: one aspect I was most excited about was to see how the researchers dissect and annotate evolutionary differences induced by axenic culture systems. The authors have the ability to compare and contrast genomes of Spiroplasma cultured in host insects AND Spiroplasma cultured without insects in axenic culture. Within these genome comparisons are likely novel insights that could shed light on mechanisms of maternal transmission and mechanisms of cell invasion etc... However, I was shocked to see that there is no in-depth analysis of specific proteins that are changing and evolving in these two diverse culture systems. I thought the analysis was entirely insufficient and didn't extract or present the real value of the datasets here. There are some brief mentions in the discussion of adherin binding proteins, but that was essentially it. I think the researchers focused too much on the past (the RIP toxins and spaid) rather than pointing out new interesting genes and hypotheses about them.

      For example: Maternal transmission would no longer be required in axenic culture, what genes got mutated? This is perhaps the most interesting thing that is not even touched upon.

      So essentially my main criticism is the added value from this paper which is the potential ability to compare symbiont genomes in hosts to symbionts with Axenic culture was NOT exploited. Given the novelty and impact of the axenic culture studies by Bruno, I would have hoped to see this upfront.

      We agree in general that our dataset presents the opportunity to compare evolution of the symbiont in axenic culture and in the host. However, any potential interpretation of evolution in axenic culture vs. in-host is hampered by the fact that we were comparing two different strains of Spiroplasma. With a sample size of 1 each, any conclusions on evolution in axenic culture vs. in-host would have been speculative.

      Additionally, we did not find notable differences in evolutionary rates or affected proteins between the two strains. From the first version of our paper:

      “The changes in sMel over ~2.5 years in culture affected only 15 different CDS in total, of which four were ARPs, and three lipoproteins”

      –which is overall very similar to the changes observed in sHy (Fig. 3). We concluded that the same genes are likely to evolve in axenic culture and in the host. We have made this clearer now in the manuscript:

      “The changes in sMel over ~2.5 years in culture affected only 15 different CDS in total, of which four were ARPs, and three lipoproteins. [New version:] Thus, the rates and patterns of evolutionary change are similar between the axenically cultured sMel and the host associated sHy.”

      Also there are some paragraphs comparing broad genomic differences between sHy and sMel, but I didn't think the differences in how these genomes evolved over time in comparison to their earlier selves was emphasized or explained in enough detail.

      We summarise the main patterns of change over time in sMel and sHy in the results and discussion sections, in Figure 3, and further list all detected changes from both strains in Supplementary table S2. We thus feel that the level of detail is appropriate here, especially given the length of the overall manuscript.

      Another example of not exploiting the value of the data: The plasmids are usually where much of the action is in microbes. There should be detailed annotations and figures of the plasmids. Tell me what is on them. Tell me which genes are evolving. Tell me if there are operons. Tell me what pathways are in the plasmids. I found the discussions of plasmid results wholly lacking. I also inherently felt that discussions of plasmids should be kept completely separate from discussions of chromosome evolution, regardless of similar rates of evolution or not... Plasmids are unique selfish entities and I imagine their evolution is wholly distinct from the evolution of chromosomes. They deserve their own sections and figures (in my opinion).

      There is a figure comparing plasmid synteny and gene content across the investigated strains in the supplementary material. Notable loci are highlighted, and again, the majority of genes are uncharacterised.

      The figure legends are completely insufficient and they ask me to read other papers to understand them, which is annoying.

      We apologise for this oversight and have now provided more comprehensive legends for all figures.

      Other minor comments:

      What about presence/absence of recA?

      recA is truncated in sMel by a premature stop codon, as discussed in detail in Paredes et al. (https://doi.org/10.1128/mBio.02437-14). recA appears to be complete and potentially functional in sHy, which supports Paredes et al.’s inference that the truncation in sMel may be relatively recent (after the split of sMel and sHy). The new version of the manuscript now includes this detail:

      “Further, while recA is truncated in sMel, the copy in sHy appears complete and functional. As suggested by Paredes et al. (2015), the loss of recA function in sMel is therefore likely very recent.”

      There are differences in dna extraction prior to genome sequencing for each of the strains. I suspect this is because different individuals sequenced different genomes. But I worry that different protocols could produce different results and therefore a comparison might be tainted by dna extraction and library prep specifics. Can you at least explain to the reader why this is not an issue, if it is not an issue?

      DNA extraction procedures differed because they were done in different laboratories. All DNA extractions were based on phenol-chloroform, and all Spiroplasma extractions were based on isolating fly hemolymph. Any differences in protocols are minor, and mentioned mainly for reasons of reproducibility. We do not see any reason why this would affect genome reconstruction of a single bacterial isolate. Several studies suggest that the impact of DNA extraction and library preparation is negligible for assemblies and calling SNPs (e.g., https://doi.org/10.1016/j.heliyon.2019.e02745; https://doi.org/10.1038/s41598-020-71207-3).

      Examples:

      181 - why were heads removed? Why was this dna extraction protocol here different from the hemolymph extraction protocol? Might this have changed anything?

      Please see the comment on DNA extraction above. Head removal is often used when enrichment of symbiont DNA in whole fly extracts is desired.

      195 - how much heterogeneity do you expect in any given fly. Do you have SNP data differences amongst good reads that could point out different alleles within a Spiroplasma population within an individual fly? It would be interesting to know which genes have a large amount of different alleles.

      As described in the methods section, we always pooled hemolymph from multiple fly individuals in order to extract sufficient DNA for genome sequencing, so we cannot say anything about the genetic heterogeneity of Spiroplasma populations in any single fly individual. The levels of heterozygosity in the pooled extracts were however very low: Out of all variants called with more than 10x coverage in sHy-Liv18B and sHy-TX12 strains, 98% and 95% were unanimously supported by all mapping reads, respectively. Only 0.8% and 1% of variants had 5% or more reads supporting an alternative allele, respectively. No alternative allele was supported by more than 18% and 11% of reads, respectively.
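
      As an illustration only, the read-support summary described above can be sketched as follows. This is not the pipeline used in the study: the function name, the input layout (total depth and number of reads supporting the called variant) and the example values are all assumptions made for the sketch.

```python
def summarise_allele_support(variants, min_depth=10, minor_threshold=0.05):
    """Summarise read support for called variants.

    variants: iterable of (depth, supporting_reads) tuples, where depth is the
    number of reads mapped at the site and supporting_reads is the number of
    those reads that carry the called variant (hypothetical input layout).
    """
    # Keep only variants with more than `min_depth` reads, as in the text.
    kept = [(depth, support) for depth, support in variants if depth > min_depth]
    if not kept:
        raise ValueError("no variants pass the depth filter")

    # Fraction of reads at each site that do NOT support the called variant.
    minor_fractions = [(depth - support) / depth for depth, support in kept]
    n = len(kept)
    return {
        "n_variants": n,
        "unanimous_fraction": sum(f == 0 for f in minor_fractions) / n,
        "minor_ge_threshold_fraction": sum(f >= minor_threshold for f in minor_fractions) / n,
        "max_minor_fraction": max(minor_fractions),
    }


# Made-up example: five variants, one of which fails the depth filter.
example = [(40, 40), (25, 25), (50, 45), (8, 4), (20, 20)]
summary = summarise_allele_support(example)
print(summary)
```

      The three reported quantities correspond to `unanimous_fraction`, `minor_ge_threshold_fraction` and `max_minor_fraction`, computed separately for each strain.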

      199 - another DNA extraction protocol. There isn't consistency here. If the reads and coverage are good enough, it shouldn't be a problem. But if there were data issues or assembly issues, this would raise concern in my mind. Can the researchers discuss or alleviate concerns here? Some assemblies have 6 chromosomes, some have 3 chromosomes. I presume these were different strains of Spiroplasma and not the same one?

      Please see the comment on DNA extraction above. As described in the methods section, we obtained long reads and short reads from the same DNA extract. Depending on the reads and algorithms employed, we created assemblies that differed in number of contigs. This is not unusual or unexpected (e.g., http://doi.org/10.1099/mgen.0.000132). A consensus was created by using a long read assembly and correcting it with contigs from a hybrid assembly, and subsequently, with Illumina reads. We feel that this was a good approach to ensure a contiguous, but accurate assembly.

      Figure 1: were the samples that are 6 years apart (red) sequenced in exactly the same way with the same technology? Could this produce any relics? Also, why display information for sMel in a table and information for sHy in a figure? Can't you creatively standardize a visual means of showing this information and compile information to one item?

      Please see the comment on DNA extraction above. We have taken up the suggestion of the reviewer and created a single figure to display sampling for both strains.

      I wonder what would happen if you took the same sample and did different DNA extraction protocols, different library prep protocols, and different illumina rounds of sequencing and independent algorithm assemblies... how much would they come out the same? Has anyone ever done this experiment? Is there any reference for this control that shows they would in fact come out the same? This is essentially what I am worried about here. This could be a minor issue, if the researchers could just confidently explain why this is NOT an issue.

      Please see the comment on DNA extraction above.

      Line 30 - you introduce sHy and sMel without defining what they are yet? Clarify immediately that they are both S. poulsonii

      This was clearly stated in line 29 of our manuscript.

      line 247 - They found fragmented genes with orthofinder, if it was less than 60% length homology... why set an arbitrary cutoff of 60? Anything less than 100 is possibly a pseudogenization if the last amino acid is important, or the C-terminus is important, which it often is... What is the rationale here?

      We agree with the reviewer that this is a relatively crude measure of pseudogenization that likely results in missing several candidate pseudogenes. Because it is usually impossible to functionally characterise all loci of a bacterial genome, truncation is often used as an indication that genes may have lost their functions (https://doi.org/10.1093/nar/gki631). This limitation was acknowledged in the first version of the manuscript: “Both sMel and sHy have a number of missing or truncated (i.e., potentially pseudogenized) genes when compared with each other”.

      To quantify an evolutionary rate, I read that they counted the number of changes in 3rd codon wobble positions/year. Why just wobble codons... why not all SNPs period? But then in the figure 2, it seemed like they are tallying a percentage of a total 100% = 570 "variants" or changes in the sequences (I wouldn't use the word variants, as this makes me think of strains; better to say "changes", no?). These changes include SNPs, insertions, deletions, and "complex"... no idea what complex is? The figure legends are completely insufficient. And I still don't know if you are tallying in some kind of number of recombinations and pseudogenizations into the mix (I assume these are included in the frame-shifts)? The quantification is murky to me.

      We used third codon positions mainly to facilitate comparison with other studies, e.g., the Richardson et al. analysis of Wolbachia evolutionary rates (https://doi.org/10.1371/journal.pgen.1003129). It is, however, common to use only sites that evolve mostly neutrally when determining evolutionary rates, in order to avoid differences arising from adaptive processes.

      The figures the reviewer is referring to aim to convey different types of information: Figure 2 displays the evolutionary rate estimates from neutral sites in comparison to other symbionts and pathogens. Figure 3 in contrast displays all changes we observed in a single strain of Spiroplasma.
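
      For readers unfamiliar with this kind of estimate, a rate per site per year is the number of observed changes divided by the number of sites examined and the elapsed time. The sketch below is a minimal illustration under that assumption; the function names and all numbers are hypothetical and are not taken from the manuscript.

```python
import math


def substitution_rate(n_changes, n_sites, years):
    """Changes per site per year: n_changes / (n_sites * years)."""
    if n_sites <= 0 or years <= 0:
        raise ValueError("n_sites and years must be positive")
    return n_changes / (n_sites * years)


def orders_of_magnitude(rate_a, rate_b):
    """log10 ratio of two rates; 2.0 means rate_a is 100 times rate_b."""
    return math.log10(rate_a / rate_b)


# Hypothetical inputs: 12 changes at 300,000 third codon positions over 10 years.
rate = substitution_rate(n_changes=12, n_sites=300_000, years=10)

# Hypothetical comparison rate for another symbiont.
other_rate = 4e-8
difference = orders_of_magnitude(rate, other_rate)
print(rate, difference)
```

      Restricting `n_changes` and `n_sites` to third codon positions is what makes the estimate comparable with studies that used the same class of sites.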

      The adhesin proteins are evolving fast. But aren't Spiroplasma commonly intracellular... so why would it be binding an extracellular protein? ... can you discuss this? I presume invasion or something?

      Drosophila-associated Spiroplasma are mostly extracellular, although they experience an intracellular phase during vertical transmission when they infect oocytes. We know that in other Spiroplasma species, adhesins are involved in insect cell invasion (https://doi.org/10.3389/fcimb.2017.00013, https://doi.org/10.1371/journal.pone.0048606) and we have now clarified this in the discussion:

      “For example, adhesion-related proteins are important in cell invasion in other Spiroplasma species (Béven et al., 2012; Dubrana et al., 2016; Hou et al., 2017) and are enriched for evolutionary changes in sHy and sMel (Fig. 2).”

      There might be a correlation with genome size and speed of evolution. You mention this in the discussion, but briefly. Can you elaborate on this, especially because Spiroplasmas are close to mycoplasmas which are REALLY small genomes.

      There is recent evidence that prokaryotic genome size is strongly correlated with mutational rate (https://doi.org/10.1016/j.cub.2020.07.034), rather than mostly determined by effective population size as previously suggested. This study also found that increased mutation rates often occur in lineages that have lost DNA repair genes, which is in line with our findings. Comparing evolutionary rates (Fig. 1) with genome sizes and the presence of DNA repair genes reveals that the correlation is not straightforward for the endosymbiotic lineages we compared. For example, Wolbachia and Buchnera appear to have lower substitution rates than Spiroplasma, yet their genomes are similar in size to (Wolbachia) or smaller than (Buchnera) that of Spiroplasma poulsonii. We have included the discussion on mutational rates determining genome size as follows:

      “Further to absence of DNA repair genes causing elevated mutation rates, a recent comparative study demonstrated a strong negative correlation between mutation rate and genome size in free living and endosymbiotic bacteria (Bourguignon et al., 2020). This correlation is however not apparent in the genomes of endosymbionts we have investigated. For example, the considerably slower evolving Buchnera genomes are much smaller than Spiroplasma, and Wolbachia would be predicted to have much larger genomes if their size was mainly determined by mutational rates. This suggests that mutational rates alone are a poor predictor for the sizes of the here investigated genomes. Likely, these genome sizes result from an interplay of multiple factors such as population size, patterns of DNA repair gene absence, and mutational rates (Kuo et al., 2009; Marais et al., 2020).”

      We have further moved supplementary Figure S5 into the main manuscript body (now Fig. 7) to better enable the readers to follow the discussion on the lack of DNA repair genes.

      Figure 3 is really confusing. I assume FS is frameshift, is IF induced fragmentation? After about 10 minutes I could decode it. Is this really the best way to think about these results? Perhaps? But perhaps not? ARP? I think it's adhesin stuff, but you don't say this until later.

      We have revised and clarified all figure legends. Please also see the comment above.

      Reviewer #2:

      General assessment:

      This work utilizes two Spiroplasma populations as the materials to study the substitution rates of symbiotic bacteria. A major finding is that these symbionts have rates that are ~2-3 orders higher than other bacteria with similar ecological niches (i.e., insect symbionts), and these substitution rates are comparable to the highest rates reported for bacteria and the lowest rate reported for RNA virus. Based on these findings, the authors discussed how this knowledge could be used to infer and to understand symbiont evolution. The biological materials used (i.e., symbionts maintained in fly hosts for 10 years and cultivated outside of the host for > 2 years) are valuable, the technical aspects are challenging, and the answers obtained are certainly interesting. The key concern is the limited sampling of other bacteria for comparison to derive the conclusions.

      Major comments:

      1) The key concern regarding sampling involves several points. (a) The two populations represent the species Spiroplasma poulsonii. Is this species a good representative for the genus? Or is it an exception because it is a vertically inherited male-killer? Most of the characterized Spiroplasma species appear to be commensals and are not vertically inherited. (b) The other species with a comparable rate is Mycoplasma gallisepticum (i.e. a chicken pathogen that spreads both horizontally and vertically). Mycoplasma is a polyphyletic genus with three major clades. While closely related to Spiroplasma, their hosts and ecology are quite different. Do all three groups of Mycoplasma have such high rates? If so, are the high rates simply a shared trait of these Mollicutes and has nothing to do with the distinct biology of S. poulsonii? How about other Mollicutes (e.g., Acholeplasma and phytoplasmas). (c) The group "human pathogens" in Fig. 2 show rates spreading across four orders of magnitude. This is too vague. How many species are included in this group? Are their rates linked to their phylogenetic affiliations? (d) Did Fig. 2 provide comprehensive sampling of bacteria? How about DNA viruses? Michael Lynch has done extensive works on mutation rates (e.g., DOI: 10.1038/nrg.2016.104), some of those should be integrated and discussed.

      (a) We agree that it is difficult to draw general conclusions of evolutionary rates in the genus Spiroplasma from looking at only 2 strains from the same species, and therefore we have not attempted to do so. We also agree that population bottlenecks at vertical transmission events may be a main reason for the elevated substitution rates. In the first version of the manuscript (first section of the discussion), we have therefore focussed our comparisons on Bacteria with similar ecology for which evolutionary rate estimates are available (Wolbachia, Buchnera, Blochmannia).

      (b) As far as we are aware, there is some anecdotal evidence that mycoplasmas evolve quickly (https://link.springer.com/article/10.1007/BF02115648) as well as one study estimating evolutionary rates from genome-wide data of multiple M. gallisepticum isolates (https://doi.org/10.1371/journal.pgen.1002511). We are unaware of systematic studies estimating evolutionary rates in other mollicutes, and we feel it is beyond the scope of this article to provide such a systematic assessment. However, we do agree that loss of DNA repair genes and elevated substitution rates in M. gallisepticum and S. poulsonii could also have occurred independently and have now clarified this in the manuscript: “Absence of DNA mismatch repair pathway may thus be ancestral to Entomoplasmatales (Spiroplasmataceae + Entomoplasmataceae) and contribute to the dynamic genome evolution across this taxon (Lo et al., 2016; Rocha and Blanchard, 2002). [New version:] Alternatively, increased substitutional rates caused by the loss of these loci could have arisen multiple times independently in Entomoplasmatales.”

      (c) We have now provided a more comprehensive figure legend that clarifies that the estimate was obtained from 16 different human pathogens. The range provided covers almost the entire mutational spectrum in Bacteria (https://doi.org/10.1099/mgen.0.000094).

      (d) Please see the comment under (c). We have now also included an estimate for DNA viruses in Fig. 2.

      2) This study is based on two lab-maintained populations. How may the results differ from natural populations? I understand that no estimate may be available for natural populations and additional experiments may not be feasible, but at least a more in-depth discussion should be provided.

      We have expanded the discussion on this matter:

      “Our rate estimate is potentially biased by at least two factors. First, we have only investigated laboratory populations of Spiroplasma poulsonii. Each vertical transmission event creates symbiont population bottlenecks potentially increasing genetic drift and thus substitution rates. Because the number of generations in natural populations of the Spiroplasma host Drosophila hydei is lower compared with laboratory reared hosts, vertical transmission events are rarer under natural conditions, and substitution rates therefore potentially lower. Further, laboratory strains could experience relaxed selection compared with natural symbiont populations. This may lead to higher substitution rate estimates from laboratory populations compared with natural populations. Secondly, substitution rates often appear larger when estimated over brief time periods (Ho et al., 2005).”

      3) The authors use adaptation as a key explanation for several of the findings. Stronger support and alternative explanations are needed. For example, why genome degradation may be used as a proxy for host adaptation (line 497)? If this explanation works only for sHy but not the other strain within the same species (i.e., sNeo), is this still a good explanation? Similarly, for the arguments made in lines 524-528, supporting evidence should be presented in the Results. For example, what are the rate distribution of all genes? Do those putative adaptation genes have statistically higher rates and/or signs of positive selection?

      We agree with the reviewer in that we have no direct evidence for adaptation as explanation for the genomic architecture of sHy. We have therefore carefully revised the manuscript to make clear that adaptation is a potential explanation. The key paragraph now reads:

      “Using signatures of genomic degradation as a proxy, our findings collectively suggest that sHy is in a more advanced stage of host restriction than sMel. This may indicate host adaptation as a result of the fitness benefits associated with sHy under parasitoid pressure, and the absence of detectable costs for carrying sHy in Drosophila hydei (Osaka et al., 2013; Jialei Xie et al., 2014; Xie et al., 2010). However, the Spiroplasma symbiont of Drosophila neotestacea sNeo is also protective, does not cause obvious fitness costs (Jaenike et al., 2010), but has a less reduced genome (Fig.5, Ballinger and Perlman, 2017). Further, it is also possible that genome reduction in sHy was mainly driven by stochastic effects or even by adaptation to laboratory conditions, as we have not investigated contemporary sHy from wild D. hydei populations.”

      4) The chromosome and plasmids have very different rates (lines 315-316). Since this study aims to compare across different bacteria, perhaps the analysis should be limited to chromosomes for all bacteria.

      We have only used chromosomal variants for the rate estimates. From the results section of the first version of the manuscript: “To estimate rates of molecular evolution in Spiroplasma poulsonii, we measured chromosome-wide changes in coding sequences of Spiroplasma from fly hosts (sHy) and axenic culture (sMel) over time.” We now also mention this information in the figure legend for Fig. 2.

      5) Formal statistical tests should be performed to test the stated correlations (e.g., lines 360-361, genome size and the number of insertion sequences).

      As suggested, we have calculated Pearson’s correlation coefficients, which confirm the observation that Spiroplasma genome size is correlated with the number of predicted IS elements and proportion of predicted prophage regions (new supplementary file Fig. S4).
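
      For reference, Pearson's correlation coefficient is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it directly from paired values. It is a minimal illustration only: the response does not state which software was used, and the genome sizes and IS element counts shown are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Invented example values: genome size (Mb) and number of predicted IS elements.
genome_size_mb = [0.9, 1.1, 1.4, 1.6, 1.9]
is_elements = [2, 5, 9, 14, 20]
r = pearson_r(genome_size_mb, is_elements)
print(round(r, 3))
```

      A coefficient alone does not give a significance level; a p-value would need an additional test, for example the one returned alongside the coefficient by `scipy.stats.pearsonr`.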

      6) Fig. 5. The differences in CDS length distribution should be investigated and discussed in more details. The authors stated that they have re-annotated all genomes using the same pipeline, so this finding cannot be attributed to the bioinformatic tools. If these findings are true (rather than annotation artifacts), it is quite interesting. How to explain these? Why is Sm KC3 so different from all others?

      There are several potential explanations for the differences in CDS length: 1) The skew towards very short predicted CDS is most pronounced in draft assemblies with relatively many contigs. We therefore think that assembly breaks have resulted in an artificially high number of short CDS by introducing splits mid-CDS. For example, in the Poulsonii clade, the sNeo assembly is composed of 181 contigs. This likely explains its higher proportion of very short CDS when compared with sMel and sHy. 2) An excess of short CDS could also indicate many truncated genes that have become pseudogenised. We would therefore expect shorter median CDS lengths in genomes that undergo reduction. In Fig. 5, the differences in CDS lengths within the Mirum group may be explained this way: in comparison with S. eriocheiris, CDS lengths are shorter for S. mirum and S. atrichopogonis. The latter two genomes also have a lower coding density and smaller genome size, which may indicate recent genomic reduction. 3) Prophage regions are often characterised by shorter CDS, so genomes with overall higher proportions of prophage are expected to harbour a larger number of short CDS. We have added the following statement to the manuscript:

      “The distribution of CDS sequence lengths varies across the investigated genomes (Fig. 5), which may be explained by differences in proportion of prophage regions, level of pseudogenization, and assembly quality.”

      7) Lines 467-479. Multiple lineages have purged the prophages is an interesting hypothesis and may be important in furthering our understanding of these bacteria. More detailed info (e.g., syntenic regions of prophage sites across different species) should be provided in the Results to support the claim. Perhaps the sampling should be expanded to include the Apis clade (i.e., the clade with the highest number of described species within the genus) to test if the prophage invasion occurred even earlier or independently in multiple lineages. Additionally, CRISPR/Cas systems are known to have variable presence across Spiroplasma species (DOI: 10.3389/fmicb.2019.02701). How does this correspond to prophage distribution/abundance?

      For sMel, none of the prophage regions predicted with PHASTER show clear synteny over the majority of their length in sHy, which makes synteny comparison (including across even more distantly related strains) difficult. CRISPR-Cas systems are entirely absent in Citri and Poulsonii clades, so are unlikely to be responsible for differences in prophage proportions between sMel and sHy. For the revised version of the manuscript, we have performed two additional analyses focussing on prophages and CRISPR/Cas in Spiroplasma, and have expanded the sampling to the Apis clade, as suggested by the reviewer.

      Firstly, we have investigated the history of prophage-related loci across the Spiroplasma phylogeny. Gene tree - species tree reconciliations suggest that the number of prophage loci has expanded greatly in some of the lineages, especially in the Citri clade. Many of these expansions have happened relatively recently, and therefore most likely occurred independently in multiple lineages.

      Secondly, we have used two approaches to predict CRISPR/Cas systems and arrays. We found CRISPR/Cas systems, or their remnants, only in the Apis clade, which coincides with the absence of prophage loci in most members of this clade. Based on Cas9 phylogeny, there were multiple origins and several losses of Cas9 systems in the Apis clade. Interestingly, in some taxa with reduced Cas9 systems (e.g., S. atrichopogonis and S. mirum), there are elevated numbers of phage loci, which suggests that phage invasion in Spiroplasma is linked to the loss of antiviral systems, as has been suggested previously.

      Overall, these data are in line with Spiroplasma being susceptible to viral invasion when CRISPR/Cas is absent. Highly streamlined genomes in the absence of CRISPR/Cas might thus be explained by loss of prophage regions or by a lack of exposure to phages. We have revised the paragraph discussing prophage distribution:

      “It was therefore argued that phages have likely invaded Spiroplasma only after the split of the Syrphidicola and Citri+Poulsonii clades (Ku et al., 2013). Our prophage gene tree-species tree reconciliations are in line with this hypothesis, but also indicate that prophage proliferation has largely happened independently in different Spiroplasma lineages (Fig. S4, supplementary material). CRISPR/Cas systems have multiple origins in Spiroplasma (Ipoutcha et al., 2019) and only occur in strains lacking prophages (Fig. S4, supplementary material). While the absence of antiviral systems often coincides with prophage proliferation (e.g., in the Citri clade), several strains with compact, streamlined genomes lack CRISPR/Cas and prophages (e.g., TU-14, Fig. S4, supplementary material). These strains also show other hallmarks of reduced symbiont genomes (small size, high coding density, lack of plasmids and transposons, Fig. 5), which is in line with the model of genome reduction discussed above and suggests prophage regions were purged from these genomes. Alternatively, these strains may never have been exposed to phages.”

      Minor comments:

      1) Lines 32, 517, and possibly other parts: Use "increased" or "decreased" to describe the rate differences are inappropriate because these imply inferences of evolutionary events after divergence from the MRCA, which are clearly not the case. It would be more appropriate to use "higher" or "lower" to describe the difference.

      We agree and have revised the use of these terms. In the new version of the manuscript we only use ‘increase’ or ‘decrease’ when we refer to a change compared with the MRCA.

      2) Lines 31-32. This is too vague. For the rates, the description should be more explicit (e.g., higher by X orders of magnitude). The term "symbiont" is also vague. Broadly speaking, all human pathogens (included in Fig. 2) or plant-associated bacteria could be considered as symbionts as well. Would be better to define this point more clearly.

      Corrected:

      “We observed that S. poulsonii substitution rates are among the highest reported for any bacteria, and around two orders of magnitude higher compared with other inherited arthropod endosymbionts.”

      3) Fig. 1. The alignment is off. For example, June should be located near the middle between two tick marks.

      The tick marks did not correspond to year boundaries. We recognise that this may be confusing and have adjusted the image for the new version of the manuscript.

      4) Line 207. This is confusing. There should not be 6 circular chromosomes.

      Corrected ‘chromosomes’ to ‘contigs’.

      5) Line 211. Why is the hybrid assembly more fragmented?

      The hybrid assembly algorithm used by Unicycler (https://doi.org/10.1371/journal.pcbi.1005595) first creates an assembly from the short reads and then uses long reads to span repeats and other questionable nodes in the assembly graph. We suspect that if the initial short read assembly is highly fragmented (such as is the case for S. poulsonii), even a large amount of high quality long reads cannot fully resolve the assembly graph. Our approach was therefore to use the complete long read assembly as starting point.

      6) Methods and Results. More detailed information regarding the sequencing and assembly should be provided. For example, how many raw reads were generated for each library? What are the mapping rates? How much variation in observed coverage across the genome?

      We now provide these details in the new Supplementary table S2.

      7) Lines 341-342. How to establish an expected level of synteny conservation?

      We have removed the reference to ‘expected’ levels of synteny.

      8) Line 487. I do not see how this statement could be supported by Fig. 5. Also "less pronounced" is vague.

      Corrected to

      “However, when using the similarity agnostic tool PhiSpy, the predicted prophage regions were similar in size between sHy and sMel (Fig. S2).”

  10. Nov 2020
    1. Author Response

      Summary: A major tenet of plant pathogen effector biology has been that effectors from very different pathogens converge on a small number of host targets with central roles in plant immunity. The current work reports that effectors from two very different pathogens, an insect and an oomycete, interact with the same plant protein, SIZ1, previously shown to have a role in plant immunity. Unfortunately, apart from some technical concerns regarding the strength of the data that the effectors and SIZ1 interact in plants, a major limitation of the work is that it is not demonstrated that the effectors alter SIZ1 activity in a meaningful way, nor that SIZ1 is specifically required for action of the effectors.

      We thank the editor and reviewers for their time to review our manuscript and their helpful and constructive comments. The reviews have helped us focus our attention on additional experiments to test the hypothesis that effectors Mp64 (from an aphid) and CRN83-152 (from an oomycete) indeed alter SIZ1 activity or function. We have revised our manuscript and added the following data:

      1) Mp64, but not CRN83-152, stabilizes SIZ1 in planta. (Figure 1 in the revised manuscript).

      2) AtSIZ1 ectopic expression in Nicotiana benthamiana triggers cell death from 3-4 days after agroinfiltration. Interestingly CRN83-152_6D10 (a mutant of CRN83-152 that has no cell death activity), but not Mp64, enhances the cell death triggered by AtSIZ1 (Figure 2 in the revised manuscript).

      For 1) we have added the following panel to Figure 1 as well as three biological replicates of the stabilisation assays in the Supplementary data (Fig S3):

      Figure 1 panel C. Stabilisation of SIZ1 by Mp64. Western blot analyses of protein extracts from agroinfiltrated leaves expressing combinations of GFP-GUS, GFP-Mp64 and GFP-CRN83_152_6D10 with AtSIZ1-myc or NbSIZ1-myc. Protein size markers are indicated in kD, and equal protein amounts upon transfer are shown by Ponceau staining (PS) of membranes. Blot is representative of three biological replicates, which are all shown in supplementary Fig. S3. The selected panels shown here are cropped from Rep 1 in supplementary Fig. S3.

      For 2) we have added the following new figure (Fig. 2 in the revised manuscript):

      Fig. 2. SIZ1-triggered cell death in N. benthamiana is enhanced by CRN83_152_6D10 but not Mp64. (A) Scoring overview of infiltration sites for SIZ1-triggered cell death. Infiltration sites were scored for no symptoms (score 0), chlorosis with localized cell death (score 1), less than 50% of the site showing visible cell death (score 2), more than 50% of the site showing cell death (score 3). (B) Bar graph showing the proportions of infiltration sites showing different levels of cell death upon expression of AtSIZ1, NbSIZ1 (both with a C-terminal RFP tag) and an RFP control. Graph represents data from a combination of 3 biological replicates of 11-12 infiltration sites per experiment (n=35). (C) Bar graph showing the proportions of infiltration sites showing different levels of cell death upon expression of SIZ1 (with C-terminal RFP tag) either alone or in combination with aphid effector Mp64 or Phytophthora capsici effector CRN83_152_6D10 (both effectors with GFP tag), or a GFP control. Graph represents data from a combination of 3 biological replicates of 11-12 infiltration sites per experiment (n=35).
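
      The proportions plotted in panels B and C can be derived from per-site scores by pooling replicates and counting each score class. The sketch below shows one way to do this, assuming scores are recorded as integers 0 to 3 as in panel A. The function name and the example scores are hypothetical and do not reproduce the data in the figure.

```python
from collections import Counter

# Score classes as defined in panel A of the legend.
SCORES = (0, 1, 2, 3)


def score_proportions(scores):
    """Return the proportion of infiltration sites in each score class."""
    if not scores:
        raise ValueError("no scores supplied")
    counts = Counter(scores)
    unknown = set(counts) - set(SCORES)
    if unknown:
        raise ValueError(f"unexpected score values: {sorted(unknown)}")
    total = len(scores)
    return {score: counts.get(score, 0) / total for score in SCORES}


# Invented scores for one treatment: three replicates of four sites each.
replicates = [
    [0, 1, 3, 3],
    [2, 3, 3, 1],
    [3, 3, 2, 0],
]
pooled = [score for replicate in replicates for score in replicate]
proportions = score_proportions(pooled)
print(proportions)
```

      Running the same function per treatment gives the stacked proportions needed for a bar graph of this kind.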

      Our new data provide further evidence that SIZ1 function is affected by effectors Mp64 (aphid) and CRN83-152 (oomycete), and that SIZ1 likely is a vital virulence target. Our latest results also provide further support for distinct effector activities towards SIZ1 and its variants in other species. SIZ1 is a key regulator of immunity to biotic stresses (aphids, oomycetes, bacteria and nematodes), on which distinct virulence strategies seem to converge. The mechanism(s) underlying the stabilisation of SIZ1 by Mp64 are as yet unclear. However, we hypothesize that increased stability of SIZ1, which functions as an E3 SUMO ligase, leads to increased SUMOylation activity towards its substrates. We surmise that SIZ1 complex formation with other key regulators of plant immunity may underpin these changes. Whether the cell death triggered by AtSIZ1 upon transient expression in Nicotiana benthamiana is linked to E3 SUMO ligase activity remains to be investigated. Expression of AtSIZ1 in a plant species other than Arabidopsis may lead to mistargeting of substrates, and subsequent activation of cell death. Dissecting the mechanistic basis of SIZ1 targeting by distinct pathogens and pests will be an important next step in addressing these hypotheses towards understanding plant immunity.

      Reviewer #1:

      In this manuscript, the authors suggest that SIZ1, an E3 SUMO ligase, is the target of both an aphid effector (Mp64 from M. persicae) and an oomycete effector (CRN83_152 from Phytophthora capsici), based on interaction between SIZ1 and the two effectors in yeast, co-IP from plant cells and colocalization in the nucleus of plant cells. To support their proposal, the authors investigate the effects of SIZ1 inactivation on resistance to aphids and oomycetes in Arabidopsis and N. benthamiana. Surprisingly, resistance is enhanced, which would suggest that the two effectors increase SIZ1 activity.

      Unfortunately, not only do we not learn how the effectors might alter SIZ1 activity, there is also no formal demonstration that the effects of the effectors are mediated by SIZ1, such as investigating the effects of Mp64 overexpression in a siz1 mutant. We note, however, that even this experiment might not be entirely conclusive, since SIZ1 is known to regulate many processes, including immunity. Specifically, siz1 mutants present an autoimmune phenotype, and general activation of immunity might be sufficient to attenuate the enhanced aphid susceptibility seen in Mp64 overexpressers.

      To demonstrate unambiguously that SIZ1 is a bona fide target of Mp64 and CRN83_152 would require assays that demonstrate either enhanced SIZ1 accumulation or altered SIZ1 activity in the presence of Mp64 and CRN83_152.

      The enhanced resistance upon knock-down/out of SIZ1 suggests pathogen and pest susceptibility requires SIZ1. We hypothesize that the effectors either enhance SIZ1 activity or that the effectors alter SIZ1 specificity towards substrates rather than enzyme activity itself. To investigate how effectors coopt SIZ1 function would require a comprehensive set of approaches and will be part of our future work. While we agree that this aspect requires further investigation, we think the proposed experiments go beyond the scope of this study.

      After receiving reviewer comments, including on the quality of Figure 1, which shows western blots of co-immunoprecipitation experiments, we re-analyzed independent replicates of effector-SIZ1 co-expression/co-immunoprecipitation experiments. The reviewer rightly pointed out that in the presence of Mp64, SIZ1 protein levels increase when compared to samples in which either the vector control or CRN83-152_6D10 are co-infiltrated. Through carefully designed experiments, we can now affirm that Mp64 co-expression leads to increased SIZ1 protein levels (Figure 1C and Supplementary Figure S3, revised manuscript). Our results offer both an explanation of the different SIZ1 levels in the input samples (original submission, Figure 1A/B) and tantalizing new clues to the nature of distinct effector activities.

      Besides, we were able to confirm a previous preliminary finding not included in the original submission that ectopic expression of AtSIZ1 in Nicotiana benthamiana triggers cell death (3/4 days after infiltration) and that CRN83-152_6D10 (which itself does not trigger cell death) enhances this phenotype.

      We have considered overexpression of Mp64 in the siz1 mutant, but share the view that the outcome of such experiments will be far from conclusive.

      In summary, we have added new data that further support that SIZ1 is a bona fide target of Mp64 and CRN83-152 (i.e. increased accumulation of SIZ1 in the presence of Mp64, and enhanced SIZ1 cell death activation in the presence of CRN83-152_6D10).

      Reviewer #2:

      The study provides evidence that an aphid effector Mp64 and a Phytophthora capsici effector CRN83_152 can both interact with the SIZ1 E3 SUMO-ligase. The authors further show that overexpression of Mp64 in Arabidopsis can enhance susceptibility to aphids and that a loss-of-function mutation in Arabidopsis SIZ1 or silencing of SIZ1 in N. benthamiana plants lead to increased resistance to aphids and P. capsici. On siz1 plants the aphids show altered feeding patterns on phloem, suggestive of increased phloem resistance. While the finding is potentially interesting, the experiments are preliminary and the main conclusions are not supported by the data.

      Specific comments:

      The suggestion that SIZ1 is a virulence target is an overstatement. Preferable would be knockouts of effector genes in the aphid or oomycete, but even with transgenic overexpression approaches, there are no direct data that the biological function of the effectors requires SIZ1. For example, is SIZ1 required for the enhanced susceptibility to aphid infestation seen when Mp64 is overexpressed? Or does overexpression of SIZ1 enhance Mp64-mediated susceptibility?

      What do the effectors do to SIZ1? Do they alter SUMO-ligase activity? Or are perhaps the effectors SUMOylated by SIZ1, changing effector activity?

      We agree that having effector gene knock-outs in aphids and oomycetes would be ideal for dissecting effector-mediated targeting of SIZ1. Unfortunately, there is no gene knock-out system established in Myzus persicae (our aphid of interest), and Cas9-mediated knock-out of genes in Phytophthora capsici has not been successful in our lab as yet, despite published reports. Moreover, repeated attempts to silence Mp64 and other effector and non-effector coding genes in aphids (both in planta and in vitro) have not been successful thus far in our hands. As detailed in our response to Reviewer 1, we considered the use of transgenic approaches not appropriate, as data interpretation would become muddied by the strong immunity phenotype seen in the siz1-2 mutant.

      As stated before, we hypothesize that the effectors either enhance SIZ1 activity or alter SIZ1 substrate specificity. Mp64-induced accumulation of SIZ1 could form the basis of an increase in overall SIZ1 activity. This hypothesis, however, requires testing. The same applies to the enhanced SIZ1 cell death activation in the presence of CRN83-152_6D10.

      Whilst our new data support our hypothesis that effectors Mp64 and CRN83-152 affect SIZ1 function, determining exactly how these effectors trigger susceptibility requires significant work. Given the substantial effort needed and the research questions involved, we argue that findings emanating from such experiments warrant standalone publication.

      While stable transgenic Mp64 overexpressing lines in Arabidopsis showed increased susceptibility to aphids, transient overexpression of Mp64 in N. benthamiana plants did not affect P. capsici susceptibility. The authors conclude that while the aphid and P. capsici effectors both target SIZ1, their activities are distinct. However, not only is it difficult to compare transient expression experiments in N. benthamiana with stable transgenic Arabidopsis plants, but without knowing whether Mp64 has the same effects on SIZ1 in both systems, to claim a difference in activities remains speculative.

      We agree that we cannot compare effector activities between different plant species. We carefully considered every statement regarding results obtained on SIZ1 in Arabidopsis and Nicotiana benthamiana. We can, however, compare activities of the two effectors when expressed side by side in the same plant species. In our original submission, we show that expression of CRN83-152 but not Mp64 in Nicotiana benthamiana enhances susceptibility to Phytophthora capsici. In our revised manuscript, we present new data showing distinct effector activities towards SIZ1 with regards to 1) enhanced SIZ1 stability and 2) enhanced SIZ1-triggered cell death. These findings raise questions as to how enhanced SIZ1 stability and cell death activation are relevant to immunity. We aim to address these critical questions by addressing the mechanistic basis of effector-SIZ1 interactions.

      The authors emphasize that the increased resistance to aphids and P. capsici in siz1 mutants or SIZ1 silenced plants are independent of SA. This seems to contradict the evidence from the NahG experiments. In Fig. 5B, the effects of siz1 are suppressed by NahG, indicating that the resistance seen in siz1 plants is completely dependent on SA. In Fig 5A, the effects of siz1 are not completely suppressed by NahG, but greatly attenuated. It has been shown before that SIZ1 acts only partly through SNC1, and the results from the double mutant analyses might simply indicate redundancy, also for the combinations with eds1 and pad4 mutants.

      We emphasized that siz1-2 increased resistance to aphids is independent of SA, which is supported by our data (Figure 5A). Still, we did not conclude that the same applies to increased resistance to Phytophthora capsici (Figure 5B). In contrast, the siz1-2 enhanced resistance to P. capsici appears entirely dependent on SA levels, with the level of infection on the siz1-2/NahG mutants even slightly higher than on the NahG line and Col-0 plants. We exercise caution in the interpretation of this data given the significant impact SA signalling appears to have on Phytophthora capsici infection.

      The reviewer commented on the potential for functional redundancy in the siz1-2 double mutants. Unfortunately, we are unsure what redundancy s/he is referring to. SNC1, EDS1, and PAD4 all are components required for immunity, and their removal from the immune signalling network (using the mutations in the lines we used here) impairs immunity to various plant pathogens. The siz1-2 snc1-11, siz1-2 eds1-2, and siz1-2 pad4-1 double mutants have similar levels of susceptibility to the bacterial pathogen Pseudomonas syringae when compared to the corresponding snc1-11, eds1-2 and pad4-1 controls (at 22 °C). These previous observations indicate that siz1 enhanced resistance is dependent on these signalling components (Hammoudi et al., 2018, PLoS Genetics).

      In contrast to this, we observed a strong siz1 enhanced resistance phenotype in the absence of snc1-11, eds1-2 and pad4-1. Notably, the siz1-2 snc1-11 mutant does not appear immuno-compromised when compared to siz1-2 in fecundity assays, indicating that the siz1-2 phenotype is independent of SNC1. In our view, these data suggest that signalling components/pathways other than those mediated by SNC1, EDS1, and PAD4 are involved. We consider this to be an exciting finding as our data point to an as yet unknown SIZ1-dependent signalling pathway that governs immunity to aphids.

      How do NahG or Mp64 overexpression affect aphid phloem ingestion? Is it the opposite of the behavior on siz1 mutants?

      We have not performed further EPG experiments on additional transgenic lines used in the aphid assay. These experiments are quite challenging and time consuming. Moreover, accommodating an experimental set-up that allows us to compare multiple lines at the same time is not straightforward. Considering that NahG did not affect aphid performance (Figure 5A), we do not expect to see an effect on phloem ingestion.

    1. Author Response

      1) Please comment on why many of the June samples failed to provide sufficient sequence information, especially since not all of them had low yields (supp table 2 and supp figure 5).

      An extended paragraph about experimental intricacies of our study has been added to the Discussion. It has also been slightly restructured to give a better and wider overview of how future freshwater monitoring studies using nanopore sequencing can be improved (page 18, lines 343-359).

      We wish to highlight that all three MinION sequencing runs analysed here feature substantially higher data throughput than that of any other recent environmental 16S rRNA sequencing study with nanopore technology, as recently reviewed by Latorre-Pérez et al. (Biology Methods and Protocols 2020, doi:10.1093/biomethods/bpaa016). One of this work's sequencing runs has resulted in lower read numbers for water samples collected in June 2018 (~0.7 Million), in comparison to the ones collected in April and August 2018 (~2.1 and ~5.5 Million, respectively). While log-scale variabilities between MinION flow cell throughput have been widely reported for both 16S and shotgun metagenomics approaches (e.g. see Latorre-Pérez et al.), the count of barcode-specific 16S reads is nevertheless expected to be correlated with the barcode-specific amount of input DNA within a given sequencing run. As displayed in Supplementary Figure 7b, we see a positive, possibly logarithmic trend between the DNA concentration after 16S rDNA amplification and number of reads obtained. With few exceptions (April-6, April-9.1 and April-9.2), we find that sample pooling with original 16S rDNA concentrations of ≳4 ng/µl also results in the surpassing of the here-set (conservative) minimum read threshold of 37,000 for further analyses. Conversely, all June samples that failed to reach 37,000 reads did not pass the input concentration of 4 ng/µl, despite our attempt to balance their quantity during multiplexing.

      We reason that such skews in the final barcode-specific read distribution would mainly arise from small concentration measurement errors, which undergo subsequent amplification during the upscaling with comparably large sample volume pipetting. While this can be compensated for by high overall flow cell throughput (e.g. see August-2, August-9.1, August-9.2), we think that future studies with much higher barcode numbers can circumvent this challenge by leveraging an exciting software solution: real-time selective sequencing via “Read Until”, as developed by Loose et al. (Nature Methods 2016, doi:10.1038/nmeth.3930). In the envisaged framework, incoming 16S read signals would be in situ screened for the sample-barcode which in our workflow is PCR-added to both the 5' and 3' end of each amplicon. Overrepresented barcodes would then be counterbalanced by targeted voltage inversion and pore "rejection" of such reads, until an even balance is reached. Lately, such methods have been computationally optimised, both through the usage of GPUs (Payne et al., bioRxiv 2020, https://doi.org/10.1101/2020.02.03.926956) and raw electrical signals (Kovaka et al., bioRxiv 2020, https://doi.org/10.1101/2020.02.03.931923).
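
      The balancing idea described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it does not call the Read Until API or any real sequencing software, and the function names, the 10 % tolerance and the 7:2:1 input mix are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def should_reject(barcode, counts, n_barcodes, tolerance=1.1):
    """Reject a read if its barcode already holds more than its fair share.

    `tolerance` (hypothetical) allows a barcode to exceed the even share
    by 10 % before its reads are ejected from the pore.
    """
    total = sum(counts.values())
    if total == 0:
        return False
    fair_share = total / n_barcodes
    return counts[barcode] > tolerance * fair_share


def simulate(input_fractions, n_reads, seed=0):
    """Offer `n_reads` reads drawn from a skewed pool; keep the accepted ones."""
    rng = random.Random(seed)
    barcodes = list(input_fractions)
    weights = [input_fractions[b] for b in barcodes]
    counts = Counter({b: 0 for b in barcodes})
    for _ in range(n_reads):
        barcode = rng.choices(barcodes, weights=weights)[0]
        if not should_reject(barcode, counts, len(barcodes)):
            counts[barcode] += 1
    return counts


# Hypothetical, strongly skewed library: 70 % / 20 % / 10 % of molecules.
balanced = simulate({"bc01": 0.7, "bc02": 0.2, "bc03": 0.1}, n_reads=20000)
print(dict(balanced))
```

      In this toy setting the accepted read counts end up much closer to even than the input mix, at the price of discarding many reads from the overrepresented barcode; a real implementation would additionally have to identify the barcode from the first part of each read.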

      2) It would be helpful if the authors could mention the amount (or proportion) of their sequenced 16S amplicons that provided species-level identification, since this is one of the advantages of nanopore sequencing.

      We wish to emphasize that we intentionally refrained from reporting the proportion of 16S rRNA reads that could be classified at species level, since we are wary of any automated species level assignments even if the full-length 16S rRNA gene is being sequenced. While we list the reasons for this below, we appreciate the interest in the theoretical proportion of reads at species level assignment. We therefore re-analyzed our dataset, and now also provide the ratio of reads that could be classified at species level using Minimap2 (pages 16-17, lines 308-314).

      To this end, we classified a read at species level if the species entry of the respective SILVA v.132 taxonomic ID was not empty and was neither "uncultured bacterium" nor "metagenome". Therefore, many unspecified classifications such as uncultured species of some bacterial genus are counted as species-level classifications, rendering our approach lenient towards a higher ratio of species level classifications. Still, the species level classification ratios remain low, on average at 16.2 % across all included river samples (genus-level: 65.6 %, family level: 76.6 %). The mock community, on the other hand, had a much higher species classification rate (>80 % in all three replicates), which is expected for a well-defined, well-referenced and divergent composition of only eight bacterial taxa, and thus re-validates our overall classification workflow.
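
      The species-level rule can be sketched in a few lines. The sketch reads the rule as "species entry is non-empty and is not one of the two placeholder labels"; the function names, the case-insensitive matching and the example labels are assumptions for illustration, not taken from the analysis scripts.

```python
# Placeholder species labels that carry no species-level information.
# The two strings come from the response text; matching them
# case-insensitively is an assumption of this sketch.
UNINFORMATIVE = {"uncultured bacterium", "metagenome"}


def is_species_level(species_entry):
    """Return True if a species field counts as a species-level call."""
    if species_entry is None:
        return False
    label = species_entry.strip().lower()
    return label != "" and label not in UNINFORMATIVE


def species_level_ratio(species_entries):
    """Fraction of reads whose species field passes is_species_level."""
    entries = list(species_entries)
    if not entries:
        return 0.0
    return sum(is_species_level(e) for e in entries) / len(entries)


# Hypothetical species fields for five reads.
reads = [
    "Pseudomonas aeruginosa",
    "",
    "uncultured bacterium",
    "metagenome",
    "uncultured Flavobacterium sp.",  # lenient rule: still counted
]
print(species_level_ratio(reads))
```

      Note that a label such as "uncultured Flavobacterium sp." passes the filter, which is exactly the leniency described above.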

      On a theoretical level, we mainly refrain from automated across-the-board species level assignments because: (1) many species might differ by very few nucleotide differences within the 16S amplicon; distinguishing these from nanopore sequencing errors (here ~8 %) remains challenging; (2) reference databases are incomplete and biased with respect to species level resolution, especially regarding certain environmental contexts; it is likely that species assignments would be guided by references available from more thoroughly studied niches than freshwater.

      Other recent studies have also shown that across-the-board species-level classification is not yet feasible with 16S nanopore sequencing, for example in comparison with Illumina data (Acharya et al., Scientific Reports 2019, doi:10.25405/data.ncl.9693533) which showed that “more reliable information can be obtained at genus and family level”, or in comparison with longer 16S-ITS-23S amplicons (Cusco et al., F1000Research 2019, doi: 10.12688/f1000research.16817.2), which “remarkably improved the taxonomy assignment at the species level”.

      3) It is not entirely clear how the authors define their core microbiome. Are they reporting mainly the most abundant taxa (dominant core microbiome), and would this change if you look at a taxonomic rank below the family level? How does the core compare, for example, with other studies of this same river?

      The here-presented core microbiome indeed represents the most abundant taxa, with relatively consistent profiles between samples. We used hierarchical clustering (Figure 4a, C2 and C4) on the bacterial family level, together with relative abundance to identify candidate taxa. Filtering these for median abundance > 0.1% across all samples resulted in 27 core microbiome families. To clarify this for the reader, we have added a new paragraph to the Material and Methods (section 2.7; page 29, lines 653-658).
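
      The abundance filter in the last step can be sketched as follows. The 0.1 % median threshold is taken from the text; the data layout, function name and family labels are hypothetical, and the preceding hierarchical clustering step that yields the candidate taxa is not reproduced here.

```python
from statistics import median


def core_families(abundance, candidates, min_median=0.1):
    """Keep candidate families whose median relative abundance exceeds the cut-off.

    abundance: dict mapping family -> list of relative abundances in percent,
               one value per sample (hypothetical layout).
    candidates: families pre-selected by clustering.
    """
    return sorted(
        family for family in candidates
        if median(abundance[family]) > min_median
    )


# Made-up relative abundances (percent) across three samples.
abundance = {
    "Family A": [5.2, 4.8, 6.1],
    "Family B": [0.05, 0.3, 0.08],   # median 0.08 % -> filtered out
    "Family C": [0.2, 0.15, 0.4],
}
print(core_families(abundance, ["Family A", "Family B", "Family C"]))
```

      Using the median rather than the mean keeps a family with a single high-abundance sample (such as "Family B" here) from entering the core set.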

      We have also performed the same analysis on the bacterial genus level and now display the top 27 most abundant genera (median abundance > 0.2%), together with their corresponding families and hierarchical clustering analysis in a new Supplementary Figure 4. Overall, high robustness is observed with respect to the families of the core microbiome: out of the top 16 core families (Figure 4b), only the NS11-12 marine group family is not represented by the top 27 most abundant genera (Supplementary Figure 4b). We reason that this is likely because its corresponding genera are composed of relatively poorly resolved references of uncultured bacteria, which could thus not be further classified.

      To the best of our knowledge, there are only two other reports that feature metagenomic data of the River Cam and its wastewater influx sources (Rowe et al., Water Science & Technology 2016, doi:10.2166/wst.2015.634; Rowe et al., Journal of Antimicrobial Chemotherapy 2017, doi:10.1093/jac/dkx017). While both of these primarily focus on the diversity and abundance of antimicrobial resistance genes using Illumina shotgun sequencing, they only provide limited taxonomic resolution on the river's core microbiome. Nonetheless, Rowe et al. (2016) specifically highlighted Sphingobium as the most abundant genus in a source location of the river (Ashwell, Hertfordshire). This genus belongs to the family of Sphingomonadaceae, which is also among the five most dominant families identified in our dataset. It thus forms part of what we define as the core microbiome of the River Cam (Figure 4b), and we have therefore highlighted this consistency in our manuscript's Discussion (page 17, lines 316-319).

      4) Please consider revising the amount of information in some of the figures (such as figure 2 and figure 3). The resulting images are tiny, the legends become lengthy and the overall impact is reduced. Consider splitting these or moving some information to the supplements.

      To follow this advice, we have split Figure 2 into two less compact figures. We have moved more detailed analyses of our classification tool benchmark to the supplement (now Supplementary Figure 1). Supplementary Figure 1 notably also contains a new summary of the systematic computational performance measurements of each classification tool (see minor suggestions).

      Moreover, we here suggest that the original Figure 3 may be divided into two figures: one to visualise the sequencing output, data downsampling and distribution of the most abundant families (now Figure 3), and the other featuring the clustering of bacterial families and associated core microbiome (now Figure 4). We think that both the data summary and clustering/core microbiome analyses are of particular interest to the reader, and that they should be kept as part of the main analyses rather than the supplement – however, we are certainly happy to discuss alternative ideas with the reviewers and editors.

      5) Given that the authors claim to provide a simple, fast and optimized workflow it would be good to mention how this workflow differs or provides faster and better analysis than previous work using amplicon sequencing with a MinION sequencer.

      Data throughput, sequencing error rates and flow cell stability have seen rapid improvements since the commercial release of MinION in 2015. In consequence, bioinformatics community standards regarding raw data processing and integration steps are still lacking, as illustrated by a thorough recent benchmark of fast5 to fastq format "basecalling" methods (Wick et al., Genome Biology 2019, doi: 10.1186/s13059-019-1727-y).

      Early on during our analyses, we noticed that a plethora of bespoke pipelines have been reported in recent 16S environmental surveys using MinION (e.g. Kerkhof et al., Microbiome 2017, 10.1186/s40168-017-0336-9; Cusco et al., F1000 Research 2018, 10.12688/f1000research.16817.2; Acharya et al., Scientific Reports 2019, 10.1038/s41598-019-51997-x; Nygaard et al., Scientific Reports 2020, doi: 10.1038/s41598-020-59771-0). This underlines a need for more unified bioinformatics standards of (full-length) 16S amplicon data treatment, while similar benchmarks exist for short-read 16S metagenomics approaches, as well as for nanopore shotgun sequencing (e.g. Ye et al., Cell 2019, doi: 10.1016/j.cell.2019.07.010; Latorre-Pérez et al., Scientific Reports 2020, doi:10.1038/s41598-020-70491-3).

      By adding a thorough speed and memory usage summary (new Supplementary Figure 1b), in addition to our (mis)classification performance tests based on both mock and complex microbial community analyses, we provide the reader with a broad overview of existing options. While the widely used Kraken 2 and Centrifuge methods provide exceptional speed, we find that this comes with a noticeable tradeoff in taxonomic assignment accuracy. We reason that Minimap2 alignments provide a solid compromise between speed and classification performance, with the MAPseq software offering a viable alternative should memory usage limitation apply to users.

      We intend to extend this benchmarking process to future tools, and to update it on our GitHub page (https://github.com/d-j-k/puntseq). This page notably also hosts a range of easy-to-use scripts for employing downstream 16S analysis and visualization approaches, including ordination, clustering and alignment tests.

      The revised Discussion now emphasises the specific advancements of our study with respect to freshwater analysis and more general standardisation of nanopore 16S sequencing, also in contrast to previous amplicon nanopore sequencing approaches in which only one or two bioinformatics workflows were tested (page 16, lines 297-306).

      They also mention that nanopore sequencing is an "inexpensive, easily adaptable and scalable framework". The term "inexpensive" doesn't seem appropriate since it is relative. In addition, they should also discuss that although it is technically convenient in some aspects compared to other sequencers, there are still protocol steps that need certain reagents and equipment that are similar or identical to those needed for other sequencing platforms. Common bottlenecks such as DNA extraction methods, sample preservation and the presence of inhibitory compounds should be mentioned.

      We agree with the reviewers that “inexpensive” is indeed a relative term, which needs further clarification. We therefore now state that this approach is “cost-effective” and discuss future developments such as the 96-sample barcoding kits and Flongle flow cells for small-scale water diagnostics applications, which will arguably render lower per-sample analysis costs in the future (page 18, lines 361-365).

      Other investigators (e.g. Boykin et al., Genes 2019, doi:10.3390/genes10090632; Acharya et al., Water Technology 2020, doi:10.1016/j.watres.2020.116112) have recently shown that the full application of DNA extraction and in-field nanopore sequencing can be achieved at comparably low expense: Boykin et al. studied cassava plant pathogens using barcoded nanopore shotgun sequencing, and estimated costs of ~45 USD per sample, while we calculate ~100 USD per sample in this study. Acharya et al. undertook in situ water monitoring between Birtley, UK and Addis Ababa, Ethiopia, estimated ~75-150 USD per sample and purchased all necessary equipment for ~10,000 GBP – again, we think that this lies roughly within a similar range as our (local) study's total cost of ~3,670 GBP (Supplementary Table 6).

      The revised manuscript now mentions the possibility of increasing sequencing yield by improving DNA extraction methods, by taking sample storage and potential inhibitory compounds into account in the planning phase (page 18, lines 348-352).

      Minor points:

      -Please include a reference to the statement saying that the river Cam is notorious for the "infections such as leptospirosis".

      There are indeed several media reports that link leptospirosis risk to the local River Cam (e.g. https://www.cambridge-news.co.uk/news/cambridge-news/weils-disease-river-cam-leptosirosis-14919008 or https://www.bbc.com/news/uk-england-cambridgeshire-29060018). As we, however, did not find a scientific source for this information, we have slightly adjusted the statement in our manuscript from referring to Cambridge to instead referring to the entire United Kingdom. Accordingly, we now cite two reports from Public Health England (PHE) about serial leptospirosis prevalence in the United Kingdom (page 13, lines 226-227).

      -Please check figure 7 for consistency across panels, such as shading in violet and labels on the figures that do not seem to correspond with what is stated in the legend. Please mention what the numbers correspond to in outer ring. Check legend, where it says genes is probably genus.

      Thank you for pointing this out. We have revised (now labelled) Figure 8 and removed all inconsistencies between the panels. The legend has also been updated, which now includes a description of the number labelling of the tree, and a clearer differentiation between the colour coding of the tree nodes and the background highlighting of individual nanopore reads.

      -Page 6. There is a "data not shown" comment in the text: "Benchmarking of the classification tools on one aquatic sample further confirmed Minimap2's reliable performance in a complex bacterial community, although other tools such as SPINGO (Allard, Ryan, Jeffery, & Claesson, 2015), MAPseq (Matias Rodrigues, Schmidt, Tackmann, & von Mering, 2017), or IDTAXA (Murali et al., 2018) also produced highly concordant results despite variations in speed and memory usage (data not shown)." There appears to be no good reason that this data is not shown. In case the speed and memory usage were not recorded, it is advisable to rerun the analysis and quantify these variables, rather than mentioning them and not reporting them. Otherwise, please provide an explanation for not showing the data.

      This is a valid point, and we agree with the reviewers that it is worth properly following up on this initial observation. To this end, our revised manuscript now entails a systematic characterisation of the twelve tools' runtime and memory usage performance. This has been added as Supplementary Figure 1b and under the new Materials and Methods section 2.2.4 (page 26, lines 556-562), while the corresponding results and their implications are discussed on page 16, lines 301-306. Particularly with respect to the runtime measurements, it is worth noting that these can differ by several orders of magnitude between the classifiers, thus providing an additional clarification on our choice of the - relatively fast - Minimap2 alignments.

      -In Figure 4, it would be important to calculate if the family PCA component contribution differences in time are differentially significant. In Panel B, depicted is the most evident variance difference but what about other taxa which might not be very abundant but differ in time? One can use the fitFeatureModel function from the metagenomeSeq R library and a P-adjusted threshold value of 0.05, to validate abundance differences in addition to your analysis.

      To assess if the PC component contribution of Figure 5 (previously Figure 4) significantly differed between the three time points, we have applied non-parametric tests to all season-grouped samples except for the mock community controls. We first applied Kruskal-Wallis H-test for independent samples, followed by post-hoc comparisons using two-sided Mann-Whitney U rank tests.

      The Kruskal-Wallis test established a significant difference in PC component contributions between the three time points (p = 0.0049), with most of the difference stemming from divergence between April and August samples according to the post-hoc tests (p = 0.0022). The June samples seemed to be more similar to the August ones (p = 0.66) than to the ones from April (p = 0.11), recapitulating the results of our hierarchical clustering analysis (Figure 4a).
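
      For readers unfamiliar with the omnibus test, the following is a minimal standard-library sketch of the Kruskal-Wallis H statistic for exactly three groups. It is not the code used for the reported p-values: the function names and the example values are made up, the p-value uses the asymptotic chi-square approximation (which for two degrees of freedom reduces to exp(-H/2)), and the post-hoc Mann-Whitney U tests are not included.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three independent samples, with tie correction."""
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    p = math.exp(-h / 2)  # chi-square survival function, 2 degrees of freedom
    return h, p


# Made-up contributions for three sampling months (illustration only).
april = [1.0, 2.0, 3.0]
june = [4.0, 5.0, 6.0]
august = [7.0, 8.0, 9.0]
h_stat, p_value = kruskal_wallis_three_groups(april, june, august)
print(h_stat, p_value)
```

      A significant omnibus result only says that at least one group differs; pairwise follow-up tests, as described above, are needed to locate the difference.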

      We have followed the reviewers' advice and applied a complementary approach, using the fitFeatureModel of metagenomeSeq to fit a zero-inflated log-normal mixture model of each bacterial taxon against the time points. As only three independent variables can be accounted for by the model (including the intercept), we have chosen to investigate the difference between the spring (April) and summer (June, August) months to capture the previously identified difference between these months. At a nominal P-value threshold of 0.05, this analysis identifies seven families that differ significantly in their relative composition between spring and summer, namely Cyanobiaceae, Armatimonadaceae, Listeriaceae, Carnobacteriaceae, Azospirillaceae, Cryomorphaceae, and Microbacteriaceae. Three out of these seven families were also detected by the PCA component analysis (Carnobacteriaceae, Azospirillaceae, Microbacteriaceae) and two more (Listeriaceae, Armatimonadaceae) occurred in the top 15 % of that analysis (out of 357 families).

      This approach represents a useful validation of our principal component analysis' capture of likely seasonal divergence, but moreover allows for a direct assessment of differential bacterial composition across time points. We have therefore integrated the analysis into our manuscript (page 10, lines 184-186; Materials and Methods section 2.6, page 29, lines 641-647) – thank you again for this suggestion.

      -Page 12-13. In the paragraph: "Using multiple sequence alignments between nanopore reads and pathogenic species references, we further resolved the phylogenies of three common potentially pathogenic genera occurring in our river samples, Legionella, Salmonella and Pseudomonas (Figure 7a-c; Material and Methods). While Legionella and Salmonella diversities presented negligible levels of known harmful species, a cluster of reads in downstream sections indicated a low abundance of the opportunistic, environmental pathogen Pseudomonas aeruginosa (Figure 7c). We also found significant variations in relative abundances of the Leptospira genus, which was recently described to be enriched in wastewater effluents in Germany (Numberger et al., 2019) (Figure 7d)."

      Here it is important to mention the relative abundance in the sample. While no further experiments are needed, the authors should mention and discuss that the presence of DNA from pathogens in the sample has to be confirmed by other microbiology methodologies, to validate if there are viable organisms. Definitively, it is a big warning finding pathogen's DNA but also, since it is characterized only at genus level, further investigation using whole metagenome shotgun sequencing or isolation, would be important.

      We agree that further microbiological assays, particularly target-specific species isolation and culturing, would be essential to validate the presence of living pathogenic bacteria. Accordingly, our revised Discussion now contains a paragraph that encourages such experiments as part of the design of future studies (with a fully-equipped laboratory infrastructure); page 17, lines 338-341.

      -Page 15: "This might help to establish this family as an indicator for bacterial community shifts along with water temperature fluctuations."

      Temperature might not be the main factor for the shift. There could be other factors that were not measured that could contribute to this shift. There are several parameters that are not measured and are related to water quality (COD, organic matter, PO4, etc).

      We agree that this was a simplified statement, given our currently limited number of samples, and have therefore slightly expanded on this point (page 17, lines 323-325). It is indeed possible that differential Carnobacteriaceae abundances between the time point measurements may have arisen not as a consequence of temperature fluctuations (alone), but instead as a consequence of the observed hydrochemical changes, e.g. Ca2+, Mg2+, HCO3- (Figure 6b-c), or possibly even water flow speed reductions (Supplementary Figure 6d).

      -"A number of experimental intricacies should be addressed towards future nanopore freshwater sequencing studies with our approach, mostly by scrutinising water DNA extraction yields, PCR biases and molar imbalances in barcode multiplexing (Figure 3a; Supplementary Figure 5)."

      Here you could elaborate more on the challenges, as mentioned previously.

      We realise that we had not discussed the challenges in enough detail, and the Discussion now contains a substantially more detailed description of these intricacies (page 18, lines 343-359).

    1. Author Response

      Summary: The need to easily measure spontaneous behaviors in a robust fashion in experimental animals is an important problem in behavioral neuroscience. Thus, while this study is timely, the reviewers found fundamental flaws that substantially dampen enthusiasm for this work. The collective major concerns are: 1) the advance provided by this system, relative to already existing and commercially available software based on similar principles, was not clear, 2) critical technical details describing this system are missing 3) the diverse biological applications were not explored with sufficient depth and many of the related claims had potential alternative explanations.

      Authors' response:

      1) The objective of our study is not to easily measure behaviour. It is to detect and measure behavioural components of interest to different fields of research (eg pain, fear/anxiety, locomotion) that it has not been possible to detect and record before, because they are out of reach of existing systems. For example, no existing system has been reported to detect shaking/shivering in the freely moving rat or mouse, which we demonstrate here to be associated with ongoing pain or fear. This approach is an innovative response to long-standing criticisms in the literature that the standard measures of pain, as a reaction to an acute nociceptive stimulus (cf von Frey filaments or tail flick), may be inappropriate to reflect chronic spontaneous pain, and that the standard measure of fear, as the paralysing response (freezing) to an imminent threat, may be inappropriate to reflect different fearful situations. Similarly, no existing system has been described that can measure the dynamics of momentum in locomotion, which we demonstrate here to be altered in pathological conditions affecting gait. Unless the reviewers can cite any, we must therefore protest against point (1), which we deem unfounded.

      2) Regarding missing critical details describing the system, we need to clarify that (i) the device is commercially available from the newly created Roddata company, (ii) the antivibration system we describe is commercially available from different manufacturers (eg CleanBench Laboratory tables from TMC, duly cited in the manuscript), and (iii) it was agreed by the editor upon submission that the data and analysis code would be made publicly available once the paper is accepted for publication.

      3) Finally, regarding potential alternative explanations for our claims, these could be easily resolved by a few additional control experiments to be provided in a standard revision process.

      For more detailed explanations, please consider our specific point-by-point responses to the reviewers' concerns.

      Reviewer #1:

      The manuscript by Carreño-Muñoz seeks to tackle an important problem in behavioral neuroscience, that is classifying behavior at fine resolution during free exploration in rodents. Though the goals of this study are lofty, this platform, in my opinion, isn't a substantive step forward in relation to other tools currently available.

      Major concerns:

      1) What is presented in this work is a piezoelectric based sensor to detect rodent movements. My main criticism with this work is that the behaviors were coded by hand. If the authors had developed a way to automatically measure spontaneous behaviors of interest, or even train a machine to detect behavioral signatures after some human input, this system would have broader appeal. As is, the experimenter uses standard whole animal tracking with Ethovision, then observes what the animal is doing by hand, then quantitation is added to certain movements. This, I believe, is not a major advance, as current weight bearing devices already have this capacity.

      Authors' response: We would like to apologize if the description of our results was unclear to the reviewer and resulted in factual mistakes in their evaluation. Exactly as suggested by the reviewer, the behaviours quantified in figures 3 to 5 (pain, fear, locomotion) were detected automatically, after some human input, using MATLAB code based on frequency decomposition of the piezo signal. Besides, we are not aware of any current weight-bearing device, such as claimed by the reviewer (unfortunately without reference to any specific device), that has been demonstrated to detect diverse expressions of shaking (here demonstrated to reflect pain or fear), or the time dynamics of momentum in gait/locomotion.

      2) For the breathing and heartbeat studies in figure 2, I am not convinced that this approach is more beneficial than the standard EEG approaches.

      Authors' response: I believe the reviewer has confused EEG (electroencephalogram) with EMG (electromyogram) here, because standard EEG approaches would not be the most appropriate way to detect breathing and heartbeat. As regards EMG, the main benefit of our approach is that it is non-invasive, meaning that it does not require fixing or implanting any electrode in the body of the animal. This makes quite a difference, in particular with small animals such as mice, which are likely to be perturbed by living with EMG electrodes implanted in their chest.

      3) Figure 3 is poorly developed and the biology is very questionable. "Shaking" after surgery as a read-out of pain is not a measurement currently used or seen in the pain field. Although the authors report that this measurement is reduced with BPN, there are other trivial or pure coincidental explanations for this unusual finding. This reviewer tends to believe that the anesthesia or some other surgical by-product, not with pain as a driver, is contributing to this phenotype. I don't believe the authors have discovered a new post-op pain behavior. If so, substantial data needs to be added to be convincing.

      Authors' response: It is precisely because shaking is not a measurement currently used or seen in the pain field that our device is interesting. Post-op pain is obviously not a novelty. Only its detection is... here by our device. As additional evidence (ie in addition to the pharmacological argument) that shaking is indeed related to pain, we can provide data recorded upon recovery from anesthesia in the absence of any surgery, in which no shaking is detected (therefore ruling out any by-product of anesthesia).

      Reviewer #2:

      General assessment of the work:

      The authors present the Phenotypix, a device that uses piezoelectric pressure-sensors, in combination with video recording and signal analysis, to observe physiological states within a subject mouse. Using computational approaches, they show that this device can detect locomotion, and even sub-components of locomotion such as grooming. Similarly, they show the device can detect heart rate and breathing rate in both anesthetized and awake (but immobile) subjects. Next, in a series of proof-of-concept experiments they show that differences in pain, fear, and gait responses can be detected between control and experimental subjects.

      Numbered summary of substantive concerns:

      1) The anti-vibrational setup that the system is located on appears to be critical to successful use of the system. Please provide some parametric data showing how different degrees of dampening influence system performance. This will be critical for replication of results in different labs.

      Authors' response: Detailed parametric information on the degree of dampening that successfully allows the reproduction of our data is directly available on the websites of the manufacturers of the commercially available anti-vibration systems used in our study (CleanBench Laboratory tables from TMC, duly cited in the manuscript). This is actually very standard laboratory equipment for applications requiring dampening of ambient vibrations (for alternative providers/manufacturers, cf Thorlabs, Newport...).

      2) How does the device account for changes in the environment, such as bedding moving around or the animal defecating/urinating? Is this system compatible with behavioral enrichment like cotton bedding, etc?

      Authors' response: We have not investigated the effect of adding bedding or cotton bedding on the performance of behavioural detection/quantification, but this would be easy to evaluate and report in a revision process. On the other hand, we can state that the device as used here is fine for recording sessions of a few hours (as reported in our manuscript), which is already more than most open-field recordings of mouse/rat activity in the literature.

      3) Is it possible to track multiple subjects in a single chamber? This seems like it should be feasible with the inclusion of video data in the analysis.

      Authors' response: We believe it is not possible to track the parameters we report (eg shaking in pain or fear, breathing, heart-beat, time dynamics of momentum during locomotion...) from multiple subjects in a single chamber of the presented design. But this limitation is not specific to our device, and many open-field behavioural recordings or cognitive testing procedures in the literature are limited to one animal at a time. As stated in the manuscript, these parameters are for now out of reach of video data and analysis.

      4) It appears that only locomotion related data can be reliably recorded while the subjects are moving, and that features such as heart rate and respiration rate are limited to immobile states. Is this correct? If so, a discussion of potential ways to overcome this confound would be welcomed.

      Authors' response: Indeed, there is a factor of at least 10 between the magnitude of signal generated by locomotion or grooming compared to heart beat and breathing, so that the behaviours associated with the smallest signals were investigated only in the absence of behaviours associated with larger signals (ie during immobility, to the exclusion of grooming or walking). This is a limitation clearly specified in the text, but not a confound.

      5) The lack of publicly available code and data is not compatible with the mission of supporting the open science environment. It has also made evaluating the technical merit of the work in this manuscript difficult.

      Authors' response: We did include data and code availability statements in the manuscript, and declared, with the prior agreement of the eLife editor, that the code and data would be made publicly available upon publication (but not before, to preserve confidentiality and prevent potential use of our data and analysis code by others before the manuscript is accepted for publication).

      Reviewer #3:

      Carreño-Muñoz et al. describe a piezoelectric sensor-based approach to quantify rodent behavior. Piezoelectric sensors convert pressure, acceleration, strain, and even temperature and sound into an electrical charge. They are exquisitely sensitive and have a wide range of functionalities. The paper describes an open field arena that sits on top of three sensors on an air table that is able to detect animal movement. The authors use several behavioral paradigms and genetic models to validate their system. Overall, the piezo and pressure/force/vibration based systems have been well established for rodent behavior. Some examples of commercial systems are the Laboras (Metris BV) and PiezoSleep (Signal Solutions), along with many papers that describe similar systems. The advantage of the system described in this paper (Phenotypix) is that it encompasses a large open field which allows the mouse to carry out naturalistic behavior. It also sits on top of an air table which allows more sensitive measurements. Although the system described has some advantages, the manuscript does not describe a system that leads to a significant enough advance. The manuscript does not offer a thorough solution for any one problem in biology and does not make a convincing case for adoption of this platform. The figures and experimental description are also lacking, leading to unclear interpretation of data.

      One of the major issues with this paper is that it does not adequately describe the Phenotypix platform to allow for replication. This may be fine if the platform is commercially available, which seems to be the goal, but when I searched for the "Phenotypix, Roddata", I did not find a commercial supplier. Thus, it is unclear how this data can be replicated. Another major issue is that it is never clear if behavior state determination is based on mechanoelectrical signal, video data, or both. Ideally, one would use the video data to train classifiers that only use the mechanoelectrical data. However, it is not clear that this was done in most of the experiments. Without the hardware specifications and classifiers for the behaviors, replicability is an issue. The fact that the apparatus needs to be placed on a 250kg air table brings its practical utility and scalability into question. Systems such as Laboras can be obtained with readily available classifiers for numerous behaviors (https://www.metris.nl/en/products/laboras/laboras_specs/) and allow for long term monitoring in home cage environment, which calls into question the claim of "A novel device for behavioural phenotyping of freely moving laboratory animals (rats and mice) now allows to detect behavioural components out of reach of existing systems."

      Authors' response:

      1) The Phenotypix device is commercially available from the Roddata company. The website is still under construction but will be released on the web before the publication of the manuscript.

      2) In line with a methodological study, the determination of behaviour state from video and/or piezo signal is clearly described in the extensive methods section of the manuscript:

      -"Grooming amplitude was quantified on manually selected periods as the peak-to-trough amplitude of each body movement-related signal deflection." Here, behaviour state (ie periods of expression of specific grooming types) was determined manually and then quantified automatically (as the peak-to-trough amplitude) using EM-signal analysis with MATLAB scripts.

      -"Automatic detection of shaking events was performed as threshold crossing on the bandpass filtered (10-45Hz for pain, 65-130Hz for fear), squared and normalized signal." Hence, both detection and quantification were fully automatic, using EM signal time-frequency decomposition with MATLAB scripts.

      -"Automatic detection of freezing events was performed as threshold crossing on the 5-130Hz bandpass filtered, squared and normalized signal." Here also, both detection and quantification were fully automatic, using EM signal time-frequency decomposition with MATLAB scripts.

      -"Running periods were selected based on the animal velocity, calculated from the XY coordinates obtained through offline automatic animal tracking with Ethovision XT software (Noldus). Periods of locomotion were periods during which the animal was moving between 13 and 30cm/s without interruption and reaching at least 20cm/s. Individual footsteps were identified as consecutive suprathreshold peak-trough-peak sequences from the EM signal, bandpass filtered at various frequencies using zero-phase distortion filters (i.e. filtering in the forward and backward direction to prevent phase-distortion). Peaks and troughs were detected as local extrema in the 0-300Hz passband filtered EM-signal, within 50ms of either the minima detected from the 0-50Hz passband filtered signal (approximate troughs) or of the maxima detected from the 0-20Hz passband filtered EM-signal (approximate peaks), respectively. Bandpass filtered 0-5Hz signal was taken as baseline, and only local minima (troughs) of amplitude larger than 1SD from baseline were selected for further footstep analysis. The amplitude of footsteps was measured as the difference between the trough and the mean of its pre- and post-peaks. The half-width was measured as the width at half amplitude." Hence, instantaneous animal position was processed automatically from the video signal using Ethovision software, and then both detection and quantification of locomotion periods and footstep dynamics were fully automatic, using EM signal decomposition with MATLAB scripts.

      -"Locomotion and gait were also analyzed at the more global level of footstep dynamics (Figure 5DF) by comparing the envelopes of locomotion-related EM signal across conditions." Here also, instantaneous animal position was processed automatically from the video signal using Ethovision software, and then both detection and quantification of locomotion periods and footstep dynamics were fully automatic, using EM signal decomposition with MATLAB scripts.
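
      The automatic shaking and freezing detection quoted above (threshold crossing on the bandpass filtered, squared and normalized signal) can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB code: the first-order forward-backward filter, the z-score normalization, the 1 SD threshold, the 50 ms merging gap, all function names and the synthetic test signal are assumptions made here for illustration. Only the 10-45Hz band and the filter/square/normalize/threshold sequence come from the quoted methods.

```python
import math


def lowpass(x, cutoff_hz, fs):
    """Zero-phase first-order low-pass: one forward pass, one backward pass."""
    alpha = 1.0 / (1.0 + fs / (2.0 * math.pi * cutoff_hz))

    def one_pass(sig):
        out, y = [], sig[0]
        for v in sig:
            y += alpha * (v - y)
            out.append(y)
        return out

    return one_pass(one_pass(x)[::-1])[::-1]


def bandpass(x, low_hz, high_hz, fs):
    """Remove slow components (below low_hz), then fast ones (above high_hz)."""
    slow = lowpass(x, low_hz, fs)
    return lowpass([v - s for v, s in zip(x, slow)], high_hz, fs)


def detect_events(x, fs, band, threshold_sd=1.0, min_gap_s=0.05):
    """Return (onset_s, offset_s) pairs where normalized band power exceeds threshold."""
    power = [v * v for v in bandpass(x, band[0], band[1], fs)]
    mean = sum(power) / len(power)
    sd = math.sqrt(sum((p - mean) ** 2 for p in power) / len(power))
    events = []
    for i, p in enumerate(power):
        if (p - mean) / sd <= threshold_sd:
            continue
        t = i / fs
        if events and t - events[-1][1] <= min_gap_s:
            events[-1][1] = t  # extend the current event across short gaps
        else:
            events.append([t, t])
    return [tuple(e) for e in events]


# Synthetic example (hypothetical data): a slow 2 Hz baseline with a
# 25 Hz "shaking" burst between 1.0 s and 1.5 s.
fs = 1000
signal = []
for n in range(3 * fs):
    t = n / fs
    burst = math.sin(2 * math.pi * 25 * t) if 1.0 <= t < 1.5 else 0.0
    signal.append(0.2 * math.sin(2 * math.pi * 2 * t) + burst)

events = detect_events(signal, fs, band=(10, 45))
```

      For the fear-related shaking mentioned in the same passage, the band would be (65, 130) instead. A first-order filter is used here only to keep the sketch dependency-free; a sharper design would normally be preferable for separating neighbouring bands.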

      3) Air tables of 250kg or more are very standard equipment for applications requiring dampening of ambient vibrations. As with many other behavioural-study apparatus, scalability (ie the possibility of cheap recordings from many animals at the same time) is not our aim here. We instead describe the advantages in terms of sensitivity giving access to freely moving behavioural components out of reach of available systems such as heart-beat, breathing, shaking related to pain or fear, and the time dynamics of momentum associated with individual footsteps. A number of devices are available for behavioural phenotyping, including the Laboras system (duly cited in our paper), but contrary to the reviewer's statement, none of those provides the detection/quantification of these behavioural components, hence justifying our title "A novel device for behavioural phenotyping of freely moving laboratory animals (rats and mice) now allows to detect behavioural components out of reach of existing systems".

      One issue that is not addressed for the various behaviors - how does body weight affect the spectral properties of behaviors. How can we compare the same behavior between two animals of differing sizes? Since this is a pressure sensor, this is important.

      Authors' response: We have recorded adult animals within a normal range of weight (15-40g for a mouse). We have not investigated precisely how much body weight affects the sensitivity and reliability of our behavioural measures, but the results were not qualitatively different. A complementary investigation, with a systematic comparison of results depending on animal weight, is already planned (potentially within a regular revision process) and will provide a quantitative assessment.

    1. Author Response

      We thank the reviewers for their thoughtful and constructive comments. We have updated the manuscript to take their suggestions and concerns into account and uploaded a new version to bioRxiv. Detailed replies to the comments can be found below.

      Summary: The work detailed here explores a model of recurrent cortical networks and shows that homeostatic synaptic plasticity must be present in connections between both excitatory (E) to inhibitory (I) neurons and vice versa to produce the known E/I assemblies found in the cortex. There are some interesting findings about the consequences of assemblies formed in this way: there are stronger synapses between neurons that respond to similar stimuli; excitatory neurons show feature-specific suppression after plasticity; and the inhibitory network does not just provide a general untuned inhibitory signal, but instead sculpts excitatory processing. A major claim in the manuscript that argues for the broad impact of the work is that this is one of only a handful of papers to show how a local approximation rule can instantiate feedback (akin to the back-propagation of error used to train neural networks in machine learning) in a biologically plausible way.

      Reviewer #1:

      The manuscript investigates the situations in which stimulus-specific assemblies can emerge in a recurrent network of excitatory (E) and inhibitory (I, presumed parvalbumin-positive) neurons. The authors combine 1) Hebbian plasticity of I->E synapses that is proportional to the difference between the E neuron's firing rate and a homeostatic target and 2) plasticity of E->I synapses that is proportional to the difference between the total excitatory input to the I neuron and a homeostatic target. These are sufficient to produce E/I assemblies in a network in which only the excitatory recurrence exhibits tuning at the initial condition. While the full implementation of the plasticity rules, derived from gradient descent on an objective function, would rely on nonlocal weight information, local approximations of the rules still lead to the desired results.

      Overall the results make sense and represent a new unsupervised method for generating cell assemblies consisting of both excitatory and inhibitory neurons. Major concerns are that the proposed rule ends up predicting a rather nonstandard form of plasticity for certain synapses, and that the results could be fleshed out more. Also, the strong novelty claimed could be softened or contextualized better, given that other recent papers have shown how to achieve something like backprop in recurrent neural networks (e.g. Murray eLife 2019).

      Comments:

      1) The main text would benefit from greater exposition of the plasticity rule and the distinction between the full expression and the approximation. While the general idea of backpropagation may be familiar to a good number of readers, here it is being used in a nonstandard way (to implement homeostasis), and this should be described more fully, with a few key equations.

      Additionally, the point that, for a recurrent network, the proposed rules are only related to gradient descent under the assumption that the network adiabatically follows the stimulus, seems important enough to state in the main text.

      Thanks, that's a good point. We modified the relevant portion of the main text as follows (l. 88):

      “[…] To that end, we derive synaptic plasticity rules for excitatory input and inhibitory output connections of PV interneurons that are homeostatic for the excitatory population (see Materials & Methods). A stimulus-specific homeostatic control can be seen as a "trivial" supervised learning task, in which the objective is that all pyramidal neurons should learn to fire at a given target rate ρ0 for all stimuli. Hence, a gradient-based optimisation would effectively require a backpropagation of error [Rumelhart et al., 1985] through time [BPTT; Werbos, 1990].

      Because backpropagation rules rely on non-local information that might not be available to the respective synapses, their biological plausibility is currently debated [Lillicrap et al., 2020, Sacramento et al., 2018, Guerguiev et al., 2017, Whittington and Bogacz, 2019, Bellec et al., 2020]. However, a local approximation of the full BPTT update can be obtained under the following assumptions: First, we assume that the sensory input to the network changes on a time scale that is slower than the intrinsic time scales in the network. This eliminates the necessity of backpropagating information through time, albeit still through the synapses in the network. This assumption results in what we call the “gradient-based” rules (Eq. 15 in the Supplementary Materials), which are spatially non-local. Second, we assume that synaptic interactions in the network are sufficiently weak that higher-order synaptic interactions can be neglected. Third and finally, we assume that over the course of learning, the Pyr->PV connections and the PV->Pyr connections become positively correlated [Znamenskiy et al., 2018], such that we can replace PV->Pyr synapses by the reciprocal Pyr->PV synapse in the Pyr->PV learning rule, without rotating the update too far from the true gradient (see Supplementary Materials).”

      We also added the learning rules to the main text (l. 108).
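
      As a rough illustration of the two local rules as summarized by Reviewer #1 above (PV->Pyr plasticity driven by the deviation of the Pyr rate from a target, Pyr->PV plasticity driven by the deviation of the total excitatory input to the PV neuron from a target), one update step could be sketched as below. This is not the expression from the manuscript, which is not reproduced in this response: the sign conventions, the learning rate `eta`, the parameter `input_target`, the function name and the omission of any weight constraints are assumptions made purely for illustration.

```python
def local_plasticity_step(r_pyr, r_pv, w_pv_from_pyr, rho0, input_target, eta=0.1):
    """Return weight changes for one step of the two (assumed) local rules.

    r_pyr, r_pv    : lists of firing rates of Pyr and PV neurons
    w_pv_from_pyr  : w_pv_from_pyr[i][j] is the Pyr j -> PV i weight
    rho0           : homeostatic target rate of the Pyr neurons
    input_target   : assumed target for the total excitatory input to a PV neuron
    """
    # PV -> Pyr: presynaptic PV rate times postsynaptic deviation from target,
    # so inhibition onto a Pyr neuron grows when it fires above rho0.
    d_pyr_from_pv = [[eta * r_i * (r_k - rho0) for r_i in r_pv] for r_k in r_pyr]

    # Pyr -> PV: presynaptic Pyr rate times the deviation of the PV neuron's
    # total excitatory input from its target (sign is an assumption here).
    d_pv_from_pyr = []
    for w_row in w_pv_from_pyr:
        exc_input = sum(w * r for w, r in zip(w_row, r_pyr))
        d_pv_from_pyr.append([eta * r_j * (exc_input - input_target) for r_j in r_pyr])
    return d_pyr_from_pv, d_pv_from_pyr


# Two Pyr neurons (one above, one below target) and one PV neuron.
d_ie, d_ei = local_plasticity_step(
    r_pyr=[2.0, 0.5], r_pv=[1.0], w_pv_from_pyr=[[0.5, 0.5]],
    rho0=1.0, input_target=1.0,
)
```

      In this toy example, inhibition onto the Pyr neuron firing above target increases while inhibition onto the one below target decreases, and both Pyr->PV weights grow because the PV neuron's excitatory input (1.25) exceeds the assumed target (1.0). Both updates use only quantities available at the synapse or the postsynaptic cell, which is the sense in which the rules are local.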

      2) The paper has a clear and simple message, but not much exploration of that message or elaboration on the results. Figures 2 and 3 do not convey much information, other than the fact that blocking either form of plasticity fails to produce the desired effects. This seems somewhat obvious -- almost by definition one can't have E/I assemblies if E->I or I->E connections are forced to remain random. This point deserves at most one figure, or maybe even just a few panels.

      We appreciate that the result that both forms of plasticity are necessary may feel somewhat obvious. However, it may not be as obvious as it appears, because the incoming synapses onto INs follow a long-tailed distribution, like many other synapse types. Randomly sampling from such a distribution could in principle generate sufficient stimulus selectivity to render learning in the E->I connections superfluous (see Litwin-Kumar et al., 2017). That’s why we made sure to initialize the E->I weights such that they show a variability similar to that in the data. We now comment on this aspect in the results section (l. 135):

      "Having shown that homeostatic plasticity acting on both input and output synapses of interneurons is sufficient to learn E/I assemblies, we now turn to the question of whether both are necessary. To this end, we perform "knock-out" experiments, in which we selectively block synaptic plasticity in either of the synapses. The motivation for these experiments is the observation that the incoming PV synapses follow a long-tailed distribution (Znamenskiy et al., 2018). This could provide a sufficient stimulus selectivity in the PV population for PV->Pyr plasticity alone to achieve a satisfactory E/I balance. A similar reasoning holds for static, but long-tailed outgoing PV synapses. This intuition is supported by the result of Litwin-Kumar et al. (2017) that in a population of neurons analogous to our interneurons, the dimensionality of responses in that population can be high for static input synapses, when those are log-normally distributed."

      Secondly, we tried to write a manuscript both for fellow modelers (how to self-organize an E/I assembly?) and for our experimental colleagues (what conclusions can we draw from the Znamenskiy data?). In electrophysiological studies, the plasticity of incoming and outgoing synapses of INs have both been studied independently. The insight that those two forms of plasticity should act in synergy is something that we wanted to emphasize, because it could be studied in parallel in paired recordings. Hence the two figures. Looks as if we got only modelers as reviewers ;). Along these lines, we added a short paragraph to the discussion (l. 348):

      “Both Pyr->PV and PV->Pyr plasticity have been studied in slice (for reviews, see, Kullmann et al. 2007, Vogels et al. 2013), but mostly in isolation. The idea that the two forms of plasticity should act in synergy suggests that it may be interesting to study both forms in the same system, e.g., in reciprocally connected Pyr-PV pairs.”

      3) The derived plasticity rule for E->I synapses, which requires modulation of I synapses based on a difference from a target value for the excitatory subcomponent of the input current, does not take a typical form for biologically plausible learning rules (which usually operate on firing rates or voltages, for example). The authors should explore and discuss in more depth this assumption. Is there experimental evidence for it? It seems like it might be a difficult quantity to signal to the synapse in order to guide plasticity. The authors note in the discussion that BCM-type rules fail here -- are there other approaches that would work? What about a more local form of plasticity that involves only the excitatory current local to a dendrite, for example?

      We agree that the rule we propose for E->I synapses warrants a more extensive discussion regarding its potential biological implementation. We have added the following paragraph to the manuscript (l. 295):

      “A cellular implementation of such a plasticity rule would require the following ingredients: i) a signal that reflects the cell-wide excitatory current, and ii) a mechanism that changes Pyr->PV synapses in response to variations in this signal. On PV interneurons, NMDA receptors are enriched in excitatory feedback relative to feedforward connections [LeRoux et al., 2013]. Intracellular sodium and calcium could hence be a proxy of recurrent excitatory input. In addition, the activation of NMDA receptors has been shown to track intracellular sodium concentration [Yu and Salter, 1998] which at least partially reflects glutamatergic synaptic currents. Due to a lack of spines in PV dendrites, both postsynaptic sodium and calcium are expected to diffuse more broadly in the dendritic arbor [Hu et al., 2014, Kullmann and Lamsa, 2007], and thus might provide a signal for overall dendritic excitatory currents. Depending on how the excitatory inputs are distributed on PV interneuron dendrites [Larkum and Nevian, 2008, Jia et al., 2010, Grienberger et al., 2015], this integration does not need to be cell-wide, but could be local, e.g., to a dendrite, if the local excitatory input is a proxy for the global input.

      NMDA receptors at IN excitatory input synapses can mediate Hebbian long-term plasticity [Kullmann and Lamsa, 2007], and blocking excitatory currents can abolish plasticity in those synapses [LeRoux et al., 2013]. Furthermore, NMDAR-dependent plasticity is expressed post-synaptically, and seems to require presynaptic activation [Kullmann and Lamsa, 2007]. Other molecular signals that reflect excitatory activity have been implicated in the homeostatic regulation of synapses onto INs, including Narp and BDNF [Chang et al., 2010, Rutherford et al., 1998, Lamsa et al., 2007]. In summary, we conjecture that PV interneurons and their excitatory inputs have the necessary prerequisites to implement the suggested local Pyr->PV plasticity rule.”

      Concerning other potential types of plasticity, we certainly do not expect that the suggested pair of rules is the only one that will work. We have added the following paragraph to the discussion (l. 322):

      “We expect that the rules we suggest here are only one set of many that can establish E/I assemblies. Given that the role of the input plasticity in the interneurons is the formation of a stimulus specificity, it is tempting to assume that this could equally well be achieved by classical forms of plasticity like the Bienenstock-Cooper-Munro (BCM) rule [Bienenstock et al., 1982], which is commonly used in models of receptive field formation. However, in our hands, the combination of BCM plasticity in Pyr->PV synapses with homeostatic inhibitory plasticity in the PV->Pyr synapses showed complex dynamics, an analysis of which is beyond the scope of this article. In particular, this combination of rules often did not converge to a steady state, probably for the following reason. BCM rules tend to [...].

      We suspect that this instability can also arise for other Hebbian forms of plasticity in interneuron input synapses when they are combined with homeostatic inhibitory plasticity [Vogels et al. 2011] in their output synapses. The underlying reason is that for convergence, the two forms of plasticity need to work synergistically towards the same goal, i.e., the same steady state. For two arbitrary synaptic plasticity rules acting in different sets of synapses, it is likely that they aim for two different overall network configurations. Such competition can easily result in latching dynamics with a continuing turn-over of transiently stable states, in which the form of plasticity that acts more quickly gets to reach its goal transiently, only to be undermined by the other one later [Clopath et al. 2016].”

      4) Does the initial structure in excitatory recurrence play a role, or is it just there to match the data?

      For the results of Fig 4, the structure of excitatory recurrence is essential, because similarly tuned Pyr neurons should excite each other (absent the E-I assemblies). Without that structure in the Pyr->Pyr connections, the “paradoxical” inhibitory effect we report would not be paradoxical at all. For the results of Fig 1-3 the excitatory recurrence plays a role only insofar as it permits and reinforces stimulus selectivity in pyramidal neurons. If those synapses were unstructured (and strong), it could disrupt the Pyr selectivity, and there would be nothing to guide the formation of E/I assemblies. We have added the following sentence to the beginning of the results section (l. 77):

      “[...] Note that the Pyr->Pyr connections only play a decisive role for the results in Fig. 4, but are present in all simulations for consistency. [...]”

      Reviewer #2:

      In this work, the authors simulated a rate-based recurrent network with 512 excitatory and 64 inhibitory neurons. The authors use this model to investigate which forms of synaptic plasticity are needed to reproduce the stimulus-specific interactions observed between pyramidal neurons and parvalbumin-expressing (PV) interneurons in mouse V1. When there is homeostatic synaptic plasticity from both excitatory to inhibitory and reciprocally from inhibitory to excitatory neurons in the simulated networks, they showed that the emergent E/I assemblies are qualitatively similar to those observed in mouse V1, e.g. there are stronger synapses for neurons responding to similar stimuli. They also identified that synaptic plasticity must be present in both directions (from pyramidal neurons to PV neurons and vice versa) to produce such E/I assemblies. Furthermore, they identified that these E/I assemblies enable the excitatory population in their simulations to show feature-specific suppression. Therefore, the authors claimed that they found evidence that these inhibitory circuits do not provide a "blanket of inhibition", but rather a specific, activity-dependent sculpting of the excitatory response. They also claim that the learning rule they developed in this model shows for the first time how a local approximation rule can instantiate feedback alignment in their network, which is a method for achieving an approximation to a backpropagation-like learning rule in realistic neural networks.

      We thank you for your thorough evaluation of the role of feedback alignment (FA) in our model. While we will attempt to address the comments point-by-point below, we feel that we may have misled this reviewer regarding the focus of the article. The core novelty of this work lies in elucidating potential mechanisms of experimentally observed E/I neuronal assemblies in mouse V1, and furthermore in proposing plasticity rules that can achieve such E/I assemblies. That they do so via a mechanism akin to feedback alignment is mentioned relatively briefly in the manuscript, and is merely offered as a mechanistic explanation for how inhibitory currents are ultimately balanced with excitation. We are fully aware of the fact that the suggested rules are by no means a local approximation of the full BPTT problem in RNNs, but feel that the reviewer read our paper primarily as a contribution to this very interesting literature (which it isn't in our claim).

      Major points:

      1) The authors claim that their synaptic plastic rule implements a recurrent variant of feedback alignment. Namely, "When we compare the weight updates the approximate rules perform to the updates that would occur using the gradient rule, the weight updates of the local approximations align to those of the gradient rules over learning". They also claim that this is the first time feedback alignment is demonstrated in a recurrent network. It seems that the weight replacement in this synaptic plastic rule is uniquely motivated by E/I balance, but the feedback alignment in [Lillicrap et al., 2016] is much more general. Thus, the precise connections between feedback alignment and this work remain a bit unclear.

      We had hoped that our claims in the manuscript were phrased sufficiently carefully, and regret that the reviewer was led to believe that our goal was to provide a general solution to biological backprop in recurrent networks. Of course, the problem we are tackling is not the full backprop problem, and we do not expect that the approximation holds for general tasks. It clearly won't, given that it effectively relies on a truncation after two time steps and makes a stationarity assumption. Still, we felt that it would have been a lost opportunity not to discuss the relation to feedback alignment, because any approximation warrants a justification, and for the replacement of I->E weights by E->I weights, feedback alignment readily provides one. We now discuss the assumptions underlying the local approximation more extensively in the main paper (see reply to Reviewer 1, comment 1).

      We also added a discussion to the section in the supplementary material, where the local approximations are derived (l. 760):

      “Overall, the local approximation of the learning rule relies on three assumptions: Slowly varying inputs, weak synaptic weights and alignment of input and output synapses of the interneurons. These assumptions clearly limit the applicability of the learning rules for other learning tasks. In particular, the learning rules will not allow the network to learn temporal sequences.”

      It would be good if the following things about this major claim of the manuscript could be expanded and/or clarified:

      i) In Fig S3 (upper, right vs. left), it is surprising that the Pyr->PV knock-out seems to produce a better alignment in PV->Pyr. Comparing the upper right of Fig S3 and the bottom figure of Fig 1g, it seems that the Pyr->PV knock-out performs equally well with a local approximation for the output connections of PV interneurons. Is this a special condition in this model that results in the emergence of the overall feedback alignment?

      The 0-th order approximation of I->E plasticity is, by itself, relatively good at following the full gradient for those synapses (because I->E synapses have virtually unmediated control over Pyr neuron activity). When E->I plasticity is also present, we believe that the higher variance in angle to the gradient (for I->E updates) may be due to perturbations introduced by the E->I updates. Each update to one weight matrix changes the gradient for the other, but this is ultimately what brings them into alignment with one another. Because this is a very technical point, we prefer not to discuss this at length in the manuscript. The more important point is summarized in the two bottom figures, which demonstrate that the gradients on the E->I synapses only align within 90 degrees when both synapse types are plastic.
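
      The "angle to the gradient" discussed here can be made concrete with a minimal sketch. It flattens two weight-update matrices and computes the angle between them; an angle below 90 degrees means the local update has a positive projection onto the gradient update. The function name and the toy matrices are hypothetical and are not taken from the manuscript.

```python
import math

def update_angle_deg(local_update, gradient_update):
    """Angle in degrees between two weight updates given as nested lists."""
    a = [x for row in local_update for x in row]
    b = [x for row in gradient_update for x in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("angle is undefined for a zero update")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos))

# Hypothetical toy updates (not data from the model).
gradient = [[1.0, 0.0], [0.0, 1.0]]
local = [[1.0, 1.0], [0.0, 1.0]]
angle = update_angle_deg(local, gradient)
aligned = angle < 90.0  # True when the local update follows the gradient direction
```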

      ii) In the feedback alignment paper [Lillicrap et al., 2016], those authors introduce a "Random Feedback Weights Support"; this uses a random matrix, B, to replace the transpose of the backpropagation weight matrix. Here, the alignment seems to be based on the intuition that "The excitatory input connections onto the interneurons serve as a proxy for the transpose of the output connections," and "the task of balancing excitation by feedback inhibition favours symmetric connection." It seems synaptic plasticity here is mechanistically different; it is only similar to the feedback alignment [Lillicrap et al., 2016] because both reach a final balanced state. Please clarify how the results here are to be interpreted as an instantiation of feedback alignment - whether it is simply that the end state is similar, or if the mechanism is thought to be more deeply connected.

      We believe that the mechanisms are indeed more deeply connected, as supported by the fact that the gradients align early on during learning. We added an extended discussion to the supplementary material (l. 744):

      “In feedback alignment, the matrix that backpropagates the errors is replaced by a random matrix B. Here, we instead use the feedforward weights in the layer below. Similar to the extension to feedback alignment of Akrout et al. [2019], those weights are themselves plastic. However, we believe that the underlying mechanism of feedback alignment still holds. The representation in the hidden layer (the interneurons) changes as if the weights to the output layer (the Pyr neurons) were equal to the weights they are replaced with (here, the input weights to the PV neurons). To exploit this representation, the weights to the output layer then align to the replacement weights, justifying the replacement post-hoc (Fig. 1G).”

      iii) The feedback alignment [Lillicrap et al., 2016] works when the weight matrix has its entries near zero (e^TWBe>0). Are there any analogous conditions for the synaptic plastic rule to succeed?

      Yes, the condition is very similar. We have added a corresponding discussion to the supplementary material (l. 753):

      “Note that the condition for feedback alignment to provide an update in the appropriate direction (e^T B^T W e > 0, where e denotes the error, W the weights in the second layer, and B the random feedback matrix) reduces to the condition that W_{ei} W_{ie} is positive definite (assuming the errors are full rank). One way of assuring this is a sufficiently positive diagonal of this matrix product, i.e., a sufficiently high correlation between the incoming and outgoing synapses of the interneurons. A positive correlation of these weights is one of the observations of Znamenskiy et al. 2018 and also a result of learning in our model.

      While such a positive correlation is not necessarily present for all learning tasks or network models, we speculate that it will be for the task of learning an E/I balance in a Dalean network.”
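
      The positive-definiteness condition quoted above can be checked numerically with a small sketch. A quadratic form e^T M e is positive for every nonzero e exactly when the symmetric part of M is positive definite, which the code tests with a Cholesky factorization. The names `w_ie` (Pyr->PV weights) and `w_ei` (PV->Pyr weights), the use of weight magnitudes, and the equal-sized toy populations are assumptions made for illustration only.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def quadratic_form_is_positive(m, tol=1e-12):
    """True if e^T m e > 0 for all nonzero e (symmetric part positive definite)."""
    n = len(m)
    s = [[0.5 * (m[i][j] + m[j][i]) for j in range(n)] for i in range(n)]
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if acc <= tol:  # a non-positive pivot means not positive definite
                    return False
                chol[i][j] = acc ** 0.5
            else:
                chol[i][j] = acc / chol[j][j]
    return True

# Hypothetical toy weights: 2 Pyr neurons, 2 interneurons.
w_ie = [[1.0, 0.2], [0.1, 0.8]]                              # Pyr -> PV
w_ei_aligned = transpose(w_ie)                               # outputs mirror inputs
w_ei_opposed = [[-x for x in row] for row in transpose(w_ie)]

ok_aligned = quadratic_form_is_positive(matmul(w_ei_aligned, w_ie))
ok_opposed = quadratic_form_is_positive(matmul(w_ei_opposed, w_ie))
```

      In this toy case the aligned weights satisfy the condition and the sign-flipped weights do not, which is the sense in which correlated input and output synapses support the approximation.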

      iv) In the supplementary material, the local approximation rule is developed using a 0th-order truncation of Eq's 15a and 15b. It is noted that "If synapses are sufficiently weak ..., this approximation can be substituted into Eq. 15a and yields an equation that resembles a backpropagation rule in a feedforward network (E -> I -> E) with one hidden layer -- the interneurons." It would be helpful if the authors could discuss how this learning rule works in a general recurrent network, or if it will work for any network with sufficiently weak synapses.

      We now discuss the assumptions and their consequences more extensively, see reply to reviewer 1, comment 1.

      v) This synaptic plasticity rule seems to be closely related to other local approximations of backpropagation in recurrent neural networks: e-prop (Bellec et al., 2020, https://www.nature.com/articles/s41467-020-17236-y) and broadcast alignment (Nøkland, 2016; Samadi et al., 2017). These previous papers do not consider E/I balance in their approximations, but is E/I balance necessary for successful local approximation to these rules?

      We are not sure if we fully understand the comment. We do not expect that E/I balance is necessary for other biologically plausible approximations of BPTT. We merely suggest that for the task of learning E/I balance, the presented local approximation is valid.

      2) In the discussion, it reads as if the BCM rule cannot apply to this recurrent network because of the limited number of interneurons in the simulation ("parts of stimulus space are not represented by any interneurons"). Is this a limitation of the size of the model? Would scaling up the simulation change how applicable the BCM learning rule is? It would be helpful if the authors offer a more detailed discussion on why some forms of plasticity in interneurons fail to produce stimulus specificity.

      Increasing the size of the model would help only if it would increase the redundancy in the Pyr population response. Otherwise, the problem can only be solved by changing the E to I ratio.

      We feel that an exhaustive discussion of the dynamics of BCM in our network is beyond the scope of the paper, particularly because BCM comes in a broad variety (weight normalisation, weight limits, exact form of the sliding threshold?) and the exact behavior depends on various parameter choices. Similarly, we preferred to limit the discussion of other Hebbian rules, because it would be somewhat arbitrary which rules to discuss. Instead we added the following more abstract arguments to the discussion section (l. 322):

      “We expect that the rules we suggest here are only one set of many that can establish E/I assemblies. Given that the role of the input plasticity in the interneurons is the formation of a stimulus specificity, it is tempting to assume that this could equally well be achieved by classical forms of plasticity like the Bienenstock-Cooper-Munro (BCM) rule [Bienenstock et al., 1982], which is commonly used in models of receptive field formation. However, in our hands, the combination of BCM plasticity in Pyr->PV synapses with homeostatic inhibitory plasticity in the PV->Pyr synapses showed complex dynamics, an analysis of which is beyond the scope of this article. In particular, this combination of rules often did not converge to a steady state, probably for the following reason. [...]

      We suspect that this instability can also arise for other Hebbian forms of plasticity in interneuron input synapses when they are combined with homeostatic inhibitory plasticity (Vogels et al., 2011) in their output synapses. The underlying reason is that for convergence, the two forms of plasticity need to work synergistically towards the same goal, i.e., the same steady state. For two arbitrary synaptic plasticity rules acting in different sets of synapses, it is likely that they aim for two different overall network configurations. Such competition can easily result in dynamics with a continuing turn-over of transiently stable states, in which the form of plasticity that acts more quickly gets to reach its goal transiently, only to be undermined by the other one later.”

      Minor comments:

      1) Section 1 of the Results is confusing. The authors jump back and forth between emphasizing the emergence of E/I assemblies and connecting the local approximation rule to general feedback alignment. It would be helpful if the authors reorganized this section: maybe discuss the E/I assemblies first (with Figure 1), then go on to discuss why it is important to compare this synaptic plastic rule with feedback alignment.

      We have extended the explanation of the plasticity rules [l. 108] and hope that this section is now more accessible.

      2) Although the authors claim that there exists a significant change after PV->Pyr knockout (Fig 2b), the current presentation of this result is confusing: how many neurons change their responses? (Reading directly from the distributional difference, it seems that the gray and blue distributions only differ by about 5-8 neurons).

      The change is admittedly modest, but significant.

      3) Effect sizes instead of p-values should be quoted and used throughout, because the large data size of the simulations seems to make even the smallest correlations significant.

      We used p-values to remain consistent with the article of Znamenskiy et al. Please note that we took care to sample a comparable number of synapses from the network as in Znamenskiy et al., to keep the p-values comparable. If we had sampled all synapses from the network, significance would indeed be trivial.
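
      The dependence of significance on the number of sampled synapses can be illustrated with a minimal sketch. It uses the Fisher z-approximation for the two-sided p-value of a Pearson correlation, which is an assumption for illustration and not necessarily the test used in the manuscript or in Znamenskiy et al. The correlation value and sample sizes are hypothetical.

```python
import math

def correlation_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r from n samples.

    Uses the Fisher z-transform: atanh(r) * sqrt(n - 3) is treated as
    standard normal under the null hypothesis of zero correlation.
    """
    if n <= 3:
        raise ValueError("need more than 3 samples")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Same weak correlation, two hypothetical sample sizes.
r = 0.1
p_small_sample = correlation_p_value(r, 100)      # not significant at 0.05
p_large_sample = correlation_p_value(r, 10_000)   # vanishingly small
```

      The effect size is identical in both calls; only the sample count changes the p-value, which is why matching the number of sampled synapses keeps p-values comparable across studies.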

    1. Author Response:

      Summary: This is an interesting topic, and these findings are potentially of theoretical significance for the field of sleep and memory consolidation, as well as potentially of practical importance. However, reviewers raised potential issues with the methods and interpretation. Specifically, reviewers were not confident that the paper reveals major new mechanistic insights that will make a major impact on a broad range of fields.

      Reviewer #1:

      This work claims to show that learning of word associations during sleep can impair learning of similar material during wakefulness. The effect of sleep on learning depended on whether slow-wave sleep peaks were present during the presentation of that material during sleep. This is an interesting finding, but I have a lot of questions about the methods that temper my enthusiasm.

      We thank the reviewer for the careful reading of our manuscript and for the helpful comments. Most of the issues that were raised concern the clarity of writing. We will remove sleep-specific jargon where possible and will add relevant theoretical background and methodological details in the revised version of our manuscript.

      1) The proposed mechanism doesn't make sense to me: "synaptic down-scaling of hippocampal and neocortical language-related neurons, which were then too saturated for further potentiation required for the wake-relearning of the same vocabulary". Also lines 105-122. What is 'synaptic down-scaling'? What are 'language related neurons'? How were they 'saturated'? What is 'deficient synaptic renormalization'? How can the authors be sure that there are 'neurons that generated the sleep- and ensuing wake-learning of ... semantic associations'? How can inferences about neuronal mechanisms (i.e., mechanisms within neurons) be drawn from what is a behavioural study?

      We will improve the writing of our manuscript and will add formal definitions of the key concepts (synaptic down-scaling / renormalization, synaptic saturation, …) to clarify the proposed mechanism. We are also open to discussing alternative explanations.

      We admit that there is no way of truly knowing whether there were specific neurons or neuronal networks that encoded the semantic associations for word pairs that were played during sleep or during ensuing wakefulness. However, the behavioural data of the implicit memory test and the recall test suggest that participants formed memories for the word pairs played during sleep and during learning in the waking state. These memories must be represented in the brain – most likely in the hippocampus and in cortical regions involved in the processing of language. Indeed, our previous report suggested that successful retrieval of sleep-played semantic associations recruited hippocampus and language sites (Züst et al., 2019, Curr. Biol.).

      2) On line 54 the authors say "Here, we present additional data from a subset of participants of our previous study in whom we investigated how sleep-formed memories interact with wake-learning." It isn't clear what criteria were used to choose this 'subset of participants'. Were they chosen randomly? Why were only a subset chosen, anyway?

      The dataset we reported in Current Biology (Züst et al. 2019) consisted of two samples. Participants of the first sample stayed in the sleep laboratory following waking to perform the implicit memory test and the wake-learning task in the sleep laboratory. Participants of the second sample were escorted to the MR centre following waking to perform the implicit memory test in the MR scanner. These participants did not take a wake-learning task. Therefore, we could not include them in the study of wake-learning. Nevertheless, we do include ALL data of the first sample. We will clarify this in the revised version of our manuscript.

      3) The authors do not appear to have checked whether their nappers had explicit memory of the word pairs that had been presented. Why was this not checked, and couldn't explicit memory explain the implicit memory traces described in lines 66-70 (guessing would be above chance if the participants actually remembered the associations).

      Previous work from our own group (Ruch et al., 2014) as well as from other groups (Andrillon & Kouider, 2016; Cox et al., 2014; Arzi et al., 2012) clearly suggests that sleep-played sounds and words are not remembered consciously after waking up. This is why we administered an implicit memory test following waking. We only asked participants at the end of the experiment – i.e. after they had completed the wake-learning task – whether they had noticed or heard something unusual or unexpected during sleep. This first question was followed by the second question of whether participants had heard words during sleep. All participants denied having heard anything during sleep. This suggests that participants had no explicit memory for the sleep-played vocabulary. We will mention this in the revised version of our manuscript.

      Importantly, if participants had explicit memory for sleep-played vocabulary, the finding that these memories suppress conscious re-learning of the same or similar contents during subsequent wakefulness would oppose previous findings demonstrating that repeated learning improves retention.

      Reviewer #2:

      This paper reports on a very interesting and potentially highly important finding - that so-called "sleep learning" does not improve relearning of the same material during wake, but instead paradoxically hinders it. The effect of stimulus presentation during sleep on re-learning was modulated by sleep physiology, namely the number of slow wave peaks that coincide with presentation of the second word in a word pair over repeated presentations. These findings are of theoretical significance for the field of sleep and memory consolidation, as well as of practical importance.

      We appreciate the reviewer’s enthusiasm for our work and are grateful for the detailed and helpful comments.

      Concerns and recommendations:

      1) The authors' results suggest that "sleep learning" leads to an impairment in subsequent wake learning. The authors suggest that this result is due to stimulus-driven interference in synaptic downscaling in hippocampal and language-related networks engaged in the learning of semantic associations, which then leads to saturation of the involved neurons and impairment of subsequent learning. Although at first the findings seem counter-intuitive, I find this explanation to be extremely interesting. Given this explanation, it would be interesting to look at the relationship between implicit learning (as measured on the size judgment task) and subsequent explicit wake-relearning. If this proposed mechanism is correct, then at the trial level one would expect that trials with better evidence of implicit learning (i.e. those that were judged "correctly" on the size judgment task) should show poorer explicit relearning and recall. This analysis would make an interesting addition to the paper, and could possibly strengthen the authors' interpretation.

      The main findings did not change when we controlled for implicit memory performance. Most importantly, the reported interaction between re-learning Condition (congruent vs. incongruent translations) and the number of slow-wave peaks during sleep (0-1 vs. 2-4) remained significant when we included implicit memory retrieval as predictor. Furthermore, this interaction was not mediated by implicit retrieval performance (no significant 3-way interaction).

      We decided against reporting these analyses in the manuscript because including performance in the implicit memory test as additional predictor reduced the trial count to critically low levels in some conditions, making significance testing unreliable.

      2) In some cases, a null result is reported and a claim is based on the null result (for example, the finding that wake-learning of new semantic associations in the incongruent condition was not diminished). Where relevant, it would be a good idea to report Bayes factors to quantify evidence for the null.

      We will include Bayes Factors for our post-hoc analyses in the revised version of our manuscript.

      3) The authors report that they "further identified and excluded from all data analyses the two most consistently small-rated and the two most consistently large-rated foreign words in each word lists based on participants' ratings of these words in the baseline condition in the implicit memory test." Although I realize that the same approach was applied in their original 2019 paper, this decision point seems a bit arbitrary, particularly in the context of the current study where the focus is on explicit relearning and recall, rather than implicit size judgments. As a reader, I wonder whether the results hold when all words are included in the analysis.

      We wanted the analyses to be consistent with the original report in Current Biology (Züst et al., 2019). Nevertheless, for this revision, we will include all learning material in the analyses. Note that the changes in the overall pattern of results are minuscule and the message remains the same when stereotypical/biased words are included vs. excluded.

      4) In the main analysis examining interactions between test run, condition (congruent/incongruent) and number of peak-associated stimulations during sleep (0-1 versus 3-4), baseline trials (i.e. new words that were not presented during sleep) are excluded. As such, the interactions shown in the main results figure (Figure D) are a bit misleading and confusing, as they appear to reflect comparisons relative to the baseline trials (rather than a direct comparison between congruent and incongruent trials, as was done in the analysis). It also looks like the data in the "new" condition is just replicated four times over the four panes of the figure. I recommend reconstructing the figure so that a direct visual comparison can be made between the number of peaks within the congruent and incongruent trials. This change would allow the figure to more accurately reflect the statistical analyses and results that are reported in the manuscript.

      We will update the figure with a panel that presents the results for all conditions on the same axes. This will facilitate direct comparisons between all conditions.

      5) In addition to the main analysis, the authors report that they also separately compared the conscious recall of congruent and incongruent pairs that were never or once vs. repeatedly associated with slow-wave peaks with the conscious recall in the baseline condition. Given that four separate analyses were carried out, some correction for multiple comparisons should be done. It is unclear whether this was done as it does not seem to be reported.

      We will clarify where and how we corrected for multiple comparisons in the revised version of the manuscript.
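
      For readers unfamiliar with such corrections, one common option for a small family of tests is the Holm-Bonferroni procedure, sketched below. This is only an illustration: the response does not state which correction was or will be used, and the four p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values for four separate comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```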

    1. Author Response

      Summary: This study tackles a difficult problem of understanding the basis for hippocampal theta rhythms through reduction of a highly detailed model, seeking to validate a reduced model that would be more amenable to analysis. The reviewers appreciated the attention to this challenging problem and the substantial work that went into it, but had several fundamental concerns about the methodology, interpretation, and reporting.

      We appreciate the detailed feedback provided to us by the reviewers and editors and we are pleased that there was an appreciation for the attention we have given to this challenging problem and the substantial work that went into it. We would like to thank the reviewers for their efforts.

      This feedback helped us realize that there was possibly too much presented in this single paper and moving forward, we will split the work into two papers. While we agree with some of the feedback, we think that some aspects were misunderstood, which may be partially due to the extensiveness of the submitted paper. Below we provide general responses to the points raised, leaving specifics for elsewhere.

      Reviewer #1:

      This study takes two existing models of hippocampal theta rhythm generation, a reduced one with two populations of Izhikevich neurons, and a detailed one with numerous biophysically detailed neuronal models. The authors do some parameter variation on 3 parameters in the reduced model and ask which are sensitive control parameters. They then examine control of theta frequency through a phase response curve and propose an inhibition-based tuning mechanism. They then map between the reduced and detailed model, and find that connectivity but not synaptic weights are consistent. They take a subset of the detailed model and do a 2 parameter exploration of rhythm generation. They compare phenomenological outcomes of the model with results from an optogenetic experiment to support their interpretation of an inhibition-based tuning mechanism for intrinsic generation of theta rhythm in the hippocampus.

      This statement summarizes our work to a certain extent but it misses a key aspect – the ‘mapping’ between the minimal (that this reviewer refers to as ‘reduced’) and detailed model is what is used to rationalize and motivate the subsequent extensive 2-parametric exploration in a ‘piece’ of the detailed model (which we termed the segment model). We will aim to write this more clearly in an edited version.

      General comments:

      1) The paper shows the existence of potential rhythm mechanisms, but the approach is illustrative rather than definitive. For example, in a very lengthy section on parameter exploration in the reduced model, the authors find some domains which do and don't exhibit rhythms. Lacking further exploration or analytic results, it is hard to see if their interpretations are conclusive.

      We agree that these are interpretations (not meant to be conclusive), but the goal was to use the minimal model to develop further insight as we did with a hypothesis development presented in the middle of the paper.

      2) The authors present too much detail on too few dimensions of parameters. An exhaustive parameter search would normally go systematically through all parameters, and be digested in an automated manner. For reporting this, a condensed summary would be presented. Here the authors look at 3 parameters for the reduced model and 2 parameters in the detailed one - far fewer than the available parameter set. They discuss the properties of these parameter choices at length, but then pick out a couple of illustrative points in the parameter domain for further pursuit. This leaves the reader rather overwhelmed on the one hand, and is not a convincing thorough exploration of all parameters of the system on the other.

      See above.

      3) I wonder if the 'minimal' model is minimal enough. Clearly it is well- supplied with free parameters. Is there a simpler mapping to rate models or even dynamical systems that might provide more complete insights, albeit at the risk of further abstraction?

      We agree that models can be even more minimal, but the goal here was not to further analyse the minimal model through simpler mappings or otherwise. Rather, it was to exploit linkages between the minimal model and detailed models to help understand how theta rhythms could be generated in the biological system (Goutagny et al. 2009 intrinsic theta), using a piece of the detailed model as a ‘biological proxy’.

      4) Around line 560 and Fig 12 the authors conclude that only case a) is consistent with experiment. While it is important to match data to experiment, here the match is phenomenological. It misses the opportunity to do a quantitative match which could be done by taking advantage of the biological detail in the model.

      5) The paper is far too long and is a difficult read. Many items of discussion are interspersed in the results, for example around line 335 among many others.

      We will split the paper into two.

      Reviewer #2:

      In this work Chatzikalymniou et al. use models of hippocampus of different complexities to understand the emergence and robustness of intra-hippocampal theta rhythms. They use a segment of highly detailed model as a bridge to leverage insights from a minimal model of spiking point neurons to the level of a full hippocampus. This is an interesting approach as the minimal model is more amenable to analysis and probing the parameter space while the detailed model is potentially closer to experiment yet difficult and costly to explore.

      We completely agree.

      The study of network problems is very demanding: there are no good ways to address the robustness of realistic models, and the size of the parameter space makes brute-force approaches impractical. The angle of attack proposed here is interesting. While this is surely not the only tenable approach, it is sensible, justified, and actually implemented. The amount of work that went into this project is clear. I essentially accept the proposed reasoning and the hypotheses put forward. The few remarks I have are rather minor, but I think they merit a response.

      1) l. 528-530 "This is particularly noticeable in Figure 9D where theta rhythms are present and can be seen to be due to the PYR cell population firing in bursts of theta frequency. Even more, we notice that the pattern of the input current to the PYR cells isn't theta-paced or periodic (see Figure 10Bi)."

      This is a loose statement. When you look at the raw LFP, theta is also not apparent (e.g. Figure 9.Ei or Fi). What happens once you look at the spectrum of the activity shown in 10.Bi? Do you see theta or not?

      We agree – to be done.
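
      One way to make the spectral check requested above concrete is sketched below. It is only an illustration, not the analysis used in the manuscript: the trace is synthetic (an assumed 8 Hz sinusoid buried in Gaussian noise), and the function names, sampling rate, and candidate frequency range are hypothetical choices.

```python
import math
import random


def power_at(signal, freq_hz, fs_hz):
    """Power of `signal` at one frequency, from a direct discrete Fourier sum."""
    re = im = 0.0
    for n, x in enumerate(signal):
        phase = 2.0 * math.pi * freq_hz * n / fs_hz
        re += x * math.cos(phase)
        im -= x * math.sin(phase)
    return (re * re + im * im) / len(signal)


def peak_frequency(signal, fs_hz, freqs_hz):
    """Return the candidate frequency that carries the largest power."""
    mean = sum(signal) / len(signal)
    centred = [x - mean for x in signal]  # remove the DC offset first
    return max(freqs_hz, key=lambda f: power_at(centred, f, fs_hz))


# Synthetic stand-in for an input-current trace (assumption, not model output):
# 2 s sampled at 1 kHz, an 8 Hz component plus unit-variance Gaussian noise.
rng = random.Random(0)
fs = 1000.0
trace = [math.sin(2.0 * math.pi * 8.0 * n / fs) + rng.gauss(0.0, 1.0)
         for n in range(2000)]

candidates = list(range(1, 51))  # 1-50 Hz in 1 Hz steps
print("peak frequency (Hz):", peak_frequency(trace, fs, candidates))
```

      Applied to the current shown in Figure 10Bi, a clear peak in the theta band would indicate theta-paced input, whereas a flat spectrum would support the statement quoted above.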

      2) l. 562 "This implies that the different E-I balances in the segment model that allow LFP theta rhythms to emerge are not all consistent with the experimental data, and by extension, the biological system."

      This is speculative. We do not know how generic the results of Amilhon et al. are. They showed what you can find experimentally, not what you cannot find experimentally. I agree with the statement from l.581, though: "Thus, from the perspective of the experiments of Amilhon et al. (2015) theta rhythm generation via a case a type pathway seems more biologically realistic ..."

      We agree – to edit accordingly.

      3) There are several problems with access to code and data provided in the manuscript.

      l. 986, 1113 - osf.io does not give access
      l. 1027 - bitbucket of bezaire does not allow access
      l. 1030 - simtracker link is down
      l. 1129, 1141 - the github link does not exist (private repo?)

      Our apologies that all of these were not made public as intended.

      4) l. 1017 - Afferent inputs from CA3 and EC are also included in the form of Poisson-distributed spiking units from artificial CA3 and EC cells.

      Not obvious if Poisson is adequate here - did you check on the statistics of inputs? Any references? Different input statistics may induce specific correlations which might affect the size of fluctuations of the input current. I do not think this would be a significant effect here unless the departure from Poisson is highly significant. Any comments might be useful.

      We were simply using the same input protocol setup done by Bezaire et al. 2016.
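
      The reviewer's question about input statistics could be examined along the following lines. This is a minimal sketch under stated assumptions: it generates a homogeneous Poisson spike train from exponential inter-spike intervals and reports the coefficient of variation (CV) of those intervals, which is close to 1 for a Poisson process. The rate, duration, and function names are hypothetical and are not taken from Bezaire et al. 2016.

```python
import random
import statistics


def poisson_spike_times(rate_hz, duration_s, seed=0):
    """Homogeneous Poisson spike train built from exponential inter-spike intervals."""
    rng = random.Random(seed)
    times, t = [], rng.expovariate(rate_hz)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_hz)
    return times


def isi_cv(spike_times):
    """Coefficient of variation of the inter-spike intervals (about 1 for Poisson)."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return statistics.pstdev(isis) / statistics.mean(isis)


# Hypothetical values for illustration only (not taken from Bezaire et al. 2016).
spikes = poisson_spike_times(rate_hz=20.0, duration_s=500.0)
print(f"{len(spikes)} spikes, ISI CV = {isi_cv(spikes):.2f}")
```

      Running the same CV measure on recorded or more structured input trains would show how far they depart from the Poisson assumption.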

      5) l. 909 - "Euler integration method is used to integrate the cell equations with a timestep of 0.1 msec."

      This seems dangerous. Is the computation so costly that more advanced integration is not viable?

      Our apologies, the timestep was erroneously reported. At initial stages of the project, larger step sizes were attempted to speed up computation. The step size and integration method used were as in the minimal model of Ferguson et al. (2017). That is, Euler integration with a 0.04 ms step size for the cell simulations and Runge-Kutta for network simulations.
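
      To illustrate why the step size matters for Euler integration of this class of model, the sketch below integrates a single Izhikevich-type neuron with both step sizes mentioned in this exchange. It is an assumption-laden example: it uses the classic Izhikevich (2003) equations with that paper's tonic-spiking parameter values and a hypothetical constant input current, not the cell models or parameter values of Ferguson et al. (2017).

```python
def izhikevich_euler(dt_ms, t_end_ms=1000.0, current=14.0,
                     a=0.02, b=0.2, c=-65.0, d=6.0):
    """Forward-Euler integration of the classic Izhikevich (2003) neuron.

    Parameter defaults are the textbook tonic-spiking values (assumption).
    Returns the list of spike times in ms.
    """
    v, u = -70.0, b * -70.0
    spikes = []
    for step in range(int(round(t_end_ms / dt_ms))):
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt_ms * dv
        u += dt_ms * du
        if v >= 30.0:  # spike: record the time, then reset both variables
            spikes.append((step + 1) * dt_ms)
            v = c
            u += d
    return spikes


for dt in (0.1, 0.04):
    times = izhikevich_euler(dt)
    print(f"dt = {dt} ms: {len(times)} spikes, first at {times[0]:.2f} ms")
```

      Comparing spike counts and first-spike times across step sizes gives a quick indication of whether the Euler scheme has converged for a given model.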

      Reviewer #3:

      [...] I have a number of methodological issues with the paper.

      First, both models should be validated against experimental evidence, given that the experimental results exist. The validation of a "minimal" model with data from another model is circumstantial and useful to link two models, but it is in no way a scientific validation, in my opinion.

      Second, the model reduction arguments are simply taken as a piece of a large model. This is in no way a systematic reduction, which the authors should provide. In the absence of that, the two models are simply two different models.

      Third, it is not clear which aspects of the mechanisms cannot be investigated using the larger models and therefore require the reduced models, given that the models do not necessarily match.

      Fourth, the concept of a minimal model should be clearly explained. They used caricature (toy) models (2D quadratic models, aka Izhikevich models) combined with biophysically plausible descriptions of synapses. The model parameters in 2D quadratic models are not biophysical, as the authors acknowledge, but they can be related to biophysical parameters through the specific equations provided in Rotstein (JCNS, 2015) and Turnquist & Rotstein (Encyclopedia of Computational Neuroscience, 2018). In fact, they can represent either h-currents or M-currents. I suggest the authors determine this from these references. In this framework, the dynamics would result from a combination of these currents and persistent sodium or fast (transient) sodium activation.

      Fifth, from the original paper (Ferguson et al., 2017) their minimal model has 500 PV and 10000 PYR cells (I couldn't find the number of PV cells in this paper, but I assumed they were as in the original paper). This is not what I would call a minimal model. It is minimal only in comparison with the more detailed model. While this is a matter of semantics, it should be clarified since there are other minimal model approaches in the literature (e.g., Kopell group, Erdi group). Related to these models, it is typically assumed that the ratio of PYR to PV cells is 5/1. This is certainly not sacrosanct, but seems to have been validated. Here it is 20/1. Is there any reason for that?

      Sixth, the networks are so big that it is very difficult to gain some profound insight. What is it about the large networks and their contribution to the generation of theta activity that cannot be learned from "more minimal" networks?

      The published minimal model (Ferguson et al. 2017) used experimental data constraints on EPSC and IPSC ratios to come up with the prediction of connectivity. As this connectivity was found in the detailed model (with empirically determined connections), this can be considered a form of validation for the minimal model’s predictions if one considers the detailed model as a ‘biological proxy’.

      We agree that the segment model is not a systematic reduction of the detailed model. The segment model reasonably represents a ‘piece’ of the CA1 microcircuit that was shown experimentally to be able to generate oscillations on its own (see Goutagny et al. 2009 Supplementary figure 11). This was the assumption in determining the network size of the previously published minimal model. A large network is needed in order to appropriately capture the very large EPSCs relative to IPSCs that are present in the experiment. This is the essence of why smaller network sizes cannot be justified.

      Because of these concerns and the development of the paper, I believe the paper is about the comparison between two existing models that the authors have constructed in the past and the parameter exploration of these models.

      We do not fully agree with this statement. The minimal model was constructed by us (Ferguson et al. 2017), but the detailed model was painstakingly constructed in a state-of-the-art fashion by Bezaire et al. 2016. We used a ‘piece’ of this detailed model (see above) so that we could make ‘links’ with the minimal model in understanding the generation of intrinsic theta rhythms. This ‘piece’ also allowed us to do the extensive exploration for the additional results presented. The paper is about taking advantage of the comparison and linkage of minimal and detailed models to show how theta rhythms are generated and their frequencies controlled.

      I find the paper extremely difficult to read. It is not about the narrative, but about the organization of the results and the lack (or scarcity) of clear statements. I can't seem to be able to easily extract the principles that emerge from the analysis. There are a large number of cases and data, but what do we get out of that? Perhaps creating "telling titles" for each section/subsection would help, where the main result is the title of the section/subsection. I also find an issue with the acronyms. One has to keep track of numbers, cases, acronyms (N, B), etc. All that gets in the way of the understanding. I believe figures would help.

      Another confusing issue in the paper is the use of the concept of "building blocks". I am not opposed to the use of these words, on the contrary. But building blocks are typically associated with the model structure (e.g., currents in a neuron, neurons in a network). PIR, SFA and Rheo are a different type of building blocks, which I would call "functional building blocks". They are building blocks in a functional world of model behavior, but not in the world of modeling components. For example, PIR can be instantiated by different combinations of ionic currents receiving inhibitory inputs. Also, the definitions of the building blocks and how they are quantified should be clearly stated in a separate section or subsection.

      The concept of building blocks was taken directly from Gjorgjieva et al. 2016, which we cited in Ferguson et al. 2017 when we first used it. It is also cited in the present paper, although for a different point.

      I disagree with the authors' statement in lines 214-216, related to Fig. 4. They claim that "From them, we can say that the PYR cell firing does not specifically occur because of their IPSCs, as spiking can occur before or just after its IPSCs." Figure 4 (top, left panel) suggests the opposite, but instead of being a PIR mechanism, it is a "building-up" of the "adaptation" current in the PYR cell. (By "adaptation" current I mean the current corresponding to the second variable in the model. If this variable were the gating variable of the h-current, it would be the same type of mechanism suggested in Rotstein et al. (2005) and in the models presented in Stark et al. (2013).) The mechanism operates as follows. The first PV spike (not shown in the figure) causes a rebound, which is not strong enough to produce a PYR spike before a new PV spike occurs (the first in the figure). This second PV spike causes a stronger rebound (it is super clear in the figure), which is still not strong enough to produce a PYR spike before the next PV spike arrives. This third PV spike produces a still stronger rebound, which now causes a PYR spike. The fact that this PYR spike occurs before the PV spike is not indicative of the authors' conclusions, but quite the opposite.

      The authors should check whether the mechanistic hypothesis I just described, which is consistent with Fig. 4 (top, left panel), is also consistent with the rest of the panels and, more generally, with their modeling results and the experimental data, and whether it is general and, if not, under which conditions it holds. If my hypothesis ends up not being supported, then they should come up with an alternative hypothesis. The condition the authors state about the parameter "b" and PIR is not necessarily general. PIR and other phenomena are typically controlled by the combined effect of more than one parameter. As it stands, their basic assumption behind the PRC is not necessarily valid.

      The subsequent hypothesis (about PYR bursting) is called into question in view of the previous comments. The experimental data should be able to provide an answer.

      See above.

      The authors should provide a more detailed explanation and justification for the presence of an inhibitory "bolus". What would the timescale be? Again, the data should provide evidence of that. In their discussion about the PRC, the authors essentially conclude what they hypothesize, but this conclusion is based on the "bolus" idea. The validity of this should be revised.

      The discussion about degeneracy of the theta rhythm generation is interesting. However, because of the size and complexity of the models, this degeneracy is expected. Their minimal modeling approach does not help in shedding any additional light. In addition, the authors do not discuss the intrinsic sources of degeneracy and how they interact with the intrinsic ones.

      The last two sections were difficult to follow and I found them anecdotal. I was expecting a deeper mechanistic analysis. However, I have to acknowledge that because of my difficulty in following the paper, I might have missed important issues.

      These last sections are where the ‘piece’ of the detailed model (that we termed the segment model) - a ‘biological proxy’ - essentially shows that the theta rhythm is initiated from the pyramidal cells and that the frequency is controlled by the net input to the pyramidal cells.

      The discussion is extensive, exhaustive and interesting. But it is not clear how the paper results are integrated in this big picture, except for a number of generic statements.

      The proposal that the hippocampus has the circuitry to produce theta oscillations without the need of medial septum input has been made before by Gillies et al. (2003) and in the models of Rotstein et al. (2005) and Orban et al. (2005). But the idea from this work is not that the hippocampus (CA1) is a pacemaker, but rather what we now call a "resonator". To claim that the MS is simply an amplifier of an existing oscillator goes against the existing evidence.

      We agree that many models show theta generation without explicit mention of the medial septum. However, what our modelling work shows is how the intrinsic theta rhythm is generated – it is initiated by the pyramidal cells (large enough network size with some recurrent connections) and the control of the theta frequency (LFP) is due to the net input to the pyramidal cells – this is the main claim of the paper. This is explicitly in reference to an intrinsic theta rhythm experimental context. From there, we suggest that MS and other inputs could amplify an already existing intrinsic rhythm in the CA1 microcircuit.

      References:

      Bezaire, M. J., Raikov, I., Burk, K., Vyas, D., & Soltesz, I. (2016). Interneuronal mechanisms of hippocampal theta oscillation in a full-scale model of the rodent CA1 circuit. eLife, 5, e18566. https://doi.org/10.7554/eLife.18566

      Ferguson, K. A., Chatzikalymniou, A. P., & Skinner, F. K. (2017). Combining Theory, Model, and Experiment to Explain How Intrinsic Theta Rhythms Are Generated in an In Vitro Whole Hippocampus Preparation without Oscillatory Inputs. eNeuro, 4(4). https://doi.org/10.1523/ENEURO.0131-17.2017

      Gjorgjieva, J., Drion, G., & Marder, E. (2016). Computational implications of biophysical diversity and multiple timescales in neurons and synapses for circuit performance. Current Opinion in Neurobiology, 37, 44–52. https://doi.org/10.1016/j.conb.2015.12.008

      Goutagny, R., Jackson, J., & Williams, S. (2009). Self-generated theta oscillations in the hippocampus. Nature Neuroscience, 12(12), 1491–1493. https://doi.org/10.1038/nn.2440

    1. Author Response:

      We thank the editor and the reviewers for their feedback on our manuscript.

      Our project aimed to join forces across neuroscience and computer science, advancing a finer-grained understanding of how lexical meanings are processed by human and artificial intelligence. As the reviewers correctly pointed out, enormous efforts have been made in each research domain to investigate the proposed question. Historically, however, this progress has been made independently in cognitive neuroscience and in artificial intelligence within computer science. At the current stage of research, the need to integrate these two lines of research is more urgent than ever. However, bridging two research domains is a completely different ball game that requires a novel theoretical framework, innovative experiments, and new databases.

      The current stage of artificial intelligence is, by nature, statistical mapping between inputs and outputs, without any true intellectual processing involved (Yann LeCun). To bridge two complex systems (e.g., the human brain and computers), the first step is to find a common ground for representing information. For example, in the domain of vision, joint efforts between computer science and neuroscience have recently established mappings between features in different layers of deep neural network models and neural representations in visual hierarchies. However, in another important domain of artificial intelligence, natural language processing (NLP), advances are still scarce, because a fine-grained understanding of both the dynamics of brain responses and the underlying mechanisms of NLP models is yet to be established. In this study, we proposed a novel research framework that investigates the possible common lexical-semantic representation in the human brain and computers, which serves as the first and fundamental step towards bridging these two research domains.

      Experimentally, we optimized the classic lexical-semantic paradigm and developed novel research methods to investigate the common representations between the brain and computers. Specifically, in this project, we used a two-word semantic priming paradigm with electroencephalography (EEG) recordings to quantify the dynamic processing of human language comprehension in a most basic setting. We then evaluated three computational models by correlating neural data with model-generated semantic similarity scores for the same word pairs, using a novel single-trial EEG correlation analysis. We agree with the reviewers that this study has many aspects that can be improved, like all studies that aim to open a new research direction. To our knowledge, this is the first attempt to create a natural, dynamic, neural dataset for evaluating computational models in the linguistic domain, thus paving a new way towards a full understanding of the general computational mechanisms of language processing across complex systems.